Xpra: Ticket #832: libvpx 1.4 support: add YUV444, etc

See https://groups.google.com/a/webmproject.org/forum/#!topic/codec-devel/2zYWenmdUM8.

Of particular interest to us:

VP9 encodes are faster (tile based multithreading) - see http://wiki.webmproject.org/ffmpeg/vp9-encoding-guide
YUV422 and YUV444

The downloads seem to have moved (again) here: http://downloads.webmproject.org/releases/webm/index.html

Sat, 04 Apr 2015 16:57:33 GMT - Antoine Martin: status changed

status changed from new to assigned

Building on OSX with:

./configure   --enable-vp8 --enable-vp9 \
    --enable-realtime-only --enable-runtime-cpu-detect \
    --prefix=${JHBUILD_PREFIX} --target=x86-darwin9-gcc

Gives errors to do with sse2 / ssse3 assembler optimizations (similar to this one: disabled sse2 link errors):

    [LD] test_libvpx
Undefined symbols:
  "_vp9_sub_pixel_variance4x4_ssse3", referenced from:
      _vp9_sub_pixel_variance4x4_ssse3$non_lazy_ptr in variance_test.cc.o
...
  "_vp9_sub_pixel_variance8x4_sse2", referenced from:
      _vp9_sub_pixel_variance8x4_sse2$non_lazy_ptr in variance_test.cc.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[1]: *** [test_libvpx] Error 1

Removing vp9 support, which isn't very useful anyway, still gives errors on vp8:

    [LD] test_libvpx
Undefined symbols:
  "_vp8_denoiser_filter_uv_c", referenced from:
      (anonymous namespace)::VP8DenoiserTest_BitexactCheck_Test::TestBody()in vp8_denoiser_sse2_test.cc.o
  "_vp8_denoiser_filter_uv_sse2", referenced from:
      (anonymous namespace)::VP8DenoiserTest_BitexactCheck_Test::TestBody()in vp8_denoiser_sse2_test.cc.o
  "_vp8_regular_quantize_b_sse4_1", referenced from:
      _vp8_regular_quantize_b_sse4_1$non_lazy_ptr in quantize_test.cc.o
  "_vpx_internal_error", referenced from:
      VP8_TestBitIO_Test::TestBody()      in vp8_boolcoder_test.cc.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[1]: *** [test_libvpx] Error 1
make: *** [.DEFAULT] Error 2

And switching from yasm to nasm gives new errors:

[AS] vp8/common/x86/subpixel_mmx.asm.o
vp8/common/x86/subpixel_mmx.asm:233: error: beroset-p-637-invalid effective address
vp8/common/x86/subpixel_mmx.asm:389: error: beroset-p-637-invalid effective address
vp8/common/x86/subpixel_mmx.asm:544: error: beroset-p-637-invalid effective address
make[1]: *** [vp8/common/x86/subpixel_mmx.asm.o] Error 1
make: *** [.DEFAULT] Error 2

I think my 10.5.x build system is just getting too old for this.

Built on win32 (see ticket:440#comment:4) using:

cd libvpx-v1.3.0
./configure \
    --enable-vp8 --enable-vp9 \
    --enable-realtime-only --enable-runtime-cpu-detect \
    --target=x86-win32-vs9
make

Then build using Visual Studio: open the solution, switch to a "release" build and build the solution. Then copy the Win32\Release\*lib to C:\vpx-1.4\lib\Win32\. I got the rest of the files (include and bin) into C:\vpx-1.4\ by doing a mingw gcc build first, but that build doesn't generate the lib files, whereas Visual Studio does.

r8913 will build against 1.4 if present and fall back to 1.3

Sat, 04 Apr 2015 17:45:33 GMT - Antoine Martin:

I have built beta libvpx-xpra-1.4 packages for centos and fedora, which are drop-in replacements for the 1.3 packages.

From reading the docs, it seems that:

VP9E_SET_TILE_COLUMNS enables multithreaded encoding and decoding
rc_min_quantizer = rc_max_quantizer = 0 enables lossless mode
g_profile = 1 enables YUV444? How do we know if it is available? (AFAIK, it is not in 1.3..)
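
As a rough illustration of how these settings fit together, here is a minimal Python sketch. The helper name `vp9_settings` and the dictionary layout are made up for this example (this is not xpra's code); it only mirrors the reading of the docs above, assuming that profile 1 is the 8-bit VP9 profile used for subsampling other than 4:2:0 and that a zero min/max quantizer selects lossless mode.

```python
# Hypothetical helper, not part of xpra or libvpx: it only collects the
# encoder settings discussed above into a plain dictionary.
VP9_PROFILES = {
    "YUV420P": 0,  # profile 0: 8-bit 4:2:0
    "YUV422P": 1,  # profile 1: 8-bit, subsampling other than 4:2:0
    "YUV444P": 1,
}


def vp9_settings(colorspace, quality):
    """Return the encoder config fields implied by colorspace and quality."""
    if colorspace not in VP9_PROFILES:
        raise ValueError("unsupported colorspace: %r" % (colorspace,))
    settings = {"g_profile": VP9_PROFILES[colorspace]}
    if quality >= 100:
        # lossless mode: pin both quantizer bounds to zero
        settings["rc_min_quantizer"] = 0
        settings["rc_max_quantizer"] = 0
    return settings


print(vp9_settings("YUV444P", 100))
```

Whether profile 1 is actually usable still has to be detected at build or run time, which is the open question above.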

Sun, 05 Apr 2015 09:51:18 GMT - Antoine Martin: attachment set

attachment set to vp9-yuv444p.patch

large work in progress patch: we need to support different colorspaces per encoding, which is an API change..

Sun, 05 Apr 2015 12:43:54 GMT - Antoine Martin:

Despite the fact that the vp9 + YUV444P test passes (it only encodes a single frame), we get errors:

with ffmpeg decoding (avcodec):

dec_avcodec.Decoder({'decoder_height': 24614, 'encoding': 'vp9', 'colorspace': 'YUV444P', \
    'actual_colorspace': 'YUV444P', 'height': 300, 'decoder_width': 38, 'width': 300, \
    'version': (56, 1, 100), 'formats': ['YUV420P', 'YUV422P', 'YUV444P'], \
    'frames': 0L, 'type': 'avcodec', 'buffers': 0})\
    .decompress_image(<type 'str'>:1065, {'speed': 33, 'frame': 1, 'quality': 99, \
        'csc': 'YUV444P', 'encoding': 'vp9'}) \
        avcodec_decode_video2 failure: Invalid data found when processing input

with the vpx native decoder:

paint_with_video_decoder: wid=2, vp9 decompression error on 250 bytes of picture data \
    for 300x300 pixels using vpx.Decoder(vp9), \
    options={'speed': 33, 'frame': 2, 'quality': 99, 'csc': 'YUV444P', 'encoding': 'vp9'}
error during vpx_codec_decode: Unspecified internal error

Only shows that the tests need to be improved..

Tue, 07 Apr 2015 12:56:15 GMT - Antoine Martin: owner, priority, status changed

owner changed from Antoine Martin to alas
priority changed from major to critical
status changed from assigned to new

In the end, it was a red herring: increasing the number of frames we test (done in r8941) does not trigger the problem, so the problem was not with the vpx encoder + decoder combination. I eventually stumbled upon those weird messages:

2015-04-07 18:00:24,127 paint_with_video_decoder: colorspace changed from speed to YUV444P
2015-04-07 18:01:54,048 paint_with_video_decoder: colorspace changed from log.py to YUV444P

This should have been more obvious; r8942 should make it so. (And when this happens, we should probably tell the server to restart the video encoder.)
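
To illustrate why those messages are a giveaway: "speed" and "log.py" are not colorspaces at all, so a check against the set of known names would flag the problem immediately. The sketch below is only an assumption about what such a check could look like; the function name and the set of names are hypothetical, and this is not what r8942 does.

```python
# Hypothetical sanity check: the names below are illustrative only.
KNOWN_COLORSPACES = frozenset(("YUV420P", "YUV422P", "YUV444P"))


def colorspace_changed(current, new):
    """Return True if the decoder must be re-initialized for `new`.

    Raises ValueError when either value is not a known colorspace name,
    instead of silently logging a nonsensical "changed from ..." message.
    """
    for label, value in (("current", current), ("new", new)):
        if value not in KNOWN_COLORSPACES:
            raise ValueError("invalid %s colorspace: %r" % (label, value))
    return current != new


print(colorspace_changed("YUV420P", "YUV444P"))
```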

Moving on from fish to ducks: if it quacks like a memleak / memcorruption, that's because it is one: fixed in r8940. VP9 + YUV444 now works properly. Except... What really threw me off course is that there was another decoding bug in avcodec, with the exact same symptoms! (and avcodec is the default video decoder used - though this can be changed with the video-decoders config switch) It seems that ffmpeg's avcodec cannot decode vp9 + yuv444p using the same code path that we use for vp8, h264 and h265. So this is now disabled in r8943 (+backport in r8945).

But, it was worth the effort, because I also re-RTFM, and found a few things we needed to do to get good performance out of vp9:

r8947 - using tile columns did not make any difference whatsoever, so the code is there but disabled by default
r8948 + r8949: tuning CPUUSED and disabling VP9E_SET_FRAME_PERIODIC_BOOST makes a massive difference, we also make sure we never use VPX_DL_BEST_QUALITY with vp9.
r8950 also adds support for lossless mode when setting quality=100 - (it could be worth comparing with NVENC's lossless mode, x264 does not have one..)

vp9 is now actually quite usable! On a low-end core i5, I am getting just under 40fps at 1080p when I select speed=100, with auto-tuning it is closer to 20fps. (and maybe we can improve the heuristics for vp9 so we get >20fps using the default auto settings) Related note: I think there may be a bug with the speed controls from the tray menu. So it is best to use the command line when testing.

In the future, we may also want to set the "tune-content" to "screen" when not dealing with video regions:

/*!\brief VP9 encoder content type */
typedef enum {
  VP9E_CONTENT_DEFAULT,
  VP9E_CONTENT_SCREEN,
  VP9E_CONTENT_INVALID
} vp9e_tune_content;

Note for testing: you will need libvpx 1.4.0 or later as I have included an ABI version test to enable the YUV444P feature. There are beta RPM packages available.
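
The ABI version test mentioned above amounts to gating the extra colorspace on a minimum encoder ABI number. A minimal sketch of that idea follows; the function name is hypothetical and the threshold value is a placeholder chosen for the example, not the value used in the xpra source.

```python
# Placeholder threshold for illustration only: NOT taken from this ticket
# or from the xpra source. Look up the real value before relying on it.
ASSUMED_YUV444_MIN_ABI = 10


def vp9_colorspaces(encoder_abi_version, min_abi=ASSUMED_YUV444_MIN_ABI):
    """Return the colorspaces to advertise for vp9 given the encoder ABI."""
    colorspaces = ["YUV420P"]
    if encoder_abi_version >= min_abi:
        # only newer libvpx builds get the YUV444P feature
        colorspaces.append("YUV444P")
    return colorspaces


print(vp9_colorspaces(9))
print(vp9_colorspaces(10))
```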

@afarr: this one is definitely worth testing (raising to critical), x264 does not seem to be making progress anymore and h265 (at least in software: #445) is not quite ready yet (though the performance problems I was having may have similar solutions - more RTFM for me). If that works well enough for you, we should consider removing vp9 from the hidden list (see r7207).

Wed, 08 Apr 2015 05:39:34 GMT - Antoine Martin:

r8960 fixes builds against libvpx 1.3 (as needed on OSX..)
r8996 fixes builds against libvpx older than 1.3 (but still newer than 1.0)

Tue, 21 Apr 2015 07:41:00 GMT - Antoine Martin:

Note: as part of #840, I've tried libvpx 1.4 on OSX 10.9.5 + xcode 6.2, still fails.

Wed, 06 May 2015 09:48:26 GMT - Antoine Martin:

Note: in trunk (0.16) we now allow the user to select vp9 from the tray (r9275 + r9276)

Tue, 16 Jun 2015 09:25:19 GMT - Antoine Martin:

r9593 removed the scary vp9 warnings in v0.15.x, since it also works fine in that branch

r9789 enables vp9 decoding through avcodec as long as the version of libav is recent enough and it passes the self tests

Note: we still don't enable it when connecting to v0.14.x server, because it is still listed there as a "problematic encoding".

Sat, 01 Aug 2015 01:13:32 GMT - alas:

Tested with osx 10.6.8 0.15.4 r10055 client (no opengl) and with win 8.1 0.15.4 client (yes opengl) against a fedora 20 0.15.4 r10133.

Launching with: xpra attach tcp:10.0.32.53:1201 --encodings=vp9,rgb,webp --quality=100 --speed=100 -d encodings (or the local OS equivalents)... the vp9 performance was abysmal. A couple of frames every couple of seconds in each case.

The vpx is indicated as v1.4.0 in the Encodings output of the bug tool... and checking the server it indicates libvpx-xpra.x86_64 installed of 1.4.0-1.fc20, which seems to be indicated above as being what's required.

Using your clients, but our fedora 20 repo... so I'll also attach the bug tool outputs that seem potentially relevant - maybe I'm just missing something.

Sat, 01 Aug 2015 01:16:41 GMT - alas: attachment set

attachment set to ticket832-Encodings.txt

vp9 test-Encodings win32 client v. fedora 20 server 0.15.4

Sat, 01 Aug 2015 01:17:30 GMT - alas: attachment set

attachment set to ticket832-Network.txt

vp9 tests network win32 0.15.4 client v. fedora 20 0.15.4 server

Sat, 01 Aug 2015 01:17:55 GMT - alas: attachment set

attachment set to ticket832-OpenGL.txt

vp9 tests opengl win32 0.15.4 client v. fedora 20 0.15.4 server

Sat, 01 Aug 2015 01:18:26 GMT - alas: attachment set

attachment set to ticket832-Server_Info.txt

vp9 tests server-info win32 0.15.4 client v. fedora 20 0.15.4 server

Sat, 01 Aug 2015 01:18:57 GMT - alas: attachment set

attachment set to ticket832-System.txt

vp9 tests system win32 0.15.4 client v. fedora 20 0.15.4 server

Sat, 01 Aug 2015 07:13:44 GMT - Antoine Martin:

Easy to reproduce with the command line given: this only happened when using the command line to set a fixed speed (the quality was irrelevant here).

r10197 fixes this, backported to v0.15.x in r10199.

Since I have bug report data... looking at your server info:

encoding.dec_webp.version : (0, 3, 1) - is way too old, known to be buggy - do not use. Maybe I should change the build file to just blow up when it finds outdated versions like this.
encoding.vpx.version : v1.3.0 - also too old, you should be on 1.4 to test this ticket (YUV444 and lossless mode)
encoding.x264.version : 142 - upgrading this one would not hurt, though there are no known major bugs with 142

Mon, 03 Aug 2015 22:14:20 GMT - alas:

Odd, when I check the libvpx version on that server, it says it's at 1.4.0:

[jimador@zapopan ~]$ rpm -qa libvpx-xpra
libvpx-xpra-1.4.0-1.fc20.x86_64

Though, if I look for plain libvpx, I do find a 1.3.0:

[jimador@zapopan ~]$ rpm -qa libvpx
libvpx-1.3.0-4.fc20.x86_64

I suppose there's a flag we must've missed to point at libvpx-xpra instead of libvpx?

If you could make a fedora 21 rpm I could re-test with that (smo's out for a bit, and it'd be best if we not rely on my build skills).

Tue, 04 Aug 2015 03:54:00 GMT - Antoine Martin:

> Odd, when I check the libvpx version on that server, it says it's at 1.4.0:

That's because you're building against the system libvpx 1.3 and not the xpra libvpx which is 1.4. You need something like:

LDFLAGS=-Wl,-rpath=/usr/lib64/xpra \
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib64/xpra/pkgconfig \
  sudo -E  ./setup.py install

Wed, 05 Aug 2015 00:16:07 GMT - alas: status changed

status changed from new to assigned

Interesting... I spun up a new fedora 21 vm and followed the instructions to set up your repo from http://winswitch.org/downloads/rpm-repository.html?dist_select=FedoraBeta (substituting yum install xpra for the last step).

Installed fedora 21 0.16.0 r10216 but, when I was running it as a server and used the bug tool to compare the same encoding versions I'm seeing:

encoding.enc_webp.version : (0, 4, 3) - which I presume is more up to date.
encoding.x264.version : 146 - which I also presume is more up to date.

But -

encoding.vpx.version : v1.3.0 - which seems to also be too old to test this ticket.

And- when I run rpm -qa libvpx-xpra I get no response. I assume just installing a libvpx-xpra from your dists wouldn't be sufficient to convince the xpra build to use the newer library.

I'll try with the above build flags when I get the chance on my own build environment though, and see what happens.

Wed, 05 Aug 2015 01:29:41 GMT - Antoine Martin:

> And- when I run rpm -qa libvpx-xpra I get no response

That's the problem.

> I assume just installing a libvpx-xpra from your dists wouldn't be sufficient to convince the xpra build to use the newer library.

It should be sufficient.

When building from source, you will also need libvpx-xpra-devel which is the development headers package.

The reason for this is that until now the standard distribution's libvpx version 1.3 was sufficient for our needs (it is only missing VP9 which we did not use), so we didn't require libvpx-xpra (which is available in the repos). r10218 adds libvpx-xpra as a dependency for Fedora releases older than 23 (Fedora 23 doesn't need our package, it has libvpx 1.4 in the default repos). I am making new beta packages for Fedora right now - if this works ok, the change should be backported (no rush though).

Fri, 07 Aug 2015 01:53:14 GMT - alas:

Testing with osx 0.15.4 r10055 client against a fedora 21 0.15.4 r10209 server launched with --encodings=rgb,webp,vp9, the video and audio were pretty good (it felt like the sync was a little further off than with h264, but I don't have all that sensitive of an ear/eye, so hard to say).

Double checking the server_info from the bug tool to confirm libraries on this thing are sufficiently up to date, I do notice all of these:

client.encoding.csc_atoms        : True
client.encoding.csc_modes        : ('YUV422P', 'BGRX', 'GBRP', 'RGB', 'YUV420P', 'BGRA', 'ARGB', 'XRGB', 'YUV444P')
client.encoding.cython.version   : (0, 3, '0', '22')
client.encoding.dec_webp.version : (0, 4, 3)
client.encoding.default          : vp9
client.encoding.delta_buckets    : 5
client.encoding.enc_webp.version : (0, 4, 3)
client.encoding.full_csc_modes   : {'h264': ('ARGB', 'BGRA', 'BGRX', 'GBRP', 'RGB', 'XRGB', 'YUV420P', 'YUV422P', 'YUV444P'), 'h265': ('BGRX', 'GBRP', 'RGB', 'XRGB', 'YUV420P', 'YUV422P', 'YUV444P'), 'vp9': ('YUV420P',), 'vp8': ('YUV420P',)}

... which seem ok (I'll attach the whole system_info.txt for you to glance at, if anything sounds odd).

I got a swscale warning client side, which I usually ignore because I've never seen it impact anything with the h264 encodings ([swscaler @ 0x7f496c018e00] Warning: data is not aligned! This can lead to a speedloss) ... but shortly after I started noticing the sorts of distortions of picture that I reported in #902, despite the fact I wasn't using h264 to get the frameloss error. (I wasn't able to get a screenshot, the distortion would occasionally persist for a good while, but always resolved when I moved the mouse.)

Scrolling seemed to produce the distortion (including what seemed like subregion "tearing" of video regions, as well as distortion of lossless text areas making text blurry... as well as what seemed like a color distortion turning expected white space to a reddish hue). Resizing sometimes triggered the distortion as well. I was able to capture a couple of seconds of logs server-side with -d encoding (my best guess of relevant flag), which I'll also attach (I turned on the debugging and the firefox window was distorted, so I immediately turned the debugging off).

The line that looked particularly suspicious to me was: using PIL fallback for webp: enc_webp=<module 'xpra.codecs.webp.encode' from '/usr/lib64/python2.7/site-packages/xpra/codecs/webp/encode.so'>, stride=1440, pixel format=RGB, but perhaps that's expected.

Note, once I triggered the distortion in a firefox window (which happened pretty quickly), I was able to trigger it rather easily even with an xterm.

Fri, 07 Aug 2015 01:54:41 GMT - alas: attachment set

attachment set to ticket832_server-d-encoding-log-output.txt

server -d encoding output, working vp9 on fedora 21 server, distortions of lossless

Fri, 07 Aug 2015 01:56:21 GMT - alas: attachment set

attachment set to fedora21-comment-15_Server-Info.txt

fedora 21 server, comment 15, server_info

Wed, 19 Aug 2015 05:38:33 GMT - Antoine Martin:

Mon, 05 Oct 2015 11:36:01 GMT - Antoine Martin: attachment set

attachment set to vp9-x264-x265-encoding-speed-1024x752.png

vp9 x264 x265 encoding speed

Mon, 05 Oct 2015 11:54:46 GMT - Antoine Martin:

As per this excellent presentation from ffmpeg developer Ronald S. Bultje (VideoLAN Dev Days 2015: VP9 encoding/decoding performance vs. H.264/HEVC):

Here is the key graph for us from this presentation (see the attachment vp9-x264-x265-encoding-speed-1024x752.png):

And the key finding: In practice, that means that if your CPU usage target for x264 is anything faster than veryslow, you basically want to keep using x264, since at that same CPU usage target, x265 will give worse quality for the same bitrate than x264. The story for libvpx is slightly better than for x265, but it’s clear that these next-gen codecs have a lot of work left in this area.

Which is pretty much what I had found and why:

we still default to h264
we use ffmpeg for decoding ahead of libvpx
as of r10763, we don't bother with the slowest vpx settings at all

vp9 isn't doing too badly, but it is just not fast enough to give us 30fps at 1080p.
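
To put the 30fps at 1080p target into numbers, the short calculation below works out the raw pixel throughput the encoder has to sustain. It is plain arithmetic on frame size and rate (the function name is made up for the example) and says nothing about how any particular encoder scales with resolution.

```python
def pixels_per_second(width, height, fps):
    """Raw number of pixels an encoder must process per second."""
    return width * height * fps


target_1080p = pixels_per_second(1920, 1080, 30)  # the 30fps target
seen_1080p = pixels_per_second(1920, 1080, 20)    # roughly what auto-tuning gave
target_4k = pixels_per_second(3840, 2160, 30)     # same frame rate at 4K

print(target_1080p, seen_1080p, target_4k)
```

The 4K figure is exactly four times the 1080p one, which is why encoder speed matters so much more at high resolution.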

Caveats:

vp9 does better at high res (4k)
there are cases where the bandwidth savings outweigh the number of fps
as cpus get faster and the code is improved (more gains to be made on vp9 and x265), the balance will continue to tip in their favour
threading: newer codecs take better advantage of multiple core / threads (though apparently less so for libvpx - it is claimed)

Mon, 12 Oct 2015 09:56:41 GMT - Antoine Martin:

This should have been caught earlier: r10818 fixes the quantizer calculations (backported in r10819) which inverted the quality scale (0 was giving high quality, 100 low quality.. except for vp9 with 100 which would still give lossless!). Not as important as the speed setting, but still - bad!
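
For illustration, the intended direction of the scale can be sketched as follows, assuming the usual libvpx quantizer range of 0 (best) to 63 (worst). The function name and the linear mapping are assumptions made for this example and are not the actual calculation from r10818.

```python
MAX_QUANTIZER = 63  # libvpx quantizers run from 0 (best) to 63 (worst)


def quality_to_quantizer(quality):
    """Map an xpra-style quality (0..100) to a quantizer value.

    Higher quality must give a LOWER quantizer; the bug described above
    effectively had this relationship the wrong way around.
    """
    quality = max(0, min(100, quality))
    return (100 - quality) * MAX_QUANTIZER // 100


print(quality_to_quantizer(0), quality_to_quantizer(50), quality_to_quantizer(100))
```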

Wed, 28 Oct 2015 01:37:28 GMT - alas: owner, status changed

owner changed from alas to Antoine Martin
status changed from assigned to new

Ok, testing with 0.16.0 r10983 windows client against 0.16.0 r11031 fedora 21 server - I can definitely confirm that --quality=1 is significantly worse than --quality=100 with vp9, as well as mp3.

I do find that h264 is a little better, even on a 4K monitor, when I'm using a window/video that's "small-ish" (less than half the monitor with --desktop-scaling=1.4) ... with both --speed=auto and --speed=90 - but with either of those settings vp9 does actually perform better with fullscreen on the 4K monitor.

Oddly, I am noticing that with --speed=100 the mp3 seems to perform better with fullscreen on the 4K.

With --quality=100 neither mp3 nor vp9 does very well with fullscreen - tearing and dropped frames/ jumpiness abound.

Overall though, to the extent that my eye can judge such things I'd say it is behaving as the charts lead you to expect - vp9 behaving pretty well and edging out mp3's performance at larger sizes.

With --speed=auto I was also getting about 20 fps (xpra info | grep fps=), and with --speed=90 it bumped up to about 25 fps... though, interestingly, testing with chromium I turned on the browser's fps output and it seemed to be under the impression that it was playing at 50-60 fps (blue digits on a black background made it hard to tell for sure what it thought)... so I guess we're still losing some in the encoding?

Do we need to do any more testing on this?

Wed, 28 Oct 2015 02:41:07 GMT - Antoine Martin: status changed; resolution set

status changed from new to closed
resolution set to fixed

> with either of those settings vp9 does actually perform better with fullscreen on the 4K monitor ... vp9 behaving pretty well and edging out h264's performance at larger sizes.

We should take the size into account when choosing a codec: #1014

> neither mp3 nor vp9 does very well with fullscreen

I assume you mean h264 and vp9 here?

> Oddly, I am noticing that with --speed=100 the h264 seems to perform better with fullscreen on the 4K.

If by better you mean "more frames", that is expected: only x264 can encode fast enough to deal with huge amounts of pixels. (see comment:17)

> With --quality=100 neither mp3 nor vp9 does very well with fullscreen - tearing and dropped frames/ jumpiness abound.

That's expected: quality=100 means lossless mode, which will take a lot of bandwidth and cpu.

> though, interestingly, testing with chromium I turned on the browser's fps output and it seemed to be under the impression that it was playing at 50-60 fps..

It is, we're just not sending as many screen updates as that because we cannot keep up.

Closing at last, we'll probably revisit some of this in #1014.

Fri, 13 Nov 2015 11:04:09 GMT - Antoine Martin:

Notes:

the 0.16 builds will be using libvpx 1.5 as of r11204
we give a higher "speed" score to "vp9" if we detect libvpx >= 1.5 since this release made some significant performance improvements
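
A version check of this kind could look roughly like the sketch below. The function names, the base score and the bonus value are all hypothetical, and the version string format ("v1.4.0") is the one shown in the bug tool output earlier in this ticket.

```python
import re


def parse_vpx_version(version):
    """Turn a version string such as 'v1.4.0' into a tuple like (1, 4, 0)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version)
    if not match:
        raise ValueError("cannot parse libvpx version: %r" % (version,))
    return tuple(int(part) for part in match.group(1).split("."))


def vp9_speed_score(base_score, version, bonus=20):
    """Hypothetical scoring: add a bonus when libvpx is 1.5 or newer."""
    if parse_vpx_version(version) >= (1, 5):
        return base_score + bonus
    return base_score


print(vp9_speed_score(40, "v1.4.0"), vp9_speed_score(40, "v1.5.0"))
```

Comparing tuples rather than strings avoids the classic mistake where "1.10" sorts before "1.5".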

Tue, 02 Aug 2016 16:24:55 GMT - Antoine Martin:

vp9 is actually reasonably fast and efficient, as long as we tune the speed parameter within a very narrow range (a value between 4 and 8 rather than 0 and 8 as before, the whole API range is -8 to 8): done in r13175.
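
As a sketch of what restricting the range means, the function below maps the 0 to 100 speed setting onto the narrow 4 to 8 band. The linear mapping and the function name are assumptions made for this example and are not copied from r13175.

```python
def speed_to_cpu_used(speed, low=4, high=8):
    """Map an xpra-style speed (0..100) onto the narrow cpu-used band.

    The full API range is -8..8, but only `low`..`high` is used here.
    """
    speed = max(0, min(100, speed))
    return low + (high - low) * speed // 100


for s in (0, 33, 50, 100):
    print(s, speed_to_cpu_used(s))
```

Passing low=0 reproduces the wider 0 to 8 range that was used before.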

Sat, 23 Jan 2021 05:07:20 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/832