Xpra: Ticket #595: latest nvidia drivers break YUV444 encoding

It is not clear which driver versions work and which do not (the breakage seems to coincide with the driver changes that also changed the list of license keys, see NVENC developer key?), so r6741 allows us to turn YUV444 off via the XPRA_NVENC_YUV444P=0 env var.
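
As a rough illustration of that kind of switch (not the actual r6741 code), an environment toggle could be read like this. The function name and the "anything other than 0 means enabled" rule are assumptions made for the sketch:

```python
import os

def yuv444p_enabled(environ=os.environ):
    """Return False only when XPRA_NVENC_YUV444P is explicitly set to "0"."""
    # Assumption: an unset variable, or any value other than "0",
    # leaves the YUV444P mode enabled.
    return environ.get("XPRA_NVENC_YUV444P", "1") != "0"
```

With such a check in place, starting the server as `XPRA_NVENC_YUV444P=0 xpra start :100` would be expected to disable the mode for that session.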

Since the YUV444P mode is completely undocumented, and nvidia developers even claimed that it wasn't possible to use it (it is, well it was...) - this is not going to be fun.

Sun, 15 Jun 2014 13:34:48 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to Antoine Martin
status changed from new to assigned

r6781 added nvidia driver version logging to make it easier to instantly see which version is loaded. Maybe we can also use it to give warnings about which versions are known to break and / or disable YUV444 when we know it isn't going to work.
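
A minimal sketch of how the loaded driver version could be detected, assuming the proprietary driver exposes it in /proc/driver/nvidia/version on a line starting with "NVRM version:". The function names are hypothetical and this is not necessarily how r6781 does it:

```python
import re

# Assumption: the loaded kernel module reports its version in this file.
NVIDIA_PROC_FILE = "/proc/driver/nvidia/version"

def parse_nvidia_version(text):
    """Return the driver version as a tuple of ints, or None if not found."""
    for line in text.splitlines():
        if not line.startswith("NVRM version:"):
            continue
        # first token that looks like a dotted version number, e.g. 340.24
        match = re.search(r"\b(\d+(?:\.\d+)+)\b", line)
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None

def get_nvidia_version(path=NVIDIA_PROC_FILE):
    """Read and parse the version file, returning None if it is missing."""
    try:
        with open(path) as f:
            return parse_nvidia_version(f.read())
    except OSError:
        return None
```

Returning a tuple of ints rather than a string makes it easy to compare versions or match them against a list of known-bad releases.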

More tests with Fedora 20, kernel 3.14.7-200.fc20.x86_64 and a GTX 760 OC:

304.121 - not supported, libnvidia-encode.so not provided
319.76 works but requires an older kernel (ie: 3.11.10-301.fc20)
325.15 does not build, even with older 3.11 kernel
331.79 works but requires different keys than earlier builds
334.21 works but requires an older kernel (ie: 3.11.10-301.fc20) or a patched installer
337.25 breaks YUV444 support, requires the "newer" set of keys
340.17 fails completely with "failed to get preset config for ... This indicates that an invalid struct version was used by the client." - this looks like it is meant to be used with a newer SDK.

Mon, 16 Jun 2014 04:45:59 GMT - Antoine Martin: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

r6811 disables YUV444 when we detect a buggy nvidia driver version. Can also be re-enabled via env var if needed.
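
The logic can be sketched roughly as follows. This is not the r6811 code: the blacklist below contains only 337.25 (the one version the test list reports as breaking YUV444), the function name is made up, and the override is assumed to be the same XPRA_NVENC_YUV444P variable:

```python
import os

# Illustrative only: the actual contents of the r6811 blacklist are not
# reproduced here; 337.25 is taken from the test list in this ticket.
YUV444_BROKEN_VERSIONS = {"337.25"}

def use_yuv444(driver_version, environ=os.environ):
    """Decide whether to enable YUV444 for the given driver version string."""
    override = environ.get("XPRA_NVENC_YUV444P")
    if override is not None:
        # an explicit user choice wins over the blacklist
        return override == "1"
    return driver_version not in YUV444_BROKEN_VERSIONS
```

Letting the environment variable take precedence keeps it possible to force the mode on with a blacklisted driver, which is useful for verifying that the blacklist entry is really needed.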

Tue, 17 Jun 2014 16:19:22 GMT - Antoine Martin: status changed; resolution deleted

status changed from closed to reopened
resolution fixed deleted

Looks like other versions break... re-opening.

Tue, 17 Jun 2014 16:21:11 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to Smo
status changed from reopened to new

r6831 blacklists some more versions and disables YUV444 mode.

Please check:

that the list in comment:1 is correct
that the blacklist in r6831 is also correct, you can verify by force enabling it via the env var: you should get errors.

We may need to blacklist some more versions..

Thu, 19 Jun 2014 20:53:06 GMT - Smo:

Confirmed

331.79 works with the new key
340.17 fails completely with the same error

Will test with 331.79 for now

Thu, 17 Jul 2014 20:08:36 GMT - Antoine Martin:

smo: 331.79 is not as good as older versions: no YUV444...

Tue, 29 Jul 2014 10:56:40 GMT - Antoine Martin: priority changed

priority changed from major to blocker

(raising: blocker for release)

Mon, 04 Aug 2014 22:14:03 GMT - Smo:

Testing some new drivers from http://www.nvidia.ca/object/unix.html

Trying 340.24 produces this output:

2014-08-04 18:12:57,188 nvenc: found nvidia kernel module version 340.24
2014-08-04 18:12:57,206 CUDA initialization (this may take a few seconds)
2014-08-04 18:12:59,396 CUDA 6.0.0 / PyCUDA 2013.1.1, found 2 device(s):
2014-08-04 18:12:59,564   + GeForce GTX 750 Ti @ 0000:83:00.0 (memory: 98% free, compute: 5.0)
2014-08-04 18:12:59,689   + GeForce GTX 650 @ 0000:09:00.0 (memory: 97% free, compute: 3.0)
2014-08-04 18:13:00,111 pulseaudio server started with pid 5087
2014-08-04 18:13:00,120 started child 'xterm -fg white -bg black' with pid 5089
2014-08-04 18:13:00,148 xpra server version 0.14.0 (r7043)
2014-08-04 18:13:00,148 running with pid 5049
2014-08-04 18:13:00,368 xpra is ready.
2014-08-04 18:13:00,546 failed to get preset config for default (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / B2DFB705-4EBD-4C49-9B5F-24A777D3E587): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,546 failed to get preset config for low-latency (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 49DF21C5-6DFA-4FEB-9787-6ACC9EFFB726): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,546 failed to get preset config for hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 60E4C59F-E846-4484-A56D-CD45BE9FDDF6): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for hq (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 34DBA71D-A77B-4B8F-9C3E-B6D5DA24C012): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for bd (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 82E3E450-BDBB-4E40-989C-82A90DF9EF32): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for low-latency-hq (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / C5F733B9-EA97-4CF9-BEC2-BF78A74FD105): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for low-latency-hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 67082A44-4BAD-48FA-98EA-93056D150A58): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for None (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 7ADD423D-D035-4F6F-AEA5-50885658643C): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 nvenc: found some unknown presets: 7ADD423D-D035-4F6F-AEA5-50885658643C
2014-08-04 18:13:00,548 failed to get preset config for hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 60E4C59F-E846-4484-A56D-CD45BE9FDDF6): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,583 Warning: nvenc video encoder failed: could not find preset hp
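
When comparing driver versions, it can help to summarise which presets failed instead of reading the log by eye. A small sketch, assuming the line format shown above (the helper name is hypothetical):

```python
import re

# Matches lines such as:
#   failed to get preset config for hp (<codec GUID> / <preset GUID>): ...
PRESET_FAILURE = re.compile(
    r"failed to get preset config for (?P<preset>\S+) "
    r"\((?P<codec>[0-9A-F-]{36}) / (?P<guid>[0-9A-F-]{36})\)"
)

def failed_presets(log_text):
    """Map each preset name to the preset GUID reported in its failure line."""
    failures = {}
    for match in PRESET_FAILURE.finditer(log_text):
        # repeated failures for the same preset collapse into one entry
        failures[match.group("preset")] = match.group("guid")
    return failures
```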

Wed, 06 Aug 2014 19:08:31 GMT - Smo:

Tried today with new beta driver 343.13 on CentOS 7

Identical messages as comment:8

Sun, 17 Aug 2014 14:19:55 GMT - Antoine Martin: status changed; resolution set

status changed from new to closed
resolution set to worksforme

I believe the version blacklisting works.

If we find newer versions of the driver that need to be added to the blacklist, add them here.

Tue, 19 Aug 2014 03:32:12 GMT - Antoine Martin: milestone changed

milestone changed from 0.15 to 0.14

Was done for milestone 0.14

Thu, 21 Aug 2014 03:54:59 GMT - Antoine Martin:

Worth noting that the CUDA 6.5 SDK is not compatible with the 331.79 drivers (6.0 is fine).

You get:

UserWarning: Failed to import the CUDA driver interface, with an error message  \
indicating that the version of your CUDA header does not match the version of your CUDA driver

Mon, 13 Oct 2014 04:56:38 GMT - Antoine Martin:

My findings so far:

the Maxwell based 750 Ti really is much faster than the Kepler based 760 OC! (despite its lower clock speed, lower number of CUDA cores, etc..)
almost no difference in terms of performance between SDK v3 and v4 - as expected since this is all done in hardware (using the same settings - v4 does have interesting new options though)
newer kernels (3.16) perform better than older ones (3.11) by about 10%, with the same CUDA and same driver versions
driver version 340.x has different performance characteristics than previous versions (tested on GTX750 Ti only): more consistent speed, even with smaller encoding sizes, though a slightly lower performance at 1080p - it looks like the overheads are lower somehow
as expected quality=100 with speed=0 is the slowest setting (10 to 20% slower - except with driver 340.x, see above), all the other settings perform very similarly
I can saturate the 2GB memory of my GTX 760 OC with just 22 1080p contexts: cuCtxCreate failed: out of memory (also being used for my own desktop, so it was only ~75% free to begin with, so 2GB per card is OK for 1080p I think)
we have a big problem with parallel encoding (maybe I just tested wrong previously?): the threaded, parallel encoding tests using 10 threads show that we are quickly reaching the context limit (Tested with both the GTX 760 and 750 Ti, the latter is a bit faster as per above - only tested with drivers 340.x and 343.x):
- at 1080p: sequential encoding goes at about 128MPixels/s, with 10 threads the speed drops to about 15MPixels/s. This only gives us 8fps, which is just not good enough! (the 750 Ti does 12 fps for 10 contexts)
- at 720p: from 128MPixels/s down to 9MPixels/s with the 760 OC, and 184 down to 20 with the 750 Ti. That's ~11fps and 24 fps respectively. (and that's a best case scenario)
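
As a back-of-envelope check on the memory finding above, the figures quoted (2GB card, ~75% free, out of memory at 22 contexts) can be turned into a rough per-context cost. This is only an estimate: the free fraction is approximate and the function name is made up for the sketch:

```python
def approx_memory_per_context_mb(total_mb, free_fraction, contexts):
    """Rough upper bound on memory used per encoding context, in MB."""
    if contexts <= 0:
        raise ValueError("contexts must be positive")
    return total_mb * free_fraction / contexts

# Figures from the findings above: 2GB card, ~75% free, exhausted at 22 contexts.
per_context = approx_memory_per_context_mb(2048, 0.75, 22)
print("about %.0f MB per 1080p context" % per_context)
```

That works out to roughly 70MB per 1080p context under these assumptions.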

Before I put this on the wiki somewhere, I want to record my testing with all the combinations of CUDA, kernel, drivers, SDK... compatibility issues:

NVENC v4 requires CUDA 6.5 (you get invalid struct version was used with 6.0 and earlier)
driver branch 331 does not work with CUDA 6.5, errors out with undefined symbol: cuMemHostRegister_v2 (CUDA 6.0 OK, not tested 5.5)
driver branch 331.x does not support 4k monitors
driver branch 340.x does support 4k
drivers that require the "new" keys do not support YUV444: forcing it with XPRA_NVENC_YUV444P=1 gives us the same performance as those that do work, but with data that cannot be used by the client (and managed to crash the client once! fuzzing of sorts..)
driver versions that require "old" keys: 331.67, ... "new" keys: 331.89, 331.104, 340.*, 343.*
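
The key / YUV444 relationship in the list above could be captured in a small lookup. This is a sketch only: the patterns are copied from that list, anything not listed is treated as unknown, and the function names are hypothetical:

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above; versions not listed there are unknown.
OLD_KEY_VERSIONS = ("331.67",)
NEW_KEY_VERSIONS = ("331.89", "331.104", "340.*", "343.*")

def key_set(driver_version):
    """Return "old", "new", or None when the version is in neither list."""
    if any(fnmatchcase(driver_version, p) for p in OLD_KEY_VERSIONS):
        return "old"
    if any(fnmatchcase(driver_version, p) for p in NEW_KEY_VERSIONS):
        return "new"
    return None

def supports_yuv444(driver_version):
    """Per the findings above, drivers needing the "new" keys lack YUV444."""
    keys = key_set(driver_version)
    if keys is None:
        return None  # unknown version: not enough information
    return keys == "old"
```

Returning None for unlisted versions avoids guessing about drivers that have not been tested.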

For testing, use the test_nvenc3 and test_nvenc4 scripts.

Note: pycuda must be rebuilt against the version of CUDA being tested - which takes time..

As of r7944, we can build both nvenc modules (v3 and v4) at the same time with:

./setup.py --with-nvenc3 --with-nvenc4

Both modules should already be auto-detected, since the build needs the pkgconfig file for each SDK anyway. To choose which CUDA SDK to build against, use:

./setup.py --with-cuda=6.5

You will need at least 6.5 to build the nvenc4 module, nvenc3 works with 5.0 onwards.
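
The version requirements above can be expressed as a simple check, for example in a build helper. A sketch under those stated minimums (the names are made up and this is not part of setup.py):

```python
# Minimum CUDA versions as stated above, as (major, minor) tuples.
MIN_CUDA = {"nvenc3": (5, 0), "nvenc4": (6, 5)}

def buildable_modules(cuda_version):
    """Return the nvenc modules that can be built against a CUDA version string."""
    version = tuple(int(part) for part in cuda_version.split("."))
    # tuple comparison handles major/minor ordering, e.g. (6, 0) < (6, 5)
    return sorted(name for name, minimum in MIN_CUDA.items() if version >= minimum)
```

For instance, `buildable_modules("6.0")` yields only `["nvenc3"]`, while `buildable_modules("6.5")` yields both modules.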

Testing with a GTX 750 Ti.

Fedora kernel 3.11 with NVENC SDK 3:
- driver 319.76: card is "not supported"
- driver 331.67:
  - CUDA 6.5: No
  - CUDA 6.0: OK
    - NV12 (in MPixels/s):
      - 1920x1080: 181 to 218
      - 1024x768: 123 to 158
    - YUV444:
      - 1920x1080: 40 to 44
      - 1024x768: 34 to 42
- driver 331.89:
  - CUDA 6.5: No
  - CUDA 6.0: OK
    - NV12:
      - 1920x1080: 182 to 210
      - 1024x768: 141 to 197
    - YUV444: broken
- driver 331.104:
  - CUDA 6.5: No
  - CUDA 6.0: OK
    - NV12:
      - 1920x1080: 188 to 226
      - 1024x768: 141 to 201
    - YUV444: broken
- driver 340.46:
  - CUDA 6.0: No
  - CUDA 6.5:

Fedora kernel 3.16 with NVENC SDK 3:
- driver 319.x: cannot build
- driver 331.67:
  - CUDA 6.5: ?
  - CUDA 6.0:
    - NV12:
      - 1920x1080: 212 to 240
      - 1024x768: 158 to 201
    - YUV444:
      - 1920x1080: 46 to 47
      - 1024x768: 34 to 44
- driver 331.89:
  - CUDA 6.5: ?
  - CUDA 6.0:
    - NV12:
      - 1920x1080: 212 to 240
      - 1024x768: 158 to 203
    - YUV444: broken
- driver 331.104:
  - CUDA 6.5: ?
  - CUDA 6.0:
    - NV12:
      - 1920x1080: 212 to 243
      - 1024x768: 158 to 204
    - YUV444: broken
- driver 340.46:
  - CUDA 6.5: ?
  - CUDA 6.0:
    - NV12:
      - 1920x1080: 206 to 214
      - 1024x768: 205 to 207
    - YUV444: broken

Fedora kernel 3.16 with NVENC SDK 4:
- ALL 331.x drivers tested (331.67, 331.89, 331.104):
  - CUDA 6.0: version mismatch
  - CUDA 6.5: ?
- driver 340.46:
  - CUDA 6.0: No
  - CUDA 6.5:
    - NV12:
      - 1920x1080: 205 to 215
      - 1024x768: 204 to 208
    - YUV444: new support code pending

Testing with a GTX 760 OC:

Fedora kernel 3.16 with NVENC SDK 4:
- drivers 331.x (331.79, 331.89):
  - CUDA 6.0: "version mismatch"
  - CUDA 6.5: No
- driver 343.22:
  - CUDA 6.0: ?
  - CUDA 6.5:
    - NV12:
      - 1920x1080: 125 to 131
      - 1024x768: 83 to 121

Fedora kernel 3.16 with NVENC SDK 3:
- driver 343.22:
  - CUDA 6.0: No
  - CUDA 6.5:
    - NV12:
      - 1920x1080: 125 to 133
      - 1024x768: 84 to 116

Will edit this ticket more after reboots and code updates..

Sat, 23 Jan 2021 05:00:23 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/595