Xpra: Ticket #466: nvenc improvements: YUV444P mode and bandwidth auto tuning

split from #370:


Lower priority:



Sat, 04 Jan 2014 05:33:36 GMT - Antoine Martin: owner, status, description changed


Tue, 04 Feb 2014 03:15:51 GMT - Antoine Martin: description changed


Wed, 19 Feb 2014 11:47:32 GMT - Antoine Martin: attachment set

YUV444 for NVENC: using 3 pass encoding (one for each of Y, U and V)


Wed, 19 Feb 2014 14:23:35 GMT - Antoine Martin: description changed

Updated TODO list:


Sat, 22 Feb 2014 12:05:31 GMT - Antoine Martin:

r5515 caused a big memory leak client side, fixed in r5542


Sun, 02 Mar 2014 13:02:56 GMT - Antoine Martin:

Important YUV444P fix in r5667: so this is what the undocumented colourPlaneId does!


r5664 and r5666 also allow us to tune the bitrate based on the usual "speed" setting and the encoder input size, using an exponential scale to prefer low bandwidth (see changeset for details).
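
To make the idea concrete, here is a minimal sketch of an exponential bitrate scale. It is not the code from the changesets: the function name `target_bitrate`, the bits-per-pixel bounds, the default frame rate, and the assumption that a higher "speed" means a higher bitrate are all illustrative.

```python
def target_bitrate(width, height, speed, fps=25, min_bpp=0.02, max_bpp=0.5):
    """Return an illustrative target bitrate in bits per second.

    All constants are assumptions for this sketch, not xpra's values.
    `speed` is the usual 0-100 setting; bits per pixel grow exponentially
    with it, so most of the range stays near the low-bandwidth end.
    """
    speed = max(0, min(100, speed))
    # Geometric (exponential) interpolation between min_bpp and max_bpp.
    bpp = min_bpp * (max_bpp / min_bpp) ** (speed / 100.0)
    # Scale by the encoder input size and the assumed frame rate.
    return round(width * height * fps * bpp)


if __name__ == "__main__":
    for s in (0, 50, 100):
        print(s, target_bitrate(1920, 1080, s))
```

With a scale like this, the midpoint of the "speed" range lands well below the arithmetic mean of the two extremes, which is what "prefer low bandwidth" means here.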


Testing with glxspheres64 and -d nvenc, auto-scaling turned off with XPRA_SCALING=0:


So YUV420P is much faster than the 3-pass YUV444P mode, as expected.

Note: r5668 enables YUV444P for quality>=50%, but with r5669 we don't bother with it when downscaling.
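
The selection rule described in this note can be sketched as follows. The helper name `choose_pixel_format` and its parameters are hypothetical and only mirror the rule as stated above, not the actual encoder code.

```python
def choose_pixel_format(quality, downscaling=False, yuv444_threshold=50):
    """Pick the encoder input format (hypothetical helper).

    YUV444P is only used when quality reaches the threshold and the
    frame is not being downscaled; otherwise fall back to YUV420P.
    """
    if quality >= yuv444_threshold and not downscaling:
        return "YUV444P"
    return "YUV420P"


if __name__ == "__main__":
    print(choose_pixel_format(80))                    # full-size, high quality
    print(choose_pixel_format(80, downscaling=True))  # downscaled frame
    print(choose_pixel_format(30))                    # low quality
```

Keeping the threshold as a parameter makes it easy to experiment with other cut-off values.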


Wed, 19 Mar 2014 07:28:25 GMT - Antoine Martin: owner, status, summary changed

This will have to do for this release: most of the important remaining items are too intrusive to change this late in the release cycle.

Remaining items moved to #538 and #564

smo: please test:


Thu, 15 May 2014 20:42:02 GMT - Smo:

I'm not able to run glxspheres64 because of the nature of the setup. Is there something else that I could try that doesn't involve GL?

I'm hoping to close this as it seems to work well for me but I want to post some information from my setup before closing.


Sat, 17 May 2014 09:09:25 GMT - Antoine Martin:

I often use glxspheres and glxgears only because they produce lots of frames without requiring any external data, but playing a video will do just as well.

Note: you may be able to run GL stuff against software mesa rendering, without needing an X11 server running and with the nvidia libGL installed on the system, by using LD_PRELOAD tricks.


Tue, 10 Jun 2014 05:20:28 GMT - Smo: status changed; resolution set

Needs more testing with newer NVIDIA drivers / CUDA SDK, but I will close this for now until there is something to comment on.


Tue, 10 Jun 2014 08:24:26 GMT - Antoine Martin:

Did you measure the bitrate and performance as per comment:6?


FYI: r6699 allows us to specify multiple license keys in CSV format:

XPRA_NVENC_CLIENT_KEY="key1,key2" /usr/bin/xpra start ...

This makes it easier to deal with the constant NVIDIA license key breakage between driver versions.
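
For illustration, a comma-separated value like the one above could be parsed and tried in order as sketched below. Only the `XPRA_NVENC_CLIENT_KEY` variable name comes from this ticket; `get_client_keys`, `first_working_key` and the `try_key` callback are hypothetical names, not xpra functions.

```python
import os


def get_client_keys(env=os.environ):
    """Split the CSV value into a list of keys, ignoring empty entries."""
    raw = env.get("XPRA_NVENC_CLIENT_KEY", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def first_working_key(keys, try_key):
    """Return the first key accepted by `try_key`, or None.

    `try_key` is a caller-supplied function (an assumption of this
    sketch) that returns True when the driver accepts the key.
    """
    for key in keys:
        if try_key(key):
            return key
    return None


if __name__ == "__main__":
    keys = get_client_keys({"XPRA_NVENC_CLIENT_KEY": "key1,key2"})
    print(keys)
    print(first_working_key(keys, lambda k: k == "key2"))
```

Trying the keys in order means that a key which stops working after a driver update can simply be followed by a newer one in the same list.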


Sun, 24 Aug 2014 06:39:57 GMT - Antoine Martin: status changed; resolution deleted

Please test with nvenc v4, see #653


Mon, 27 Oct 2014 17:51:44 GMT - Antoine Martin: owner, status changed

I am taking this ticket back as YUV444 in nvenc4 is completely different from SDK v4 and is going to require quite a few changes - which should give us a nice performance improvement. Will re-assign for testing + benchmarking afterwards.


Tue, 27 Jan 2015 06:12:36 GMT - Antoine Martin: owner changed

Moving the new YUV444 mode to a new ticket so this can get more testing, together with #653.

smo: not sure who should test this, but it's been ready for months, time to get on it.


Thu, 12 Mar 2015 17:14:23 GMT - Smo:

Here are some performance numbers from 2 cards

quality=100 (YUV444P mode)

  GTX 650
    compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6875 took 9.3ms
    compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6876 took 8.8ms
  GTX 750 ti
    compress_image(..) returning 164794 bytes (3.4%), complete compression for frame 64 took 10.2ms
    compress_image(..) returning 164816 bytes (3.4%), complete compression for frame 63 took 13.3ms
  GTX 970
    compress_image(..) returning 325881 bytes (4.5%), complete compression for frame 151 took 14.5ms
    compress_image(..) returning 321035 bytes (4.5%), complete compression for frame 152 took 14.8ms

quality=50 (YUV420P mode)

  GTX 650
    compress_image(..) returning 15659 bytes (0.3%), complete compression for frame 310 took 8.6ms
    compress_image(..) returning 15617 bytes (0.3%), complete compression for frame 311 took 8.5ms
  GTX 750 ti
    compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1085 took 11.7ms
    compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1086 took 11.0ms
  GTX 970
    compress_image(..) returning 18628 bytes (0.4%), complete compression for frame 178 took 9.2ms
    compress_image(..) returning 18356 bytes (0.4%), complete compression for frame 179 took 8.3ms

I have 1 more card to test and I will update this when I do some more testing.
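
Log lines in the format shown above can be averaged with a short script rather than compared by eye. This is a rough sketch: the regular expression is written against the sample lines in this comment only, and `summarize` is a hypothetical helper, not part of xpra.

```python
import re
from statistics import mean

# Matches the size and timing fields of the compress_image debug lines.
LINE_RE = re.compile(
    r"returning (\d+) bytes \(([\d.]+)%\), "
    r"complete compression for frame (\d+) took ([\d.]+)ms"
)


def summarize(lines):
    """Return frame count, average size and average time, or None."""
    sizes, times = [], []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
            times.append(float(match.group(4)))
    if not sizes:
        return None
    return {"frames": len(sizes), "avg_bytes": mean(sizes), "avg_ms": mean(times)}


if __name__ == "__main__":
    sample = [
        "compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6875 took 9.3ms",
        "compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6876 took 8.8ms",
    ]
    print(summarize(sample))
```

Two frames per card is a very small sample, so averaging over a longer log would give more meaningful numbers.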


Sat, 14 Mar 2015 10:11:17 GMT - Antoine Martin:

Interesting to see the GTX 970 going more slowly than I expected, more slowly than when I had tested it IIRC.

Some things worth mentioning:


Thu, 23 Apr 2015 20:14:07 GMT - Smo: status changed; resolution set

I agree there are many factors when trying to benchmark. It would be a good idea to come up with a better way.

I'm closing this for now as it is working.


Sat, 23 Jan 2021 04:56:25 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/466