Xpra: Ticket #370: nvenc hardware accelerated encoding

See also #451 (libva accelerated encoding)

Pointers:

Worth mentioning:



Sat, 06 Jul 2013 08:23:00 GMT - Antoine Martin: attachment set

stub libva files to get going


Tue, 16 Jul 2013 05:43:47 GMT - Antoine Martin: status, milestone changed

More pointers (for libva - not nvenc):


Thu, 18 Jul 2013 05:32:30 GMT - Antoine Martin: description changed


Mon, 09 Sep 2013 15:53:27 GMT - Antoine Martin: attachment set

stubs for implementing an nvenc encoder


Fri, 13 Sep 2013 09:02:12 GMT - ahuillet:

Attached is a gdb command file for tracing NVENC calls made by the sample app from the SDK.

Use:

$ gdb --args ./nvEncoder -config=config.txt -outFile=bba.h264 inFile=../YUV/1080p/HeavyHandIdiot.3sec.yuv
(gdb) b NvEncodeAPICreateInstance
Function "NvEncodeAPICreateInstance" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (NvEncodeAPICreateInstance) pending.
(gdb) r
Starting program: /home/ahuillet/nvenc_3.0_sdk/Samples/nvEncodeApp/nvEncoder -config=config.txt -outFile=bba.h264 inFile=../YUV/1080p/HeavyHandIdiot.3sec.yuv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
>> GetNumberEncoders() has detected 8 CUDA capable GPU device(s) <<
  [ GPU #0 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #1 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #2 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #3 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #4 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #5 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #6 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
  [ GPU #7 - < GRID K1 > has Compute SM 3.0, NVENC Available ]
Breakpoint 1, 0x00007ffff6d179a0 in NvEncodeAPICreateInstance () from /lib64/libnvidia-encode.so.1
(gdb) n
(gdb) n
(gdb) source ~/trace_nvenc
Breakpoint 2 at 0x7ffff6d175f0
...

Then continue the execution. The results are in attachment/ticket/370/nvenc-trace.txt (the very long ticket comment was cleaned up by totaam)


Fri, 13 Sep 2013 09:02:36 GMT - ahuillet: attachment set


Fri, 13 Sep 2013 09:20:59 GMT - ahuillet:

After cleanup, this is the sequence of calls for setup:

nvEncOpenEncodeSessionEx
nvEncGetEncodeGUIDCount
nvEncGetEncodeGUIDs
nvEncGetEncodePresetCount
nvEncGetEncodePresetGUIDs
nvEncGetEncodePresetConfig
nvEncGetEncodeProfileGUIDCount
nvEncGetEncodeProfileGUIDs
nvEncGetInputFormatCount
nvEncGetInputFormats
nvEncInitializeEncoder

Then, create the input and output buffers:

nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
[repeat N times]
nvEncRegisterAsyncEvent

Then:

> NVENC Encoder[0] configuration parameters for configuration #0
> GPU Device ID             = 0
> Frames                    = 0 frames
> ConfigFile                = (null)
> Frame at which 0th configuration will happen = 0
> maxWidth,maxHeight        = [1920,1080]
> Width,Height              = [1920,1080]
> Video Output Codec        = 4 - H.264 Codec
> Average Bitrate           = 6000000 (bps/sec)
> Peak Bitrate              = 0 (bps/sec)
> Rate Control Mode         = 1 - VBR (Variable Bitrate)
> Frame Rate (Num/Denom)    = (30000/1001) 29.9700 fps
> GOP Length                = 30
> Set Initial RC      QP    = 0
> Initial RC QP (I,P,B)     = I(0), P(0), B(0)
> Number of B Frames        = 2
> Display Aspect Ratio X    = 1920
> Display Aspect Ratio Y    = 1080
> Video codec profile       = 100
> Video codec Level         = 0
> FieldEncoding             = 0
> Number slices per Frame   = 1
> Encoder Preset            = 3 - High Quality (HQ) Preset
> NVENC API Interface       = 2 - CUDA
Input Filesize: 230227968 bytes
[ Source Input File ] = "../YUV/1080p/HeavyHandIdiot.3sec.yuv
[ # of Input Frames ] = 74
 ** Start Encode <../YUV/1080p/HeavyHandIdiot.3sec.yuv>, Frames [0,74] **
Loading Frames [0,73] into system memory queue (74 frames)
nvEncReconfigureEncoder
Encoding Frames [0,73]

And the actual encoding process, probably with async trace:

nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
nvEncCreateBitstreamBuffer
nvEncRegisterAsyncEvent
nvEncCreateInputBuffer
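
The repeated create/register calls in the trace above can be collapsed mechanically when cleaning up a long trace. This is only a minimal sketch: `collapse_repeats` and its `period` parameter are hypothetical helpers for illustration, not part of the attached gdb command file.

```python
def collapse_repeats(calls, period):
    """Split calls into chunks of `period` names and run-length encode
    identical consecutive chunks as (chunk, count) pairs."""
    chunks = [tuple(calls[i:i + period]) for i in range(0, len(calls), period)]
    collapsed = []
    for chunk in chunks:
        if collapsed and collapsed[-1][0] == chunk:
            # same chunk as the previous one: just bump its counter
            collapsed[-1] = (chunk, collapsed[-1][1] + 1)
        else:
            collapsed.append((chunk, 1))
    return collapsed
```

Called with the list of function names from the trace and `period=3`, it returns one entry per distinct run, which is easier to read than the raw listing.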

Fri, 13 Sep 2013 13:30:51 GMT - Antoine Martin:

r4328 adds the stub encoder which successfully initializes nvenc (and nothing else yet..) and more: r4329 + r4330


Wed, 18 Sep 2013 07:41:30 GMT - Antoine Martin:

Lots more (too many changesets to list, see r4349 and earlier)

We now get valid h264 data out of it and have tests in and out.

What still needs to be done:

Preferably on the GPU to save CPU (see #384)


Fri, 27 Sep 2013 09:41:09 GMT - Antoine Martin:

With correct padding, r4375 seems to work - though I have seen some of those:

[h264 @ 0x7fa4687d4e20] non-existing PPS 0 referenced
[h264 @ 0x7fa4687d4e20] decode_slice_header error

On the client side...

We instantiate the client-side decoder with the window size (rounded down to an even size), whereas the data we get from nvenc has dimensions rounded up to a multiple of 32. Not sure if this is a problem, or if we can/should send the actual encoder size to the client. Thinking more about it, the vertical size must be rounded up to 32, otherwise nvenc does not find the U and V planes where it expects them, but I am not so sure about the horizontal size (it seemed to work before without padding).

The padding could be useful when resizing a window: we don't need (at least not server side..) a full encoder re-init unless the new size crosses one of the 32-padded boundaries.
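
To illustrate the rounding and the resize check, here is a minimal sketch. It assumes both dimensions are padded to 32 (the horizontal padding may not be required, see above), and the function names are made up for this example, they are not the ones used in the encoder.

```python
def roundup(n, m=32):
    """Round n up to the next multiple of m (m must be a power of two)."""
    return (n + m - 1) & ~(m - 1)

def client_size(width, height):
    """Window size rounded down to even dimensions (client-side decoder)."""
    return width & ~1, height & ~1

def encoder_size(width, height, m=32):
    """Window size rounded up to multiples of m (assumed nvenc padding)."""
    return roundup(width, m), roundup(height, m)

def needs_reinit(old_size, new_size, m=32):
    """True only if the resize crosses one of the m-padded boundaries."""
    return encoder_size(*old_size, m=m) != encoder_size(*new_size, m=m)
```

With this, a 1920x1080 window gets a 1920x1088 encoder, and resizing from 500x300 to 510x310 keeps the same 512x320 padded size, so no re-init would be needed server side.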

Note: it looks like the buffer formats advertised as being supported come in lists of bitmasks (the docs claim it is a plain list). We use NV12_PL.


Fri, 27 Sep 2013 14:41:30 GMT - Antoine Martin: attachment set

abandoned work on adding stride attributes to csc so nvenc can specify the padding to 32 generically


Fri, 27 Sep 2013 14:43:08 GMT - Antoine Martin: attachment set

abandoned work on adding stride attributes to csc so nvenc can specify the padding to 32 generically (again with missing server file)


Sat, 28 Sep 2013 10:26:46 GMT - Antoine Martin: attachment set

use two buffers CUDA side so we can use a kernel to copy (and convert) from one to the other


Sat, 28 Sep 2013 10:27:22 GMT - Antoine Martin: attachment set

use pycuda to remove lots of code... except this does not work because we need a context pointer for nvenc :(


Sat, 28 Sep 2013 11:58:59 GMT - Antoine Martin: attachment set

"working" pycuda version with an empty kernel


Sat, 28 Sep 2013 13:32:49 GMT - Antoine Martin: attachment set

with kernel doing something - causes crashes..


Sat, 28 Sep 2013 13:49:46 GMT - Antoine Martin:

With attachment/ticket/370/nvenc-pycuda-with-kernel2.patch, not too much work is left:

The TLS issues still need resolving:

Other things we may want to do:


Sun, 06 Oct 2013 11:05:50 GMT - Antoine Martin: attachment set

use (py)cuda to copy from input buffer (already in NV12 format) to output buffer (nvenc buffer)


Sun, 06 Oct 2013 16:33:34 GMT - Antoine Martin:

r4414 adds the pycuda code (simplifies things) and does the BGRA to NV12 CSC using a custom CUDA kernel. Total encoding time is way down.
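
For reference, this is roughly what a BGRA to NV12 conversion computes, written as slow pure Python rather than a CUDA kernel. It is a sketch under assumptions and not the kernel from r4414: it uses the common BT.601 studio-range integer approximation, averages each 2x2 block for the chroma samples, and does no padding to 32.

```python
def clamp(v):
    """Clamp a value to the 0..255 byte range."""
    return max(0, min(255, v))

def bgra_to_nv12(pixels, width, height):
    """Convert packed BGRA bytes to NV12.
    Returns (y_plane, uv_plane): a full resolution Y plane followed by
    a half resolution plane of interleaved U,V bytes.
    width and height must be even."""
    assert width % 2 == 0 and height % 2 == 0
    assert len(pixels) == width * height * 4
    y_plane = bytearray(width * height)
    uv_plane = bytearray(width * height // 2)
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            sum_r = sum_g = sum_b = 0
            for dy in (0, 1):
                for dx in (0, 1):
                    x, y = bx + dx, by + dy
                    i = (y * width + x) * 4
                    b, g, r = pixels[i], pixels[i + 1], pixels[i + 2]
                    # one luma sample per pixel
                    y_plane[y * width + x] = clamp(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16)
                    sum_r += r
                    sum_g += g
                    sum_b += b
            # one U,V pair per 2x2 block, from the averaged colour
            r, g, b = sum_r // 4, sum_g // 4, sum_b // 4
            o = (by // 2) * width + bx
            uv_plane[o] = clamp(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128)
            uv_plane[o + 1] = clamp(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128)
    return y_plane, uv_plane
```

A version like this is only useful for checking the output of a GPU kernel on small test images, the point of doing it in CUDA is to avoid this per-pixel work on the CPU.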

New issues:


Here's how I run the server when testing:

PATH=$PATH:/usr/local/cuda-5.5/bin/ \
LD_LIBRARY_PATH=/usr/local/cuda-5.5/lib64 \
XPRA_NVENC_DEBUG=1 \
XPRA_DAMAGE_DEBUG=1 \
XPRA_VIDEOPIPELINE_DEBUG=1 \
XPRA_ENCODER_TYPE=nvenc \
xpra start :10

Fri, 25 Oct 2013 03:47:14 GMT - Antoine Martin:

For building, here is the nvenc.pc pkgconfig file I've used on Fedora 19:

prefix=/opt/nvenc_3.0_sdk
exec_prefix=${prefix}
core_includedir=${prefix}/Samples/core/include
api_includedir=${prefix}/Samples/nvEncodeApp/inc
libdir=/usr/lib64/nvidia
Name: nvenc
Description: NVENC
Version: 1.0
Requires:
Conflicts:
Libs: -L${libdir} -lnvidia-encode
Cflags: -I${core_includedir} -I${api_includedir}

Note: this refers to unversioned libraries, which you may need to create. Here for a 64-bit build:

cd /usr/lib64/nvidia/
ln -sf libnvidia-encode.so.1 libnvidia-encode.so
ln -sf libcuda.so.1 libcuda.so
#etc..
cd /usr/lib64
ln -sf nvidia/libcuda.so ./

(or you can add the version to the pkgconfig file)


Fri, 25 Oct 2013 04:00:11 GMT - Antoine Martin: attachment set

trace from comment:3


Fri, 25 Oct 2013 04:31:41 GMT - Antoine Martin:

Instructions for installing NVENC support from scratch on Fedora 19:

Do install CUDA; you can skip the rest, but you *must* tell the installer not to install the broken drivers it wants to install.

Be very careful not to place cuda ahead of the regular LD_LIBRARY_PATH as this can cause big problems with some libraries (e.g. libopencl)
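
One way to get the ordering right when launching from a script is to append the CUDA directory rather than prepend it. This is a minimal sketch: `append_library_path` is a hypothetical helper, and the CUDA path used with it would depend on your installation.

```python
import os

def append_library_path(env, new_dir, var="LD_LIBRARY_PATH"):
    """Return a copy of env with new_dir appended (never prepended) to var,
    so that directories already listed keep priority."""
    env = dict(env)
    # drop empty entries, which the dynamic linker treats as the current directory
    parts = [p for p in env.get(var, "").split(os.pathsep) if p]
    if new_dir not in parts:
        parts.append(new_dir)
    env[var] = os.pathsep.join(parts)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.Popen` when starting the server, for example `append_library_path(os.environ, "/usr/local/cuda-5.5/lib64")`.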

Finally, you can test that xpra builds with cuda/nvenc support:

./setup.py --with-nvenc --with-csc_nvcuda build

And that you can run the cuda/nvenc tests:

mkdir tmp && cd tmp
cp -apr ../tests ./
PYTHONPATH=. ./tests/xpra/codecs/test_csc_nvcuda.py
PYTHONPATH=. ./tests/xpra/codecs/test_nvenc.py

Fri, 25 Oct 2013 06:49:07 GMT - Antoine Martin:

Strangely enough, the test encoder fails on a GTX 760 and not with a graceful error:

$ gdb ./nvEncoder
(..)
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/nvenc_3.0_sdk/Samples/nvEncodeApp/nvEncoder...(no debugging symbols found)...done.
(gdb) break OpenEncodeSession
Function "OpenEncodeSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (OpenEncodeSession) pending.
(gdb) run -configFile=HeavyHand_1080p.txt -outfile=HeavyHandIdiot.3sec.264
Starting program: /opt/nvenc_3.0_sdk/Samples/nvEncodeApp/./nvEncoder -configFile=HeavyHand_1080p.txt -outfile=HeavyHandIdiot.3sec.264
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
>> GetNumberEncoders() has detected 1 CUDA capable GPU device(s) <<
  [ GPU #0 - < GeForce GTX 760 > has Compute SM 3.0, NVENC Available ]
>> InitCUDA() has detected 1 CUDA capable GPU device(s)<<
  [ GPU #0 - < GeForce GTX 760 > has Compute SM 3.0, Available NVENC ]
>> Select GPU #0 - < GeForce GTX 760 > supports SM 3.0 and NVENC
[New Thread 0x7ffff5bce700 (LWP 16417)]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000040b12f in CNvEncoder::OpenEncodeSession(int, char const**, unsigned int) ()
#2  0x000000000040dcb2 in CNvEncoderH264::EncoderMain(EncoderGPUInfo, EncoderAppParams, int, char const**) ()
#3  0x0000000000401f7b in main ()

What's even more strange is that our test code fails even earlier:

Traceback (most recent call last):
  File "./tests/xpra/codecs/test_nvenc.py", line 136, in <module>
    main()
  File "./tests/xpra/codecs/test_nvenc.py", line 128, in main
    test_encode_one()
  File "./tests/xpra/codecs/test_nvenc.py", line 17, in test_encode_one
    test_encoder(encoder_module)
  File "/home/spikesdev/src/tmp/tests/xpra/codecs/test_encoder.py", line 62, in test_encoder
    e.init_context(actual_w, actual_h, src_format, encoding, 20, 0, options)
  File "encoder.pyx", line 1179, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:5739)
  File "encoder.pyx", line 1217, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:6686)
  File "encoder.pyx", line 1232, in xpra.codecs.nvenc.encoder.Encoder.init_nvenc (xpra/codecs/nvenc/encoder.c:6813)
  File "encoder.pyx", line 1649, in xpra.codecs.nvenc.encoder.Encoder.open_encode_session (xpra/codecs/nvenc/encoder.c:12790)
  File "encoder.pyx", line 1102, in xpra.codecs.nvenc.encoder.raiseNVENC (xpra/codecs/nvenc/encoder.c:4935)
Exception: getting API function list - returned 15: This indicates that an invalid struct version was used by the client.

I am pretty sure that when I tested on a GTX 450, I got past this point and it failed when creating the context instead (since that card does not support nvenc). That is why there is the XPRA_NVENC_FORCE flag in the code.

Edit: this is a problem with the newer drivers which are incompatible with NVENC SDK v3. (SDK v2 works though!)


Mon, 28 Oct 2013 08:40:38 GMT - Antoine Martin: attachment set

changes needed to build against the NVENC SDK version 2


Mon, 28 Oct 2013 08:41:42 GMT - Antoine Martin: attachment set

example pkgconfig file for NVENC SDK version 2


Mon, 28 Oct 2013 10:27:18 GMT - Antoine Martin: attachment set

updated (smaller) patch to apply on top of r4620


Mon, 28 Oct 2013 13:19:35 GMT - Antoine Martin:

As of r4621, the code supports both SDK v2 and v3: use whichever works with your current driver version.


Fri, 01 Nov 2013 09:34:43 GMT - Antoine Martin:

Took me a while to figure this out:

Looks like nvidia forgot to test backwards compatibility with their "upgrade".

r4651 makes V3 the default again - "newer is better", right?

Anyway, installing 3.19 on a system with kernel 3.10 or newer (e.g. Fedora 19) is a PITA:

which will fail at the DKMS stage if building against a kernel version 3.11 or newer..


Fri, 01 Nov 2013 11:25:40 GMT - Antoine Martin:

Important fix in r4652 which will need to be backported to v0.10.x


Tue, 05 Nov 2013 08:01:11 GMT - Antoine Martin: description, summary changed

Remaining tasks for nvenc:


At the moment, running out of contexts does this:

2013-11-05 14:40:58,590 setup_pipeline failed for (65, None, 'BGRX', codec_spec(nvenc))
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 605, in setup_pipeline
    self._video_encoder.init_context(enc_width, enc_height, enc_in_format, encoder_spec.encoding, quality, speed, self.encoding_options)
  File "encoder.pyx", line 1291, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:5883)
  File "encoder.pyx", line 1329, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:6830)
  File "encoder.pyx", line 1344, in xpra.codecs.nvenc.encoder.Encoder.init_nvenc (xpra/codecs/nvenc/encoder.c:6957)
  File "encoder.pyx", line 1828, in xpra.codecs.nvenc.encoder.Encoder.open_encode_session (xpra/codecs/nvenc/encoder.c:13775)
  File "encoder.pyx", line 1203, in xpra.codecs.nvenc.encoder.raiseNVENC (xpra/codecs/nvenc/encoder.c:5070)
Exception: opening session - returned 2: This indicates that devices pass by the client is not supported.
2013-11-05 14:40:58,593 error processing damage data: failed to setup a video pipeline for h264 encoding with source format BGRX
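
One possible way to handle this more gracefully is to try the next encoder when a context cannot be created. The following is only a sketch of that idea: `init_first_available` and the two stand-in factories are hypothetical and do not reflect how setup_pipeline is actually structured.

```python
def init_first_available(candidates, *args):
    """Try each (name, factory) pair in order and return (name, encoder)
    for the first factory that initializes without raising."""
    failures = []
    for name, factory in candidates:
        try:
            return name, factory(*args)
        except Exception as e:
            # remember why this one failed, then try the next candidate
            failures.append("%s: %s" % (name, e))
    raise RuntimeError("no encoder could be initialized (%s)" % "; ".join(failures))


# Hypothetical stand-ins for real encoder constructors:
def fake_nvenc(width, height):
    raise Exception("opening session - returned 2")

def fake_x264(width, height):
    return {"encoder": "x264", "size": (width, height)}
```

With `[("nvenc", fake_nvenc), ("x264", fake_x264)]` as candidates, the nvenc failure is recorded and the x264 stand-in is returned instead of the whole pipeline setup failing.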

Wed, 06 Nov 2013 02:14:27 GMT - Antoine Martin:

Everything about nvenc is now on the wiki here


Sun, 10 Nov 2013 06:52:48 GMT - Antoine Martin:

Updates:


Mon, 11 Nov 2013 09:54:41 GMT - Antoine Martin: owner, status changed

More details edited in comment:18, this is good enough for testing.

At this point the encoder should work and give us decent quality (we need YUV444P support for best quality) with much lower latency. It also supports efficient scaling.

Please test it and try to break it (please read Using NVENC first): try different resolutions, types of clients, etc. Measure fps, server load, bandwidth, etc., with and without nvenc. Be aware that only newer clients can take advantage of nvenc at present (r4722 needs backporting). It may also take a few seconds for nvenc to beat x264 in our internal scoring system, which decides the combination of encoder and csc modules to use.


Things that will probably be addressed in a follow up ticket for the next milestone:

Lower priority still:

Those have been moved to #466


Mon, 11 Nov 2013 10:18:44 GMT - Antoine Martin: attachment set

this crash occurred as I killed xpra with SIGINT.. hopefully rare and due to SIGINT


Wed, 12 Feb 2014 19:21:00 GMT - Smo: status changed; resolution set

Tested and working with a Fedora 20 server.

Started with this command line:

LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/cuda/lib64:/usr/lib64/nvidia \
    xpra --bind-tcp=0.0.0.0:1400 --start-child="xterm -fg white -bg black" \
         --no-daemon --encryption=AES --password-file=./passtest start :14

Thu, 13 Feb 2014 12:20:55 GMT - Antoine Martin:

see #517


Sun, 29 Mar 2015 07:17:20 GMT - Antoine Martin:

Note for those landing here: NVENC is not safe to use in versions older than 0.15 because of a context leak due to threading.


Sat, 23 Jan 2021 04:53:08 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/370