xpra icon
Bug tracker and wiki

Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#110 closed enhancement (fixed)

x264 and vpx improvements

Reported by: Antoine Martin Owned by: ahuillet
Priority: minor Milestone: 0.3
Component: core Version: 0.2.0
Keywords: Cc:

Description (last modified by Antoine Martin)

#94 added x264 and vpx to trunk for the 0.2.x release.

This ticket tracks remaining improvements which are not required for release but provide improvements we should deal with sooner or later:

  • releasing the gil: automated testing revealed that x264 and vpx cause a jump in client and server latency. Unfortunately, because we use PyObject_AsReadBuffer to access the python data to encode/decode, we cannot simply add with nogil:. It may be worth trying out multiprocessing, so the encoder/decoder can take advantage of multicore cpus (at the cost of copying the data before processing it..) - mostly done already.
  • add hooks for rate control (set target bandwidth) - this can be in update_batch_delay
  • add hooks to change color subsampling (currently hardcoded to 4x YUV420)
  • check if automatic lossless refresh is needed or if ratecontrol does it for free
  • fix C compiler warning during build:
    xpra/x264\codec.c(1623) : warning C4013: 'change_encoding_speed' undefined; assuming extern returning int
    

(done)

  • re-activate encoding latency detection with a smarter mechanism - partly done as part of #153
  • update to newer versions of x264 since we are using the one build we know that does not crash (see r855) - done
  • OpenGL - see #147 / #150
  • swscale memalign warning - see #159
  • ffmpeg win32 build - see #168

Attachments (6)

xpra-x264-no-gil.patch (5.1 KB) - added by Antoine Martin 8 years ago.
allows us to drop the gil when calling the compress part of x264 encoding
xpra-x264-no-gil-v2.patch (6.1 KB) - added by Antoine Martin 8 years ago.
update patch with memleak free_csc_image fix and whitespace
xpra-x264-heisenbug.patch (3.0 KB) - added by Antoine Martin 8 years ago.
adds debugging to the codepaths used by x264 in the client
xpra-x264-green-on-black.png (9.5 KB) - added by Antoine Martin 8 years ago.
example of colour subsampling with green on black text - making it hard to read
gil-vs-nogil.csv (52.7 KB) - added by Antoine Martin 8 years ago.
test results comparing gil with nogil - these were used to generate the graphs
xpra-videodecode-thread.patch (23.1 KB) - added by Antoine Martin 8 years ago.
this moves picture decoding to a separate thread

Download all attachments as: .zip

Change History (45)

comment:1 Changed 8 years ago by Antoine Martin

Description: modified (diff)
Status: newassigned

comment:2 Changed 8 years ago by ahuillet

Description: modified (diff)

comment:3 Changed 8 years ago by Antoine Martin

Description: modified (diff)

comment:4 Changed 8 years ago by Antoine Martin

Description: modified (diff)

comment:5 Changed 8 years ago by Antoine Martin

Description: modified (diff)

Changed 8 years ago by Antoine Martin

Attachment: xpra-x264-no-gil.patch added

allows us to drop the gil when calling the compress part of x264 encoding

comment:6 Changed 8 years ago by Norman Rasmussen

btw, in your current patch you should call free_csc_image at the end of compress_image instead of x264_picture_clean, otherwise you'll leak memory.

Changed 8 years ago by Antoine Martin

Attachment: xpra-x264-no-gil-v2.patch added

update patch with memleak free_csc_image fix and whitespace

comment:7 Changed 8 years ago by Antoine Martin

applied to both x264 and vpx in r877

Unfortunately, testing revealed that rather than decreasing the average latency, this change actually increased it!
That's because xpra is now capable of streaming more frames to the client (more than doubling the fps in some cases), despite this the maximum latency is actually the same or lower which is good.

Something else must be causing this high latency. See xmatrix test with x264 for example: the latency is as high as 0.4s with an average of 0.1s, whereas other tests are more stable with an average around 0.02s to 0.03s.

comment:8 Changed 8 years ago by ahuillet

As far as I can see, the time it takes to encode a single frame has increased dramatically with this change. On my test case, encoding a frame went from about 50 milliseconds for CSC + compression to about 160 milliseconds (15 for CSC, the rest for compression), this with the very same quality settings.

So it seems that this change actually increased the time it takes to compress an image (time spent doing CSC + compression).

comment:9 Changed 8 years ago by Antoine Martin

Owner: changed from ahuillet to Antoine Martin
Status: assignedaccepted

We are talking cross-purpose here...

When I am talking about latency, I am talking about how long it takes for the server to respond to a ping packet. That is, how long before the main thread can run unimpeded. As mentioned above, this is important for interactivity (responding to keyboard and mouse events, processing damage requests - even if just batching them, etc)

What you are talking about is the elapsed time during the compression. Which affects how long it takes for a frame to reach the client (but.. see below)

Since we have a number of threads competing for the CPU in the server, it seems very likely that the compression was sharing the CPU with other tasks. This is what allows the new code to push more FPS (in all cases so far). There is nothing to worry about unless this also causes the overall performance to decrease, or if the "pixel-latency" (the time it takes for a screen update to reach the client). More parellelism means that the pixel-latency may not have increased, or if it has, that's at least partly because of the higher FPS.
In fact you can revert back to the old (blocking the gil) behaviour simply by removing the line: with nogil:. The performance will go back to what we had before.

The trade-off is one of FPS vs "pixel-latency".
There are things we can do with this:

  • offer the option to choose one over the other
  • cap the FPS and favour "pixel-latency" when the threshold is reached?
  • both? or?..

comment:10 Changed 8 years ago by ahuillet

150ms to encode a frame is unacceptable. You need to encode a frame in 50ms or less in interactive cases.

I understand you're talking about a tradeoff (though I'm not sure where it lies exactly), but in my opinion:

  • we should favor "pixel-latency" over FPS because the first one is easily noticed by users and the second one will depend on bandwidth in many real world cases
  • anything above 30FPS is useless

comment:11 Changed 8 years ago by Antoine Martin

I think you may be missing the point on this supposed tradeoff (apologies if not): the new measured time includes the time it takes to do other things because of the new opportunity for parallelism. Those things are necessary and will happen for each frame, whether they are run in parallel with operations on other frames or not. Was your test running at the same FPS or faster with the change? A higher FPS will lead to more contention. This is what I meant when I said about the test results: increased FPS, lower server "interactivity-latency".

A better measure would be the time from the damage request reception (itself delayed when we hold the gil... but we have to start somewhere, and this is the earliest we can measure) until the frame is sent by the network layer. I will try to cook something so we can have a end-to-end measurements.
I suspect that this is actually unchanged, if not better.

comment:12 Changed 8 years ago by ahuillet

I did the following test: I'm measuring the time it takes to encode a frame. And just this time. Latency is going to be >= to this time, as far as I know, because the time it takes for an update to reach the user's eyes is necessarily greater than the time it takes to encode the image of the update.
So when this value goes from 50 to 150ms, I think it proves something is wrong, does it not?

You're right about the "better" measurement you're mentioning - though my own test results are backed up by the (biased...) impression that playing FreedroidRPG lost *a lot* in terms of interactivity (it used to be playable, it isn't any longer at 40 updates per second).

comment:13 Changed 8 years ago by Antoine Martin

As discussed on IRC, all of the above is true. But the important factor is that the new code is capable of pushing more frames, which increases contention.
When running at the same FPS, I am convinced the new code is at least as fast the old one.
Let's add measurements and do some tests!

comment:14 Changed 8 years ago by Antoine Martin

r927:

  • releases the gil for client-side csc code
  • fixes compile time warnings due to casts

comment:15 Changed 8 years ago by Antoine Martin

This was recorded in #132 but really belongs here: see r886 commit message on how we now report the time it takes for a damage event to go from the X11 server event queue to an outgoing network packet.

New work item: keep x264 encoder spec when resizing. If we had changed the compression parameters, we should keep the values.

comment:16 Changed 8 years ago by Antoine Martin

A new one:

xpra/x264/x264lib.c: In function ‘init_decoder’:
xpra/x264/x264lib.c:105:2: warning: ‘avcodec_open’ is deprecated (declared at /usr/include/ffmpeg/libavcodec/avcodec.h:4155) [-Wdeprecated-declarations]
gcc -pthread -shared -Wl,-z,relro build/temp.linux-x86_64-2.7/xpra/x264/codec.o build/temp.linux-x86_64-2.7/xpra/x264/x264lib.o -L. -lx264 -lavcodec -lpthread -lswscale -lpython2.7 -o build/lib.linux-x86_64-2.7/xpra/x264/codec.so -Wall

Surely there must be a newer method we can use if this one is deprecated?

Changed 8 years ago by Antoine Martin

Attachment: xpra-x264-heisenbug.patch added

adds debugging to the codepaths used by x264 in the client

comment:17 Changed 8 years ago by Antoine Martin

As of r1127 we have a crash on win32.

Use the attached patch and "-d all" to get the last log output before the crash... if you can get the crash with the patch...

comment:18 Changed 8 years ago by Antoine Martin

Eventually crashed with "The application failed to initialize properly (0xc000012d). Click on OK to terminate the application." (dwwin.exe - Application Error)

comment:19 Changed 8 years ago by Antoine Martin

memleak from comments above fixed in r1128 by not using the buggy win32 API.

comment:20 Changed 8 years ago by Antoine Martin

And running in gdb I got this little beauty:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe58b7fe700 (LWP 21035)]
0x0000003efe07d403 in _int_malloc () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003efe07d403 in _int_malloc () from /lib64/libc.so.6
#1  0x0000003efe07fcb3 in malloc () from /lib64/libc.so.6
#2  0x0000003f3462b98d in x264_malloc () from /lib64/libx264.so.120
#3  0x0000003f3462341a in x264_frame_pop_unused () from /lib64/libx264.so.120
#4  0x0000003f3468ab4e in x264_encoder_encode () from /lib64/libx264.so.120
#5  0x00007fe5943e1a36 in compress_image (ctx=<optimized out>, pic_in=pic_in@entry=0x7fe578f9d3a0, out=out@entry=0x7fe58b7fca18, outsz=outsz@entry=0x7fe58b7fca14)
    at xpra/x264/x264lib.c:166
#6  0x00007fe5943df32e in __pyx_f_4xpra_4x264_5codec_7Encoder_do_compress_image (__pyx_v_self=0x24c2fd0, __pyx_v_pic_in=0x7fe578f9d3a0) at xpra/x264/codec.c:2492
#7  0x00007fe5943e0599 in __pyx_pf_4xpra_4x264_5codec_7Encoder_4compress_image (__pyx_v_rowstride=<optimized out>, __pyx_v_input=<optimized out>, __pyx_v_self=0x24c2fd0)
    at xpra/x264/codec.c:2416
#8  __pyx_pw_4xpra_4x264_5codec_7Encoder_5compress_image (__pyx_v_self=0x24c2fd0, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at xpra/x264/codec.c:2308
#9  0x000000375a2dd0f1 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#11 0x000000375a2dc123 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#13 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#14 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#17 0x000000375a26d8c7 in ?? () from /lib64/libpython2.7.so.1.0
#18 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#19 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#20 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#23 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#24 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#25 0x000000375a258090 in ?? () from /lib64/libpython2.7.so.1.0
#26 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#27 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#28 0x000000375a30b442 in ?? () from /lib64/libpython2.7.so.1.0
#29 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#30 0x0000003efe0f199d in clone () from /lib64/libc.so.6
(gdb) 

comment:21 Changed 8 years ago by Antoine Martin

and by shrinking the glxgears window to 0x0, I get this one:

[Switching to Thread 0x7fa3ae0f3700 (LWP 22048)]
0x0000003efe035965 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003efe035965 in raise () from /lib64/libc.so.6
#1  0x0000003efe037118 in abort () from /lib64/libc.so.6
#2  0x0000003efe07568b in __libc_message () from /lib64/libc.so.6
#3  0x0000003efe07c80e in _int_free () from /lib64/libc.so.6
#4  0x0000003f3462bb1d in x264_picture_clean () from /lib64/libx264.so.120
#5  0x00007fa3b28fca52 in free_csc_image (image=0x7fa398adb260) at xpra/x264/x264lib.c:147
#6  compress_image (ctx=<optimized out>, pic_in=pic_in@entry=0x7fa398adb260, out=out@entry=0x7fa3ae0f1a18, outsz=outsz@entry=0x7fa3ae0f1a14) at xpra/x264/x264lib.c:177
#7  0x00007fa3b28fa32e in __pyx_f_4xpra_4x264_5codec_7Encoder_do_compress_image (__pyx_v_self=0x2a308a0, __pyx_v_pic_in=0x7fa398adb260) at xpra/x264/codec.c:2492
#8  0x00007fa3b28fb599 in __pyx_pf_4xpra_4x264_5codec_7Encoder_4compress_image (__pyx_v_rowstride=<optimized out>, __pyx_v_input=<optimized out>, __pyx_v_self=0x2a308a0)
    at xpra/x264/codec.c:2416
#9  __pyx_pw_4xpra_4x264_5codec_7Encoder_5compress_image (__pyx_v_self=0x2a308a0, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at xpra/x264/codec.c:2308
#10 0x000000375a2dd0f1 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x000000375a2dc123 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#14 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#15 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x000000375a26d8c7 in ?? () from /lib64/libpython2.7.so.1.0
#19 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#20 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#23 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#24 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#25 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x000000375a258090 in ?? () from /lib64/libpython2.7.so.1.0
#27 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#28 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#29 0x000000375a30b442 in ?? () from /lib64/libpython2.7.so.1.0
#30 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#31 0x0000003efe0f199d in clone () from /lib64/libc.so.6
(gdb) 

comment:22 Changed 8 years ago by Antoine Martin

When resizing to zero height:

x264 [error]: invalid width x height (292x0)
[swscaler @ 0x7f4b385c3e40] 292x0 -> 292x0 is invalid scaling dimension
error processing damage data: colourspace conversion failed
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1200, in data_to_packet
    packet = self.make_data_packet(*item)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1320, in make_data_packet
    data = self.video_encode(x264_encoders, x264Encoder, wid, x, y, w, h, coding, data, rowstride)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1381, in video_encode
    err, _, data = encoder.compress_image(data, rowstride)
  File "codec.pyx", line 174, in xpra.x264.codec.Encoder.compress_image (xpra/x264/codec.c:2404)
AssertionError: colourspace conversion failed
error processing damage data: colourspace conversion failed
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1200, in data_to_packet
    packet = self.make_data_packet(*item)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1320, in make_data_packet
    data = self.video_encode(x264_encoders, x264Encoder, wid, x, y, w, h, coding, data, rowstride)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1381, in video_encode
    err, _, data = encoder.compress_image(data, rowstride)
  File "codec.pyx", line 174, in xpra.x264.codec.Encoder.compress_image (xpra/x264/codec.c:2404)
AssertionError: colourspace conversion failed
error processing damage data: colourspace conversion failed
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1200, in data_to_packet
    packet = self.make_data_packet(*item)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1320, in make_data_packet
    data = self.video_encode(x264_encoders, x264Encoder, wid, x, y, w, h, coding, data, rowstride)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1381, in video_encode
    err, _, data = encoder.compress_image(data, rowstride)
  File "codec.pyx", line 174, in xpra.x264.codec.Encoder.compress_image (xpra/x264/codec.c:2404)
AssertionError: colourspace conversion failed
error processing damage data: colourspace conversion failed
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1200, in data_to_packet
    packet = self.make_data_packet(*item)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1320, in make_data_packet
    data = self.video_encode(x264_encoders, x264Encoder, wid, x, y, w, h, coding, data, rowstride)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1381, in video_encode
    err, _, data = encoder.compress_image(data, rowstride)
  File "codec.pyx", line 174, in xpra.x264.codec.Encoder.compress_image (xpra/x264/codec.c:2404)
AssertionError: colourspace conversion failed

comment:23 Changed 8 years ago by Antoine Martin

which seems to cause this too:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4b3ffff700 (LWP 5472)]

#0  0x0000003f3468d1c9 in x264_encoder_close () from /lib64/libx264.so.120
#1  0x00007f4b50f7f816 in clean_encoder (ctx=<optimized out>) at xpra/x264/x264lib.c:81
#2  0x00007f4b50f7ca32 in __pyx_pf_4xpra_4x264_5codec_7Encoder_2clean (__pyx_v_self=<optimized out>)
    at xpra/x264/codec.c:2226
#3  __pyx_pw_4xpra_4x264_5codec_7Encoder_3clean (__pyx_v_self=<xpra.x264.codec.Encoder at remote 0x25458a0>, 
    unused=<optimized out>) at xpra/x264/codec.c:2190
#4  0x000000375a2dce46 in call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffdc58)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4082
#5  PyEval_EvalFrameEx (f=f@entry=
    Frame 0x7f4b38001850, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1362, in video_encode (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_...(truncated), throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#6  0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=
    0x0, args=<optimized out>, argcount=argcount@entry=11, kws=0x241c0b0, kwcount=0, defs=0x0, defcount=0, 
    closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#7  0x000000375a2dc123 in fast_function (nk=<optimized out>, na=11, n=<optimized out>, pp_stack=
    0x7f4b3fffdf08, func=<function at remote 0x2420b90>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194
#8  call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffdf08)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#9  PyEval_EvalFrameEx (f=f@entry=
    Frame 0x7f4b38001570, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1320, in make_data_packet (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, ...(truncated), throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#10 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=
    0x0, args=args@entry=0x24d6388, argcount=13, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=
    0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#11 0x000000375a26d7b6 in function_call (func=<function at remote 0x2420b18>, arg=
    (<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x255f090>}, _last_client_delta=(0, 0), _client_latency=<collections.deque at remote 0x...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#12 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x2420b18>, arg=<optimized out>, 
    kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#13 0x000000375a2d9efb in ext_do_call (nk=0, na=<optimized out>, flags=<optimized out>, pp_stack=
    0x7f4b3fffe288, func=<function at remote 0x2420b18>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4411
#14 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x7f4b38000ef0, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1200, in data_to_packet (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, ma...(truncated), throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2779
#15 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=
    0x0, args=args@entry=0x25415e8, argcount=1, kws=kws@entry=0x7f4b5a7e7068, kwcount=kwcount@entry=0, 
    defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#16 0x000000375a26d8c7 in function_call (func=<function at remote 0x24209b0>, arg=
    (<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x255f090>}, _last_client_delta=(0, 0), _client_latency=<collections.deque at remote 0x...(truncated), kw={}) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#17 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x24209b0>, arg=<optimized out>, 
    kw=kw@entry={}) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#18 0x000000375a2d9efb in ext_do_call (nk=0, na=<optimized out>, flags=<optimized out>, pp_stack=
    0x7f4b3fffe608, func=<function at remote 0x24209b0>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4411
#19 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2779
#20 0x000000375a2dcd61 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=
    0x7f4b3fffe808, func=<function at remote 0x7f4b52fcded8>)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
#21 call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffe808)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#22 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#23 0x000000375a2dcd61 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=
    0x7f4b3fffea08, func=<function at remote 0x7f4b52fcf0c8>)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
#24 call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffea08)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#25 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x7f4b38000910, for file /usr/lib64/python2.7/threading.py, line 524, in __bootstrap (self=<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__v...(truncated), throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#26 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=
    0x0, args=args@entry=0x2541568, argcount=1, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=
    0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#27 0x000000375a26d7b6 in function_call (func=<function at remote 0x7f4b52fcdf50>, arg=
    (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#28 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x7f4b52fcdf50>, arg=arg@entry=
    (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=kw@entry=0x0)
    at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#29 0x000000375a258090 in instancemethod_call (func=<function at remote 0x7f4b52fcdf50>, arg=
    (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/classobject.c:2578
#30 0x000000375a249a7e in PyObject_Call (func=func@entry=<instancemethod at remote 0x25349b0>, arg=arg@entry=
    (), kw=<optimized out>) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#31 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x25349b0>, arg=(), 
    kw=<optimized out>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3967
#32 0x000000375a30b442 in t_bootstrap (boot_raw=0x251e820)
    at /usr/src/debug/Python-2.7.3/Modules/threadmodule.c:614
#33 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#34 0x0000003efe0f199d in clone () from /lib64/libc.so.6

comment:24 Changed 8 years ago by Antoine Martin

  • zero height fixed in r1139
  • error during cleanup fixed in r1137

comment:25 Changed 8 years ago by Antoine Martin

r1140 uses lossless encoding for small regions and one pixel wide/high windows

comment:26 Changed 8 years ago by Antoine Martin

Description: modified (diff)

comment:27 Changed 8 years ago by Antoine Martin

Owner: changed from Antoine Martin to ahuillet
Status: acceptedassigned

More important changes:

  • r1194 uses absolute values for changing encoding speed
  • r1196 hooks the set_encoding_speed function into the calculate_batch_delay
  • r1190 adds set_encoding_quality and implements almost everything needed for lossless auto refresh

What is still needed:

  • x264 code to override the current quality settings for the current frame: TODO code section (we may want to override the encoding speed too - would be fairly cheap/easy to do if the API allows it) - from the docs of x264_encoder_reconfig: "if the change should apply to some particular frame, use x264_picture_t->param instead."
  • x264 code to implement set_encoding_quality (with a range of 0 to 100 like set_encoding_speed)

Maybe also:

  • review the heuristics used by set_encoding_speed and set_encoding_quality
  • "add hooks to change color subsampling (currently hardcoded to 4x YUV420)" - please explain how this affects bandwidth/cpu/etc
Last edited 8 years ago by Antoine Martin (previous) (diff)

comment:28 Changed 8 years ago by ahuillet

Overriding quality: done in r1240 (untested)
Setting quality: done in r1241 (untested)

comment:29 Changed 8 years ago by Antoine Martin

You need also r1242 to make the quality range valid (tried it on a hunch), and r1243 to tune it properly.

It works!

comment:30 Changed 8 years ago by ahuillet

Color subsampling is set to 420, which means that we discard three quarters of the chroma information. This makes colored text (such as the red links on Trac) look ugly.
Some users may want us not to discard so much chroma information, and therefore use 422 (half info discarded) or 444 (no info discarded). Obviously, this has an impact on the image size.

422 = 66% of 444 size
420 = 50% of 444 size

The only impact is that + image quality. As long as users don't ask for it, I don't think we should implement it.

comment:31 Changed 8 years ago by Antoine Martin

Can we change this on the fly too?

I mean, we could increase to 422 or even 444 when quality reaches above 80% and 90%, no? (and also be less aggressive with quality and/or take this setting into account when calculating the new quality setting to prevent the yoyo effect)

It looks like this is also used decoder side. So we would need to pass the information with each x264 packet and do what is needed client side.

Last edited 8 years ago by Antoine Martin (previous) (diff)

comment:32 Changed 8 years ago by ahuillet

I don't think there is much interest in changing the color subsampling to be honest. We should stick to 420 except for those windows for which we "know" that 444 is necessary. Those windows might be for example a Trac page in firefox... but in my opinion those are corner cases and I don't think we should deal with them just now.

comment:33 Changed 8 years ago by Antoine Martin

I don't think we can "know" when it's necessary, just that when we find that are not constrained by bandwidth, switching to 444 gives better visual results at very little cost.

comment:34 Changed 8 years ago by ahuillet

"very little" = 50%.

That's my definition for "huge", not "little". :)

comment:35 Changed 8 years ago by Antoine Martin

Misunderstanding here, I said: "when we find that are not constrained by bandwidth..." - using more bandwidth is not an issue, no matter how much more we use. Could be 100% more, it still would not matter.

Changed 8 years ago by Antoine Martin

example of colour subsampling with green on black text - making it hard to read

comment:36 Changed 8 years ago by Antoine Martin

avcodec_open deprecation warning fixed in r1299

comment:37 Changed 8 years ago by Antoine Martin

  • 422 and 444 csc modes done in r1305 (many other goodies in there too), with little visible effect on picture quality (green on black terminal) unfortunately!
  • moved the win32 bits to the win32 dir in r1304

comment:38 Changed 8 years ago by Antoine Martin

Resolution: fixed
Status: assignedclosed

green on black fixed in r1323

That's enough for this ticket, there are other tickets for related issues:

  • OpenGL - see #147 / #150
  • swscale memalign warning - see #159
  • ffmpeg win32 build - see #168
  • user control over quality vs fps tradeoff - #178

comment:39 Changed 8 years ago by Antoine Martin

I have validated the NOGIL change with complete performance tests against 0.5.0 and the results are unequivocal so I added them to svn here (more to come) - and I removed the ability to turn 'nogil' off in r1359

Note: the only slight decrease in performance is with the rss-* screensavers and that is because we have problems with those tests that need fixing separately (high latency/load)

Last edited 8 years ago by Antoine Martin (previous) (diff)

Changed 8 years ago by Antoine Martin

Attachment: gil-vs-nogil.csv added

test results comparing gil with nogil - these were used to generate the graphs

Changed 8 years ago by Antoine Martin

this moves picture decoding to a separate thread

Note: See TracTickets for help on using tickets.