#94 added x264 and vpx to trunk for the 0.2.x release.
This ticket tracks the remaining items which are not required for release but provide improvements we should deal with sooner or later:
Since we use PyObject_AsReadBuffer to access the Python data to encode/decode, we cannot simply add a with nogil: block.
It may be worth trying out multiprocessing, so the encoder/decoder can take advantage of multicore CPUs (at the cost of copying the data before processing it) - mostly done already.
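The multiprocessing idea can be sketched as follows. This is illustrative only: zlib stands in for the real x264/vpx compression step and encode_frame/encode_all are hypothetical helpers, but the shape is the same - each frame is copied (pickled) over to a worker process, which is the copying cost mentioned above.

```python
# zlib stands in for the real x264/vpx compression step; the point is
# the shape: each frame is copied (pickled) to a worker process, so
# several frames can be compressed on different cores at once.
import zlib
from multiprocessing import Pool

def encode_frame(data):
    # hypothetical helper: the real code would do CSC + x264/vpx here
    return zlib.compress(data, 6)

def encode_all(frames, processes=4):
    with Pool(processes=processes) as pool:
        return pool.map(encode_frame, frames)

if __name__ == "__main__":
    frames = [bytes(640 * 480 * 3)] * 8   # eight dummy RGB frames
    packets = encode_all(frames)
    print(len(packets))  # → 8
```

Each worker holds its own GIL, so this sidesteps the nogil problem entirely, at the price of one copy per frame on the way in.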
update_batch_delay
xpra/x264\codec.c(1623) : warning C4013: 'change_encoding_speed' undefined; assuming extern returning int
(done)
This allows us to drop the GIL when calling the compress part of the x264 encoding.
btw, in your current patch you should call free_csc_image at the end of compress_image instead of x264_picture_clean, otherwise you'll leak memory.
update patch with memleak free_csc_image fix and whitespace
applied to both x264 and vpx in r877
Unfortunately, testing revealed that rather than decreasing the average latency, this change actually increased it! That's because xpra is now capable of streaming more frames to the client (more than doubling the FPS in some cases); despite this, the maximum latency is the same or lower, which is good.
Something else must be causing this high latency. See xmatrix test with x264 for example: the latency is as high as 0.4s with an average of 0.1s, whereas other tests are more stable with an average around 0.02s to 0.03s.
As far as I can see, the time it takes to encode a single frame has increased dramatically with this change. On my test case, encoding a frame went from about 50 milliseconds for CSC + compression to about 160 milliseconds (15 for CSC, the rest for compression), this with the very same quality settings.
So it seems that this change actually increased the time it takes to compress an image (time spent doing CSC + compression).
We are talking at cross-purposes here...
When I am talking about latency, I am talking about how long it takes for the server to respond to a ping packet. That is, how long before the main thread can run unimpeded. As mentioned above, this is important for interactivity (responding to keyboard and mouse events, processing damage requests - even if just batching them, etc)
What you are talking about is the elapsed time during the compression. Which affects how long it takes for a frame to reach the client (but.. see below)
Since we have a number of threads competing for the CPU in the server, it seems very likely that the compression was sharing the CPU with other tasks. This is what allows the new code to push more FPS (in all cases so far). There is nothing to worry about unless this also causes the overall performance to decrease, or the "pixel-latency" (the time it takes for a screen update to reach the client) to increase. More parallelism means that the pixel-latency may not have increased, or if it has, that's at least partly because of the higher FPS.
In fact you can revert back to the old (GIL-blocking) behaviour simply by removing the with nogil: line. The performance will go back to what we had before.
The trade-off is one of FPS vs "pixel-latency". There are things we can do with this.
150ms to encode a frame is unacceptable. You need to encode a frame in 50ms or less in interactive cases.
I understand you're talking about a tradeoff (though I'm not sure where it lies exactly), but in my opinion:
I think you may be missing the point on this supposed tradeoff (apologies if not): the new measured time includes the time it takes to do other things, because of the new opportunity for parallelism. Those things are necessary and will happen for each frame, whether they run in parallel with operations on other frames or not. Was your test running at the same FPS or faster with the change? A higher FPS will lead to more contention. This is what I meant in my comment on the test results: increased FPS, lower server "interactivity-latency".
A better measure would be the time from the damage request reception (itself delayed when we hold the GIL... but we have to start somewhere, and this is the earliest we can measure) until the frame is sent by the network layer. I will try to cook something up so we can have end-to-end measurements. I suspect that this is actually unchanged, if not better.
I did the following test: I'm measuring the time it takes to encode a frame, and just this time. Latency is going to be >= this time, as far as I know, because the time it takes for an update to reach the user's eyes is necessarily greater than the time it takes to encode the image of the update. So when this value goes from 50 to 150ms, I think it proves something is wrong, does it not?
You're right about the "better" measurement you're mentioning - though my own test results are backed up by the (biased...) impression that playing FreedroidRPG lost *a lot* in terms of interactivity (it used to be playable, it isn't any longer at 40 updates per second).
As discussed on IRC, all of the above is true. But the important factor is that the new code is capable of pushing more frames, which increases contention. When running at the same FPS, I am convinced the new code is at least as fast as the old one. Let's add measurements and do some tests!
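To illustrate the FPS vs per-frame-time distinction being argued here, a toy measurement (all names are illustrative; time.sleep stands in for an encode step that releases the GIL): with two frames in flight, each frame's own elapsed time stays around 50ms, while the overall FPS roughly doubles compared to a sequential loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ENCODE_TIME = 0.05  # a simulated 50ms encode; sleep releases the GIL
N = 8               # number of frames to "encode"

def encode(frame_no):
    start = time.monotonic()
    time.sleep(ENCODE_TIME)          # stands in for "with nogil: compress()"
    return time.monotonic() - start  # elapsed time for this one frame

t0 = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    per_frame = list(pool.map(encode, range(N)))
fps = N / (time.monotonic() - t0)

# each frame still takes ~50ms of elapsed time, but with two in flight
# the FPS is roughly double what a sequential loop would achieve (~20)
print("avg per-frame: %.3fs, fps: %.1f" % (sum(per_frame) / N, fps))
```

The same effect in reverse explains the earlier observation: once other work runs concurrently with the encode, the measured per-frame time can grow even though the pipeline as a whole delivers more frames.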
r927:
This was recorded in #132 but really belongs here: see r886 commit message on how we now report the time it takes for a damage event to go from the X11 server event queue to an outgoing network packet.
New work item: keep the x264 encoder settings when resizing. If we have changed the compression parameters at runtime, we should keep those values instead of reverting to the defaults.
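A minimal sketch of that idea (the Encoder class here is hypothetical, not xpra's actual API): capture the tuned settings before tearing down the old context and re-apply them to the new one.

```python
# Hypothetical encoder wrapper: a resize forces a new encoder context,
# but any speed/quality values tuned at runtime should carry over
# instead of reverting to the defaults.
class Encoder:
    def __init__(self, width, height, speed=50, quality=80):
        self.width, self.height = width, height
        self.speed, self.quality = speed, quality

    def resized(self, new_width, new_height):
        # re-create the context, preserving the tuned settings
        return Encoder(new_width, new_height, self.speed, self.quality)

e = Encoder(640, 480)
e.speed, e.quality = 30, 95      # tuned at runtime
e2 = e.resized(800, 600)
print(e2.speed, e2.quality)      # → 30 95
```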
A new one:
xpra/x264/x264lib.c: In function ‘init_decoder’:
xpra/x264/x264lib.c:105:2: warning: ‘avcodec_open’ is deprecated (declared at /usr/include/ffmpeg/libavcodec/avcodec.h:4155) [-Wdeprecated-declarations]
gcc -pthread -shared -Wl,-z,relro build/temp.linux-x86_64-2.7/xpra/x264/codec.o build/temp.linux-x86_64-2.7/xpra/x264/x264lib.o -L. -lx264 -lavcodec -lpthread -lswscale -lpython2.7 -o build/lib.linux-x86_64-2.7/xpra/x264/codec.so -Wall
Surely there must be a newer method we can use if this one is deprecated?
adds debugging to the codepaths used by x264 in the client
As of r1127 we have a crash on win32.
Use the attached patch and "-d all" to get the last log output before the crash... if you can get the crash with the patch...
Eventually crashed with "The application failed to initialize properly (0xc000012d). Click on OK to terminate the application." (dwwin.exe - Application Error)
memleak from comments above fixed in r1128 by not using the buggy win32 API.
And running in gdb I got this little beauty:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe58b7fe700 (LWP 21035)]
0x0000003efe07d403 in _int_malloc () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003efe07d403 in _int_malloc () from /lib64/libc.so.6
#1  0x0000003efe07fcb3 in malloc () from /lib64/libc.so.6
#2  0x0000003f3462b98d in x264_malloc () from /lib64/libx264.so.120
#3  0x0000003f3462341a in x264_frame_pop_unused () from /lib64/libx264.so.120
#4  0x0000003f3468ab4e in x264_encoder_encode () from /lib64/libx264.so.120
#5  0x00007fe5943e1a36 in compress_image (ctx=<optimized out>, pic_in=pic_in@entry=0x7fe578f9d3a0, out=out@entry=0x7fe58b7fca18, outsz=outsz@entry=0x7fe58b7fca14) at xpra/x264/x264lib.c:166
#6  0x00007fe5943df32e in __pyx_f_4xpra_4x264_5codec_7Encoder_do_compress_image (__pyx_v_self=0x24c2fd0, __pyx_v_pic_in=0x7fe578f9d3a0) at xpra/x264/codec.c:2492
#7  0x00007fe5943e0599 in __pyx_pf_4xpra_4x264_5codec_7Encoder_4compress_image (__pyx_v_rowstride=<optimized out>, __pyx_v_input=<optimized out>, __pyx_v_self=0x24c2fd0) at xpra/x264/codec.c:2416
#8  __pyx_pw_4xpra_4x264_5codec_7Encoder_5compress_image (__pyx_v_self=0x24c2fd0, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at xpra/x264/codec.c:2308
#9  0x000000375a2dd0f1 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#11 0x000000375a2dc123 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#13 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#14 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#17 0x000000375a26d8c7 in ?? () from /lib64/libpython2.7.so.1.0
#18 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#19 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#20 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#23 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#24 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#25 0x000000375a258090 in ?? () from /lib64/libpython2.7.so.1.0
#26 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#27 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#28 0x000000375a30b442 in ?? () from /lib64/libpython2.7.so.1.0
#29 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#30 0x0000003efe0f199d in clone () from /lib64/libc.so.6
(gdb)
and by shrinking the glxgears window to 0x0, I get this one:
[Switching to Thread 0x7fa3ae0f3700 (LWP 22048)]
0x0000003efe035965 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003efe035965 in raise () from /lib64/libc.so.6
#1  0x0000003efe037118 in abort () from /lib64/libc.so.6
#2  0x0000003efe07568b in __libc_message () from /lib64/libc.so.6
#3  0x0000003efe07c80e in _int_free () from /lib64/libc.so.6
#4  0x0000003f3462bb1d in x264_picture_clean () from /lib64/libx264.so.120
#5  0x00007fa3b28fca52 in free_csc_image (image=0x7fa398adb260) at xpra/x264/x264lib.c:147
#6  compress_image (ctx=<optimized out>, pic_in=pic_in@entry=0x7fa398adb260, out=out@entry=0x7fa3ae0f1a18, outsz=outsz@entry=0x7fa3ae0f1a14) at xpra/x264/x264lib.c:177
#7  0x00007fa3b28fa32e in __pyx_f_4xpra_4x264_5codec_7Encoder_do_compress_image (__pyx_v_self=0x2a308a0, __pyx_v_pic_in=0x7fa398adb260) at xpra/x264/codec.c:2492
#8  0x00007fa3b28fb599 in __pyx_pf_4xpra_4x264_5codec_7Encoder_4compress_image (__pyx_v_rowstride=<optimized out>, __pyx_v_input=<optimized out>, __pyx_v_self=0x2a308a0) at xpra/x264/codec.c:2416
#9  __pyx_pw_4xpra_4x264_5codec_7Encoder_5compress_image (__pyx_v_self=0x2a308a0, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at xpra/x264/codec.c:2308
#10 0x000000375a2dd0f1 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x000000375a2dc123 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#14 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#15 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x000000375a26d8c7 in ?? () from /lib64/libpython2.7.so.1.0
#19 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#20 0x000000375a2d9efb in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x000000375a2dcd61 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#23 0x000000375a2ddb2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#24 0x000000375a26d7b6 in ?? () from /lib64/libpython2.7.so.1.0
#25 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x000000375a258090 in ?? () from /lib64/libpython2.7.so.1.0
#27 0x000000375a249a7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#28 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#29 0x000000375a30b442 in ?? () from /lib64/libpython2.7.so.1.0
#30 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#31 0x0000003efe0f199d in clone () from /lib64/libc.so.6
(gdb)
When resizing to zero height:
x264 [error]: invalid width x height (292x0)
[swscaler @ 0x7f4b385c3e40] 292x0 -> 292x0 is invalid scaling dimension
error processing damage data: colourspace conversion failed
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1200, in data_to_packet
    packet = self.make_data_packet(*item)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1320, in make_data_packet
    data = self.video_encode(x264_encoders, x264Encoder, wid, x, y, w, h, coding, data, rowstride)
  File "/usr/lib64/python2.7/site-packages/xpra/server_source.py", line 1381, in video_encode
    err, _, data = encoder.compress_image(data, rowstride)
  File "codec.pyx", line 174, in xpra.x264.codec.Encoder.compress_image (xpra/x264/codec.c:2404)
AssertionError: colourspace conversion failed
(the same "error processing damage data" traceback is repeated three more times)
which seems to cause this too:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4b3ffff700 (LWP 5472)]
#0  0x0000003f3468d1c9 in x264_encoder_close () from /lib64/libx264.so.120
#1  0x00007f4b50f7f816 in clean_encoder (ctx=<optimized out>) at xpra/x264/x264lib.c:81
#2  0x00007f4b50f7ca32 in __pyx_pf_4xpra_4x264_5codec_7Encoder_2clean (__pyx_v_self=<optimized out>) at xpra/x264/codec.c:2226
#3  __pyx_pw_4xpra_4x264_5codec_7Encoder_3clean (__pyx_v_self=<xpra.x264.codec.Encoder at remote 0x25458a0>, unused=<optimized out>) at xpra/x264/codec.c:2190
#4  0x000000375a2dce46 in call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffdc58) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4082
#5  PyEval_EvalFrameEx (f=f@entry= Frame 0x7f4b38001850, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1362, in video_encode (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#6  0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry= 0x0, args=<optimized out>, argcount=argcount@entry=11, kws=0x241c0b0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#7  0x000000375a2dc123 in fast_function (nk=<optimized out>, na=11, n=<optimized out>, pp_stack= 0x7f4b3fffdf08, func=<function at remote 0x2420b90>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194
#8  call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffdf08) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#9  PyEval_EvalFrameEx (f=f@entry= Frame 0x7f4b38001570, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1320, in make_data_packet (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#10 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry= 0x0, args=args@entry=0x24d6388, argcount=13, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry= 0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#11 0x000000375a26d7b6 in function_call (func=<function at remote 0x2420b18>, arg= (<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x255f090>}, _last_client_delta=(0, 0), _client_latency=<collections.deque at remote 0x...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#12 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x2420b18>, arg=<optimized out>, kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#13 0x000000375a2d9efb in ext_do_call (nk=0, na=<optimized out>, flags=<optimized out>, pp_stack= 0x7f4b3fffe288, func=<function at remote 0x2420b18>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4411
#14 PyEval_EvalFrameEx (f=f@entry= Frame 0x7f4b38000ef0, for file /usr/lib64/python2.7/site-packages/xpra/server_source.py, line 1200, in data_to_packet (self=<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, ma...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2779
#15 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry= 0x0, args=args@entry=0x25415e8, argcount=1, kws=kws@entry=0x7f4b5a7e7068, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#16 0x000000375a26d8c7 in function_call (func=<function at remote 0x24209b0>, arg= (<ServerSource(_encoding='x264', _client_decode_time={1: <collections.deque at remote 0x254fa60>, 2: <collections.deque at remote 0x254fc20>}, _closed=False, _sequence=319, _damage_in_latency=<collections.deque at remote 0x254f520>, _batch_configs={1: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=1, last_updated=<float at remote 0x2631b00>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x2635768>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x2541e50>, 2: <DamageBatchConfig(recalculate_delay=<float at remote 0x23d5cb8>, wid=2, last_updated=<float at remote 0x262ec40>, encoding=None, max_pixels=52428800, always=False, enabled=True, max_delay=15000, delay=<float at remote 0x263bf28>, last_delays=<collections.deque at remote 0x242f130>, time_unit=1, max_events=50, min_delay=5) at remote 0x255f090>}, _last_client_delta=(0, 0), _client_latency=<collections.deque at remote 0x...(truncated), kw={}) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#17 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x24209b0>, arg=<optimized out>, kw=kw@entry={}) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#18 0x000000375a2d9efb in ext_do_call (nk=0, na=<optimized out>, flags=<optimized out>, pp_stack= 0x7f4b3fffe608, func=<function at remote 0x24209b0>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4411
#19 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2779
#20 0x000000375a2dcd61 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack= 0x7f4b3fffe808, func=<function at remote 0x7f4b52fcded8>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
#21 call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffe808) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#22 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#23 0x000000375a2dcd61 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack= 0x7f4b3fffea08, func=<function at remote 0x7f4b52fcf0c8>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
#24 call_function (oparg=<optimized out>, pp_stack=0x7f4b3fffea08) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#25 PyEval_EvalFrameEx (f=f@entry= Frame 0x7f4b38000910, for file /usr/lib64/python2.7/threading.py, line 524, in __bootstrap (self=<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__v...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#26 0x000000375a2ddb2f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry= 0x0, args=args@entry=0x2541568, argcount=1, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry= 0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#27 0x000000375a26d7b6 in function_call (func=<function at remote 0x7f4b52fcdf50>, arg= (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/funcobject.c:526
#28 0x000000375a249a7e in PyObject_Call (func=func@entry=<function at remote 0x7f4b52fcdf50>, arg=arg@entry= (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#29 0x000000375a258090 in instancemethod_call (func=<function at remote 0x7f4b52fcdf50>, arg= (<Thread(_Thread__ident=139961173014272, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e50>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e50>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e50>) at remote 0x2541b90>, _Thread__name='data_to_packet', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x2338e30>, acquire=<built-in method acquire of thread.lock object at remote 0x2338e30>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x2338e30>) at remote 0x2541b10>) at remote 0x2541ad0>, _Thread__stderr=<file at remote 0x7f4b5a7ff270>, _Thread__target=<instancemethod at remote 0x2534960>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(), _Thread__stopped=False, _Thread__initialized=True) at remote 0x2...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.3/Objects/classobject.c:2578
#30 0x000000375a249a7e in PyObject_Call (func=func@entry=<instancemethod at remote 0x25349b0>, arg=arg@entry= (), kw=<optimized out>) at /usr/src/debug/Python-2.7.3/Objects/abstract.c:2529
#31 0x000000375a2d71d7 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x25349b0>, arg=(), kw=<optimized out>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3967
#32 0x000000375a30b442 in t_bootstrap (boot_raw=0x251e820) at /usr/src/debug/Python-2.7.3/Modules/threadmodule.c:614
#33 0x0000003efe807d14 in start_thread () from /lib64/libpthread.so.0
#34 0x0000003efe0f199d in clone () from /lib64/libc.so.6
r1140 uses lossless encoding for small regions and one pixel wide/high windows
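A sketch of the kind of guard this implies (the choose_encoding name and the size threshold are illustrative, not xpra's actual code): zero-sized regions are rejected outright, and tiny or one-pixel-wide/high regions go to a lossless encoding instead of the video encoder, whose colourspace conversion cannot handle them (see the 292x0 swscaler failure above).

```python
def choose_encoding(w, h, min_video_pixels=256):
    # Zero-sized regions cannot be encoded at all
    # (swscale rejects e.g. 292x0 as an invalid scaling dimension)
    if w <= 0 or h <= 0:
        raise ValueError("invalid dimensions %sx%s" % (w, h))
    # One-pixel-wide/high windows and tiny regions: lossless is cheaper
    # and avoids the 4:2:0 subsampling constraints of the video encoder
    if w == 1 or h == 1 or w * h < min_video_pixels:
        return "png"
    return "x264"

print(choose_encoding(292, 1), choose_encoding(640, 480))  # → png x264
```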
More important changes: adds the set_encoding_speed function into the calculate_batch_delay code, adds set_encoding_quality, and implements almost everything needed for lossless auto refresh.
What is still needed:
x264_encoder_reconfig: "if the change should apply to some particular frame, use x264_picture_t->param instead." (set_encoding_speed)
Maybe also: set_encoding_speed and set_encoding_quality
Overriding quality: done in r1240 (untested)
Setting quality: done in r1241 (untested)
You also need r1242 to make the quality range valid (tried it on a hunch), and r1243 to tune it properly.
It works!
Color subsampling is set to 420, which means that we discard three quarters of the chroma information. This makes colored text (such as the red links on Trac) look ugly. Some users may want us not to discard so much chroma information, and therefore use 422 (half info discarded) or 444 (no info discarded). Obviously, this has an impact on the image size.
422 = 66% of 444 size
420 = 50% of 444 size
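Those figures follow directly from samples-per-pixel arithmetic: luma (Y) is always one sample per pixel, while chroma (Cb+Cr) is scaled by the subsampling factor:

```python
# samples per pixel for each chroma subsampling mode:
# Y is always 1; chroma (Cb+Cr) is scaled by the subsampling factor
samples = {"444": 1 + 2.0,       # full chroma
           "422": 1 + 2 * 0.5,   # half horizontal chroma resolution
           "420": 1 + 2 * 0.25}  # half horizontal and vertical chroma

for mode in ("422", "420"):
    print("%s = %d%% of 444 size" % (mode, int(100 * samples[mode] / samples["444"])))
```

So 4:2:2 carries 2/3 of the raw samples of 4:4:4 (66%), and 4:2:0 carries half (50%), before the codec's own compression is applied.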
The only impact is the increase in size vs the gain in image quality. As long as users don't ask for it, I don't think we should implement it.
Can we change this on the fly too?
I mean, we could increase to 422 or even 444 when quality reaches above 80% and 90%, no? (and also be less aggressive with quality and/or take this setting into account when calculating the new quality setting to prevent the yoyo effect)
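A sketch of that policy with the thresholds suggested above (422 above 80% quality, 444 above 90%), plus a margin of hysteresis to avoid the "yoyo effect". Names and numbers are illustrative only, and any change would also need to be signalled to the decoder:

```python
def pick_subsampling(quality, current="420", margin=5):
    # Upgrade thresholds: 422 above 80% quality, 444 above 90%.
    # To avoid flapping ("yoyo"), only downgrade once quality drops
    # `margin` points below the threshold that earned the current mode.
    levels = [("420", 0), ("422", 80), ("444", 90)]
    order = {name: i for i, (name, _) in enumerate(levels)}
    target = "420"
    for name, threshold in levels:
        if quality >= threshold:
            target = name
    if order[target] < order[current]:
        # would be a downgrade: stay put while still inside the margin
        _, cur_threshold = levels[order[current]]
        if quality >= cur_threshold - margin:
            return current
    return target

print(pick_subsampling(85))         # upgrade from 420 to 422
print(pick_subsampling(88, "444"))  # within the margin: stay at 444
print(pick_subsampling(70, "444"))  # well below: drop back down
```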
It looks like this is also used decoder side. So we would need to pass the information with each x264 packet and do what is needed client side.
I don't think there is much interest in changing the color subsampling to be honest. We should stick to 420 except for those windows for which we "know" that 444 is necessary. Those windows might be for example a Trac page in firefox... but in my opinion those are corner cases and I don't think we should deal with them just now.
I don't think we can "know" when it's necessary, just that when we find that we are not constrained by bandwidth, switching to 444 gives better visual results at very little cost.
"very little" = 50%.
That's my definition for "huge", not "little". :)
Misunderstanding here, I said: "when we find that we are not constrained by bandwidth..." - using more bandwidth is not an issue, no matter how much more we use. It could be 100% more, and it still would not matter.
example of colour subsampling with green on black text - making it hard to read
avcodec_open deprecation warning fixed in r1299
green on black fixed in r1323
That's enough for this ticket, there are other tickets for related issues:
I have validated the NOGIL change with complete performance tests against 0.5.0 and the results are unequivocal, so I added them to svn here (more to come) - and I removed the ability to turn 'nogil' off in r1359
Note: the only slight decrease in performance is with the rss-* screensavers, and that is because we have problems with those tests that need fixing separately (high latency/load)
test results comparing gil with nogil - these were used to generate the graphs
this moves picture decoding to a separate thread
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/110