Xpra: Ticket #902: getting h264 Missing reference picture errors client-side, while page display quality is pixelated and distorted.

Using 0.14.x clients (anywhere from 0.14.20 to 0.14.25, win32, osx, fedora 19/20) against a fedora 20/21 0.14.x server (0.14.20 to 0.14.25) ... I am intermittently seeing h264 errors client side when launching from a command line, which look something like:

[h264 @ 06375a60] Missing reference picture, default is 14
[h264 @ 06375a60] Missing reference picture, default is 14
[h264 @ 06375a60] Missing reference picture, default is 14
[h264 @ 06375a60] Missing reference picture, default is 16
[h264 @ 06375a60] Missing reference picture, default is 16
[h264 @ 06375a60] Missing reference picture, default is 16
...

... and when that error starts displaying, the page refresh paints are often distorted, pixelated, and unreadable.

I have only been able to reproduce this while using a customized webkit browser, and maximizing/restoring the window will fix the problem, though once it has been seen, it will often return - unless I restart the server.

I haven't been able to reproduce it reliably, though it seems to surface more often when I am switching a video to fullscreen, then quickly back than other times... though I also nearly had a reliable reproduction when using the clipboard to copy/paste blocks of about 150K at a time in pretty quick succession. Never quite a reliable repro though.

I have finally managed to get some server side logs with -d encoding enabled, though no smoking gun that I can see. There are a number of things that look odd though.

I'll post the relevant 6 minutes of the 2 hours of logs (even that is not something I'd like to have to pore over again). Here are some of the points that look interesting:

15:30:05  15:30:05,380 update_quality() info={'backlog_factor': 100, 'latency_factor': 181, 'min_quality': 30, 'batch_factor': 72}, quality=81
15:30:05  15:30:05,381 update_speed() info={'damage_latency.abs_factor': 0, 'damage_latency.avg': 18, 'decoding_latency.target': 8000000, 'damage_latency.target': 35, 'min_speed': 0, 'frame_delay': 17, 'decoding_latency.factor': 135, 'damage_latency.ref': 35, 'damage_latency.rel_factor': 48, 'low_limit': 801024}, speed=95
15:30:05  15:30:05,381 update_encoding_options(False) want_alpha=False, lossless threshold: 94 / 20, small_as_rgb=149299, get_best_encoding=<bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(1 : (1043, 768))>
15:30:05  15:30:05,381 update_encoding_options(False) csc_encoder=swscale(BGRX 1040x668 - YUV444P 1040x668), video_encoder=x264_encoder(YUV444P - 1040x668)
15:30:05  15:30:05,384 check_pipeline_score(False) best=(45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))
15:30:05  15:30:05,384 check_pipeline_score(False) found better csc encoder: (45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))
15:30:05  15:30:05,384 check_pipeline_score(False) found better video encoder: (45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))
15:30:05  15:30:05,386 setup_pipeline([(45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264)), (44, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 1042, 768, codec_spec(x264)), (43, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV422P', (1, 1), 1042, 768, codec_spec(x264)), (42, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV420P', (1, 1), 1042, 768, codec_spec(x264)), (39, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'RGB', (1, 1), 1042, 768, codec_spec(x264)), (27, (1, 1), (1, 1), 1042, 768, codec_spec(cython), 'YUV420P', (1, 1), 1042, 768, codec_spec(x264))], 1043, 768, 'BGRX')
15:30:05  15:30:05,386 setup_pipeline: trying (45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))
15:30:05  15:30:05,387 setup_pipeline: csc=swscale(BGRX 1042x768 - YUV444P 1042x768), info={'src_height': 768, 'src_width': 1042, 'dst_height': 768, 'dst_width': 1042, 'version': (3, 1, 101), 'flags': ['SWS_FAST_BILINEAR'], 'src_format': 'BGRX', 'formats': ['RGB', 'BGR', 'XRGB', 'BGRX', 'YUV420P', 'YUV422P', 'YUV444P', 'GBRP'], 'frames': 0L, 'dst_format': 'YUV444P'}, setup took 0.30ms
15:30:05  15:30:05,390 setup_pipeline: video encoder=x264_encoder(YUV444P - 1042x768), info: {'profile': 'high444', 'speed': 95, 'height': 768, 'width': 1042, 'version': 142, 'src_format': 'YUV444P', 'formats': ['YUV422P', 'RGB', 'BGRX', 'BGR', 'YUV420P', 'BGRA', 'YUV444P'], 'frames': 0L, 'quality': 81, 'preset': 'superfast'}, setup took 2.75ms
15:30:05  15:30:05,453 send_delayed_regions: bytes_cost=695744, bytes_threshold=480614, pixel_count=694720
15:30:05  15:30:05,454 send_delayed_regions: using full window update 1043x768 with png
15:30:05  15:30:05,454 process_damage_regions: wid=1, adding png pixel data to queue, elapsed time: 0.5 ms, request rgb time: 0.1 ms
15:30:05  15:30:05,454 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1043, 768), damage data: (1, 0, 0, 1043, 768, 'png')
15:30:05  15:30:05,508 sending 1043x768 BGRX as png, mode=RGB, options={'compress_level': 3}
15:30:05  15:30:05,558 packet decoding sequence 40873 for window 1 1x668 took 24.0ms
15:30:05  15:30:05,558 send_delayed for wid 1, delaying again because of backlog: 34 packets, batch delay is 14.1015819536, elapsed time is 257.0 ms
15:30:05  15:30:05,812 packet decoding sequence 40874 for window 1 1040x668 took 277.0ms
15:30:05  15:30:05,812 send_delayed for wid 1, delaying again because of backlog: 36 packets, batch delay is 14.1015819536, elapsed time is 511.0 ms
15:30:05  15:30:05,850 packet decoding sequence 40875 for window 1 1x668 took 114.0ms
15:30:05  15:30:05,850 send_delayed for wid 1, delaying again because of backlog: 35 packets, batch delay is 14.1015819536, elapsed time is 549.0 ms
15:30:05  15:30:05,945 packet decoding sequence 40876 for window 1 1040x668 took 63.0ms
15:30:05  15:30:05,946 send_delayed for wid 1, delaying again because of backlog: 34 packets, batch delay is 14.1015819536, elapsed time is 645.0 ms
15:30:05  15:30:05,974 packet decoding sequence 40877 for window 1 1x668 took 79.0ms
15:30:05  15:30:05,975 send_delayed for wid 1, delaying again because of backlog: 33 packets, batch delay is 14.1015819536, elapsed time is 674.0 ms
15:30:06  15:30:06,018 packet decoding sequence 40878 for window 1 1040x668 took 52.0ms
15:30:06  15:30:06,018 send_delayed for wid 1, delaying again because of backlog: 32 packets, batch delay is 14.1015819536, elapsed time is 717.0 ms
...

That portion seemed to have a delay that overran several attempted writes...

...
15:30:07  15:30:07,071 send_delayed for wid 1, delaying again because of backlog: 6 packets, batch delay is 17.6274487211, elapsed time is 1770.0 ms
15:30:07  15:30:07,155 packet decoding sequence 40905 for window 1 1x668 took 148.0ms
15:30:07  15:30:07,155 send_delayed for wid 1, delaying again because of backlog: 5 packets, batch delay is 17.6274487211, elapsed time is 1854.0 ms
15:30:07  15:30:07,368 packet decoding sequence 40906 for window 1 1040x668 took 226.0ms
15:30:07  15:30:07,369 send_delayed for wid 1, delaying again because of backlog: 4 packets, batch delay is 17.6274487211, elapsed time is 2067.0 ms
15:30:07  15:30:07,383 update_quality() info={'backlog_factor': 100, 'latency_factor': 51, 'min_quality': 30, 'batch_factor': 51}, quality=69
15:30:07  15:30:07,384 update_speed() info={'damage_latency.abs_factor': 0, 'damage_latency.avg': 19, 'decoding_latency.target': 8000000, 'damage_latency.target': 35, 'min_speed': 0, 'frame_delay': 18, 'decoding_latency.factor': 76, 'damage_latency.ref': 35, 'damage_latency.rel_factor': 53, 'low_limit': 801024}, speed=89
15:30:07  15:30:07,384 update_encoding_options(False) want_alpha=False, lossless threshold: 92 / 20, small_as_rgb=110223, get_best_encoding=<bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(1 : (1043, 768))>
15:30:07  15:30:07,384 update_encoding_options(False) csc_encoder=swscale(BGRX 1042x768 - YUV444P 1042x768), video_encoder=x264_encoder(YUV444P - 1042x768)
15:30:07  15:30:07,388 check_pipeline_score(False) best=(58, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))
15:30:07  15:30:07,411 packet decoding sequence 40907 for window 1 1x668 took 359.0ms
15:30:07  15:30:07,411 send_delayed for wid 1, delaying again because of backlog: 3 packets, batch delay is 19.7070924366, elapsed time is 2110.0 ms
15:30:07  15:30:07,486 packet decoding sequence 40908 for window 1 1040x668 took 332.0ms
...

Got more of the same about 24 seconds later:

15:30:30  15:30:30,950 send_delayed for wid 1, delaying again because of backlog: 15 packets, batch delay is 10.9151248029, elapsed time is 1413.0 ms
15:30:31  15:30:31,022 packet decoding sequence 40955 for window 1 1042x768 took 165.0ms
15:30:31  15:30:31,022 send_delayed for wid 1, delaying again because of backlog: 14 packets, batch delay is 10.9151248029, elapsed time is 1485.0 ms
15:30:31  15:30:31,026 packet decoding sequence 40956 for window 1 1x768 took 28.0ms
15:30:31  15:30:31,027 send_delayed for wid 1, delaying again because of backlog: 13 packets, batch delay is 10.9151248029, elapsed time is 1489.0 ms
15:30:31  15:30:31,030 update_quality() info={'backlog_factor': 100, 'latency_factor': 55, 'min_quality': 30, 'batch_factor': 72}, quality=71
15:30:31  15:30:31,030 update_speed() info={'damage_latency.abs_factor': 0, 'damage_latency.avg': 16, 'decoding_latency.target': 8000000, 'damage_latency.target': 35, 'min_speed': 0, 'frame_delay': 13, 'decoding_latency.factor': 60, 'damage_latency.ref': 35, 'damage_latency.rel_factor': 39, 'low_limit': 801024}, speed=81
15:30:31  15:30:31,030 update_encoding_options(False) want_alpha=False, lossless threshold: 91 / 20, small_as_rgb=90152, get_best_encoding=<bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(1 : (1043, 768))>
15:30:31  15:30:31,030 update_encoding_options(False) csc_encoder=None, video_encoder=x264_encoder(BGRX - 1042x768)
15:30:31  15:30:31,033 check_pipeline_score(False) best=(54, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 1042, 768, codec_spec(x264))
15:30:31  15:30:31,124 packet decoding sequence 40957 for window 1 1042x768 took 81.0ms
15:30:31  15:30:31,125 send_delayed for wid 1, delaying again because of backlog: 12 packets, batch delay is 14.1149936069, elapsed time is 1588.0 ms

Some 20 seconds later, it seemed to switch to lossless mode:

15:30:49  15:30:49,527 sending 1043x768 BGRX as png, mode=RGB, options={'compress_level': 3}
15:30:49  15:30:49,829 packet decoding sequence 41005 for window 1 1043x768 took 127.0ms
15:30:50  15:30:50,289 update_quality() info={'backlog_factor': 100, 'latency_factor': 157, 'min_quality': 30, 'batch_factor': 102}, quality=95
15:30:50  15:30:50,290 update_speed() info={'damage_latency.abs_factor': 0, 'damage_latency.avg': 18, 'decoding_latency.target': 8000000, 'damage_latency.target': 35, 'min_speed': 0, 'frame_delay': 14, 'decoding_latency.factor': 47, 'damage_latency.ref': 35, 'damage_latency.rel_factor': 39, 'low_limit': 801024}, speed=71
15:30:50  15:30:50,290 update_encoding_options(False) want_alpha=False, lossless threshold: 89 / 20, small_as_rgb=81715, get_best_encoding=<bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(1 : (1043, 768))>
15:30:50  15:30:50,290 update_encoding_options(False) csc_encoder=None, video_encoder=x264_encoder(BGRX - 1042x768)
15:30:50  15:30:50,293 check_pipeline_score(False) best=(50, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 1042, 768, codec_spec(x264))
15:30:50  15:30:50,780 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 0, 0, 1043, 29, {})
15:30:50  15:30:50,781 damage(0, 0, 1043, 29, {}) wid=1, scheduling batching expiry for sequence 41006 in 9.0 ms
15:30:50  15:30:50,791 send_delayed for wid 1, batch delay is 9.3, elapsed time is 11.0 ms
15:30:50  15:30:50,791 send_delayed_regions: bytes_cost=31271, bytes_threshold=480614, pixel_count=30247
15:30:50  15:30:50,792 get_best_encoding_video(..) temporarily switching to lossless mode for    30247 pixels: low pixel count
15:30:50  15:30:50,793 process_damage_regions: wid=1, adding rgb24 pixel data to queue, elapsed time: 12.7 ms, request rgb time: 0.8 ms
15:30:50  15:30:50,793 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1043, 29), damage data: (1, 0, 0, 1043, 29, 'rgb24')
15:30:50  15:30:50,793 rgb_encode using level=1, lz4 compressed 1043x29 in rgb24/BGRX: 120988 bytes down to 14825
15:30:50  15:30:50,796 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 893, 4, 113, 20, {})
15:30:50  15:30:50,796 damage(893, 4, 113, 20, {}) wid=1, scheduling batching expiry for sequence 41007 in 9.0 ms
15:30:50  15:30:50,805 send_delayed for wid 1, batch delay is 9.3, elapsed time is 9.0 ms
15:30:50  15:30:50,805 send_delayed_regions: bytes_cost=3284, bytes_threshold=480614, pixel_count=2260
15:30:50  15:30:50,805 get_best_encoding_video(..) temporarily switching to lossless mode for     2260 pixels: low pixel count
15:30:50  15:30:50,806 process_damage_regions: wid=1, adding rgb24 pixel data to queue, elapsed time: 10.8 ms, request rgb time: 0.7 ms
15:30:50  15:30:50,806 make_data_packet: image=XShmImageWrapper(BGRX: 893, 4, 113, 20), damage data: (1, 893, 4, 113, 20, 'rgb24')
15:30:50  15:30:50,807 rgb_encode: BGRX pixels re-stride saving 90% from 4172 (83440 bytes) to 452 (9040 bytes)
15:30:50  15:30:50,807 rgb_encode using level=1, lz4 compressed 113x20 in rgb24/BGRX: 9040 bytes down to 2856

... which it did again some 2 seconds later, and intermittently for a while. And then a couple some seconds later still, it seemed to start using existing delayed h264 bits:

15:30:54  15:30:54,462 send_delayed_regions: bytes_cost=700508, bytes_threshold=480614, pixel_count=695388
15:30:54  15:30:54,463 send_delayed_regions: using full window update 1043x768 with h264
15:30:54  15:30:54,464 process_damage_regions: wid=1, adding h264 pixel data to queue, elapsed time: 79.5 ms, request rgb time: 0.7 ms
15:30:54  15:30:54,464 process_damage_regions: wid=1, adding rgb24 pixel data to queue, elapsed time: 80.3 ms, request rgb time: 0.1 ms
15:30:54  15:30:54,465 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1042, 768), damage data: (1, 0, 0, 1042, 768, 'h264')
15:30:54  15:30:54,466 video_encode('h264', XShmImageWrapper(BGRX: 0, 0, 1042, 768), {})
15:30:54  15:30:54,466 video_encode('h264', XShmImageWrapper(BGRX: 0, 0, 1042, 768), {}) image size: 1042x768, encoder/csc size: 1042x768
15:30:54  15:30:54,474 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 972, 535, 1, 1, {})
15:30:54  15:30:54,475 damage(972, 535, 1, 1, {}) wid=1, scheduling batching expiry for sequence 41059 in 19.0 ms
15:30:54  15:30:54,475 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 972, 575, 1, 1, {})
15:30:54  15:30:54,476 damage(972, 575, 1, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,477 video_encode encoder: h264 1042x768 result is 101035 bytes (71.4 MPixels/s), client options={'quality': 69, 'frame': 51L, 'pts': 25426, 'csc': 'YUV444P', 'speed': 76}
15:30:54  15:30:54,478 make_data_packet: image=XShmImageWrapper(BGRX: 1042, 0, 1, 768), damage data: (1, 1042, 0, 1, 768, 'rgb24')
15:30:54  15:30:54,479 rgb_encode: BGRX pixels re-stride saving 100% from 4172 (3204096 bytes) to 4 (3072 bytes)
15:30:54  15:30:54,479 rgb_encode using level=1, lz4 compressed 1x768 in rgb24/BGRX: 3072 bytes down to 44
15:30:54  15:30:54,486 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 1, 75, 1041, 460, {})
15:30:54  15:30:54,487 damage(1, 75, 1041, 460, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,488 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 1, 535, 971, 1, {})
15:30:54  15:30:54,488 damage(1, 535, 971, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,489 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 973, 535, 69, 1, {})
15:30:54  15:30:54,489 damage(973, 535, 69, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,489 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 1, 536, 1041, 39, {})
15:30:54  15:30:54,490 damage(1, 536, 1041, 39, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,490 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 1, 575, 971, 1, {})
15:30:54  15:30:54,491 damage(1, 575, 971, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,491 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 973, 575, 69, 1, {})
15:30:54  15:30:54,492 damage(973, 575, 69, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,492 damage(WindowModel(0xc00003 - "Co-hosts withdraw from Miss USA pageant over Trump comments - Jun. 30, 2015"), 1, 576, 1041, 167, {})
15:30:54  15:30:54,493 damage(1, 576, 1041, 167, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:30:54  15:30:54,495 send_delayed for wid 1, delaying again because of backlog: 6 packets, batch delay is 19.7330148766, elapsed time is 21.0 ms
15:30:54  15:30:54,537 packet decoding sequence 41049 for window 1 1042x768 took 151.0ms
15:30:54  15:30:54,537 send_delayed for wid 1, delaying again because of backlog: 7 packets, batch delay is 19.7330148766, elapsed time is 63.0 ms
15:30:54  15:30:54,544 packet decoding sequence 41050 for window 1 1x768 took 8.0ms
15:30:54  15:30:54,544 send_delayed for wid 1, delaying again because of backlog: 6 packets, batch delay is 19.7330148766, elapsed time is 70.0 ms
...

But then it started really happening a lot:

15:31:52  15:31:52,017 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 972, 495, 1, 1, {})
15:31:52  15:31:52,018 damage(972, 495, 1, 1, {}) wid=1, scheduling batching expiry for sequence 41664 in 7.0 ms
15:31:52  15:31:52,019 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 972, 535, 1, 1, {})
15:31:52  15:31:52,019 damage(972, 535, 1, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,020 packet decoding sequence 41661 for window 1 1040x668 took 51.0ms
15:31:52  15:31:52,028 send_delayed for wid 1, delaying again because of backlog: 1 packets, batch delay is 7.51134355742, elapsed time is 10.0 ms
15:31:52  15:31:52,036 send_delayed_regions: bytes_cost=2050, bytes_threshold=480614, pixel_count=2
15:31:52  15:31:52,038 process_damage_regions: wid=1, adding rgb24 pixel data to queue, elapsed time: 20.3 ms, request rgb time: 0.8 ms
15:31:52  15:31:52,038 sending 1043x768 BGRX as png, mode=RGB, options={'compress_level': 3}
15:31:52  15:31:52,039 make_data_packet: image=XShmImageWrapper(BGRX: 972, 495, 1, 41), damage data: (1, 972, 495, 1, 41, 'rgb24')
15:31:52  15:31:52,039 rgb_encode: BGRX pixels re-stride saving 100% from 4172 (171052 bytes) to 4 (164 bytes)
15:31:52  15:31:52,039 rgb_encode using level=1, lz4 compressed 1x41 in rgb24/BGRX: 164 bytes down to 75
15:31:52  15:31:52,042 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 972, 455, 1, 1, {})
15:31:52  15:31:52,042 damage(972, 455, 1, 1, {}) wid=1, scheduling batching expiry for sequence 41665 in 7.0 ms
15:31:52  15:31:52,047 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 75, 1041, 380, {})
15:31:52  15:31:52,048 damage(1, 75, 1041, 380, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,048 packet decoding sequence 41662 for window 1 1x668 took 18.0ms
15:31:52  15:31:52,049 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 455, 971, 1, {})
15:31:52  15:31:52,049 damage(1, 455, 971, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,050 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 973, 455, 69, 1, {})
15:31:52  15:31:52,050 damage(973, 455, 69, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,054 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 456, 1041, 39, {})
15:31:52  15:31:52,054 damage(1, 456, 1041, 39, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,054 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 495, 971, 1, {})
15:31:52  15:31:52,054 damage(1, 495, 971, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,055 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 973, 495, 69, 1, {})
15:31:52  15:31:52,055 damage(973, 495, 69, 1, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,055 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 496, 1041, 39, {})
15:31:52  15:31:52,055 damage(1, 496, 1041, 39, {}) wid=1, using existing delayed h264 regions created 0.0ms ago
15:31:52  15:31:52,055 damage(WindowModel(0xc00003 - "Univision CEO defends dropping Trump, denies apologizing - Jun. 29, 2015"), 1, 535, 971, 1, {})
...

I'm not sure which, if any, is at all illustrative. I'll post a screenshot of what the degradation looks like for you if you'd like.

Even when running the client with -d encoding I have not seen any output client side.



Wed, 01 Jul 2015 01:53:02 GMT - alas: attachment set

h264 missing reference picture bug server side logs


Wed, 01 Jul 2015 15:30:57 GMT - Antoine Martin: owner changed

Please include the version info, both from the server and client. (Encoding_info.exe on win32, xpra info on the server. It could be an ffmpeg bug, see ticket:905#comment:1 for example: this bug occurs with ffmpeg versions 2.4 but not 2.6 onwards..

maximizing/restoring the window will fix the problem


Yes, this will clear up the video region and start again with a new encoder context.


[h264 @ 06375a60] Missing reference picture, default is 14


This should have triggered a refresh already, we have code in the server which does this when the client reports a decoding failure:

    def client_decode_error(self, error):
        WindowSource.client_decode_error(self, error)
        #maybe the stream is now corrupted..
        self.cleanup_codecs()

But the problem is that it looks like the error was not caught and therefore not reported back to the server (no stacktrace in the client, and the decoding errors continue from that point onwards). Maybe the decoder gives us a partial frame? So maybe r9784 will help here, I will backport this fix too.

If that's not enough, r9785 adds the ability to see debug level messages out of libav (applies to both avcodec2 and swscale as they both use libav's logging functionality which we now hook into). Use it with -d libav, maybe this will tell us more. (in trunk only)

The problem with the fix in r9784 is that if it does fix things, we are far less likely to investigate the root cause of the problem, since this will just refresh the video pipeline and start again... which will make it look OK, but really isn't: we haven't dealt with the root cause of the error, which is going to be something probably found on the server side.


I haven't been able to reproduce it reliably, though it seems to surface more often when I am switching a video to fullscreen, then quickly back than other times... though I also nearly had a reliable reproduction when using the clipboard to copy/paste blocks of about 150K at a time in pretty quick succession. Never quite a reliable repro though.


Being able to reproduce would really help, I am thinking of adding some environment tunables to make it easier to simulate unusual / stress conditions: delays caused by large clipboard copies, network delays, etc..


15:30:05 15:30:05,454 send_delayed_regions: using full window update 1043x768 with png PNG is bad, but shouldn't be this bad, even on a loaded server: 1043x768 isn't even that big.


Some 20 seconds later, it seemed to switch to lossless mode: 15:30:50 15:30:50,792 get_best_encoding_video(..) temporarily switching to lossless mode for 30247 pixels: low pixel count


It isn't really switching to lossless mode, it is using a lossless encoding for one screen update because the number of pixels is considered "small", this threshold varies based on some rather complicated heuristics. We just use a lossless encoding rather than sending the update twice (once as a lossy jpg or webp, then again quickly after using a lossless encoding as part of the auto-refresh).

r9788 tunes the code to try to make it less likely that we'll use a lossless encoding for something that might be the video region. Note: even if we are using a lossless encoding here, that should not interfere with the h264 stream which will continue next time we decide to use the video encoding (usually because we matched the video region). So this change is most definitely not a fix either - but if the error is easier or harder to hit, it does tell us something about what triggers it.

... which it did again some 2 seconds later, and intermittently for a while.


It will do that whenever something changes on screen which isn't part of the video region (assuming there is one), for example.

And then a couple some seconds later still, it seemed to start using existing delayed h264 bits:}}}


The "delayed regions" is the screen update "batching": it is what allows us to accumulate screen updates together, reducing the processing and allowing us to detect and process the video region as a rectangle (and therefore allows us to use a video encoding) rather than individual chunks.


Even when running the client with -d encoding I have not seen any output client side.


Client side you probably want -d paint.


Unrelated, but added to my TODO list (shows that having logs is always useful):

check_pipeline_score(False) best=(45, (1, 1), (1, 1), 1042, 768, codec_spec(swscale), 'YUV444P', (1, 1), 1042, 768, codec_spec(x264))

We should not be using swscale + x264 with YUV444P: x264 can process BGRX directly, and this is the pipeline option that should have won the scoring. This is a performance hit. (will also need backporting)


Fri, 03 Jul 2015 00:40:42 GMT - alas:

Testing with a windows client 0.14.21 (our port from the 0.14.21 stable) against a 0.14.25 fedora 21 server (again, our port), I was able to get the following clip of client side logs with -d paint (unfortunately, didn't notice when the error started, so the logs are from a middle section of the error):

...

Edit: I've moved this to an attachment: attachment/ticket/902/missing-reference-picture-draw-log.txt

At that point I disconnected to grab the info as best I could. I will attach a server-side xpra info file though... I'm sure there will be some useful stuff in there.


Fri, 03 Jul 2015 00:45:03 GMT - alas: attachment set

xpra info for h264 repro


Mon, 06 Jul 2015 08:04:19 GMT - Antoine Martin: attachment set

moving inline log to attachment


Mon, 06 Jul 2015 08:21:21 GMT - Antoine Martin:

0.14.21 is way too old. You should be running something with at least some of the changes to see if that improves things: trunk later than r9784 or 0.15.x and 0.14.x later than r9826. Remember: this should still cause an error client side, but we should recover from it more gracefully. That's the first step. We still want to fix the server side issue eventually..

Now, I am still missing the client side version info... Though I have some of it from the server info, which does show some very interesting things. Although the server is mostly up to date, the client version information shows:

etc

Which brings this question: did you manage to reproduce this issue with an official xpra.org build? Because that should be the first step, especially when this build is so out of date.


Assuming that the answer is yes, we could also try to force a specific csc mode, to see if somehow it could be related to video pipeline restarts:

XPRA_FORCE_CSC_MODE=YUV420P xpra start --video-encoders=x264

This should ensure that we always use the same pipeline (YUV444PPP + x264).

It would be useful to have the server side "encoding" log, and from the client side, the draw_region lines just before the error.


Wed, 19 Aug 2015 05:22:05 GMT - Antoine Martin: priority, component changed

(raising and fixing component)


Tue, 13 Oct 2015 00:12:35 GMT - alas:

I have not been able to reproduce this with 0.15 or 0.16, and haven't even been able to reproduce on 0.14 without using our customized webkit browser as an app.

Perhaps it is another example of misbehavior of the app, similar to what was dismissed in #967?

I'll leave you to close when ready... but I would mention that I do occasionally still see some mention of h264 issues, but that they seem to sort themselves out considerably faster, and without usually even being able to tell that anything went wrong if the logs weren't running. Did you make a temporary work-around to grab a new reference picture when the client starts reporting this error?


Tue, 13 Oct 2015 00:44:26 GMT - Antoine Martin: priority changed

Perhaps it is another example of misbehavior of the app, similar to what was dismissed in #967?


No, applications cannot cause bugs, only trigger them. The underlying bugs either exist or they don't.

I have not been able to reproduce this with 0.15 or 0.16, and haven't even been able to reproduce on 0.14 without using our customized webkit browser as an app.


Then the bug is present in 0.14, and maybe later versions too - until we figure out what the bug is, it's difficult to know if it is fixed or just not triggering in later versions.


Did you make a temporary work-around to grab a new reference picture when the client starts reporting this error?


All decoding errors are reported to the server, and the server ensures that things are in a clean state before sending the next screen update. For h264, this means dropping all reference frames and starting again from 0. (this same code is also present in 0.14.x ..)

Lowering the priority since 0.15.x onwards doesn't seem to be (as badly?) affected.


Sat, 05 Dec 2015 11:29:27 GMT - Antoine Martin: status changed; resolution set

No update for 8 weeks, closing as I assume that this is not reproducible with the latest builds. Hopefully this is not just a case of the new refresh code hiding the bug. (which should still be visible as an error in the logs)


Sat, 23 Jan 2021 05:09:18 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/902