Hopefully this can help us with hard to trigger bugs and for catching corner cases that are difficult to encounter under normal circumstances. Things like error handlers, etc..
For example:
XPRA_PROTOCOL_FAULT_INJECTION_RATE=1000 xpra ...
will cause the xpra protocol layer to send an invalid packet every 1000 packets - this should cause the other end to hangup
XPRA_WRAPPER_FAULT_INJECTION_RATE=500 xpra ...
does the same thing but for the wrapper layer
More to come, for #902.
r9894 does the same thing for the client, pretending that we failed to paint something (which should trigger a refresh):
$ XPRA_PAINT_FAULT_INJECTION_RATE=100 xpra attach --no-mmap (...) injecting paint fault for h264 packet 100 injecting paint fault for h264 packet 200 ...
r9899 adds the ability to fail to report the failures to the server with:
XPRA_PAINT_FAULT_INJECTION_RATE=100 XPRA_PAINT_FAULT_INJECTION_TELL=0 xpra attach
Which makes the server stop sending window updates and eventually print:
delayed_region_timeout: region is 15025ms old, bad connection? ...
r9824 also added the ability to simulate "hello" initial packet parsing errors with:
XPRA_SIMULATE_SERVER_HELLO_ERROR=1 xpra start ...
Which tests the hello error handling.
This will do for now. As of r10587, all the fault injection environment variables are documented here: wiki/Usage/EnvironmentOptions.
@afarr: the idea here is to make it easier to exercise code which would not normally fire. we want to check:
PS: not doing an xpra control interface as this is too intricate
r10992 also makes it possible to fail individual codecs:
XPRA_CODEC_FAIL_IMPORT=enc_pillow,dec_pillow ./xpra/codecs/loader.py
XPRA_CODEC_FAIL_SELFTEST=enc_pillow,dec_pillow ./xpra/codecs/loader.py
This can be useful for testing different decoders/encoders, in particular disabling pillow support. (see also #1010)
XPRA_CODEC_FAIL_IMPORT=enc_pillow,dec_pillow ./xpra/codecs/loader.py
run on the server (0.16.0 r11031 fedora 21) outputs:
[root@jimador codecs]# XPRA_CODEC_FAIL_IMPORT=enc_pillow,dec_pillow python loader.py codecs and csc modules found: * csc_cython : csc_cython/colorspace_converter.so * csc_opencl : * csc_swscale : csc_swscale/colorspace_converter.so * dec_avcodec2 : dec_avcodec2/decoder.so * dec_pillow : * dec_vpx : vpx/decoder.so * dec_webp : webp/decode.so * enc_pillow : * enc_vpx : vpx/encoder.so * enc_webp : webp/encode.so * enc_x264 : enc_x264/encoder.so * enc_x265 : * nvenc4 : * nvenc5 : codecs versions: * PIL : 3.0.0 * avcodec2 : 56.41.100 * buffer_api : 1 * cython : 1.0.23.1 * dec_pillow : 3.0.0 * dec_webp : 0.4.3 * enc_pillow : 3.0.0 * enc_webp : 0.4.3 * numpy : 1.8.2 * swscale : 3.1.101 * vpx : 1.4.0 * x264 : 146
... presumably as expected.
Likewise, XPRA_CODEC_FAIL_SELFTEST=enc_pillow,dec_pillow ./xpra/codecs/loader.py
outputs:
[root@jimador codecs]# XPRA_CODEC_FAIL_SELFTEST=enc_pillow,dec_pillow python loader.py failed to import Pillow encoder (enc_pillow) codec found in fail selftest list failed to import Pillow decoder (dec_pillow) codec found in fail selftest list codecs and csc modules found: * csc_cython : csc_cython/colorspace_converter.so * csc_opencl : * csc_swscale : csc_swscale/colorspace_converter.so * dec_avcodec2 : dec_avcodec2/decoder.so * dec_pillow : * dec_vpx : vpx/decoder.so * dec_webp : webp/decode.so * enc_pillow : * enc_vpx : vpx/encoder.so * enc_webp : webp/encode.so * enc_x264 : enc_x264/encoder.so * enc_x265 : * nvenc4 : * nvenc5 : codecs versions: * PIL : 3.0.0 * avcodec2 : 56.41.100 * buffer_api : 1 * cython : 1.0.23.1 * dec_pillow : 3.0.0 * dec_webp : 0.4.3 * enc_pillow : 3.0.0 * enc_webp : 0.4.3 * numpy : 1.8.2 * swscale : 3.1.101 * vpx : 1.4.0 * x264 : 146
XPRA_PAINT_FAULT_INJECTION_RATE=100
on client side shows the expected injecting paint fault warnings, but also triggered the following traceback on a 0.16.0 r10983 windows 8.1 client:
2015-10-26 18:19:27,586 draw error Traceback (most recent call last): File "xpra\client\ui_client_base.pyc", line 2475, in _do_draw File "xpra\client\client_window_base.pyc", line 539, in draw_region File "xpra\client\window_backing_base.pyc", line 516, in draw_region File "xpra\client\window_backing_base.pyc", line 301, in paint_rgb32 File "xpra\client\window_backing_base.pyc", line 183, in process_delta AssertionError: delta bucket 2 data does not match: expected (1051, 1, 2078) but got (1051, 1, 2075)
Disconnecting then running with both XPRA_PAINT_FAULT_INJECTION_RATE=100 XPRA_PAINT_FAULT_INJECTION_TELL=0 xpra attach
- I did very quickly get not only the expected 2015-10-26 18:21:03,944 delayed_region_timeout: region is 15041ms old, bad connection?
error (and a black screen in place of browser app contents until the video stopped "playing") - I also got an error that might be of interest:
2015-10-26 18:23:51,826 Error: expiring missing damage acks: [96]
.
Running server with XPRA_SIMULATE_SERVER_HELLO_ERROR=1
, with no encryption or with encryption and password-file and encryption-keyfile triggers errors that I expect are expected:
Server-side:
2015-10-26 18:29:51,038 Connection lost 2015-10-26 18:29:51,250 server error processing new connection from Protocol(tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64264): Simulating a server error Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/xpra/server/server_core.py", line 657, in _process_hello raise Exception("Simulating a server error") Exception: Simulating a server error 2015-10-26 18:29:51,251 Disconnecting client '10.0.11.162:64264': 2015-10-26 18:29:51,252 server error (error accepting new connection) 2015-10-26 18:29:51,253 Connection lost
And client side (in yellow, which I take it indicates it is a simulated error?):
2015-10-26 18:29:34,960 server failure: disconnected before the session could be established 2015-10-26 18:29:34,960 server requested disconnect: server error (error accepting new connection) 2015-10-26 18:29:34,994 Connection lost
When terminating the server running with encryption and password-file and encryption-keyfile, with a control-C, I got an unusual traceback:
^C/usr/lib64/python2.7/threading.py:1163: RuntimeWarning: tp_compare didn't return -1 or -2 for exception return _active[_get_ident()] Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 175, in reap self.poll() File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 113, in poll self.add_dead_process(procinfo) File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 154, in add_dead_process cb(procinfo.process) File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 386, in pulseaudio_ended soundlog.warn("Warning: the pulseaudio server process has terminated after %i seconds", int(elapsed)) File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 329, in warn self.log(logging.WARN, msg, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 318, in log global_logging_handler(self.logger.log, level, msg, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 100, in standard_logging log(level, msg, *args, **kwargs) File "/usr/lib64/python2.7/logging/__init__.py", line 1224, in log self._log(level, msg, args, **kwargs) File "/usr/lib64/python2.7/logging/__init__.py", line 1278, in _log record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, func, extra) File "/usr/lib64/python2.7/logging/__init__.py", line 1252, in makeRecord rv = LogRecord(name, level, fn, lno, msg, args, exc_info, func) File "/usr/lib64/python2.7/logging/__init__.py", line 291, in __init__ self.threadName = threading.current_thread().name File "/usr/lib64/python2.7/threading.py", line 1163, in currentThread return _active[_get_ident()] KeyboardInterrupt
XPRA_PROTOCOL_FAULT_INJECTION_RATE=1000
did indeed cause the client to disconnect with the following output (in yellow):
2015-10-26 18:43:26,890 Received uninterpretable nonsense: invalid packet header: '50726f746f636f6c' read buffer='Protocol JUNK! added by fault injection code' 2015-10-26 18:43:26,890 packet no 369 data: 'Protocol JUNK! added by fault injection code' 2015-10-26 18:43:26,890 sound-sink stopping 2015-10-26 18:43:26,951 Connection lost
Server-side though, I got the fault injection warning in yellow, but then an error in red (though, the yellow does explain that, rather):
2015-10-26 18:43:43,167 injecting fault in Protocol(tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64292) 2015-10-26 18:43:43,874 internal error: read connection tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64292 reset 2015-10-26 18:43:43,874 [Errno 104] Connection reset by peer
XPRA_WRAPPER_FAULT_INJECTION_RATE=500
doesn't do "the same thing but for the wrapper layer" - instead it seems to target the sound channel (?):
Server side:
2015-10-26 18:47:11,597 injecting fault in Protocol(Pipe(sound-source)) 2015-10-26 18:47:11,599 sound-source injecting fault in Protocol(Pipe()) 2015-10-26 18:47:11,602 sound-source gibberish received: 2015-10-26 18:47:11,603 sound-source "invalid packet header: '5772617070657220' read buffer='Wrapper JUNK! added by fa"... 2015-10-26 18:47:11,603 sound-source stopping 2015-10-26 18:47:11,604 sound-source stopping on gibberish: 2015-10-26 18:47:11,603 sound-source stopping 2015-10-26 18:47:11,604 "invalid packet header: '5772617070657220' read buffer='Wrapper JUNK! added by fa"...
And client side all I see is 2015-10-26 18:46:55,338 sound-sink stopping
.
Not sure if the wrapper layer fault injection is meant to only target the sound or not... in either case, I'll assign this back to you to wait on more new debugging tricks.
For the codec stuff XPRA_CODEC_FAIL_IMPORT
and XPRA_CODEC_FAIL_SELFTEST
see ticket:1010#comment:4
With XPRA_PAINT_FAULT_INJECTION_TELL=0
, you are meant to get both:
region is 15041ms old, bad connection?
and Error: expiring missing damage acks: [96]
: the missing ack is making us delay the sending of new updates until we hit the timeout of 15 seconds.
And client side (in yellow, which I take it indicates it is a simulated error?):
0.16 includes a coloured stream logger, which is used with the server when running with {{{daemon=no) and with the client when running from a console.
The definition of the colours can be found in the source: colorstreamhandler.py.
When terminating the server running with encryption and password-file and encryption-keyfile, with a control-C, I got an unusual traceback:
^C/usr/lib64/python2.7/threading.py:1163: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
I believe that this only happens when running with daemon=no
and using control-c, and it shouldn't be too much of a problem: if the server doesn't immediately exit, just hit control-c again.
So I'm leaving this one as it is.
Server-side though, I got the fault injection warning in yellow, but then an error in red (though, the yellow does explain that, rather): ..
The warning tells you it is sending garbage to the client, the client then drops the connection and the server sees that as an error. All is well.
Not sure if the wrapper layer fault injection is meant to only target the sound or not... in either case, I'll assign this back to you to wait on more new debugging tricks.
The wrapper is generic but is currently only used for the sound subprocess.
I think this is enough for now, I have added a link to this ticket from wiki/Debugging under "fault injection", and I will try to add more fault injection as I go along.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/910