Xpra: Ticket #1595: Uncaught exceptions in HTML5 Client

While testing #1593 (now closed) I kept running into the same error over and over again that led to a number of client crashes.

My server is a trunk r16446 and the server and client machines are both running Fedora 25.

I started my server with xpra start :13 --html=on --no-daemon --bind-tcp=0.0.0.0: --start-new-commands=yes --start-child=firefox --start-via-proxy=no and connected via the HTML5 client using Firefox.

To get the crash I've found that just simply scrolling up and down rather vigorously on the Fedora homepage https://start.fedoraproject.org will usually cause the crash after a bit of time. Otherwise I've ran into it intermittently while looking for other bugs. Just before the crash the logs will usually print:

uncaught exception: invalid packet header format: 140:   (unknown)
uncaught exception: we can't handle this protocol flag yet, sorry

Sadly enabling the debugging option doesn't print anything more useful.



Fri, 21 Jul 2017 18:07:49 GMT - J. Max Mena:

Slight update:

In the time that I took to write this ticket I managed to repro it just by switching tabs in Firefox.


Sat, 22 Jul 2017 05:54:27 GMT - Antoine Martin: owner changed

led to a number of client crashes ... connected via the HTML5 client using Firefox.

Please describe the "client crash": do you seen the spinners, do you get disconnected or does the browser actually crash?

To get the crash I've found that just simply scrolling up and down rather vigorously...

That doesn't cause any problems here, no matter how vigorously or how long I scroll. (tested with both Fedora 25 and 26)

Just before the crash the logs will usually print: uncaught exception: invalid packet header format: 140: (unknown)

Is the invalid packet header always 140? Did you get any other codes?

In the time that I took to write this ticket I managed to repro it just by switching tabs in Firefox.

Doesn't do anything here.

Since I very much doubt that the server is suddenly starting to send invalid packet headers to the HTML5 client, something else is either corrupting the websocket stream or Firefox is just dropping it - for whatever reason. Websockify is very well tested and used by many other projects, so that's unlikely to be mishandling the packets. There could be a firewall or a proxy or something on your network, try to see if using wss helps, or maybe a secure tunnel - anything to rule that out.

If that still doesn't help, please try to provide:

Since I couldn't reproduce it the "normal" way, r16448 makes it possible to send invalid raw packets:

XPRA_SEND_INVALID_PACKET_DATA="Whatever" XPRA_SEND_INVALID_PACKET=10 xpra ...

This will send "Whatever" as a raw packet 10 seconds after the connection is made.

With this in place and a client that sends lots of screen updates (ie: glxgears), I noticed that we could get stuck in the packet processing class whenever we receive a broken packet (trying to process the same one over and over again) - so r16449 ensures we trigger the close / re-connection, and ignore any further events from this dead connection. With this in place, I see a quick screen flicker as the client re-connects (every 10 seconds with the env var above)


Mon, 24 Jul 2017 22:05:08 GMT - J. Max Mena: status changed; resolution set

Please describe the "client crash": do you seen the spinners, do you get disconnected or does the browser actually crash?


I get disconnected; the browser doesn't crash.

That being said, now that I've upped my server to r16498, I have been able to reproduce the crash I got in the original post, but instead of my client getting disconnected entirely, I get a slight screen flicker. Looks like that fix is what did it.

Closing.


Tue, 25 Jul 2017 01:30:34 GMT - Antoine Martin: status changed; resolution deleted

This is not fixed, the flicker is caused by the workaround. We still don't know why the invalid packet occurs.


Tue, 25 Jul 2017 17:57:08 GMT - J. Max Mena:

Okay I upped my server to r16516 and ran it with -d network until it disconnected me.

Here's the logs:

2017-07-25 10:42:11,857 will process ui packet sound-control
2017-07-25 10:42:11,861 peer: (0, -1, -1)
2017-07-25 10:42:11,861 peer: (0, -1, -1)
2017-07-25 10:42:11,861 new_connection((1, <socket._socketobject object at 0x7f5ec46fd590>)) sock=<socket._socketobject object at 0x7f5eb34a0c90>, timeout=0.1, sockname=('192.168.1.138', 2200), address=('192.168.1.140', 50304), peername=('192.168.1.140', 50304). timeout=0.1
2017-07-25 10:42:11,862 192.168.1.140: Plain non-SSL (ws://) WebSocket connection
2017-07-25 10:42:11,863 192.168.1.140: Version hybi-13, base64: 'False'
2017-07-25 10:42:11,863 io_thread_loop(read, <bound method Protocol._read of Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50304)>) loop starting
2017-07-25 10:42:11,874 read_parse_thread_loop starting
2017-07-25 10:42:11,875 processing packet hello
2017-07-25 10:42:11,875 will process ui packet hello
2017-07-25 10:42:11,875 enable_compressor_from_caps(..) options=['lz4', 'lzo', 'zlib']
2017-07-25 10:42:11,875 enable_compressor(lz4): <function lz4_compress at 0x7f5ec59abc80>
2017-07-25 10:42:11,875 get_enabled_encoders(['rencode', 'bencode', 'yaml']) enabled=['rencode', 'bencode']
2017-07-25 10:42:11,875 enable_encoder_from_caps(..) options=['rencode', 'bencode']
2017-07-25 10:42:11,875 get_enabled_encoders(['bencode', 'rencode', 'yaml']) enabled=['rencode', 'bencode']
2017-07-25 10:42:11,875 enable_encoder(bencode): <function do_bencode at 0x7f5ec59b0b18>
2017-07-25 10:42:11,876 Handshake complete; enabling connection
2017-07-25 10:42:11,876 disconnect_client(Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242), new client, ('this session does not allow sharing',))
2017-07-25 10:42:11,876 disconnect_protocol(Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242), new client, ('this session does not allow sharing',))
2017-07-25 10:42:11,876 Disconnecting client 192.168.1.140:50242:
2017-07-25 10:42:11,876  new client (this session does not allow sharing)
2017-07-25 10:42:11,876 flush_then_close(['disconnect', 'new client', 'this session does not allow sharing'], None) closed=False
2017-07-25 10:42:11,876 flush_then_close: wait_for_write_lock()
2017-07-25 10:42:11,876 flush_then_close: acquired the write lock
2017-07-25 10:42:11,876 flush_then_close: queue is now empty, sending the last packet and closing
2017-07-25 10:42:11,876 cleanup_protocol(Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242))
2017-07-25 10:42:11,877 flush_then_close: packet_queued() closed=False
2017-07-25 10:42:11,878 flush_then_close: wait_for_packet_sent() queue.empty()=True, closed=False
2017-07-25 10:42:11,879 flush_then_close: wait_for_packet_sent() close_and_release()
2017-07-25 10:42:11,879 Protocol.close() closed=False, connection=tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242
2017-07-25 10:42:11,882 Protocol.close() calling <bound method WebSocketConnection.close of tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242>
2017-07-25 10:42:11,882 tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242.close() for socket={'fileno': 16, 'type': 'UNIX', 'family': 'DGRAM', 'timeout': 0, 'proto': 0}
2017-07-25 10:42:11,882 tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50242.close() done
2017-07-25 10:42:11,882 terminate_queue_threads()
2017-07-25 10:42:11,883 Protocol.close() done
2017-07-25 10:42:11,883 parse thread: empty marker, exiting
2017-07-25 10:42:11,883 flush_then_close: done, callback=None
2017-07-25 10:42:11,883 io_thread_loop(write, <bound method Protocol._write of Protocol(None)>) loop ended, closed=True
2017-07-25 10:42:11,884 cleanup_source(X11ServerSource(1 : Protocol(None))) remaining sources: []
2017-07-25 10:42:11,884 xpra client 1 disconnected.
2017-07-25 10:42:11,885 accept_client(Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50304), {'named_cursors': 0, 'encoding.h264.deblocking-filter': 0, 'bell': 1, 'lz4': 1, 'yaml': 0, 'keyboard': 1, 'file-transfer': 1, 'sound.receive': 1, 'digest': ['hmac', 'hmac+md5', 'xor', 'hmac+md5', 'hmac+sha1', 'hmac+sha256'], 'xkbmap_print': '', 'platform.platform': '5.0 (X11)', 'clipboard.want_targets': 1, 'uuid': '038d816b-d4f0-ae56-ba2a-e4638c455cc6', 'encodings.core': ['jpeg', 'png', 'rgb', 'rgb32'], 'encoding.vp8+webm.score-delta': 50, 'encoding.mpeg4+mp4.score-delta': 50, 'encoding.h264+mp4.YUV420P.level': '3.0', 'sound.server_driven': 1, 'session-type.full': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0', 'platform': 'Posix', 'encoding.scrolling': 1, 'encodings.rgb_formats': ['RGBX', 'RGBA'], 'encodings.cursor': ['png'], 'randr_notify': 1, 'sound.send': 0, 'screen_sizes': [['HTML', 1877, 1109, 497, 293, [['Canvas', 0, 0, 1877, 1109, 497, 293]], 0, 0, 1877, 1109]], 'username': '', 'auto_refresh_delay': 500, 'encoding.h264.YUV420P.profile': 'baseline', 'encoding.transparency': 1, 'xkbmap_layout': 'us', 'notifications': 1, 'xkbmap_query': '', 'encoding.rgb24zlib': 1, 'metadata.supported': ['fullscreen', 'maximized', 'above', 'below', 'title', 'size-hints', 'class-instance', 'transient-for', 'window-type', 'decorations', 'override-redirect', 'tray', 'modal', 'opacity'], 'lzo': 0, 'encoding.x264.YUV420P.profile': 'baseline', 'clipboard_enabled': 1, 'rencode': 0, 'window.raise': 1, 'encoding.client_options': 1, 'platform.name': 'Posix', 'desktop_mode_size': [1877, 1109], 'client_type': 'HTML5', 'window.initiate-moveresize': 1, 'steal': 1, 'dpi': 96, 'encoding.video_scaling': 1, 'zlib': 1, 'desktop_size': [1877, 1109], 'encoding.h264+mp4.score-delta': 50, 'share': 0, 'lz4.js.version': '0.5.1', 'sound.decoders': [], 'cursors': 1, 'encodings': ['jpeg', 'png', 'rgb', 'rgb32'], 'encoding.h264.cabac': 0, 'generic-rgb-encodings': 1, 'xkbmap_keycodes': [[8, 'BackSpace', 8, 0, 0], [9, 'Tab', 9, 0, 0], [12, 'KP_Begin', 12, 0, 0], [13, 'Return', 13, 0, 0], [16, 'Shift_L', 16, 0, 0], [17, 'Control_L', 17, 0, 0], [18, 'Alt_L', 18, 0, 0], [19, 'Pause', 19, 0, 0], [20, 'Caps_Lock', 20, 0, 0], [27, 'Escape', 27, 0, 0], [31, 'Mode_switch', 31, 0, 0], [32, 'space', 32, 0, 0], [33, 'Prior', 33, 0, 0], [34, 'Next', 34, 0, 0], [35, 'End', 35, 0, 0], [36, 'Home', 36, 0, 0], [37, 'Left', 37, 0, 0], [38, 'Up', 38, 0, 0], [39, 'Right', 39, 0, 0], [40, 'Down', 40, 0, 0], [42, 'Print', 42, 0, 0], [45, 'Insert', 45, 0, 0], [46, 'Delete', 46, 0, 0], [48, '0', 48, 0, 0], [49, '1', 49, 0, 0], [50, '2', 50, 0, 0], [51, '3', 51, 0, 0], [52, '4', 52, 0, 0], [53, '5', 53, 0, 0], [54, '6', 54, 0, 0], [55, '7', 55, 0, 0], [56, '8', 56, 0, 0], [57, '9', 57, 0, 0], [58, 'colon', 58, 0, 0], [59, 'semicolon', 59, 0, 0], [60, 'less', 60, 0, 0], [61, 'equal', 61, 0, 0], [62, 'greater', 62, 0, 0], [63, 'question', 63, 0, 0], [64, 'at', 64, 0, 0], [65, 'a', 65, 0, 0], [66, 'b', 66, 0, 0], [67, 'c', 67, 0, 0], [68, 'd', 68, 0, 0], [69, 'e', 69, 0, 0], [70, 'f', 70, 0, 0], [71, 'g', 71, 0, 0], [72, 'h', 72, 0, 0], [73, 'i', 73, 0, 0], [74, 'j', 74, 0, 0], [75, 'k', 75, 0, 0], [76, 'l', 76, 0, 0], [77, 'm', 77, 0, 0], [78, 'n', 78, 0, 0], [79, 'o', 79, 0, 0], [80, 'p', 80, 0, 0], [81, 'q', 81, 0, 0], [82, 'r', 82, 0, 0], [83, 's', 83, 0, 0], [84, 't', 84, 0, 0], [85, 'u', 85, 0, 0], [86, 'v', 86, 0, 0], [87, 'w', 87, 0, 0], [88, 'x', 88, 0, 0], [89, 'y', 89, 0, 0], [90, 'z', 90, 0, 0], [91, 'Menu', 91, 0, 0], [92, 'Menu', 92, 0, 0], [93, 'KP_Enter', 93, 0, 0], [96, '0', 96, 0, 0], [97, '1', 97, 0, 0], [98, '2', 98, 0, 0], [99, '3', 99, 0, 0], [100, '4', 100, 0, 0], [101, '5', 101, 0, 0], [102, '6', 102, 0, 0], [103, '7', 103, 0, 0], [104, '8', 104, 0, 0], [105, '9', 105, 0, 0], [106, 'KP_Multiply', 106, 0, 0], [107, 'KP_Add', 107, 0, 0], [109, 'KP_Subtract', 109, 0, 0], [110, 'KP_Delete', 110, 0, 0], [111, 'KP_Divide', 111, 0, 0], [112, 'F1', 112, 0, 0], [113, 'F2', 113, 0, 0], [114, 'F3', 114, 0, 0], [115, 'F4', 115, 0, 0], [116, 'F5', 116, 0, 0], [117, 'F6', 117, 0, 0], [118, 'F7', 118, 0, 0], [119, 'F8', 119, 0, 0], [120, 'F9', 120, 0, 0], [121, 'F10', 121, 0, 0], [122, 'F11', 122, 0, 0], [123, 'F12', 123, 0, 0], [124, 'F13', 124, 0, 0], [125, 'F14', 125, 0, 0], [126, 'F15', 126, 0, 0], [127, 'F16', 127, 0, 0], [128, 'F17', 128, 0, 0], [129, 'F18', 129, 0, 0], [130, 'F19', 130, 0, 0], [131, 'F20', 131, 0, 0], [132, 'F21', 132, 0, 0], [133, 'F22', 133, 0, 0], [134, 'F23', 134, 0, 0], [135, 'F24', 135, 0, 0], [144, 'Num_Lock', 144, 0, 0], [145, 'Scroll_Lock', 145, 0, 0], [160, 'dead_circumflex', 160, 0, 0], [161, 'exclam', 161, 0, 0], [162, 'quotedbl', 162, 0, 0], [163, 'numbersign', 163, 0, 0], [164, 'dollar', 164, 0, 0], [165, 'percent', 165, 0, 0], [166, 'ampersand', 166, 0, 0], [167, 'underscore', 167, 0, 0], [168, 'parenleft', 168, 0, 0], [169, 'parenright', 169, 0, 0], [170, 'asterisk', 170, 0, 0], [171, 'plus', 171, 0, 0], [172, 'bar', 172, 0, 0], [173, 'minus', 173, 0, 0], [174, 'braceleft', 174, 0, 0], [175, 'braceright', 175, 0, 0], [176, 'asciitilde', 176, 0, 0], [186, 'semicolon', 186, 0, 0], [187, 'equal', 187, 0, 0], [188, 'comma', 188, 0, 0], [189, 'minus', 189, 0, 0], [190, 'period', 190, 0, 0], [191, 'slash', 191, 0, 0], [192, 'grave', 192, 0, 0], [219, 'bracketleft', 219, 0, 0], [220, 'backslash', 220, 0, 0], [221, 'bracketright', 221, 0, 0], [222, 'apostrophe', 222, 0, 0]], 'encoding.icons.max_size': [30, 30], 'raw_window_icons': 1, 'system_tray': 1, 'namespace': 1, 'sound.bundle-metadata': 1, 'encoding.generic': 1, 'version': '2.1', 'encoding.rgb_zlib': 1, 'encoding.full_csc_modes': {'h264': ['YUV420P'], 'mpeg4+mp4': ['YUV420P'], 'h264+mp4': ['YUV420P'], 'vp8+webm': ['YUV420P']}, 'encoding.h264.YUV420P.level': '2.1', 'encoding.rgb_lz4': 1, 'bencode': 1, 'encoding.h264+mp4.YUV420P.profile': 'main', 'notify-startup-complete': 1, 'encoding.h264.score-delta': -20, 'argv': ['http://192.168.1.138:2200/'], 'server-window-resize': 1, 'clipboard.selections': ['CLIPBOARD', 'PRIMARY'], 'clipboard.greedy': 1, 'printing': 1, 'file-size-limit': 10, 'encoding.color-gamut': '', 'windows': 1, 'encodings.window-icon': ['png'], 'compression_level': 1, 'session-type': 'Firefox', 'platform.processor': 'Linux x86_64', 'encoding.csc_atoms': 1})
2017-07-25 10:42:11,886 set_socket_timeout(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50304, None)
2017-07-25 10:42:11,887 HTML5 Posix Firefox client version 2.1
2017-07-25 10:42:11,887  automatic picture encoding enabled
2017-07-25 10:42:11,887  also available:
2017-07-25 10:42:11,887   jpeg, png, rgb32
2017-07-25 10:42:11,887 will process ui packet connection-lost
2017-07-25 10:42:11,888 Protocol.close() closed=True, connection=None
2017-07-25 10:42:11,888  client root window size is 1877x1109 with 1 display:
2017-07-25 10:42:11,888   HTML (497x293 mm - DPI: 95x96)
2017-07-25 10:42:11,888     Canvas
2017-07-25 10:42:11,889 best resolution matching 1877x1109 is unchanged: 1920x1152
2017-07-25 10:42:11,891 keyboard mapping already configured (skipped)
2017-07-25 10:42:11,892 get_enabled_encoders(['bencode', 'rencode', 'yaml']) enabled=['rencode', 'bencode']
2017-07-25 10:42:11,893 write_format_thread_loop starting
2017-07-25 10:42:11,894 add_packet_to_queue(hello ...)
2017-07-25 10:42:11,897 io_thread_loop(write, <bound method Protocol._write of Protocol(tcp websocket: 192.168.1.138:2200 <- 192.168.1.140:50304)>) loop starting

I think the sound packet may be of interest - I haven't enabled sound in this session.


Fri, 28 Jul 2017 17:38:01 GMT - J. Max Mena: owner, status changed

@antoine - are these logs enough, or do you want me to pipe my entire session into a file and post that?

I can also get the wireshark as well.


Fri, 28 Jul 2017 17:54:05 GMT - Antoine Martin: owner changed

@maxmylyn: this log starts near:

new_connection((1, <socket._socketobject object at 0x7f5ec46fd590>)) ...

Which means that the client has already decided to re-connect. We still don't know why it did. Maybe the javascript debug console can tell us? (enable debug mode in the HTML5 client) If that doesn't clarify things, a wireshark capture may be required.

But ideally, I would be able to reproduce it myself - that would make it much easier to fix. What am I doing wrong? Can you reduce your setup to something minimal I can easily replicate?


Mon, 11 Sep 2017 17:18:23 GMT - J. Max Mena:

As of trunk r16825 I'm unable to repro this in the same manner as I was able to before.

However, after connecting I get this error flooding the console output of the server:

2017-09-11 10:18:02,043 Warning: client decoding error:
2017-09-11 10:18:02,043  window not found

Mon, 11 Sep 2017 17:41:31 GMT - Antoine Martin:

Please see comment:7, we need the javascript console output in debug mode, and / or the complete server output (before new_connection(...). Even better would be steps so that I can reproduce the problem.


Mon, 18 Sep 2017 22:40:41 GMT - J. Max Mena:

Upped server and client to r16911:

I am definitely unable to reproduce the original error I mentioned in the description and haven't gotten disconnected even after fairly thoroughly testing this.

I'm going to close this ticket if there's no objections.


Fri, 22 Sep 2017 16:49:19 GMT - J. Max Mena: status changed; resolution set

Okay I'm still definitely unable to reproduce the hard crash or the unexpected uncaught exceptions, as of r16954. Closing.


Sat, 23 Jan 2021 05:28:39 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1595