Xpra: Ticket #1473: HTML5: audiocontext caching, re-connect, events, etc

I'm making an April Fools' Day button for spacemacs.org with Xpra HTML5 and a Docker Swarm back-end using the default load balancer. Here it is: http://xpratest.tk (temporary location). I had to modify the Xpra code to make it work properly(ish). I think it would be great to make this doable with unmodified Xpra.

So, the modifications: first of all, the client needs the ability to reconnect without refreshing the page. This includes killing web workers and timers. In addition, the Client constructor calls Utilities.getAudioContext, which creates audio contexts (these are limited per page). Also, the error "WebSocket connection to 'ws://****/' failed: Error in connection establishment: net::ERR_CONNECTION_REFUSED" should be handled properly (it occurs when there are no live back-ends).

On the server side, I needed a way to drop the new WS client instead of the old one. This way, a client keeps attempting to connect until it hits a live container without a client. This has to happen before the new client is able to interfere with the previous one (for example, by forcing a disconnect).

WARNING: eye bleed inducing code smell ahead!

server: https://github.com/JAremko/browsermax
client: https://github.com/JAremko/develop.spacemacs.org

For now, random hello timeouts seem to be the biggest problem. I had to raise the timeout substantially. (This may be a Docker-related problem: I built it from GitHub trunk because I needed some extra sandbox features.)



Sat, 25 Mar 2017 05:03:42 GMT - JAremko:

Also, protecting the Xpra server with a CAPTCHA would be great :P for cases like this, when you just want to embed an Xpra window on a site and let users access it with some basic abuse prevention, but without custom reverse proxy shenanigans.


Sat, 25 Mar 2017 05:20:39 GMT - Antoine Martin: owner changed

A more generic solution to your "Drop-latest-WS-connection.patch" has been added in r15393: if "steal" is false (not the default), we reject the connection if a client is already connected. (with some other cleanups thrown in)
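To illustrate the behaviour described here, the following is a minimal sketch of the admission decision, written in JavaScript for consistency with the client snippet quoted later in this ticket. It is not the code from r15393 (the Xpra server itself is written in Python); the function name `admitClient` and its return values are made up for this example.

```javascript
// Hypothetical sketch of the "steal" policy; not the r15393 implementation.
// existingClient: whatever represents the currently connected client, or null.
// steal: true means a new client may take over the session.
function admitClient(existingClient, steal) {
  if (existingClient === null) {
    return "accept";            // nobody is connected: always let the new client in
  }
  if (steal) {
    return "replace-existing";  // disconnect the old client, keep the new one
  }
  return "reject-new";          // keep the old client, refuse the new connection
}
```

With steal disabled, a load-balanced client that lands on a busy container would be refused and could simply try again until it reaches a free one.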

The AudioContext limitation: wouldn't it be cleaner to just cache the return value in Utilities.getAudioContext? (this should be safe as it is never called from a worker?)
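A minimal sketch of the caching idea, assuming a generic helper rather than the real Utilities.getAudioContext: `cacheResult` and `getSharedAudioContext` are hypothetical names, and the actual Xpra code may differ.

```javascript
// Hypothetical sketch: wrap any factory so its result is created once and then reused.
function cacheResult(factory) {
  let cached;
  let created = false;
  return function () {
    if (!created) {
      cached = factory();
      created = true;
    }
    return cached;
  };
}

// Assumed usage in a browser page: one shared audio context, or null if unsupported.
const getSharedAudioContext = cacheResult(function () {
  const Ctor = typeof window !== "undefined"
    ? (window.AudioContext || window.webkitAudioContext)
    : undefined;
  return Ctor ? new Ctor() : null;
});
```

Because the factory runs at most once, re-creating the client on reconnect would no longer allocate additional audio contexts.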

As for the hello timeouts: do you really need 120 seconds?! Where is it getting stuck during all that time?

The Error in connection establishment: net::ERR_CONNECTION_REFUSED - isn't this handled already? I get sent back to the connect page.

I think that captcha stuff is out of scope as this would make it tied to an API.


Sat, 25 Mar 2017 06:08:32 GMT - JAremko:

Hi Antoine, thanks for responding!


Replying to Antoine Martin:

A more generic solution to your "Drop-latest-WS-connection.patch" has been added in r15393: if "steal" is false (not the default), we reject the connection if a client is already connected. (with some other cleanups thrown in)

Thx. I'll look into it.


The AudioContext limitation: wouldn't it be cleaner to just cache the return value in Utilities.getAudioContext? (this should be safe as it is never called from a worker?)

You're right. This is how it should be handled in the "official implementation". I just do not need sound at all :)


As for the hello timeouts: do you really need 120 seconds?! Where is it getting stuck during all that time?

I think it is something related to my Docker setup, or it is due to the web browser's "HTTP simultaneous connections per host" limit. So I need better tests if I want to get serious with this :P So far it looks like this huge timeout thingy doesn't hurt.


The Error in connection establishment: net::ERR_CONNECTION_REFUSED - isn't this handled already? I get sent back to the connect page.

new WebSocket(...) in the web worker probably should be wrapped in something that allows retrying.
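One possible shape for such a wrapper is sketched below. All names (`connectWithRetry`, the option fields) are hypothetical and this is not how Protocol.js is implemented; it only assumes the standard WebSocket behaviour that a failed connection attempt ends with a "close" event.

```javascript
// Hypothetical helper: keep creating sockets until one opens or the retry budget runs out.
// createSocket: function returning a WebSocket-like object, e.g. () => new WebSocket(url).
// options.maxRetries: extra attempts after the first one; a negative value retries forever.
// options.schedule: optional replacement for setTimeout, handy for tests.
function connectWithRetry(createSocket, options) {
  const schedule = options.schedule || function (fn, ms) { setTimeout(fn, ms); };
  let attempts = 0;

  function attempt() {
    attempts += 1;
    const socket = createSocket();
    let opened = false;

    socket.onopen = function () {
      opened = true;
      options.onOpen(socket);
    };
    // Browsers fire "close" after a failed connection attempt as well as after a normal close.
    socket.onclose = function () {
      if (opened) {
        options.onClose(socket);      // an established connection went away
      } else if (options.maxRetries < 0 || attempts <= options.maxRetries) {
        schedule(attempt, options.delayMs);
      } else {
        options.onGiveUp(attempts);   // never managed to connect
      }
    };
  }

  attempt();
}
```

Each attempt creates a fresh socket, so a refused connection (no live back-end) leads to another try after `delayMs` instead of leaving the page stuck.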


I think that captcha stuff is out of scope as this would make it tied to an API.

How about some kind of universal interface for captcha providers? Example: xpra start --captcha-provider=/usr/local/bin/captcha ... where /usr/local/bin/captcha is a user-made proxy to a captcha API such as reCAPTCHA. It would return the HTML code to be shown by the Xpra HTML5 client, plus a way to verify the answer (maybe by being called with the user's response as an argument?). If the server's captcha used the same TCP connection (WS channel), it would simplify load balancing. It would need a timeout on captcha solving to prevent DoS, though.


Sat, 25 Mar 2017 06:26:12 GMT - JAremko:

Also, when restarting the Client, it doesn't make sense to re-test things like web worker support. The Xpra Client should also clean up the Xpra container element.


Sat, 25 Mar 2017 08:38:32 GMT - JAremko:

If the Xpra Client had an onconnect event, it would help with switching the host page interface (changing from a "starting" to a "started" state, for example). Events like "first GUI element (window?) appeared" and "last GUI element disappeared" would make this ugly crutch unnecessary.
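As a rough illustration of the requested "first window" and "last window" events, here is a small sketch. The names `createWindowTracker`, `onFirstWindow` and `onLastWindow` are hypothetical and do not exist in the Xpra client.

```javascript
// Hypothetical sketch: track open window ids and notify the host page on the
// transitions "no windows -> one window" and "one window -> no windows".
function createWindowTracker(hooks) {
  const open = new Set();
  return {
    add(id) {
      const wasEmpty = open.size === 0;
      open.add(id);
      if (wasEmpty && hooks.onFirstWindow) {
        hooks.onFirstWindow(id);
      }
    },
    remove(id) {
      if (!open.delete(id)) {
        return;                     // unknown id: nothing changed
      }
      if (open.size === 0 && hooks.onLastWindow) {
        hooks.onLastWindow(id);
      }
    },
    count() {
      return open.size;
    }
  };
}
```

A host page could use the two hooks to switch between its "starting", "started" and "finished" states without polling the DOM.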


Sat, 25 Mar 2017 13:00:35 GMT - Antoine Martin:

Note: this doesn't fire when the server is terminated normally (the "disconnect" packet handler still goes back to the connect page).

Tested by killing the server with "kill -9" and then re-starting one quickly: the html5 client connects to the new server. (forcibly killing the TCP connection should have the same effect)


I don't think I will ever have time or interest in the captcha API feature, so please create a separate ticket for that if you wish. (bearing in mind that unless you start working on it, not much is likely to happen...)


Sat, 25 Mar 2017 17:30:36 GMT - JAremko:

Replying to Antoine Martin:

This is amazing, thank you very much!


Sat, 25 Mar 2017 18:08:14 GMT - JAremko:

Replying to Antoine Martin:

Could this.reconnect_count=0 (or -1) mean "retry indefinitely"?


Sat, 25 Mar 2017 18:46:45 GMT - JAremko:

Replying to Antoine Martin:

I'm getting the error "Uncaught ReferenceError: me is not defined" at Client.js:30:

// assign callback for window resize event
if (window.jQuery) {
  jQuery(window).resize(jQuery.debounce(250, function (e) {
    me._screen_resized(e, me);
  }));
}

http://xpra.org/trac/browser/xpra/trunk/src/html5/js/Client.js?rev=15402#L30
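The error message indicates that `me` is used in a scope where it was never declared. The usual pattern is to capture the instance in a local variable before registering the callback. A minimal, self-contained sketch of that pattern follows; `DemoClient` and `makeResizeHandler` are illustrative names, and this is not necessarily the fix that was applied to Client.js.

```javascript
// Hypothetical stand-in for the client object; only the parts needed for the example.
function DemoClient() {
  this.resizeCount = 0;
}

DemoClient.prototype._screen_resized = function (event, ctx) {
  ctx.resizeCount += 1;
};

DemoClient.prototype.makeResizeHandler = function () {
  const me = this;                  // declared before the callback that closes over it
  return function (event) {
    me._screen_resized(event, me);
  };
};
```

The returned handler can then be passed to whatever registers the resize callback, and it keeps working even though it is invoked without `this`.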


Sat, 25 Mar 2017 18:57:47 GMT - JAremko:

The Chrome tab dies if the server is unreachable, due to multiple Protocol.js worker threads being alive simultaneously.

http://i.imgur.com/lG8jtOx.png

Also, they keep trying to connect to the server even when I'm clearly connected and can interact with the GUI.

To reproduce at http://xpratest.tk/: connect, then disconnect by closing the window, and try to connect again from the same tab.

Can it be because I have such a huge hello timeout?

I set the back-end count to 1 so it is easier to debug. Also, the firewall allows only 1 new connection per 20 seconds (from the same IP).


Sat, 25 Mar 2017 19:13:54 GMT - JAremko:

The steal=false option seems to work, but the log entry may be incomplete (I don't see the text part).


Sat, 25 Mar 2017 19:16:32 GMT - JAremko:

Should a reconnect occur when a server or client timeout happens?


Sun, 26 Mar 2017 04:46:38 GMT - Antoine Martin:

The Chrome tab dies if the server is unreachable, due to multiple Protocol.js worker threads being alive simultaneously. Also, they keep trying to connect to the server even when I'm clearly connected and can interact with the GUI.

How do I reproduce this with the default xpra html5 client? There should only be a single worker, which we re-use. The re-connection should only happen when the current connection failed or dropped.

The steal=false option seems to work, but the log entry may be incomplete (I don't see the text part).

I don't understand what you mean. What logs? Can you show an "incomplete" sample of the log you are talking about?

Should a reconnect occur when a server or client timeout happens?

As of r15414, we re-connect on ping echo timeouts - which now also trigger more quickly. (15 seconds, configurable)
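For readers unfamiliar with the idea, a ping echo timeout can be sketched as a small watchdog. This is an illustration under assumed names (`createPingWatchdog`, `echoReceived`, `isTimedOut`), not the code from r15414.

```javascript
// Hypothetical sketch: decide when a connection should be treated as dead.
// timeoutMs: how long to wait for a ping echo (15 seconds is the value mentioned above).
// now: optional function returning the current time in milliseconds (defaults to Date.now).
function createPingWatchdog(timeoutMs, now) {
  const clock = now || Date.now;
  let lastEcho = clock();
  return {
    // Call whenever the server echoes a ping.
    echoReceived() {
      lastEcho = clock();
    },
    // True once more than timeoutMs has passed since the last echo.
    isTimedOut() {
      return clock() - lastEcho > timeoutMs;
    }
  };
}
```

A client could poll `isTimedOut()` from a timer and start its reconnect logic when it returns true.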

FYI: r15415 may be of interest to you too, it shows the connection setup progress


Sun, 26 Mar 2017 06:58:04 GMT - JAremko:

Replying to Antoine Martin:

How do I reproduce this with the default xpra html5 client? There should only be a single worker, which we re-use. The re-connection should only happen when the current connection failed or dropped.

I had the same problem with my old implementation when the WebSocket constructor failed many times. I solved it with this. Maybe you can make sure that the old Protocol worker is removed before creating a new one?

Maybe this log will help: https://gist.github.com/JAremko/abfb8130d87e85d7df397ea6b112ca80
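The suggestion of removing the old worker before creating a new one could look roughly like this. `createWorkerSlot` is a hypothetical helper, and the sketch relies only on the standard `terminate()` method of web workers.

```javascript
// Hypothetical sketch: hold at most one worker, terminating the old one before spawning another.
// spawn: function that creates the worker, e.g. () => new Worker(scriptUrl) in a browser.
function createWorkerSlot(spawn) {
  let current = null;
  return {
    // Stop the previous worker (if any) and start a fresh one.
    replace() {
      if (current !== null) {
        current.terminate();
      }
      current = spawn();
      return current;
    },
    // Stop the current worker without starting a new one.
    clear() {
      if (current !== null) {
        current.terminate();
        current = null;
      }
    }
  };
}
```

Routing every reconnect attempt through `replace()` means a failed attempt cannot leave an extra worker running in the background.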

I think that to detect it with the default client you need to disable the redirect on disconnect and connect to a wrong WS address (but currently my client is pretty "default"): https://github.com/JAremko/develop.spacemacs.org/blob/gh-pages/index.html#L383


The steal=false option seems to work, but the log entry may be incomplete (I don't see the text part).

I don't understand what you mean. What logs? Can you show an "incomplete" sample of the log you are talking about?

Oh, the browser log shows only "session busy"; I was looking for "this session is already active". OK.


I noticed that after a reconnect the Client doesn't honor this rule: https://github.com/JAremko/develop.spacemacs.org/blob/gh-pages/index.html#L448


Sun, 26 Mar 2017 07:01:51 GMT - JAremko:

Hm...

2017-03-26 06:26:47,482 created unix domain socket: /home/emacs/.emacs.d/.cache/bbd19a81f821-14
Unable to create /home/emacs/.dbus
Unable to create /home/emacs/.dbus/session-bus
2017-03-26 06:26:51,360 serving html content from: /usr/share/xpra/www
2017-03-26 06:26:51,467 started command 'emacs -geometry 100x48 --chdir "/home/emacs/.emacs.d/.cache/workspace"' with pid 50
2017-03-26 06:26:51,467 xpra X11 version 2.1 64-bit
2017-03-26 06:26:51,467  uid=1000 (spacemacser), gid=1000 (xpra)
2017-03-26 06:26:51,467  running with pid 12 on Linux
2017-03-26 06:26:51,468  connected to X11 display :14 with 24 bit colors
2017-03-26 06:26:51,469 15.6GB of system memory
2017-03-26 06:26:51,485 xpra is ready.
(process:51): GLib-GIO-CRITICAL **: g_settings_schema_source_lookup: assertion 'source != NULL' failed
2017-03-26 06:29:53,152 Handshake complete; enabling connection
2017-03-26 06:29:53,159 HTML5 Linux client version 2.1
2017-03-26 06:29:53,159  automatic picture encoding enabled
2017-03-26 06:29:53,160  also available:
2017-03-26 06:29:53,160   jpeg, png, rgb32
2017-03-26 06:29:53,160  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:53,160   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:53,160     Canvas
2017-03-26 06:29:53,161 setting keyboard layout to 'us'
2017-03-26 06:29:53,207 client 1: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:53,225 client 1: startup complete
2017-03-26 06:29:58,465 Handshake complete; enabling connection
2017-03-26 06:29:58,465 Disconnecting client 10.255.0.2:52434:
2017-03-26 06:29:58,465  new client (this session does not allow sharing)
2017-03-26 06:29:58,466 xpra client 1 disconnected.
2017-03-26 06:29:58,467 HTML5 Linux client version 2.1
2017-03-26 06:29:58,467  automatic picture encoding enabled
2017-03-26 06:29:58,467  also available:
2017-03-26 06:29:58,467   jpeg, png, rgb32
2017-03-26 06:29:58,467 Last client has disconnected, terminating
2017-03-26 06:29:58,467 xpra is terminating.
2017-03-26 06:29:58,471  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:58,471   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:58,471     Canvas
2017-03-26 06:29:58,472 keyboard mapping already configured (skipped)
2017-03-26 06:29:58,506 client 2: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:58,508 client 2: startup complete

This server has become a zombie


Sun, 26 Mar 2017 07:24:38 GMT - JAremko:

So the zombie protocol workers and zombie servers are the biggest two problems now.


Sun, 26 Mar 2017 13:31:48 GMT - JAremko:

It looks like all you need to get this "zombie worker" bug in Chrome is to disable the redirect on disconnect, attempt to connect to a bad (closed, unresponsive) port with many retries, and wait a minute.

http://i.imgur.com/gx5qkyt.png

Each error block spawns extra workers.

Firefox doesn't seem to be affected.


Mon, 27 Mar 2017 08:35:44 GMT - JAremko:

I made you a test page: http://xpratest.tk/zombie-test (edit: link is now 404 - sigh)

It is the default client (r15425). I only changed the server and port and enabled debug mode.


Mon, 27 Mar 2017 11:49:55 GMT - Antoine Martin:

Lots of fixes in r15430 + r15431. Does that work for you?

(PS: please don't add links to the wiki if those are likely to go 404 in the future)


Mon, 27 Mar 2017 12:32:26 GMT - JAremko:

Replying to Antoine Martin:

Lots of fixes in r15430 + r15431. Does that work for you?

(PS: please don't add links to the wiki if those are likely to go 404 in the future)

Looks good to me. No zombie workers and it works.

Have you fixed zombie servers as well?


Mon, 27 Mar 2017 12:39:20 GMT - Antoine Martin:

Have you fixed zombie servers as well?

What are those?

Can I close this ticket?


Mon, 27 Mar 2017 12:41:29 GMT - JAremko:

Replying to Antoine Martin:

Have you fixed zombie servers as well?

What are those?

2017-03-26 06:29:58,465 Disconnecting client 10.255.0.2:52434:
2017-03-26 06:29:58,465  new client (this session does not allow sharing)
2017-03-26 06:29:58,466 xpra client 1 disconnected.
2017-03-26 06:29:58,467 HTML5 Linux client version 2.1
2017-03-26 06:29:58,467  automatic picture encoding enabled
2017-03-26 06:29:58,467  also available:
2017-03-26 06:29:58,467   jpeg, png, rgb32
2017-03-26 06:29:58,467 Last client has disconnected, terminating
2017-03-26 06:29:58,467 xpra is terminating.
2017-03-26 06:29:58,471  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:58,471   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:58,471     Canvas
2017-03-26 06:29:58,472 keyboard mapping already configured (skipped)
2017-03-26 06:29:58,506 client 2: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:58,508 client 2: startup complete

It seems to be related to the first client exiting before the new one completed its handshake, so the Xpra server simply hangs.

Hard to reproduce...


Mon, 27 Mar 2017 12:43:30 GMT - Antoine Martin: status changed; resolution set

Please create a new ticket for the server issue - this doesn't look related to the html5 client at all.


Mon, 27 Mar 2017 12:48:17 GMT - JAremko:

ok. Thanks for all the hard work! Really appreciate it.


Sat, 01 Jul 2017 06:30:53 GMT - Antoine Martin: summary, milestone changed

(edit milestone and title)

See also #1491


Thu, 20 Jul 2017 07:19:43 GMT - Antoine Martin:

re-connect bug: #1586


Sat, 23 Jan 2021 05:25:14 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1473