#21 closed defect (fixed)
jitter on link causes keys to repeat
Reported by: | Antoine Martin | Owned by: | Antoine Martin |
---|---|---|---|
Priority: | major | Milestone: | 0.0.7.x |
Component: | client | Version: | 0.0.7.32 |
Keywords: | Cc: |
Description (last modified by )
Solutions proposed:
- doing the key repeat client-side - this does not seem to be a workable solution, as it is likely to break applications which currently work just fine
- a better approach would be to require the client to confirm the key repeat within a certain amount of time: that way we can keep the key pressed on the server side, but we unpress it if we don't receive a "key-still-pressed" message from the client in time. This allows us to maintain a consistent server-side state while also detecting jitter on the line and taking appropriate countermeasures (if we do eventually get the "key-still-pressed" packet after the timeout, we just press the key again). The downside is that line jitter will probably still cause keys to be pressed more than once (probably twice), just not dozens of times as it is now. The difficulty comes from the fact that the key-repeat delay is configurable and may well be shorter than the connection's transmit time when accounting for jitter, that we would need to communicate this delay to the client, and that those "key-still-pressed" messages waste precious bandwidth (although hardly more so than the pure client-side option)
- letting the server increase the key repeat delay - we would probably need some kind of timing information with the packets so the server can estimate latency (this may be worth having in any case)
- sending key events ahead of anything else (even mouse motion?) - this would not solve all cases, but would work for cases where the high latency is caused by xpra packets
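The second option above can be sketched as a small timeout table. This is an illustrative model only: the class and method names are invented, and real xpra would drive this from gobject timers rather than a polling call.

```python
import time

# Hypothetical sketch of option 2: the server keeps a per-keycode deadline;
# each client "key-still-pressed" confirmation pushes the deadline back, and
# a periodic check unpresses any key whose deadline has expired.
class KeyRepeatGuard:
    def __init__(self, delay_ms=250):
        self.delay = delay_ms / 1000.0
        self.deadlines = {}          # keycode -> unpress deadline (seconds)

    def key_pressed(self, keycode, now=None):
        now = now if now is not None else time.time()
        self.deadlines[keycode] = now + self.delay

    def key_still_pressed(self, keycode, now=None):
        # A late confirmation for an already-unpressed key simply re-presses
        # it, keeping server state consistent with the client's keyboard.
        self.key_pressed(keycode, now)

    def key_released(self, keycode):
        self.deadlines.pop(keycode, None)

    def expired(self, now=None):
        # Keys whose confirmation did not arrive in time: unpress these.
        now = now if now is not None else time.time()
        stale = [k for k, d in self.deadlines.items() if d < now]
        for k in stale:
            del self.deadlines[k]
        return stale
```

As the description notes, jitter can still cause a double press here (timeout fires, then the late confirmation re-presses the key), but never dozens of repeats.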
Attachments (5)
Change History (26)
comment:1 Changed 9 years ago by
Status: | new → accepted |
---|---|
Summary: | high latency link causes keys to repeat → jitter on link causes keys to repeat |
comment:2 Changed 9 years ago by
Description: | modified (diff) |
---|
comment:3 Changed 9 years ago by
comment:4 Changed 9 years ago by
I'm not the original reporter, just discovered xpra a few days ago and ran straight into this problem quite badly (Wifi + cable modem + multihop SSH doesn't help jitter). Have tried briefly now with 0.0.7.28 and it works fine for me. Will report back after some prolonged use.
PS. xpra is awesome, thanks!
comment:5 Changed 9 years ago by
Don't know whether that's jitter or just a programming bug ...
I used to have the repeat error, until I used "xset r rate" to turn off key repeat on the server. (This was with 0.7.27 or something like that.)
Now I'm on 0.7.30; the problem is now a bit different. In xchat I switch channels with Ctrl+PageUp (resp. Ctrl+PageDown); this used to work, but now it switches many channels at once. (Easily visible because all visited channels lose the "unread messages" color.)
I hold Ctrl (STRG) and press PageUp/PageDown for a very short time - but it seems the keypress gets sent to the server multiple times.
Perhaps it's just a very fast repetition?
"xset q" gives me (on my primary X):
auto repeat delay: 250 repeat rate: 30
so it should take at least 250 msec for repeat to kick in - and I'm _not_ holding the keys that long. Perhaps the time for Ctrl is accounted somehow?
comment:6 Changed 9 years ago by
Damn, I'm not seeing this here at all... Can you tell me your exact client and server OS/lib versions?
My desktop is set to 500/33 and when I test the key repeat with a simple xterm (no xchat installed here), I see the expected number of keys at the expected rate... can you try that too?
Can you please also check "xset q" from within an xpra session to verify that it did get propagated to the virtual session correctly?
Otherwise, the last option is to look at the debug log with: "xpra -d all | grep -i key" (this is going to be quite verbose but should tell us what key events we simulate on the virtual session and for what reason)
comment:7 Changed 9 years ago by
Both are at 0.0.7.30+dfsg-1.
Yes, "xset q" via xpra gives the same result: auto repeat delay: 250 repeat rate: 30
Here's the logfile.
comment:8 Changed 9 years ago by
The client log looks absolutely fine, can you post the server log? (same grepping)
More details: the key repeat timer fires for the Control key, but the one for the Page_Up key gets cancelled before firing (since you released the key).
comment:9 Changed 9 years ago by
Well, that means killing the running clients ... or should a "xpra upgrade -d all" work here?
BTW, I'm in #winswitch now, easier to talk there.
comment:10 Changed 9 years ago by
The client log shows that the client correctly sends "key-repeat" messages to the server.
And here is the relevant bit from the server log:
```
got ['key-action', 1, 'Control_L', 1, [], 65507, '', 37] now 1
pressing keycode=37, keyname=Control_L
handle keycode pressing 37: key Control_L
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 742
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
(...)
got ['key-action', 1, 'Page_Down', 1, ['control'], 65366, '', 117] now 1
pressing keycode=117, keyname=Page_Down
handle keycode pressing 117: key Page_Down
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 739
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 742
scheduling key repeat timer with delay 250
key repeat timeout for keycode 117 / 'Page_Down' - clearing it
handle keycode releasing 117: key Page_Down
cancelling key repeat timer: 741
key repeat timeout for keycode 37 / 'Control_L' - clearing it
handle keycode releasing 37: key Control_L
cancelling key repeat timer: 746
got ['key-repeat', 37]
cancelling key repeat timer: 746
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 777
scheduling key repeat timer with delay 250
got ['key-action', 1, 'Page_Down', 0, ['control'], 65366, '', 117] now 0
pressing keycode=117, keyname=Page_Down
handle keycode 117: key Page_Down was already pressed/released, ignoring
got ['key-repeat', 37]
cancelling key repeat timer: 778
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 779
scheduling key repeat timer with delay 250
```
Between the moment "Page_Down" is pressed and released (look for the "key-action" messages), we get a "key repeat timeout" for both "Control_L" (37) and "Page_Down" (117).
This means that both the key-release event for Control_L (a "key-action" with the pressed flag set to 0) and the usual "key-repeat" for "Control_L" took an unusually long time to reach the server's read processing loop.
Since this does not happen at any other time during this log trace, it would seem that whatever "Control_L + Page_Down" triggered (a screen update is quite likely) also caused the delay in processing/receiving those key messages, and therefore caused the keys to remain pressed longer than intended.
It would be quite useful to print timing information in the log messages to be able to confirm this...
comment:11 Changed 9 years ago by
Idea discussed on IRC: additionally provide the key-release messages via a UDP channel (openvpn?), so that a single packet loss is not enough to cause the problem.
(This assumes that the problem is a dropped TCP packet, with the timeout, retransmission, etc.)
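The UDP side-channel idea can be sketched with plain sockets; note that this was never part of xpra itself, and everything below (message format, addressing) is made up for illustration.

```python
import socket

# Sketch of the IRC idea: duplicate key-release messages over UDP so a single
# dropped TCP segment (with its retransmission timeout) cannot leave a key
# stuck. The wire format here is hypothetical.
def send_release_udp(sock, addr, keycode):
    sock.sendto(("key-release %d" % keycode).encode(), addr)

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
recv.settimeout(5)
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_release_udp(send, recv.getsockname(), 37)
data, _ = recv.recvfrom(64)
print(data.decode())  # → key-release 37
```

Since UDP delivery is itself unreliable, the release would still have to be sent on the TCP channel as well; the UDP copy only shortens the common case.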
comment:12 Changed 9 years ago by
The threaded server introduced in r284 drastically reduces latency on the server and r285 prioritizes "key-repeat" packets (just in case - probably not making a huge difference).
Please let me know if this improves the situation, and if the problem still occurs, what is the load on both server and client at the time?
The only place where the client can lag is when receiving very large damage data; at the moment this is processed in the main thread (see draw() in client.py).
The same threaded approach cannot be used here as the expensive call is in gtk - which must therefore be called from the main thread... but maybe we can split the loading into chunks (won't be easy):
```python
loader = gtk.gdk.PixbufLoader(coding)
loader.write(img_data, len(img_data))
loader.close()
pixbuf = loader.get_pixbuf()
```
Splitting into chunks is not the hard part (decide on a chunk size and just consume the buffer via idle_add calls); the problem is that we would end up consuming network packets faster than we actually process them... so the "damage-sequence" code would need to be changed (not sure if firing it only after we actually finish loading and drawing would be enough: we may have accumulated too much pending data already...)
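The chunked-consumption idea can be sketched generically: a generator that writes one slice per invocation, shaped like a glib idle_add callback (the chunk size and the writer interface are illustrative assumptions, not xpra code).

```python
# Sketch of the chunked-loading idea: consume the image buffer in fixed
# slices, yielding control between slices the way an idle_add callback
# returns to the main loop. Chunk size is an arbitrary illustrative value.
def write_in_chunks(writer, data, chunk_size=64 * 1024):
    for start in range(0, len(data), chunk_size):
        writer(data[start:start + chunk_size])
        yield True      # idle_add keeps calling the callback while True
    yield False         # done: returning False removes the callback
```

This keeps any single main-loop iteration short, but as noted above it does nothing about packets arriving faster than they are drawn.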
comment:13 Changed 9 years ago by
r288 makes the parsing and decompression of network packets threaded so we don't starve the main loop for long. This reduces the latency significantly when dealing with very large packets (like full screen updates at high res).
This helps the client the most as it receives much more data than the server.
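The threaded-parsing approach can be sketched with a worker thread and queues; the names and the use of zlib below are illustrative stand-ins, not xpra's actual protocol code.

```python
import queue
import threading
import zlib

# Sketch of the r288 idea: decompress/parse incoming packets on a worker
# thread and hand finished packets to the main loop via a queue, so a very
# large packet (e.g. a full-screen update) no longer starves the main loop.
def start_parser(raw_queue, parsed_queue):
    def run():
        while True:
            raw = raw_queue.get()
            if raw is None:      # sentinel: shut the worker down
                break
            parsed_queue.put(zlib.decompress(raw))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```

The main loop then only ever dequeues ready-to-use packets, keeping its iterations short regardless of packet size.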
comment:14 Changed 9 years ago by
If you add the latency measurement, please provide some way to get the current value - dbus, commandline, whatever.
At least for bug reports it might also be interesting to have a floating average for the last 1, 5, 10, and 60 seconds ;)
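Such floating averages could be kept as a simple list of timestamped samples; a minimal sketch (class name and windows are illustrative, not anything xpra ships):

```python
from collections import deque

# Sketch: store (timestamp, latency) samples and average over the last
# N seconds on demand. Timestamps are injected to keep this testable.
class LatencyHistory:
    def __init__(self):
        self.samples = deque()

    def add(self, latency_ms, now):
        self.samples.append((now, latency_ms))

    def average(self, window_s, now):
        # Mean latency over samples no older than window_s seconds.
        recent = [l for (t, l) in self.samples if now - t <= window_s]
        return sum(recent) / len(recent) if recent else None
```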
comment:15 Changed 9 years ago by
This simple patch just sends a ping every second and prints the latency when the echo arrives - comments/ideas welcome:
```diff
Index: xpra/server.py
===================================================================
--- xpra/server.py	(revision 322)
+++ xpra/server.py	(working copy)
@@ -1292,6 +1292,8 @@
             self._send_new_window_packet(window)
         if self.send_cursors:
             self.send_cursor()
+        if True:
+            self.send_ping()
 
     def send_hello(self, client_capabilities):
         capabilities = {}
@@ -1533,6 +1535,16 @@
         log("received sequence: %s", packet_sequence)
         self._server_source.last_client_packet_sequence = packet_sequence
 
+    def send_ping(self):
+        if not self._protocol._closed:
+            self._send(["ping", long(1000.0*time.time())])
+            gobject.timeout_add(1000, self.send_ping)
+
+    def _process_ping_echo(self, proto, packet):
+        (_, ts) = packet
+        now = long(1000.0*time.time())
+        log.info("ping_echo time=%s, now=%s, ts=%s, latency=%s", time.time(), now, ts, now-ts)
+
     def _process_buffer_refresh(self, proto, packet):
         (_, id, _, jpeg_qual) = packet
         if self.encoding=="jpeg":
@@ -1591,6 +1603,7 @@
         "shutdown-server": _process_shutdown_server,
         "jpeg-quality": _process_jpeg_quality,
         "damage-sequence": _process_damage_sequence,
+        "ping_echo": _process_ping_echo,
         "buffer-refresh": _process_buffer_refresh,
         "desktop_size": _process_desktop_size,
         "encoding": _process_encoding,
Index: xpra/client.py
===================================================================
--- xpra/client.py	(revision 315)
+++ xpra/client.py	(working copy)
@@ -876,6 +876,11 @@
         log("_process_notify_close(%s)", id)
         self._client_extras.close_notify(id)
 
+    def _process_ping(self, packet):
+        (_, ts) = packet
+        log.info("ping: %s", ts)
+        self.send(["ping_echo", ts])
+
     def _process_window_metadata(self, packet):
         (_, id, metadata) = packet
         window = self._id_to_window[id]
@@ -918,6 +923,7 @@
         "bell": _process_bell,
         "notify_show": _process_notify_show,
         "notify_close": _process_notify_close,
+        "ping": _process_ping,
         "window-metadata": _process_window_metadata,
         "configure-override-redirect": _process_configure_override_redirect,
         "lost-window": _process_lost_window,
```
comment:16 Changed 9 years ago by
Sorry, I can't test that ...
```
# patch -l --dry-run -p1 /usr/lib/xpra/xpra/client.py < /tmp/xpra-patch
patching file /usr/lib/xpra/xpra/client.py
Hunk #1 FAILED at 1292.
Hunk #2 FAILED at 1533.
Hunk #3 FAILED at 1591.
3 out of 3 hunks FAILED -- saving rejects to file /usr/lib/xpra/xpra/client.py.rej
patching file /usr/lib/xpra/xpra/client.py
Hunk #1 succeeded at 877 (offset 1 line).
Hunk #2 succeeded at 924 (offset 1 line).
```
This is with xpra 0.0.7.31.
Changed 9 years ago by
Attachment: | ping.patch added |
---|
patch adding latency measurement via a simple ping
comment:17 Changed 9 years ago by
```
[~/Xpra/trunk/src] patch -p0 < ./ping.patch
patching file xpra/server.py
patching file xpra/client.py
```
Applies cleanly to trunk.
Changed 9 years ago by
Attachment: | session-info-oldservers.png added |
---|
new session-info window as shown when connecting to old servers (limited details)
Changed 9 years ago by
Attachment: | session-info-newservers.png added |
---|
session-info for new servers (more details)
comment:18 Changed 9 years ago by
I needed an easy way to see the latency from both sides, so r331 now has a dedicated window for seeing this sort of useful debugging information.
See screenshots.
Please try it out and report on the type of values you see when problems occur.
comment:19 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
r349 adds the "--no-keyboard-sync" option; the man page reads:
Disables keyboard state synchronization. Normally the key presses and key release events are sent to the server as they occur so that the server can maintain a consistent keyboard state. Using this option can prevent keys from repeating unexpectedly on high latency links but it may also disrupt applications which access the keyboard directly (games, etc).
This will not be made the default, and even if we somehow hooked it up to the latency measurements, we probably would not want to enable it on the fly automatically.
Feel free to re-open if you can still reproduce.
comment:20 Changed 9 years ago by
Version: | → 0.0.7.32 |
---|
comment:21 Changed 6 weeks ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/21
Took approach number 2 in r229: we keep the benefits of doing it all server-side (works with all apps that rely on keyboard state) but we rely on the client to regularly send us the "key-repeat" signal. If we don't get the signal in time, we unpress the key. (We also unpress all keys whenever we lose focus, which should solve other cases too.)
I have tested this by setting an artificially low timeout and the key was unpressed as expected. (Could also test by delaying the "key-repeat" signal - not done yet.)
Can you please confirm if this works for you?