#21 closed defect (fixed)
jitter on link causes keys to repeat
Reported by: | Antoine Martin | Owned by: | Antoine Martin |
---|---|---|---|
Priority: | major | Milestone: | 0.0.7.x |
Component: | client | Version: | 0.0.7.32 |
Keywords: | Cc: |
Description (last modified by )
Solutions proposed:
- doing the key repeat client-side - this does not seem to be a workable solution, as it is likely to break applications which currently work just fine
- a better approach would be to require the client to confirm the key repeat within a certain amount of time: that way we can keep the key pressed on the server side, but we unpress it if we don't receive a "key-still-pressed" message from the client in time. This allows us to maintain a consistent server-side state while also detecting jitter on the line and taking appropriate countermeasures (if we do eventually get the "key-still-pressed" packet after the timeout, we just press the key again). The downside is that line jitter will probably still cause keys to be pressed more than once (probably twice), just not dozens of times as it is now. The difficulty comes from the fact that the key-repeat delay is configurable and may well be shorter than the connection's transmit time when accounting for jitter, that we would need to communicate this delay to the client, and that those "key-still-pressed" messages waste precious bandwidth (although hardly more so than the pure client-side option)
- letting the server increase the key repeat delay - we would probably need some kind of timing information with the packets so the server can estimate latency (this may be worth having in any case)
- sending key events ahead of anything else (even mouse motion?) - this would not solve all cases, but would work for cases where the high latency is caused by xpra packets
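The second option above can be sketched as a small timeout table. This is an illustrative model only: the class and method names are invented, and real xpra would drive this from gobject timers rather than a polling call.

```python
import time

# Hypothetical sketch of option 2: the server keeps a per-keycode deadline;
# each client "key-still-pressed" confirmation pushes the deadline back, and
# a periodic check unpresses any key whose deadline has expired.
class KeyRepeatGuard:
    def __init__(self, delay_ms=250):
        self.delay = delay_ms / 1000.0
        self.deadlines = {}          # keycode -> unpress deadline (seconds)

    def key_pressed(self, keycode, now=None):
        now = now if now is not None else time.time()
        self.deadlines[keycode] = now + self.delay

    def key_still_pressed(self, keycode, now=None):
        # A late confirmation for an already-unpressed key simply re-presses
        # it, keeping server state consistent with the client's keyboard.
        self.key_pressed(keycode, now)

    def key_released(self, keycode):
        self.deadlines.pop(keycode, None)

    def expired(self, now=None):
        # Keys whose confirmation did not arrive in time: unpress these.
        now = now if now is not None else time.time()
        stale = [k for k, d in self.deadlines.items() if d < now]
        for k in stale:
            del self.deadlines[k]
        return stale
```

As the description notes, jitter can still cause a double press here (timeout fires, then the late confirmation re-presses the key), but never dozens of repeats.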
Attachments (5)
Change History (26)
comment:1 Changed 9 years ago by
Status: | new → accepted |
---|---|
Summary: | high latency link causes keys to repeat → jitter on link causes keys to repeat |
comment:2 Changed 9 years ago by
Description: | modified (diff) |
---|
comment:3 Changed 9 years ago by
comment:4 Changed 9 years ago by
I'm not the original reporter, just discovered xpra a few days ago and ran straight into this problem quite badly (Wifi + cable modem + multihop SSH doesn't help jitter). Have tried briefly now with 0.0.7.28 and it works fine for me. Will report back after some prolonged use.
PS. xpra is awesome, thanks!
comment:5 Changed 9 years ago by
Don't know whether that's jitter or just a programming bug ...
I used to have the repeat error, until I used "xset r rate" to turn off key repeat on the server. (This was with 0.7.27 or something like that.)
Now I'm on 0.7.30; the problem is now a bit different. In xchat I switch channels with Ctrl+PageUp (resp. Ctrl+PageDown); this used to work, but now it switches many channels at once. (Easily visible because all visited channels lose the "unread messages" color.)
I hold Ctrl (STRG) and press PageUp/PageDown for a very short time - but it seems the keypress gets sent to the server multiple times.
Perhaps it's just a very fast repetition?
"xset q" gives me (on my primary X):
auto repeat delay: 250 repeat rate: 30
so it should take at least 250 msec for repeat to kick in - and I'm _not_ holding the keys that long. Perhaps the time for Ctrl is accounted somehow?
comment:6 Changed 9 years ago by
Damn, I'm not seeing this here at all... Can you tell me your exact client and server OS/lib versions?
My desktop is set to 500/33 and when I test the key repeat with a simple xterm (no xchat installed here), I see the expected number of keys at the expected rate... can you try that too?
Can you please also check "xset q" from within an xpra session to verify that it did get propagated to the virtual session correctly?
Otherwise, the last option is to look at the debug log with: "xpra -d all | grep -i key" (this is going to be quite verbose but should tell us what key events we simulate on the virtual session and for what reason)
comment:7 Changed 9 years ago by
Both are at 0.0.7.30+dfsg-1.
Yes, "xset q" via xpra gives the same result: auto repeat delay: 250 repeat rate: 30
Here's the logfile.
comment:8 Changed 9 years ago by
The client log looks absolutely fine, can you post the server log? (same grepping)
More details: the key repeat timer fires for the Control key, but the one for the Page_Up key gets cancelled before firing (since you released the key).
comment:9 Changed 9 years ago by
Well, that means killing the running clients ... or should a "xpra upgrade -d all" work here?
BTW, I'm in #winswitch now, easier to talk there.
comment:10 Changed 9 years ago by
The client log shows that the client correctly sends "key-repeat" messages to the server.
And here is the relevant bit from the server log:
```
got ['key-action', 1, 'Control_L', 1, [], 65507, '', 37] now 1
pressing keycode=37, keyname=Control_L
handle keycode pressing 37: key Control_L
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 742
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
(...)
got ['key-action', 1, 'Page_Down', 1, ['control'], 65366, '', 117] now 1
pressing keycode=117, keyname=Page_Down
handle keycode pressing 117: key Page_Down
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 739
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 742
scheduling key repeat timer with delay 250
key repeat timeout for keycode 117 / 'Page_Down' - clearing it
handle keycode releasing 117: key Page_Down
cancelling key repeat timer: 741
key repeat timeout for keycode 37 / 'Control_L' - clearing it
handle keycode releasing 37: key Control_L
cancelling key repeat timer: 746
got ['key-repeat', 37]
cancelling key repeat timer: 746
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 777
scheduling key repeat timer with delay 250
got ['key-action', 1, 'Page_Down', 0, ['control'], 65366, '', 117] now 0
pressing keycode=117, keyname=Page_Down
handle keycode 117: key Page_Down was already pressed/released, ignoring
got ['key-repeat', 37]
cancelling key repeat timer: 778
scheduling key repeat timer with delay 250
got ['key-repeat', 37]
cancelling key repeat timer: 779
scheduling key repeat timer with delay 250
```
Between the moment "Page_Down" is pressed and released (look for the "key-action" messages), we get a "key repeat timeout" for both "Control_L" (37) and "Page_Down" (117).
This means that both the key-release event for Control_L (a "key-action" with the pressed flag set to 0) and the usual "key-repeat" for "Control_L" took an unusually long time to reach the server's read processing loop.
Since this does not happen at any other time during this log trace, it would seem that whatever "Control_L + Page_Down" triggered (a screen update is quite likely) also caused the delay in processing/receiving those key messages, and therefore caused the keys to remain pressed longer than intended.
It would be quite useful to print timing information in the log messages to be able to confirm this...
comment:11 Changed 9 years ago by
Idea discussed on IRC: additionally provide the key-release messages via a UDP channel (openvpn?), so that a single packet loss is not enough to cause the problem.
(This assumes that the problem is a dropped TCP packet, with the timeout, retransmission, etc.)
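The UDP side-channel idea can be sketched with plain sockets; note that this was never part of xpra itself, and everything below (message format, addressing) is made up for illustration.

```python
import socket

# Sketch of the IRC idea: duplicate key-release messages over UDP so a single
# dropped TCP segment (with its retransmission timeout) cannot leave a key
# stuck. The wire format here is hypothetical.
def send_release_udp(sock, addr, keycode):
    sock.sendto(("key-release %d" % keycode).encode(), addr)

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
recv.settimeout(5)
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_release_udp(send, recv.getsockname(), 37)
data, _ = recv.recvfrom(64)
print(data.decode())  # → key-release 37
```

Since UDP delivery is itself unreliable, the release would still have to be sent on the TCP channel as well; the UDP copy only shortens the common case.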
comment:12 Changed 9 years ago by
The threaded server introduced in r284 drastically reduces latency on the server and r285 prioritizes "key-repeat" packets (just in case - probably not making a huge difference).
Please let me know if this improves the situation, and if the problem still occurs, what is the load on both server and client at the time?
The only place where the client can lag is when receiving very large damage data; at the moment this is processed in the main thread (see draw() in client.py).
The same threaded approach cannot be used here as the expensive call is in gtk - which must therefore be called from the main thread... but maybe we can split the loading into chunks (won't be easy):
```python
loader = gtk.gdk.PixbufLoader(coding)
loader.write(img_data, len(img_data))
loader.close()
pixbuf = loader.get_pixbuf()
```
Splitting into chunks is not the hard part (decide on a chunk size and just consume the buffer via idle_add calls); the problem is that we would end up consuming network packets faster than we actually process them... so the "damage-sequence" code would need to be changed (not sure if firing it only after we actually finish loading and drawing would be enough: we may have accumulated too much pending data already...)
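The chunked-consumption idea can be sketched generically: a generator that writes one slice per invocation, shaped like a glib idle_add callback (the chunk size and the writer interface are illustrative assumptions, not xpra code).

```python
# Sketch of the chunked-loading idea: consume the image buffer in fixed
# slices, yielding control between slices the way an idle_add callback
# returns to the main loop. Chunk size is an arbitrary illustrative value.
def write_in_chunks(writer, data, chunk_size=64 * 1024):
    for start in range(0, len(data), chunk_size):
        writer(data[start:start + chunk_size])
        yield True      # idle_add keeps calling the callback while True
    yield False         # done: returning False removes the callback
```

This keeps any single main-loop iteration short, but as noted above it does nothing about packets arriving faster than they are drawn.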
comment:13 Changed 9 years ago by
r288 makes the parsing and decompression of network packets threaded so we don't starve the main loop for long. This reduces the latency significantly when dealing with very large packets (like full screen updates at high res).
This helps the client the most as it receives much more data than the server.
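The threaded-parsing approach can be sketched with a worker thread and queues; the names and the use of zlib below are illustrative stand-ins, not xpra's actual protocol code.

```python
import queue
import threading
import zlib

# Sketch of the r288 idea: decompress/parse incoming packets on a worker
# thread and hand finished packets to the main loop via a queue, so a very
# large packet (e.g. a full-screen update) no longer starves the main loop.
def start_parser(raw_queue, parsed_queue):
    def run():
        while True:
            raw = raw_queue.get()
            if raw is None:      # sentinel: shut the worker down
                break
            parsed_queue.put(zlib.decompress(raw))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```

The main loop then only ever dequeues ready-to-use packets, keeping its iterations short regardless of packet size.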
comment:14 Changed 9 years ago by
If you add the latency measurement, please provide some way to get the current value - dbus, commandline, whatever.
At least for bug reports it might also be interesting to have a floating average for the last 1, 5, 10, and 60 seconds ;)
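Such floating averages could be kept as a simple list of timestamped samples; a minimal sketch (class name and windows are illustrative, not anything xpra ships):

```python
from collections import deque

# Sketch: store (timestamp, latency) samples and average over the last
# N seconds on demand. Timestamps are injected to keep this testable.
class LatencyHistory:
    def __init__(self):
        self.samples = deque()

    def add(self, latency_ms, now):
        self.samples.append((now, latency_ms))

    def average(self, window_s, now):
        # Mean latency over samples no older than window_s seconds.
        recent = [l for (t, l) in self.samples if now - t <= window_s]
        return sum(recent) / len(recent) if recent else None
```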
comment:15 Changed 9 years ago by
This simple patch just sends a ping every second and prints the latency when the echo arrives - comments/ideas welcome:
```diff
Index: xpra/server.py
===================================================================
--- xpra/server.py	(revision 322)
+++ xpra/server.py	(working copy)
@@ -1292,6 +1292,8 @@
             self._send_new_window_packet(window)
         if self.send_cursors:
             self.send_cursor()
+        if True:
+            self.send_ping()
 
     def send_hello(self, client_capabilities):
         capabilities = {}
@@ -1533,6 +1535,16 @@
         log("received sequence: %s", packet_sequence)
         self._server_source.last_client_packet_sequence = packet_sequence
 
+    def send_ping(self):
+        if not self._protocol._closed:
+            self._send(["ping", long(1000.0*time.time())])
+            gobject.timeout_add(1000, self.send_ping)
+
+    def _process_ping_echo(self, proto, packet):
+        (_, ts) = packet
+        now = long(1000.0*time.time())
+        log.info("ping_echo time=%s, now=%s, ts=%s, latency=%s", time.time(), now, ts, now-ts)
+
     def _process_buffer_refresh(self, proto, packet):
         (_, id, _, jpeg_qual) = packet
         if self.encoding=="jpeg":
@@ -1591,6 +1603,7 @@
         "shutdown-server": _process_shutdown_server,
         "jpeg-quality": _process_jpeg_quality,
         "damage-sequence": _process_damage_sequence,
+        "ping_echo": _process_ping_echo,
         "buffer-refresh": _process_buffer_refresh,
         "desktop_size": _process_desktop_size,
         "encoding": _process_encoding,
Index: xpra/client.py
===================================================================
--- xpra/client.py	(revision 315)
+++ xpra/client.py	(working copy)
@@ -876,6 +876,11 @@
         log("_process_notify_close(%s)", id)
         self._client_extras.close_notify(id)
 
+    def _process_ping(self, packet):
+        (_, ts) = packet
+        log.info("ping: %s", ts)
+        self.send(["ping_echo", ts])
+
     def _process_window_metadata(self, packet):
         (_, id, metadata) = packet
         window = self._id_to_window[id]
@@ -918,6 +923,7 @@
         "bell": _process_bell,
         "notify_show": _process_notify_show,
         "notify_close": _process_notify_close,
+        "ping": _process_ping,
         "window-metadata": _process_window_metadata,
         "configure-override-redirect": _process_configure_override_redirect,
         "lost-window": _process_lost_window,
```
comment:16 Changed 9 years ago by
Sorry, I can't test that ...
```
# patch -l --dry-run -p1 /usr/lib/xpra/xpra/client.py < /tmp/xpra-patch
patching file /usr/lib/xpra/xpra/client.py
Hunk #1 FAILED at 1292.
Hunk #2 FAILED at 1533.
Hunk #3 FAILED at 1591.
3 out of 3 hunks FAILED -- saving rejects to file /usr/lib/xpra/xpra/client.py.rej
patching file /usr/lib/xpra/xpra/client.py
Hunk #1 succeeded at 877 (offset 1 line).
Hunk #2 succeeded at 924 (offset 1 line).
```
This is with xpra 0.0.7.31.
Changed 9 years ago by
Attachment: | ping.patch added |
---|
patch adding latency measurement via a simple ping
comment:17 Changed 9 years ago by
```
[~/Xpra/trunk/src] patch -p0 < ./ping.patch
patching file xpra/server.py
patching file xpra/client.py
```
Applies cleanly to trunk.
Changed 9 years ago by
Attachment: | session-info-oldservers.png added |
---|
new session-info window as shown when connecting to old servers (limited details)
Changed 9 years ago by
Attachment: | session-info-newservers.png added |
---|
session-info for new servers (more details)
comment:18 Changed 9 years ago by
I needed an easy way to see the latency from both sides, so r331 now has a dedicated window for seeing this sort of useful debugging information.
See screenshots.
Please try it out and report on the type of values you see when problems occur.
comment:19 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
r349 adds the "--no-keyboard-sync" option; the man page reads:
Disables keyboard state synchronization. Normally the key presses and key release events are sent to the server as they occur so that the server can maintain a consistent keyboard state. Using this option can prevent keys from repeating unexpectedly on high latency links but it may also disrupt applications which access the keyboard directly (games, etc).
This will not be made the default, and even if we somehow hooked it up to the latency measurements, we probably would not want to enable it on the fly automatically.
Feel free to re-open if you can still reproduce.
comment:20 Changed 9 years ago by
Version: | → 0.0.7.32 |
---|
comment:21 Changed 6 weeks ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/21
Took approach number 2 in r229: we keep the benefits of doing it all server-side (works with all apps that rely on keyboard state) but we rely on the client to regularly send us the "key-repeat" signal. If we don't get the signal in time, we unpress the key. (We also unpress all keys whenever we lose focus, which should solve other cases too.)
I have tested this by setting an artificially low timeout and the key was unpressed as expected. (Could also test by delaying the "key-repeat" signal - not done yet.)
Can you please confirm if this works for you?