Opened 6 years ago

Closed 6 years ago

#306 closed defect (duplicate)

Windows client not responding to server pings and timing out after 60 seconds

Reported by: alas Owned by: alas
Priority: major Milestone: 0.9
Component: client Version:
Keywords: Cc:

Description (last modified by Antoine Martin)

This is an odd one. So far, exactly one Windows client drops its xpra sessions after 60 seconds: the server requests the disconnect because of a client ping timeout.

(Other clients seem to work fine. The machine in question, however, has been dropping xpra sessions repeatedly for some weeks now with every new version of xpra. What could be different?)

The problem persists even after uninstalling xpra, uninstalling the anti-virus software, disabling the Windows firewall and then re-installing xpra.

It happens with both the 0.9.0 beta and the 0.8.8 stable Windows clients against a 0.9.0 Fedora server. With a 0.8.5 Fedora server, however, the sessions don't drop.

--enable-pings doesn't have any effect (server or client-side). Sound and video are working, until the 60 second timeout occurs. Windows clients no longer have the option of turning off sound (short of disabling pulseaudio).

It looks like there is a problem, sometimes, with the client-side ping response to the server. Maybe a patch that includes an option for debugging output for ping response client-side would provide more insight?

Attachments (2)

WindowsClient_droppingXpra_log.PNG (22.7 KB) - added by alas 6 years ago.
WindowsClient_PingTimeout_Log
JimBeam_Xpra0.9.0_droppingBug_Server-side.PNG (20.6 KB) - added by alas 6 years ago.
Xpra0.9.0_droppingBug_Server-side_Log

Change History (6)

Changed 6 years ago by alas

WindowsClient_PingTimeout_Log

Changed 6 years ago by alas

Xpra0.9.0_droppingBug_Server-side_Log

comment:1 Changed 6 years ago by Antoine Martin

Owner: changed from Antoine Martin to alas

I know it's a real pain on win32, but please include the log files as copied and pasted text in code blocks rather than as screenshots (which are neither searchable nor copyable); use shell redirection if needed.
None of the gmail screenshot links work, so please use the attachment feature of trac instead.

Also, please include the full version numbers and always test with the latest stable releases as a baseline (0.8.8 at time of writing).


As for the bug itself, I've mentioned the following things to Smo:

  • try with sound on/off (ie: --no-speaker from client)
  • switch video encoding (ie: try --encoding=rgb24)
  • --enable-pings - on both sides, preferably one at a time (does not help apparently)
  • run with "-d all" and post the log output
  • have a look at %PYTHONPATH% and clear it before running from the command line if necessary

Since the problem is limited to this one machine, I strongly suspect that something in the environment is making it misbehave, so please check other env variables too (%PATH%, etc) and look for things that may clash with our binary (off the top of my head: cygwin, gstreamer, gtk, etc).

You can also verify whether the ping-echo is being sent using wiki/Debugging (run with XPRA_USE_ALIASES=0 and -z 0, so that packet names go over the wire as plain, uncompressed text).
You will then see:

XPRA_USE_ALIASES=0 xpra attach -z 0 tcp:127.0.0.1:10000 &
ngrep -d lo "ping" port 10000
##
T 127.0.0.1:57491 -> 127.0.0.1:10000 [AP]
  P.........pingA...=..."                                                                                    
##
T 127.0.0.1:10000 -> 127.0.0.1:57491 [AP]
  P.........ping_echoA...=..."?.B?..?.X.                                                                     

Just be aware that each end pings the other (which can be confusing).
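
Since both directions carry "ping" and "ping_echo" packets, it can help to tally them per sender. The following is only a rough sketch for doing that on saved ngrep output: `count_pings` and `SAMPLE` are hypothetical names made up for this illustration, not part of xpra or ngrep, and the parsing assumes the header format shown in the capture above.

```python
import re
from collections import Counter

# Matches ngrep header lines such as "T 127.0.0.1:57491 -> 127.0.0.1:10000 [AP]"
HEADER = re.compile(r"^T (\S+) -> (\S+) \[")

def count_pings(ngrep_text, server_addr):
    """Count ping / ping_echo packets keyed by (sender, packet name)."""
    counts = Counter()
    sender = None
    for line in ngrep_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            # The source address tells us which end sent the packet
            sender = "server" if m.group(1) == server_addr else "client"
            continue
        if sender is None:
            continue
        # Check "ping_echo" first, since it also contains "ping"
        if "ping_echo" in line:
            counts[(sender, "ping_echo")] += 1
        elif "ping" in line:
            counts[(sender, "ping")] += 1
    return counts

# Sample taken from the ngrep capture above
SAMPLE = """\
##
T 127.0.0.1:57491 -> 127.0.0.1:10000 [AP]
  P.........pingA...=..."
##
T 127.0.0.1:10000 -> 127.0.0.1:57491 [AP]
  P.........ping_echoA...=..."?.B?..?.X.
"""

counts = count_pings(SAMPLE, "127.0.0.1:10000")
for key, n in sorted(counts.items()):
    print(key, n)
```

In a healthy session one would expect all four combinations to show up over time: pings and echoes from the client, and pings and echoes from the server.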


"Windows clients no longer have the option of turning off sound (short of disabling pulseaudio)."

For this, please see ticket/297


Failing everything else, this is just a wild guess, but does this client patch make any difference?:

--- src/xpra/client_base.py	(revision 3046)
+++ src/xpra/client_base.py	(working copy)
@@ -342,7 +342,7 @@
             packet_type = self._aliases.get(packet_type)
         handler = self._packet_handlers.get(packet_type)
         if handler:
-            handler(packet)
+            gobject.idle_add(handler, packet)
             return
         handler = self._ui_packet_handlers.get(packet_type)
         if not handler:

comment:2 Changed 6 years ago by Antoine Martin

Description: modified (diff)

Adding some info received via email (...) and edited bug text to remove the dead links.


Summary:

  • affects 0.8.8 and 0.9.0 clients on this particular machine only
  • audio and video continue right up until the point where the connection is terminated
  • running with --no-speaker does not make any difference (I still find it a little bit suspicious that the last ngrep packet shown is an "mp3" packet... but sound has caused so many problems all over the place that I may well be biased!)
  • the client debug log is full of:
    average server latency=0.0380001068115, using max wait=1.07600021362
    ping echo server load=(10, 50, 50), measured client latency=6ms
    

so the server is responding to the pings

  • when the server drops the connection, it successfully sends us the polite message:
    server requested disconnect: client ping timeout, - waited 60 seconds without a response
    

(so the network layer seems to be ok at that point still)

  • ngrep shows the pings from the client and the ping-echos from the server:
    T 10.0.11.57:60064 -> 10.10.10.109:1200 [AP]
    
      P.........pingA...=.%,l
    
    ########################
    
    T 10.0.11.57:60064 -> 10.10.10.109:1200 [AP]
    
      P.........ping_echoA...=.&p...."
    

But strangely enough, there is not a single ping from the server (and therefore no echo from the client).

  • client is a Windows 7 64-bit OS
  • environment does not contain anything too suspicious (though it has python installed)
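
To make the reported symptom easier to reason about, here is a minimal model of a ping timeout as seen from the pinging side. It is a sketch under stated assumptions, not xpra's actual implementation: the `PingWatchdog` class and its methods are hypothetical, and it assumes the timeout is measured from the last echo received (or from the first ping sent if no echo has arrived yet). Only the 60 second figure comes from the disconnect message quoted above.

```python
class PingWatchdog:
    """Hypothetical model of a ping timeout check (not xpra code)."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout
        self.first_ping = None  # time the first ping was sent
        self.last_echo = None   # time the last ping_echo was received

    def ping_sent(self, now):
        if self.first_ping is None:
            self.first_ping = now

    def echo_received(self, now):
        self.last_echo = now

    def timed_out(self, now):
        # Nothing to wait for until at least one ping has gone out
        if self.first_ping is None:
            return False
        # Assumption: measure from the last echo, else from the first ping
        reference = self.last_echo if self.last_echo is not None else self.first_ping
        return now - reference >= self.timeout


watchdog = PingWatchdog()
watchdog.ping_sent(0)
print(watchdog.timed_out(59))   # still within the 60 second window
print(watchdog.timed_out(60))   # no echo seen for 60 seconds
watchdog.echo_received(30)
print(watchdog.timed_out(80))   # the echo at t=30 moved the reference point
```

Under this model, an end that never sees an echo times out no matter how healthy the rest of the traffic is, which would be consistent with audio and video working right up to the disconnect.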

Things left to test:

  • 0.8.6 and 0.8.7 fedora servers
  • bisect 0.8.x branch (server) between the last known good revision (0.8.5 which is r2743) and 0.8.6 / 0.8.7 / 0.8.8 (depending on result of first step)
  • try the patch from comment:1 (on each side, preferably one at a time)
  • run the server with --no-pulseaudio ?
  • rename C:\Python27 temporarily (to take it out of all %PATH%s and other path references)
  • maybe try creating a new user account on the same machine
  • r3052 adds logging to the server send_ping function, so running the server in debug mode should show lines like these:
    sending ping to \
        Protocol(SocketConnection(('127.0.0.1', 10000) - ('127.0.0.1', 32922))) with time=1365081446346
    sending ping to \
        Protocol(SocketConnection(('127.0.0.1', 10000) - ('127.0.0.1', 32922))) with time=1365081447348
    

(though this only tells us that we ask for the packet to be sent.. no guarantees that it does get sent if the network layer is somehow broken)

  • post the contents of %APPDATA%/xpra/xpra.conf (if any) and C:\Program Files (x86)\xpra\xpra.conf (I can't see why it would be unusual or how it could cause problems... but since we're out of ideas..)
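
When checking the server debug output for the send_ping lines mentioned above, a small script can confirm that pings are being requested at a steady rate. This is only a sketch: `ping_intervals_ms` is a hypothetical helper, the sample uses the two log lines quoted above with their continuation lines joined, and it assumes the `time=` value is a timestamp in milliseconds.

```python
import re

# Extracts the value from "... with time=1365081446346"
TIME_RE = re.compile(r"with time=(\d+)")

def ping_intervals_ms(log_text):
    """Return the gaps between consecutive send_ping timestamps."""
    times = [int(t) for t in TIME_RE.findall(log_text)]
    return [later - earlier for earlier, later in zip(times, times[1:])]

# The two debug lines quoted above, with the line continuations joined
LOG = """\
sending ping to Protocol(SocketConnection(('127.0.0.1', 10000) - ('127.0.0.1', 32922))) with time=1365081446346
sending ping to Protocol(SocketConnection(('127.0.0.1', 10000) - ('127.0.0.1', 32922))) with time=1365081447348
"""

intervals = ping_intervals_ms(LOG)
print(intervals)
```

As noted above, this only shows that the server asked for the pings to be sent; comparing it with an ngrep capture taken at the same time is what tells us whether they actually reached the wire.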

Once the problematic commit is identified, we then need to figure out what to change on the client side to make it all work again. (unless there really is something wrong server-side..)

comment:3 Changed 6 years ago by alas

Status: changed from new to assigned

Without changing anything client-side, but using a new Fedora build server-side that includes the debugging tools, the problem seems to have gone into hiding. (Trying a different server altogether also made the problem go into hiding.)

I'll leave the ticket open for a time to see if it reappears.

comment:4 Changed 6 years ago by Antoine Martin

Resolution: duplicate
Status: changed from assigned to closed

Fixed - see #308
