When using windows client 0.8.2 after losing connection to the network I can't connect to the running xpra session again
2013-02-12 00:47:50,665 New connection received: SocketConnection(('1.2.3.100', 1200) - ('1.2.3.4', 61569)) 2013-02-12 00:47:51,076 Handshake complete; enabling connection 2013-02-12 00:47:51,078 Python/Gtk2 win32 client version 0.8.2 connected from 'somecomputer' 2013-02-12 00:47:51,078 using x264 as primary encoding 2013-02-12 00:47:51,078 max client resolution is 1366x768 (from [[1366, 768]]), current server resolution is 1440x900 2013-02-12 00:47:51,079 best resolution matching 1366x768 is unchanged: 1440x900 2013-02-12 00:47:51,080 setting key repeat rate from client: 500ms delay / 33ms interval 2013-02-12 00:47:51,082 keyboard mapping already configured (skipped) 2013-02-12 00:47:51,236 using sound codec mp3 2013-02-12 00:47:51,315 starting sound using pulseaudio device null.monitor 2013-02-12 00:47:51,595 codec: MPEG-1 Layer 3 (MP3) 2013-02-12 00:49:03,835 delayed_region_timeout: sending now - something is wrong! 2013-02-12 00:49:18,901 delayed_region_timeout: sending now - something is wrong! 2013-02-12 00:49:33,960 delayed_region_timeout: sending now - something is wrong! 2013-02-12 00:49:49,005 delayed_region_timeout: sending now - something is wrong! 2013-02-12 00:49:49,072 get_backlog found some damage acks that have been pending for too long, expiring them: [600, 601, 602, 603] 2013-02-12 00:50:04,058 delayed_region_timeout: sending now - something is wrong! 2013-02-12 00:50:04,123 get_backlog found some damage acks that have been pending for too long, expiring them: [604] 2013-02-12 00:50:11,085 New connection received: SocketConnection(('1.2.3.100', 1200) - ('1.2.3.4', 61598)) 2013-02-12 00:50:11,458 Handshake complete; enabling connection 2013-02-12 00:50:11,459 Disconnecting existing client Protocol(SocketConnection(('1.2.3.100', 1200) - ('1.2.3.4', 61569))), reason is: new valid connection received, this session does not allow sharing 2013-02-12 00:50:31,370 unknown or invalid packet type: ping 2013-02-12 00:50:31,371 connection closed after 1 packets received (3382 bytes) and 0 packets sent (0 bytes)
Need more details please, as per wiki/ReportingBugs: full command lines, etc Which connection is the "good" one, etc.
Server OS is Fedora 18 x86_64 Client OS is Windows 7 Ultimate 64 bit
Instructions to reproduce
dbus-launch xpra --bind-tcp=0.0.0.0:1200 --no-daemon --start-child=chromium-browser start :11
C:\Program Files (x86)\Xpra>xpra_cmd attach tcp:1.2.3.100:1200
I've been unable to reproduce on a Fedora 18 host, here's how I tested:
xpra start --bind-tcp=0.0.0.0:10000 :10
xpra attach tcp:192.168.1.1:10000
iptables -I INPUT -p tcp -d 192.168.1.1 --dport 10000 -j DROP iptables -I OUTPUT -p tcp -s 192.168.1.1 --sport 10000 -j DROP
xpra attach tcp:127.0.0.1:10000
iptables -D INPUT 1 iptables -D OUTPUT 1
What I see is that the first client eventually times out (after about 60s, and it says so) and that the second client connects immediately without problems.
Can you confirm that you get the same results?
Now, the main difference between those two tests is the client (Fedora vs Windows), but since the connection is meant to have been severed (by wifi drop out in your test case or iptables rule in my example), I don't really see why this should make any difference at all.... Ideas welcome.
Are you using the UI on MS Windows? Can you try it from the command line instead? Can you test with a plain xterm instead of Chrome?
Also, please try a v0.7.8 client/server so we can figure out if this is a recent regression.
(fixed ticket metadata)
Looking at the log messages again and something looks suspicious:
unknown or invalid packet type: ping
This happens after the old dead connection has been dropped, so it must be dropping the new connection at this point?
But why would it? r2740 will log the source of the invalid packet so we will know for sure (so please update to r2740 or later)
Also, please try both with normal logging and with "-d all
" so we can get more details from the log messages.
comment:6 which has been deleted shows that you are running the server with --no-deamon
and via putty, this is the source of the problem - just do not do that!
Xpra is a server process, running it with --no-deamon
leaves it attached to stdin/stdout and those fds end up being network connections that you then kill - the process is then unable to close properly and the kernel will leave the sockets in dead state for a while and refuse to allow a new process to use them. The exact timeout after you kill -9
the process can be tuned via sysctl IIRC - but you should just never do that.
Alternatives: use screen, use nohup, don't use --no-deamon
, etc..
Please confirm so I can close this ticket as invalid. (always posting the exact command lines and setup environment does save a lot of time)
Please see comment:5, it would be good to figure out where this message came from.
Apart from that, I believe this bug can be closed as invalid.
As requested I tried some older clients I haven't tried older server but here are my findings.
Server Fedora 18 xpra-0.8.5-1.fc18.i386 Client Windows 7 ultimate 32 bit
Client command
Xpra_cmd.exe attach tcp:1.2.3.71:10000
Server command
xpra --no-daemon --bind-tcp=0.0.0.0:10000 --start-child=xterm start :12
Windows client 0.7.8 No problem 0.7.8 disconnects cleanly when it loses its connection
Windows client 0.8.7 2 tests
fails to disconnect cleanly xterm window remains open and non responsive after killing connection on the server end Need to kill task in the task manager to continue. After reconnecting here is the client log
xpra client version 0.8.7 2013-02-28 10:36:43,048 server ping timeout - waited 60 seconds without a respon se 2013-02-28 10:36:43,053 connection closed after 0 packets received (0 bytes) and 5 packets sent (3493 bytes) 2013-02-28 10:36:43,058 Connection lost
And here is the log on the server end
New connection received: SocketConnection(('1.2.3.71', 10000) - ('1.2.3.57', 49187)) Handshake complete; enabling connection Disconnecting existing client Protocol(SocketConnection(('1.2.3.71', 10000) - ('1.2.3.57', 49186))), reason is: new valid connection received, this session does not allow sharing unknown or invalid packet type: ping connection closed after 1 packets received (3373 bytes) and 0 packets sent (0 bytes)
As you can see that mysterious message is back but only with the newer version of the client.
How do you disconnect the existing client, by making a new connection to the server? I am confused because this ticket is originally about disconnection because of network dropouts, but the log you supply indicates a disconnection due to a new client connection - those are very different things. The former will rely on a timeout, which may take some time, the latter should be instantaneous.
Do you get the same behaviour if you Control-C the client?
Does this patch fix the issue?
--- src/xpra/server_source.py (revision 2850) +++ src/xpra/server_source.py (working copy) @@ -398,8 +398,10 @@ self.window_sources = {} self.window_metdata_cache = {} self.close_mmap() - self.protocol = None self.stop_sending_sound() + if self.protocol: + self.protocol.close() + self.protocol = None def user_event(self): self.last_user_event = time.time()
This should make sure we disconnect from the server end, though I would have thought that the client would realize the connection is dead within 1 minute or so as it doesn't get the ping reply.
Also, I don't see why it would help as the server code already does this:
protocol.flush_then_close(["disconnect", reason])
But maybe the write flushing spins endlessly - in which case we may need yet more code to deal with that.. (make it timeout eventually)
The server log is from the new connection from the client log right above it.
I was removing my network cable to test this and then plugging it back in after the time-out to reconnect.
We only stumbled upon it since there is one guy that routinely just shuts his laptop lid and never disconnects his client causing the same issue.
Afterwards the server won't accept any new connections at all.
Oddly enough I built a new server from source today and using the windows client 0.8.4 and 0.8.7 that you built I'm unable to reproduce the problem.
As soon as I connect one of those clients to an older server 0.8.5 32bit that you have built as well I can reproduce it right away.
I guess I will do some further testing and let you know what I find out.
It's a shame it's not very easy to reproduce, because otherwise we could bisect this..
(raising priority as this can cause serious problems for people)
Sorry it has taken so long to fix this, I was finding it hard to reproduce reliably, but in fact this was trivial using a VirtualBox
VM and using "save machine state".
Fixed in r2878, will backport to v0.8.x and release the bug fix.
Follow up:
as of r2894, the server will kill client connections when it does not get a response to its pings (default: ping every 10 seconds, kill after 1 minute)
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/257