Xpra: Ticket #1853: After connection timeout, Xpra-Launcher.exe does not terminate but stays as a zombie process

For quite a long time (since at least 2.1.3, r17250), and still with the newest 2.3 (r19246), I have been running into the following issue:

On Windows 10, when the connection to an xpra server drops, I get the usual message (TortoisePlink Fatal Error Network error: Software caused connection abort). I then quit the Xpra dialog (which invites me to reconnect), but Xpra-Launcher.exe stays in the task manager as a zombie process until I kill it manually.

Sometimes I use xpra so regularly that I end up with hundreds of zombies.

This effect does not show up when I manually quit the session via the context menu.
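
As a stopgap, leftover launcher processes can be listed and killed in bulk rather than one by one. The sketch below is not part of Xpra. It assumes Python 3 on the Windows client and the standard Windows tasklist and taskkill tools; the function names are made up for illustration.

```python
import csv
import io
import subprocess

IMAGE_NAME = "Xpra-Launcher.exe"


def parse_tasklist_csv(output, image_name=IMAGE_NAME):
    """Return the PIDs listed for image_name in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: image name, PID, session name, session number, memory
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def list_launcher_pids(image_name=IMAGE_NAME):
    """Ask tasklist (Windows only) for all processes with the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout, image_name)


def kill_pids(pids):
    """Force-terminate each PID with taskkill (Windows only)."""
    for pid in pids:
        subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=False)
```

Calling kill_pids(list_launcher_pids()) removes every Xpra-Launcher.exe process, including any that still belong to a live session, so it is best run only when no session is open.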

I start the session via an xpra file that has for example the following content:

username=user
encoding=rgb
ssh_port=22
speed=0
min-speed=0
host=terminal-server
min-quality=30
mode=ssh
ssh=plink -noagent -i c:\\Users\\user\\AppData\\Roaming\\Xpra\\pubkey.ppk
opengl=yes
password=
quality=0
port=2012
autoconnect=True
speaker=off
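
To check what a session file like the one above actually configures (for instance, which ssh command it names), the key=value lines can be read into a dictionary. This is a minimal sketch, not Xpra's own parser: parse_session_file is a hypothetical helper, and skipping blank lines and '#' comments is an assumption about the file format.

```python
def parse_session_file(text):
    """Parse key=value lines into a dict (illustrative, not Xpra's parser)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no settings
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        options[key.strip()] = value.strip()
    return options


# A few lines taken from the session file shown above
SAMPLE = r"""username=user
host=terminal-server
mode=ssh
ssh=plink -noagent -i c:\\Users\\user\\AppData\\Roaming\\Xpra\\pubkey.ppk
password=
autoconnect=True
"""

options = parse_session_file(SAMPLE)
ssh_program = options["ssh"].split()[0]  # first word of the ssh command
```

Here ssh_program is the bare name plink without a directory, so which executable actually runs depends on how the client looks it up.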

In case it helps, I have attached screenshots of Process Explorer showing the threads and the stack of such a zombie process. NtUserMsgWaitForMultipleObjectsEx suggests to me that it is waiting for something (i.e. feedback from the connection) that never arrives.



Thu, 24 May 2018 02:57:05 GMT - Lukas Haase: attachment set


Thu, 24 May 2018 02:57:14 GMT - Lukas Haase: attachment set


Thu, 24 May 2018 02:57:22 GMT - Lukas Haase: attachment set


Thu, 24 May 2018 04:59:03 GMT - Antoine Martin: status, description changed; milestone set

Thanks for the detailed bug report, I'll try to reproduce. The NtUserMsgWaitForMultipleObjectsEx is somewhere in glib, so we'll have to find a way to cancel that or just not get into that situation in the first place.
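
One generic way to avoid waiting forever on exit is to put a deadline on the cleanup step. The sketch below only illustrates that idea with Python's threading module; it is not how Xpra handles it, and quit_with_deadline and its timeout value are hypothetical.

```python
import threading


def quit_with_deadline(cleanup, timeout=5.0):
    """Run cleanup() in a daemon thread; return True if it finished in time."""
    done = threading.Event()

    def runner():
        try:
            cleanup()
        finally:
            # Signal completion even if cleanup() raised
            done.set()

    # A daemon thread does not keep the interpreter alive if it stays blocked
    threading.Thread(target=runner, daemon=True).start()
    # Event.wait returns False when the timeout expires first
    return done.wait(timeout)
```

If the function returns False, the caller can log the problem and end the process explicitly (for example with os._exit) instead of leaving it blocked.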


Thu, 24 May 2018 15:54:18 GMT - Antoine Martin: owner, status changed

Should be fixed in r19412. r19390 is also worth having.

AFAICT, this bug has been present forever.

Please try one of the latest beta 2.4 builds from https://xpra.org/beta/windows. If the fix works for you, this can be included in the 2.3.1 release.


Sun, 27 May 2018 06:47:17 GMT - Lukas Haase:

Antoine, thanks for working on this so quickly. It's a pleasure how much effort is put into Xpra!

Just a quick question: What are the differences between the files at https://xpra.org/beta/windows/ ?

My assumptions:
1.) Xpra_Setup_2.4-r19414M.exe = Xpra_Setup.exe
2.) Xpra_2.4-r19414M.zip = Xpra.zip
3.) Xpra.zip: no installer

What's left:
1.) Xpra vs. Xpra-Client?
2.) Xpra vs. Xpra-python3 (and Xpra-Client vs. Xpra-Client-Python3)?

On the topic: Can it be that something more profound changed for this beta? When I open my connection via xpra file, as usual, I get back to the connection window with the error: "SSH connection failure". When I try from the command line:

Xpra_cmd.exe attach ssh:me@server.edu:1985
2018-05-26 23:29:47,112 Xpra gtk2 client version 2.3-r19246 64-bit
2018-05-26 23:29:47,115  running on Microsoft Windows 10
2018-05-26 23:29:47,354 GStreamer version 1.14.0 for Python 2.7.15 64-bit
2018-05-26 23:29:47,518 OpenGL_accelerate module loaded
2018-05-26 23:29:47,532 Using accelerated ArrayDatatype
2018-05-26 23:29:47,885 Warning: vendor 'Intel' is greylisted,
2018-05-26 23:29:47,886  you may want to turn off OpenGL if you encounter bugs
2018-05-26 23:29:47,940 OpenGL enabled with Intel(R) HD Graphics 4000
2018-05-26 23:29:47,956  desktop size is 1366x768 with 1 screen:
2018-05-26 23:29:47,956   Default (361x203 mm - DPI: 96x96) workarea: 1366x738
2018-05-26 23:29:47,957     DISPLAY1 (277x156 mm - DPI: 125x125)
2018-05-26 23:29:47,967  keyboard settings: layout=us
2018-05-26 23:29:48,626 Error: failed to receive anything, not an xpra server?
2018-05-26 23:29:48,628   could also be the wrong protocol, username, password or port
2018-05-26 23:29:48,631 Connection lost

Server says:

$ xpra --version
xpra v1.0.2-r14941

(I can imagine that the server might be pretty old but it has to run on an old RedHat machine. Besides, I don't have root rights)


Sun, 27 May 2018 08:33:48 GMT - Antoine Martin:

My assumptions: (..)

Correct.

More details can be found here: wiki/Download.

On the topic: Can it be that something more profound changed for this beta? When I open my connection via xpra file, as usual, I get back to the connection window with the error: "SSH connection failure".

The only change related to SSH connections is this fix: r19430. It shouldn't have any effect if you don't specify a password to use, and the build number you are showing doesn't have this change:

2018-05-26 23:29:47,112 Xpra gtk2 client version 2.3-r19246 64-bit

This looks like a standard 2.3 release build, not the latest beta, which should be 2.4-r19414 or later; r19487 was added to the repository today.
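
The build check above can also be done mechanically by reading the version and revision out of the banner line. This is a rough sketch: the regular expression is based only on the banner format shown in this ticket, and parse_banner and has_fix are hypothetical names. Comparing revision numbers is only meaningful for trunk (beta) builds; a release-branch build with a higher revision contains the fix only if it was backported.

```python
import re

# Matches e.g. "version 2.3-r19246"; a trailing "M" after the revision is ignored
BANNER_RE = re.compile(r"version (\d+(?:\.\d+)*)-r(\d+)")


def parse_banner(line):
    """Return (version, revision) from a client banner line, or None."""
    match = BANNER_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def has_fix(line, fixed_in=19412):
    """True if the banner's revision is at least the revision of the fix."""
    parsed = parse_banner(line)
    return parsed is not None and parsed[1] >= fixed_in


banner = "2018-05-26 23:29:47,112 Xpra gtk2 client version 2.3-r19246 64-bit"
version, revision = parse_banner(banner)
```

For the banner quoted above, has_fix(banner) is False, which matches the observation that this is the 2.3 release build rather than the beta.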

To debug SSH connection issues, you can set the environment variable: XPRA_SSH_DEBUG=1.

xpra v1.0.2-r14941

The current 1.x version is 1.0.11, there have been many critical bug fixes since 1.0.2 (crashes, etc)... use at your own risk!


Sun, 27 May 2018 08:56:30 GMT - Lukas Haase:

To debug SSH connection issues, you can set the environment variable: XPRA_SSH_DEBUG=1.

set XPRA_SSH_DEBUG=1 before calling Xpra_cmd.exe does not do anything for me, unfortunately.

However, I started now:

xpra_cmd -d all attach ssh/me@server/1985

Does this help at all?

[...]
2018-05-27 01:47:03,332 callbacks for event WM_DWMNCRENDERINGCHANGED: None
2018-05-27 01:47:03,333 WM_DWMNCRENDERINGCHANGED: 1 / 0
2018-05-27 01:47:03,340 DefWindowProc(12981958, 799L, 1L, 0)=0
2018-05-27 01:47:03,341 NotifyIconWndProc(22877740, 799L, 1L, 0) instance=<xpra.platform.win32.win32_NotifyIcon.win32NotifyIcon object at 0x09fcb690>, message(799)=None
2018-05-27 01:47:03,342 io_thread_loop(read, <bound method Protocol._read of Protocol(Pipe(ssh://me@server/:1985))>) loop starting
2018-05-27 01:47:03,737 read_parse_thread_loop starting
2018-05-27 01:47:03,738 read thread: eof
2018-05-27 01:47:03,744 parse thread: empty marker, exiting
2018-05-27 01:47:03,745 io_thread_loop(read, <bound method Protocol._read of Protocol(Pipe(ssh://me@server/:1985))>) loop ended, closed=False
2018-05-27 01:47:03,746 Protocol.close() closed=False, connection=Pipe(ssh://me@server/:1985)
2018-05-27 01:47:03,748 Protocol.close() calling <bound method TwoFileConnection.close of Pipe(ssh://me@server/:1985)>
2018-05-27 01:47:03,749 Pipe(ssh://me@server/:1985).close() close callback=<function stop_tunnel at 0x0a051d30>, readable=<open file '<fdopen>', mode 'rb' at 0x0a003e90>, writeable=<open file '<fdopen>', mode 'wb' at 0x0a003ee8>
2018-05-27 01:47:03,749 Pipe(ssh://me@server/:1985).close() calling <function stop_tunnel at 0x0a051d30>
2018-05-27 01:47:03,750 Pipe(ssh://me@server/:1985).close() done
2018-05-27 01:47:03,754 terminate_queue_threads()
2018-05-27 01:47:03,755 write thread: empty marker, exiting
2018-05-27 01:47:03,755 Protocol.close() done
2018-05-27 01:47:03,756 Protocol.close() closed=True, connection=None
2018-05-27 01:47:03,757 check_server_echo(0) last=True, server_ok=True (last_ping_echoed_time=0)
2018-05-27 01:47:03,757 io_thread_loop(write, <bound method Protocol._write of Protocol(None)>) loop ended, closed=True
2018-05-27 01:47:03,758 Error: failed to receive anything, not an xpra server?
2018-05-27 01:47:03,759   could also be the wrong protocol, username, password or port
2018-05-27 01:47:03,760 Connection lost
2018-05-27 01:47:03,760 GTKXpraClient.quit(1) current exit_code=None
2018-05-27 01:47:03,761 UIXpraClient.cleanup()
[...]

The current 1.x version is 1.0.11, there have been many critical bug fixes since 1.0.2 (crashes, etc)... use at your own risk!

I just saw we're using the winswitch.repo (http://winswitch.org/dists/CentOS). Maybe a yum update or so is sufficient. I'll check with my sysadmin. (System is CentOS 6.8)


Sun, 27 May 2018 09:10:08 GMT - Antoine Martin:

set XPRA_SSH_DEBUG=1 before calling Xpra_cmd.exe does not do anything for me, unfortunately.

This will log the ssh command string used, but you've omitted most of the logs, so it cannot be seen.

Does this help at all? (..)

No, the ssh connection bits would be near the top, which is missing.
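
To make sure nothing is cut off next time, the whole output can be written to a file with the debug variable set for that one run. This is a minimal sketch assuming Python 3 is available on the client; debug_env and run_and_capture are hypothetical helper names, and only the XPRA_SSH_DEBUG variable and the attach command come from this ticket.

```python
import os
import subprocess


def debug_env(base=None):
    """Copy the environment and enable XPRA_SSH_DEBUG for the child process."""
    env = dict(os.environ if base is None else base)
    env["XPRA_SSH_DEBUG"] = "1"
    return env


def run_and_capture(command, log_path):
    """Run command with SSH debugging on; write stdout and stderr to log_path."""
    with open(log_path, "wb") as log:
        proc = subprocess.run(
            command,
            env=debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,  # keep both streams in one ordered file
        )
    return proc.returncode
```

For example, run_and_capture(["Xpra_cmd.exe", "attach", "ssh:me@server.edu:1985"], "xpra-attach.log") leaves the complete log, including the first lines, in xpra-attach.log so it can be attached to the ticket unabridged.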

Maybe a yum update or so is sufficient.

It should be enough. There are some occasional dependency issues with older distros like CentOS 6.8 (e.g. ffmpeg rpm soname version conflicts), let me know if you hit any.


Mon, 25 Jun 2018 08:56:50 GMT - Antoine Martin: status changed; resolution set


Sat, 30 Jun 2018 06:23:18 GMT - Lukas Haase:

After some debugging, I think I have tracked it down: there seems to be a difference in how the path is resolved and hence in which plink.exe is used.

When I just enter plink.exe in the command prompt, I get TortoisePlink, Release 0.68. Strangely enough, this one does not work: when I execute plink user@server, it returns immediately.

It seems that the old xpra version always used the plink.exe that's shipped with it (in the same directory) whereas the new one uses the one in the path first which is fairly strange! I will create a new bug report for this one.
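
A quick way to see which plink a bare command name would pick up, and whether a copy sits next to the Xpra binaries, is to compare the PATH lookup with the install directory. The sketch below uses Python's shutil.which and is only an approximation of what the client does: resolve_plink is a hypothetical helper, and the install directory has to be supplied by the user.

```python
import os
import shutil


def resolve_plink(bundled_dir, path=None):
    """Return (path_hit, bundled_hit) for plink.

    path_hit:    what a PATH search for "plink" finds (None if nothing)
    bundled_hit: plink.exe inside bundled_dir, if that file exists
    """
    # path=None makes shutil.which use the PATH environment variable
    path_hit = shutil.which("plink", path=path)
    bundled = os.path.join(bundled_dir, "plink.exe")
    bundled_hit = bundled if os.path.isfile(bundled) else None
    return path_hit, bundled_hit
```

Calling resolve_plink with the Xpra install directory shows both candidates side by side. If path_hit points at the TortoiseSVN or TortoiseGit directory while bundled_hit points at the Xpra copy, the two differ, and an absolute path in the ssh= option avoids the ambiguity.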

In any case, with the newest version (2.3.2-r19729), it seems that this is fixed, at least in the tests I have run so far. If it appears again, I will re-open this ticket.


Sat, 30 Jun 2018 07:02:19 GMT - Antoine Martin:

It seems that the old xpra version always used the plink.exe that's shipped with it (in the same directory) whereas the new one uses the one in the path first which is fairly strange! I will create a new bug report for this one.

New ticket: #1892


Sat, 23 Jan 2021 05:35:31 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1853