Hello,
When killing the server with SIGINT, the socket isn't removed, so the next run of the server prints the following:
/home/arthur/.xpra/Chani-1 is not responding, waiting for it to timeout before clearing it.....
It also doesn't kill X, so the Xdummy process remains, and the server doesn't start.
Trimmed output when killed with control-C:
got signal SIGINT, exiting
Tray.cleanup()
Tray.cleanup() done
failed to release dbus notification forwarder: \
 org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
 Possible causes include: the remote application did not send a reply, \
 the message bus security policy blocked the reply, \
 the reply timeout expired, or the network connection was broken.
cleanup will disconnect: []
Cannot reproduce.
This is what I see when I use "xpra stop" to stop the session (this solution is preferred to SIGINT):
New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
Handshake complete; enabling connection
Python/GObject Linux client version 0.10.0 connected from 'desktop'
windows/pixels forwarding is disabled for this client
max client resolution is 0x0 (from []), current server resolution is 2560x1600
Shutting down in response to request
Disconnecting existing client Protocol(SocketConnection(/home/antoine/.xpra/desktop-10)), reason is: shutting down
connection closed after 2 packets received (577 bytes) and 2 packets sent (1101 bytes)
xpra client disconnected.
Connection lost
Connection lost
New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9346
Server terminated successfully (0).
Closing log file.
TIL:
removing socket /home/antoine/.xpra/desktop-10
And now with killall -SIGINT xpra:
got signal SIGINT, exiting
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9939
Server terminated successfully (0).
Closing log file.
This also worked fine. Using SIGTERM has the same effect.
I've tried both with and without --no-daemon.
And now with control-C from the controlling terminal:
^C
got signal SIGINT, exiting
child 'xterm' with pid 10181 has terminated
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 10172
Server terminated successfully (0).
Closing log file.
The only ways I can get the server to not clean up its socket are to use SIGKILL or to kill the vfb from underneath it. Neither is something we can handle gracefully anyway, so just don't do that.
Here is the command line I have used for most of this testing:
dbus-launch xpra start :10 --no-daemon --no-pulseaudio
(adding/removing dbus and pulseaudio does not seem to make any difference)
I am only adding dbus-launch because of the warning in your logs. It looks to me like you are running from a desktop session and should either be using dbus-launch (if you aren't already) or --no-notifications.
Yes, I need --no-notifications, which I forgot to add on the command line. I don't think that changes my problem, which I can see on both Arch Linux and CentOS 6.2.
OK, I simply cannot reproduce with Fedora 19 (python2.7), but with CentOS 6.4 I can (maybe related to the old version of python?): about one in 10 attempts fails to run the cleanups on exit (sometimes repeatedly, sometimes impossible to reproduce; I cannot discern any patterns here yet).
Do you have steps that make it more likely to trigger this bug? (It is hard to fix without being able to reproduce it reliably, and even harder to confirm a fix should I find one to test.) I've tried with a client connected and without, with apps and without, etc. Is trunk also affected?
When we exit on a "deadly signal", we schedule the actual quit() call to run from a timer soon after (0.5s in trunk), so that we still get a chance to run the cleanups that need a working main loop to complete or just a bit of time to do what they needed (sending a "shutdown" packet to clients).
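The deferred exit described above can be sketched roughly as follows. This is a minimal illustration and not xpra's actual code: the class name `DeferredQuit`, its methods and attributes, and the use of `threading.Timer` in place of a main-loop timer are all assumptions made for the example.

```python
import signal
import threading


class DeferredQuit:
    """Hypothetical helper: on a deadly signal, run cleanups after a short delay."""

    def __init__(self, delay=0.5):
        self.delay = delay                  # 0.5s mirrors the trunk value quoted above
        self.cleanups = []                  # callables to run before exiting
        self.log = []                       # messages recorded for inspection
        self.finished = threading.Event()   # set once all cleanups have been attempted

    def install(self):
        # signal.signal() may only be called from the main thread.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.on_signal)

    def on_signal(self, signum, frame=None):
        self.log.append("got signal %s, exiting" % signal.Signals(signum).name)
        # Do not quit right here: schedule quit() so that pending work
        # (for example notifying clients) still gets a moment to complete.
        threading.Timer(self.delay, self.quit).start()

    def quit(self):
        for cleanup in self.cleanups:
            try:
                cleanup()
            except Exception as e:
                # One failing cleanup must not prevent the remaining ones.
                self.log.append("cleanup failed: %s" % e)
        self.finished.set()
```

In this sketch a failing cleanup is logged and skipped, so later cleanups (removing a socket, stopping a child process) still run. Whether xpra isolates its cleanups this way is not stated in this thread.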
Please consider using xpra stop instead of SIGINT, as this does not seem to be affected.
Does this patch help:
### Eclipse Workspace Patch 1.0
#P Xpra
Index: src/xpra/scripts/server.py
===================================================================
--- src/xpra/scripts/server.py (revision 3587)
+++ src/xpra/scripts/server.py (working copy)
@@ -396,7 +396,8 @@
         if os.name=="posix":
             os.setsid()
     try:
-        xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True, preexec_fn=setsid)
+        xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True,
+                                stdin=subprocess.PIPE, preexec_fn=setsid)
     except OSError, e:
         sys.stderr.write("Error starting Xvfb: %s\n" % (e,))
         return 1
According to the answers to this question, it isn't easy to ensure that the subprocess will not be receiving the SIGINT - this was the easiest way I could find.
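For readers unfamiliar with the technique: a Ctrl-C in the terminal delivers SIGINT to the foreground process group, so a child started in its own session, with stdin detached from the terminal, does not receive it. A minimal Python 3 sketch, assuming a POSIX system; `start_in_own_session` is a hypothetical helper name, and `start_new_session=True` is used here as the standard-library equivalent of passing `preexec_fn=os.setsid`:

```python
import os
import subprocess
import sys


def start_in_own_session(cmd):
    """Start cmd as a session leader, with stdin on a pipe instead of the terminal."""
    return subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,      # detach stdin from the controlling terminal
        close_fds=True,
        start_new_session=True,     # the child calls setsid() before exec (POSIX only)
    )


if __name__ == "__main__":
    # The child just blocks on stdin until the parent closes the pipe.
    child = start_in_own_session([sys.executable, "-c", "import sys; sys.stdin.read()"])
    print("parent session:", os.getsid(0), "child session:", os.getsid(child.pid))
    child.stdin.close()
    child.wait()
```

Since the child no longer gets the terminal's SIGINT, the parent becomes responsible for stopping it explicitly during its own shutdown.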
More good pointers:
Does this solve the problem?
Added to trunk as of r3597 as this seems to help us exit cleanly with XShm enabled.
Also doing the same thing for pulseaudio and child processes (--start-child=) in r3651.
Dbus seems to be to blame. I don't use Dbus at all. With --no-notifications, the problem disappears, but without the option I believe the following message prevents Xpra from cleaning up properly.
2013-07-12 10:12:42,365 failed to release dbus notification forwarder: \
 org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
 Possible causes include: the remote application did not send a reply, \
 the message bus security policy blocked the reply, the reply timeout expired, \
 or the network connection was broken.
This is a pity, I believe you should be able to detect when Dbus is not available and handle it gracefully, should you not?
I don't see how we can fail to clean up the notification forwarder without first having succeeded in loading it, which even includes talking to the dbus daemon to claim the dbus name. Are you sure you don't have a dbus daemon? (Maybe from your desktop session?) You may be able to verify the status of the forwarder with "xpra info" (not sure which versions support that). Or you can run a simple notification program from an xterm to see if the notifications are being grabbed somewhere: if no dbus handler exists, the test should fail.
Specifying --no-notifications prevents loading of the forwarder altogether.
Without a dbus instance, you should see on startup:
error loading or registering our dbus notifications forwarder: (..)
Does r3841 help?
It does not (with ctrl-C)
Output:
2013-07-12 11:16:51,595 xpra is ready.
^C2013-07-12 11:16:52,625
2013-07-12 11:16:52,626 got signal SIGINT, exiting
Then when restarting:
/home/arthur/.xpra/Gurney-1 is not responding, waiting for it to timeout before clearing it.....
(EE) Fatal server error:
(EE) Server is already active for display 1
If this server is no longer running, remove /tmp/.X1-lock
(because Xpra didn't kill the server)
With xpra stop, with or without r3841, killing seems to be done properly.
Please confirm if r3841 helps with ctrl-C so that I know if it helps at all. (and if not, maybe take it out)
No, it does not help, see first line of comment 11.
I'm seeing similar issues: lots of zombie Xvfb processes. I also use Ctrl+C to terminate xpra; Xvfb terminates properly when I use xpra stop.
Debian Lenny, xpra 0.9.0, Python 2.6.6 (marked as outdated in the xpra log: Warning: outdated/buggy version of Python: 2.6.6.final.0)
Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.
As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.
Replying to totaam:
Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.
Sorry, it's not Lenny but Squeeze. Had to update GPG key for APT source winswitch.org. Now running 0.9.6-1, same behavior. (Why is 0.9.8-1 available from winswitch.org for Raring but not for Squeeze?)
As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.
Python 2.7 will be available after upgrading the server to Wheezy. I can live with an occasional killall Xvfb until then.
Had to update GPG key for APT source winswitch.org
See KEYEXPIRED 1273837137 in wiki/FAQ.
Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring
I stopped making packages for squeeze as it was becoming far too difficult to build anything with so many outdated packages in the distro.
Python 2.7 will be available after upgrading the server to Wheezy
Wheezy gets a lot more testing - and has up to date packages.
I am closing this ticket as FIXED, feel free to re-open if you see this problem with up to date packages.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/352