#352 closed defect (fixed)
Killing server with SIGINT doesn't remove socket, doesn't kill Xvfb/Xdummy
Reported by: | ahuillet | Owned by: | ahuillet |
---|---|---|---|
Priority: | major | Milestone: | 0.10 |
Component: | server | Version: | |
Keywords: | Cc: |
Description
Hello,
when killing the server with SIGINT, the socket isn't removed, so the next run of the server prints the following:
/home/arthur/.xpra/Chani-1 is not responding, waiting for it to timeout before clearing it.....
It also doesn't kill X, so the Xdummy process remains, and the server doesn't start.
Trimmed output when killed with control-C:
got signal SIGINT, exiting
Tray.cleanup()
Tray.cleanup() done
failed to release dbus notification forwarder: \
  org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
  Possible causes include: the remote application did not send a reply, \
  the message bus security policy blocked the reply, \
  the reply timeout expired, or the network connection was broken.
cleanup will disconnect: []
Change History (19)
comment:1 Changed 8 years ago by
Description: | modified (diff) |
---|---|
Milestone: | → 0.10 |
Owner: | changed from Antoine Martin to Antoine Martin |
Status: | new → assigned |
comment:2 Changed 8 years ago by
Description: | modified (diff) |
---|---|
Owner: | changed from Antoine Martin to ahuillet |
Status: | assigned → new |
comment:3 Changed 8 years ago by
Yes, I need --no-notifications, which I forgot to add on the command line. I don't think that changes my problem, which I can see on both Arch Linux and CentOS 6.2.
comment:4 Changed 8 years ago by
OK, I simply cannot reproduce with Fedora 19 (python2.7), but with CentOS 6.4 I can (maybe related to the old version of python?): about one in 10 attempts fails to run the cleanups on exit (sometimes repeatedly, sometimes impossible to reproduce - I cannot discern any pattern here yet).
Do you have steps that make it more likely to trigger this bug? (It is hard to fix without being able to reproduce reliably, and even harder to confirm a fix should I find one to test.)
I've tried with a client connected and without, with apps and without, etc.
Is trunk also affected?
When we exit on a "deadly signal", we schedule the actual quit() call to run from a timer soon after (0.5s in trunk), so that we still get a chance to run the cleanups that need a working main loop to complete or just a bit of time to do what they needed (sending a "shutdown" packet to clients).
Please consider using xpra stop instead of SIGINT, as this does not seem to be affected.
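The deferred-quit approach described above can be sketched roughly as follows. This is only an illustration, not xpra's actual code: the `DeferredQuit` class and its method names are hypothetical, and `threading.Timer` stands in for the main-loop timer the server uses. The point it shows is that the signal handler only schedules the quit, and that one failing cleanup (for example a dbus call that gets no reply) should not stop the remaining cleanups from running.

```python
import signal
import threading


class DeferredQuit:
    """Hypothetical helper: run cleanups shortly after a deadly signal."""

    def __init__(self, cleanups, delay=0.5):
        self.cleanups = list(cleanups)
        self.delay = delay
        self.done = threading.Event()
        self._scheduled = False

    def handle_signal(self, signum, frame=None):
        # Do as little as possible here: only schedule the real quit,
        # and only once, even if the signal is delivered repeatedly.
        if self._scheduled:
            return
        self._scheduled = True
        timer = threading.Timer(self.delay, self.quit)
        timer.daemon = True
        timer.start()

    def quit(self):
        for cleanup in self.cleanups:
            try:
                cleanup()
            except Exception as e:
                # A failing cleanup must not prevent the later ones
                # (socket removal, stopping the vfb) from running.
                print("cleanup failed: %s" % (e,))
        self.done.set()

    def install(self):
        # Route both deadly signals through the same deferred path.
        signal.signal(signal.SIGINT, self.handle_signal)
        signal.signal(signal.SIGTERM, self.handle_signal)
```

In a real server, the main loop would keep running during the delay so that cleanups needing it (such as sending a "shutdown" packet to clients) can complete.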
comment:5 Changed 8 years ago by
Does this patch help:
### Eclipse Workspace Patch 1.0
#P Xpra
Index: src/xpra/scripts/server.py
===================================================================
--- src/xpra/scripts/server.py	(revision 3587)
+++ src/xpra/scripts/server.py	(working copy)
@@ -396,7 +396,8 @@
         if os.name=="posix":
             os.setsid()
     try:
-        xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True, preexec_fn=setsid)
+        xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True,
+                                stdin=subprocess.PIPE, preexec_fn=setsid)
     except OSError, e:
         sys.stderr.write("Error starting Xvfb: %s\n" % (e,))
         return 1
According to the answers to this question, it isn't easy to ensure that the subprocess will not receive the SIGINT - this was the easiest way I could find.
More good pointers:
Does this solve the problem?
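For reference, here is a minimal sketch of the same idea in Python 3, where `start_new_session=True` (available since Python 3.2) replaces `preexec_fn=setsid`. The helper names `start_detached` and `stop_child` are made up for this example and are not part of xpra. The child is moved into its own session, so the SIGINT that control-C sends to the terminal's foreground process group does not reach it, and the parent becomes responsible for stopping it explicitly.

```python
import subprocess


def start_detached(cmd):
    # start_new_session=True makes the child call setsid() before exec
    # (POSIX only), so it leaves the terminal's process group and is not
    # hit by the SIGINT generated by control-C.
    return subprocess.Popen(cmd, stdin=subprocess.PIPE, close_fds=True,
                            start_new_session=True)


def stop_child(proc, timeout=5):
    # Since the child no longer sees control-C, the parent has to
    # terminate it as part of its own cleanup.
    if proc.stdin:
        proc.stdin.close()
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

The trade-off is that if the parent exits without calling something like `stop_child`, the detached process is left running, which matches the leftover Xvfb/Xdummy symptom in this ticket.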
comment:6 Changed 8 years ago by
Added to trunk as of r3597 as this seems to help us exit cleanly with XShm enabled.
comment:7 Changed 8 years ago by
Also doing the same thing for pulseaudio and child processes (--start-child=) in r3651
comment:8 Changed 8 years ago by
Dbus seems to be to blame. I don't use Dbus at all.
With --no-notifications, the problem disappears, but without the option I believe the following message prevents Xpra from cleaning up properly.
2013-07-12 10:12:42,365 failed to release dbus notification forwarder: \
  org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
  Possible causes include: the remote application did not send a reply, \
  the message bus security policy blocked the reply, the reply timeout expired, \
  or the network connection was broken.
This is a pity, I believe you should be able to detect when Dbus is not available and handle it gracefully, should you not?
comment:9 Changed 8 years ago by
I don't see how we can fail to cleanup the notification forwarder without first having succeeded in loading it up, which even includes talking to the dbus daemon to claim the dbus name. Are you sure you don't have a dbus daemon? (maybe from your desktop session?)
You may be able to verify the status of the forwarder with "xpra info" (not sure which versions support that)
Or you can also run a simple notification program from an xterm to see if the notifications are being grabbed somewhere - if no dbus handler exists, the test should fail.
Specifying --no-notifications prevents loading of the forwarder altogether.
Without a dbus instance, you should see on startup:
error loading or registering our dbus notifications forwarder: (..)
comment:11 Changed 8 years ago by
It does not (with ctrl-C)
Output:
2013-07-12 11:16:51,595 xpra is ready.
^C2013-07-12 11:16:52,625
2013-07-12 11:16:52,626 got signal SIGINT, exiting
Then when restarting:
/home/arthur/.xpra/Gurney-1 is not responding, waiting for it to timeout before clearing it.....
(EE)
Fatal server error:
(EE) Server is already active for display 1
	If this server is no longer running, remove /tmp/.X1-lock
(because Xpra didn't kill the server)
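The restart failure above comes from a socket file left behind by a server that is no longer listening. As a rough sketch of how such a leftover socket could be told apart from a live one (this is not xpra's own detection logic, and `socket_state` and `clear_if_dead` are hypothetical names), one can try to connect to it and treat a refused connection as "dead":

```python
import os
import socket


def socket_state(path, timeout=1.0):
    """Return 'missing', 'live', 'dead' or 'unknown' for a unix socket path."""
    if not os.path.exists(path):
        return "missing"
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
    except ConnectionRefusedError:
        # The file exists but nothing is listening on it any more.
        return "dead"
    except OSError:
        # Timeouts, permission errors, etc.: do not guess.
        return "unknown"
    else:
        return "live"
    finally:
        s.close()


def clear_if_dead(path):
    # Only remove the file when we are sure no server owns it.
    if socket_state(path) == "dead":
        os.unlink(path)
        return True
    return False
```

A socket that accepts the connection but never answers would still count as "live" here, so this check does not replace the timeout the server applies before clearing an unresponsive socket.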
comment:12 Changed 8 years ago by
With xpra stop, with or without r3841, killing seems to be done properly.
comment:13 Changed 8 years ago by
Please confirm if r3841 helps with ctrl-C so that I know if it helps at all.
(and if not, maybe take it out)
comment:15 Changed 8 years ago by
I'm seeing similar issues: lots of zombie Xvfb processes. I also use Ctrl+C to terminate xpra; the Xvfb terminates properly when I use xpra stop.
Debian Lenny, xpra 0.9.0, Python 2.6.6 (marked as outdated in the xpra log: Warning: outdated/buggy version of Python: 2.6.6.final.0)
comment:16 follow-up: 17 Changed 8 years ago by
Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.
As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.
comment:17 Changed 8 years ago by
Replying to totaam:
Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.
Sorry, it's not Lenny but Squeeze. Had to update GPG key for APT source winswitch.org. Now running 0.9.6-1, same behavior. (Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring?)
As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.
Python 2.7 will be available after upgrading the server to Wheezy. I can live with an occasional killall Xvfb until then.
comment:18 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Had to update GPG key for APT source winswitch.org
See KEYEXPIRED 1273837137 in wiki/FAQ
Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring
I stopped making packages for squeeze as it was becoming far too difficult to build anything with so many outdated packages in the distro.
Python 2.7 will be available after upgrading the server to Wheezy
Wheezy gets a lot more testing - and has up to date packages.
I am closing this ticket as FIXED, feel free to re-open if you see this problem with up to date packages.
comment:19 Changed 5 weeks ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/352
Cannot reproduce.
This is what I see when I use "xpra stop" to stop the session (this solution is preferred to SIGINT):
And now with killall -SIGINT xpra:
This also worked fine. Using SIGTERM has the same effect.
I've tried both with and without --no-daemon.
And now with control-C from the controlling terminal:
The only way that I can get the server to not clean up its socket is to use SIGKILL or to kill the vfb from underneath it, which is not something we can handle gracefully anyway, so just don't do that.
Here is the command line I have used for most of this testing:
(adding/removing dbus and pulseaudio does not seem to make any difference)
I am only adding dbus-launch because of the warning in your logs. It looks to me like you are running from a desktop session and should either be using dbus-launch (if you aren't already) or --no-notifications.