Xpra: Ticket #912: "Too many open files" caused by 0.14 clients with trunk (0.16) servers, after many sound restarts

My OSX clients running in vbox get lots of sound restarts, and each restart makes the server start a new sound subprocess, eventually leading to:

2015-07-08 11:22:39,408 error setting up sound: [Errno 24] Too many open files
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 799, in start_sending_sound
  File "/usr/lib64/python2.7/site-packages/xpra/sound/wrapper.py", line 192, in start
  File "/usr/lib64/python2.7/site-packages/xpra/net/subprocess_wrapper.py", line 313, in start
    self.process = self.exec_subprocess()
  File "/usr/lib64/python2.7/site-packages/xpra/net/subprocess_wrapper.py", line 338, in exec_subprocess
    proc = subprocess.Popen(self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=sys.stderr.fileno(), env=self.get_env(), **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 702, in __init__
    errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.7/subprocess.py", line 1130, in _get_handles
    c2pread, c2pwrite = self.pipe_cloexec()
  File "/usr/lib64/python2.7/subprocess.py", line 1175, in pipe_cloexec
    r, w = os.pipe()
OSError: [Errno 24] Too many open files

This may also cause other weird behaviour as we run out of resources.

Relevant links:

The subprocess should be garbage collected after we call terminate() and cause a SIGINT: the child reaper has code to explicitly "forget" about the dead process.

Wed, 08 Jul 2015 06:52:05 GMT - Antoine Martin: status changed

I can see the leak simply by keeping an eye on:

watch "ls -la /proc/19763/fd  | wc -l"

It goes up by 8 file descriptors with every restart!
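
The same check can be scripted instead of watching `ls` output. This is only a minimal sketch, assuming a Linux `/proc` filesystem; the helper name `count_open_fds` is made up for illustration and is not part of xpra. Note that `ls -la | wc -l` also counts the "total", "." and ".." lines, so only the change between two readings is meaningful, not the absolute value.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of file descriptors open in a process (Linux only).

    `pid` can be a numeric pid or "self" for the current process.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor.
    # When inspecting our own process, the listing also includes the
    # descriptor used to read the directory itself, so compare deltas
    # between two readings rather than trusting the absolute value.
    return len(os.listdir("/proc/%s/fd" % pid))


if __name__ == "__main__":
    before = count_open_fds()
    r, w = os.pipe()  # a pipe costs two descriptors
    after = count_open_fds()
    print("descriptors gained by one pipe: %i" % (after - before))
    os.close(r)
    os.close(w)
```

Calling `count_open_fds(19763)` before and after a sound restart should show the same growth as the `watch` command above, provided the caller is allowed to read that process's `/proc` entry.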

And you can trigger this bug with 0.14.x clients more easily by using XPRA_SOUND_FAKE_OVERRUN=1 xpra attach ..

Wed, 08 Jul 2015 10:18:12 GMT - Antoine Martin: status changed; resolution set

The leak was coming from the new palib (used in trunk only) and not from the subprocess code!
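
One way to rule the subprocess code in or out as the source of this kind of leak is to run start / stop cycles in isolation and compare descriptor counts. The sketch below is not xpra's subprocess wrapper: `restart_cycle` and `fd_growth` are hypothetical names, the child is a trivial Python process, and a Linux `/proc` filesystem is assumed.

```python
import os
import subprocess
import sys


def count_open_fds():
    # Linux only: one entry per open descriptor of this process
    return len(os.listdir("/proc/self/fd"))


def restart_cycle():
    """Start a child with stdin/stdout pipes, then stop it and release the pipes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.terminate()
    proc.wait()  # reap the child so it does not linger as a zombie
    # Close our ends of the pipes explicitly instead of relying on
    # garbage collection to do it at some later point.
    proc.stdin.close()
    proc.stdout.close()


def fd_growth(cycles=20):
    """Return how many descriptors were gained over `cycles` start/stop cycles."""
    restart_cycle()  # warm-up, so one-off allocations are not counted
    before = count_open_fds()
    for _ in range(cycles):
        restart_cycle()
    return count_open_fds() - before


if __name__ == "__main__":
    print("descriptor growth after 20 cycles: %i" % fd_growth())
```

If `fd_growth()` stays at zero while the server process still gains descriptors on every restart, the leak has to live somewhere else, which matches what was found here.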

Wed, 15 Jul 2015 01:59:02 GMT - Antoine Martin: summary changed

Thu, 06 Aug 2015 07:45:54 GMT - Antoine Martin:

I was going to replace this palib code with a "more simple" dbus version (in a separate process if needed, since python dbus does not play well with threading - and we want synchronous calls, which may not be possible).

But oh my. Every single example out there of using pulseaudio via python dbus is completely broken, at least with Fedora 22 and Ubuntu Vivid. They just don't run because the connection fails. This includes the reference code: connecting to server, which spends a lot of lines of code trying to figure out where the pulseaudio socket lives (and why do they keep on changing it without keeping at least a symlink in the old location?), and still gets it wrong. Looks like it should be using /run/user/1000/pulse/native instead (for uid=1000).
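
For illustration, the location mentioned above could be derived like this. It is a sketch only: `pulse_native_socket` is a made-up helper, and using `XDG_RUNTIME_DIR` with a `/run/user/<uid>` fallback is an assumption about how the per-user runtime directory is found, not something taken from the reference code.

```python
import os


def pulse_native_socket(uid=None, runtime_dir=None):
    """Guess the path of the per-user pulseaudio native socket.

    Assumption: the socket lives in <runtime dir>/pulse/native, where the
    runtime dir is $XDG_RUNTIME_DIR or /run/user/<uid>.
    """
    if runtime_dir is None:
        if uid is None:
            # current user: prefer the environment, fall back to /run/user/<uid>
            runtime_dir = os.environ.get("XDG_RUNTIME_DIR") or "/run/user/%i" % os.getuid()
        else:
            runtime_dir = "/run/user/%i" % uid
    return os.path.join(runtime_dir, "pulse", "native")


if __name__ == "__main__":
    path = pulse_native_socket()
    print("%s (exists: %s)" % (path, os.path.exists(path)))
```

For uid=1000 this gives /run/user/1000/pulse/native, the path quoted above. Whether the server actually listens there still has to be checked on each distribution.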

Once you get past this hurdle, you get the very unhelpful org.freedesktop.DBus.Error.NoReply, which doesn't tell you why things don't work at all, and from then on your connection is dropped and you get org.freedesktop.DBus.Error.Disconnected. The only thing I found was in the system log: [pulseaudio] pstream.c: Received SHM frame on a socket where SHM is disabled. I didn't ask for SHM and I don't need it either, wth?

This is par for the course given the constant API breakage coming from systemd / freedesktop (previous example here: ticket:492#comment:3)

Options left:

That's enough time wasted for now.

Sun, 20 Mar 2016 01:09:46 GMT - Antoine Martin:

The palib code is not getting fixed anytime soon, removed completely in r12177.

Sun, 20 Mar 2016 11:36:12 GMT - Antoine Martin:

See #1148.

Sat, 23 Jan 2021 05:09:35 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/912