Xpra: Ticket #1871: Cannot start xpra server with 8 nvidia GPUs

I'm using the newest version 2.3.1-r19533 on Ubuntu 16.04 (Xenial), but I cannot start my session on the server. I tried on two servers and both always give the same result. When I use the command

xpra start :2233

it always says

server failure: disconnected before the session could be established
server requested disconnect: server error (failed to start a new session)

And in the log file, I see

Fatal server error:
(EE) Server is already active for display 2233
	If this server is no longer running, remove /tmp/.X2233-lock
	and start again.
(EE)

But I have cleared all the processes that start with the letter 'X'... I also checked journalctl after setting

DEBUG=auth,proxy,util,x11

in /etc/defaults/xpra. Everything seems fine there except these lines,

get_server_state: connect(/run/user/1031/xpra/mvig-2233)=[Errno 111] Connection refused
socket_details: '/run/user/1031/xpra/mvig-2233' state does not match (UNKNOWN vs LIVE)
get_server_state: connect(/home/yelantf/.xpra/mvig-2233)=[Errno 111] Connection refused
socket_details: '/home/yelantf/.xpra/mvig-2233' state does not match (UNKNOWN vs LIVE)
identify_new_socket new_sockets=()
start_server_subprocess failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/xpra/server/proxy/proxy_server.py", line 263, in proxy_session
proc, socket_path, display = self.start_new_session(username, uid, gid, sns, displays)
File "/usr/lib/python2.7/dist-packages/xpra/server/proxy/proxy_server.py", line 419, in start_new_session
proc, socket_path, display = start_server_subprocess(sys.argv[0], args, mode, opts, username, uid, gid, env, cwd)
File "/usr/lib/python2.7/dist-packages/xpra/scripts/main.py", line 1795, in start_server_subprocess
socket_path, display = identify_new_socket(proc, dotxpra, existing_sockets, matching_display, new_server_uuid, display_name, uid)
File "/usr/lib/python2.7/dist-packages/xpra/scripts/main.py", line 1890, in identify_new_socket
raise InitException("failed to identify the new server display!")
InitException: failed to identify the new server display!
Error: failed to start server subprocess:
 failed to identify the new server display!

I don't know what's happening, and I have no idea how to start a new session in xpra now.



Sun, 10 Jun 2018 02:47:25 GMT - yelantf: component changed; keywords set


Sun, 10 Jun 2018 06:59:31 GMT - Antoine Martin: owner, description changed; milestone set; keywords deleted

Everything seems fine there except these lines, socket_details: '/home/yelantf/.xpra/mvig-2233' state does not match (UNKNOWN vs LIVE) identify_new_socket new_sockets=() start_server_subprocess failed

That's odd, I don't have such problems in my Xenial VM. It looks like the server takes way too long to start up. The socket stays in the UNKNOWN state until the server finishes starting up, then it becomes LIVE.
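To illustrate the UNKNOWN vs LIVE distinction, here is a minimal sketch of probing a unix socket until it accepts connections or a deadline passes. It is not xpra's actual code: the function names `probe_socket_state` and `wait_for_live` and the timeout values are hypothetical, and only the two states named in the log are modelled.

```python
import socket
import time

def probe_socket_state(path, timeout=1.0):
    """Return 'LIVE' if something accepts connections on the unix socket,
    otherwise 'UNKNOWN' (missing socket, connection refused, ...)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "LIVE"
    except OSError:
        # e.g. [Errno 111] Connection refused while the server is still starting
        return "UNKNOWN"
    finally:
        sock.close()

def wait_for_live(path, deadline_seconds, poll_interval=0.5):
    """Poll the socket until it is LIVE; give up once the deadline has passed."""
    end = time.monotonic() + deadline_seconds
    while True:
        if probe_socket_state(path) == "LIVE":
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(poll_interval)
```

If the server needs longer to start than the caller's deadline, `wait_for_live` returns False even though the server would eventually have become LIVE, which matches the symptom reported here.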

Can you post the /run/user/1031/xpra/:2233.log? (and the matching .bak file if there is one) The timeout should be long enough for most systems. I assume that you've tried other display numbers and that this makes no difference? You can find the vfb left behind, if any, with:

ps -ef | grep "Xvfb"

(on other platforms grep for Xorg)
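As a complement to grepping the process list, the lock file named in the X server error (/tmp/.X2233-lock) can be checked directly. This is only a sketch: it assumes the lock file holds the owning PID as ASCII text, and the helper names `read_lock_pid`, `pid_is_running` and `lock_status` are made up for illustration.

```python
import os

def read_lock_pid(lock_path):
    """Return the PID stored in the lock file, or None if it cannot be read."""
    try:
        with open(lock_path, "r") as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None

def pid_is_running(pid):
    """Check for process existence without actually sending a signal."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def lock_status(lock_path):
    """Describe whether the lock is held by a live process or is stale."""
    pid = read_lock_pid(lock_path)
    if pid is None:
        return "no usable lock file"
    if pid_is_running(pid):
        return "held by running pid %d" % pid
    return "stale (pid %d is gone)" % pid
```

For the display in this ticket one would call `lock_status("/tmp/.X2233-lock")`. A lock held by a running PID means a vfb is still alive (possibly still starting up), so the lock file should not be removed.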


Sun, 10 Jun 2018 07:00:01 GMT - Antoine Martin: summary changed


Sun, 10 Jun 2018 07:11:37 GMT - yelantf: attachment set

logfile


Sun, 10 Jun 2018 07:11:55 GMT - yelantf: attachment set

another log file


Sun, 10 Jun 2018 07:12:24 GMT - yelantf:

Replying to Antoine Martin:

Can you post the /run/user/1031/xpra/:2233.log? (and the matching .bak file if there is one) I assume that you've tried other display numbers and that this makes no difference?

I tried another display number, :2235, and things are still the same. By '.bak file' do you mean .log.old? There are two logs. I attached both of them here.


Sun, 10 Jun 2018 07:20:18 GMT - yelantf: attachment set

.log.old changes after a while


Sun, 10 Jun 2018 07:26:31 GMT - yelantf:

By reading the .log.old, I think the problem is caused by the GPUs on my server. As you can see in this file, there are 8 gpus on the server. Maybe this causes too much time when I try to start a new session. I then started a session while monitoring the .log.old file: it spends a lot of time scanning the GPU devices. After quite a long time, the .log.old file shows "xpra is ready". I then tried to connect and it works. But if I try to list the sessions before "xpra is ready" is printed in the .log.old file, the session that is still in the unknown state gets cleaned up...
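A small sketch for spotting which startup step is slow by measuring the gaps between consecutive timestamped log lines. It assumes lines start with a "YYYY-MM-DD HH:MM:SS,mmm" stamp, as in the log excerpts quoted in this ticket. The function names are hypothetical.

```python
from datetime import datetime

def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp, or return None."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None

def slowest_gaps(lines, top=5):
    """Return up to `top` (seconds, line) pairs: the time elapsed after each
    stamped line until the next stamped line, largest first."""
    stamped = [(parse_timestamp(line), line.rstrip("\n")) for line in lines]
    stamped = [(t, text) for t, text in stamped if t is not None]
    gaps = []
    for (t0, text0), (t1, _) in zip(stamped, stamped[1:]):
        gaps.append(((t1 - t0).total_seconds(), text0))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]
```

Feeding it the lines of the server log (for example `slowest_gaps(open(path).readlines())`) lists the messages that were followed by the longest pauses, which is where the GPU scan should show up if it is the culprit.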


Sun, 10 Jun 2018 07:59:49 GMT - Antoine Martin: summary changed

2018-06-10 15:06:55,377 Error importing swscale colorspace conversion (csc_swscale)
2018-06-10 15:06:55,378 libswscale.so.5: cannot open shared object file: No such file or directory

That's not normal. The swscale library is a dependency of the package, it should be installed.

By reading the .log.old, I think the problem is caused by the GPUs on my server. As you can see in this file, there are 8 gpus on the server. Maybe this causes too much time when I try to start a new session.

As per wiki/ReportingBugs: anything that would make the setup unusual - 8 GPUs definitely qualifies there. That's definitely the problem, it takes over a minute to initialize all the GPUs. By the time this is finished, the proxy has assumed that the session failed and returns the error.

And since the NVENC runtime API version doesn't match the one used for building, nvenc fails anyway:

NVENCException: getting API function list - returned 15: This indicates that an invalid struct version was used by the client.

So all this initialization cost was for nothing.

So your 2 options are - both should work:

To fix nvenc, use the latest drivers from nvidia - but this will not fix the timeout.

BTW,

2018-06-10 15:18:57,826 setting keyboard layout to 'cn'

Does that work OK?


Sun, 10 Jun 2018 10:39:04 GMT - yelantf:

Replying to Antoine Martin:

2018-06-10 15:06:55,377 Error importing swscale colorspace conversion (csc_swscale)
2018-06-10 15:06:55,378 libswscale.so.5: cannot open shared object file: No such file or directory

That's not normal. The swscale library is a dependency of the package, it should be installed.

Well, so how to install swscale library? I upgrade xpra by installing the .deb files manually and see no warnings or errors. But I only upgrade xpra with latest .deb file (without upgrading other dependency packages). In the path /usr/lib/xpra, I find three files: libswscale.so, libswscale.so.4 and libswscale.so.4.6.100. There is no libswscale.so.5.

So your 2 options are - both should work:

OK, I'd simply choose the second choice.

BTW,

2018-06-10 15:18:57,826 setting keyboard layout to 'cn'

Does that work OK?

I found no difference between 'cn' and 'us'. But neither keyboard layout is able to transfer Chinese characters that are produced by Chinese input methods. Teamviewer can do this, so I am wondering if it is possible to support that in xpra.

Thank you for your generous help anyway.


Sun, 10 Jun 2018 11:52:40 GMT - Antoine Martin: status changed; resolution set

Well, so how to install swscale library? I upgrade xpra by installing the .deb files manually and see no warnings or errors. But I only upgrade xpra with latest .deb file (without upgrading other dependency packages).

Well, again this qualifies as a non-standard installation method and should be reported. Don't do that: let apt-get update things as needed. There is an updated ffmpeg package in the repository with the newer swscale library version that xpra is linked against.
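For anyone wanting to confirm which swscale versions are present and whether a given soname can actually be loaded, a rough sketch follows. The helper names are hypothetical. Note that `ctypes.CDLL` with a bare soname uses the dynamic loader's default search path, which may not include /usr/lib/xpra, so passing a full path is the safer test.

```python
import ctypes
import glob
import os

def list_versions(directory, stem="libswscale.so"):
    """List the file names in `directory` that start with `stem`."""
    pattern = os.path.join(directory, stem + "*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))

def can_load(name_or_path):
    """Return True if the dynamic loader can open the given library."""
    try:
        ctypes.CDLL(name_or_path)
        return True
    except OSError:
        # e.g. "cannot open shared object file: No such file or directory"
        return False
```

On the setup described in this ticket, `list_versions("/usr/lib/xpra")` would be expected to show only the .so.4 files, and `can_load("/usr/lib/xpra/libswscale.so.5")` would return False until the updated ffmpeg package is installed.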

I am wondering if it is possible to support that in xpra.

I'm sure it's doable, no idea how though.

I am closing this ticket as fixed, feel free to re-open if you still have problems.


Sat, 23 Jan 2021 05:35:59 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1871