Part of changeset r1527 was a change to protocol.py (see the patchset in the attached file). This change leads to errors:
2012-09-19 11:58:02,810 xpra client version 0.7.0 Attached (press Control-C to detach) [xcb] Unknown sequence number while processing queue [xcb] Most likely this is a multi-threaded client and XInitThreads has not been called [xcb] Aborting, sorry about that. python2.7: /var/tmp/portage/x11-libs/libX11-1.5.0/work/libX11-1.5.0/src/xcb_io.c:273: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
some seconds after a connection is attached.
After reverting this change the error is gone.
Yes, we do want that change and I thought I had checked that all callees were indeed non-UI-thread safe.
How do you get this warning to show up? Which distro, version, etc? (edit: looks like gentoo with portage, so what profile, arch, etc..)
Also what was the exact revision tested?
More details: the client only has 3 functions called from the network thread (see self._ui_packet_handlers
vs self._packet_handlers
in client.py), all the other functions are called from the UI thread via idle_add (the difference is that this is now done in client.py
specifically to give us the ability to not go via the UI thread when not needed), those are:
ping
: only records the event and calls the send()
which eventually leads to _add_packet_to_queue
which is part of the network queue code, and which is thread safe (uses an exclusive lock to queue).
ping_echo
: only records the echoed data
draw
: there is a potential race here with mmap areas, which needs fixing, but this does not call anything UI related. The other codepath simply queues the packet for processing by another thread (the draw_thread_loop
thread) (and it may also send a network packet in rare mmap cases when the window has disappeared - same codepath as ping)
This other thread (which was also introduced by the same changeset and is much more likely to be the real cause of the problem - your partial revert may just make it less likely or impossible to lose the race) only picks up items to draw from the queue and calls window.draw_region
do decode and draw the frame. Assuming that you are using python2.x and gtk2.x (not doing so is very difficult!), this leads to 5 possibilities depending on the encoding used (which is not always the same as the encoding you specified... - which one did you specify?):
paint_mmap
is called from the UI thread (grabs hold of the rgb24 mmapped data and calls do_paint_rgb24
)
paint_rgb24
decompresses the data with zlib then calls do_paint_rgb24
from the UI thread
paint_x264
and paint_vpx
use paint_with_video_decoder
to decode the picture to rgb24 format then call do_paint_rgb24
from the UI thread
paint_pixbuf
(used for png and jpeg) is called from the UI thread
I've gone as far as I can go without being able to reproduce the problem myself.
You are so fast! Faster than the problem reappears ...
The system is a gentoo linux last updated a week ago. It runs:
I'm using jpeg with 20% quality (because x264 is very slow and has bad picture quality).
I had this error for two days. Then I went back to a good version and from this version I went on until the problem apeared again. That revision was r1527. After undoing the change in protocol.py it was working and I went on to the head of the trunk (r1590) where it was working to.
Now I wanted to see this problem with debug switched on, but it disappeared! I'm currently on r1527 and this worked for hours. The xpra server was the same all the time (session started: 1 day, 19:17:50).
I think you could close the ticket for now. If the problem appears again, I know what I have to look for to find the cause.
Thank you very much for your quick investigation.
Closing for now. (r1591 fixes the mmap issue)
FYI: jpeg at 20% quality is very poor, is x264 really slower? (the bad quality thing is possible - we'll provide a similar option to be able to choose quality over latency, similar to the jpeg option). What are you link characteristics (lan, dsl?) and what app can I use to reproduce?
I'm using different applications (chrome, gnumeric, gnucash, eclipse). Client is connected to internet via 80MBit down, 30MBit up. Server runs behind DSL line with 3MBit down and 400kBit up. Connection is tunneled via SSH with compression enabled (which speeds up tmux). Xpra is connected via tcp:localhost:5555.
At home I use a WLAN connection with 54MBit.
Via internet with x264 I have very high latency (sometimes up to 10 seconds). And the picture quality is often bad. That means text is unreadable, pictures are hardly to recognize. Sometimes it refreshes a little bit later to a better picture. But often picture simply stays that bad. Via WLAN picture quality is good (nearly excellent) but sometimes bad.
Via internet with jpeg 20% picture quality is ok (text is readable). Latency is lesser or equal to x264, but picture quality is consistent.
Via WLAN I normally use png which gives good latency and excellent picture quality.
But I think this is off-topic here ...
Hah, ok, a few points:
--auto-refresh-delay=DELAY
should be a good way of dealing with low quality frames when these are unavoidable... except there is a bug in trunk with x264/vpx which causes it not to work reliably at present (details: we send very small updates as png or rgb24 to avoid doing a full x264/vpx frame refresh, these are lossless so we cancel the auto-refresh at that point thinking we already have a lossy picture..)
If you can help me reproduce your issues, we should eventually be able to get you going with x264, and you should get an order of magnitude better quality/bandwidth usage than with jpeg..
I've seen it for the first time with x264 encoding, by running this command in an xterm and then clicking on the client window to focus it:
while true; do dmesg; done
OTOH:
It takes quite a while to reproduce, got a backtrace from gdb this time:
(gdb) bt #0 0x0000003529635925 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00000035296370d8 in __GI_abort () at abort.c:91 #2 0x000000352962e6a2 in __assert_fail_base (fmt=0x3529778188 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x352b6b3958 "!xcb_xlib_threads_sequence_lost", file=file@entry=0x352b6b3c12 "xcb_io.c", line=line@entry=273, function=function@entry=0x352b6b3d28 "poll_for_event") at assert.c:94 #3 0x000000352962e752 in __GI___assert_fail (assertion=0x352b6b3958 "!xcb_xlib_threads_sequence_lost", file= 0x352b6b3c12 "xcb_io.c", line=273, function=0x352b6b3d28 "poll_for_event") at assert.c:103 #4 0x000000352b642ac9 in ?? () from /lib64/libX11.so.6 #5 0x000000352b642ae1 in ?? () from /lib64/libX11.so.6 #6 0x000000352b64304d in _XEventsQueued () from /lib64/libX11.so.6 #7 0x000000352b63468d in XPending () from /lib64/libX11.so.6 #8 0x0000003472a5e5bc in ?? () from /lib64/libgdk-x11-2.0.so.0 #9 0x000000352ba4747b in g_main_context_check () from /lib64/libglib-2.0.so.0 #10 0x000000352ba47902 in ?? () from /lib64/libglib-2.0.so.0 #11 0x000000352ba47dc2 in g_main_loop_run () from /lib64/libglib-2.0.so.0 #12 0x000000347234ac57 in gtk_main () from /lib64/libgtk-x11-2.0.so.0 #13 0x00007fd32aea55a8 in ?? () from /usr/lib64/python2.7/site-packages/gtk-2.0/gtk/_gtk.so #14 0x000000352a6dce36 in call_function (oparg=<optimized out>, pp_stack=0x7fffb070b2d8) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4082 #15 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740 #16 0x000000352a6dcd51 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7fffb070b4d8, func= <function at remote 0x2a68e60>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184 #17 call_function (oparg=<optimized out>, pp_stack=0x7fffb070b4d8) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119 #18 PyEval_EvalFrameEx (f=f@entry= Frame 0x25fc210, for file /usr/lib64/python2.7/site-packages/xpra/scripts/main.py, line 613, in run_client (parser=<OptionParser(process_default_values=True, allow_interspersed_args=True, _long_opt={'--delay-tray': <Option(_long_opts=['--delay-tray'], help='Waits for the first events before showing the system tray', callback_args=None, callback=None, default=False, nargs=None, choices=None, dest='delay_tray', container=<OptionGroup(_long_opt={...}, title='Client Features Options', parser=<...>, _short_opt={'-d': <Option(_long_opts=['--debug'], help='List of categories to enable debugging for (or "all")', callback_args=None, callback=None, default=None, nargs=1, choices=None, dest='debug', container=<OptionGroup(_long_opt={...}, title='Advanced Options', parser=<...>, _short_opt={...}, option_list=[<Option(_long_opts=['--password-file'], help='The file containing the password required to connect (useful to secure TCP mode)', callback_args=None, callback=None, default=None, nargs=1, choices=None, dest='password...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740 #19 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=4, kws=0x7fd33570e730, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330 #20 0x000000352a6dc113 in fast_function (nk=<optimized out>, na=4, n=<optimized out>, pp_stack=0x7fffb070b788, func= <function at remote 0x259a230>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194 #21 call_function (oparg=<optimized out>, pp_stack=0x7fffb070b788) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119 #22 PyEval_EvalFrameEx (f=f@entry= Frame 0x23c9bc0, for file /usr/lib64/python2.7/site-packages/xpra/scripts/main.py, line 440, in main (script_file='/usr/bin/xpra', cmdline=['/usr/bin/xpra', 'attach', 'tcp:127.0.0.1:10000'], bool_default=<function at remote 0x259a410>, int_default=<function at remote 0x259a500>, float_default=<function at remote 0x259a578>, start_str='\t%prog start DISPLAY\n', list_str='\t%prog list\n', upgrade_str='\t%prog upgrade DISPLAY', note_str='', stop_str='\t%prog stop [DISPLAY]\n', parser=<OptionParser(process_default_values=True, allow_interspersed_args=True, _long_opt={'--delay-tray': <Option(_long_opts=['--delay-tray'], help='Waits for the first events before showing the system tray', callback_args=None, callback=None, default=False, nargs=None, choices=None, dest='delay_tray', container=<OptionGroup(_long_opt={...}, title='Client Features Options', parser=<...>, _short_opt={'-d': <Option(_long_opts=['--debug'], help='List of categories to enable debugging for (or "all")', callback_args=None, callback=None, defaul...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740 #23 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=2, kws=0x7fd33570e1b0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330 #24 0x000000352a6dc113 in fast_function (nk=<optimized out>, na=2, n=<optimized out>, pp_stack=0x7fffb070ba38, func= <function at remote 0x2592ed8>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194 #25 call_function (oparg=<optimized out>, pp_stack=0x7fffb070ba38) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119 #26 PyEval_EvalFrameEx (f=f@entry=Frame 0x23c8840, for file /usr/bin/xpra, line 6, in <module> (), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740 #27 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=co@entry=0x7fd33574fbb0, globals=globals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=closure@entry=0x0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330 #28 0x000000352a6ddbf2 in PyEval_EvalCode (co=co@entry=0x7fd33574fbb0, globals=globals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin---Type <return> to continue, or q <return> to quit--- /xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}) at /usr/src/debug/Python-2.7.3/Python/ceval.c:689 #29 0x000000352a6f6b9a in run_mod (mod=<optimized out>, filename=filename@entry=0x7fffb070c3e0 "/usr/bin/xpra", globals=globals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, flags=flags@entry=0x7fffb070bcb0, arena=arena@entry=0x239cc70) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:1361 #30 0x000000352a6f7992 in PyRun_FileExFlags (fp=fp@entry=0x23c4040, filename=filename@entry=0x7fffb070c3e0 "/usr/bin/xpra", start=start@entry=257, globals=globals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry= {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, closeit=closeit@entry=1, flags=flags@entry=0x7fffb070bcb0) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:1347 #31 0x000000352a6f83ab in PyRun_SimpleFileExFlags (fp=fp@entry=0x23c4040, filename=0x7fffb070c3e0 "/usr/bin/xpra", closeit=1, flags=flags@entry=0x7fffb070bcb0) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:951 #32 0x000000352a6f8fe9 in PyRun_AnyFileExFlags (fp=fp@entry=0x23c4040, filename=<optimized out>, closeit=<optimized out>, flags=flags@entry=0x7fffb070bcb0) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:755 #33 0x000000352a709882 in Py_Main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Python-2.7.3/Modules/main.c:639 #34 0x0000003529621735 in __libc_start_main (main=0x4006c0 <main>, argc=4, ubp_av=0x7fffb070be68, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb070be58) at libc-start.c:226 #35 0x00000000004006f1 in _start ()
But I wasn't running with GDK_SYNCHRONIZE="1"
so this is not particularly helpful.
I think it may be related to focus switching more than the actual clicking, just a guess though..
Adding GDK_SYNCHRONIZE="1"
does not help, gives the same crash.
This is not encoding specific either, as I am now using mmap and it definitely uses idle_add (and therefore runs in the UI thread) for painting..
So, now following the only other entry points:
_unfocus
and focus_change
: may call update_focus
which may call clear_repeat
(gobject.source_remove
is fine to call, even if this was not from the UI thread) and send_focus
(usual network queuing which is safe, as per comment:3)
do_motion_notify_event
: as above, just queues a packet
_button_action
: same
tcpdump to the rescue, here are the last few messages before the crash:
# T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........damage-sequence?...?..?.<?.. # T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P......'..draw...?..?.<.mmap..@...,@..{.?..?..f # T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........damage-sequence?...?..?.<?.h ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........clipboard-token.PRIMARY ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........focus...mod2 ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........button-action..C.?.{?....mod2 ## T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P......'..draw...?..?.<.mmap..@..*.@..{.?..?..f # T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........damage-sequence?...?..?.<?.. ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P...... ..pointer-position..?.|?....mod2P...... ..pointer-position..?.}?....mod2P...... ..pointer-position..?.~?....mod2 ## T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P......'..draw...?..?.<.mmap..@....@..{.?..?..f # T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P...... ..pointer-position..?..?....mod2 # T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P......'..draw...?..?.<.mmap..@.."l@..{.?..?..f # T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P.........damage-sequence?...?..?.<?.w ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P...... ..pointer-position..?..?....mod2P.........damage-sequence?...?..?.<?..P...... ..pointer-position..?..?....mod2P...... \ ..pointer-position..?..?....mod2P.........button-action..D.?..?....mod2P...... ..pointer-position..?..?.... mod2P...... ..pointer-position..?..?....mod2 ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P...... ..pointer-position..?..?....mod2 ## T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP] P...... ..pointer-position..?..?....mod2 ## T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P.........clipboard-token.PRIMARY ## T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP] P......'..draw...?..?.<.mmap..@...,@..{.?..?..f ##
I had not thought about it, but we can also have clipboard packets when we click like a maniac.. oh this is going to be fun..
oh, this is really embarrassing. So much effort led me back to the original change: when I reviewed the callees, I had missed the fact that client.py
overrides client_base.py
to patch in the clipboard packets hooks... which were now running in the non-UI thread..
Fixed in r1717, I can no longer crash the client!
Thanks for fixing!
Strange that I never had these problems again ...
(setting correct milestone the work was completed in)
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/184