Xpra: Ticket #184: Xcb-Errors

Part of changeset r1527 was a change to protocol.py (see the patchset in the attached file). This change leads to errors:

2012-09-19 11:58:02,810 xpra client version 0.7.0
Attached (press Control-C to detach)
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
python2.7: /var/tmp/portage/x11-libs/libX11-1.5.0/work/libX11-1.5.0/src/xcb_io.c:273: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.

some seconds after a connection is attached.

After reverting this change the error is gone.



Wed, 19 Sep 2012 12:54:11 GMT - Mike: attachment set


Wed, 19 Sep 2012 13:19:16 GMT - Antoine Martin: status changed

Yes, we do want that change and I thought I had checked that all callees were indeed non-UI-thread safe.

How do you get this warning to show up? Which distro, version, etc? (edit: looks like gentoo with portage, so what profile, arch, etc..)


Wed, 19 Sep 2012 14:42:04 GMT - Antoine Martin: description changed

Also what was the exact revision tested?


Wed, 19 Sep 2012 15:08:23 GMT - Antoine Martin:

More details: the client only has 3 functions called from the network thread (see self._ui_packet_handlers vs self._packet_handlers in client.py), all the other functions are called from the UI thread via idle_add (the difference is that this is now done in client.py specifically to give us the ability to not go via the UI thread when not needed), those are:

This other thread (which was also introduced by the same changeset and is much more likely to be the real cause of the problem - your partial revert may just make it less likely or impossible to lose the race) only picks up items to draw from the queue and calls window.draw_region do decode and draw the frame. Assuming that you are using python2.x and gtk2.x (not doing so is very difficult!), this leads to 5 possibilities depending on the encoding used (which is not always the same as the encoding you specified... - which one did you specify?):

I've gone as far as I can go without being able to reproduce the problem myself.


Thu, 20 Sep 2012 04:40:45 GMT - Mike:

You are so fast! Faster than the problem reappears ...

The system is a gentoo linux last updated a week ago. It runs:

I'm using jpeg with 20% quality (because x264 is very slow and has bad picture quality).

I had this error for two days. Then I went back to a good version and from this version I went on until the problem apeared again. That revision was r1527. After undoing the change in protocol.py it was working and I went on to the head of the trunk (r1590) where it was working to.

Now I wanted to see this problem with debug switched on, but it disappeared! I'm currently on r1527 and this worked for hours. The xpra server was the same all the time (session started: 1 day, 19:17:50).

I think you could close the ticket for now. If the problem appears again, I know what I have to look for to find the cause.

Thank you very much for your quick investigation.


Thu, 20 Sep 2012 05:41:46 GMT - Antoine Martin: status changed; resolution set

Closing for now. (r1591 fixes the mmap issue)

FYI: jpeg at 20% quality is very poor, is x264 really slower? (the bad quality thing is possible - we'll provide a similar option to be able to choose quality over latency, similar to the jpeg option). What are you link characteristics (lan, dsl?) and what app can I use to reproduce?


Thu, 20 Sep 2012 08:01:59 GMT - Mike:

I'm using different applications (chrome, gnumeric, gnucash, eclipse). Client is connected to internet via 80MBit down, 30MBit up. Server runs behind DSL line with 3MBit down and 400kBit up. Connection is tunneled via SSH with compression enabled (which speeds up tmux). Xpra is connected via tcp:localhost:5555.

At home I use a WLAN connection with 54MBit.

Via internet with x264 I have very high latency (sometimes up to 10 seconds). And the picture quality is often bad. That means text is unreadable, pictures are hardly to recognize. Sometimes it refreshes a little bit later to a better picture. But often picture simply stays that bad. Via WLAN picture quality is good (nearly excellent) but sometimes bad.

Via internet with jpeg 20% picture quality is ok (text is readable). Latency is lesser or equal to x264, but picture quality is consistent.

Via WLAN I normally use png which gives good latency and excellent picture quality.

But I think this is off-topic here ...


Thu, 20 Sep 2012 08:15:13 GMT - Antoine Martin:

Hah, ok, a few points:

If you can help me reproduce your issues, we should eventually be able to get you going with x264, and you should get an order of magnitude better quality/bandwidth usage than with jpeg..


Fri, 21 Sep 2012 13:21:21 GMT - Antoine Martin: status changed; resolution deleted

I've seen it for the first time with x264 encoding, by running this command in an xterm and then clicking on the client window to focus it:

while true; do
  dmesg;
done

OTOH:


Wed, 26 Sep 2012 11:17:31 GMT - Antoine Martin:

It takes quite a while to reproduce, got a backtrace from gdb this time:

(gdb) bt
#0  0x0000003529635925 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000035296370d8 in __GI_abort () at abort.c:91
#2  0x000000352962e6a2 in __assert_fail_base (fmt=0x3529778188 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x352b6b3958 "!xcb_xlib_threads_sequence_lost", file=file@entry=0x352b6b3c12 "xcb_io.c",
    line=line@entry=273, function=function@entry=0x352b6b3d28 "poll_for_event") at assert.c:94
#3  0x000000352962e752 in __GI___assert_fail (assertion=0x352b6b3958 "!xcb_xlib_threads_sequence_lost", file=
    0x352b6b3c12 "xcb_io.c", line=273, function=0x352b6b3d28 "poll_for_event") at assert.c:103
#4  0x000000352b642ac9 in ?? () from /lib64/libX11.so.6
#5  0x000000352b642ae1 in ?? () from /lib64/libX11.so.6
#6  0x000000352b64304d in _XEventsQueued () from /lib64/libX11.so.6
#7  0x000000352b63468d in XPending () from /lib64/libX11.so.6
#8  0x0000003472a5e5bc in ?? () from /lib64/libgdk-x11-2.0.so.0
#9  0x000000352ba4747b in g_main_context_check () from /lib64/libglib-2.0.so.0
#10 0x000000352ba47902 in ?? () from /lib64/libglib-2.0.so.0
#11 0x000000352ba47dc2 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#12 0x000000347234ac57 in gtk_main () from /lib64/libgtk-x11-2.0.so.0
#13 0x00007fd32aea55a8 in ?? () from /usr/lib64/python2.7/site-packages/gtk-2.0/gtk/_gtk.so
#14 0x000000352a6dce36 in call_function (oparg=<optimized out>, pp_stack=0x7fffb070b2d8)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:4082
#15 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#16 0x000000352a6dcd51 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7fffb070b4d8, func=
    <function at remote 0x2a68e60>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
#17 call_function (oparg=<optimized out>, pp_stack=0x7fffb070b4d8) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#18 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x25fc210, for file /usr/lib64/python2.7/site-packages/xpra/scripts/main.py, line 613, in run_client (parser=<OptionParser(process_default_values=True, allow_interspersed_args=True, _long_opt={'--delay-tray': <Option(_long_opts=['--delay-tray'], help='Waits for the first events before showing the system tray', callback_args=None, callback=None, default=False, nargs=None, choices=None, dest='delay_tray', container=<OptionGroup(_long_opt={...}, title='Client Features Options', parser=<...>, _short_opt={'-d': <Option(_long_opts=['--debug'], help='List of categories to enable debugging for (or "all")', callback_args=None, callback=None, default=None, nargs=1, choices=None, dest='debug', container=<OptionGroup(_long_opt={...}, title='Advanced Options', parser=<...>, _short_opt={...}, option_list=[<Option(_long_opts=['--password-file'], help='The file containing the password required to connect (useful to secure TCP mode)', callback_args=None, callback=None, default=None, nargs=1, choices=None, dest='password...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#19 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0,
    args=<optimized out>, argcount=argcount@entry=4, kws=0x7fd33570e730, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#20 0x000000352a6dc113 in fast_function (nk=<optimized out>, na=4, n=<optimized out>, pp_stack=0x7fffb070b788, func=
    <function at remote 0x259a230>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194
#21 call_function (oparg=<optimized out>, pp_stack=0x7fffb070b788) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#22 PyEval_EvalFrameEx (f=f@entry=
    Frame 0x23c9bc0, for file /usr/lib64/python2.7/site-packages/xpra/scripts/main.py, line 440, in main (script_file='/usr/bin/xpra', cmdline=['/usr/bin/xpra', 'attach', 'tcp:127.0.0.1:10000'], bool_default=<function at remote 0x259a410>, int_default=<function at remote 0x259a500>, float_default=<function at remote 0x259a578>, start_str='\t%prog start DISPLAY\n', list_str='\t%prog list\n', upgrade_str='\t%prog upgrade DISPLAY', note_str='', stop_str='\t%prog stop [DISPLAY]\n', parser=<OptionParser(process_default_values=True, allow_interspersed_args=True, _long_opt={'--delay-tray': <Option(_long_opts=['--delay-tray'], help='Waits for the first events before showing the system tray', callback_args=None, callback=None, default=False, nargs=None, choices=None, dest='delay_tray', container=<OptionGroup(_long_opt={...}, title='Client Features Options', parser=<...>, _short_opt={'-d': <Option(_long_opts=['--debug'], help='List of categories to enable debugging for (or "all")', callback_args=None, callback=None, defaul...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#23 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0,
    args=<optimized out>, argcount=argcount@entry=2, kws=0x7fd33570e1b0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#24 0x000000352a6dc113 in fast_function (nk=<optimized out>, na=2, n=<optimized out>, pp_stack=0x7fffb070ba38, func=
    <function at remote 0x2592ed8>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4194
#25 call_function (oparg=<optimized out>, pp_stack=0x7fffb070ba38) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4119
#26 PyEval_EvalFrameEx (f=f@entry=Frame 0x23c8840, for file /usr/bin/xpra, line 6, in <module> (), throwflag=throwflag@entry=0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740
#27 0x000000352a6ddb1f in PyEval_EvalCodeEx (co=co@entry=0x7fd33574fbb0, globals=globals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0,
    kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=closure@entry=0x0)
    at /usr/src/debug/Python-2.7.3/Python/ceval.c:3330
#28 0x000000352a6ddbf2 in PyEval_EvalCode (co=co@entry=0x7fd33574fbb0, globals=globals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin---Type <return> to continue, or q <return> to quit---
/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}) at /usr/src/debug/Python-2.7.3/Python/ceval.c:689
#29 0x000000352a6f6b9a in run_mod (mod=<optimized out>, filename=filename@entry=0x7fffb070c3e0 "/usr/bin/xpra",
    globals=globals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, flags=flags@entry=0x7fffb070bcb0, arena=arena@entry=0x239cc70)
    at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:1361
#30 0x000000352a6f7992 in PyRun_FileExFlags (fp=fp@entry=0x23c4040, filename=filename@entry=0x7fffb070c3e0 "/usr/bin/xpra",
    start=start@entry=257, globals=globals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, locals=locals@entry=
    {'GPollableInputStream': <type at remote 0x267ad60>, '__builtins__': <module at remote 0x7fd3357cdad0>, '__file__': '/usr/bin/xpra', 'xpra': <module at remote 0x7fd3357095c8>, '__package__': None, 'sys': <module at remote 0x7fd3357cdb78>, 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x26c6828>, __doc__=<gobject.GObject.__doc__ at remote 0x7fd3357900d0>, __module__='gtk') at remote 0x27574a0>, '__name__': '__main__', 'GPollableOutputStream': <type at remote 0x26aa1d0>, 'GFileDescriptorBased': <type at remote 0x267b380>, '__doc__': None}, closeit=closeit@entry=1, flags=flags@entry=0x7fffb070bcb0)
    at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:1347
#31 0x000000352a6f83ab in PyRun_SimpleFileExFlags (fp=fp@entry=0x23c4040, filename=0x7fffb070c3e0 "/usr/bin/xpra", closeit=1,
    flags=flags@entry=0x7fffb070bcb0) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:951
#32 0x000000352a6f8fe9 in PyRun_AnyFileExFlags (fp=fp@entry=0x23c4040, filename=<optimized out>, closeit=<optimized out>,
    flags=flags@entry=0x7fffb070bcb0) at /usr/src/debug/Python-2.7.3/Python/pythonrun.c:755
#33 0x000000352a709882 in Py_Main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Python-2.7.3/Modules/main.c:639
#34 0x0000003529621735 in __libc_start_main (main=0x4006c0 <main>, argc=4, ubp_av=0x7fffb070be68, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb070be58) at libc-start.c:226
#35 0x00000000004006f1 in _start ()

But I wasn't running with GDK_SYNCHRONIZE="1" so this is not particularly helpful.

I think it may be related to focus switching more than the actual clicking, just a guess though..


Wed, 26 Sep 2012 11:59:45 GMT - Antoine Martin:

Adding GDK_SYNCHRONIZE="1" does not help, gives the same crash.

This is not encoding specific either, as I am now using mmap and it definitely uses idle_add (and therefore runs in the UI thread) for painting..

So, now following the only other entry points:


tcpdump to the rescue, here are the last few messages before the crash:

#
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........damage-sequence?...?..?.<?..
#
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P......'..draw...?..?.<.mmap..@...,@..{.?..?..f
#
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........damage-sequence?...?..?.<?.h
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........clipboard-token.PRIMARY
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........focus...mod2
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........button-action..C.?.{?....mod2
##
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P......'..draw...?..?.<.mmap..@..*.@..{.?..?..f
#
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........damage-sequence?...?..?.<?..
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P...... ..pointer-position..?.|?....mod2P...... ..pointer-position..?.}?....mod2P...... ..pointer-position..?.~?....mod2
##
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P......'..draw...?..?.<.mmap..@....@..{.?..?..f
#
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P...... ..pointer-position..?..?....mod2
#
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P......'..draw...?..?.<.mmap..@.."l@..{.?..?..f
#
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P.........damage-sequence?...?..?.<?.w
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P...... ..pointer-position..?..?....mod2P.........damage-sequence?...?..?.<?..P...... ..pointer-position..?..?....mod2P...... \
  ..pointer-position..?..?....mod2P.........button-action..D.?..?....mod2P...... ..pointer-position..?..?....
  mod2P...... ..pointer-position..?..?....mod2
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P...... ..pointer-position..?..?....mod2
##
T 127.0.0.1:50015 -> 127.0.0.1:10000 [AP]
  P...... ..pointer-position..?..?....mod2
##
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P.........clipboard-token.PRIMARY
##
T 127.0.0.1:10000 -> 127.0.0.1:50015 [AP]
  P......'..draw...?..?.<.mmap..@...,@..{.?..?..f
##

I had not thought about it, but we can also have clipboard packets when we click like a maniac.. oh this is going to be fun..


Wed, 26 Sep 2012 12:07:52 GMT - Antoine Martin: status changed; resolution set

oh, this is really embarrassing. So much effort led me back to the original change: when I reviewed the callees, I had missed the fact that client.py overrides client_base.py to patch in the clipboard packets hooks... which were now running in the non-UI thread..

Fixed in r1717, I can no longer crash the client!


Wed, 26 Sep 2012 12:15:11 GMT - Mike:

Thanks for fixing!

Strange that I never had these problems again ...


Mon, 19 May 2014 12:33:26 GMT - Antoine Martin: milestone changed

(setting correct milestone the work was completed in)


Sat, 23 Jan 2021 04:47:49 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/184