Xpra: Ticket #191: short-lived OR windows cause crashes (ie: kmail hover tooltips)

As discussed over email, here is a stacktrace obtained with trunk (0.7.0 beta), png or rgb24 encodings and also using:

XPRA_X11_DEBUG=1 XPRA_SYNCHRONIZE=1 GDK_SYNCHRONIZE=1 xpra ..

To trigger, fire up kmail and hover over emails to get the tooltip window to popup.

Everytime the window disappears, one of those fires:

(gdb) break gdk_x_error
Breakpoint 1 at 0x7fd47124e0a0: file /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c, line 458.
(gdb) cont
Continuing.
[New Thread 0x7fd46acb8700 (LWP 8117)]
[New Thread 0x7fd46a4b7700 (LWP 8118)]
[New Thread 0x7fd469cb6700 (LWP 8119)]
[New Thread 0x7fd4694b5700 (LWP 8121)]
Breakpoint 1, gdk_x_error (display=0x1bb30d0, error=0x7fff2ad22840) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c:458
458     /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c: No such file or directory.
(gdb) bt
#0  gdk_x_error (display=0x1bb30d0, error=0x7fff2ad22840) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c:458
#1  0x00007fd4725f6563 in _XError (dpy=0x1bb30d0, rep=<optimized out>) at ../../src/XlibInt.c:1583
#2  0x00007fd4725f35d1 in handle_error (dpy=0x1bb30d0, err=0x20000d0, in_XReply=<optimized out>) at ../../src/xcb_io.c:212
#3  0x00007fd4725f3615 in handle_response (dpy=0x1bb30d0, response=0x20000d0, in_XReply=<optimized out>) at ../../src/xcb_io.c:324
#4  0x00007fd4725f4420 in _XReply (dpy=0x1bb30d0, rep=0x7fff2ad229e0, extra=0, discard=1) at ../../src/xcb_io.c:626
#5  0x00007fd4725efdfd in XSync (dpy=0x1bb30d0, discard=discard@entry=0) at ../../src/Sync.c:44
#6  0x00007fd4712469b1 in IA__gdk_flush () at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2558
#7  0x00007fd471caf059 in _wrap_gdk_flush (self=<optimized out>) at gdk.c:17325
#8  0x00000000004eb52c in PyEval_EvalFrameEx ()
#9  0x00000000004eb222 in PyEval_EvalFrameEx ()
#10 0x00000000004eb222 in PyEval_EvalFrameEx ()
#11 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#12 0x00000000004f2f32 in function_call.15047 ()
#13 0x000000000044a176 in PyObject_Call ()
#14 0x00000000004ec759 in PyEval_EvalFrameEx ()
#15 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#16 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#17 0x00000000004eb222 in PyEval_EvalFrameEx ()
#18 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#19 0x00000000004f2e00 in function_call.15047 ()
#20 0x000000000044a176 in PyObject_Call ()
#21 0x000000000047cc7a in instancemethod_call.8523 ()
#22 0x000000000044a176 in PyObject_Call ()
#23 0x000000000044c666 in PyEval_CallObjectWithKeywords ()
#24 0x00007fd4777f09db in pyg_signal_class_closure_marshal (closure=<optimized out>, closure@entry=0x1be4320, return_value=return_value@entry=
    0x7fff2ad23c00, n_param_values=<optimized out>, n_param_values@entry=2, param_values=<optimized out>, param_values@entry=0x18386e0,
    invocation_hint=<optimized out>, invocation_hint@entry=0x7fff2ad23a40, marshal_data=<optimized out>, marshal_data@entry=0x0)
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygtype.c:1377
#25 0x00007fd47875b663 in g_closure_invoke (closure=0x1be4320, return_value=0x7fff2ad23c00, n_param_values=2, param_values=0x18386e0,
    invocation_hint=0x7fff2ad23a40) at /tmp/buildd/glib2.0-2.32.3/./gobject/gclosure.c:777
#26 0x00007fd47876c530 in signal_emit_unlocked_R (node=node@entry=0x17999c0, detail=detail@entry=0, instance=instance@entry=0x1f4f160,
    emission_return=emission_return@entry=0x7fff2ad23c00, instance_and_params=instance_and_params@entry=0x18386e0)
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3585
#27 0x00007fd47877392b in g_signal_emitv (instance_and_params=0x18386e0, signal_id=<optimized out>, detail=0, return_value=0x7fff2ad23c00)
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3037
#28 0x00007fd4777e8584 in pygobject_emit (self=<optimized out>, args=('unmanaged', False))
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygobject.c:1842
#29 0x00000000004eaa3b in PyEval_EvalFrameEx ()
#30 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#31 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#32 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#33 0x00000000004f2e00 in function_call.15047 ()
#34 0x000000000044a176 in PyObject_Call ()
#35 0x000000000047cc7a in instancemethod_call.8523 ()
#36 0x000000000044a176 in PyObject_Call ()
#37 0x000000000044c666 in PyEval_CallObjectWithKeywords ()
#38 0x00007fd4777f09db in pyg_signal_class_closure_marshal (closure=<optimized out>, return_value=0x7fff2ad24790,
    n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=<optimized out>)
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygtype.c:1377
---Type <return> to continue, or q <return> to quit---
#39 0x00007fd47875b724 in g_closure_invoke (closure=0x1be4320, return_value=0x7fff2ad24790, n_param_values=2, param_values=0x1851640,
    invocation_hint=0x7fff2ad245d0) at /tmp/buildd/glib2.0-2.32.3/./gobject/gclosure.c:777
#40 0x00007fd47876c530 in signal_emit_unlocked_R (node=node@entry=0x1cabf60, detail=detail@entry=0, instance=instance@entry=0x1f4f160,
    emission_return=emission_return@entry=0x7fff2ad24790, instance_and_params=instance_and_params@entry=0x1851640)
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3585
#41 0x00007fd47877392b in g_signal_emitv (instance_and_params=0x1851640, signal_id=<optimized out>, detail=0, return_value=0x7fff2ad24790)
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3037
Python Exception <class 'gdb.error'> There is no member named ob_base.:
#42 0x00007fd4777e8584 in pygobject_emit (self=<optimized out>, args=)
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygobject.c:1842
#43 0x000000000044a176 in PyObject_Call ()
Python Exception <class 'gdb.error'> There is no member named ob_base.:
#44 0x00007fd46c09324a in __pyx_pf_8wimpiggy_8lowlevel_8bindings_154_maybe_send_event (__pyx_v_event=, __pyx_v_signal='wimpiggy-unmap-event',
    __pyx_v_window=<optimized out>, __pyx_self=<optimized out>) at wimpiggy/lowlevel/bindings.c:22779
#45 __pyx_pw_8wimpiggy_8lowlevel_8bindings_155_maybe_send_event (__pyx_self=<optimized out>, __pyx_args=<optimized out>,
    __pyx_kwds=<optimized out>) at wimpiggy/lowlevel/bindings.c:22615
#46 0x000000000044a176 in PyObject_Call ()
#47 0x00007fd46c096b09 in __pyx_pf_8wimpiggy_8lowlevel_8bindings_156_route_event (__pyx_v_parent_signal='wimpiggy-child-unmap-event',
Python Exception <class 'gdb.error'> There is no member named ob_base.:
    __pyx_v_signal='wimpiggy-unmap-event', __pyx_v_event=, __pyx_self=<optimized out>) at wimpiggy/lowlevel/bindings.c:23121
#48 __pyx_pw_8wimpiggy_8lowlevel_8bindings_157_route_event (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>)
    at wimpiggy/lowlevel/bindings.c:22943
#49 0x000000000044a176 in PyObject_Call ()
#50 0x00007fd46c09c4f0 in __pyx_f_8wimpiggy_8lowlevel_8bindings_x_event_filter (__pyx_v_e_gdk=0x7fff2ad24c50,
    __pyx_v_gdk_event=<optimized out>, __pyx_v_userdata=<optimized out>) at wimpiggy/lowlevel/bindings.c:25877
#51 0x00007fd471244a2a in gdk_event_apply_filters (filters=<optimized out>, event=<optimized out>, xevent=<optimized out>)
    at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:356
#52 gdk_event_translate (display=display@entry=0x1ba1230, event=event@entry=0x246fd10, xevent=xevent@entry=0x7fff2ad24c50,
    return_exposes=return_exposes@entry=0) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:927
#53 0x00007fd4712466f6 in _gdk_events_queue (display=display@entry=0x1ba1230) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2310
#54 0x00007fd47124677e in gdk_event_dispatch (source=source@entry=0x1bd3520, callback=<optimized out>, user_data=<optimized out>)
    at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2371
#55 0x00007fd478093205 in g_main_dispatch (context=0x1baeab0) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:2539
#56 g_main_context_dispatch (context=context@entry=0x1baeab0) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3075
#57 0x00007fd478093538 in g_main_context_iterate (context=0x1baeab0, block=block@entry=1, dispatch=dispatch@entry=1,
    self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3146
#58 0x00007fd478093932 in g_main_loop_run (loop=0x1e9fe60) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3340
#59 0x00007fd4715d8797 in IA__gtk_main () at /tmp/buildd/gtk+2.0-2.24.10/gtk/gtkmain.c:1256
#60 0x00007fd471c8ef14 in _wrap_gtk_main (self=<optimized out>) at /tmp/buildd/pygtk-2.24.0/gtk/gtk.override:1240
#61 0x00000000004eb52c in PyEval_EvalFrameEx ()
#62 0x00000000004eb222 in PyEval_EvalFrameEx ()
#63 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#64 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#65 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#66 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#67 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#68 0x00000000004d980d in PyRun_FileExFlags ()
#69 0x00000000004d9a96 in PyRun_SimpleFileExFlags ()
#70 0x00000000004da39e in Py_Main ()
#71 0x00007fd4793bfead in __libc_start_main (main=<optimized out>, argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff2ad25948) at libc-start.c:228
#72 0x000000000041e131 in _start ()


Thu, 04 Oct 2012 15:23:39 GMT - Antoine Martin:

Filtering it a bit and we get (first to last):

(..)

Now, I do have logs for X11 errors I have seen and that I wanted to get rid of:

Ignoring X error: BadWindow on <function unmanageit at 0x1aef2a8>
Ignoring X error: BadWindow on <built-in function xcomposite_unredirect_window>
Ignoring X error:  on <built-in function xdamage_stop>

This happens because we fire the cleanup code on a window which is already unmapped - but since we trap the errors, it shouldn't matter.

So, I need to figure out why the errors end up not being swallowed, and also try to find a way to avoid making those calls unnecessarily.


Thu, 04 Oct 2012 15:24:47 GMT - Antoine Martin: status, description changed


Thu, 04 Oct 2012 15:27:22 GMT - onlyjob: cc set

The above is happening with "rgb24" or "png" encodings (I didn't try with x264). Actual crash may happen only after dismissal of many (>20) hover windows but sometimes it is enough to do it just 5...20 times to crash.


Thu, 04 Oct 2012 21:16:39 GMT - Antoine Martin:

I cannot reproduce reliably enough to debug... It was suggested that this bug occurs on chrome's homepage by hovering over the favourite bookmarks shown there, connecting to a Fedora 18 server from a MS Windows client, but not for me.

However, some changes may have helped:

Now the interesting thing is that with those patches applied, I have yet to get a crash.. but some OR windows may be left un-destroyed, if that is the case, please apply the patch attached to this ticket.

r1817 adds XPRA_X11_DEBUG env var support to the gdk filter loop, enabling this will log all events we process - hopefully this will help since we print them before we hand over to gtk..


Thu, 04 Oct 2012 21:17:32 GMT - Antoine Martin: attachment set

force unmanage windows if we find that the composite window is gone


Fri, 05 Oct 2012 02:33:48 GMT - onlyjob:

I tried trunk r1818 and the problem is still easy to reproduce for me. It took just about 30...40 tooltip window pop-ups to crash.

Attaching log taken with (without force-unmanage-window.patch)

"XPRA_X11_DEBUG=1 XPRA_SYNCHRONIZE=1 GDK_SYNCHRONIZE=1 xpra start :8 -d all"


Fri, 05 Oct 2012 02:34:14 GMT - onlyjob: attachment set


Fri, 05 Oct 2012 03:48:44 GMT - onlyjob:

Antoine, in r1818 there is another problem that you anticipated: some tooltip windows (OR-windows) are stay opened hovering on screen without any means to close them. This is a very serious issue as re-attaching xpra is not removing them Closing the application doesn't help either -- those windows just stay open.

force-manage-window.patch fixes this issue completely.


Fri, 05 Oct 2012 04:27:02 GMT - onlyjob:

FIY 0.6.3 with backported [r1814,r1815,r1816,r1817] and force-unmanage-window.patch​ is crashing as before.

For whatever reason 0.6.3 is much easier to crash but both 0.6.3 and trunk are vulnerable now.

Thanks.


Fri, 05 Oct 2012 05:35:26 GMT - Antoine Martin:

force-unmanage-window.patch fixes this issue completely.

Are you saying that there are no more stuck OR-windows with this patch? I am still seeing them here..


Fri, 05 Oct 2012 06:06:28 GMT - onlyjob:

Replying to antoine:

force-unmanage-window.patch fixes this issue completely.

Are you saying that there are no more stuck OR-windows with this patch? I am still seeing them here..

I'm not sure if we're talking about the same windows but since I applied this patch I had none of those horrible ones that you can't close even by re-attaching or by closing the application. That was even worse than crash... :)

I'm still getting crashes that are quite reproducible with kmail.


Fri, 05 Oct 2012 07:22:19 GMT - Antoine Martin:

Hah, ok, I was still getting them - will try again.

Until I can reproduce this more reliably - can you post the log of a crash (ideally more than one) with trunk (r1830 or better) and with XPRA_X11_DEBUG=1, this should show all gdk events - not just the ones we process, I am thinking that maybe we need to intercept more events to prevent gdk from barfing out when something unexpected happens.


Fri, 05 Oct 2012 09:20:53 GMT - onlyjob:

Unfortunately r1830 is broken -- windows are opening completely blank, there is nothing but white background...


Fri, 05 Oct 2012 11:36:49 GMT - Antoine Martin:

Sorry about that, was trying to commit only the minimum needed and hold on on some parts... and ended up going out of sync. Fixed in r1832.


Fri, 05 Oct 2012 12:04:40 GMT - onlyjob: attachment set


Fri, 05 Oct 2012 12:05:44 GMT - onlyjob:

r1833a.log: Xpra started as "XPRA_X11_DEBUG=1 xpra start :8 -d all"


Fri, 05 Oct 2012 12:16:23 GMT - onlyjob: attachment set


Fri, 05 Oct 2012 12:21:11 GMT - onlyjob:

r1833b.log: Xpra started as "XPRA_X11_DEBUG=1 xpra upgrade :8 -d all" after previous crash (r1833a.log).

Please let me know if more logs are needed.


Fri, 05 Oct 2012 12:30:14 GMT - onlyjob:

I'm sorry to report that I got indestructible window with r1833....


Fri, 05 Oct 2012 12:34:46 GMT - Antoine Martin:

That's ok, that's expected, what I am trying to get rid of first is the crashes... then we can worry about the side effects, undoing what is more harmful than good.

Does r1834 help?


Fri, 05 Oct 2012 13:18:44 GMT - onlyjob:

Replying to antoine:

That's ok, that's expected, what I am trying to get rid of first is the crashes... then we can worry about the side effects, undoing what is more harmful than good.

I'm with you but crashes are less harmful as they are easier to recover from. :)

Does r1834 help?

No... :(

Attaching r1834a.log


Fri, 05 Oct 2012 13:19:07 GMT - onlyjob: attachment set


Fri, 05 Oct 2012 17:32:07 GMT - Antoine Martin:

Err, my bad, sorry again.

The trap was meant to go in mouse motion events, not just mouse clicks... (again I was trying to commit piecemeal..) - r1842 does this better by moving common code to a single method which takes care of it.

If you do manage to reproduce the problem again, please tell me how: kmail has stopped crashing for me. I will now deal with undoing the stuff which prevents OR windows from disappearing when they should.


Sat, 06 Oct 2012 01:05:12 GMT - onlyjob:

Looks like r1849 is OK. Naturally I will test more but so far I couldn't reproduce the issue. Great work Antoine, you did a fantastic work which saved the day, once again. :) Thank you very much.

What was the problem after all?


Sat, 06 Oct 2012 06:43:35 GMT - Antoine Martin:

Thanks. Your interpretation of this bug hunt is flattering, but unfortunately a post-portem is required to see where I went wrong and why I wasted days chasing shadows:

Many thanks for sticking with me during that time, and also thanks to AradTechKK for testing, finding a reliable crash setup (Fedora 18 + kmail) and giving me access to his test system.


Sat, 06 Oct 2012 07:25:57 GMT - Antoine Martin: status changed; resolution set

backports in r1854, closing.


Sat, 23 Jan 2021 04:48:01 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/191