xpra icon
Bug tracker and wiki

Opened 8 years ago

Closed 8 years ago

#191 closed defect (fixed)

short-lived OR windows cause crashes (ie: kmail hover tooltips)

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: major Milestone: 0.7
Component: core Version: trunk
Keywords: Cc: onlyjob@…

Description (last modified by Antoine Martin)

As discussed over email, here is a stacktrace obtained with trunk (0.7.0 beta), png or rgb24 encodings and also using:

XPRA_X11_DEBUG=1 XPRA_SYNCHRONIZE=1 GDK_SYNCHRONIZE=1 xpra ..

To trigger, fire up kmail and hover over emails to get the tooltip window to popup.

Everytime the window disappears, one of those fires:

(gdb) break gdk_x_error                                                                                                                          
Breakpoint 1 at 0x7fd47124e0a0: file /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c, line 458.                                                
(gdb) cont                                                                                                                                       
Continuing.                                                                                                                                      
[New Thread 0x7fd46acb8700 (LWP 8117)]                                                                                                           
[New Thread 0x7fd46a4b7700 (LWP 8118)]                                                                                                           
[New Thread 0x7fd469cb6700 (LWP 8119)]                                                                                                           
[New Thread 0x7fd4694b5700 (LWP 8121)]                                                                                                           

Breakpoint 1, gdk_x_error (display=0x1bb30d0, error=0x7fff2ad22840) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c:458                     
458     /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c: No such file or directory.                                                            
(gdb) bt                                                                                                                                         
#0  gdk_x_error (display=0x1bb30d0, error=0x7fff2ad22840) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkmain-x11.c:458                               
#1  0x00007fd4725f6563 in _XError (dpy=0x1bb30d0, rep=<optimized out>) at ../../src/XlibInt.c:1583                                               
#2  0x00007fd4725f35d1 in handle_error (dpy=0x1bb30d0, err=0x20000d0, in_XReply=<optimized out>) at ../../src/xcb_io.c:212                       
#3  0x00007fd4725f3615 in handle_response (dpy=0x1bb30d0, response=0x20000d0, in_XReply=<optimized out>) at ../../src/xcb_io.c:324               
#4  0x00007fd4725f4420 in _XReply (dpy=0x1bb30d0, rep=0x7fff2ad229e0, extra=0, discard=1) at ../../src/xcb_io.c:626                              
#5  0x00007fd4725efdfd in XSync (dpy=0x1bb30d0, discard=discard@entry=0) at ../../src/Sync.c:44                                                  
#6  0x00007fd4712469b1 in IA__gdk_flush () at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2558                                           
#7  0x00007fd471caf059 in _wrap_gdk_flush (self=<optimized out>) at gdk.c:17325                                                                  
#8  0x00000000004eb52c in PyEval_EvalFrameEx ()                                                                                                  
#9  0x00000000004eb222 in PyEval_EvalFrameEx ()                                                                                                  
#10 0x00000000004eb222 in PyEval_EvalFrameEx ()                                                                                                  
#11 0x00000000004f1db0 in PyEval_EvalCodeEx ()                                                                                                   
#12 0x00000000004f2f32 in function_call.15047 ()                                                                                                 
#13 0x000000000044a176 in PyObject_Call ()                                                                                                       
#14 0x00000000004ec759 in PyEval_EvalFrameEx ()                                                                                                  
#15 0x00000000004f1db0 in PyEval_EvalCodeEx ()                                                                                                   
#16 0x00000000004eafd8 in PyEval_EvalFrameEx ()                                                                                                  
#17 0x00000000004eb222 in PyEval_EvalFrameEx ()                                                                                                  
#18 0x00000000004f1db0 in PyEval_EvalCodeEx ()                                                                                                   
#19 0x00000000004f2e00 in function_call.15047 ()                                                                                                 
#20 0x000000000044a176 in PyObject_Call ()                                                                                                       
#21 0x000000000047cc7a in instancemethod_call.8523 ()                                                                                            
#22 0x000000000044a176 in PyObject_Call ()                                                                                                       
#23 0x000000000044c666 in PyEval_CallObjectWithKeywords ()                                                                                       
#24 0x00007fd4777f09db in pyg_signal_class_closure_marshal (closure=<optimized out>, closure@entry=0x1be4320, return_value=return_value@entry=   
    0x7fff2ad23c00, n_param_values=<optimized out>, n_param_values@entry=2, param_values=<optimized out>, param_values@entry=0x18386e0,          
    invocation_hint=<optimized out>, invocation_hint@entry=0x7fff2ad23a40, marshal_data=<optimized out>, marshal_data@entry=0x0)                 
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygtype.c:1377                                                        
#25 0x00007fd47875b663 in g_closure_invoke (closure=0x1be4320, return_value=0x7fff2ad23c00, n_param_values=2, param_values=0x18386e0,            
    invocation_hint=0x7fff2ad23a40) at /tmp/buildd/glib2.0-2.32.3/./gobject/gclosure.c:777                                                       
#26 0x00007fd47876c530 in signal_emit_unlocked_R (node=node@entry=0x17999c0, detail=detail@entry=0, instance=instance@entry=0x1f4f160,           
    emission_return=emission_return@entry=0x7fff2ad23c00, instance_and_params=instance_and_params@entry=0x18386e0)                               
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3585                                                                                       
#27 0x00007fd47877392b in g_signal_emitv (instance_and_params=0x18386e0, signal_id=<optimized out>, detail=0, return_value=0x7fff2ad23c00)       
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3037                                                                                       
#28 0x00007fd4777e8584 in pygobject_emit (self=<optimized out>, args=('unmanaged', False))                                                       
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygobject.c:1842                                                      
#29 0x00000000004eaa3b in PyEval_EvalFrameEx ()                                                                                                  
#30 0x00000000004f1db0 in PyEval_EvalCodeEx ()                                                                                                   
#31 0x00000000004eafd8 in PyEval_EvalFrameEx ()                                                                                                  
#32 0x00000000004f1db0 in PyEval_EvalCodeEx ()                                                                                                   
#33 0x00000000004f2e00 in function_call.15047 ()                                                                                                 
#34 0x000000000044a176 in PyObject_Call ()                                                                                                       
#35 0x000000000047cc7a in instancemethod_call.8523 ()                                                                                            
#36 0x000000000044a176 in PyObject_Call ()                                                                                                       
#37 0x000000000044c666 in PyEval_CallObjectWithKeywords ()                                                                                       
#38 0x00007fd4777f09db in pyg_signal_class_closure_marshal (closure=<optimized out>, return_value=0x7fff2ad24790,                                
    n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=<optimized out>)                 
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygtype.c:1377                                                        
---Type <return> to continue, or q <return> to quit---                                                                                           
#39 0x00007fd47875b724 in g_closure_invoke (closure=0x1be4320, return_value=0x7fff2ad24790, n_param_values=2, param_values=0x1851640,            
    invocation_hint=0x7fff2ad245d0) at /tmp/buildd/glib2.0-2.32.3/./gobject/gclosure.c:777                                                       
#40 0x00007fd47876c530 in signal_emit_unlocked_R (node=node@entry=0x1cabf60, detail=detail@entry=0, instance=instance@entry=0x1f4f160,           
    emission_return=emission_return@entry=0x7fff2ad24790, instance_and_params=instance_and_params@entry=0x1851640)                               
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3585                                                                                       
#41 0x00007fd47877392b in g_signal_emitv (instance_and_params=0x1851640, signal_id=<optimized out>, detail=0, return_value=0x7fff2ad24790)       
    at /tmp/buildd/glib2.0-2.32.3/./gobject/gsignal.c:3037                                                                                       
Python Exception <class 'gdb.error'> There is no member named ob_base.:                                                                          
#42 0x00007fd4777e8584 in pygobject_emit (self=<optimized out>, args=)                                                                           
    at /home/martin/debian/pkg-gnome/build-area/pygobject-2-2.28.6/gobject/pygobject.c:1842                                                      
#43 0x000000000044a176 in PyObject_Call ()
Python Exception <class 'gdb.error'> There is no member named ob_base.: 
#44 0x00007fd46c09324a in __pyx_pf_8wimpiggy_8lowlevel_8bindings_154_maybe_send_event (__pyx_v_event=, __pyx_v_signal='wimpiggy-unmap-event', 
    __pyx_v_window=<optimized out>, __pyx_self=<optimized out>) at wimpiggy/lowlevel/bindings.c:22779
#45 __pyx_pw_8wimpiggy_8lowlevel_8bindings_155_maybe_send_event (__pyx_self=<optimized out>, __pyx_args=<optimized out>, 
    __pyx_kwds=<optimized out>) at wimpiggy/lowlevel/bindings.c:22615
#46 0x000000000044a176 in PyObject_Call ()
#47 0x00007fd46c096b09 in __pyx_pf_8wimpiggy_8lowlevel_8bindings_156_route_event (__pyx_v_parent_signal='wimpiggy-child-unmap-event', 
Python Exception <class 'gdb.error'> There is no member named ob_base.: 
    __pyx_v_signal='wimpiggy-unmap-event', __pyx_v_event=, __pyx_self=<optimized out>) at wimpiggy/lowlevel/bindings.c:23121
#48 __pyx_pw_8wimpiggy_8lowlevel_8bindings_157_route_event (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>)
    at wimpiggy/lowlevel/bindings.c:22943
#49 0x000000000044a176 in PyObject_Call ()
#50 0x00007fd46c09c4f0 in __pyx_f_8wimpiggy_8lowlevel_8bindings_x_event_filter (__pyx_v_e_gdk=0x7fff2ad24c50, 
    __pyx_v_gdk_event=<optimized out>, __pyx_v_userdata=<optimized out>) at wimpiggy/lowlevel/bindings.c:25877
#51 0x00007fd471244a2a in gdk_event_apply_filters (filters=<optimized out>, event=<optimized out>, xevent=<optimized out>)
    at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:356
#52 gdk_event_translate (display=display@entry=0x1ba1230, event=event@entry=0x246fd10, xevent=xevent@entry=0x7fff2ad24c50, 
    return_exposes=return_exposes@entry=0) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:927
#53 0x00007fd4712466f6 in _gdk_events_queue (display=display@entry=0x1ba1230) at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2310
#54 0x00007fd47124677e in gdk_event_dispatch (source=source@entry=0x1bd3520, callback=<optimized out>, user_data=<optimized out>)
    at /tmp/buildd/gtk+2.0-2.24.10/gdk/x11/gdkevents-x11.c:2371
#55 0x00007fd478093205 in g_main_dispatch (context=0x1baeab0) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:2539
#56 g_main_context_dispatch (context=context@entry=0x1baeab0) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3075
#57 0x00007fd478093538 in g_main_context_iterate (context=0x1baeab0, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3146
#58 0x00007fd478093932 in g_main_loop_run (loop=0x1e9fe60) at /tmp/buildd/glib2.0-2.32.3/./glib/gmain.c:3340
#59 0x00007fd4715d8797 in IA__gtk_main () at /tmp/buildd/gtk+2.0-2.24.10/gtk/gtkmain.c:1256
#60 0x00007fd471c8ef14 in _wrap_gtk_main (self=<optimized out>) at /tmp/buildd/pygtk-2.24.0/gtk/gtk.override:1240
#61 0x00000000004eb52c in PyEval_EvalFrameEx ()
#62 0x00000000004eb222 in PyEval_EvalFrameEx ()
#63 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#64 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#65 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#66 0x00000000004eafd8 in PyEval_EvalFrameEx ()
#67 0x00000000004f1db0 in PyEval_EvalCodeEx ()
#68 0x00000000004d980d in PyRun_FileExFlags ()
#69 0x00000000004d9a96 in PyRun_SimpleFileExFlags ()
#70 0x00000000004da39e in Py_Main ()
#71 0x00007fd4793bfead in __libc_start_main (main=<optimized out>, argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff2ad25948) at libc-start.c:228
#72 0x000000000041e131 in _start ()

Attachments (5)

force-unmanage-window.patch (2.5 KB) - added by Antoine Martin 8 years ago.
force unmanage windows if we find that the composite window is gone
xpra_trunk_r1818_a.log.xz (204.7 KB) - added by onlyjob 8 years ago.
r1833a.log.xz (511.6 KB) - added by onlyjob 8 years ago.
r1833b.log.xz (60.2 KB) - added by onlyjob 8 years ago.
r1834a.log.xz (101.9 KB) - added by onlyjob 8 years ago.

Download all attachments as: .zip

Change History (26)

comment:1 Changed 8 years ago by Antoine Martin

Filtering it a bit and we get (first to last):

  • main_loop_run

(..)

  • gdk_event_dispatch
  • wimpiggy.lowlevel.bindings.x_event_filter
  • wimpiggy.lowlevel.bindings.route_event
  • wimpiggy.lowlevel.bindings.maybe_send_event with a "wimpiggy-unmap-event"
  • pygobject_emit with an "unmanaged" signal

Now, I do have logs for X11 errors I have seen and that I wanted to get rid of:

Ignoring X error: BadWindow on <function unmanageit at 0x1aef2a8>
Ignoring X error: BadWindow on <built-in function xcomposite_unredirect_window>
Ignoring X error:  on <built-in function xdamage_stop>

This happens because we fire the cleanup code on a window which is already unmapped - but since we trap the errors, it shouldn't matter.

So, I need to figure out why the errors end up not being swallowed, and also try to find a way to avoid making those calls unnecessarily.

Last edited 8 years ago by Antoine Martin (previous) (diff)

comment:2 Changed 8 years ago by Antoine Martin

Description: modified (diff)
Status: newaccepted

comment:3 Changed 8 years ago by onlyjob

Cc: onlyjob@… added

The above is happening with "rgb24" or "png" encodings (I didn't try with x264).
Actual crash may happen only after dismissal of many (>20) hover windows but sometimes it is enough to do it just 5...20 times to crash.

comment:4 Changed 8 years ago by Antoine Martin

I cannot reproduce reliably enough to debug...
It was suggested that this bug occurs on chrome's homepage by hovering over the favourite bookmarks shown there, connecting to a Fedora 18 server from a MS Windows client, but not for me.

However, some changes may have helped:

  • in r1814 we drop events that have no gdk window associated with them,
  • r1815 and r1816 ensure we only destroy windows once

Now the interesting thing is that with those patches applied, I have yet to get a crash.. but some OR windows may be left un-destroyed, if that is the case, please apply the patch attached to this ticket.

r1817 adds XPRA_X11_DEBUG env var support to the gdk filter loop, enabling this will log all events we process - hopefully this will help since we print them before we hand over to gtk..

Changed 8 years ago by Antoine Martin

Attachment: force-unmanage-window.patch added

force unmanage windows if we find that the composite window is gone

comment:5 Changed 8 years ago by onlyjob

I tried trunk r1818 and the problem is still easy to reproduce for me.
It took just about 30...40 tooltip window pop-ups to crash.

Attaching log taken with (without force-unmanage-window.patch)

"XPRA_X11_DEBUG=1 XPRA_SYNCHRONIZE=1 GDK_SYNCHRONIZE=1 xpra start :8 -d all"

Last edited 8 years ago by onlyjob (previous) (diff)

Changed 8 years ago by onlyjob

Attachment: xpra_trunk_r1818_a.log.xz added

comment:6 Changed 8 years ago by onlyjob

Antoine, in r1818 there is another problem that you anticipated:
some tooltip windows (OR-windows) are stay opened hovering on screen without any means to close them.
This is a very serious issue as re-attaching xpra is not removing them
Closing the application doesn't help either -- those windows just stay open.

force-manage-window.patch fixes this issue completely.

(I guess something was wrong with my copy-pasting skills as the patch applies well)

Version 1, edited 8 years ago by onlyjob (previous) (next) (diff)

comment:7 Changed 8 years ago by onlyjob

FIY 0.6.3 with backported [r1814,r1815,r1816,r1817] and force-unmanage-window.patch​ is crashing as before.

For whatever reason 0.6.3 is much easier to crash but both 0.6.3 and trunk are vulnerable now.

Thanks.

comment:8 Changed 8 years ago by Antoine Martin

force-unmanage-window.patch fixes this issue completely.

Are you saying that there are no more stuck OR-windows with this patch? I am still seeing them here..

comment:9 in reply to:  8 Changed 8 years ago by onlyjob

Replying to antoine:

force-unmanage-window.patch fixes this issue completely.

Are you saying that there are no more stuck OR-windows with this patch?
I am still seeing them here..


I'm not sure if we're talking about the same windows but since I applied this patch I had none of those horrible ones that you can't close even by re-attaching or by closing the application. That was even worse than crash... :)

I'm still getting crashes that are quite reproducible with kmail.

comment:10 Changed 8 years ago by Antoine Martin

Hah, ok, I was still getting them - will try again.

Until I can reproduce this more reliably - can you post the log of a crash (ideally more than one) with trunk (r1830 or better) and with XPRA_X11_DEBUG=1, this should show all gdk events - not just the ones we process, I am thinking that maybe we need to intercept more events to prevent gdk from barfing out when something unexpected happens.

comment:11 Changed 8 years ago by onlyjob

Unfortunately r1830 is broken -- windows are opening completely blank, there is nothing but white background...

comment:12 Changed 8 years ago by Antoine Martin

Sorry about that, was trying to commit only the minimum needed and hold on on some parts... and ended up going out of sync. Fixed in r1832.

Changed 8 years ago by onlyjob

Attachment: r1833a.log.xz added

comment:13 Changed 8 years ago by onlyjob

r1833a.log: Xpra started as "XPRA_X11_DEBUG=1 xpra start :8 -d all"

Changed 8 years ago by onlyjob

Attachment: r1833b.log.xz added

comment:14 Changed 8 years ago by onlyjob

r1833b.log: Xpra started as "XPRA_X11_DEBUG=1 xpra upgrade :8 -d all" after previous crash (r1833a.log).

Please let me know if more logs are needed.

comment:15 Changed 8 years ago by onlyjob

I'm sorry to report that I got indestructible window with r1833....

comment:16 Changed 8 years ago by Antoine Martin

That's ok, that's expected, what I am trying to get rid of first is the crashes... then we can worry about the side effects, undoing what is more harmful than good.

Does r1834 help?

Last edited 8 years ago by Antoine Martin (previous) (diff)

comment:17 in reply to:  16 Changed 8 years ago by onlyjob

Replying to antoine:

That's ok, that's expected, what I am trying to get rid of first is the
crashes... then we can worry about the side effects, undoing what is more
harmful than good.


I'm with you but crashes are less harmful as they are easier to recover from.
:)

Does r1834 help?


No... :(

Attaching r1834a.log

Changed 8 years ago by onlyjob

Attachment: r1834a.log.xz added

comment:18 Changed 8 years ago by Antoine Martin

Err, my bad, sorry again.

The trap was meant to go in mouse motion events, not just mouse clicks... (again I was trying to commit piecemeal..) - r1842 does this better by moving common code to a single method which takes care of it.

If you do manage to reproduce the problem again, please tell me how: kmail has stopped crashing for me. I will now deal with undoing the stuff which prevents OR windows from disappearing when they should.

comment:19 Changed 8 years ago by onlyjob

Looks like r1849 is OK. Naturally I will test more but so far I couldn't reproduce the issue.
Great work Antoine, you did a fantastic work which saved the day, once again. :)
Thank you very much.

What was the problem after all?

comment:20 Changed 8 years ago by Antoine Martin

Thanks. Your interpretation of this bug hunt is flattering, but unfortunately a post-portem is required to see where I went wrong and why I wasted days chasing shadows:

  • this particular bug has been present forever, it is a race - I will backport the fix to older versions, even if it is much harder to hit there
  • recent versions have lower latency and make the race more likely to hit - in a way, this is a good thing, we're faster and more efficient, and we end up fixing real bugs
  • I dismissed the mouse event handler code as being the source of the problem because this code had been working fine forever (or at least, crashing so rarely I never knew about it), first mistake.
  • it would have been easier to diagnose if the gtk main loop could report where it was last entered rather than where we call gtk.main from, that is why:
  • I got side-tracked into the filter_events code because it is part of the call sequence where the blow up eventually occurs, and trying to fix it there was just an exercise in futility: by then it is too late, and you would end up adding workaround after workaround, without delaying the inevitable crash much!
  • when these changes failed to improve things significantly (being a race, they did make a difference in some cases... which didn't help the hunt), I should have gone back to the log files and re-evaluated my understanding of the problem, second major mistake.

Many thanks for sticking with me during that time, and also thanks to AradTechKK for testing, finding a reliable crash setup (Fedora 18 + kmail) and giving me access to his test system.

comment:21 Changed 8 years ago by Antoine Martin

Resolution: fixed
Status: acceptedclosed

backports in r1854, closing.

Note: See TracTickets for help on using tickets.