It doesn't crash it just seems to be stuck:
(gdb) bt #0 0x00000034528eb7fd in poll () at ../sysdeps/unix/syscall-template.S:81 #1 0x0000003455409f42 in poll (__timeout=-1, __nfds=1, __fds=0x7fff3f1f6fb0) at /usr/include/bits/poll2.h:46 #2 _xcb_conn_wait (c=c@entry=0x1054260, cond=cond@entry=0x7fff3f1f7030, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:414 #3 0x000000345540b37f in wait_for_reply (c=c@entry=0x1054260, request=16876, e=e@entry=0x7fff3f1f70f8) at xcb_in.c:399 #4 0x000000345540b492 in xcb_wait_for_reply (c=c@entry=0x1054260, request=16876, e=e@entry=0x7fff3f1f70f8) at xcb_in.c:429 #5 0x0000003456041b47 in _XReply (dpy=dpy@entry=0xbe0c60, rep=rep@entry=0x7fff3f1f7140, extra=extra@entry=0, discard=discard@entry=1) at xcb_io.c:602 #6 0x000000345603d76d in XSync (dpy=0xbe0c60, discard=0) at Sync.c:44 #7 0x000000386365c17e in gdk_flush () from /lib64/libgdk-x11-2.0.so.0 #8 0x00007f85b5f69289 in _wrap_gdk_flush (self=<optimized out>) at gdk.c:17325 #9 0x00000033b2e49dd3 in PyObject_Call (func=<built-in function flush>, arg=<optimized out>, kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #10 0x00007f85b4a627e9 in __pyx_pf_4xpra_3x11_7gtk_x11_12gdk_bindings_42_gw (__pyx_self=<optimized out>, __pyx_v_xwin=12582914L, __pyx_v_display=<gtk.gdk.Display at remote 0xfb6780>) at xpra/x11/gtk_x11/gdk_bindings.c:8311 #11 __pyx_pw_4xpra_3x11_7gtk_x11_12gdk_bindings_43_gw (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at xpra/x11/gtk_x11/gdk_bindings.c:8147 #12 0x00000033b2e49dd3 in PyObject_Call (func=<built-in function _gw>, arg=<optimized out>, kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #13 0x00007f85b4a66d0b in __pyx_f_4xpra_3x11_7gtk_x11_12gdk_bindings_x_event_filter (__pyx_v_e_gdk=0x7fff3f1f7530, __pyx_v_gdk_event=<optimized out>, __pyx_v_userdata=<optimized out>) at xpra/x11/gtk_x11/gdk_bindings.c:9054 #14 0x000000386365c7b5 in gdk_event_translate () from /lib64/libgdk-x11-2.0.so.0 #15 0x000000386365e968 in _gdk_events_queue () from /lib64/libgdk-x11-2.0.so.0 #16 0x000000386365ea0e in gdk_event_dispatch () from /lib64/libgdk-x11-2.0.so.0 #17 0x00007f85b6f06df6 in g_main_dispatch (context=0x104fbf0) at gmain.c:3054 #18 g_main_context_dispatch (context=context@entry=0x104fbf0) at gmain.c:3630 #19 0x00007f85b6f07148 in g_main_context_iterate (context=0x104fbf0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3701 #20 0x00007f85b6f0754a in g_main_loop_run (loop=0x1e4aa30) at gmain.c:3895 #21 0x0000003862f3ed47 in gtk_main () from /lib64/libgtk-x11-2.0.so.0
Sanitized as:
wait_for_reply() gtk.gdk.flush() xpra.x11.gtk_x11.gdk_bindings._gw() xpra.x11.gtk_x11.gdk_bindings.x_event_filter() gdk_event_translate() gdk_event_dispatch() gtk_main()
And here is another one (with 0.10.x r4000), showing up in a different place:
wait_for_reply() XGetGeometry() gdk_window_get_geometry() resize_corral_window() xpra.x11.gtk_x11.error.call_synced() xpra.x11.gtk_x11.window.may_resize_corral_window()
With the X11 server just waiting for something:
(gdb) bt #0 0x00000034528ed533 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x000000000046692c in WaitForSomething () #2 0x0000000000436eb1 in Dispatch () #3 0x00000000004266da in main ()
Sometimes it also just kills the X11 server (no xpra backtrace in that case - no X11 backtrace yet..).
In the xpra backtrace case, it looks like we're stuck waiting for a reply from the X11 server, and none is coming...
Possibly because we accessed the server from a non-UI thread somewhere. (Ouch!)
Could also be some library problems (#422: the AMD OpenCL
icd is known to cause weird problems)
This is a regression, but not a recent one:
(and it could still be related to lib injection from buggy OpenCL
/ Nvidia
bits.)
Let the bisection "fun" begin (using rgb encoding for testing):
PIL.Image
import on F19..): ok?
So this crash is caused by r3613 which enables XShm
(#345) by default.
This can be confirmed by running normally crashing versions (including trunk) without problems by using:
XPRA_XSHM=0 xpra start ...
Problem is that XShm
provides a major performance boost, so ideally we want to find a fix for the crash without turning it off completely..
Adding some debug, this is often the last thing that shows before we crash:
2013-12-17 15:29:36,198 resize_corral_window() Xpra: Fatal IO error 11 (Resource temporarily unavailable) on X server :10.
Interestingly, I also got this error:
shmget failed: error 28 (No space left on device)
Which got me watching the shared memory segments with:
watch "ipcs | wc -l"
And this number goes up with every resize, but never seems to go back down again! Looks like something isn't freeing up the shared memory!
this seems to fix the leak
fixes the leak and adds some debug
With the patch above I see:
2013-12-17 16:19:55,287 XShmWrapper.XShmCreateImage(740x416-24) True ID=203 2013-12-17 16:19:55,287 XShmWrapper.shmget(PRIVATE, 1234320 bytes, 1023) shmid=880476204 2013-12-17 16:19:55,287 XShmWrapper.shmat(880476204, NULL, 0) True 2013-12-17 16:19:55,287 XShmWrapper.XShmAttach(..) True 2013-12-17 16:19:55,287 XShmWrapper.get_image(4195268L, 0, 0, 740, 416) 2013-12-17 16:19:55,289 XShmWrapper.get_image(4195268L, 0, 0, 740, 416) ref_count=1, \ returning XShmImageWrapper(BGRX: 0, 0, 740, 416, ID=718) 2013-12-17 16:19:55,289 XShmImageWrapper.get_image_pixels() self=XShmImageWrapper(BGRX: 0, 0, 740, 416, ID=718) 2013-12-17 16:19:55,291 WindowModel.do_xpra_configure_event(<AdHocStruct object, contents: \ {'delivered_to': <gtk.gdk.Window object at 0x3c82910 (GdkWindow at 0x3c47360)>, 'send_event': 0, \ 'height': 418, 'width': 743, 'window': <gtk.gdk.Window object at 0x3c82910 (GdkWindow at 0x3c47360)>, \ 'y': 213, 'x': 94, 'serial': 25676L, 'border_width': 0, 'type': 22, \ 'display': <gtk.gdk.Display object at 0x29867d0 (GdkDisplayX11 at 0x2a18230)>}>) managed=True 2013-12-17 16:19:55,292 resize_corral_window() 2013-12-17 16:19:55,292 corral window geometry: (743, 418) 2013-12-17 16:19:55,292 client window geometry: (0, 0, 743, 418) 2013-12-17 16:19:55,292 WindowModel.do_xpra_configure_event(<AdHocStruct object, contents: \ {'delivered_to': <gtk.gdk.Window object at 0x3c827d0 (GdkWindow at 0x3c47240)>, 'send_event': 0, \ 'height': 418, 'width': 743, 'window': <gtk.gdk.Window object at 0x3c827d0 (GdkWindow at 0x3c47240)>, \ 'y': 0, 'x': 0, 'serial': 25679L, 'border_width': 0, 'type': 22, \ 'display': <gtk.gdk.Display object at 0x29867d0 (GdkDisplayX11 at 0x2a18230)>}>) managed=True 2013-12-17 16:19:55,292 resize_corral_window() 2013-12-17 16:19:55,292 corral window geometry: (743, 418) 2013-12-17 16:19:55,293 client window geometry: (0, 0, 743, 418) 2013-12-17 16:19:55,310 XShmWrapper.get_image(4194552L, 17, 301, 480, 13) 2013-12-17 16:19:55,311 XShmWrapper.get_image(4194552L, 17, 301, 480, 13) ref_count=1, \ returning XShmImageWrapper(BGRX: 17, 301, 480, 13, ID=719) 2013-12-17 16:19:55,312 XShmWrapper.cleanup() ID=203, ref_count=1 2013-12-17 16:19:55,313 XShmWrapper.XShmCreateImage(743x418-24) True ID=204 2013-12-17 16:19:55,313 XShmWrapper.shmget(PRIVATE, 1245268 bytes, 1023) shmid=880508973 2013-12-17 16:19:55,313 XShmWrapper.shmat(880508973, NULL, 0) True 2013-12-17 16:19:55,313 XShmWrapper.XShmAttach(..) True 2013-12-17 16:19:55,314 XShmWrapper.get_image(4195272L, 0, 0, 743, 418) 2013-12-17 16:19:55,315 XShmWrapper.get_image(4195272L, 0, 0, 743, 418) ref_count=1, \ returning XShmImageWrapper(BGRX: 0, 0, 743, 418, ID=720) 2013-12-17 16:19:55,316 XShmImageWrapper.free() ID=718, \ free_callback=<built-in method free_image of xpra.x11.bindings.ximage.XShmWrapper object at 0x3c16be8> 2013-12-17 16:19:55,317 XShmWrapper.free_image() ID=203, closed=1, new ref_count=0 2013-12-17 16:19:55,318 XShmWrapper.free() ID=203, has_shm=True 2013-12-17 16:19:55,318 XShmWrapper.free() ID=203, image=0x3d6f590L Xpra: Fatal IO error 11 (Resource temporarily unavailable) on X server :10.
So, it looks to me like we still have an XShmWrapper
live (ID=203, ref_count=1
) when we call a new get_image
following the window resize.
We create a new wrapper (ID=204
) and hand out a new image (ID=720
), then when we free()
the older wrapper, things blow up.
Thoughts:
forces us to not use more than one XShm mapping per window
improve xshm debugging and add unique ids and thread checking
XShm
handle for the same pixmap, just a bit wasteful - but in our case they will be of different sizes, which is why we need them. The attachment/ticket/475/xshm-single.patch implements a one-mapping-per-window policy, this makes things slower for no real benefit.
Fixed in r4963.
0.10.x backports in: r4965 and r4966 - keeping ticket opened for testing.
0.10.10 released with the fix after testing ok - closing
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/475