xpra icon
Bug tracker and wiki

Opened 8 years ago

Closed 8 years ago

#242 closed defect (fixed)

0.8.0: client-side errors in log

Reported by: onlyjob Owned by: Antoine Martin
Priority: major Milestone: 0.8
Component: core Version: trunk
Keywords: Cc:

Description

Changing encoding (x264 to rgb24; rgb24 to x264, etc.) when gwenview is running in xpra-0.8.0 session produces multiple errors in log:

2013-02-02 11:11:07,389 draw error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region
    self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image
    img_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta
    raise Exception("delta region references pixmap data we do not have!")
Exception: delta region references pixmap data we do not have!
2013-02-02 11:11:07,392 error processing draw packet
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region
    self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image
    img_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta
    raise Exception("delta region references pixmap data we do not have!")
Exception: delta region references pixmap data we do not have!


This problem is not specific to gwenview and affect some other applications. I've seen it in kmail too.

Change History (10)

comment:1 Changed 8 years ago by Antoine Martin

Owner: set to Antoine Martin
Status: newaccepted

I'm not seeing it in either...

Does this patch fix things?

Index: src/xpra/window_source.py
===================================================================
--- src/xpra/window_source.py	(revision 2634)
+++ src/xpra/window_source.py	(working copy)
@@ -436,6 +436,8 @@
             #we will resend a key frame because it looks like we will
             #drop a frame which is being processed
             self.video_encoder_cleanup()
+        #clear delta since things got cancelled
+        self.last_pixmap_data = None
 
     def cancel_expire_timer(self):
         if self.expire_timer:

comment:2 Changed 8 years ago by Antoine Martin

I still cannot reproduce this (I suspect it has something to do with client DE and window resizing/positioning) so please try r2639 and see if it fixes things for you.

comment:3 Changed 8 years ago by onlyjob

I've seen few errors in 0.8.1 but after applying the patch from comment:1 I haven't seen errors for ~90m. This is not a final confirmation though as it happen to be more difficult to reproduce in 0.8.1. I will post another update later...

comment:4 Changed 8 years ago by Antoine Martin

The only difference between r2639 and the patch above is when we clear the last_pixmap_data reference: before instead of after the video encoder cleanup..
This is a race, I'll have to review again I guess.

comment:5 Changed 8 years ago by onlyjob

Reporting back as promised: patch didn't fixed it, at least not completely.
I got

2013-02-04 16:52:33,272 draw error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region
    self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image
    img_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta
    raise Exception("delta region references pixmap data we do not have!")
Exception: delta region references pixmap data we do not have!
2013-02-04 16:52:33,274 error processing draw packet
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region
    self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image
    img_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta
    raise Exception("delta region references pixmap data we do not have!")
Exception: delta region references pixmap data we do not have!

in last few hours but that's much much less then what 0.8.0 was logging.
I reckon r2639 helped. Also it is definitely more difficult to replicate -- I couldn't quite get it deliberately, those errors just appears sometimes and I don't know why or what action produces them...

comment:6 Changed 8 years ago by Antoine Martin

I still do not fully understand what is going on here.

Threads are involved, so it may well be a race - it does look like one - I just can't see where... or maybe I can?

So here is an explanation of what delta does, both for my own sake, also documentation can't hurt (this should go on the wiki when fixed), and so that you can proof read it and hopefully spot something stupid I have missed.

Server side:

  • last_pixmap_data stores a copy of the last pixmap we sent to the client (as well as its size, sequence number, etc). It may be set to None: at first or when we clear it. As per the changes in r2639: we clear it when switching encoding or when cancelling frames.

Since r2660, we also clear it when the client reports failing to decode a frame (using a negative decode time). This is only for recovering quicker, it doesn't fix the underlying issue which is that the client does not have the pixmap that the server thinks it has.. The client should never fail to decode frames. (*)

  • We only set last_pixmap_data at the end of make_data_packet, once we've successfully encoded a frame. And after checking for cancellations (frames can get cancelled if the window is destroyed, resized, encoding changed..). By that point, the packet we generated will end up queued for sending using queue_damage_packet and cannot be cancelled. The whole system relies on this packet reaching the client and being acknowledged back. This is important for x264/vpx as they use intra-frames compression, and here because we rely on the client having a copy of the uncompressed data. When we set it, we also set the "store" attribute so that the client knows it needs to keep a copy around after decoding (which may be used for later delta encoding).
  • When last_pixmap_data is set and we get a new screen update of exactly the same size, and we use a lossless encoding, and both the client and server support the required features, then we xor the two pixmaps before compression. We also pass the sequence number of the reference frame we used for xoring to the client (via client_options as an attribute called "delta") so the client knows that it needs to xor the data after decompression (and can also check that the frame sequence number matches what it has).

Client side:

  • the draw thread loop picks up draw packets posted by the network thread and calls draw_region on the window class, passing it a callback which will send the acknowledgement packet back to the server once done (it also takes care of firing this ack packet in the case of decoding errors - but you should see an error in the client output in this case)
  • the gtk2 version of the window class (this is the default implementation used, the cairo/gtk3 and GL versions aren't supported yet) will dispatch the paint request to the appropriate method for the encoding specified (paint_image for png and jpeg, paint_rgb24 for rgb24, paint_with_video_decoder for x264 and vpx...)
  • just like the server, this class keeps a last_pixmap_data variable, which starts empty
  • in the case of lossless encodings (rgb24 or png), all their paint methods (there are variants using python-imaging or plain gdk-pixbuf), end up calling process_delta after decompressing the data (but before the actual painting part)
  • in process_delta, we first look for the "delta" attribute and if set, we verify that we have stored a previous pixmap, that it is the right size, that the sequence matches and then we xor it (this is the part that goes wrong)
  • at the end of process_delta, after xoring if needed, we then store the pixmap data in "last_pixmap_data" whenever we see a "store" attribute. (both "store" and "delta" can be set at the same time: we apply the xor delta then store the result)


The problem you are seeing comes from the fact that the server relies on the client having a copy of a frame (as the server sent a "store" previously) and it does not.
Special cases:

  • when the client resizes, we re-initialize the backing class (see init in make_new_backing), but this does not clear the last_pixmap_data
  • when the window is already gone, we don't decode anything, we don't store anything, but since the window is gone...

Maybe: we get the draw before the window has had a chance of showing up? (new windows are processed in the UI thread, draws are not and therefore may get there quicker)

comment:7 Changed 8 years ago by Antoine Martin

Can you please test r2661 and r2662 and tell me if either of them fixes the issue.
I don't like r2662 much because it makes us use the UI thread for one instruction, just to ensure they are processed in the right order (it is very hard to come up with a better solution). But unfortunately, I think this is the cause of the problem you reported.

comment:8 Changed 8 years ago by onlyjob

So far tried r2661 -- it works great with new libav-9.1 but doesn't fix the problem with "draw" errors logged.
I will test r2662 soon.

comment:9 Changed 8 years ago by onlyjob

I think we can declare r2662 as viable fix for the issue.

With r2662 applied on top of 0.8.1 (together with r2661) I haven't seen a single draw error for hours.

Well done.

comment:10 Changed 8 years ago by Antoine Martin

Resolution: fixed
Status: acceptedclosed

Thanks for all the testing.

Note: See TracTickets for help on using tickets.