

Opened 9 years ago

Closed 9 years ago

Last modified 6 weeks ago

#132 closed defect (fixed)

idle_add is a latency bottleneck in damage_to_data

Reported by: Antoine Martin
Owned by: Antoine Martin
Priority: major
Milestone: 0.4
Component: server
Version: 0.3.x
Keywords:
Cc:

Description (last modified by Antoine Martin)

When we get many small areas in damage requests, we may end up scheduling hundreds of calls to idle_add from damage_to_data, which causes the latency to shoot right up.

Solutions:

  • Run the whole damage_to_data in one idle_add call - or even run it directly from damage (via damage_now or send_delayed). This would reduce the number of idle_add calls at the cost of making one longer idle_add call...
  • Keep track of how many idle_add calls are pending and use this measure to update the batching delay (see the sketch below) - but this may still not allow the many small regions to be coalesced...
  • Prefer full frame refreshes over many small regions by increasing packet_cost and/or decreasing pixels_threshold

Or a combination of the above? Or??
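
As a rough sketch of the second option (not xpra code: the class name PendingIdleTracker and the delay values are made up for illustration, and a PyGTK/gobject main loop is assumed), the batching delay could be derived from the number of idle_add callbacks that have been scheduled but have not yet run:

import gobject

class PendingIdleTracker(object):
    """Counts idle_add callbacks still waiting to run and derives a batch delay from that count."""
    def __init__(self, base_delay_ms=5, per_pending_ms=2, max_delay_ms=100):
        self.pending = 0
        self.base_delay_ms = base_delay_ms
        self.per_pending_ms = per_pending_ms
        self.max_delay_ms = max_delay_ms

    def idle_add(self, callback, *args):
        self.pending += 1
        def wrapper():
            try:
                callback(*args)
            finally:
                self.pending -= 1
            return False            # one-shot: do not reschedule
        gobject.idle_add(wrapper)

    def batch_delay_ms(self):
        # the more callbacks are still queued, the longer we batch
        delay = self.base_delay_ms + self.pending * self.per_pending_ms
        return min(delay, self.max_delay_ms)

The value returned by batch_delay_ms() would then be consulted wherever the delayed damage packets are scheduled, so a congested main loop automatically coalesces more regions per packet.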

Once done, we can deal with #135

Attachments (3)

xpra-unthreadeddamage.patch (15.5 KB) - added by Antoine Martin 9 years ago.
removes threading from damage codepath
xpra-unthreadeddamage-addlatencymeasurements.patch (23.3 KB) - added by Antoine Martin 9 years ago.
updated patch which dumps the damage latency at various points in the pipeline
xpra-protocol-addlatencymeasurements2.patch (3.5 KB) - added by Antoine Martin 9 years ago.
patch for protocol.py so the latency reported is when the packet has been sent by the network layer


Change History (10)

comment:1 Changed 9 years ago by Antoine Martin

Description: modified (diff)
Status: new → accepted

comment:2 Changed 9 years ago by Antoine Martin

Using this simple patch:

Index: src/xpra/server.py
===================================================================
--- src/xpra/server.py	(revision 883)
+++ src/xpra/server.py	(working copy)
@@ -539,9 +540,10 @@
             except Exception, e:
                 log.error("damage_to_data: error processing region %s: %s", damage, e)
                 continue
-            gobject.idle_add(self._process_damage_regions, wid, window, ww, wh, regions, coding, sequence, options)
+            gobject.idle_add(self._process_damage_regions, time.time(), wid, window, ww, wh, regions, coding, sequence, options)
 
-    def _process_damage_regions(self, wid, window, ww, wh, regions, coding, sequence, options):
+    def _process_damage_regions(self, req_time, wid, window, ww, wh, regions, coding, sequence, options):
+        log.info("process_damage_regions: idle_add latency=%s", int(1000*1000*(time.time()-req_time)))
         if self._damage_cancelled.get(wid, 0)>=sequence:
             log("process_damage_regions: dropping damage request with sequence=%s", sequence)
             return

And running the xterm performance test, we can clearly see the idle_add latency (in microseconds) shooting right up:

process_damage_regions: idle_add latency=39
process_damage_regions: idle_add latency=601
process_damage_regions: idle_add latency=793
process_damage_regions: idle_add latency=1873
process_damage_regions: idle_add latency=8841
process_damage_regions: idle_add latency=2904
process_damage_regions: idle_add latency=3646
process_damage_regions: idle_add latency=4523
process_damage_regions: idle_add latency=4644
process_damage_regions: idle_add latency=5743
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
unknown modifier: meta
process_damage_regions: idle_add latency=135900
process_damage_regions: idle_add latency=135923
process_damage_regions: idle_add latency=135952
process_damage_regions: idle_add latency=106001
process_damage_regions: idle_add latency=289604
(...)
process_damage_regions: idle_add latency=1162372
process_damage_regions: idle_add latency=1150017
process_damage_regions: idle_add latency=124438
process_damage_regions: idle_add latency=93
process_damage_regions: idle_add latency=28
process_damage_regions: idle_add latency=41
process_damage_regions: idle_add latency=46
process_damage_regions: idle_add latency=59

Then it suddenly drops back down... now we need to figure out why!

comment:3 Changed 9 years ago by Antoine Martin

A simple:

time.sleep(0)

added in r884 ensures we don't queue too many idle_add calls before giving them a chance to run.

A better solution is probably to remove the threading from damage_to_data, keep track of the idle_add latency as a measure of how smoothly the system is running, and batch more when it goes too high.
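
Something along those lines could look like this (a rough sketch only, not what ended up in the tree; the IdleLatencyBatcher name and the threshold values are invented): time-stamp each callback when it is scheduled, measure how late it actually runs, and grow the batch delay when that latency climbs:

import time
import gobject

class IdleLatencyBatcher(object):
    def __init__(self, min_delay_ms=5, max_delay_ms=200, target_latency_ms=10):
        self.delay_ms = min_delay_ms
        self.min_delay_ms = min_delay_ms
        self.max_delay_ms = max_delay_ms
        self.target_latency_ms = target_latency_ms

    def idle_add(self, callback, *args):
        scheduled = time.time()
        def wrapper():
            # how long the callback sat in the main loop queue
            latency_ms = (time.time() - scheduled) * 1000
            self._update_delay(latency_ms)
            callback(*args)
            return False        # run once
        gobject.idle_add(wrapper)

    def _update_delay(self, latency_ms):
        # a congested main loop means we should batch more,
        # otherwise slowly decay the delay back down
        if latency_ms > self.target_latency_ms:
            self.delay_ms = min(self.delay_ms * 2, self.max_delay_ms)
        else:
            self.delay_ms = max(self.delay_ms - 1, self.min_delay_ms)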

Changed 9 years ago by Antoine Martin

Attachment: xpra-unthreadeddamage.patch added

removes threading from damage codepath

Changed 9 years ago by Antoine Martin

Attachment: xpra-unthreadeddamage-addlatencymeasurements.patch added

updated patch which dumps the damage latency at various points in the pipeline

comment:4 Changed 9 years ago by Antoine Martin

With this code in _get_rgb_rawdata:

    pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
    log.info("get_rgb_rawdata(..) creating pixbuf object took %s ms", int(1000*(time.time()-start)))
    pixbuf.get_from_drawable(pixmap, pixmap.get_colormap(), x, y, 0, 0, width, height)
    log.info("get_rgb_rawdata(..) pixbuf.get_from_drawable took %s ms", int(1000*(time.time()-start)))
    raw_data = pixbuf.get_pixels()
    log.info("get_rgb_rawdata(..) pixbuf.get_pixels took %s ms", int(1000*(time.time()-start)))
    rowstride = pixbuf.get_rowstride()
    log.info("get_rgb_rawdata(..) took %s ms", int(1000*(time.time()-start)))

We can see:

get_rgb_rawdata(..) creating pixbuf object took 0 ms
get_rgb_rawdata(..) pixbuf.get_from_drawable took 182 ms
get_rgb_rawdata(..) pixbuf.get_pixels took 183 ms
get_rgb_rawdata(..) took 184 ms

So the expensive call is pixbuf.get_from_drawable() in gtk.

Solutions:

  • use the X11 pixmap directly/unwrapped (tricky?) - see the sketch below for one possible approach
  • ???
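
For the first option, one possible prototype (an illustration only, assuming python-xlib, which is not necessarily what xpra would use; xid, x, y, width and height are placeholders) fetches the pixels straight from the X11 pixmap with a single GetImage request, skipping the gtk.gdk.Pixbuf copy:

from Xlib import X, display

def get_raw_pixels(xid, x, y, width, height):
    # xid is the XID of the window's backing pixmap
    d = display.Display()
    pixmap = d.create_resource_object("pixmap", xid)
    # one round-trip to the X server, no intermediate Pixbuf allocation or copy
    reply = pixmap.get_image(x, y, width, height, X.ZPixmap, 0xffffffff)
    return reply.data, reply.depth

Whether this is actually faster than pixbuf.get_from_drawable() would need measuring: it avoids the Pixbuf allocation and the extra copy, but the pixels still travel over the X connection.
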
Last edited 9 years ago by Antoine Martin (previous) (diff)

Changed 9 years ago by Antoine Martin

Attachment: xpra-protocol-addlatencymeasurements2.patch added

patch for protocol.py so the latency reported is when the packet has been sent by the network layer

comment:5 Changed 9 years ago by Antoine Martin

r886 commits the unthreaded damage code and adds the latency measurement via "xpra info"; see the changeset commit message for details.

comment:6 Changed 9 years ago by Antoine Martin

Resolution: fixed
Status: accepted → closed

Good enough in trunk with the latency minimization code from #153.

comment:7 Changed 6 weeks ago by migration script

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/132
