

Opened 9 years ago

Closed 9 years ago

Last modified 6 weeks ago

#132 closed defect (fixed)

idle_add is a latency bottleneck in damage_to_data

Reported by: Antoine Martin
Owned by: Antoine Martin
Priority: major
Milestone: 0.4
Component: server
Version: 0.3.x
Keywords:
Cc:

Description (last modified by Antoine Martin)

When we get many small areas in damage requests, we may end up scheduling hundreds of calls to idle_add from damage_to_data, which causes the latency to shoot right up.

Solutions:

  • Run the whole damage_to_data in one idle_add call - or even run it directly from damage (via damage_now or send_delayed). This would reduce the number of idle_add calls at the cost of making one longer idle_add call...
  • Keep track of how many idle_add calls are pending and use this measure to update the batching delay (see the sketch below) - but this may still not allow the many small regions to be coalesced...
  • Prefer full frame refreshes over many small regions by increasing packet_cost and/or decreasing pixels_threshold

Or a combination of the above? Or??
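
As a rough sketch of the second option (not xpra code: the class name PendingIdleTracker and the delay values are made up for illustration, and a PyGTK/gobject main loop is assumed), the batching delay could be derived from the number of idle_add callbacks that have been scheduled but have not yet run:

import gobject

class PendingIdleTracker(object):
    """Counts idle_add callbacks still waiting to run and derives a batch delay from that count."""
    def __init__(self, base_delay_ms=5, per_pending_ms=2, max_delay_ms=100):
        self.pending = 0
        self.base_delay_ms = base_delay_ms
        self.per_pending_ms = per_pending_ms
        self.max_delay_ms = max_delay_ms

    def idle_add(self, callback, *args):
        self.pending += 1
        def wrapper():
            try:
                callback(*args)
            finally:
                self.pending -= 1
            return False            # one-shot: do not reschedule
        gobject.idle_add(wrapper)

    def batch_delay_ms(self):
        # the more callbacks are still queued, the longer we batch
        delay = self.base_delay_ms + self.pending * self.per_pending_ms
        return min(delay, self.max_delay_ms)

The value returned by batch_delay_ms() would then be consulted wherever the delayed damage packets are scheduled, so a congested main loop automatically coalesces more regions per packet.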

Once done, we can deal with #135

Attachments (3)

xpra-unthreadeddamage.patch (15.5 KB) - added by Antoine Martin 9 years ago.
removes threading from damage codepath
xpra-unthreadeddamage-addlatencymeasurements.patch (23.3 KB) - added by Antoine Martin 9 years ago.
updated patch which dumps the damage latency at various points in the pipeline
xpra-protocol-addlatencymeasurements2.patch (3.5 KB) - added by Antoine Martin 9 years ago.
patch for protocol.py so the latency reported is when the packet has been sent by the network layer


Change History (10)

comment:1 Changed 9 years ago by Antoine Martin

Description: modified (diff)
Status: new → accepted

comment:2 Changed 9 years ago by Antoine Martin

Using this simple patch:

Index: src/xpra/server.py
===================================================================
--- src/xpra/server.py	(revision 883)
+++ src/xpra/server.py	(working copy)
@@ -539,9 +540,10 @@
             except Exception, e:
                 log.error("damage_to_data: error processing region %s: %s", damage, e)
                 continue
-            gobject.idle_add(self._process_damage_regions, wid, window, ww, wh, regions, coding, sequence, options)
+            gobject.idle_add(self._process_damage_regions, time.time(), wid, window, ww, wh, regions, coding, sequence, options)
 
-    def _process_damage_regions(self, wid, window, ww, wh, regions, coding, sequence, options):
+    def _process_damage_regions(self, req_time, wid, window, ww, wh, regions, coding, sequence, options):
+        log.info("process_damage_regions: idle_add latency=%s", int(1000*1000*(time.time()-req_time)))
         if self._damage_cancelled.get(wid, 0)>=sequence:
             log("process_damage_regions: dropping damage request with sequence=%s", sequence)
             return

And running the xterm performance test, we can clearly see the idle_add latency (in microseconds) shooting right up:

process_damage_regions: idle_add latency=39
process_damage_regions: idle_add latency=601
process_damage_regions: idle_add latency=793
process_damage_regions: idle_add latency=1873
process_damage_regions: idle_add latency=8841
process_damage_regions: idle_add latency=2904
process_damage_regions: idle_add latency=3646
process_damage_regions: idle_add latency=4523
process_damage_regions: idle_add latency=4644
process_damage_regions: idle_add latency=5743
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
unknown modifier: meta
process_damage_regions: idle_add latency=135900
process_damage_regions: idle_add latency=135923
process_damage_regions: idle_add latency=135952
process_damage_regions: idle_add latency=106001
process_damage_regions: idle_add latency=289604
(...)
process_damage_regions: idle_add latency=1162372
process_damage_regions: idle_add latency=1150017
process_damage_regions: idle_add latency=124438
process_damage_regions: idle_add latency=93
process_damage_regions: idle_add latency=28
process_damage_regions: idle_add latency=41
process_damage_regions: idle_add latency=46
process_damage_regions: idle_add latency=59

Then it suddenly drops back down... now we need to figure out why!

comment:3 Changed 9 years ago by Antoine Martin

A simple:

time.sleep(0)

added in r884 ensures we don't queue too many idle_add calls before giving them a chance to run.

A better solution is probably to remove the threading from damage_to_data, keep track of the idle_add latency as a measure of how smoothly the system is running, and batch more when it goes too high.
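
Something along those lines could look like this (a rough sketch only, not what ended up in the tree; the IdleLatencyBatcher name and the threshold values are invented): time-stamp each callback when it is scheduled, measure how late it actually runs, and grow the batch delay when that latency climbs:

import time
import gobject

class IdleLatencyBatcher(object):
    def __init__(self, min_delay_ms=5, max_delay_ms=200, target_latency_ms=10):
        self.delay_ms = min_delay_ms
        self.min_delay_ms = min_delay_ms
        self.max_delay_ms = max_delay_ms
        self.target_latency_ms = target_latency_ms

    def idle_add(self, callback, *args):
        scheduled = time.time()
        def wrapper():
            # how long the callback sat in the main loop queue
            latency_ms = (time.time() - scheduled) * 1000
            self._update_delay(latency_ms)
            callback(*args)
            return False        # run once
        gobject.idle_add(wrapper)

    def _update_delay(self, latency_ms):
        # a congested main loop means we should batch more,
        # otherwise slowly decay the delay back down
        if latency_ms > self.target_latency_ms:
            self.delay_ms = min(self.delay_ms * 2, self.max_delay_ms)
        else:
            self.delay_ms = max(self.delay_ms - 1, self.min_delay_ms)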

Changed 9 years ago by Antoine Martin

Attachment: xpra-unthreadeddamage.patch added

removes threading from damage codepath

Changed 9 years ago by Antoine Martin

Attachment: xpra-unthreadeddamage-addlatencymeasurements.patch added

updated patch which dumps the damage latency at various points in the pipeline

comment:4 Changed 9 years ago by Antoine Martin

With this code in _get_rgb_rawdata:

    pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
    log.info("get_rgb_rawdata(..) creating pixbuf object took %s ms", int(1000*(time.time()-start)))
    pixbuf.get_from_drawable(pixmap, pixmap.get_colormap(), x, y, 0, 0, width, height)
    log.info("get_rgb_rawdata(..) pixbuf.get_from_drawable took %s ms", int(1000*(time.time()-start)))
    raw_data = pixbuf.get_pixels()
    log.info("get_rgb_rawdata(..) pixbuf.get_pixels took %s ms", int(1000*(time.time()-start)))
    rowstride = pixbuf.get_rowstride()
    log.info("get_rgb_rawdata(..) took %s ms", int(1000*(time.time()-start)))

We can see:

get_rgb_rawdata(..) creating pixbuf object took 0 ms
get_rgb_rawdata(..) pixbuf.get_from_drawable took 182 ms
get_rgb_rawdata(..) pixbuf.get_pixels took 183 ms
get_rgb_rawdata(..) took 184 ms

So the expensive call is pixbuf.get_from_drawable() in gtk.

Solutions:

  • use the X11 pixmap directly/unwrapped (tricky?) - see the sketch below for one possible approach
  • ???
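
For the first option, one possible prototype (an illustration only, assuming python-xlib, which is not necessarily what xpra would use; xid, x, y, width and height are placeholders) fetches the pixels straight from the X11 pixmap with a single GetImage request, skipping the gtk.gdk.Pixbuf copy:

from Xlib import X, display

def get_raw_pixels(xid, x, y, width, height):
    # xid is the XID of the window's backing pixmap
    d = display.Display()
    pixmap = d.create_resource_object("pixmap", xid)
    # one round-trip to the X server, no intermediate Pixbuf allocation or copy
    reply = pixmap.get_image(x, y, width, height, X.ZPixmap, 0xffffffff)
    return reply.data, reply.depth

Whether this is actually faster than pixbuf.get_from_drawable() would need measuring: it avoids the Pixbuf allocation and the extra copy, but the pixels still travel over the X connection.
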
Last edited 9 years ago by Antoine Martin (previous) (diff)

Changed 9 years ago by Antoine Martin

Attachment: xpra-protocol-addlatencymeasurements2.patch added

patch for protocol.py so the latency reported is when the packet has been sent by the network layer

comment:5 Changed 9 years ago by Antoine Martin

r886 commits the unthreaded damage code and adds the latency measurement via "xpra info"; see the changeset commit message for details.

comment:6 Changed 9 years ago by Antoine Martin

Resolution: fixed
Status: accepted → closed

Good enough in trunk with the latency minimization code from #153.

comment:7 Changed 6 weeks ago by migration script

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/132
