
Opened 8 years ago

Closed 8 years ago

#225 closed defect (fixed)

calculate batch delay is expensive and hurts performance

Reported by: Antoine Martin
Owned by: Antoine Martin
Priority: major
Milestone: 0.9
Component: server
Version: trunk
Keywords:
Cc:

Description (last modified by Antoine Martin)

While profiling for #224, I found that we call calculate_batch_delay far too often and that each call is far too expensive (see the attached callgraph).
We now skip the call when we are fairly sure it won't help; see r2291:

self._damage(window, 0, 0, w, h, options={"calculate" : False, "min_delay" : 50})
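The call above passes a `calculate` flag and a `min_delay` value in the damage options. The ticket does not show how `_damage` consumes them, so what follows is only a minimal sketch of the idea. It assumes the flag simply skips the expensive recalculation and that `min_delay` is a floor in milliseconds. `WindowSourceSketch` and its methods are hypothetical stand-ins, not xpra's real classes.

```python
class WindowSourceSketch:
    """Hypothetical stand-in for a per-window damage source (not xpra's real class)."""

    def __init__(self, batch_delay=20):
        self.batch_delay = batch_delay  # current batch delay, assumed to be in ms
        self.calculations = 0           # counts how often the expensive path runs

    def calculate_batch_delay(self):
        # Placeholder for the expensive heuristics: it only counts calls
        # and returns the current delay unchanged.
        self.calculations += 1
        return self.batch_delay

    def damage(self, x, y, w, h, options=None):
        options = options or {}
        # Only recalculate when the caller did not opt out.
        if options.get("calculate", True):
            self.batch_delay = self.calculate_batch_delay()
        # Honour any minimum delay requested by the caller.
        return max(self.batch_delay, options.get("min_delay", 0))


ws = WindowSourceSketch(batch_delay=20)
ws.damage(0, 0, 100, 100)                                                 # recalculates
ws.damage(0, 0, 100, 100, options={"calculate": False, "min_delay": 50})  # skips it
```

The second call returns the 50 ms floor without touching the expensive path, which is the saving r2291 is after for full-window refreshes.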

What we still need to do:

  • don't call this from the UI thread: either use a dedicated thread (running at low priority with long pauses, or woken up by events fired from the other threads when they think the delay needs updating), or call it from the damage thread.
  • run most of it from ServerSource, and call each WindowSource with the partial results so that window-specific values can be applied without re-calculating everything for each window.
  • keep an overall batch delay in ServerSource so that any new window can inherit a good initial guess for its delay without needing to calculate anything.
  • maybe rewrite some of it in Cython to improve performance.
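The first three items fit together into one design. The following sketch assumes a hypothetical `ServerSourceSketch` that owns a dedicated calculation thread. Other threads report samples and wake it through a `threading.Event`; otherwise it only recalculates after a long pause. The shared, expensive part is computed once per update, and each hypothetical `WindowSourceSketch` only applies a cheap window-specific adjustment. The overall delay lives on the server side so that a new window starts from it. The averaging formula and the `weight` factor are placeholders, not xpra's heuristics. Python's `threading` module has no thread priorities, so "low priority" is approximated here only by the long pause.

```python
import threading
from collections import deque


class WindowSourceSketch:
    """Hypothetical per-window state: only the cheap, window-specific step lives here."""

    def __init__(self, initial_delay, weight=1.0):
        self.batch_delay = initial_delay  # inherited guess, no calculation needed
        self.weight = weight              # stand-in for window-specific factors

    def apply_common(self, common_delay):
        # Apply the window-specific adjustment to the shared partial result.
        self.batch_delay = common_delay * self.weight


class ServerSourceSketch:
    """Hypothetical connection-level object owning a dedicated calculation thread."""

    def __init__(self, default_delay=20.0, pause=1.0):
        self.default_delay = default_delay  # overall delay, inherited by new windows
        self.pause = pause                  # long pause between unprompted updates
        self.windows = []
        self.latencies = deque(maxlen=50)   # hypothetical latency samples (ms)
        self.lock = threading.Lock()
        self.wakeup = threading.Event()
        self.stopped = False
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def add_window(self, weight=1.0):
        with self.lock:
            ws = WindowSourceSketch(self.default_delay, weight)
            self.windows.append(ws)
        return ws

    def record_latency(self, ms):
        # Called from other threads: store the sample and wake the calculator.
        with self.lock:
            self.latencies.append(ms)
        self.wakeup.set()

    def _calculate_common(self):
        # The part shared by every window, computed once per update.
        with self.lock:
            samples = list(self.latencies)
        if not samples:
            return self.default_delay
        return sum(samples) / len(samples)

    def _run(self):
        while not self.stopped:
            # Block until woken by another thread, or until the pause expires.
            self.wakeup.wait(timeout=self.pause)
            self.wakeup.clear()
            if self.stopped:
                break
            common = self._calculate_common()
            with self.lock:
                self.default_delay = common
                for ws in self.windows:
                    ws.apply_common(common)

    def stop(self):
        self.stopped = True
        self.wakeup.set()
        self.thread.join()
```

Because new windows inherit `default_delay`, a freshly mapped window gets a reasonable starting delay immediately, and in this sketch the damage path never runs the calculation itself.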

Attachments (1)

pycallgraph-damage2.png (458.9 KB) - added by Antoine Martin 8 years ago.
Shows where the time is spent when handling damage requests in the UI thread.


Change History (3)

Changed 8 years ago by Antoine Martin

Attachment: pycallgraph-damage2.png added


comment:1 Changed 8 years ago by Antoine Martin

Description: modified (diff)
Status: new → accepted

comment:2 Changed 8 years ago by Antoine Martin

Resolution: fixed
Status: accepted → closed

  • r2424 and r2425 moved the calculations to a separate thread and refactored them so that common values are only calculated once
  • r2461 ensures the target latency is also calculated in that thread, as it is quite an expensive calculation (at least too expensive for the critical damage path)
  • r2462 allows a Cython variant of the calculations to be used, giving another huge speedup

This is now mostly solved, although there may be further optimizations we can make.
