
Window Refresh

The key feature of xpra is the forwarding of individual window contents to the client. This is achieved using the X11 Composite extension and, for the windows and servers that support it, the MIT-SHM extension. In effect, xpra is a compositing window manager that just happens to display the window contents remotely.
The speed at which the server compresses and sends screen updates to the client depends on a number of factors. The most important ones are the encoding used and your network connection but, as can be seen below, the client rendering also has an influence.
There are only a few things the system can tune at runtime; each is explained in more detail below:

  • the "batch delay": this is the time that the server will wait for screen updates to accumulate before grabbing the next frame
  • the "quality" setting: mainly for the video encoders, but also jpeg and webp
  • the "speed" setting: mainly for the video encoders, but also used for png/jpeg (at very low speed, we "optimize" the compression)

Code Pointers

Since this is an area that receives constant tuning and improvements, the most reliable source of current information is the code itself:

  • xpra/server/batch_config.py stores the dynamically updated batch delay configuration data and also historical values and the last set of factors used to derive the delay
  • xpra/server/server_source.py: a ServerSource is instantiated for each client connection.
    • calculate_delay_thread which runs at most 4 times a second to tune the batch delay, both the global default one (default_batch_config which is a DamageBatchConfig) and the one specific to each window (see below for details)
  • xpra/server/source_stats.py: GlobalPerformanceStatistics: collects various statistics about this client's connection speed, the various work queues, etc. The get_factors method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("target latency" and "pixel count")
  • xpra/server/window_source.py: this class is instantiated for each window and for each client, it deals with sending the window's pixels to the client. In particular:
  • xpra/server/window_stats.py: WindowPerformanceStatistics: per-window statistics: performance of the encoding/decoding, amount of frames and pixels in flight, etc. Again, the get_factors method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("pixel_count" and current "batch delay")
  • xpra/server/batch_delay_calculator.py: this class is where we use the factors obtained by get_factors above to tune the various attributes.


Note also that many of these classes have an add_stats method, which is used by xpra info to provide detailed statistics from the command line and is a good first step for debugging.

Background on damage request processing

Step by step (and oversimplified - so check the code for details) of the critical damage code path:

  • an X11 client application running against the xpra display updates something on one of its windows, or creates a new window, resizes it, etc. It notifies the X11 server that something needs to be redrawn, changed or updated.
  • xpra, working as a compositing window manager, receives a notification that something needs updating/changing.
  • this Damage event is forwarded through various classes (CompositeHelper, WindowModel, ...) and eventually ends up in the server's damage method - often as a Damage event directly, or in other cases we synthesize one to force the client to (re-)draw the window.
  • then for each client connected, we call damage on its ServerSource (see above)
  • then this damage event is forwarded to the WindowSource for the given window (creating one if needed)
  • then we either add this event to the current list of delayed regions (which will expire after "batch delay"), we create a new delayed region, or if things are not congested at all we just process it immediately
  • process_damage_region (called when the timer expires - or directly) will grab the window's pixels and queue a request to call "make a data packet" from it. This is done so that we minimize the amount of work done in the critical UI thread: another thread will pick items off this queue.

Note: delayed regions may get delayed more than originally intended if the client has an excessive packet or pixel backlog.
The damage thread will pick items from this queue, decide what compression to use, compress the pixels and make a data packet, then queue this packet for sending. Note that the queue is in ServerSource and it is shared between all the windows.
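
The batching steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual xpra code: the class WindowDamageBatcher, the function damage_thread, the "congested" flag and the default delay are all hypothetical names and values made up for this example, and the real code grabs pixels and compresses them where this sketch only passes region tuples around.

```python
import queue
import threading


class WindowDamageBatcher:
    """Hypothetical stand-in for the per-window batching logic (not an xpra class)."""

    def __init__(self, encode_queue, batch_delay=0.05):
        self.encode_queue = encode_queue   # one queue shared between all the windows
        self.batch_delay = batch_delay     # seconds to wait for updates to accumulate
        self.pending = []                  # current delayed regions: (x, y, w, h)
        self.timer = None
        self.lock = threading.Lock()

    def damage(self, x, y, w, h, congested=True):
        with self.lock:
            if self.timer is not None:
                # a delayed region is already accumulating: add this event to it
                self.pending.append((x, y, w, h))
                return
            if not congested:
                # nothing is backed up: process the event immediately
                self.encode_queue.put([(x, y, w, h)])
                return
            # start a new delayed region which expires after the batch delay
            self.pending = [(x, y, w, h)]
            self.timer = threading.Timer(self.batch_delay, self.process_damage_region)
            self.timer.start()

    def process_damage_region(self):
        # called when the timer expires: hand the accumulated regions over
        with self.lock:
            regions, self.pending = self.pending, []
            self.timer = None
        if regions:
            self.encode_queue.put(regions)


def damage_thread(encode_queue, packet_queue):
    """Pick items off the shared queue and turn each one into a packet."""
    while True:
        regions = encode_queue.get()
        if regions is None:                # sentinel used to stop the thread
            break
        # the real code would choose an encoding and compress the pixels here
        packet_queue.put(("draw", regions))
```

With this sketch, several damage calls made within one batch delay end up as a single item on the shared queue, which is the effect the batch delay is meant to have.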

Things not covered here:

  • auto-refresh
  • choosing between a full frame or many smaller regions
  • various tunables
  • callbacks recording performance statistics
  • callbacks between ServerSource and WindowSource
  • calculating the backlog
  • error handling
  • cancelling requests
  • client encoding options
  • mmap
  • video encoding: choosing a pipeline (many factors..), window dimensions restrictions, etc..

The Batch Delay Factors

Warning: this section is out of date

Again, this is a simplified explanation: the actual values used rely on heuristics, weighted averages, recent versus average values, trends, etc. Please refer to the source for details.

From GlobalPerformanceStatistics:

  • client-latency: the client latency measured during the processing of pixel data (when we get the echo back). We know the lowest latency observed, and we try to keep the latency as low as that
  • client-ping-latency: as above, but measured from ping packets (client)
  • server-ping-latency: latency measured when the client pings the server, the client then sends this information as part of ping echo response packets
  • damage-data-queue: the number of damage frames waiting to be compressed, we want to keep this low - especially as this data is uncompressed at this point, so it can be quite large
  • damage-packet-queue-size: the number of packets waiting to be sent by the network layer - we want to keep this low, but sometimes many small packets can make it look worse than it is..
  • damage-packet-queue-pixels: the number of pixels in the packet queue, the target value depends on the current size of the window
  • mmap-area: how full the shared memory area is (only when mmap is enabled)


From WindowPerformanceStatistics:

  • damage-processing-latency: how long frames take from the moment we receive the damage event until the packet is queued for sending. This value increases with the number of pixels that are encoded. We want to keep this low.
  • damage-processing-ratios: as above, but based on the trend: this measure goes up when we queue more damage requests than we can encode.
  • damage-out-latency: how long frames take from the moment we receive the damage event until the packet containing it has made it out of the network layer (it may still be in the operating system's buffers though). Again, we want to keep this low. This value increases when the network becomes the bottleneck.
  • damage-network-delay: the difference between the damage processing latency and the damage send latency. This should be equal to the network latency but, because the values are running averages, it is not very reliable.
  • network-send-speed: we keep track of the socket's performance (in bytes per second) - generally the values are very high because the OS will just buffer things for us, but when things start to back up, we can notice that the send speed drops rapidly
  • client-decode-speed: how quickly the client is decoding frames
  • damage-rate: when nothing happens for a while, this means that the window is not busy and therefore we ought to be able to lower the delay


Each factor provides:

  • a textual description which can be used for debugging
  • a change factor (float number) ranging from 0.0 (lower the delay) to a small positive number (generally up to 10, but that is not enforced). A value of 1.0 means no change.
  • a weight value: this is to measure how confident this particular factor is about the change it requests.

i.e.:

  • with a weight of 0.0, the factor is irrelevant as it won't be counted
  • with a high enough weight, a factor can overrule others
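
To make the role of the change factor and the weight more concrete, here is a minimal sketch that combines a list of factors into a new batch delay using a plain weighted average. The function name combine_factors, the min_delay and max_delay limits and the example factors are assumptions made for this illustration; the real calculation in batch_delay_calculator.py also takes historical values and other heuristics into account.

```python
def combine_factors(current_delay, factors, min_delay=5.0, max_delay=1000.0):
    """
    Derive a new batch delay (in milliseconds) from a list of factors.
    Each factor is a (description, change_factor, weight) tuple.
    Hypothetical helper: a plain weighted average, not the xpra algorithm.
    """
    total_weight = sum(weight for _, _, weight in factors)
    if total_weight <= 0:
        # no factor has an opinion: keep the current delay
        return current_delay
    # each factor votes for "current_delay * change_factor",
    # and its vote counts in proportion to its weight
    weighted_sum = sum(current_delay * change * weight for _, change, weight in factors)
    new_delay = weighted_sum / total_weight
    # never go outside the configured limits (assumed values)
    return max(min_delay, min(max_delay, new_delay))


# made-up factors, for illustration only
example_factors = [
    ("client latency above the lowest observed", 2.0, 1.0),
    ("window is idle", 0.5, 0.0),       # weight 0.0: not counted
    ("packet queue is empty", 0.5, 1.0),
]
new_delay = combine_factors(100.0, example_factors)
```

Here the two factors with a non-zero weight pull in opposite directions with equal confidence, so the result is halfway between their two targets (200 and 50). Giving one of them a much larger weight would let it overrule the other.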

Speed and Quality auto tuning

Please refer to the code for accurate information.

The auto-tuning code should:

  • honour the minimum speed and quality settings set via --min-quality and --min-speed
  • tune the speed so that encoding of one frame takes roughly as long as the batch delay (the time spent accumulating updates for the next frame), so that there is only one frame in the pipeline on average. The aim is to always keep the encoder busy, but without any backlog. (note: this does not necessarily lead to the best framerate, since a busy cpu may cause the batch delay to increase... and in turn lower the speed)
  • client speed: if the client is struggling to decode frames, then we use a higher speed (which means less effort in decompressing the stream) - the target client decoding speed is 8 MPixels/s
  • quality: we try to lower the quality if we find that the client has a backlog of frames to draw/acknowledge (usually a sign of network congestion), or if the measured client latency is higher than normal (also a sign of congestion)
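
The rules above could be sketched roughly as follows. This is a simplified illustration and not the xpra implementation: the function names, the fixed step of 10 and the 0 to 100 range are assumptions made for this example, and the real code uses weighted averages and trends rather than single measurements. Only the 8 MPixels/s decoding target comes from the description above.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def tune_speed(speed, encode_time, batch_delay, client_decode_mpixels,
               min_speed=0, target_decode=8.0, step=10):
    """Hypothetical speed tuning: times in seconds, decode speed in MPixels/s."""
    if encode_time > batch_delay:
        # encoding takes longer than the batch delay: a backlog would build up
        speed += step
    elif encode_time < batch_delay:
        # the encoder has time to spare: spend more effort on compression
        speed -= step
    if client_decode_mpixels < target_decode:
        # the client is struggling to decode: make the stream easier to decode
        speed += step
    return clamp(speed, min_speed, 100)


def tune_quality(quality, backlog_frames, latency, normal_latency,
                 min_quality=0, step=10):
    """Hypothetical quality tuning: lower the quality on signs of congestion."""
    if backlog_frames > 0 or latency > normal_latency:
        quality -= step
    # this sketch leaves the quality unchanged when there is no congestion
    return clamp(quality, min_quality, 100)
```

In both functions the minimum value is applied last, so the result never drops below the minimum speed or quality that was requested.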

Debugging

"xpra info" provides a lot of details on the current values used for batch delay and encoding speed/quality calculations. A good start is:

xpra info | grep batch
xpra info | grep latency.avg

You should narrow down to the individual window you are interested in.

Last modified on 01/12/17 12:26:26