The key feature of xpra is the forwarding of individual window contents to the client. This is achieved using the X11 Composite extension (and, for the windows and servers that support it, the MIT-SHM extension): xpra is a compositing window manager that just happens to display the window contents remotely.
The speed at which the server compresses and sends screen updates to the client depends on a number of factors. The most important ones are obviously the encoding used and your network connection, but as can be seen below, the client's rendering speed also has an influence.
There are only a few things the system can tune at runtime, each is explained in more detail below:
- the "batch delay": this is the time that the server will wait for screen updates to accumulate before grabbing the next frame
- the "quality" setting: mainly for the video encoders, but also jpeg and webp
- the "speed" setting: mainly for the video encoders, but also used for png/jpeg (at very low speed, we "optimize" the compression)
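For illustration, the three runtime tunables above can be pictured as a small structure with clamped ranges. This is a hypothetical sketch: the names, defaults and bounds here are made up for the example and are not xpra's actual code.

```python
from dataclasses import dataclass

# Hypothetical sketch of the runtime tunables described above;
# the real names, defaults and ranges live in the xpra source.
@dataclass
class Tunables:
    batch_delay: float = 15.0   # ms to wait for screen updates to accumulate
    quality: int = 80           # 0-100, used by video encoders, jpeg and webp
    speed: int = 50             # 0-100, encoder effort vs compression ratio

    def clamp(self) -> None:
        # keep every value within its (assumed) legal range
        self.batch_delay = min(max(self.batch_delay, 1.0), 1000.0)
        self.quality = min(max(self.quality, 1), 100)
        self.speed = min(max(self.speed, 1), 100)

t = Tunables(batch_delay=2000.0, quality=120, speed=-5)
t.clamp()  # all three values are pulled back into range
```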
Since this is an area that receives constant tuning and improvements, the most reliable source of current information is the code itself:
- `xpra/server/batch_config.py`: stores the dynamically updated batch delay configuration data, as well as historical values and the last set of factors used to derive the delay
- `xpra/server/server_source.py`: a `ServerSource` is instantiated for each client connection. It contains:
  - `calculate_delay_thread`, which runs at most 4 times a second to tune the batch delay, both the global default one (`default_batch_config`, which is a `DamageBatchConfig`) and the one specific to each window (see below for details)
  - `GlobalPerformanceStatistics`: collects various statistics about this client's connection speed, the various work queues, etc. The `get_factors` method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("target latency" and "pixel count")
- `xpra/server/window_source.py`: this class is instantiated for each window and for each client; it deals with sending the window's pixels to the client. In particular:
  - `WindowPerformanceStatistics`: per-window statistics: performance of the encoding/decoding, number of frames and pixels in flight, etc. Again, the `get_factors` method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("pixel_count" and the current "batch delay")
- `xpra/server/batch_delay_calculator.py`: this is where we use the factors obtained from `get_factors` above to tune the various attributes.
Note also that many of these classes have an `add_stats` method, which is used by `xpra info` to provide detailed statistics from the command line and is a good first step for debugging.
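The factor mechanism described above can be sketched as small functions that each return a (description, factor, weight) tuple. This is a simplified illustration under assumed formulas; the real `get_factors` implementations consult many more statistics.

```python
def latency_factor(avg_latency: float, min_latency: float):
    """Hypothetical factor: ask for a higher batch delay when the
    measured latency drifts above the lowest latency ever observed."""
    ratio = avg_latency / max(min_latency, 0.001)   # avoid division by zero
    # a factor above 1.0 requests a higher delay, 1.0 means no change
    factor = min(ratio, 10.0)
    # the further we are from the target, the more confident the factor
    weight = min(abs(ratio - 1.0), 1.0)
    desc = "latency: avg=%.1f min=%.1f" % (avg_latency, min_latency)
    return desc, factor, weight

# latency has drifted to 2.5x the best observed value:
desc, factor, weight = latency_factor(50.0, 20.0)
```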
Background on damage request processing
A step-by-step (and oversimplified, so check the code for details) walkthrough of the critical damage code path:
- an X11 client application running against the xpra display updates something on one of its windows, or creates a new window, resizes it, etc. It notifies the X11 server that something needs to be redrawn, changed or updated.
- xpra, working as a compositing window manager, receives a notification that something needs updating/changing.
- a `Damage` event is forwarded through various classes (`WindowModel`, ...) and eventually ends up in the server's `damage` method - often as a `Damage` event directly, or in other cases we synthesize one to force the client to (re-)draw the window
- then, for each client connected, this damage event is forwarded to the `WindowSource` for the given window (creating one if needed)
- then we either add this event to the current list of delayed regions (which will expire after the "batch delay"), create a new delayed region, or, if things are not congested at all, process it immediately
- `process_damage_region` (called when the timer expires, or directly) will grab the window's pixels and queue a request to "make a data packet" from them. This is done so that we minimize the amount of work done in the critical UI thread; another thread will pick items off this queue.
Note: delayed regions may get delayed more than originally intended if the client has an excessive packet or pixel backlog.
- the damage thread will pick items from this queue, decide what compression to use, compress the pixels, make a data packet, and then queue this packet for sending. Note that the queue lives in `ServerSource` and is shared between all the windows.
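The hand-off between the UI thread and the damage thread can be sketched with a shared queue. This is a minimal illustration only: the real code in `ServerSource` and `WindowSource` handles per-window state, encoding selection, cancellation and backlog tracking, and zlib stands in here for whatever compression is actually chosen.

```python
import queue
import threading
import zlib

data_queue = queue.Queue()   # shared between all the windows, as in ServerSource
packets = []                 # stand-in for the outgoing packet queue

def ui_thread_grab(window_id: int, pixels: bytes) -> None:
    # the UI thread only grabs the pixels and queues them:
    # the expensive compression happens on another thread
    data_queue.put((window_id, pixels))

def damage_thread() -> None:
    # picks items off the queue, compresses them and makes a packet
    while True:
        item = data_queue.get()
        if item is None:     # sentinel: shut down
            break
        window_id, pixels = item
        packets.append((window_id, zlib.compress(pixels)))

worker = threading.Thread(target=damage_thread)
worker.start()
ui_thread_grab(1, b"\x00" * 4096)   # pretend these are window pixels
data_queue.put(None)
worker.join()
```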
Things not covered here:
- choosing between a full frame or many smaller regions
- various tunables
- callbacks recording performance statistics
- callbacks between
- calculating the backlog
- error handling
- cancelling requests
- client encoding options
- video encoding: choosing a pipeline (many factors...), window dimension restrictions, etc.
The Batch Delay Factors
Warning: this section is out of date
Again, this is a simplified explanation; the actual values used rely on heuristics, weighted averages, recent vs average values, trends, etc.
Please refer to the source for details.
- `client-latency`: the client latency measured during the processing of pixel data (when we get the echo back). We know the lowest latency observed, and we try to keep the latency as low as that
- `client-ping-latency`: as above, but measured from ping packets (client)
- `server-ping-latency`: latency measured when the client pings the server; the client then sends this information as part of ping echo response packets
- `damage-data-queue`: the number of damage frames waiting to be compressed; we want to keep this low, especially as this data is uncompressed at this point, so it can be quite large
- `damage-packet-queue-size`: the number of packets waiting to be sent by the network layer; we want to keep this low, but sometimes many small packets can make it look worse than it is
- `damage-packet-queue-pixels`: the number of pixels in the packet queue; the target value depends on the current size of the window
- `mmap-area`: how full the shared memory area is (only when mmap is enabled)
- `damage-processing-latency`: how long frames take from the moment we receive the damage event until the packet is queued for sending. This value increases with the number of pixels that are encoded. We want to keep this low.
- `damage-processing-ratios`: as above, but based on the trend: this measure goes up when we queue more damage requests than we can encode
- `damage-out-latency`: how long frames take from the moment of the damage event until the packet containing it has made it out of the network layer (it may still be in the operating system's buffers though). Again, we want to keep this low. This value increases when the network becomes the bottleneck.
- `damage-network-delay`: the difference between the damage processing latency and the damage send latency; this should be equal to the network latency, but because the values are running averages, it is not very reliable
- `network-send-speed`: we keep track of the socket's performance (in bytes per second); generally the values are very high because the OS will just buffer things for us, but when things start to back up, we notice the send speed dropping rapidly
- `client-decode-speed`: how quickly the client is decoding frames
- `damage-rate`: when nothing happens for a while, the window is not busy and therefore we ought to be able to lower the delay
Each factor provides:
- a textual description which can be used for debugging
- a change factor (a float) ranging from 0.0 (lower the delay) up to a small positive number (generally up to 10, but that is not enforced). A value of 1.0 means no change.
- a weight value: this is to measure how confident this particular factor is about the change it requests.
- with a weight of 0.0, the factor is irrelevant as it won't be counted
- with a high enough weight, a factor can overrule others
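Combining the factors into a new delay can be sketched as a weighted average of the requested changes. This is a hypothetical illustration of the weighting rules above; the real calculation in `batch_delay_calculator.py` also applies smoothing, trends and caps.

```python
def apply_factors(current_delay: float, factors) -> float:
    """factors: list of (description, factor, weight) tuples.
    Returns the new batch delay, scaling the current one by the
    weighted average of the changes each factor requests."""
    total_weight = sum(w for _, _, w in factors)
    if total_weight <= 0:
        return current_delay   # no factor has an opinion: no change
    # a weight of 0.0 means the factor is ignored; a large enough
    # weight lets one factor overrule the others
    weighted = sum(f * w for _, f, w in factors) / total_weight
    return current_delay * weighted

factors = [
    ("client-latency", 2.0, 1.0),   # asks to double the delay
    ("damage-rate", 0.5, 1.0),      # asks to halve it
    ("mmap-area", 5.0, 0.0),        # weight 0.0: not counted at all
]
# weighted average of the changes: (2.0 + 0.5) / 2 = 1.25
new_delay = apply_factors(10.0, factors)
```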
Speed and Quality auto tuning
Please refer to the code for accurate information.
The auto-tuning code should:
- honour the minimum speed and quality settings set via
- tune the speed so that encoding of one frame takes roughly as long as the batch delay (the time spent accumulating updates for the next frame), so that there is only one frame in the pipeline on average. The aim is to always keep the encoder busy, but without any backlog. (note: this does not necessarily lead to the best framerate, since a busy cpu may cause the batch delay to increase... and in turn lower the speed)
- client speed: if the client is struggling to decode frames, then we use a higher speed (which means less effort in decompressing the stream) - the target client decoding speed is 8 MPixels/s
- quality: we try to lower the quality if we find that the client has a backlog of frames to draw/acknowledge (usually a sign of network congestion), or if the measured client latency is higher than normal (also a sign of congestion)
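As a rough illustration of the two rules above, the adjustments could look like this. The step sizes and thresholds here are invented for the example; the real heuristics in the source are considerably more involved.

```python
def tune_speed(encode_time_ms: float, batch_delay_ms: float,
               current_speed: int) -> int:
    """Aim for encoding one frame to take about one batch delay:
    if encoding is slower than that, raise the speed setting
    (less compression effort), otherwise lower it."""
    if encode_time_ms > batch_delay_ms:
        return min(current_speed + 10, 100)
    return max(current_speed - 10, 1)

def tune_quality(client_backlog: int, current_quality: int) -> int:
    """Lower the quality when the client has a backlog of frames to
    draw/acknowledge (usually a sign of network congestion)."""
    if client_backlog > 0:
        return max(current_quality - 10, 1)
    return min(current_quality + 5, 100)

# encoding takes longer than the batch delay: raise the speed
speed = tune_speed(20.0, 15.0, 50)
# the client is 3 frames behind: lower the quality
quality = tune_quality(3, 80)
```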
`xpra info` provides a lot of details on the current values used for batch delay and encoding speed/quality calculations.
A good start is `xpra info | grep batch` and `xpra info | grep latency.avg`.
You should narrow down to the individual window you are interested in.