
Profiling

First, please see Debugging for information on logging and the environment variables that influence the behaviour of the system.
Window Refresh speed is a special case as it depends on many factors and not just pure code performance, though some parts of this code can obviously benefit from profiling.
If profiling itself slows down your process so much that it cannot connect to the server, or if the server cannot respond quickly enough, increase the default socket timeout (in seconds):

XPRA_SOCKET_TIMEOUT=120 xpra ...
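When profiling runs are scripted, the same variable can be set from Python instead of the shell. The following is only a sketch: the helper name `run_with_socket_timeout` is hypothetical, and the child process used here is a stand-in that echoes the variable rather than a real xpra command line.

```python
import os
import subprocess
import sys


def run_with_socket_timeout(command, timeout_seconds=120, **run_kwargs):
    """Run `command` with XPRA_SOCKET_TIMEOUT set in a copy of the environment.

    Hypothetical helper: `command` is any argument list, for example the
    xpra command line you would otherwise type in the shell.
    """
    env = os.environ.copy()          # leave the parent environment untouched
    env["XPRA_SOCKET_TIMEOUT"] = str(timeout_seconds)
    return subprocess.run(command, env=env, check=False, **run_kwargs)


if __name__ == "__main__":
    # Stand-in child that only prints the variable it received.
    child = [sys.executable, "-c",
             "import os; print(os.environ['XPRA_SOCKET_TIMEOUT'])"]
    run_with_socket_timeout(child)
```

Copying the environment rather than modifying `os.environ` keeps the setting local to the profiled process.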

cProfile

cProfile outputs text only, which may be more appropriate in some cases. Simply replace xpra with tests/scripts/cProfile on the command line:

tests/scripts/cProfile start :10

On exit, it dumps the statistics to the terminal, for example:

Mon Jan  7 16:24:19 2013    profile.stats

         28226 function calls (27385 primitive calls) in 4.409 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.958    3.958    4.050    4.050 {gtk._gtk.main}
        1    0.090    0.090    0.113    0.113 /usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py:23(<module>)
        2    0.050    0.025    0.050    0.025 {wimpiggy.lowlevel.bindings.get_screen_sizes}
        1    0.049    0.049    0.057    0.057 /usr/lib64/python2.7/site-packages/gst-0.10/gst/__init__.py:24(<module>)
        2    0.026    0.013    0.026    0.013 {method 'read' of 'file' objects}
        2    0.014    0.007    0.014    0.007 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.012    0.012    0.012    0.012 {method 'close' of 'gtk.gdk.Display' objects}
(...)
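The same kind of report can be regenerated later from a saved statistics file using the standard `pstats` module. This is a minimal sketch, not the wrapper's actual implementation: it assumes Python 3, `busy_work` is a hypothetical stand-in workload, and the file name `profile.stats` simply mirrors the header shown above.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n=20000):
    """Stand-in workload (hypothetical): sum of squares."""
    return sum(i * i for i in range(n))


def profile_to_file(func, path):
    """Run func under cProfile and save the raw statistics to path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)


def top_by_internal_time(path, limit=10):
    """Return a text report of the `limit` costliest entries by internal time."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    # "tottime" is the time spent in a function excluding its callees
    stats.strip_dirs().sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    stats_path = os.path.join(tempfile.mkdtemp(), "profile.stats")
    profile_to_file(busy_work, stats_path)
    print(top_by_internal_time(stats_path))
```

Sorting by `"cumtime"` instead shows which call trees are expensive overall, which can be a better starting point when no single function dominates.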

pycallgraph

pycallgraph can generate pretty call graph diagrams. We now ship a wrapper script which simplifies the calls to pycallgraph:

./tests/scripts/pycallgraph

This command includes a number of working examples and lists the groups and threads available for profiling.

Using pycallgraph to identify bottlenecks

General rules and guidelines:

  • changing the include and exclude sets can have a very noticeable impact on performance; bear that in mind when making comparisons
  • when tracing threads, only the first thread whose name matches will be traced (so you must connect the client first, and do so by directly specifying the socket to connect to: automatic mode will cause a one-shot connection!)
  • since the profiling code affects the performance characteristics of the software, once a problem is reasonably well isolated it is often better to switch to dedicated performance monitoring code (using simple print statements for example)
  • start with a small delay (e.g. -d 2) to skip the initial setup, and use a longer one if you start a server and then also need to connect the client before profiling
  • if testing locally (which is generally more convenient), don't forget to turn off mmap with --no-mmap
  • when comparing versions/changes/settings: use the runtime parameter to ensure that each run lasts for exactly the same amount of time, which makes it easier to compare the labels using absolute values. If profiling the server, script the whole thing so that the client connects at exactly the same time in each run
  • ensure that the picture quality is the same: if the client or server is struggling, the quality may drop, which can result in better numbers (latency, fps) when in fact the performance is worse
  • focus on one thread at a time - there is not much overlap/impact between different threads
  • choose a test application which will stress the desired portion of the code (e.g. if testing the network or encoding thread, an application that generates a high number of frames per second is more appropriate than what may ultimately be the target application for the system)
  • choose an application whose behaviour is predictable, preferably one that does not use the network or rely on any external resources, so that the whole thing can be scripted and run repeatedly and reliably for comparison with previous runs
  • start with most groups enabled
  • most groups can usually be disabled early - unless they are being profiled specifically
  • some Debugging environment variables may be useful
  • save useful graphs together with all the information relating to the version used, modifications, environment, etc., so that they can be reproduced later
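The guideline above about switching to dedicated monitoring code once a problem is isolated can be as simple as a timing helper wrapped around the suspect section. The following is a sketch only, assuming Python 3; `timed`, `timings` and the `"encode"` label are hypothetical names, not part of xpra.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store: label -> list of elapsed times in seconds
timings = {}


@contextmanager
def timed(label, verbose=True):
    """Measure the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.setdefault(label, []).append(elapsed)
        if verbose:
            print("%s: %.3fms" % (label, elapsed * 1000))


def summarize():
    """Return {label: (sample_count, total_seconds)} for all recorded samples."""
    return {label: (len(v), sum(v)) for label, v in timings.items()}


if __name__ == "__main__":
    for _ in range(3):
        with timed("encode"):        # "encode" is an illustrative label
            sum(i * i for i in range(100000))
    print(summarize())
```

Unlike a tracing profiler, this adds only two clock reads per measured section, so it distorts the timings far less.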

Fixing Bottlenecks

First, write a test to measure/isolate the problematic part of the code.
There are no magic solutions to apply, but here are some approaches which have been used in the past:

  • removing code!
  • threading (though we already have too many threads...)
  • re-writing in C or Cython
  • better algorithms

You can find a detailed example of how Cython was used to eliminate a performance bottleneck here: wiki/Profiling/Examples
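As a sketch of the first step above (writing a test that isolates the suspect code), the standard `timeit` module can compare a current implementation against a candidate replacement on a fixed input. Both checksum functions below are hypothetical placeholders, not xpra code; they only illustrate the "better algorithms" case.

```python
import timeit


def checksum_naive(data):
    """Deliberately slow placeholder: per-byte Python loop."""
    total = 0
    for b in data:
        total = (total + b) & 0xFFFFFFFF
    return total


def checksum_builtin(data):
    """Candidate replacement: same result using the built-in sum()."""
    return sum(data) & 0xFFFFFFFF


def best_time(func, data, number=20, repeat=3):
    """Best-of-`repeat` time, in seconds, for `number` calls of func(data)."""
    return min(timeit.repeat(lambda: func(data), number=number, repeat=repeat))


if __name__ == "__main__":
    payload = bytes(range(256)) * 1024   # fixed input so runs are comparable
    # Check correctness before comparing speed
    assert checksum_naive(payload) == checksum_builtin(payload)
    for func in (checksum_naive, checksum_builtin):
        print("%-18s %.4fs" % (func.__name__, best_time(func, payload)))
```

Taking the minimum over several repeats reduces the influence of unrelated system activity on the comparison.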

Last modified on 12/30/15 12:55:29