xpra icon
Bug tracker and wiki

This bug tracker and wiki are being discontinued
please use https://github.com/Xpra-org/xpra instead.



Version 10 (modified by Antoine Martin, 8 years ago) (diff)

--

Profiling

First, please see Debugging for information on logging and the environment variables that influence the behaviour of the system.

cProfile

cProfile outputs text only, which may be more appropriate in some cases. Simply replace xpra with tests/scripts/cProfile on the command line:

tests/scripts/cProfile start :10

And on exit it will dump the statistics to the terminal, ie:

Mon Jan  7 16:24:19 2013    profile.stats

         28226 function calls (27385 primitive calls) in 4.409 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.958    3.958    4.050    4.050 {gtk._gtk.main}
        1    0.090    0.090    0.113    0.113 /usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py:23(<module>)
        2    0.050    0.025    0.050    0.025 {wimpiggy.lowlevel.bindings.get_screen_sizes}
        1    0.049    0.049    0.057    0.057 /usr/lib64/python2.7/site-packages/gst-0.10/gst/__init__.py:24(<module>)
        2    0.026    0.013    0.026    0.013 {method 'read' of 'file' objects}
        2    0.014    0.007    0.014    0.007 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.012    0.012    0.012    0.012 {method 'close' of 'gtk.gdk.Display' objects}
(...)

pycallgraph

pycallgraph can generate pretty diagrams. We now ship a wrapper which takes care of simplifying the calls to pycallgraph:

./tests/scripts/xpra/pycallgraph

This command includes a number of working examples and lists the groups and threads available for profiling.

Using pycallgraph to identify bottlenecks ==

General rules and guidelines:

  • beware that changing the include and exclude sets can have a very noticeable impact on tracing performance, so bear that in mind when making comparisons
  • start with a small delay (ie: -d 2) to skip the initial setup, potentially longer if you start a server and then also need to connect the client before profiling
  • if testing locally (which is generally more convenient), don't forget to turn off mmap with --no-mmap
  • when comparing versions/changes/settings: use the runtime parameter to ensure that each run will last for exactly the same amount of time - this makes it easier to compare the labels using absolute values, and script the whole thing so the client connects at exactly the same time if profiling the server
  • focus on one thread at a time - there is not much overlap/impact between different threads
  • choose an application to test which will stress the desired portion of the code (ie: if testing the network or encoding thread, an application that generates a high frames-per-second is more appropriate than what may ultimately be the target application for the system)
  • choose an application whose behaviour is predictable: preferably not using the network or relying on any external resources. So the whole thing can be scripted, run repeatedly/reliably for comparing with previous runs.
  • start with most groups enabled
  • most groups can usually be disabled early - unless they are being profiled specifically
  • some Debugging environment variables may be useful
  • save useful graphs with all the information relating to the version used, modifications, environment, etc. So these can be reproduced later.

Fixing Bottlenecks

First, write a test to measure/isolate the problematic part of the code.
There are no magic solutions to apply, here are some solutions which have been used in the past:

  • removing code!
  • threading (we already have too many threads though..)
  • re-writing in C or Cython
  • better algorithms




Examples

maths.py re-written in Cython

Merged in r2462


  • Before:
    $ XPRA_CYTHON_MATH=0 ./tests/xpra/test_maths.py 
    calculate_time_weighted_average(100000 records)=(0.5129547191923681, 0.529864013016996)
    elapsed time: 72.6ms
    test_calculate_timesize_weighted_average(100000 records)=(70109.42497899215, 22337.16293859974)
    elapsed time: 204.9ms
    
  • After:
    $ XPRA_CYTHON_MATH=1 ./tests/xpra/test_maths.py 
    calculate_time_weighted_average(100000 records)=(0.5226684212684631, 0.5245391130447388)
    elapsed time: 4.0ms
    test_calculate_timesize_weighted_average(100000 records)=(180154.15625, 35980.69140625)
    elapsed time: 19.7ms
    

Which is 10 to 20 times faster.

high network cost in UI thread

Work in progress, see #231

high cost of logging

This is a recurring problem, see: r2294, r2290, etc

By running the server with:

xpra start :10 --start-child=glxgears

And the client with:

./tests/scripts/pycallgraph -t draw -i '*' -r 20 -- attach :10 --no-mmap

We find that logging is really costing us dearly:
No image "pycallgraph-draw-loggingisexpensive.png" attached to Profiling
So we remove logging from this critical path in r2484, and we now find:
No image "pycallgraph-draw-loggingremoved.png" attached to Profiling
That's a saving of about 35% - not bad for such a small change!