Testing

The current milestone can be found on the roadmap page.

Generic Regression Testing Issues

  • test every xpra sub-command (upgrade, info, proxy, etc)
  • Python versions: as of version 0.15, we only support Python 2.6 and 2.7 (ie: CentOS 5.x for old versions of Python). Code that is not backwards compatible can often make it into the repository and should be fixed before major releases; simply compile-testing it is often enough to spot those problems. Other issues may affect packaging (ie: #116) or be more complex (r2683), which means testing beta package builds; other bugs can be more difficult to identify and even more difficult to fix (ie: #251, #215). We also want to check that the Python 3.x version can be built via the python3-build script (compile tested), though actually using/testing it is not required at present since it isn't officially supported.
  • gtk/pygtk versions: similar to Python, older versions (ie: 2.17 and older) can cause problems, see: r2863, r2706, r2705, r1498, r555, r554.
  • client applications: it is important to test a wide range of client applications, using a wide variety of UI toolkits and languages: gtk, qt/kde, wx, Java (see #162), etc. Each can uncover subtle bugs. Then there are specific applications that are known to cause problems because of the way they interact with the X11 server: wine applications, VMWare (#199), Firefox (#220, #158, #96); gtkperf -a is very good at triggering races in window setup/cleanup, etc. Also, newer versions of specific target applications may change the behaviour of the application in ways which have a significant impact on xpra compression/quality.
  • backwards compatibility with old versions: we try to keep backwards compatibility with older versions as much as possible, though some features may not be available. Occasionally we will drop compatibility (ie: #57) to allow the code to move on from old crufty workarounds.
  • unusual setups: although these may not be optimal, people still expect this to work - and it should! Again, the errors that this uncovers may well help in other areas. Things like: running xpra nested (#210), running xpra from "ssh -X" / "ssh -Y" (#207, #3)
  • platform specific quirks: OSX problems (#249), platforms with static builds of the x264 and vpx libraries or those where the dynamic libraries are bundled in a binary image (#103): MS Windows, OSX, CentOS 5.x, CentOS 6.x, Debian Squeeze, Ubuntu Lucid
  • desktop environments: each DE may handle things slightly differently and uncover bugs, especially when it comes to window placement, resizing, minimizing, etc. Obviously we want to test the major DEs (gnome/cinnamon, KDE, LXDE, XFCE) but it may be worth testing some of the more unusual window managers too (fluxbox, window maker, etc)
  • binary builds with library updates (OSX and MS Windows), in particular: gtk-osx updates and rebuilds, pycrypto, gstreamer, pywin32, etc..
  • installations vs upgrades: sometimes this makes a difference if the OS decides to keep the old default configuration in place..
  • memory leaks (not spotted by automated tests as these do not run for long enough to trigger the problem) - see the monitoring sketch below
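
For long-running leak checks, simply sampling the memory usage of the server process at regular intervals is usually enough to show a trend. A minimal sketch (not part of xpra), assuming the psutil module is available and that you know the PID of the xpra server process:

import sys
import time
import psutil    # assumption: psutil is installed, it is not an xpra dependency

def monitor(pid, interval=60, samples=120):
    # print the RSS of the process every `interval` seconds:
    # a steadily growing value over a long run suggests a leak
    proc = psutil.Process(pid)
    for i in range(samples):
        rss = proc.memory_info().rss
        print("%4i: RSS=%i KB" % (i, rss // 1024))
        time.sleep(interval)

if __name__ == "__main__":
    monitor(int(sys.argv[1]))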

Specific Testing Combinations

The release notes should help in figuring out what has changed and therefore what is likely to require more thorough testing. As the number of items listed above shows, testing every combination is simply impossible. Here are some of the most common setups, and those that are most likely to uncover compatibility issues. Ideally, all of these should be tested before major releases.

  • All MS Windows clients (from XP to 8) with CentOS/RedHat 6.x and 7.x servers
  • OSX clients
  • xpra 0.7.x clients and servers
  • CentOS 5.x/6.x clients with both old servers (CentOS 5.x/6.x) and new ones (Fedora 18+ or Debian sid)
  • Debian Squeeze or Ubuntu Lucid packages.

Automated Performance and Regression Testing

The xpra.test_measure_perf script can be used to run a variety of client applications within an xpra session (or optionally vnc) and gather statistics on how well the encoding performed. The data is printed out at the end in CSV format, which you can then import into any tool to compare results - you can find some examples generated using sofastats here and here. There is also a facility for generating charts directly from the CSV data, using a script that we provide, described below.
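
For a quick comparison of two runs without importing the data into another tool, a minimal sketch along these lines can be used (the metric column name "Frames/s" is an assumption - check the header row of the CSV files your version of the script actually produces):

import csv
import sys

def mean_of(filename, column):
    # average a numeric column, ignoring rows where it is missing or not a number
    values = []
    with open(filename) as f:
        for row in csv.DictReader(f):
            try:
                values.append(float(row[column]))
            except (KeyError, ValueError):
                continue
    return sum(values) / len(values) if values else 0.0

if __name__ == "__main__":
    # usage: compare_runs.py before.csv after.csv [column]
    before, after = sys.argv[1], sys.argv[2]
    column = sys.argv[3] if len(sys.argv) > 3 else "Frames/s"
    old, new = mean_of(before, column), mean_of(after, column)
    delta = 100.0 * (new - old) / old if old else 0.0
    print("%s: %.2f -> %.2f (%+.1f%%)" % (column, old, new, delta))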

It can also be useful to redirect the test's output to a log file to verify that none of the tests failed with any exceptions/errors (looking for exception messages in the log afterwards).
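
A short helper for doing that after the fact could look like this (a sketch only - the patterns are assumptions and should be adjusted to the messages your tests actually produce):

import re
import sys

# patterns that usually indicate a failed test in the log output
FAILURE = re.compile(r"Traceback|Exception|Error|failed", re.IGNORECASE)

def scan(filename):
    hits = []
    with open(filename) as f:
        for lineno, line in enumerate(f, 1):
            if FAILURE.search(line):
                hits.append((lineno, line.rstrip()))
    return hits

if __name__ == "__main__":
    for filename in sys.argv[1:]:
        for lineno, line in scan(filename):
            print("%s:%i: %s" % (filename, lineno, line))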

At the moment it does not have a command line interface, and all the options have to be edited directly in the code. However, we have improved that process by splitting out the configuration data into a separate file: xpra.perf_config_default.py.

Note: to take advantage of iptables packet accounting (mostly for comparing with VNC, which does not provide this metric), follow the error message and set up iptables rules to match the port being used in the tests, ie: by default:

iptables -I INPUT -p tcp --dport 10000 -j ACCEPT
iptables -I OUTPUT -p tcp --sport 10000 -j ACCEPT

The obvious benchmarking caveats apply:

  • make sure there are no other applications running
  • disable cron jobs or any other scheduled work (systemd makes this a little harder)
  • etc..

Please note caveats in the "Misleading Statistics" section below.

Generating Charts from Performance Test Results

To create multiple output files which can then be turned into charts using xpra.test_measure_perf_charts:

  • Build a config class by taking a copy of xpra.perf_config_default.py, then making changes as necessary.
  • Determine the values of the following variables: prefix (a string identifying the data set), ID (a string identifying what the data set is testing, for example '14' for testing version 14) and repetitions (the number of times you want to run the tests).
  • The data file names you will produce will then be in the format: prefix_id_rep.csv.
  • With this information in hand you can now create a script that will run the tests.

For example:

./test_measure_perf.py all_tests_40 ./data/all_tests_40_14_1.csv 1 14 > ./data/all_tests_40_14_1.log
./test_measure_perf.py all_tests_40 ./data/all_tests_40_14_2.csv 2 14 > ./data/all_tests_40_14_2.log

In the above example, test_measure_perf is run twice, using a config class named "all_tests_40.py", and writing the results to data files with the prefix "all_tests_40" and an ID of "14", for repetitions 1 and 2.
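
Rather than typing those commands by hand, the repetitions can also be driven from a small wrapper script. A minimal sketch, using only the values from the example above:

import subprocess

CONFIG = "all_tests_40"   # the config class copied from perf_config_default.py
TEST_ID = "14"            # identifies what is being tested (here: version 14)
REPETITIONS = 2

for rep in range(1, REPETITIONS + 1):
    # produces ./data/prefix_id_rep.csv and a matching log file for each run
    csv_file = "./data/%s_%s_%i.csv" % (CONFIG, TEST_ID, rep)
    log_file = "./data/%s_%s_%i.log" % (CONFIG, TEST_ID, rep)
    with open(log_file, "w") as log:
        subprocess.check_call(["./test_measure_perf.py", CONFIG, csv_file,
                               str(rep), TEST_ID], stdout=log)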

The optional arguments "1 14" and "2 14" pass the rep and ID values in to the script, which will store them in the "Custom Params" column of the data.

After running your script, open your copy of xpra.test_measure_perf_charts and change the variables at the top of the file so the script will point to the series of data files that you have generated.

These variables are described in the script. Once this has been done, run the script on the command line, and an HTML file will be generated containing your charts.

This HTML file will be loadable without modification if it is placed in the trunk/src/tests/xpra directory. If it is placed elsewhere, the js and css directories found in that path will need to be copied to the same location as the HTML file.

Misleading Statistics

One has to be very careful when interpreting performance results; here are some examples of misleading statistics:

  • higher CPU usage is not necessarily a bad thing if the framerate has increased
  • lower average latency is good, but not if the maximum latency has increased
  • when averaging samples, all the tests must have completed (if one test run was cut short, the average may be skewed)
  • averaging between throttled and non-throttled test samples is not particularly useful
  • averaging between many different test commands is not particularly useful; in particular, gtkperf tests really skew the results and should generally be excluded from sample data (or not run at all)
  • the automated tests may run the "vpx" and "x264" encodings many more times because of the many extra options these encodings support (quality, opengl, etc..), giving more, and more varied, sample data; this cannot be compared directly with other encodings without some initial filtering (see the sketch after this list)
  • when measuring the performance of video encoders, it is important to take into account both the bandwidth used (mostly tied to the "speed" option) and the quality of the output (mostly tied to the "quality" option, but also heavily influenced by the amount of lossless regions sent)
  • etc..
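
As an illustration of the kind of initial filtering mentioned above, here is a minimal sketch that excludes gtkperf runs and groups the remaining samples per encoding before averaging (the column names "Test Command", "Encoding" and "Frames/s" are assumptions - check the actual CSV header first):

import csv
import sys
from collections import defaultdict

def per_encoding_averages(filename, value_col="Frames/s"):
    samples = defaultdict(list)
    with open(filename) as f:
        for row in csv.DictReader(f):
            if "gtkperf" in row.get("Test Command", ""):
                continue            # gtkperf skews the results, exclude it
            try:
                samples[row["Encoding"]].append(float(row[value_col]))
            except (KeyError, ValueError):
                continue
    # average per encoding so that vpx/x264 (which contribute many more
    # samples because of their extra options) do not drown out the others
    return dict((enc, sum(v) / len(v)) for enc, v in samples.items() if v)

if __name__ == "__main__":
    for encoding, avg in sorted(per_encoding_averages(sys.argv[1]).items()):
        print("%10s: %.2f" % (encoding, avg))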

The XDC2015 talk by Martin Peres, "Pitfalls of benchmarking graphics applications for performance tracking", discusses similar issues.
