Xpra: Ticket #224: synchronized X11 calls and performance

In trying to fix support for running gtkperf, I found out that r1783 (use synchronized X11 calls by default) caused a fairly big performance regression. The problem is that we cannot simply go back to unsynchronized calls because then we get random blowups in unpredictable places when the system is stressed:

- ie: when running gtkperf -a in a loop
- ie: #208, #167, #71, etc

So I've mitigated this by:

- optimizing the hell out of the critical codepaths, see: r2307, r2303, r2302, r2294, r2291, r2290, r2275, r2271, r2270, r2269. This includes writing an inlined cython version of trap.call for get_pywindow (see r2289).
- trying to group more X11 calls before calling XSync (see r2281).
- keeping broken window models around longer to avoid attempting to create them dozens of times before giving up: we take a shortcut (see r2304), which also fixes a bug where the same gdk window would end up having dozens of event receivers (as made apparent by r2267).
- reviewing every single call to trap.call and trap.swallow and replacing them with their more explicit counterparts (synced/unsynced), see: r2285, r2284, r2282, r2280, r2279, r2277, r2276.
- adding much better support for profiling CPU usage: see r2298, r2297.

The general rule is to use synced calls when not in the critical path (ie: during setup or rare/important events) or when we must ensure a consistent state (ie: setting up window model wrappers, etc). There are only a few cases where we can do unsynced calls: generally when we are certain that the call will be followed by another synced call before returning from the thread, or when the X11 call is deemed safe.
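
To illustrate the synced/unsynced distinction, here is a toy model in Python. It does not use the real xpra trap code or any X11 library: FakeDisplay, ErrorTrap, call_synced and call_unsynced are hypothetical names made up for this sketch, and the "server" is just a list of queued requests. It only tries to show why an unsynced error surfaces later in an unrelated place, and why grouping several requests under one sync is cheaper.

```python
class XError(Exception):
    """Raised when the toy server reports an error for a trapped request."""


class FakeDisplay:
    """Hypothetical stand-in for an X11 connection: requests are only queued,
    and failures are only discovered when sync() does a round trip."""

    def __init__(self):
        self.pending = []      # requests sent but not yet checked for errors
        self.sync_count = 0    # number of (expensive) round trips performed

    def request(self, name, fails=False):
        self.pending.append((name, fails))

    def sync(self):
        # Round trip: process everything queued so far and report failures.
        self.sync_count += 1
        failed = [name for name, fails in self.pending if fails]
        self.pending = []
        return failed


class ErrorTrap:
    """Hypothetical trap with explicit synced and unsynced variants."""

    def __init__(self, display):
        self.display = display

    def call_synced(self, fn, *args):
        value = fn(*args)
        failed = self.display.sync()   # errors are reported here, at the call site
        if failed:
            raise XError(failed)
        return value

    def call_unsynced(self, fn, *args):
        # No round trip: any error stays queued until someone else syncs.
        return fn(*args)


def demo():
    display = FakeDisplay()
    trap = ErrorTrap(display)
    results = {}

    # 1. Synced: the bad request is caught where it was made.
    try:
        trap.call_synced(display.request, "GetProperty", True)
    except XError as e:
        results["synced"] = e.args[0]

    # 2. Unsynced: the bad request returns normally, and the error only
    #    shows up during a later, unrelated synced call.
    trap.call_unsynced(display.request, "ConfigureWindow", True)
    try:
        trap.call_synced(display.request, "MapWindow")
    except XError as e:
        results["late"] = e.args[0]

    # 3. Grouping: three requests under one trap cost a single round trip.
    def setup_window():
        for name in ("ChangeProperty", "SelectInput", "MapWindow"):
            display.request(name)

    before = display.sync_count
    trap.call_synced(setup_window)
    results["grouped_syncs"] = display.sync_count - before
    return results


if __name__ == "__main__":
    print(demo())
```

In this model, case 2 is the cheap but dangerous one: it is only acceptable when a synced call is known to follow before control returns, which is the rule described above.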

What is left:

- #225: improve damage handling in python UI thread
- check which X11 calls really do need synchronized error handling and which ones do not (in particular, property change/get/set calls - unsure about those).
- gtkperf -a, and probably other applications too, can still generate hundreds of DamageNotify events per second, more than we can deal with. So we end up starving the main UI thread and the server latency goes through the roof: often higher than 5 seconds! This also makes us raise the batch delay way too high, but in a way this is the right response, so there is probably no need to change that. What we need to do is ensure we deal with the DamageNotify events much more quickly. Options:
  - do more profiling, maybe I've missed something...
  - do some initial batching in the cython event handler, and pass a list of damage regions (via idle_add or timeout_add) instead of just one damage rectangle.
  - avoid most of this indirection stuff with gobject signals going via CompositeHelper then WindowModel then Server (each signal call has to go from one class to another via python to C and back...)
  - give up and re-write most of the server in C...
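
The batching option could look something like the sketch below. This is not xpra code: DamageBatcher, on_damage and flush are hypothetical names, and the schedule argument is a placeholder for something like gobject.idle_add (here replaced by a plain list so the example runs without a main loop). The idea is simply that a burst of damage events causes one scheduled callback, which then hands over a list of rectangles per window.

```python
class DamageBatcher:
    """Hypothetical helper: accumulate damage rectangles and deliver them
    in one scheduled callback instead of one callback per event."""

    def __init__(self, schedule, deliver):
        self.schedule = schedule   # e.g. gobject.idle_add in a real main loop
        self.deliver = deliver     # called as deliver(window_id, [rectangles])
        self.pending = {}          # window id -> list of (x, y, w, h)
        self.scheduled = False

    def on_damage(self, wid, x, y, w, h):
        # Cheap path, called for every damage event: just record the rectangle.
        self.pending.setdefault(wid, []).append((x, y, w, h))
        if not self.scheduled:
            self.scheduled = True
            self.schedule(self.flush)

    def flush(self):
        # Swap the pending dict out first so new events start a fresh batch.
        batch, self.pending = self.pending, {}
        self.scheduled = False
        for wid, rects in batch.items():
            self.deliver(wid, rects)
        return False   # with idle_add, a false return value means "run once"


def demo():
    queue = []       # stands in for the main loop's idle queue
    delivered = []
    batcher = DamageBatcher(queue.append,
                            lambda wid, rects: delivered.append((wid, rects)))
    # Simulate a burst of 300 damage events spread over two windows.
    for i in range(300):
        batcher.on_damage(1 + i % 2, i, 0, 10, 10)
    scheduled = len(queue)
    while queue:
        queue.pop(0)()   # run the scheduled callbacks, as the main loop would
    return scheduled, delivered


if __name__ == "__main__":
    scheduled, delivered = demo()
    print(scheduled, [(wid, len(rects)) for wid, rects in delivered])
```

Whether this helps in practice depends on how expensive the per-event signal path really is, which is what the profiling option would need to confirm.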

Fri, 21 Dec 2012 04:24:20 GMT - Antoine Martin: status, description changed

Fri, 21 Dec 2012 04:30:45 GMT - Antoine Martin: attachment set

Fri, 21 Dec 2012 04:32:07 GMT - Antoine Martin: attachment set

Fri, 21 Dec 2012 04:32:59 GMT - Antoine Martin: attachment set

Fri, 21 Dec 2012 06:55:48 GMT - Antoine Martin: description changed

Tue, 08 Jan 2013 15:17:30 GMT - Antoine Martin: status changed; resolution set

Also see the X11 errors debugging wiki page, which has a sample error.py that can be used to debug or trace these sync issues.

Sat, 16 Feb 2013 11:14:04 GMT - Antoine Martin:

Looks like pretty much every single X11 call needs to be synced to prevent crashes later on.. sigh