Even though this was much improved in 0.15, we still spend too much time deciding what to do with the pixels.
A first step would be to split this logic up and rewrite some parts in Cython.
A more interesting approach, though a lot more challenging, would be to delegate this logic to hardware (via CUDA, OpenCL or similar): we hand over the window buffer update, let the hardware figure it out, and get the compressed data back. CUDA and OpenCL can easily take care of things like delta (xor), and even stream compression could be adapted by chunking the data, see "Compression library using Nvidia's CUDA".
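To make the delta (xor) and chunking ideas concrete, here is a minimal CPU-only sketch in plain Python. It is not the existing xpra code and does not use CUDA or OpenCL; the function names (`xor_delta`, `compress_chunks`, `decompress_chunks`) and the 64KB chunk size are hypothetical, chosen only for illustration. The point is that both steps work on independent bytes or independent chunks, which is the kind of work that could plausibly be offloaded.

```python
import zlib
from typing import List


def xor_delta(previous: bytes, current: bytes) -> bytes:
    """XOR two equal-sized pixel buffers: unchanged bytes become zero."""
    if len(previous) != len(current):
        raise ValueError("buffers must have the same length")
    # XOR via big integers, a simple stdlib-only way to XOR two byte strings
    delta = int.from_bytes(previous, "big") ^ int.from_bytes(current, "big")
    return delta.to_bytes(len(current), "big")


def compress_chunks(data: bytes, chunk_size: int = 64 * 1024) -> List[bytes]:
    """Compress each chunk on its own, so chunks do not depend on each other."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_chunks(chunks: List[bytes]) -> bytes:
    """Reverse of compress_chunks: decompress each chunk and join them."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

Since xor is its own inverse, the receiving end can rebuild the new buffer with `xor_delta(previous, delta)`. Compressing chunks independently usually costs some compression ratio compared to a single stream, which is the trade-off for being able to process them in parallel.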
Largely superseded by #1700
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/920