Xpra: Ticket #2139: less copying to reassemble packets

At the moment, we use:

            if read_buffer:
                read_buffer = read_buffer + buf
            else:
                read_buffer = buf

This creates a new buffer on every read. Since we read in 64KB chunks, the same bytes may be copied many times before the full packet has arrived.
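To put a rough number on that repeated copying, here is a small model of the concatenation loop above. It assumes every read returns a full 64KB chunk, except possibly the last one. The function name is hypothetical and only counts the bytes copied into each new buffer:

```python
def bytes_copied_by_concatenation(packet_size, chunk_size=64 * 1024):
    """Count the bytes copied when every read does read_buffer = read_buffer + buf."""
    copied = 0
    buffered = 0
    for offset in range(0, packet_size, chunk_size):
        chunk = min(chunk_size, packet_size - offset)
        if buffered:
            # a brand new buffer is allocated holding the old data plus the new chunk
            copied += buffered + chunk
        # the very first chunk is just assigned (read_buffer = buf), no copy
        buffered += chunk
    return copied


print(bytes_copied_by_concatenation(1024 * 1024))  # 8847360
```

A 1MiB packet arrives as 16 chunks, so the concatenations copy 2 + 3 + ... + 16 = 135 chunks' worth of data (about 8.4MiB). Joining the 16 chunks once copies each byte a single time.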

We can usually parse the packet size from the very first chunk, and then we don't need to copy anything until we've received the full packet.
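A minimal sketch of that idea: keep the incoming chunks in a list, read the size from the header, and join only once the whole packet is present. The 4-byte big-endian length prefix (`HEADER`) and the `PacketReassembler` class are hypothetical stand-ins. This is not xpra's real header format or its protocol code.

```python
import struct

# Hypothetical header: a 4-byte big-endian payload length (not xpra's real header)
HEADER = struct.Struct("!I")


class PacketReassembler:
    """Collect chunks as they arrive and only join them once a full packet is there."""

    def __init__(self):
        self.chunks = []         # pending chunks, not concatenated yet
        self.buffered = 0        # total number of bytes held in self.chunks
        self.payload_size = -1   # unknown until the header has been parsed

    def feed(self, buf):
        """Add one chunk and return the list of complete payloads now available."""
        self.chunks.append(buf)
        self.buffered += len(buf)
        packets = []
        while True:
            if self.payload_size < 0:
                if self.buffered < HEADER.size:
                    break
                # The header usually fits in the first chunk, so this join is rare
                if len(self.chunks[0]) < HEADER.size:
                    self.chunks = [b"".join(self.chunks)]
                self.payload_size = HEADER.unpack_from(self.chunks[0])[0]
            needed = HEADER.size + self.payload_size
            if self.buffered < needed:
                # size is known: just keep collecting chunks, no copying
                break
            # One join per packet, instead of one concatenation per read
            data = self.chunks[0] if len(self.chunks) == 1 else b"".join(self.chunks)
            packets.append(data[HEADER.size:needed])
            rest = data[needed:]
            self.chunks = [rest] if rest else []
            self.buffered = len(rest)
            self.payload_size = -1
        return packets
```

For example, feeding `HEADER.pack(5) + b"hello"` in two pieces returns `[]` for the first piece and `[b"hello"]` for the second. Any bytes left over after a packet are kept as the start of the next one.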

Network tracker ticket: #1590

Fri, 08 Feb 2019 05:15:19 GMT - Antoine Martin: status, description changed

status changed from new to assigned
description modified (diff)

Fri, 08 Feb 2019 13:08:55 GMT - Antoine Martin: attachment set

attachment set to protocol-noappend-v1.patch

poc

Fri, 08 Feb 2019 15:09:02 GMT - Antoine Martin:

Interesting stuff: the new unit test added in r21586 + r21587 was showing very low speeds (60MB/s) until I replaced the fake reads with memory slicing in r21589 (now ~1GB/s), which shows that memory copying can be expensive.
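The ticket does not show the test code. As an illustration only, here is a hypothetical fake socket that serves reads as `memoryview` slices of one preallocated buffer, so that the test itself does not copy memory. The `FakeSocket` name and its `read()` method are assumptions and are not taken from r21589.

```python
class FakeSocket:
    """Hypothetical test double: each read returns a zero-copy slice of one buffer."""

    def __init__(self, data, chunk_size=64 * 1024):
        self.view = memoryview(data)   # wraps data without copying it
        self.pos = 0
        self.chunk_size = chunk_size

    def read(self):
        # slicing a memoryview shares the underlying memory, nothing is copied
        chunk = self.view[self.pos:self.pos + self.chunk_size]
        self.pos += len(chunk)
        return chunk   # an empty memoryview signals the end of the data
```

Building each fake read with something like `bytes(data[pos:pos + n])` would copy every chunk instead. That copying cost then shows up in the measured throughput.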

Fri, 08 Feb 2019 16:35:23 GMT - Antoine Martin:

r21591 improves the test code to make it more realistic. With this change:

old code: 1.5 to 4Gbps (average around 2Gbps)
new code: 2 to 10Gbps! (average around 3.5Gbps)

r21592 replaces the protocol code with the new one. I would still like to run the profiling code on this.

Sat, 09 Feb 2019 08:36:12 GMT - Antoine Martin:

Rewards from profiling: r21596 improves the performance by simply removing a non-essential logging call.

Sat, 09 Feb 2019 12:11:32 GMT - Antoine Martin: attachment set

attachment set to format-thread-218329.png

sample profiling output

Sat, 09 Feb 2019 12:13:12 GMT - Antoine Martin:

profiling code added in r21602
another logging call removed: r21604

I'm not sure there's much more we can do to streamline things, yet the performance is a tad disappointing: only ~1Gbps. Sample profiling output: see the attached format-thread-218329.png.

Sun, 10 Feb 2019 04:10:24 GMT - Antoine Martin: attachment set

attachment set to cythonize-protocol.patch

try to speed up the protocol using Cython

Sun, 10 Feb 2019 05:43:52 GMT - Antoine Martin:

I thought the bottleneck was in Python, so I converted the module to Cython, but that didn't help at all (less than 1% improvement). It turns out that the measured performance does increase if we run the test for longer. Cache warming up? Startup costs?
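Because the numbers improve with longer runs, one option is to discard a warm-up phase and then time over a fixed minimum duration. The helper below is a hypothetical sketch of that approach and is not xpra's test code. `process` stands for any callable that consumes one payload.

```python
import time


def measure_throughput(process, payload, warmup=100, min_duration=2.0):
    """Return bytes per second for process(payload), ignoring warm-up iterations."""
    for _ in range(warmup):
        # warm caches, allocators and any lazy initialisation before timing
        process(payload)
    total = 0
    start = time.perf_counter()
    elapsed = 0.0
    # run until at least min_duration seconds have passed, so startup costs are amortised
    while elapsed < min_duration:
        process(payload)
        total += len(payload)
        elapsed = time.perf_counter() - start
    return total / elapsed
```

A longer `min_duration` gives a steadier figure, at the cost of a slower test.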

Sun, 10 Feb 2019 14:40:35 GMT - Antoine Martin: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

Minor protocol cleanup in r21610. More improvements to the tests in:

r21609 correctness
r21611 websocket fix
r21612 add websocket test
r21613 prettify output

The websocket protocol is within touching distance of the raw xpra protocol:

xpra      format thread:                      227MB/s
xpra      packets formatted per second:       2734
xpra      incoming packet processing speed:   612MB/s
xpra      packets parsed per second:          78501
websocket format thread:                      225MB/s
websocket packets formatted per second:       2705
websocket incoming packet processing speed:   401MB/s
websocket packets parsed per second:          64813

There is still room for improvement, but this should be sufficient for some time.

Sat, 23 Jan 2021 05:43:14 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2139