#1926 closed enhancement (fixed)
replace numpy dependency in websockify
| Reported by: | Antoine Martin | Owned by: | J. Max Mena |
|---|---|---|---|
| Priority: | major | Milestone: | 2.4 |
| Component: | external | Version: | 2.3.x |
| Keywords: | | Cc: | |
Description
As per ticket:1913#comment:2, importing numpy wastes a lot of memory and the only thing that websockify uses it for is to do a string XOR!
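For context, the "string XOR" in question is the WebSocket payload masking step: each payload byte is XORed with a 4-byte mask that repeats every four bytes. A minimal pure-Python sketch of that operation follows; the function name `xor_mask` is made up for illustration and is neither websockify's nor xpra's actual code.

```python
def xor_mask(data, mask):
    """XOR every byte of `data` with the 4-byte `mask`, repeated as needed.

    Illustrative only: `xor_mask` is a hypothetical name, not an existing API.
    """
    if len(mask) != 4:
        raise ValueError("mask must be exactly 4 bytes")
    # byte i of the payload is combined with byte (i mod 4) of the mask
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


# XOR is its own inverse, so applying the same mask twice restores the payload
key = b"\x01\x02\x03\x04"
masked = xor_mask(b"hello", key)
restored = xor_mask(masked, key)
```

A byte-at-a-time loop like this is slow in Python, which is why the operation is normally delegated to numpy or to a compiled extension.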
Let's submit a patch to make this more pluggable: we have our own code which can provide it (our cyxor cython extension).
The import would need to be deferred since the websockify module imports the sub-modules (i.e. the WebSocket constructor could get the xor wrapper function from websockify.xor, which we would then be able to patch up).
Alternatively, we could use the modules context hack to temporarily hide "numpy" from the module loader, then patch the _unmask method.
This approach would also work with older versions of websockify, and they have not made any new releases in years now...
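The "hide numpy, then patch" idea could be sketched roughly as below. This is an assumption-laden illustration, not the code that was committed: `hidden_module`, `import_fails`, `StandInSocket` and `fast_unmask` are hypothetical names, and the stand-in class only mimics the shape of a class exposing an `_unmask` method. The one mechanism relied upon is standard Python behaviour: a `None` entry in `sys.modules` makes the import of that name raise `ImportError`.

```python
import sys
from contextlib import contextmanager


@contextmanager
def hidden_module(name):
    """Make `import name` fail with ImportError inside the with-block."""
    missing = object()
    saved = sys.modules.get(name, missing)
    sys.modules[name] = None  # a None entry makes the import system raise ImportError
    try:
        yield
    finally:
        # put sys.modules back the way we found it
        if saved is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved


def import_fails(name):
    """Return True if importing `name` raises ImportError right now."""
    try:
        __import__(name)
    except ImportError:
        return True
    return False


class StandInSocket:
    """Hypothetical stand-in for the class whose _unmask method gets replaced."""

    @staticmethod
    def _unmask(buf, mask):
        raise NotImplementedError("pretend this is the slow default")


def fast_unmask(buf, mask):
    # placeholder for the compiled XOR function; pure Python here
    return bytes(b ^ mask[i % 4] for i, b in enumerate(buf))


with hidden_module("numpy"):
    # any module imported here would see numpy as "not installed"
    numpy_hidden = import_fails("numpy")

# swap in the replacement implementation
StandInSocket._unmask = staticmethod(fast_unmask)
```

In real use, the import of the websocket module would go inside the `with` block, so that its own "is numpy available?" check takes the fallback path before the method is replaced.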
See also #1134.
Attachments (1)
Change History (10)
comment:1 Changed 3 years ago by
Owner: | changed from Antoine Martin to J. Max Mena |
---|
comment:2 Changed 3 years ago by
r19989 caused a regression reported in ticket:1941#comment:7 :
disconnect: zlib packet decompression failed must be string or read-only buffer, not memoryview
Fixed in r20229.
comment:4 Changed 2 years ago by
See also #2104: with websockify > 0.8 we can silence the numpy warning.
comment:5 Changed 2 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Not heard back, so:
- r21389 adds a simple command line tool to compare the performance of the default websockify unmask function (which uses numpy) and our cython implementation
- r21392 adds a unit test to verify that we get the same result as numpy, no matter what the buffer alignment is
Sample output:
$ python ./tests/xpra/net/websockify_numpy_vs_cython.py
* slice size: 1KB
  - websockify via numpy: 119MB/s
  - xpra cython : 635MB/s
* slice size: 8KB
  - websockify via numpy: 1189MB/s
  - xpra cython : 4016MB/s
* slice size: 64KB
  - websockify via numpy: 3148MB/s
  - xpra cython : 9698MB/s
* slice size: 1024KB
  - websockify via numpy: 1207MB/s
  - xpra cython : 6416MB/s
* slice size: 16384KB
  - websockify via numpy: 1096MB/s
  - xpra cython : 1170MB/s
* slice size: 65536KB
  - websockify via numpy: 1152MB/s
  - xpra cython : 1254MB/s
The speedup is much larger on small packets (1KB: ~5 times faster!) than on really big ones (64MB: only ~10% faster).
Typical packets are in the 1-64KB range and this is also where we benefit the most.
But I wasn't happy with those values as lz4 can compress data at that speed.
So r21393 optimizes the code for better performance.
Here are some results after setting the CPU governor to performance:
$ python ./tests/xpra/net/websockify_numpy_vs_cython.py
* slice size: 1KB
  - xpra cython : 899MB/s
  - websockify via numpy: 141MB/s
* slice size: 8KB
  - xpra cython : 5330MB/s
  - websockify via numpy: 1458MB/s
* slice size: 64KB
  - xpra cython : 14233MB/s
  - websockify via numpy: 4138MB/s
* slice size: 1024KB
  - xpra cython : 11877MB/s
  - websockify via numpy: 4809MB/s
* slice size: 16384KB
  - xpra cython : 3974MB/s
  - websockify via numpy: 1173MB/s
* slice size: 65536KB
  - xpra cython : 2833MB/s
  - websockify via numpy: 1166MB/s
Note: when numpy gets vectorized optimizations (https://lwn.net/Articles/691932/), we can revisit this.
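The kind of check described for r21392 (same result as the reference, whatever the buffer alignment) could look roughly like the sketch below. It is not the actual unit test: both functions are pure-Python stand-ins with hypothetical names, where `xor_reference` plays the role of the numpy code path and `xor_bulk` the role of the optimized one. Alignment only really matters for compiled code that reads whole machine words, but the test structure is the same: slice one backing buffer at every offset and length, then compare the outputs.

```python
def xor_reference(data, mask):
    """Straightforward byte-by-byte XOR, used as the ground truth."""
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


def xor_bulk(data, mask):
    """Stand-in for an optimized implementation: XOR the whole buffer at once."""
    n = len(data)
    if n == 0:
        return b""
    # repeat the 4-byte mask so that it covers the full buffer
    repeated = (bytes(mask) * (n // 4 + 1))[:n]
    value = int.from_bytes(data, "big") ^ int.from_bytes(repeated, "big")
    return value.to_bytes(n, "big")


def check_alignment(max_offset=8, max_length=64):
    """Compare both implementations on slices starting at every offset."""
    backing = bytearray(range(256)) * 2
    mask = b"\xde\xad\xbe\xef"
    view = memoryview(backing)
    for offset in range(max_offset):
        for length in range(max_length):
            # a memoryview slice shares memory, so it keeps the odd start address
            chunk = view[offset:offset + length]
            if xor_bulk(chunk, mask) != xor_reference(chunk, mask):
                return False
    return True


all_match = check_alignment()
```

Using `memoryview` slices rather than copies is the important detail: a copy would usually be allocated at an aligned address and hide the cases the test is meant to exercise.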
comment:6 Changed 2 years ago by
comment:8 Changed 2 years ago by
comment:9 Changed 4 months ago by
This ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1926
r19989 does this by pretending numpy is not installed and then monkeypatching a cython function to do the work.
This should be faster than the numpy code it replaces:
The impact should be negligible since the server doesn't do a lot of reading.
Importing websockify will now show this warning:
This can safely be ignored.
@maxmylyn: please check that this new code does not cause any performance regressions.