I have been running performance tests on XPRA (between work and home), and I noticed that if my internet connection got congested with H264 content from my OpenGL window, my xterm windows would become very slow, and they *never* recovered unless I restarted the client.
I have now tested things on my gigabit work LAN, using a piece of software that rate-limits the MS Windows XPRA client. I found that if I impose a 5KB/s upload limit things get very slow, and that when I turn off this limit the recovery is poor. I believe, therefore, that I have found a way to reliably simulate the problem I was seeing from home: temporary congestion makes things slow, and XPRA does not seem to recover. I'm baffled.
Here is my procedure:
What happens:
What I'm using:
XPRA_FORCE_BATCH=1 seems to solve the problem on the LAN (I will look at what happens over broadband later). I think that somehow what is going on in my OpenGL window is driving the batch delay up. (When I opened the ticket I didn't understand what the batch delay did.)
If my guess is right, I think the batch delay for "text" windows should plummet after a spike, because sluggish shells are much harder to use than sluggish GUIs.
Sounds similar to #1911.
Forcing XPRA_FORCE_BATCH=1
will ensure that we let regions accumulate, giving more opportunity for rectangles to get merged and for packet aggregation (#619) - which means better use of the more limited bandwidth.
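To illustrate the idea (this is my own sketch, not xpra's actual damage-handling code), letting regions accumulate means overlapping or adjacent damage rectangles can be collapsed into fewer, larger paint regions before encoding, which wastes less of a constrained link on per-packet overhead:

```python
# Illustrative sketch only -- NOT xpra's actual implementation.
# Accumulated damage rectangles (x, y, w, h) that overlap are merged
# into their bounding box, so one larger paint packet replaces many
# tiny ones.

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def bounding_box(a, b):
    x1 = min(a[0], b[0])
    y1 = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def merge_rects(rects):
    """Greedily merge overlapping rectangles until none overlap."""
    merged = list(rects)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = bounding_box(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

# Three accumulated damage events collapse into a single region:
print(merge_rects([(0, 0, 10, 10), (5, 5, 10, 10), (12, 12, 5, 5)]))
# → [(0, 0, 17, 17)]
```

Without batching, each of those three damage events would be compressed and sent separately; with batching, one merged region is sent instead.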
Maybe we should always batch by default, or at least always batch when we see any congestion.
How can I reproduce this easily on my laptop?
My procedure is
I have found that with:
Environment=XPRA_FORCE_BATCH=1
Environment=XPRA_BATCH_MAX_DELAY=50
I get pretty good results over broadband and LAN. I'm *not* saying that this is the right thing to do; frankly, I have very little clue about this. I'm going to test performance on an LTE hotspot shortly...
With these settings + dbus video-box hinting XPRA is working better than anything else I've ever tried. I'll soon be deploying it to some more of my users.
Using VirtualGL, run lsprepost with a nontrivial 3D model. I can provide you with lsprepost and a nontrivial data set. I'm guessing that anything that really congests the network will do the trick.
Can you reproduce with something more widely available? glxgears or glxspheres perhaps?
I set it to limit download to 50KB/s and upload to 5KB/s. Then things get very slow (as expected).
It's a miracle it works at all. 5KB/s is really much lower than anything it was ever designed for!
Then I go ahead and uncheck the limits, but the batch delay stays huge and things, especially xterms, remain unusable.
Ah.
Environment=XPRA_FORCE_BATCH=1
It should be fine to change this default. In almost all cases, batching is the right thing to do. The minimum, which is 5ms, is imperceptible anyway. VNC servers always have it enabled.
Environment=XPRA_BATCH_MAX_DELAY=50
I am less sure about this one. With the mostly dynamic batch delay code, the base value is almost meaningless - except when it is high and causes problems... The only issue with capping the batch delay is that we have other heuristics that use the batch delay as input. I'll see what I can do.
I have not disappeared.
The performance of VirtualGL over an LTE hotspot was really great. This is a testament to XPRA, not to my hotspot, which isn't that great.
I will get back with you on this once I have finished hooking dbus to ls-prepost. I have gotten our developers to give me a callback, and I have figured out how to extract relative coordinates from the glCanvas. Now, I have to write the dbus code...
r21267 enables batching by default. As for the max delay change, my current ideas are:
More improvements in:
This won't fix your problems, but it will help.
Can you please capture the -d stats
debug output of when the batch delay stays high when it shouldn't?
I've attached log1.txt.
At about 9:50:50 I do a rotation and it's good.
At about 9:51:00 I choke the bandwidth.
At about 9:51:22 I restore the bandwidth. The 3D window remains slow, which is not good. Oddly, my xterm comes right back, which is good.
At about 9:52:30 the lag seems mostly gone.
Thanks, I can reproduce the problem locally using tc. This is caused by the latency heuristics: an increase in latency causes the batch delay to go up quickly, but a decrease in latency does not bring it back down quickly enough.
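The behaviour can be sketched with a toy model (my own illustration, not xpra's actual heuristic code): if latency increases feed into the batch delay much faster than latency decreases, the delay spikes quickly under congestion but only creeps back down long after the congestion has cleared:

```python
# Toy model of asymmetric smoothing -- my own illustration, NOT
# xpra's actual batch delay heuristics.  The delay tracks a latency
# signal, reacting strongly to increases but only weakly to
# decreases, so a brief congestion spike leaves the delay elevated
# long after the link has recovered.

def update_delay(delay, latency, up_factor=0.8, down_factor=0.05):
    if latency > delay:
        # congestion: jump most of the way up immediately
        return delay + up_factor * (latency - delay)
    # recovery: creep back down slowly
    return delay + down_factor * (latency - delay)

delay = 10.0
# 5 congested ticks (900ms latency), then 50 clear ticks (20ms):
for latency in [900] * 5 + [20] * 50:
    delay = update_delay(delay, latency)
print(round(delay))  # still several times the real 20ms latency
```

Raising down_factor (or resetting the delay outright when latency drops sharply) makes the recovery fast, which is roughly the behaviour the fixes below aim for.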
More fixes:
@nathan_lstc: is this now usable? (ignoring the pycuda issue from ticket:2022#comment:72 for now)
I have 4 computers running XPRA servers right now. Only one of them is running a vanilla yum-repo install. I have just upgraded that one to r21314. Out of the box it seems to work well. Here is my systemd file:
ExecStart=/bin/sh -c "cd ~;PATH=/opt/xpra/bin:$PATH xpra --no-daemon start --no-printing --start-via-proxy=no --systemd-run=no --start=\"xrdb -merge $HOME/.Xresources\" --start-child=xterm --exit-with-children --mdns=no --xsettings=no :`id -u`"
Environment=PYTHONPATH=/opt/xpra/lib64/python2.7/site-packages
Environment=LD_LIBRARY_PATH=/opt/libjpeg-turbo/lib64/:/usr/local/cuda/lib64:/usr/lib64/xpra
Environment=CUDA_VISIBLE_DEVICES=0
Everything is working right for me (I'll check with the guy having the pycuda issue on Monday). Looking through the patches, I can see they do what I was setting through env variables. Thanks!
One more question: "xpra info" has a lot of numbers called "delay". How do I interpret them all?
client.batch.delay.50p=112
client.batch.delay.80p=932
client.batch.delay.90p=936
client.batch.delay.avg=412
client.batch.delay.cur=935
client.batch.delay.max=941
client.batch.delay.min=60
client.batch.locked=False
client.batch.max-delay=500
client.batch.min-delay=16
client.batch.timeout-delay=15000
client.window.1.batch.actual_delays.90p=63
client.window.1.batch.actual_delays.avg=57
client.window.1.batch.actual_delays.cur=57
client.window.1.batch.actual_delays.max=106
client.window.1.batch.actual_delays.min=46
Is window.1.batch.actual_delays.cur the batch delay that is currently happening? If so, what about "client.batch.delay.cur"? Right now, one number looks okay the other not so good:
[nathan@bobross lsprepost4.7_centos7]$ xpra info :250 | grep -i delay | grep cur
client.batch.delay.cur=934
client.window.1.batch.actual_delays.cur=80
client.window.1.batch.delay.cur=80
client.window.8.batch.actual_delays.cur=230
client.window.8.batch.delay.cur=230
One more question: "xpra info" has a lot of numbers called "delay". How do I interpret them all?
That's coming from get_weighted_list_stats
in browser/xpra/trunk/src/xpra/simple_stats.py.
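For reference, the Np/avg/min/max keys shown by "xpra info" can be derived from a list of recent delay samples roughly like this (an illustrative, plain unweighted sketch - not the actual get_weighted_list_stats code from simple_stats.py):

```python
# Illustrative sketch of how the 50p/80p/90p/avg/min/max keys in
# "xpra info" could be derived from recent delay samples.  This is
# NOT the actual get_weighted_list_stats code from simple_stats.py,
# just a plain unweighted approximation.

def list_stats(samples):
    values = sorted(samples)
    n = len(values)
    def pct(p):
        # value below which roughly p percent of the samples fall
        return values[min(n - 1, int(n * p / 100))]
    return {
        "min": values[0],
        "max": values[-1],
        "avg": sum(values) // n,
        "50p": pct(50),
        "80p": pct(80),
        "90p": pct(90),
    }

delays = [60, 62, 70, 80, 95, 110, 130, 400, 930, 941]
print(list_stats(delays))
# → {'min': 60, 'max': 941, 'avg': 287, '50p': 110, '80p': 930, '90p': 941}
```

This also shows why avg and 50p can look fine while 80p/90p/cur look bad: a few congested samples dominate the upper percentiles.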
Is window.1.batch.actual_delays.cur the batch delay that is currently happening?
For this window, yes.
If so, what about "client.batch.delay.cur"?
That's the global value, which is only used when creating new windows.
Right now, one number looks okay the other not so good: (..)
client.window.8.batch.actual_delays.cur=230
Yes, that's too high.
Can you get the -d stats
log?
Bump.
Fixes to the batch delay changes in r21621. (found thanks to ticket:2140#comment:1) This may explain the high batch delay values. The new dynamic delay code was hiding that somewhat - doing its job, but masking unreasonable values.
I am far from certain, but I just tried a combination of operations over a 1Gb LAN that felt slow in the previous revisions, and in this revision it seemed absolutely smooth. I'm going to run a much more extensive test and get back with you.
[nathan@curry lsprepost4.7_centos7]$ xpra info :250 | grep delay | grep cur
client.batch.delay.cur=8
client.window.1.batch.actual_delays.cur=60
client.window.1.batch.delay.cur=3
client.window.15.batch.actual_delays.cur=150
client.window.15.batch.delay.cur=11
client.window.16.batch.actual_delays.cur=8
client.window.16.batch.delay.cur=8
Bump.
See also #2421.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2090