#1851 closed task (fixed)
tune vpx threading
Reported by: | Antoine Martin | Owned by: | Antoine Martin |
---|---|---|---|
Priority: | major | Milestone: | 2.4 |
Component: | encodings | Version: | 2.3.x |
Keywords: | Cc: |
Description (last modified by )
Same as #1840 but for libvpx.
Some links:
- VP9 Video Encoder with Faster Turnaround: .. provides more details about the efficient multi-threading implementation that has enabled a 50-70% improvement in speed with no loss in quality
- Notes on encoding settings: Threads autodetection doesn't work with libvpx so you need to manually raise -threads value
- VP9 Encoder Gets Better Multi-Threading Performance: This has allowed over a 100% speed improvement for 720p/1080p videos with using more than four threads
It looks like the threading improvements are part of the reason why vp8 and vp9 are now faster, and why I chose to use vpx more (see ticket:832#comment:22).
This does mean that reducing the threading might reduce the performance too much.
Attachments (1)
Change History (9)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
I've set up a quick script that should run a series of three test runs with XPRA_VPX_THREADS set to 1, 2, and 4. For reference, the test box is an 8-core system. I'm more curious to see how much of an impact it has on low-end machines, so I'm going to update one of my low-end test boxes and run the tests again there.
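For anyone wanting to repeat this, a minimal sketch of such a loop could look like the following. This is not the script used for these tests: `run_with_vpx_threads` is a hypothetical helper, and the command it runs is a placeholder that has to be replaced with whatever actually starts the xpra test run. Only the `XPRA_VPX_THREADS` variable and the values 1, 2 and 4 come from this ticket.

```python
import os
import subprocess
import sys


def run_with_vpx_threads(command, thread_counts=(1, 2, 4)):
    """Run `command` once per thread count and return {threads: exit code}.

    Hypothetical helper: `command` is whatever launches the test run.
    """
    results = {}
    for threads in thread_counts:
        # copy the current environment and only change the vpx thread setting
        env = dict(os.environ)
        env["XPRA_VPX_THREADS"] = str(threads)
        completed = subprocess.run(command, env=env)
        results[threads] = completed.returncode
    return results


if __name__ == "__main__":
    # placeholder command: just prints the value each run would see
    placeholder = [
        sys.executable, "-c",
        "import os; print('XPRA_VPX_THREADS =', os.environ['XPRA_VPX_THREADS'])",
    ]
    print(run_with_vpx_threads(placeholder))
```

Running each value in a fresh process keeps the runs independent, since the setting is read from the environment of the process doing the encoding.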
comment:3 Changed 4 years ago by
Description: | modified (diff) |
---|
comment:4 Changed 3 years ago by
Owner: | changed from J. Max Mena to Jonathan Anthony |
---|---|
Status: | assigned → new |
comment:5 Changed 3 years ago by
Owner: | changed from Jonathan Anthony to Smo |
---|
comment:6 Changed 3 years ago by
Owner: | changed from Smo to Antoine Martin |
---|
I've attached some test data and charts.
The data seems to show that more threads are better.
Can you check these over and let me know if any other action is required?
comment:7 Changed 3 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Interesting data:
- we encode more pixels per second with more threads but when it comes to actually sending ("pixels sent"), the benefits are much lower as other costs come into play (and maybe we're hitting a performance ceiling?)
- there seems to be a sweet spot with 2 threads, at least for the batch delay and damage latency
- going up to 4 threads doesn't gain much (i.e. marginal improvement in damage latency and pixels sent per second) - I suspect that this may vary with bigger picture sizes
- decoding takes a little bit longer with more threads - which is fine, we're almost never bound by the client's decoding speed
- using 4 threads takes quite a bit more server-side memory
So, r23474 makes us use fewer threads by default (was number-of-cpus - 1):
    >>> import math
    >>> for i in range(8):
    ...     print("%-3i: %2i" % (2**i, math.sqrt(2**i+1)))
    ...
    1  :  1
    2  :  1
    4  :  2
    8  :  3
    16 :  4
    32 :  5
    64 :  8
    128: 11
This can still be overridden using the env var XPRA_VPX_THREADS=
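Put together, the selection logic can be sketched roughly as below. This is an illustration rather than the code from r23474: the function names are made up, the formula is simply the one from the interpreter session above, and treating a missing, non-numeric or non-positive `XPRA_VPX_THREADS` value as "use the default" is an assumption.

```python
import math
import os


def default_vpx_threads(ncpus):
    # same formula as the session above: int(sqrt(ncpus + 1)), never below 1
    return max(1, int(math.sqrt(ncpus + 1)))


def vpx_threads(ncpus=None, environ=os.environ):
    """Hypothetical helper: env var override first, then the sqrt-based default."""
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    value = environ.get("XPRA_VPX_THREADS", "")
    try:
        threads = int(value)
    except ValueError:
        # assumption: unset or invalid values fall back to the default
        threads = 0
    if threads > 0:
        return threads
    return default_vpx_threads(ncpus)
```

On the 8-core test box this gives 3 threads by default instead of the previous 7, and `vpx_threads(8, {"XPRA_VPX_THREADS": "2"})` returns 2.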
comment:8 Changed 17 months ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1851
You can choose the maximum number of threads with:
We want to see how this affects frame latency, bandwidth, CPU load, etc.
Unlike x264, it looks like we don't have a lot of room for manoeuvre here.
(the current value is "number-of-cpus" minus 1)
Maybe this should be capped at 2 threads.