When we have multiple cards and/or multiple virtual cards (GRID K1, K2 and others) in the same server, we want to ensure that the load is fairly evenly distributed amongst all the
With CUDA, this isn't a problem. But with NVENC, we have no way of knowing how many contexts are still free. What happens when we reach the limit is that creating a new context will just fail... We cannot assume that we are the only user of the device on the system, especially with proxy encoding (#504) where each proxy instances runs in its own process space.
The code added in r5488 moves the CUDA device selection (amongst other things) to a utility module and uses the percentage of free memory to choose the device to use. Since there are normally up to 32 contexts per GPU, this should work as a cheap load balancing solution: even with 4
vGPUs per PCIE slot, things will even out before we reach 20% capacity. This won't take into account the size of the encoding contexts, but since we reserve large context buffers in all cases (see r5442 - done for supporting #410) and since the sizes should be randomly distributed anyway, this should not be too much of a problem.
We lower the NVENC codec score as we create more contexts, and we also keep track of context failures to lower the score further (taking into account how recent the failure was). This should ensure that as we get closer to the limit, we become less likely to try to use NVENC, or that when we do hit the hard limit, we have a gradual grace period until we try again.
What remains to be done:
(r5493 was missing from previous commits - oops)
smo: this is good enough for some testing... and I only have one card, so I cannot really test it very well.
Things to lookout for:
closing for now will reopen if there are issues
See also: new CUDA load balancing feature in #2416.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/520