I appreciate there is some difficulty with drivers on Python 3, and, for some systems, refreshing more frequently solves the problem.
However, I found the recent change (3.0.2) to set the default of XPRA_OPENGL_DRAW_REFRESH=1 makes scrolling uncomfortably laggy in many programs. This is all local.
I have not experienced any issues by setting this to 0 (yet). Should I expect any strangeness if I keep this set? Is this an issue that a whitelist or blacklist could help?
Please specify the exact distro you are using.
Please also include the output of xpra opengl
.
Output of xpra opengl
This is Debian unstable, and I've attached the output.
The change that enabled XPRA_OPENGL_DRAW_REFRESH=1
for v3 is in r24352.
More details in #2466.
According to ticket:2466#comment:5, this also affects some Linux drivers..
So r24433 switches back to XPRA_OPENGL_DRAW_REFRESH=0
in trunk, but I am very reluctant to apply this to v3 for now, as I don't want to spend days making new release after new release just to toggle this switch back and forth when people hit issues..
I think that the slowness is caused by the multiple calls to present_fbo
.
When scrolling, we're sending the window updates using many chunks. We should bundle all of them together for the refresh code path, the same way that we already do for the non-refresh paint code path.
@aerusso: does r24434 fix things?
It is a better solution than XPRA_OPENGL_DRAW_REFRESH=0
.
The only other option worth a try if you have repaint issues with this patch is XPRA_REPAINT_ALL=1
.
There are beta builds with this change here: https://xpra.org/beta.
I am confused: testing r24434 with XPRA_OPENGL_DRAW_REFRESH=0
(which is what r24433 sets, if I'm reading things right), is excellent (I didn't actually apply that patch, I only applied r24434). Should I be testing with development tip?
With XPRA_OPENGL_DRAW_REFRESH=1
, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0
.
This is a relatively high end machine---Ryzen 7 3700X.
Should I be testing with development tip?
Applying r24434 should be enough.
which is what r24433 sets, if I'm reading things right
Damn, I meant to revert r24433, that's now done in r24435.
With XPRA_OPENGL_DRAW_REFRESH=1, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0.
Better than what?
I think r24434 is what I am going to backport to v3, it's the safest solution: we always request a repaint, but doing it efficiently. I just need to re-test with wayland first, because GTK does weird things with coordinates because of CSD.
Sorry for the confusion. From best to worse, in terms of lag:
XPRA_OPENGL_DRAW_REFRESH=0
, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)
XPRA_OPENGL_DRAW_REFRESH=1
, with r24434
XPRA_OPENGL_DRAW_REFRESH=1
, without r24434
Is this a bug in the Python 3 implementation of GTK3? It seems like, at least most of the time on X, a refresh is not needed.
By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):
def after_draw_refresh(self, success, message=""): plog("after_draw_refresh(%s, %s) pending_refresh=%s", success, message, self.pending_refresh) backing = self._backing if not backing: return if REPAINT_ALL or self._client.xscale!=1 or self._client.yscale!=1 or is_Wayland(): #easy: just repaint the whole window: rw, rh = self.get_size() self.idle_add(self.queue_draw_area, 0, 0, rw, rh) return pr = self.pending_refresh self.pending_refresh = [] bx, by, bw, bh = None, None, None, None for x, y, w, h in pr: if bx is None: bx, by, bw, bh = x, y, w, h continue if x < bx: bw += bx - x bx = x if y < by: bh += by - y by = y if w > bw: bw = w if h > bh: bh = h rx, ry, rw, rh = self._client.srect(bx, by, bw, bh) #if self.window_offset: # rx -= self.window_offset[0] # ry -= self.window_offset[1] self.idle_add(self.queue_draw_area, rx, ry, rw, rh)
I don't do very much graphics, so I don't know if this will break something somewhere else.
XPRA_OPENGL_DRAW_REFRESH=0
, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)
With r24434, when draw refresh is enabled =1
and with opengl, we don't paint the fbo on-screen until all the screen updates in the same group have been received. This should be faster for when many screen updates arrive in the same group (ie: when scrolling).
By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):
I assume you mean without r24434? Or is it better than r24434 with draw refresh enabled? (which is the new default)
I don't do very much graphics, so I don't know if this will break something somewhere else.
I don't think the code is correct, we already have a cython accelerated rectangle merging function in xpra: browser/xpra/trunk/src/xpra/rectangle.pyx (with unit tests). In some cases, it is best not to coalesce: imagine a window updating a single pixel in its 4 corners, we would ask the graphics system to repaint all of it. Seems like a contrived example but similar behaviour happens more than you would think because of the way applications decorate their windows. It's something I have looked into in the past, and it can be costly, especially when opengl is disabled (high CPU cost). We could still merge something like this, as long as it isn't the default.
The backport to v3 is in: r24441. Will test some more.
Or is it better than r24434 with draw refresh enabled? (which is the new default)
Yes, for the workload I tested, draw refresh enabled and merging all of the rectangles is best. It winds up essentially being equivalent to XPRA_REPAINT_ALL=1
.
The workload is Google Chrome, when scrolling. For example, here is a typical pending_requests list
[(0, 0, 1500, 846), (0, 119, 1500, 4), (0, 564, 1500, 16), (0, 601, 1500, 28), (0, 648, 1500, 4), (0, 697, 1500, 16), (0, 740, 1500, 2), (0, 780, 1500, 10), (0, 798, 1500, 13), (0, 817, 1500, 10)]
It's really bizarre to me that updating 10 regions is actually something that I can "feel". Now that I'm digging into queue_draw_area
, I suspect that there's some slow code path there, maybe getting and initializing the gl context? I didn't look very hard, but gl_init looks like it could be slow.
I think the ideal solution probably renames queue_draw_area
to queue_draw_areas
and makes it take a list of rectangular regions that all get updated using the same gl context.
I would guess that it only makes sense to consider merging rectangles after collecting the refreshes within a single gl context creation.
I didn't look very hard, but gl_init looks like it could be slow.
Of course, you're right.
The slowdown comes from calling gl_show
which ends up calling swap_buffers()
.
The other thing I had forgotten about is that we already do a full window redraw when double-buffered (which is everywhere):
if self.is_double_buffered() or bw!=ww or bh!=wh: #refresh the whole window: rectangles = ((0, 0, bw, bh), ) else: #paint just the rectangles we have accumulated: rectangles = self.pending_fbo_paint
Now, we don't want to penalize the non-opengl case, so maybe the code just needs to be delegated to the backend class so opengl can take the shortcut.
@aerusso: how about r24460? (this solution may help with #2373)
This does a single queue_draw_area
for the full window when using the opengl backend whilst preserving the multiple-regions code for the non-opengl case. (tested briefly)
This may have caused a regression on macos: #2491.
I tested with Debian unstable's version and added r24434, r24435, and r24460. I cannot "feel" any lag with these patches.
Will double buffering always be enabled? If not, I think there may still be value in collecting the multiple calls to queue_draw_area
into a single gl context.
Also: thanks for xpra! It's really a fantastic piece of software!
r24460 has been backported to the v3 branch in r24483.
Since this fixes the symptoms, I am going to close this ticket. We can deal with regressions and bugs in other tickets and link back here if needed.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2481