xpra icon
Bug tracker and wiki

Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#146 closed defect (fixed)

Strengthen calls to avcodec_decode_video2

Reported by: ahuillet Owned by: Antoine Martin
Priority: major Milestone: 0.4
Component: client Version: 0.1.0
Keywords: Cc:

Description

Hello,

 * @warning The input buffer must be FF_INPUT_BUFFER_PADDING_SIZE larger than
 * the actual read bytes because some optimized bitstream readers read 32 or 64
 * bits at once and could read over the end.

We don't currently do that, and this could theoretically yield segfaults.

While we're at it, the input buffer should be aligned on a 32bit boundary, and it should end with a 0.

Attachments (5)

xpra-decompress-padded-nogil.patch (2.2 KB) - added by Antoine Martin 8 years ago.
copy the input buffer and pad it with zeroes, allows us to drop the gil
gil-vs-nogil-FPS.svg (43.9 KB) - added by Antoine Martin 8 years ago.
comparing FPS before and after patch
gil-vs-nogil-CPU.svg (48.4 KB) - added by Antoine Martin 8 years ago.
comparing client CPU usage before and after patch
gil-vs-nogil-client-latency.svg (40.4 KB) - added by Antoine Martin 8 years ago.
comparing client UI thread latency before and after patch
x264-gil-vs-nogil.csv (6.1 KB) - added by Antoine Martin 8 years ago.
sample from the test results in raw csv format

Download all attachments as: .zip

Change History (12)

Changed 8 years ago by Antoine Martin

copy the input buffer and pad it with zeroes, allows us to drop the gil

comment:1 Changed 8 years ago by Antoine Martin

Owner: changed from Antoine Martin to ahuillet
Status: newassigned

Please review the patch above (replacing 32 with 8... bits vs bytes, 64 vs 32!)

Looks fine to me, let's also try to get some numbers using the test script, I suspect this will improve parallelism which may be slightly detrimental to "client UI thread latency" (since this code runs in the UI thread)

Padding all pixel packets would be much more costly and impractical.

comment:2 Changed 8 years ago by Antoine Martin

We would need to use posix_memalign to ensure the memory is aligned:

 int posix_memalign (void **memptr, size_t alignment, size_t size)

What does it need to be aligned to? 32bit? alignment=4?

comment:3 Changed 8 years ago by Antoine Martin

Owner: changed from ahuillet to Antoine Martin
Status: assignedaccepted

from ahuillet on irc: "align to 64 bits, align to sizeof(void *) - that's the best"

Changed 8 years ago by Antoine Martin

Attachment: gil-vs-nogil-FPS.svg added

comparing FPS before and after patch

Changed 8 years ago by Antoine Martin

Attachment: gil-vs-nogil-CPU.svg added

comparing client CPU usage before and after patch

comment:4 Changed 8 years ago by Antoine Martin

Resolution: fixed
Status: acceptedclosed

applied in r973

As can be seen here:
https://www.xpra.org/trac/raw-attachment/ticket/146/gil-vs-nogil-FPS.svg
We can push roughly the same number of frames per second, with the exception of the 'gtkperf' test (which is pathological) and 'eruption' (not sure why).

More importantly, we do not adversely affect CPU usage, which is mostly unchanged (user CPU + system CPU shown):
https://www.xpra.org/trac/raw-attachment/ticket/146/gil-vs-nogil-CPU.svg

Changed 8 years ago by Antoine Martin

comparing client UI thread latency before and after patch

comment:5 Changed 8 years ago by Antoine Martin

Also, since we don't hold the gil as much as before, the client UI thread latency is improved in most cases (except for gtkperf - which can be ignored as it creates many small and short-lived windows, and glxgears - unsure why):
https://www.xpra.org/trac/raw-attachment/ticket/146/gil-vs-nogil-client-latency.svg

Changed 8 years ago by Antoine Martin

Attachment: x264-gil-vs-nogil.csv added

sample from the test results in raw csv format

comment:6 Changed 8 years ago by Antoine Martin

Need to fix win32:

codec.obj : error LNK2019: \
    unresolved external symbol _posix_memalign referenced in function \
    ___pyx_pf_4xpra_4x264_5codec_7Decoder_6decompress_image_to_rgb
Last edited 8 years ago by Antoine Martin (previous) (diff)

comment:7 Changed 8 years ago by Antoine Martin

Here is the win32 equivallent called _aligned_malloc

Note: See TracTickets for help on using tickets.