How to efficiently draw framebuffer content?

I need to display a RAM-based framebuffer for a virtual GPU device that has no real display connected to it. What I have is an mmap'ed chunk of memory, obtained after DRM_IOCTL_MODE_MAP_DUMB, in RGB32 format. Currently I'm using a MIT-SHM shared pixmap created via XShmCreatePixmap() like this:

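/* Create a System V shared memory segment and attach it locally. */
shminfo.shmid = shmget(IPC_PRIVATE, bytes, IPC_CREAT|0777);
shminfo.readOnly = False;
shminfo.shmaddr = shmat(shminfo.shmid, 0, 0);
/* Mark the segment for removal once every process has detached. */
shmctl(shminfo.shmid, IPC_RMID, 0);
/* Ask the X server to attach the segment, then wrap it in a pixmap. */
XShmAttach(dpy, &shminfo);
pixmap = XShmCreatePixmap(dpy, window, shminfo.shmaddr, &shminfo, width, height, 24);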

and then simply

while (1) {
    struct timespec ts = {0, 999999999L / 30};

    nanosleep(&ts, NULL);

    memcpy(shminfo.shmaddr, mem, bytes);
    XCopyArea(dpy, pixmap, window, gc, 0, 0, width, height, 0, 0);
    XFlush(dpy);
}

So it loops 30 times per second, doing a memcpy followed by XCopyArea. The problem is that it uses a lot of CPU: 50% on a powerful machine. Is there a better way? I can think of two possible improvements:

1. Get rid of the memcpy and pass the mmap'ed memory directly to MIT-SHM, but it looks like the MIT-SHM API doesn't support this.
2. Get some kind of 'content changed' notification to get rid of the dumb sleeping (but I haven't found anything appropriate).

Any ideas?

Update: The bottleneck is memcpy; if it is removed, CPU usage becomes negligible. The problem seems to be that there's no way to share previously mmap'ed memory (if I understood the API correctly), so I'm forced to copy the whole buffer every time. I've also tried glDrawPixels() and SDL surfaces; both turned out to be even slower than MIT-SHM.

Update: it turns out that MIT-SHM isn't well suited to a task like this. Its main purpose is creating a buffer and writing (rendering) to it without the overhead of X IPC. I don't need to write anything, just "forward" an existing buffer to X. In this scenario there's no performance difference between shared pixmaps, shared images and regular X images (XCreateImage).

Conclusion: so far I haven't found an API that allows rendering existing buffers without copying the data every time.

Geraldo answered 26/5, 2014 at 15:13 Comment(13)

A minor note, not related to the actual problem: your code assumes that the copying doesn't take any time, since it uses a fixed delay to achieve a fixed frame rate. That's not right, you need to measure the time since the last iteration, and adjust the delay on the fly. – Krummhorn 26/5, 2014 at 15:22
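To illustrate the point about fixed delays, here is a minimal sketch of pacing against absolute deadlines instead of sleeping for a fixed interval. The helper names deadline_advance and sleep_until are made up for this example; clock_nanosleep with TIMER_ABSTIME is the POSIX call assumed to be available (it is on Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: advance an absolute deadline by period_ns,
 * keeping tv_nsec within [0, NSEC_PER_SEC). */
static void deadline_advance(struct timespec *t, long period_ns)
{
    assert(period_ns >= 0 && period_ns < NSEC_PER_SEC);
    t->tv_nsec += period_ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Hypothetical helper: sleep until the absolute CLOCK_MONOTONIC time *t.
 * clock_nanosleep returns the error number directly, so retry on EINTR. */
static int sleep_until(const struct timespec *t)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, t, NULL);
    } while (rc == EINTR);
    return rc;
}
```

In the question's loop this would replace the fixed nanosleep: call clock_gettime(CLOCK_MONOTONIC, &next) once before the loop, then deadline_advance(&next, NSEC_PER_SEC / 30) followed by sleep_until(&next) on each iteration. The time spent in memcpy and XCopyArea is then absorbed into the period rather than added to it.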

Out of curiosity: is there any performance difference between XShmCreatePixmap()/XCopyArea() and XShmCreateImage()/XShmPutImage() ? – Purington 30/5, 2014 at 10:49

Haven't tried XShmCreateImage(), but at first I tried using Qt's QImage and QPixmap. It turned out that the QImage -> QPixmap conversion is very slow; CPU usage was 100%. – Geraldo 30/5, 2014 at 12:32

Have you tried commenting out parts of the loop to see where the bottleneck is (memcpy or XCopyArea)? Also, there is an extra parenthesis at the end of nanosleep, and I wonder why the timespec declaration and initialization are not outside the loop. – Jesusa 4/6, 2014 at 13:6

memcpy is the bottleneck, if commented out CPU usage becomes negligible. Regarding extra parenthesis -- I removed error checking for brevity here w/o re-compilation. Also agree regarding initialization, although it doesn't change performance. – Geraldo 4/6, 2014 at 16:47

BTW, why are you using MIT-SHM instead of plain XCreateImage()/XPutImage() ? – Purington 4/6, 2014 at 20:36

I have two questions : is the source buffer updated automatically (not by you) and can you read/write in the source? – Jesusa 5/6, 2014 at 2:45

Buffer is updated automatically. Not sure regarding write access, need to check. But even if I can write to buffer any changes will be overwritten by the next update. – Geraldo 5/6, 2014 at 5:39

One more question: is the source buffer rewritten completely when updated, or just parts of it? – Jesusa 5/6, 2014 at 10:35

Not fully sure but I think parts of it, however I have no way to know what exactly was updated. – Geraldo 5/6, 2014 at 11:0

2ninjalj: regarding XShmCreateImage()/XShmPutImage()/XCreateImage()/XPutImage(): turns out that those work as well and have similar performance. – Geraldo 6/6, 2014 at 8:54

I'm an X n00b so this may be a stupid question, but why do you write into mem and then copy to shminfo.shmaddr? Why not write directly into shminfo.shmaddr to avoid the memcpy altogether? – Guck 16/8, 2014 at 19:55

The problem is that I don't need to write anything explicitly (if I did, I would indeed simply write to shmaddr). Instead, I need to somehow say to X: "please take a picture from that chunk of memory" and I haven't found a way to do that. – Geraldo 28/10, 2014 at 14:36

For X11, use XShmCreateImage, write to XImage.data, and make the result visible with XShmPutImage, making sure to pass False for the send_event parameter. You may also want to disable exposure events for the current GC; setting PointerMotionHintMask can also help.

SDL1 does most of the above but will use a shadow surface if there is a mismatch between user and display format and may perform unexpected color conversion. SDL2 tries to use hardware acceleration and may perform unexpected scaling and/or filtering. Make sure you're getting what you ask for to avoid hidden ops.

50% CPU usage sounds like a lot for this blit at 30 fps. I'd rewrite the sleep as follows, just in case.

do
    errno = 0;
while ( nanosleep(&ts, &ts) && errno == EINTR );

Griseofulvin answered 25/1, 2015 at 12:38 Comment(0)

This may be an inherently expensive operation: you need to move 240 MB/s from program (system) memory into the video card's (device) onboard framebuffer. Not only must the data be physically copied, it must also cross a device bus. Main memory copy speeds are in the GB/s range, but device buses are considerably slower.

Unless you're using a low-end video chip that uses the system memory for its frame buffer... ironically, that might be faster for this case.

Can you make the virtual display smaller?
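As a rough check on the bandwidth figure, and on how much a smaller display would save, the arithmetic can be sketched as below. The function name copy_bandwidth is made up for this example, and the 1920x1080x4 frame size at 30 updates per second is taken from elsewhere in this thread rather than measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes per second needed to copy one full frame
 * on every tick, assuming the whole buffer is copied each time. */
static uint64_t copy_bandwidth(uint32_t width, uint32_t height,
                               uint32_t bytes_per_pixel, uint32_t fps)
{
    assert(bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

For 1920x1080 at 4 bytes per pixel and 30 fps this gives 248,832,000 bytes per second, which is about 237 MiB/s. Halving both dimensions to 960x540 cuts that to a quarter, 62,208,000 bytes per second.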

Acrodont answered 30/11, 2014 at 0:42 Comment(1)

It was needed for a testing tool whose sole purpose is to see what is actually contained in the framebuffer, so making the display smaller would defeat its purpose to some extent. But since it's just an internal testing tool, CPU usage is not really critical. – Geraldo 30/11, 2014 at 10:15

I am not a graphics, Linux, or optimization expert, but I think this solution should work if the source is completely redrawn on each update.

The problem is that you need to copy the framebuffer as soon as it is updated. The framebuffer is large (1920x1080x4 bytes), and you want to check every 1/30 of a second whether it has been updated.

I suggest writing a flag into the source buffer and checking every 1/30 of a second whether the flag is still there. If it is not, the source has changed, and you need to copy it to the destination again and reset the flag.

As a flag you can use a single pixel (a white pixel in a corner), or you can hide the flag in many pixels (like a hidden message in a BMP). Another idea would be to use the fourth byte of any pixel's RGB value, if the source is true color and the fourth byte is only used for memory alignment.
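A minimal sketch of the fourth-byte variant might look like this. Everything here is an assumption for illustration: the names frame_mark_seen and frame_changed, the sentinel value, and the layout (byte 3 of the first pixel being unused padding, as in little-endian XRGB). It also requires write access to the source buffer, and it only works if the producer rewrites that byte with some other value on every update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_FLAG 0xA5u /* arbitrary sentinel value (assumption) */

/* Hypothetical helper: write the sentinel into the padding byte of the
 * first pixel to record that the current frame has been copied. */
static void frame_mark_seen(volatile uint8_t *fb, size_t bytes)
{
    assert(bytes >= 4);
    fb[3] = FRAME_FLAG;
}

/* Hypothetical helper: return 1 if the sentinel is gone, meaning the
 * producer has overwritten the first pixel since the last mark. */
static int frame_changed(const volatile uint8_t *fb, size_t bytes)
{
    assert(bytes >= 4);
    return fb[3] != FRAME_FLAG;
}
```

The polling loop would then call frame_changed(mem, bytes) on each tick and only do the memcpy, the XCopyArea and a fresh frame_mark_seen(mem, bytes) when it returns 1.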

Jesusa answered 5/6, 2014 at 11:4 Comment(1)

It seems like I can't get write access to the existing buffer, but thank you for the suggestion. – Geraldo 6/6, 2014 at 9:5
