Why not use GDI to repeatedly fill a window with RGB data from an array?

This is a follow-up to this question. I'm currently writing a simple game and am looking for the fastest way to (repeatedly) display an array of RGB data in a Win32 window, without flickering or other artifacts.

Several different approaches were recommended in the answers to the previous question, but there was no consensus on which would be the fastest. So, I threw together a test program. The code simply displays a framebuffer on the screen repeatedly, as fast as possible.

These are the results I obtained, for 32-bit data running in a 32-bit video mode - they may surprise some people:

- Direct3D (1):             500 fps
- Direct3D (2):             650 fps
- DirectDraw (3):          1100 fps
- DirectDraw (4):           800 fps
- GDI (SetDIBitsToDevice): 2000 fps

Given these figures:

  • Why are many people adamant that GDI is simply too slow for this operation?
  • Is there any reason to prefer DirectDraw or Direct3D over SetDIBitsToDevice?

Here is a brief summary of the calls made by each of the Direct* codepaths, followed by a sketch of the GDI codepath for comparison. If anyone knows a more efficient way to use DirectDraw/Direct3D, please comment.

1. CreateTexture(D3DUSAGE_DYNAMIC, D3DPOOL_DEFAULT);
       LockRect(); memcpy(); UnlockRect(); DrawPrimitive()

2. CreateTexture(0, D3DPOOL_SYSTEMMEM); CreateTexture(0, D3DPOOL_DEFAULT);
       LockRect(); memcpy(); UnlockRect(); UpdateTexture(); DrawPrimitive()

3. CreateSurface(); SetSurfaceDesc(lpSurface = &frameBuffer[0]);
       memcpy(); primarySurface->Blt();

4. CreateSurface();
       Lock(); memcpy(); Unlock(); primarySurface->Blt();
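
For comparison, the GDI codepath is essentially a single call per frame. The following is only a minimal sketch of how that call is typically set up, assuming a 32-bit top-down DIB - frameBuffer, WIDTH, HEIGHT and hdc are placeholder names, not the exact test code:

    // Describe the framebuffer layout to GDI (32 bpp, uncompressed).
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = WIDTH;
    bmi.bmiHeader.biHeight      = -HEIGHT;   // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    // Copy the array straight to the window, once per frame.
    SetDIBitsToDevice(hdc,
                      0, 0, WIDTH, HEIGHT,   // destination rectangle
                      0, 0,                  // source origin (x, y)
                      0, HEIGHT,             // first scan line, line count
                      &frameBuffer[0], &bmi, DIB_RGB_COLORS);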
Floozy answered 8/11, 2010 at 21:52 Comment(5)
I'm not an expert in this field, but I would definitely run this experiment on several different types of graphics hardware before I - ahem - draw any conclusions.Shirker
How are you composing the RGB data? Unless you need hardware acceleration to compose the image, I wouldn't expect DirectX to be faster. It needs to be sent to the video hardware one way or another to be displayed, even via GDI. With the directx approaches you are adding the extra work of creating a surface or texture. That GDI would necessarily be slower is a non sequitur.Procryptic
The one thing that GDI does well - and has been optimized for over the past 20-odd years - is moving bits around. There seems to be some kind of conventional "wisdom" nowadays that any API that is 20 years old must be obsolete in some way.Reverse
Hi folks! Paul, in the end, what did you end up choosing, and why? I'm displaying and updating in real time a large array of RGB data representing the raw memory of a process, and Direct2D does not seem to be appropriate for that kind of operation, since I get a very low frame rate.Shandashandee
FWIW, your DirectX test loops all have one huge flaw, that probably explains the performance problems. Understand that DirectX is optimized for SETTING STUFF UP ONCE, then REUSING over many frames, only changing what needs to be changed. Correct usage is to CreateSurface and CreateTexture ONCE, then RE-USE those entities over the frames. That is, those calls should NOT be inside your timing loop.Excrescence
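
To illustrate the restructuring suggested in the comment above, here is a hedged sketch with creation hoisted out of the timing loop, using the dynamic-texture path (codepath 1) as the example - device, frameBuffer, WIDTH and HEIGHT are placeholder names:

    // Set up ONCE: create the dynamic texture outside the timing loop.
    IDirect3DTexture9 *tex = NULL;
    device->CreateTexture(WIDTH, HEIGHT, 1, D3DUSAGE_DYNAMIC,
                          D3DFMT_X8R8G8B8, D3DPOOL_DEFAULT, &tex, NULL);

    for (;;)  // timing loop: only the per-frame work stays inside
    {
        D3DLOCKED_RECT lr;
        if (SUCCEEDED(tex->LockRect(0, &lr, NULL, D3DLOCK_DISCARD)))
        {
            // Copy row by row, since the surface pitch may differ from WIDTH * 4.
            for (int y = 0; y < HEIGHT; ++y)
                memcpy((BYTE *)lr.pBits + y * lr.Pitch,
                       &frameBuffer[y * WIDTH], WIDTH * 4);
            tex->UnlockRect(0);
        }
        // ... DrawPrimitive() with a textured quad, then Present() ...
    }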

There are a couple of things to keep in mind here. First of all, a lot of "common knowledge" is based on some facts that no longer really apply.

In the days of AGP, when the CPU talked directly to the GPU, it always used the base PCI protocol, which ran at the "1x" rate, always and inevitably. AGP 2x/4x/8x only applied when the GPU was talking to the memory controller directly. In other words, depending on when you looked, it was up to 8 times as fast to have the GPU load a texture from memory as it was for the CPU to send the same data directly to the GPU. Of course, the CPU also had a great deal more bandwidth to memory than the PCI bus supported.

When things switched to PCI-E, however, that changed completely. While there can be differences in bandwidth depending on path, there's no general rule that memory->GPU will be faster than CPU->GPU. The one generalization that's (mostly) safe is that if you have a dedicated graphics card, then the GPU will almost always have more bandwidth to the memory on the graphics card than it does to main memory on the motherboard.

In your case, that doesn't matter much though -- you're talking about moving data from CPU space to GPU space regardless. The main speed difference with using DirectX (or OpenGL) happens when you keep all (or most) of the computation on the GPU, and avoid using the CPU (or main memory) at all. They don't (now that AGP is history) provide any substantial improvement in memory->display bandwidth.

Crane answered 8/11, 2010 at 22:25 Comment(3)
Do I understand you correctly: On an AGP machine, "CPU => RAM => GPU" may be up to 8 times faster than "CPU => GPU"? So there is a reason to prefer Direct* over GDI on these machines - since DirectDraw and Direct3D use the former method and GDI uses the latter?Floozy
@Paul: Thinking about it, the multiplier might be more than 8x -- IIRC, 1x AGP was already faster than PCI. In any case, yes, that's the general idea (though I suppose at least with some drivers, GDI might have done the job via memory as well).Crane
Btw the reason that AGP was so much faster than PCI (even at 1x) was that it operated at 66MHz rather than PCI's 33MHz, giving double the bandwidth. So 1x gave a total of 266MB/sec (66MHz * 32-bit). At 2x, 532MB/sec (66MHz * 2 * 32-bit), etc. There was a degree of protocol overhead, so such numbers weren't completely attainable. PCI-Express 1x v1 ran at the same bandwidth as AGP 1x, but lanes could be combined up to 16x, giving a double performance bonus over AGP. Each PCIe version doubles the base rate, meaning PCIe 1x v3 is 4 times faster than AGP 1x...Orthogenetic

Jerry Coffin makes some good points. The thing to bear in mind is what the DI stands for in SetDIBitsToDevice: Device Independent. Which means you were ALWAYS at the mercy of drivers. Some drivers used to be complete rubbish, and it affected performance massively. DirectDraw suffered from similar issues as well ... but you also had access to the hardware blitters, so it was generally more useful. IHVs also tended to put more time into writing proper drivers for DirectDraw because of its gaming association. Who wants to be at the bottom of the performance pile when the hardware is quite capable of doing better?

These days many graphics cards can accept the bit data directly, so no conversion happens. If the data does need to be swizzled, this is also INCREDIBLY quick in this day and age.

The reason your Direct3D performance is so terrible by comparison is that Direct3D, being designed to be used entirely internally to the GPU, uses odd and complex formats to improve cache performance and so forth.

Couple that with the fact that you aren't testing like for like: in your DDraw and D3D paths you create a texture/surface, lock it, copy, unlock, and then draw the result over the back buffer (via various methods). To get the best performance, you'd be better off locking the backbuffer directly using a DISCARD lock, then memcpy'ing straight into the returned buffer before unlocking, as sketched below. This will bring your performance much closer to SetDIBitsToDevice. I would still expect D3D to be slower than DDraw, however, for the reasons outlined above.
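
For illustration, here is a minimal sketch of that DISCARD lock under Direct3D 9. It assumes a lockable 32-bit back buffer (i.e. a swap chain created with D3DPRESENTFLAG_LOCKABLE_BACKBUFFER); device, frameBuffer, WIDTH and HEIGHT are placeholder names:

    // Fetch the back buffer of the default swap chain.
    IDirect3DSurface9 *backBuffer = NULL;
    device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backBuffer);

    D3DLOCKED_RECT lr;
    if (SUCCEEDED(backBuffer->LockRect(&lr, NULL, D3DLOCK_DISCARD)))
    {
        // Copy row by row, since the surface pitch may differ from WIDTH * 4.
        for (int y = 0; y < HEIGHT; ++y)
            memcpy((BYTE *)lr.pBits + y * lr.Pitch,
                   &frameBuffer[y * WIDTH], WIDTH * 4);
        backBuffer->UnlockRect();
    }
    backBuffer->Release();
    device->Present(NULL, NULL, NULL, NULL);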

Orthogenetic answered 9/11, 2010 at 11:38 Comment(6)
Thanks for your answer. Re "locking the backbuffer" - I know this can be done with D3D and I'll try adding support for this to my test program. Do you know if it is also possible to lock the backbuffer in (windowed) DDraw?Floozy
Also, you say that in DirectDraw it is possible to get "access to the hardware blitters". Does this require any additional work beyond simply calling "IDirectDrawSurface7::Blt"?Floozy
Yes it IS possible. If you go to www.trueharmoniccolours.co.uk (under Coding->DirectX) you can see some DirectDraw code I wrote 13 years ago. In that I directly lock the back buffer and then write directly to it (note: these days I'm well aware that I am actually writing to uncached memory in that example, so it is much slower than it needs to be). And yes, that is exactly how you access the hardware blitters ... if they aren't available then it is software emulated. CPUs are so fast nowadays, though, that you'll likely see little difference between software and hardware blitting.Orthogenetic
I was unable to get the samples from www.trueharmoniccolours.co.uk to run on my machine but, looking at the source, aren't they fullscreen-only?Floozy
@Paul: Well, it was written in DX6, so it would likely need a fair few modifications. You may be right on the full-screen-only part - it's been a HELL of a long time since I wrote it ;) I don't see why it should matter, though, as you are always locking an off-screen (ie back) buffer, aren't you?Orthogenetic
In windowed DirectDraw, unlike in fullscreen, you can't control when buffer flips occur. Thus there's nothing to prevent the off-screen (back) buffer becoming an on-screen buffer at any time. Therefore, I think locking the back buffer is disallowed in windowed DirectDraw. It is possible to lock the primary surface and draw directly to that - but unfortunately this forces Windows Vista/7 to disable desktop composition.Floozy

The reason you will hear people trounce on GDI is that it used to be just old Windows API calls. The newer versions of it (called GDI+ when I last looked at them) are actually just an API placed on top of DirectX calls. So using GDI may seem fairly simple programming-wise at times, but adding a layer between things always slows things down. As mentioned in the response from Jerry Coffin, your examples are about moving the data, and that is where the time goes. I am a bit surprised that DirectX is that much slower, though; I can't be much more help without digging through the DirectX documentation (which has been pretty awesome for quite some time, really). You might want to check out www.codesampler.com - I have always found good starting places there, and, while I may be insane for saying this, I would swear the improvements to the DirectX SDK docs and examples were based on this guy's work!

As for the DirectDraw vs Direct3D (and not the GDI calls) discussion: I would say go with Direct3D. I believe DirectDraw has been deprecated since DirectX 8.0 or so, and 9.0 has been around for quite a long while. And at the end of the day, all of DirectX is 3D; it just varies in the level of helpful 2D APIs that are available. You may find you can do some very interesting things in a 2D environment when you are actually using 3D space. (I had a pretty neat randomly generated lightning weapon for a space invaders clone at one time :))

Anywho, hope this helped!

PS: It should be noted that DirectX is not always the fastest. For keyboard input (unless this has changed in 10 or 11), it has pretty much always been recommended to use the Windows events, as DirectInput was actually just a wrapper for that system! XInput, however, is -awesome-!!

Mastermind answered 9/11, 2010 at 6:59 Comment(0)
