Fast multi-window rendering
M

6

15

I've been searching for and testing different rendering libraries for C# for many weeks now. So far I haven't found a single library that works well on multi-windowed rendering setups. The requirement is to be able to run the program on 12+ monitor setups (financial charting) without latency on a fast computer. Each window needs to update multiple times every second. While doing this, the CPU needs to do lots of intensive and time-critical tasks, so some of the burden has to be shifted to the GPUs. That's where hardware rendering steps in, in other words DirectX or OpenGL.

I have tried GDI+ with Windows Forms and found it far too slow for my needs. I have tried OpenGL via OpenTK (on a Windows Forms control), which seemed decently quick (I still have some tests to run on it) but was painfully difficult to get working properly (it is hard to find or write good text rendering libraries). Recently I tried DirectX9, DirectX10 and Direct2D with Windows Forms via SharpDX. I tried both a separate device for each window and a single device with multiple swap chains. All of these resulted in very poor performance with multiple windows. For example, if I set the target FPS to 20 and open 4 full-screen windows on different monitors, the whole operating system starts lagging very badly. The rendering simply clears the screen to black; no primitives are rendered. CPU usage in this test was about 0% and GPU usage about 10%, so I don't understand what the bottleneck is. My development computer is very fast (i7 2700k, AMD HD7900, 16GB RAM), so the tests should definitely run well on it.

In comparison, I did some DirectX9 tests in C++/Win32 API with one device and multiple swap chains, and I could open 100 windows spread all over the 4-monitor workspace (with a 3D teapot rotating in each of them) and still had a perfectly responsive operating system (the fps in the rendering windows of course dropped quite badly, to around 5, which is what I would expect when running 100 simultaneous renderings).

Does anyone know any good ways to do multi-windowed rendering in C#, or am I forced to rewrite my program in C++ to get that performance (major pain)? I guess I'm giving OpenGL another shot before I go the C++ route. I'll report any findings here.

Test methods for reference:

For C# DirectX one-device multiple swapchain test I used the method from this excellent answer: Display Different images per monitor directX 10

Direct3D10 version:

I created the d3d10device and DXGIFactory like this:

D3DDev = new SharpDX.Direct3D10.Device(SharpDX.Direct3D10.DriverType.Hardware,
            SharpDX.Direct3D10.DeviceCreationFlags.None);
DXGIFac = new SharpDX.DXGI.Factory();

Then initialized the rendering windows like this:

var scd = new SwapChainDescription();
scd.BufferCount = 1;
scd.ModeDescription = new ModeDescription(control.Width, control.Height,
      new Rational(60, 1), Format.R8G8B8A8_UNorm);
scd.IsWindowed = true;
scd.OutputHandle = control.Handle;
scd.SampleDescription = new SampleDescription(1, 0);
scd.SwapEffect = SwapEffect.Discard;
scd.Usage = Usage.RenderTargetOutput;

SC = new SwapChain(Parent.DXGIFac, Parent.D3DDev, scd);

var backBuffer = Texture2D.FromSwapChain<Texture2D>(SC, 0);
_rt = new RenderTargetView(Parent.D3DDev, backBuffer);

Drawing command executed on each rendering iteration is simply:

Parent.D3DDev.ClearRenderTargetView(_rt, new Color4(0, 0, 0, 0));
SC.Present(0, SharpDX.DXGI.PresentFlags.None);

DirectX9 version is very similar:

Device initialization:

PresentParameters par = new PresentParameters();
par.PresentationInterval = PresentInterval.Immediate;
par.Windowed = true;
par.SwapEffect = SharpDX.Direct3D9.SwapEffect.Discard;
par.AutoDepthStencilFormat = SharpDX.Direct3D9.Format.D16;
par.EnableAutoDepthStencil = true;
par.BackBufferFormat = SharpDX.Direct3D9.Format.X8R8G8B8;

// firsthandle is the handle of first rendering window
D3DDev = new SharpDX.Direct3D9.Device(new Direct3D(), 0, DeviceType.Hardware, firsthandle,
    CreateFlags.SoftwareVertexProcessing, par);

Rendering window initialization:

if (parent.D3DDev.SwapChainCount == 0)
{
    SC = parent.D3DDev.GetSwapChain(0);
}
else
{
    PresentParameters pp = new PresentParameters();
    pp.Windowed = true;
    pp.SwapEffect = SharpDX.Direct3D9.SwapEffect.Discard;
    pp.BackBufferFormat = SharpDX.Direct3D9.Format.X8R8G8B8;
    pp.EnableAutoDepthStencil = true;
    pp.AutoDepthStencilFormat = SharpDX.Direct3D9.Format.D16;
    pp.PresentationInterval = PresentInterval.Immediate;

    SC = new SharpDX.Direct3D9.SwapChain(parent.D3DDev, pp);
}

Code for drawing loop:

SharpDX.Direct3D9.Surface bb = SC.GetBackBuffer(0);
Parent.D3DDev.SetRenderTarget(0, bb);

Parent.D3DDev.Clear(ClearFlags.Target, Color.Black, 1f, 0);
SC.Present(Present.None, new SharpDX.Rectangle(), new SharpDX.Rectangle(), HWND);
bb.Dispose();

C++ DirectX9/Win32 API test with multiple swapchains and one device code is here:

[C++] DirectX9 Multi-window test - Pastebin.com

It's a modified version from Kevin Harris's nice example code.

Edit:

Just to make it clear, my main problem here is not low fps when doing multi-window rendering; it's the general latency it causes in all operating system functions (window animations, dragging and dropping, scrolling, etc.).

Marj answered 3/11, 2012 at 21:25 Comment(3)
I am not into graphics/have not used this, but maybe try SharpGl?Dwyer
I was just browsing through the samples from SharpGL and did some quick performance tests. Thanks for letting me know, it looks promising! I will let you know when I have made more thorough tests...Marj
I did some performance tests in SharpGL and it wasn't any better than SharpDX... Operating system is starting to get slow after 3-4 windows. This problem seems to be inherent to any DirectX/OpenGL wrapper.Marj
Z
4

Speaking of DirectX only here, but I remember we had the same kind of issue once (5 graphics cards and 9 screens for a single PC).

A lot of the time, the switch to full screen seems to want to enable vertical sync on the monitors, and since Present can't be threaded, the more screens you have with vertical sync, the bigger the drop will be (since each Present call will wait between 0 and 16 milliseconds).
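To put rough numbers on that, here is a back-of-envelope sketch. It assumes every vsynced Present call is issued from one thread and can block for up to one full refresh interval; the class and method names are made up for illustration and are not part of SharpDX or SlimDX.

```csharp
using System;

// Hypothetical helper for estimating the cost of serialized, vsynced presents.
static class PresentTiming
{
    // Upper bound on the time one pass over all windows can take when each
    // Present call may block for up to one refresh interval and the calls
    // are made one after another.
    public static double WorstCaseFrameMs(int windows, double refreshHz)
    {
        double intervalMs = 1000.0 / refreshHz;
        return windows * intervalMs;
    }

    // Frame rate that corresponds to the worst case above.
    public static double WorstCaseFps(int windows, double refreshHz)
    {
        return 1000.0 / WorstCaseFrameMs(windows, refreshHz);
    }
}
```

For nine 60 Hz screens, `PresentTiming.WorstCaseFrameMs(9, 60)` comes out at about 150 ms per pass, which is under 7 fps even though each individual screen could refresh at 60.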

The solution in our case was to create the window maximised and remove its borders. It's not ideal, but it took us from 10 fps when drawing a rectangle back to the standard speed (60).

If you want a code sample, let me know and I'll prepare one.
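A minimal Windows Forms sketch of the borderless-window idea could look like the following. It is not the answerer's code: the class name is hypothetical, and it sets the window bounds to the screen bounds explicitly rather than using WindowState = Maximized, on the assumption that the two are equivalent for this purpose. The swap chain attached to such a form would stay windowed.

```csharp
using System.Windows.Forms;

// Hypothetical helper: builds a window that looks full screen without
// switching the swap chain into exclusive full-screen mode.
static class BorderlessWindows
{
    // Creates one borderless form that exactly covers the given screen.
    public static Form CreateFor(Screen screen)
    {
        return new Form
        {
            FormBorderStyle = FormBorderStyle.None,   // no title bar or frame
            StartPosition = FormStartPosition.Manual, // honor Bounds below
            Bounds = screen.Bounds,                   // cover the whole monitor
            ShowInTaskbar = false
        };
    }
}
```

One form per monitor can then be created with `foreach (var s in Screen.AllScreens) BorderlessWindows.CreateFor(s).Show();`, and each form's Handle used as the swap chain's output window.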

Also, just for testing, I had a go at creating 30 windows in my engine using C#/SlimDX/DX11, each rendering a sphere with basic shading, and it still ran at well over 40 fps.

Zahavi answered 3/11, 2012 at 22:18 Comment(4)
From my very scratchy memory, it's perfectly allowable in Direct3D 9 to be in full screen with no vsync?Gendarmerie
@ta.speot.is: Pretty sure that's true, but it requires some configuration. Many video drivers have v-sync options of Always, Never, and Respect Application Setting.Recha
You can remove vsync in fullscreen, but from our tests exclusive mode still seemed to add some extra overhead, specially on multiple monitors. It can also be very hardware dependent (using ati crossfire/matrox tripleheads...)Zahavi
catflier: I haven't had the chance to make test with multiple hardware configurations yet but I have always disabled vertical sync on multi-windowed programs. I ran into similar problem as you did. I didn't have any problems after that. My problem is not really fps related (fps has always been high enough). Creating many windows just causes tons of general lagging in operating system when using SharpDX libraries. Lag is not existent when programming with DirectX C++/Win32 API.Marj
P
3

We have a similar problem (need to render 3D views on 9+ monitors using 3+ graphics cards). We opted to use raw DirectX11 after finding that 3rd party rendering libraries are all very poor at multiple windows across multiple monitors, let alone with multiple adapters too. (It seems most engines are designed for a fullscreen game, and tend to suck at windowed views). Rather than using a 3rd party layer like SlimDX or SharpDX, we decided in the end to write the core renderer directly in C++ and just expose the simple API that our application needs via C++/CLI - this should maximise performance and minimise maintainability issues (relying on 3rd party vendor for bug fixes etc).

Just like you, though, we found in testing that if we rendered 9 views from a single process (each rendered on its own thread), we got terrible performance (very low frame rates). If we ran 9 separate processes instead (one per view/monitor), the performance was as expected (excellent).

So having spent days trawling the net fruitlessly for a better solution, we opted for simply running our renderers in separate processes. Not entirely a bad solution for us as our renderers need to support distribution over multiple PCs anyway, so it just means we'll use this facility permanently instead of only when required.
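As an illustration only, a parent process could start one renderer per view along these lines. The executable name and the "--view" argument are placeholders, and the actual renderer described above is C++ behind C++/CLI, so this shows just the launching side under those assumptions.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical launcher; the executable name and the "--view" argument are
// placeholders, not something taken from the setup described above.
static class RendererLauncher
{
    // Describes how to start the child process that renders one view.
    public static ProcessStartInfo BuildStartInfo(string rendererExe, int viewIndex)
    {
        return new ProcessStartInfo
        {
            FileName = rendererExe,
            Arguments = "--view " + viewIndex,
            UseShellExecute = false
        };
    }

    // Starts one renderer process per view and returns the handles so the
    // parent can monitor or stop them later.
    public static List<Process> StartAll(string rendererExe, int viewCount)
    {
        var children = new List<Process>();
        for (int i = 0; i < viewCount; i++)
        {
            children.Add(Process.Start(BuildStartInfo(rendererExe, i)));
        }
        return children;
    }
}
```

Each child would create its own device and window for the view index it receives, so no Direct3D objects are shared between processes.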

(I don't know if this is helpful to you as an answer, but we'd also be very keen to know if there are any other solutions out there that work across multiple graphics cards, in case we're missing a better trick)

Procto answered 3/11, 2012 at 22:58 Comment(4)
Direct3D 9 imposed a significant overhead for thread safety. Later versions did not, apparently. msdn.microsoft.com/en-us/library/windows/desktop/… I'm curious if in the single process model you drove each renderer from its own device on its own thread?Gendarmerie
Running across several processes could definitely help at the present stage (each process takes care of its own screen instead of having it serialized). For wrappers I'm actually doing the same as you now (exposing a simple API), not so much for maintenance (since SlimDX/SharpDX are open source you can easily grab the code and check/fix it), but more for update rate (having dx11.1 in SlimDX seems some way off)Zahavi
I have noticed very similar slowing down of operating system when rendering multiple processes as well. In my tests FPS has always been high enough. I didn't render anything more intensive than bar charts yet (with axis labels) so that might explain...Marj
@ta.speot.is: sorry... Yes, we compared multithreaded single process to multi-process. So each rendered view had its own thread or process respectively.Procto
M
3

I have never had the opportunity to run this kind of scenario, but the one thing I'm pretty sure of is that using a managed wrapper is not the concern here; you would have exactly the same problem with C++ code.

Also, your description doesn't make clear how many graphics cards you have installed on your system. You should also follow the "DirectX Graphics Infrastructure (DXGI): Best Practices" more closely, as they describe many of the problems you could run into. Running on different graphics cards in fullscreen, with the swap chains set up correctly for fullscreen, should be OK (using "flip" instead of "blit", see the MSDN doc about this), but if you are running your app in a maximized window, I don't think performance will be good, as the blit will interfere and produce some lag.

You can perfectly well have a single multithreaded application using multiple devices, one device per thread, and they should be able to schedule things correctly... but again, as I have no experience with this kind of scenario, there could be some kind of GPU scheduling problem in this specific case.

If the problem persists even after carefully following the DXGI setup, I would suggest debugging the whole thing with GPUView to examine these problems more closely. It is intended for exactly this kind of scenario, but you will need to take some time to learn how to diagnose with this kind of tool. There was also a talk about GPUView at GDC 2012, "Using GPUView to Understand your DirectX 11 Game" (Jon Story), that is probably worth reading.

Meyers answered 6/11, 2012 at 12:33 Comment(1)
Thanks a lot for in-depth answer! I will look into the material you posted here and let you know if I can figure out what I did wrong.Marj
R
1

Make sure you've disabled security checks for calls to native code (via SuppressUnmanagedCodeSecurityAttribute).

The associated stack walking is a performance killer.
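For reference, a minimal sketch of how the attribute is applied. To my knowledge it can be placed on a class, interface, delegate or method that declares the native calls; the class below is hypothetical, and GetTickCount is used only because it is a simple, real Win32 function.

```csharp
using System.Runtime.InteropServices;
using System.Security;

// Hypothetical interop class. With the attribute on the class, calls to the
// P/Invoke methods declared inside it skip the unmanaged-code permission
// stack walk.
[SuppressUnmanagedCodeSecurity]
static class NativeMethods
{
    // Returns the number of milliseconds since the system was started.
    [DllImport("kernel32.dll")]
    public static extern uint GetTickCount();
}
```

A wrapper library that makes the native calls itself has to carry the attribute in its own source, so application code calling into the wrapper has nothing to add.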

Recha answered 3/11, 2012 at 23:22 Comment(3)
Thanks for the suggestion. I'm not really familiar with this concept. Do I need to add this attribute separately on every method definition in SharpDX sources or can I somehow make the program automatically use this option on every method call?Marj
it applies to the whole assembly. Use it when compiling the assembly making native calls (usually your wrapper, e.g. SharpDX).Recha
Note that SuppressUnmanagedCodeSecurityAttribute is already setup in SharpDX. But even without this flag, he would have the problem that is more likely to be a sync/GPU scheduler issue and absolutely not related to the usage/cost of a managed wrapper.Meyers
O
0

It's always a good idea to use double buffering because that can prevent flickering; at least it does with Windows Forms.

Oldwife answered 23/10, 2020 at 4:17 Comment(0)
R
0

I cannot divulge much about who I am or what I do, but I can say I work on a product with a similar problem. The legacy approach was to use a high quality network switch and raw UDP to broadcast packets to each system responsible for rendering a specific FoV (field of view) and that worked very well with high quality network switches (not so well with SoHo consumer grade cheap switches) that could guarantee packet delivery with microsecond precision to each rendering component/PC from a master component/PC. Each rendering component had its own high quality video card. This worked for many years for components that rendered a specific FoV less than 180 degrees.

That said, much has changed since then and we are now in the middle of an effort to modernize our technology to be able to take advantage of a multi-process approach similar to what is described here for as many field of view channels as we can get on one box. We have already updated our application architecture to support 3 separate displays, each rendering a 60 degree field of view that previously would not have been possible.

Please send me a private message if you are interested in collaborating on that problem. I have worked closely in the past with Robert Osfield, who is the primary author of OpenSceneGraph and more recently developed VulkanSceneGraph, an extremely high-performance rendering engine for all kinds of scenarios involving multiple graphics cards and displays, which might be worth checking out for your situation. I'm still evaluating it for a new application I'm focused on (which I'm not at liberty to discuss here). I hope this info and knowledge helps someone, or maybe helps inform an AI that is helping someone ;)

I really like this thread and hope we can continue to develop it into something that helps people with this similar need to find a common and enabling solution.

Rexfourd answered 17/12, 2023 at 4:23 Comment(1)
"we are now in the middle of an effort to modernize our technology to be able to take advantage of a multi-process approach similar to what is described here for as many field of view channels as we can get on one box" - you forgot to say how... which would actually answer to the question...Arcturus
