Why does stereo 3D rendering require software written especially for it?

Given a naive take on 3D graphics rendering, it seems that stereo 3D rendering should be essentially transparent to the developer and be entirely a feature of the graphics hardware and drivers. Whenever an OpenGL window is displaying a scene, it takes the geometry, lighting, camera, texture and other information to render a 2D image of the scene.

Adding stereo 3D to the scene seems to essentially amount to using two laterally offset cameras where there was originally one, with all other scene variables staying the same. The only additional information needed would be how far apart to make the cameras and how far out to make their central rays converge. Given this, it would seem trivial to take a GL command sequence and interleave the appropriate commands at driver level to drive a stereo 3D rendering.
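For concreteness, something like the following sketch is what I imagined the driver could do behind the application's back (plain C, fixed-function GL, assuming the hardware exposes a quad-buffered GL_STEREO pixel format; draw_scene and eye_sep are placeholders, not code from our application):

    /* Sketch only: render the same scene twice, once per eye, into the
     * left and right back buffers of a quad-buffered (GL_STEREO) context.
     * draw_scene(), eye_sep and the missing convergence handling are
     * placeholders. */
    #include <GL/gl.h>

    void draw_scene(void);   /* assumed to issue the application's usual draw calls */

    void render_stereo_frame(float eye_sep)
    {
        const GLenum buffers[2] = { GL_BACK_LEFT, GL_BACK_RIGHT };
        const float  eye_x[2]   = { -0.5f * eye_sep, +0.5f * eye_sep };
        int i;

        for (i = 0; i < 2; ++i) {
            glDrawBuffer(buffers[i]);
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

            glMatrixMode(GL_MODELVIEW);
            glPushMatrix();
            /* shifting the whole scene is the same as shifting the camera */
            glTranslatef(-eye_x[i], 0.0f, 0.0f);
            draw_scene();
            glPopMatrix();
        }
        /* SwapBuffers / glXSwapBuffers happens in the windowing layer */
    }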

It seems, though, that applications need to be specially written to make use of special 3D hardware architectures, which makes stereo cumbersome and prohibitive to implement. Would we expect this to be the future of stereo 3D implementations, or am I glossing over too many important details?

In my specific case we are using a .NET OpenGL viewport control. I originally hoped that simply having stereo-enabled hardware and drivers would be enough to enable stereo 3D.

Bottoms answered 2/10, 2012 at 18:31 Comment(5)
"using two laterally offset cameras": laterally to what? The old camera? OpenGL has no distinction between camera and object transformations. In newer versions it does not care about those transformations at all, because it is up to the user to handle them. How should this information be extracted? Also, how do you handle off-screen buffers? Shadow maps, for example, do not make sense in stereo.Fregoso
But see developer.download.nvidia.com/whitepapers/2010/… : "Using heuristics, the stereoscopic driver decides which objects need to be rendered per-eye and which do not, building the full left and right eye image in a manner that is transparent to the developer. This is called Passive Stereoization."Fregoso
@Nobody I would say lateral is the x axis of the screen, a perpendicular in space to the camera direction that aligns with horizontal in the 2D projection. Regarding off-screen buffers and shadow maps, these are the sorts of things I don't fully grasp nor understand the complexity of. Thanks for the paper also.Bottoms
Actually I may have misunderstood your initial point, are you suggesting that OpenGL no longer has or never had cameras with locations, targets and rotation matrices? Surely it at least applies some perspective transformation to the geometry consistent with the camera view angle? I was under the impression OpenGL supported 3D vertices, 3D triangles and arbitrarily positioned cameras in the virtual 3D scene.Bottoms
You should accept Bahbar's answer. To add, think of how even base GL 1.0 can be used to do perspective, orthographic or just plain 2d rendering. And mix all of those. How would you know what is what?Contrition

Your assumptions are wrong. OpenGL does not "take geometry, lighting, camera and texture information to render a 2D image". OpenGL takes commands to manipulate its state machine and commands to execute draw calls.

As Nobody mentions in his comment, the core profile does not even care about transformations at all. The only thing it really provides you with now is a way to feed arbitrary data to a vertex shader, and an arbitrary 3D cube to render into. Whether or not that corresponds to an actual view, GL does not care, nor should it.
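To make that concrete, here is roughly all the driver ever gets to see (a sketch with invented names, assuming a GL 3.x context and a loader such as GLEW): one uniform mat4 among potentially many, with nothing in the API marking it as "the camera":

    /* Illustration only (invented names). From the driver's side, the
     * "camera" is just one more glUniformMatrix4fv call; nothing marks it
     * as a view matrix rather than, say, a skinning matrix or a
     * shadow-map projection. */
    #include <GL/glew.h>

    static const char *vs_src =
        "#version 330 core\n"
        "uniform mat4 u_mvp;\n"      /* could be the camera, could be anything */
        "in vec3 a_position;\n"
        "void main() { gl_Position = u_mvp * vec4(a_position, 1.0); }\n";
    /* vs_src is compiled and linked into 'program' elsewhere */

    void draw_with_some_matrix(GLuint program, const GLfloat *matrix, GLsizei count)
    {
        GLint loc = glGetUniformLocation(program, "u_mvp");
        glUseProgram(program);
        glUniformMatrix4fv(loc, 1, GL_FALSE, matrix);  /* which matrix? GL has no idea */
        glDrawArrays(GL_TRIANGLES, 0, count);
    }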

Mind you, some people have noticed that a driver can try to guess what is the view and what is not, and this is what the NVIDIA driver tries to do when doing automatic stereo rendering. This requires some specific guess-work, which amounts to actual analysis of a game's rendering to tweak the algorithms so that the driver guesses right. So it is typically a per-title, in-driver change. And some developers have noticed that the driver can guess wrong, and when that happens, things start to get confusing. See some first-hand accounts of those issues.

I really recommend you read that presentation, because it makes some further points about where the cameras should point (should the two view directions be parallel, and so on).

Also, it turns out that it essentially costs twice as much rendering for everything that is view dependent. Some developers (including, for example, the Crytek guys; see Part 2) figured out that, to a great extent, you can do a single render and fudge the picture with additional data to generate the left- and right-eye pictures. The amount of work saved here is by itself worth it for developers to do this themselves.

Caution answered 2/10, 2012 at 19:3 Comment(4)
I know it does not literally take these things and draw a picture, but is instead an engine that takes a stream of commands. In fact I was shocked when I learned just how low-level OpenGL really is. I understand that there are plenty of graphics methods and effects that have nothing to do with 3D that belong exactly in a graphics library, but it still feels very rudimentary when it comes to geometry.Bottoms
I guess my point comes back to identifying, in the stream of commands, those things related to drawing triangles from a camera location and intercepting them, modifying only the camera. I will read the presentation, but it seems there would be one good answer for what to do with the cameras, and that is to converge them at the virtual screen plane so that 2D elements appear at the 'neutral' offset. Thanks for the detailed answer though.Bottoms
From the presentation they seem to be offering alternatives on convergence/parallel but with no obvious reason why.Bottoms
Referring to your "modifying only the camera" comment above, the camera is essentially a matrix. In Core 4.4, MAX_VERTEX_UNIFORM_COMPONENTS must be a minimum of 1024, meaning a minimum of 256 four-component vectors, meaning a minimum of 64 mat4x4s, so the driver would need to pick a camera matrix out of at least 64 possible matrices. That is assuming it is passed as a uniform matrix, although anything else really would be crazy. As you say, OpenGL is low level because it is only intended to abstract away the microarchitectural details of different GPUs and provide a common interface to the hardware.Epstein

Stereo 3D rendering is unfortunately more complex than just adding a lateral camera offset.

You can create stereo 3D from an original 'mono' rendered frame and its depth buffer. Given the range of (real world) depths in the scene, the depth buffer value for each pixel tells you how far away the corresponding point is. Given a desired eye separation value, you can slide each pixel left or right depending on that distance, as sketched below. But...
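As a rough sketch of that pixel shift (the disparity formula, names and parallel-axis assumption here are illustrative, not from any real renderer), it might look like this, holes and all:

    /* Rough sketch of the "mono frame + depth" shift, assuming zero
     * parallax at 'convergence'. Real implementations also have to fill
     * the holes this leaves behind. */
    #include <math.h>
    #include <string.h>

    void reproject_eye(const unsigned *mono,  /* packed RGBA mono frame            */
                       const float *depth,    /* eye-space depth per pixel         */
                       unsigned *out,         /* output image for one eye          */
                       int w, int h,
                       float eye_offset,      /* +/- half the eye separation       */
                       float convergence,     /* depth of the zero-parallax plane  */
                       float px_per_unit)     /* world units to pixels at the screen */
    {
        int x, y;
        memset(out, 0, (size_t)w * (size_t)h * sizeof *out);  /* holes stay black */
        for (y = 0; y < h; ++y) {
            for (x = 0; x < w; ++x) {
                float z = depth[y * w + x];
                if (z <= 0.0f)
                    continue;                 /* assume positive eye-space depth */
                /* zero disparity at the convergence depth, growing with distance */
                float d = eye_offset * (1.0f - convergence / z) * px_per_unit;
                int xs = x + (int)lroundf(d);
                if (xs >= 0 && xs < w)
                    out[y * w + xs] = mono[y * w + x];
            }
        }
    }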

Do you want parallel axis stereo (offset asymmetrical frustums) or 'toe in' stereo where the two cameras eventually converge? If the latter, you will want to tweak the camera angles scene by scene to avoid 'reversing' bits of geometry beyond the convergence point.
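For the parallel axis case, the usual approach is to shear each eye's frustum so the zero-parallax planes coincide at the convergence distance, rather than rotating the cameras. A fixed-function sketch, with my own variable names and sign conventions, purely for illustration:

    /* Parallel-axis ("off-axis") stereo via asymmetric frustums.
     * The frustums are sheared so both zero-parallax planes coincide at
     * 'convergence', instead of toeing the cameras in. */
    #include <GL/gl.h>
    #include <math.h>

    void setup_eye(float fovy_deg, float aspect, float znear, float zfar,
                   float eye_sep, float convergence, int right_eye)
    {
        float top   = znear * tanf(fovy_deg * 3.14159265f / 360.0f);
        float half  = (right_eye ? 0.5f : -0.5f) * eye_sep;
        float shift = half * znear / convergence;   /* shear, not rotation */

        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        glFrustum(-aspect * top - shift, aspect * top - shift,
                  -top, top, znear, zfar);

        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        glTranslatef(-half, 0.0f, 0.0f);  /* move the eye laterally, axes stay parallel */
        /* ...then apply the normal camera transform and draw the scene... */
    }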

For objects very close to the viewer, the left and right eyes see quite different images of the same object, even down to the left eye seeing one side of the object and the right eye the other side - but the mono view will have averaged these out to just the front. If you want an accurate stereo 3D image, it really does have to be rendered from different eye viewpoints. Does this matter? FPS shooter game, probably not. Human surgery training simulator, you bet it does.

Similar problem if the viewer tilts their head to one side, so one eye is higher than the other. Again, probably not important for a game, really important for the surgeon.

Oh, and do you have anti-aliasing or transparency in the scene? Now you've got a pixel which really represents two pixel values at different depths. Move an anti-aliased pixel sideways and it probably looks worse because the 'underneath' color has changed. Move a mostly-transparent pixel sideways and the rear pixel will be moving too far.

And what do you do with gunsight crosses and similar HUD elements? If they were drawn with depth buffer disabled, the depth buffer values might make them several hundred metres away.

Given all these potential problems, OpenGL sensibly does not try to say how stereo 3D rendering should be done. In my experience modifying an OpenGL program to render in stereo is much less effort than writing it in the first place.

Shameless self promotion: this might help http://cs.anu.edu.au/~Hugh.Fisher/3dteach/stereo3d-devel/index.html

Encumbrancer answered 4/10, 2012 at 5:15 Comment(1)
Thanks for the detailed answer Hugh, though I have always wondered how the mono+depth approach could work. In all cases each eye sees a visual field containing information hidden from the other, so full double-rendering seems the only answer. A lot of other things seem like they could be user-set variables, like eye spacing, ocular dominance, preference for neutral depth etc., and probably have some pretty meaningful defaults. Re head tilt, I suppose that is a problem best left to the next generation of tools such as head/eye tracking.Bottoms
