OpenGL voxel engine slow

Asked 30/12, 2010 at 15:8 Answered 26/4, 2012 at 23:2

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3 GHz machine with an ATI X1600... I'm all out of ideas.

When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.

So far the optimizations I have are: frustum culling, back-face culling (via OpenGL's glEnable(GL_CULL_FACE)), drawing only the visible faces (except the culled ones, of course), and storing the cubes in an octree.

I've tried VBOs; I don't like them, and they did not significantly increase the fps.

How can Minecraft's engine be so fast? I struggle with 10000 cubes, whereas Minecraft can easily draw many more at a higher fps.

Any ideas?

Altar answered 30/12, 2010 at 15:8 Comment(7)

How are you generating your geometry? For instance, if you have a 3x3x3 box of cubes, do you generate/render each cube (including the invisible center cube), or do you analyze the connectivity and just generate triangles for the outer, visible surface? – Letha 30/12, 2010 at 16:36

possible duplicate of Culling techniques for rendering lots of cubes – Ethmoid 30/12, 2010 at 16:36

When you say you tried VBOs, did you have a single VBO containing a single cube that you glTranslate()d all over the place, or did you pack a whole bunch of cubes into one VBO? – Letha 30/12, 2010 at 16:37

I don't believe that Minecraft actually displays more than a couple thousand cubes at once, and most of the game, it could easily be 20-30. However, a relatively modern card should have no problem with far more than 12k cubes. – Polyvinyl 30/12, 2010 at 16:51

@genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d. – Altar 30/12, 2010 at 18:47

Use VBO's. They will help. – Arizona 23/1, 2014 at 3:9

I currently render 80000+ voxels at 50 fps using instanced rendering, VBO's and VAO's. It really does make a difference. Batch size is about 4096 instances at once. – Skull 17/9, 2014 at 11:30

You should profile your code to find out whether the bottleneck in your application is on the CPU or the GPU. For instance, it might be that your culling/octree algorithms are slow, in which case it is not an OpenGL problem at all.

I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.

Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.

gDEBugger is a great tool that will help you find bottlenecks with OpenGL.

Viceroy answered 30/12, 2010 at 15:24 Comment(1)

Thanks, will try profiling. My approach is very straightforward since I basically don't have any 3D experience whatsoever... so algorithms? – Altar 30/12, 2010 at 15:26

@genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d

I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.

Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
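The CPU side of that approach could look roughly like the sketch below. It is only an illustration: `VoxelGrid`, `Face`, `kFaces` and `buildSurfaceMesh` are hypothetical names, the grid is assumed to be a dense array of solid/air flags, and each visible face is emitted as two counter-clockwise triangles so that back-face culling keeps working.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense voxel grid: true = solid, false = air.
struct VoxelGrid {
    int sx, sy, sz;
    std::vector<bool> solid;

    VoxelGrid(int x, int y, int z)
        : sx(x), sy(y), sz(z), solid(static_cast<std::size_t>(x) * y * z, false) {}

    bool inside(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < sx && y < sy && z < sz;
    }
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * sy + y) * sx + x;
    }
    // Cells outside the grid count as air, so boundary faces are emitted.
    bool isSolid(int x, int y, int z) const {
        return inside(x, y, z) && solid[index(x, y, z)];
    }
    void set(int x, int y, int z, bool v) {
        assert(inside(x, y, z));
        solid[index(x, y, z)] = v;
    }
};

// One cube face: direction to the neighbour and its 4 corners (CCW seen from outside).
struct Face { int nx, ny, nz; int corner[4][3]; };

static const Face kFaces[6] = {
    { 1,  0,  0, {{1,0,0},{1,1,0},{1,1,1},{1,0,1}}},
    {-1,  0,  0, {{0,0,0},{0,0,1},{0,1,1},{0,1,0}}},
    { 0,  1,  0, {{0,1,0},{0,1,1},{1,1,1},{1,1,0}}},
    { 0, -1,  0, {{0,0,0},{1,0,0},{1,0,1},{0,0,1}}},
    { 0,  0,  1, {{0,0,1},{1,0,1},{1,1,1},{0,1,1}}},
    { 0,  0, -1, {{0,0,0},{0,1,0},{1,1,0},{1,0,0}}},
};

// Returns xyz floats, 6 vertices (2 triangles) per visible face.
std::vector<float> buildSurfaceMesh(const VoxelGrid& g) {
    static const int kTri[6] = {0, 1, 2, 0, 2, 3};
    std::vector<float> verts;
    for (int z = 0; z < g.sz; ++z)
        for (int y = 0; y < g.sy; ++y)
            for (int x = 0; x < g.sx; ++x) {
                if (!g.isSolid(x, y, z)) continue;
                for (const Face& f : kFaces) {
                    // A face shared with another solid cube can never be seen.
                    if (g.isSolid(x + f.nx, y + f.ny, z + f.nz)) continue;
                    for (int i : kTri) {
                        verts.push_back(static_cast<float>(x + f.corner[i][0]));
                        verts.push_back(static_cast<float>(y + f.corner[i][1]));
                        verts.push_back(static_cast<float>(z + f.corner[i][2]));
                    }
                }
            }
    return verts;
}
```

The returned array would be uploaded once (for example with glBufferData) and drawn with a single glDrawArrays(GL_TRIANGLES, ...) call per frame, and rebuilt only when a cube is added or removed.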

Landscapist answered 30/12, 2010 at 21:37 Comment(0)

I don't know if it's OK to "bump" an old question here, but a few things came to mind:

If your voxels are static, you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore, you can compile a static scene into a potentially visible set (PVS) in the octree. The main principle of a PVS is to precompute, for every node in the tree, which other nodes are potentially visible from it, and to store pointers to them in a vector. When rendering, you first find the node that contains the camera and then run frustum culling against only the nodes in that node's PVS vector. (Carmack used something like that in the Quake engines, but with binary space partitioning trees.)
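A rough sketch of the render-time lookup, under some assumptions: the PVS has already been computed offline, nodes are stored in a flat array and refer to each other by index rather than by pointer, a node's PVS list includes the node itself, and frustum planes point inward. All type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d >= 0 for points on the inside.
struct Plane { Vec3 n; float d; };

struct Box {
    Vec3 lo, hi;
    bool contains(const Vec3& p) const {
        return p.x >= lo.x && p.x <= hi.x &&
               p.y >= lo.y && p.y <= hi.y &&
               p.z >= lo.z && p.z <= hi.z;
    }
};

// Leaf node with its precomputed list of potentially visible nodes.
struct Node {
    Box box;
    std::vector<int> pvs;  // indices into the node array
};

// Conservative box/frustum test: reject only if the box lies fully behind a plane.
bool boxInFrustum(const Box& b, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum) {
        // Corner of the box that lies farthest along the plane normal.
        Vec3 corner{p.n.x >= 0 ? b.hi.x : b.lo.x,
                    p.n.y >= 0 ? b.hi.y : b.lo.y,
                    p.n.z >= 0 ? b.hi.z : b.lo.z};
        if (p.n.x * corner.x + p.n.y * corner.y + p.n.z * corner.z + p.d < 0)
            return false;
    }
    return true;
}

// Linear search for brevity; a real octree would descend from the root.
int findNode(const std::vector<Node>& nodes, const Vec3& camera) {
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].box.contains(camera)) return static_cast<int>(i);
    return -1;
}

// Frustum-cull only the nodes listed in the camera node's PVS.
std::vector<int> visibleNodes(const std::vector<Node>& nodes, const Vec3& camera,
                              const std::vector<Plane>& frustum) {
    std::vector<int> result;
    int here = findNode(nodes, camera);
    if (here < 0) return result;  // camera is outside every node
    for (int id : nodes[here].pvs) {
        assert(id >= 0 && static_cast<std::size_t>(id) < nodes.size());
        if (boxInFrustum(nodes[id].box, frustum)) result.push_back(id);
    }
    return result;
}
```

The point of the design is that the per-frame frustum test runs over a short precomputed list instead of the whole tree.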

If the shading of your voxels is fairly complex, it is also faster to do a depth-only pre-pass first, without writing to the color buffer, just to fill the depth buffer. After that you render a second pass: disable writing to the depth buffer and render only to the color buffer while testing against the depth buffer. That way you avoid expensive shader computations for fragments that would later be overwritten by a fragment closer to the viewer. (Carmack used that in Quake3.)

Another thing that will definitely speed things up is instancing. You store only the position of each voxel and, if necessary, its scale and other parameters in a texture buffer object (TBO). In the vertex shader you can then read the positions of the voxels to be spawned and create an instance of the voxel (i.e. a cube that is given to the shader in a vertex buffer object). So you send the 8 vertices + 8 normals (3 * sizeof(float) * 8 + 3 * sizeof(float) * 8, plus floats for color/texture etc.) to the card only once in the VBO, and then only the positions of the cube instances (3 * sizeof(float) * number of voxels) in the TBO.
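To make the byte counts concrete, here is the same arithmetic as a small sketch. The function names are made up for illustration, color/texture data is left out, and `duplicatedBytes` is only a hypothetical baseline in which every voxel carries its own copy of the cube's vertex data.

```cpp
#include <cstddef>

// Shared cube as counted above: 8 positions + 8 normals, 3 floats each.
constexpr std::size_t kCubeBytes =
    3 * sizeof(float) * 8 + 3 * sizeof(float) * 8;

// Per-instance data in the TBO: one 3-float position per voxel.
constexpr std::size_t instanceBytes(std::size_t voxels) {
    return 3 * sizeof(float) * voxels;
}

// Total with instancing: the cube once, plus one position per voxel.
constexpr std::size_t instancedBytes(std::size_t voxels) {
    return kCubeBytes + instanceBytes(voxels);
}

// Hypothetical baseline: the cube's vertex data repeated for every voxel.
constexpr std::size_t duplicatedBytes(std::size_t voxels) {
    return kCubeBytes * voxels;
}
```

Assuming 4-byte floats and the 12000 cubes from the question, `instancedBytes(12000)` is 144192 bytes, against 2304000 bytes for `duplicatedBytes(12000)`.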

It may be possible to parallelize things between the GPU and CPU by splitting all three steps across two threads. In the CPU thread you check the octree's PVS and update a TBO for instancing in the next frame; the GPU thread meanwhile renders the two passes using the TBO that the CPU thread created in the previous step. After that you switch TBOs. If the camera has not moved, you don't even have to do the CPU calculations again.
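The buffer switching can be sketched without any OpenGL or threading code. In the sketch below, two plain CPU arrays stand in for the two TBOs, and `InstanceBuffers`, `prepareNext` and `endFrame` are hypothetical names. Real code would also need synchronization between the two threads, which is left out here.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Stand-in for the two TBOs: plain arrays of xyz instance positions.
struct InstanceBuffers {
    std::array<std::vector<float>, 2> data;
    int front = 0;         // buffer the renderer reads during this frame
    bool pending = false;  // true when the back buffer holds fresh data

    // Render side: instances to draw this frame.
    const std::vector<float>& current() const { return data[front]; }

    // CPU side: rebuild the back buffer only if the camera moved.
    void prepareNext(bool cameraMoved, const std::vector<float>& visiblePositions) {
        if (!cameraMoved) return;                  // keep last frame's instances
        assert(visiblePositions.size() % 3 == 0);  // expects xyz triples
        data[1 - front] = visiblePositions;
        pending = true;
    }

    // Frame boundary: both sides are finished, so switch buffers if needed.
    void endFrame() {
        if (pending) {
            front = 1 - front;
            pending = false;
        }
    }
};
```

The renderer never reads the buffer that is being written, and a frame in which the camera did not move costs no CPU-side culling work at all.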

Another kind of tree you may be interested in is the so-called k-d tree, which is more general than an octree.

PS: sorry for my English, it's not the clearest...

Keefe answered 26/4, 2012 at 23:2 Comment(1)

I tried octrees successfully, but Minecraft has a point in using "chunks" rather than octrees: you are basically on a plane, and on average there are many more visible planes in the x-y direction than in z, so a 2D mesh requires much less computation because a 2D visibility check is less complicated. This is something really specific to this game though. – Altar 23/5, 2012 at 11:30

There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.

Ammonium answered 5/3, 2011 at 20:8 Comment(0)

Have you used a common display list for all your cubes?
Do you skip calling the drawing code for cubes that are not visible to the user?

Frisbee answered 30/12, 2010 at 15:17 Comment(3)

Yes, every face is in a display list and every cube holds the data on which faces to draw and the cube position (glTranslatef and then glCallList). Also, the drawing part is skipped when the cube is not in view, thanks to the octree. – Altar 30/12, 2010 at 15:21

I think the cube (and not the face) should be in the display list. – Frisbee 30/12, 2010 at 15:33

In this case I will draw all the faces no matter what, even the ones between 2 cubes (not seen), which takes even more resources. – Altar 30/12, 2010 at 15:47
