glColorMask and glDepthMask determine, which parts of the frame buffer are actually written to.
The idea of early Z culling is, to first render only the depth buffer part first -- the actual savings come from sorting the geometry near to far, so that the GPU can quickly discard occluded fragments. However while drawing the Z buffer you don't want to draw the color component: This allows you to switch of shaders, texturing, i.e. in short everything that's computationally intense.
A word of warning: Early Z only works with opaque geometry. Actually the whole depth buffer algorithm only works for opaque stuff. As soon as you're doing blending, you'll have to sort far to near and don't use depth buffering (search for "order independent transparency" for algorithms to overcome the associated problems).
S if you've got anything that's blended, remove it from the 'early Z' stage.
In the first pass you set
glDepthMask(1); // enable depth buffer writes
glColorMask(0,0,0); // disable color buffer writes
glDepthFunc(GL_LESS); // use normal depth oder testing
glEnable(GL_DEPTH_TEST); // and we want to perform depth tests
After the Z pass is done you change the settings a bit
glDepthMask(0); // don't write to the depth buffer
glColorMask(1,1,1); // now set the color component
glDepthFunc(GL_EQUAL); // only draw if the depth of the incoming fragment
// matches the depth already in the depth buffer
GL_LEQUAL does the job, too, but also lets fragments even closer than that in the depth buffer pass. But since no update of the depth buffer happens, anything between the origin and the stored depth will overwrite it, each time something is drawn there.
A slight change of the theme is using an 'early Z' populated depth buffer as a geometry buffer in multiple deferred shading passes afterwards.
To save further geometry, take a look into Occlusion Queries. With occlusion queries you ask the GPU how many, if any fragments pass all tests. This being a voxel engine you're probably using an octree or Kd tree. Drawing the spatial dividing faces (with glDepthMask(0), glColorMask(0,0,0)) of the tree's branches before traversing the branch tells you, if any geometry in that branch is visible at all. That combined with a near to far sorted traversal and a (coarse) frustum clipping on the tree will give you HUGE performance benefits.
world_display()
already use vertex arrays (VAs) or vertex buffer objects (VBOs) for geometry submission? – Pimp