OpenGL performance for 10,000 static cubes

Asked 27/3, 2012 at 4:20 Answered 13/6, 2012 at 12:2

I'm running the following Scala code. It compiles a single display list of 10,000 cubes, then draws that list in the display loop, driven by an animator that runs as fast as it can. The frame rate is only around 20 FPS. I had expected a display list to handle this easily. I need to be able to display tens to hundreds of thousands of objects. Is there a better way to do this? The display method (the last one in the class) does little more than call gluLookAt and glCallList.

I'm using JOGL 2.0-rc5 from jogamp.org which says it supports "OpenGL 1.3 - 3.0, 3.1 - 3.3, ≥ 4.0, ES 1.x and ES 2.x + nearly all vendor extensions"

class LotsOfCubes extends GLEventListener {
  def show() = {
    val glp = GLProfile.getDefault();
    val caps = new GLCapabilities(glp);
    val canvas = new GLCanvas(caps);
    canvas.addGLEventListener(this);

    val frame = new JFrame("AWT Window Test");
    frame.setSize(300, 300);
    frame.add(canvas);
    frame.setVisible(true);
  }

  override def init(drawable: GLAutoDrawable) {
    val gl = drawable.getGL().getGL2()
    gl.glEnable(GL.GL_DEPTH_TEST)

    gl.glNewList(21, GL2.GL_COMPILE)
    var i = -10.0f
    var j = -10.0f
    while (i < 10.0f) {
      while (j < 10.0f) {
        drawItem(gl, i, j, 0.0f, 0.08f)
        j += 0.1f
      }
      i += 0.1f
      j = -10f
    }
    gl.glEndList()

    val an = new Animator(drawable);
    drawable.setAnimator(an);
    an.setUpdateFPSFrames(100, System.out)
    an.start();
  }

  override def dispose(drawable: GLAutoDrawable) {
  }

  override def reshape(drawable: GLAutoDrawable, x: Int, y: Int, width: Int, height: Int) {
    val gl = drawable.getGL().getGL2();
    val glu = new GLU
    gl.glMatrixMode(GLMatrixFunc.GL_PROJECTION);
    gl.glLoadIdentity();
    // zNear must be positive, and the far plane must lie beyond the cubes (100 units from the eye)
    glu.gluPerspective(10, 1, 1, 200);
    gl.glViewport(0, 0, width, height);
    gl.glMatrixMode(GLMatrixFunc.GL_MODELVIEW);
  }

  def drawBox(gl: GL2, size: Float) {
    import Global._
    gl.glBegin(GL2.GL_QUADS);
    for (i <- 5 until -1 by -1) {
      gl.glNormal3fv(boxNormals(i), 0);
      val c = colors(i);
      gl.glColor3f(c(0), c(1), c(2))
      var vt: Array[Float] = boxVertices(boxFaces(i)(0))
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(1));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(2));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(3));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
    }
    gl.glEnd();
  }

  def drawItem(gl: GL2, x: Float, y: Float, z: Float, size: Float) {
    gl.glPushMatrix()
    gl.glTranslatef(x, y, z);
    gl.glRotatef(0.0f, 0.0f, 1.0f, 0.0f); // Rotate The cube around the Y axis
    gl.glRotatef(0.0f, 1.0f, 1.0f, 1.0f);
    drawBox(gl, size);
    gl.glPopMatrix()
  }

  override def display(drawable: GLAutoDrawable) {
    val gl = drawable.getGL().getGL2()
    val glu = new GLU
    gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT)
    gl.glLoadIdentity()
    glu.gluLookAt(0.0, 0.0, -100.0f,
      0.0f, 0.0f, 0.0f,
      0.0f, 1.0f, 0.0f)
    gl.glCallList(21)
  }
}
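
As a side check on the cube count: the loops in init step from -10 to 10 in increments of 0.1 along both axes, which is about 200 steps per axis, so the list appears to hold roughly 200 × 200 = 40,000 cubes (about 240,000 quads) rather than 10,000. A minimal sketch that replays the loops without any OpenGL calls (CubeCount and countCubes are hypothetical names, not part of the original code; the exact result depends on float rounding at the loop boundary):

```scala
object CubeCount {
  // Replays the two while loops from init() and counts how often drawItem would be called.
  def countCubes(): Int = {
    var count = 0
    var i = -10.0f
    var j = -10.0f
    while (i < 10.0f) {
      while (j < 10.0f) {
        count += 1 // stands in for drawItem(gl, i, j, 0.0f, 0.08f)
        j += 0.1f
      }
      i += 0.1f
      j = -10f
    }
    count
  }
}
```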

Cchaddie asked 27/3, 2012 at 4:20 Comment(7)

What hardware are you using? Is double buffering enabled? – Nonstriated 27/3, 2012 at 4:44

I added caps.setDoubleBuffered(true) and it didn't affect performance. As for hardware, I have a mid-range NVIDIA graphics card from a year or two ago. The CPUs are two dual-core Opterons from years ago. – Cchaddie 27/3, 2012 at 4:52

Second, please specify the OpenGL version you use. Does GL2 indicate OpenGL 2? Oh, this is JOGL, and GL2 means this is OpenGL 3. Searching for scala GL2 didn't turn up many hits... – Munn 27/3, 2012 at 4:56

Note: When you use glNewList, you're supposed to pass it a display list name returned from glGenLists. You don't strictly have to, but it's common courtesy to allocate what you want to use. – Meader 27/3, 2012 at 5:21

You might want to replace the for comprehension in drawBox with a while loop. drawBox seems to be called very often, and for comprehensions are not that performant. – Newsman 27/3, 2012 at 6:33

You're right, for comprehensions in Scala are slow. However, drawBox is only called during the creation of the display list. So, it shouldn't affect the FPS at all. – Cchaddie 27/3, 2012 at 6:41

Ah, ok. The name drawBox confused me I guess ;-). – Newsman 27/3, 2012 at 6:49

You may want to think about using a Vertex Buffer, which is a way to store drawing information for faster rendering.

See here for an overview:

http://www.opengl.org/wiki/Vertex_Buffer_Object

Nies answered 27/3, 2012 at 4:43 Comment(3)

Why does that page say it has deprecated material on it? Are VBOs deprecated, or is it something else on that page? It's confusing. – Cchaddie 27/3, 2012 at 5:10

@taotree: It's talking about glVertexPointer, glTexCoordPointer and similar functions. Those have been removed. Buffer objects are still there. I haven't gotten around to cleaning up that page. – Meader 27/3, 2012 at 5:15

I tried this example that uses VBOs: wadeawalker.wordpress.com/2010/10/17/… and it was able to draw 1 million simple shapes at about 28 FPS. – Cchaddie 27/3, 2012 at 6:27

If you store the vertex information in a vertex buffer object and upload it to OpenGL once, you will probably see a large increase in performance, particularly when drawing static objects. This is because the vertex data can stay in graphics card memory instead of being sent from the CPU every frame.
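
A minimal sketch of that idea using JOGL's fixed-function vertex arrays. It assumes the JOGL 2.0-rc package layout used in the question (javax.media.opengl; later releases moved to com.jogamp.opengl). StaticQuadsVbo is a hypothetical helper name, it handles positions only (no colors or normals), and it needs a current GL2 context, so treat it as an outline under those assumptions rather than a drop-in implementation:

```scala
import javax.media.opengl.{GL, GL2}
import javax.media.opengl.fixedfunc.GLPointerFunc
import com.jogamp.common.nio.Buffers

// Hypothetical helper: holds x,y,z triples for quads and draws them from a VBO.
class StaticQuadsVbo(vertices: Array[Float]) {
  private val ids = new Array[Int](1)
  val vertexCount: Int = vertices.length / 3

  // Call once from init(): copies the vertex data into a buffer object.
  def upload(gl: GL2): Unit = {
    val data = Buffers.newDirectFloatBuffer(vertices)
    gl.glGenBuffers(1, ids, 0)
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, ids(0))
    gl.glBufferData(GL.GL_ARRAY_BUFFER, vertices.length * Buffers.SIZEOF_FLOAT,
      data, GL.GL_STATIC_DRAW)
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0)
  }

  // Call from display(): a single draw call for all quads.
  def draw(gl: GL2): Unit = {
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, ids(0))
    gl.glEnableClientState(GLPointerFunc.GL_VERTEX_ARRAY)
    gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0L) // offset 0 into the bound buffer
    // GL2.GL_QUADS as in the question's drawBox; other JOGL versions may
    // declare this constant on a different interface.
    gl.glDrawArrays(GL2.GL_QUADS, 0, vertexCount)
    gl.glDisableClientState(GLPointerFunc.GL_VERTEX_ARRAY)
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0)
  }
}
```

In the question's code, upload would be called once from init and draw would take the place of glCallList(21) in display.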

Lutero answered 27/3, 2012 at 5:8 Comment(1)

I thought display lists store the data on the graphics card. – Cchaddie 27/3, 2012 at 5:10

You create a display list in which you call drawItem for each cube. Inside drawItem you push and pop the current transformation matrix for each cube, and in between you translate and rotate the cube to place it correctly. In principle this could perform well, since the transformations of the cube coordinates could be precomputed and optimized away by the driver.

When I tried the same thing (displaying lots of cubes, as in Minecraft) but without rotation, using only glPushMatrix/glPopMatrix and glTranslatef, I found that my driver did NOT perform these optimizations, i.e. it did not eliminate the unnecessary matrix pushes, pops and multiplications. For about 10-20K cubes I got only around 40 FPS, and for 200K cubes only about 6-7 FPS.

I then did the translations manually: I added the respective offset vectors directly to the vertices of my cubes, so that the display list no longer contained any matrix push/pop or glTranslatef calls. This gave a huge speedup; my code ran about 70 times as fast.
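
The manual translation can be sketched roughly as follows. The corner and face tables are assumed stand-ins for Global.boxVertices and Global.boxFaces, which the question does not show, and BakedCubes and bakedCube are hypothetical names:

```scala
object BakedCubes {
  // Corners of a cube spanning [-1, 1] on each axis (assumed stand-in for Global.boxVertices).
  val corners: Array[Array[Float]] = Array(
    Array(-1f, -1f, -1f), Array(1f, -1f, -1f), Array(1f, 1f, -1f), Array(-1f, 1f, -1f),
    Array(-1f, -1f, 1f), Array(1f, -1f, 1f), Array(1f, 1f, 1f), Array(-1f, 1f, 1f))

  // Corner indices of the six quad faces (assumed stand-in for Global.boxFaces).
  val faces: Array[Array[Int]] = Array(
    Array(0, 3, 2, 1), Array(4, 5, 6, 7), Array(0, 1, 5, 4),
    Array(2, 3, 7, 6), Array(1, 2, 6, 5), Array(0, 4, 7, 3))

  // Returns the x,y,z triples of the 24 quad vertices of one cube,
  // scaled by size and already moved to (x, y, z) on the CPU.
  def bakedCube(x: Float, y: Float, z: Float, size: Float): Array[Float] =
    faces.flatMap { face =>
      face.flatMap { idx =>
        val v = corners(idx)
        Array(v(0) * size + x, v(1) * size + y, v(2) * size + z)
      }
    }
}
```

Each triple can then be passed to glVertex3f inside one glBegin(GL_QUADS)/glEnd pair while the display list is compiled, with no glPushMatrix, glTranslatef or glPopMatrix per cube.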

Tayib answered 13/6, 2012 at 12:2 Comment(0)
