Linking graphs of structures by pointer vs. index in array
Asked Answered
C

2

5

This is question about good practice

Consider situation which is typical e.g. in 3D engines, physics engines, Finite element method or classical molecular dynamics solvers: You have objects of various types ( e.g. vertexes, edges, faces, bounded solid volumes ) which are cross-linked to each other (e.g. vertex know which edge are connected to it and vice versa). For performance and convenience of usage of such engine is crucial to be able quickly browse the network of such connections.

The question is: Is it better to point to the linked object by index in array, or by pointer ? ... especially performance-wise

typedef index_t uint16_t;

class Vertex{
    Vec3 pos;
    #ifdef BY_POINTER
    Edge*   edges[nMaxEdgesPerVertex];
    Face*   faces[nMaxFacesPerVertex];
    #else
    index_t edges[nMaxEdgesPerVertex];
    index_t faces[nMaxFacesPerVertex];
    #endif
}

class Edge{
    Vec3   direction;
    double length;
    #ifdef BY_POINTER
    Vertex* verts[2];
    Faces*  faces[nMaxFacesPerEdge];  
    #else
    index_t verts[2];
    index_t faces[nMaxFacesPerEdge];
    #endif
}

class Face{
    Vec3   normal;
    double isoVal; // Plane equation: normal.dot(test_point)==isoVal
    #ifdef BY_POINTER
    Vertex* verts[nMaxVertsPerFace];
    Edge*   edges[nMaxEdgesPerFace];
    #else
    index_t verts[nMaxVertsPerFace];
    index_t edges[nMaxEdgesPerFace];
    #endif
}

#ifndef BY_POINTER
// we can use other datastructure here, such as std:vector or even some HashMap 
int nVerts,nEdges,nFaces;
Vertex verts[nMaxVerts];
Edge   edges[nMaxEdges];
Vertex faces[nMaxFaces];
#endif 

Advantages of index:

  • using index can be more memory efficient when we use uint8_t or uint16_t for index instead of 32-bit or 64-bit pointer
  • index can carry some additional information ( e.g. about orientation of edge ) encoded in some bits;
  • the ordering of objects in array can carry some information about structure ( e.g. vertexes of cube could be ordered as {0b000,0b001,0b010,0b011,0b100,0b101,0b110,0b111} ). This information is not visible in pointers

Advantages of pointers:

  • We don't need to care about the arrays (or other data-structures) to store the objects. The objects can be simply allocated dynamically on the heap by new Vertex().
  • May be faster (?) because it does not need to add the base address of the array (?). But this is probably negligible with respect to memory latency (?)
Crippling answered 21/4, 2016 at 9:24 Comment(0)
S
5

using index can be more memory efficient when we use uint8_t or uint16_t for index instead of 32-bit or 64-bit pointer

True. Having a small representation reduce the total size of the structure, reducing cache miss when traversing it.

index can carry some additional information ( e.g. about orientation of edge ) encoded in some bits;

True.

We don't need to care about the arrays (or other data-structures) to store the objects. The objects can be simply allocated dynamically on the heap by new Vertex().

This is exactly what you don't want to do, speaking of performances. You want to be sure Vertex are all packed, to avoid unnecessary cache missing. In this case the array would save you from that wrong temptation. You also want to access them sequentially, at least as much as possible, again to minimize cache miss.

How much you data structure are packed, small and accessed sequentially, is what actually drive performances.

May be faster (?) because it does not need to add the base address of the array (?). But this is probably negligible with respect to memory latency (?)

Possibly negligible. Probably depends on specific hardware and|or compiler.

Another missing advantage about index: easier to manage when you reallocate. Consider a structure that can grow, like the following:

struct VertexList
{
  std::vector<Vertex> vertices;

  Vertex *start; // you can still access using vector if you prefer; start = &vertices[0];
}

If you are referencing a given vertex using pointers, and a reallocation occurs, you will end up with an invalid pointer.

Scurrilous answered 21/4, 2016 at 10:54 Comment(1)
Something else that bothers me about the index way is what happens when you have multiple instances of your container - indices from one are not valid in another, something which is impossible with pointers. Say you have a function that takes a graph and a list of indices - you have to be sure you used the right graph. Indices can be stored more naturally, though. It's good to avoid giving a collection of pointers-to-mutable to some other code to look at. However defaulting to const doesn't always work well, unless the container provides some sort of "launder" method.Sedberry
D
5

For performance, what matters is the speed with which you can read the "next" element in whatever traversal order(s) are commonly done in the hot path.

For example, if you have a series of edges which represent some path, you would want them to be stored contiguously in memory (not using new for each one), in the order in which they are connected.

For this case (edges forming a path), it's clear that you do not need pointers, and you also do not need indexes. The connections are implied by the storage locations, so you just need a pointer to the first and perhaps the last edges (i.e. you can store the whole path in a std::vector<Edge>).

A second example illustrating domain knowledge that we can exploit: imagine we have a game supporting up to 8 players and want to store "who has visited each of the edges in a path." Again we do not need pointers nor indexes to refer to the 8 players. Instead, we can simply store a uint8_t inside each Edge and use the bits as flags for each player. Yes, this is low-level bit banging, but it gives us compact storage and efficient lookup once we have an Edge*. But if we need to do the lookup in the other direction, from players to Edges, the most efficient will be to store e.g. a vector of uint32_t inside each player and do indexing into the Edge array.

But what if edges can be added and removed in the middle of a path? Well then we might want a linked list. In this case we should use an intrusive linked list, and allocate the Edges in a pool. Having done this, we can store pointers to the Edges in each player object, and they will never change or need to be updated. We use an intrusive linked list with the understanding that an Edge is only ever part of a single path, so extrinsic storage of the linked-list pointers would be wasteful (std::list needs to store pointers to each object; an intrusive list does not).

So, each case must be considered individually, with as much knowledge of the domain as we can discover in advance. Neither pointers and indexing should be the first approach.

Deliquesce answered 21/4, 2016 at 9:39 Comment(4)
true, but make specialized traversal algorithm for each particular shape is tedious and cannot be applied when actual shape is set by user. My question mostly concerning about making of such general engine / algorithm working for shape unknown in advance.Crippling
Data structures don't exist in vacuums. A specific task might require a graph, an acyclic graph, a binary tree, a path, etc. There is no one-size-fits-all solution when you are concerned about performance (which you said is a key focus for you). If all you want is the most generic possible solution, use pointers, use the STL, and say goodbye to optimal performance.Deliquesce
sure, but real situation is a compromise resp. multi-objective optimization ... best solution should be fast, simple and general. (but that is already quite vague philosophical discussion)Crippling
Pointers are fast, simple, and general. They are not the fastest however.Deliquesce
S
5

using index can be more memory efficient when we use uint8_t or uint16_t for index instead of 32-bit or 64-bit pointer

True. Having a small representation reduce the total size of the structure, reducing cache miss when traversing it.

index can carry some additional information ( e.g. about orientation of edge ) encoded in some bits;

True.

We don't need to care about the arrays (or other data-structures) to store the objects. The objects can be simply allocated dynamically on the heap by new Vertex().

This is exactly what you don't want to do, speaking of performances. You want to be sure Vertex are all packed, to avoid unnecessary cache missing. In this case the array would save you from that wrong temptation. You also want to access them sequentially, at least as much as possible, again to minimize cache miss.

How much you data structure are packed, small and accessed sequentially, is what actually drive performances.

May be faster (?) because it does not need to add the base address of the array (?). But this is probably negligible with respect to memory latency (?)

Possibly negligible. Probably depends on specific hardware and|or compiler.

Another missing advantage about index: easier to manage when you reallocate. Consider a structure that can grow, like the following:

struct VertexList
{
  std::vector<Vertex> vertices;

  Vertex *start; // you can still access using vector if you prefer; start = &vertices[0];
}

If you are referencing a given vertex using pointers, and a reallocation occurs, you will end up with an invalid pointer.

Scurrilous answered 21/4, 2016 at 10:54 Comment(1)
Something else that bothers me about the index way is what happens when you have multiple instances of your container - indices from one are not valid in another, something which is impossible with pointers. Say you have a function that takes a graph and a list of indices - you have to be sure you used the right graph. Indices can be stored more naturally, though. It's good to avoid giving a collection of pointers-to-mutable to some other code to look at. However defaulting to const doesn't always work well, unless the container provides some sort of "launder" method.Sedberry

© 2022 - 2024 — McMap. All rights reserved.