improving C circular buffer efficiency

Asked 15/3, 2012 at 10:44 Answered 8/3, 2019 at 1:30

I'd like some help improving the efficiency of my circular buffer code.

I had a look around stackoverflow and found that (nearly) all of the topics on circular buffers are about the uses of such a buffer or the basic implementation of a circular buffer. I really need information about how to make it super efficient.

The plan is to use this buffer with the STM32F4 microcontroller which has a single precicion FPU. I plan to make heavy use of especially the write() and readn() functions. We're literally talking a few million calls a second here so shaving of a few clock cycles here and there is really going to make a difference.

I'll put the most important bits of code here, the full buffer code is available via http://dl.dropbox.com/u/39710897/circular%20buffer.rar

Can anyone provide me with a few pointers on how to improve the efficiency of this buffer?

#define BUFF_SIZE 3             // buffer size set at compile time

typedef struct buffer{
    float buff[BUFF_SIZE];
    int readIndex;
    int writeIndex;
}buffer;

/********************************\
* void write(buffer* buffer, float value)
* writes value into the buffer
* @param buffer* buffer
*   pointer to buffer to be used
* @param float value
*   valueto be written in buffer
\********************************/
void write(buffer* buffer,float value){
    buffer->buff[buffer->writeIndex]=value;
    buffer->writeIndex++;
    if(buffer->writeIndex==BUFF_SIZE)
        buffer->writeIndex=0;
}

/********************************\
* float readn(buffer* buffer, int Xn)
* reads specified value from buffer
* @param buffer* buffer
*   pointer to buffer to be read from
* @param int Xn
*   specifies the value to be read from buffer counting backwards from the most recently written value
*   i.e. the most recently writen value can be read with readn(buffer, 0), the value written before that with readn(buffer, 1)
\********************************/
float readn(buffer* buffer, int Xn){
    int tempIndex;

    tempIndex=buffer->writeIndex-(Xn+1);
    while(tempIndex<0){
        tempIndex+=BUFF_SIZE;
    }

    return buffer->buff[tempIndex];
}

Yellowthroat answered 15/3, 2012 at 10:44 Comment(1)

readindex is indeed not used in the functions listed above. It is used though in the read() function that can be found in the attached rar file. The readn() function listed here serves to read a specific value from the buffer (ie the second last written value) – Yellowthroat 15/3, 2012 at 11:0

As "Oli Charlesworth" suggested - you'd be able to simplify things if your buffer size is a power of 2. I'd like to write the read/write function bodies, so that the intent is more clear.

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

Some explanations. Note that there's no branching (if) at all. We don't limit the array index to the array bounds, instead we're AND-ing it with the mask.

Carri answered 15/3, 2012 at 11:5 Comment(9)

As soon as you use bitmasks you should simply switch to unsigned index types. Then you don't have to argue about 2-complement or stuff like that. – Materse 15/3, 2012 at 12:14

I thank you a LOT, this has made my code roughly 6.5 times faster! I'd call that a rather dramatic improvement – Yellowthroat 15/3, 2012 at 12:34

since I can set only 1 answer as accepted, I'll to that with this one because of the examples even though Oli Charlesworth came up with a similar answer. – Yellowthroat 15/3, 2012 at 12:37

It might make sense to try to make ReadArray and WriteArray functions that should work with arrays instead of single values. Or to make write and readn as macroses or inline procedures (in case of C++ compiler). In that case you avoid some stack workload, that may give you cpu ticks. – Guaranty 29/3, 2013 at 7:32

@Carri writeIndex will eventually overflow – Indoxyl 7/2, 2016 at 11:30

As @Indoxyl noted, this answer is incorrect C code, and will invoke undefined behaviour. I tried to edit the answer: stackoverflow.com/review/suggested-edits/22381560 but was rejected. Please do not use this code. – Usia 8/3, 2019 at 1:13

@EvanBenn: why do you think it invokes undefined behavior? There is no problem with writeIndex overflow. – Carri 8/3, 2019 at 6:30

When you say there is no problem, what do you mean? If signed writeIndex overflows that is undefined behaviour. unsigned integers wrap around. – Usia 10/3, 2019 at 23:16

try using the modulus operator. It's job is to only give you the remainder, so if you modulus the index by the total size of the buffer, it will do the wrap for you at fast speed: ++write_index % INDEX_SIZE – Cryometer 21/12, 2023 at 19:1

If you can make your buffer size a power-of-2, then the check against zero can be replaced with unconditional bit-masking. On most processors, this should be faster.

Thresher answered 15/3, 2012 at 10:46 Comment(3)

Agree. Also there'll be no need to check for index over/under-flow at all. Just increment and decrement it without any checking, just AND-mask it for the actual element access – Carri 15/3, 2012 at 10:53

I'll be investigating this, making it a power of 2 is easy enough (just set BUFF_SIZE to 4). I'll have to dig a bit for the bit-masking though – Yellowthroat 15/3, 2012 at 11:1

@Yellowthroat I assume the final application will have a much larger buffer size? Otherwise I would strongly question the use of the ring buffer ADT in the first place. – Vere 15/3, 2012 at 11:59

This may not be seem elegant but is efficient. Accessing structure elements through the pointer is taking up a lot of instructions. Why not remove the structure altogether and make buffer and writeIndex as global variables? This will considerably decrease the size of your readn and write functions.

I tried in gcc and here is the output with and without the structure

With Structure

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %ecx
    movl    8(%ebp), %eax
    movl    16(%eax), %edx
    movl    12(%ebp), %eax
    movl    %eax, (%ecx,%edx,4)
    movl    8(%ebp), %eax
    incl    16(%eax)
    movl    8(%ebp), %eax
    cmpl    $3, 16(%eax)
    jne L1
    movl    8(%ebp), %eax
    movl    $0, 16(%eax)
L1:
    popl    %ebp
    ret

Without Structure. ie Making buffer and writeIndex as global

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    _writeIndex, %edx
    movl    8(%ebp), %eax
    movl    %eax, _buff(,%edx,4)
    incl    _writeIndex
    cmpl    $3, _writeIndex
    jne L1
    movl    $0, _writeIndex
L1:
    popl    %ebp
    ret

Cercaria answered 15/3, 2012 at 11:43 Comment(7)

With optimizations enabled there shouldn't be such a big difference. One may resolve once the address of buff and writeIndex, all the rest should be the same. OTOH without optimizations any access such as buff->... would be long – Carri 15/3, 2012 at 11:51

This won't work if the OP needs more than one buffer instance. – Thresher 15/3, 2012 at 12:1

@Carri Hmmm.. But the compiler seems to be doing something else. I tried with -O2 and the resultant code seems to be much bigger than the above ones :O – Cercaria 15/3, 2012 at 12:2

@OliCharlesworth You are right. He would then have to create more global buffers/indices which would go on becoming clumsier. That's why I added the not so elegant phrase in my answer :) – Cercaria 15/3, 2012 at 12:6

I'll be keeping this one in mind, but I doubt that I'll use it. I do indeed need multiple instances and this just seems unmaintanable to me.. I'll just get confused too often – Yellowthroat 15/3, 2012 at 12:12

The assembler really looks ineffient. I am sure you don't use the correct options for a modern platform. (-O3 -march=native should do for gcc). Also, the lengthy preamble is in that assembler because this is seen as a function. when declared as inline and then inlined into the call side, most of this noise should go away. – Materse 15/3, 2012 at 12:20

@Stacker, I meant "assembler" as a short hand for "assembly listing". – Materse 15/3, 2012 at 19:2

Keeping track of the start and the end of the circular buffer with pointers, is likely a bit faster than array indexing, since the address will computed in runtime in case of the latter. Try to replace readIndex and writeIndex with float* instead. The code would then be

*buffer->writeIndex = value;
buffer->writeIndex++;
if(buffer->writeIndex == buffer + BUFF_SIZE)
  buffer->writeIndex=buffer->buff;

buffer + BUFF_SIZE will still be a constant expression and the compiler will translate that to a fixed address at compile time.

Vere answered 15/3, 2012 at 11:54 Comment(0)

The accepted answer contains code that is incorrect and will invoke undefined behavior. Correction below:

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

The error in the original answer was to assume that 'int' will wrap around. Using binary masks with int is also unwise.

Usia answered 8/3, 2019 at 1:30 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags