improving C circular buffer efficiency
Asked Answered
Y

5

10

I'd like some help improving the efficiency of my circular buffer code.

I had a look around stackoverflow and found that (nearly) all of the topics on circular buffers are about the uses of such a buffer or the basic implementation of a circular buffer. I really need information about how to make it super efficient.

The plan is to use this buffer with the STM32F4 microcontroller which has a single precicion FPU. I plan to make heavy use of especially the write() and readn() functions. We're literally talking a few million calls a second here so shaving of a few clock cycles here and there is really going to make a difference.

I'll put the most important bits of code here, the full buffer code is available via http://dl.dropbox.com/u/39710897/circular%20buffer.rar

Can anyone provide me with a few pointers on how to improve the efficiency of this buffer?

#define BUFF_SIZE 3             // buffer size set at compile time

typedef struct buffer{
    float buff[BUFF_SIZE];
    int readIndex;
    int writeIndex;
}buffer;

/********************************\
* void write(buffer* buffer, float value)
* writes value into the buffer
* @param buffer* buffer
*   pointer to buffer to be used
* @param float value
*   valueto be written in buffer
\********************************/
void write(buffer* buffer,float value){
    buffer->buff[buffer->writeIndex]=value;
    buffer->writeIndex++;
    if(buffer->writeIndex==BUFF_SIZE)
        buffer->writeIndex=0;
}

/********************************\
* float readn(buffer* buffer, int Xn)
* reads specified value from buffer
* @param buffer* buffer
*   pointer to buffer to be read from
* @param int Xn
*   specifies the value to be read from buffer counting backwards from the most recently written value
*   i.e. the most recently writen value can be read with readn(buffer, 0), the value written before that with readn(buffer, 1)
\********************************/
float readn(buffer* buffer, int Xn){
    int tempIndex;

    tempIndex=buffer->writeIndex-(Xn+1);
    while(tempIndex<0){
        tempIndex+=BUFF_SIZE;
    }

    return buffer->buff[tempIndex];
}
Yellowthroat answered 15/3, 2012 at 10:44 Comment(1)
readindex is indeed not used in the functions listed above. It is used though in the read() function that can be found in the attached rar file. The readn() function listed here serves to read a specific value from the buffer (ie the second last written value)Yellowthroat
C
16

As "Oli Charlesworth" suggested - you'd be able to simplify things if your buffer size is a power of 2. I'd like to write the read/write function bodies, so that the intent is more clear.

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

Some explanations. Note that there's no branching (if) at all. We don't limit the array index to the array bounds, instead we're AND-ing it with the mask.

Carri answered 15/3, 2012 at 11:5 Comment(9)
As soon as you use bitmasks you should simply switch to unsigned index types. Then you don't have to argue about 2-complement or stuff like that.Materse
I thank you a LOT, this has made my code roughly 6.5 times faster! I'd call that a rather dramatic improvementYellowthroat
since I can set only 1 answer as accepted, I'll to that with this one because of the examples even though Oli Charlesworth came up with a similar answer.Yellowthroat
It might make sense to try to make ReadArray and WriteArray functions that should work with arrays instead of single values. Or to make write and readn as macroses or inline procedures (in case of C++ compiler). In that case you avoid some stack workload, that may give you cpu ticks.Guaranty
@Carri writeIndex will eventually overflowIndoxyl
As @Indoxyl noted, this answer is incorrect C code, and will invoke undefined behaviour. I tried to edit the answer: stackoverflow.com/review/suggested-edits/22381560 but was rejected. Please do not use this code.Usia
@EvanBenn: why do you think it invokes undefined behavior? There is no problem with writeIndex overflow.Carri
When you say there is no problem, what do you mean? If signed writeIndex overflows that is undefined behaviour. unsigned integers wrap around.Usia
try using the modulus operator. It's job is to only give you the remainder, so if you modulus the index by the total size of the buffer, it will do the wrap for you at fast speed: ++write_index % INDEX_SIZECryometer
T
12

If you can make your buffer size a power-of-2, then the check against zero can be replaced with unconditional bit-masking. On most processors, this should be faster.

Thresher answered 15/3, 2012 at 10:46 Comment(3)
Agree. Also there'll be no need to check for index over/under-flow at all. Just increment and decrement it without any checking, just AND-mask it for the actual element accessCarri
I'll be investigating this, making it a power of 2 is easy enough (just set BUFF_SIZE to 4). I'll have to dig a bit for the bit-masking thoughYellowthroat
@Yellowthroat I assume the final application will have a much larger buffer size? Otherwise I would strongly question the use of the ring buffer ADT in the first place.Vere
C
2

This may not be seem elegant but is efficient. Accessing structure elements through the pointer is taking up a lot of instructions. Why not remove the structure altogether and make buffer and writeIndex as global variables? This will considerably decrease the size of your readn and write functions.

I tried in gcc and here is the output with and without the structure

With Structure

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %ecx
    movl    8(%ebp), %eax
    movl    16(%eax), %edx
    movl    12(%ebp), %eax
    movl    %eax, (%ecx,%edx,4)
    movl    8(%ebp), %eax
    incl    16(%eax)
    movl    8(%ebp), %eax
    cmpl    $3, 16(%eax)
    jne L1
    movl    8(%ebp), %eax
    movl    $0, 16(%eax)
L1:
    popl    %ebp
    ret

Without Structure. ie Making buffer and writeIndex as global

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    _writeIndex, %edx
    movl    8(%ebp), %eax
    movl    %eax, _buff(,%edx,4)
    incl    _writeIndex
    cmpl    $3, _writeIndex
    jne L1
    movl    $0, _writeIndex
L1:
    popl    %ebp
    ret
Cercaria answered 15/3, 2012 at 11:43 Comment(7)
With optimizations enabled there shouldn't be such a big difference. One may resolve once the address of buff and writeIndex, all the rest should be the same. OTOH without optimizations any access such as buff->... would be longCarri
This won't work if the OP needs more than one buffer instance.Thresher
@Carri Hmmm.. But the compiler seems to be doing something else. I tried with -O2 and the resultant code seems to be much bigger than the above ones :OCercaria
@OliCharlesworth You are right. He would then have to create more global buffers/indices which would go on becoming clumsier. That's why I added the not so elegant phrase in my answer :)Cercaria
I'll be keeping this one in mind, but I doubt that I'll use it. I do indeed need multiple instances and this just seems unmaintanable to me.. I'll just get confused too oftenYellowthroat
The assembler really looks ineffient. I am sure you don't use the correct options for a modern platform. (-O3 -march=native should do for gcc). Also, the lengthy preamble is in that assembler because this is seen as a function. when declared as inline and then inlined into the call side, most of this noise should go away.Materse
@Stacker, I meant "assembler" as a short hand for "assembly listing".Materse
V
2

Keeping track of the start and the end of the circular buffer with pointers, is likely a bit faster than array indexing, since the address will computed in runtime in case of the latter. Try to replace readIndex and writeIndex with float* instead. The code would then be

*buffer->writeIndex = value;
buffer->writeIndex++;
if(buffer->writeIndex == buffer + BUFF_SIZE)
  buffer->writeIndex=buffer->buff;

buffer + BUFF_SIZE will still be a constant expression and the compiler will translate that to a fixed address at compile time.

Vere answered 15/3, 2012 at 11:54 Comment(0)
U
1

The accepted answer contains code that is incorrect and will invoke undefined behavior. Correction below:

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

The error in the original answer was to assume that 'int' will wrap around. Using binary masks with int is also unwise.

Usia answered 8/3, 2019 at 1:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.