I'm working on a project on a STM32F4 CPU, generating signals.
I have a generic timer on CPU clock (no prescaler) on a STM32 triggering interrupts on overflow, to generate a periodic signal with GPIO afterwards.
I need to trigger thr GPIO at a very precise time (basically down to one CPU cycle precision). I've managed to reduce this jitter to +-5 cycles by setting priorities & al, but this jitter exists, depending on what the CPU was doing.
I need to compensate this few cycles jitter. Adding a few cycles more latency isn't a problem as long as I toggle GPIOs at a precise time.
My idea was to read the current value of the counter, and have an active loop of FIXED_NUMBER-CURRENT_VALUE time, ensuring I would exit the loop at precise times.
However, doing a simple loop in C - being a FOR loop, or a while(counter->value < TARGET) doesn't work as it ADDS jitter instead of reducing it.
Am I doing something wrong / naive ? Should I do it in assembly ? how would that be different from C (I checked the disassembly with GCC to check loop was not optimized away nor was I hitting memory ?)
(I ensured with empty, non optimized but not hitting memory loop body)
edit : see this example on AVR (much more stable I know) See by example http://lucidscience.com/pro-vga%20video%20generator-7.aspx (search for "jitter")
edit2 : I tried a simple loop in assembly such as (r0 is my counter, number of cycles to wait, in a register)
loop : SUBS r0,#1 ; tried with 2 also
BGE loop
and, again, jitter is better without it.
To sumit up, I already know how much I should delay. I just need a way to have a branch of code consume reliably N cycles in a case and M in another. Unfortunately, branches alone don't seem to work because a pipeline refill doesn't seem to take a reliable number of cycles, and conditional expressions don't either because they always take the same number of cycles (sometimes doing nothing).
Would running from RAM instead of flash improve consistency ? (NB stm32f4 have a flash prefetch..)