Generally you figure out how many clock cycles you need to burn, then write a loop. Consult your datasheet to determine how many cycles your loop takes and calculate how many iterations you need.
ldi r16, x ; 1 cycle
loop: nop ; 1 cycle
dec r16 ; 1 cycle
brne loop ; 2 cycles when jumping, 1 otherwise
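Including the ldi, this takes x * 4 cycles in total: every iteration costs 4 cycles except the last, where brne falls through and costs 1 instead of 2, and the ldi makes up that cycle. With a 16 MHz board, 1 ms is 16000 cycles, so 5 ms would be 80000 cycles. That's more than this 8-bit loop can manage (at most 256 iterations, or 1024 cycles, with x = 0), so we need to make a 16-bit counter.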
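To sanity-check the arithmetic for the 8-bit loop, here is a small Python model of it. It is only a sketch: the names cycles_8bit, cycles_for_ms and F_CPU are made up for illustration, and the per-instruction cycle counts are the ones quoted in the comments of the assembly above, which you should confirm against your datasheet.

```python
F_CPU = 16_000_000  # assumed clock frequency in Hz (16 MHz board)


def cycles_8bit(x: int) -> int:
    """Model of: ldi r16, x / loop: nop / dec r16 / brne loop."""
    r16 = x & 0xFF
    cycles = 1                      # ldi r16, x
    while True:
        cycles += 1                 # nop
        r16 = (r16 - 1) & 0xFF      # dec wraps modulo 256
        cycles += 1                 # dec r16
        if r16 != 0:
            cycles += 2             # brne taken
        else:
            cycles += 1             # brne falls through, loop ends
            return cycles


def cycles_for_ms(ms: int) -> int:
    """Number of clock cycles in the given number of milliseconds."""
    return F_CPU // 1000 * ms


if __name__ == "__main__":
    print(cycles_8bit(10))      # 4 cycles per unit of x
    print(cycles_8bit(0))       # x = 0 wraps around and gives the longest delay
    print(cycles_for_ms(5))     # budget for a 5 ms delay
```

The longest delay the model reports for the 8-bit loop is far below the 5 ms budget, which is why a wider counter is needed.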
ldi r16, x ; 1 cycle
ldi r17, y ; 1 cycle
loop: nop ; 1 cycle
dec r16 ; 1 cycle
brne skip ; 2 cycles when jumping, 1 otherwise
dec r17 ; 1 cycle
skip: brne loop ; 2 cycles when jumping, 1 otherwise
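Okay, so our loop body now takes 6 cycles per iteration. Notice that it's 6 cycles whether or not r16 has just reached zero: the path that jumps to skip costs 1 + 1 + 2 + 2, and the path through dec r17 costs 1 + 1 + 1 + 1 + 2. The setup takes 2 cycles but the final brne gives us 1 cycle back, so we have 1 cycle of overhead. That leaves 79999 cycles for the loop, which is 13333 iterations (79998 cycles) and one more cycle to waste. Note that the loop runs x + 256 * (y - 1) iterations, because r17 is first decremented after only x passes and then once every 256 passes. Thus x = low(13333) = 21 and y = high(13333) + 1 = 53, and add a nop after the loop.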
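Counter values like these are easy to get wrong by one, so it can help to model the 16-bit loop and count cycles before flashing anything. The following Python sketch does that; cycles_16bit and counters_for are hypothetical helper names, and the cycle counts are again the ones from the assembly comments above, not taken from any particular datasheet.

```python
def cycles_16bit(x: int, y: int, extra_nops: int = 0) -> int:
    """Model of the two-register loop above, plus optional trailing nops."""
    r16, r17 = x & 0xFF, y & 0xFF
    cycles = 2                          # ldi r16, x and ldi r17, y
    while True:
        cycles += 2                     # nop + dec r16
        r16 = (r16 - 1) & 0xFF
        if r16 != 0:
            cycles += 2                 # brne skip taken
            cycles += 2                 # brne loop taken (Z is still clear)
            continue
        cycles += 2                     # brne skip not taken + dec r17
        r17 = (r17 - 1) & 0xFF
        if r17 != 0:
            cycles += 2                 # brne loop taken
        else:
            cycles += 1                 # brne loop falls through, delay ends
            return cycles + extra_nops


def counters_for(total_cycles: int):
    """Return (x, y, extra_nops) so that the delay burns total_cycles."""
    # 2 setup cycles minus 1 saved on the final brne = 1 cycle of overhead.
    iterations, extra_nops = divmod(total_cycles - 1, 6)
    if not 1 <= iterations <= 65536:
        raise ValueError("delay out of range for a 16-bit counter")
    x = iterations & 0xFF
    # r17 is first decremented after x passes (256 if x is 0), then every 256.
    y = (((iterations - 1) >> 8) + 1) & 0xFF
    return x, y, extra_nops


if __name__ == "__main__":
    x, y, nops = counters_for(80000)
    print(x, y, nops, cycles_16bit(x, y, nops))
```

Under this model the outer counter has to be one higher than the plain high byte of the iteration count whenever the low byte is nonzero, since the first pass through the low byte is a partial one.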
That's the general idea; I hope I have not miscalculated anything. If you intend to make a function of this, factor in the overhead of the call and return. You can also parametrize it, for example by passing the counter values in registers.
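As a rough illustration of factoring in the call and return, the sketch below subtracts them from the cycle budget before dividing by the 6-cycle loop body. The helper name loop_budget is made up, and the figures of 3 cycles for rcall and 4 for ret in the example are assumptions that hold only for some AVR devices, so check the instruction timings for your part.

```python
def loop_budget(total_cycles: int, call_cycles: int, ret_cycles: int):
    """Return (iterations, extra_nops) for the 16-bit loop inside a function.

    call_cycles and ret_cycles are whatever your device spends on the
    call and return instructions; they are inputs here, not known values.
    """
    # 1 cycle of net overhead from the two ldi and the final brne.
    remaining = total_cycles - call_cycles - ret_cycles - 1
    if remaining < 6:
        raise ValueError("delay too short for this loop")
    return divmod(remaining, 6)


if __name__ == "__main__":
    # Assumed timings: rcall = 3 cycles, ret = 4 cycles.
    print(loop_budget(80000, 3, 4))
```

With those assumed timings the overhead happens to absorb the spare cycle, so no trailing nop is needed and the loop runs one iteration fewer.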