Best practice for function to handle 1-256 bytes

I have some functions that are designed to handle 1-256 bytes, running on an embedded C platform where passing a byte is much faster and more compact than passing an int (one instruction versus three). What is the preferred way of coding it?

  1. Accept an int, early-exit if zero, and otherwise copy the LSB of the count value to an unsigned char and use that in a do {} while(--count); loop (a parameter value of 256 will get converted to 0, but will still run 256 times); a rough sketch of this option follows the list.
  2. Accept an unsigned char, early-exit if zero, and have a special version of the function for 256 bytes (those cases will be known in advance).
  3. Accept an unsigned char, and run 256 times if it's zero.
  4. Have a function like the above, but call it via wrapper functions that behave as (0-255) and (256 only).
  5. Have a function like the above, but call it via wrapper macros that behave as (0-255) and (256 only).
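
For concreteness, here's a rough sketch of option 1; copy_one_byte() is just a placeholder for the per-byte work, not a real function:

  void do_bytes(int count)        /* option 1: accept an int */
  {
      unsigned char n;
      if (count == 0)
          return;                 /* early exit on zero */
      n = (unsigned char)count;   /* 256 becomes 0 here... */
      do
      {
          copy_one_byte();        /* placeholder for the per-byte work */
      } while(--n);               /* ...but the loop still runs 256 times */
  }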

The inner loop of the function will probably represent 15%-30% of processor execution time when the system is busy; it will sometimes be used for small numbers of bytes and sometimes for large ones. The memory chip used by the function has a per-transaction overhead, and I prefer to have my memory-access function do the start-transaction/do-stuff/end-transaction sequence internally.

The most efficient code would be to simply accept an unsigned char and regard a parameter value of 0 as a request to do 256 bytes, relying on the caller to avoid any accidental attempts to read 0 bytes. That seems a bit dangerous, though. Have others dealt with such issues on embedded systems? How were they handled?

EDIT The platform is a PIC18Fxx (128K code space; 3.5K RAM), connecting to an SPI flash chip; reading 256 bytes when fewer are expected would potentially overrun read buffers in the PIC. Writing 256 bytes instead of 0 would corrupt data in the flash chip. The PIC's SPI port is limited to one byte every 12 instruction times if one doesn't check busy status; it will be slower if one does. A typical write transaction requires sending 4 bytes in addition to the data to be written; a read requires an extra byte for "SPI turnaround" (the fastest way to access the SPI port is to read the last byte just before sending the next one).
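
To give a feel for the transaction structure, a read looks roughly like the sketch below; spi_select(), spi_xfer(), and spi_deselect() are stand-ins for the actual low-level routines, and the pipelined "read the last byte just before sending the next" trick is omitted for clarity:

  void flash_read(unsigned long addr, unsigned char *dst, unsigned char n)
  {
      spi_select();                           /* start transaction (assert CS) */
      spi_xfer(0x03);                         /* read command (opcode depends on the chip) */
      spi_xfer((unsigned char)(addr >> 16));  /* three address bytes */
      spi_xfer((unsigned char)(addr >> 8));
      spi_xfer((unsigned char)addr);
      do
      {
          *dst++ = spi_xfer(0xFF);            /* clock in one data byte */
      } while(--n);                           /* n == 0 on entry would mean 256 bytes */
      spi_deselect();                         /* end transaction */
  }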

The compiler is HiTech PICC-18std.

I've generally liked HiTech's PICC-16 compilers, but HiTech seems to have diverted its energies away from the PICC-18std product toward the PICC-18pro line, which has even slower compile times, seems to require the use of 3-byte 'const' pointers rather than 2-byte ones, and has its own ideas about memory allocation. Maybe I should look more closely at PICC-18pro, but when I tried compiling my project on an eval version it didn't work and I never figured out exactly why (perhaps something about variable layout not agreeing with my asm routines), so I've kept using PICC-18std.

Incidentally, I just discovered that PICC-18 particularly likes do {} while(--bytevar); and particularly dislikes do {} while(--intvar); I wonder what's going through the compiler's "mind" when it generates code like the following for the latter:

  do
  {
    local_test++;
    --lpw;
  } while(lpw);

  2533                           ;newflashpic.c: 792: do
  2534                           ;newflashpic.c: 793: {
  2535  0144A8  2AD9                incf    fsr2l,f,c
  2536                           ;newflashpic.c: 795: } while(--lpw);
  2537  0144AA  0E00                movlw   low ?_var_test
  2538  0144AC  6EE9                movwf   fsr0l,c
  2539  0144AE  0E01                movlw   high ?_var_test
  2540  0144B0  6EEA                movwf   fsr0h,c
  2541  0144B2  06EE                decf    postinc0,f,c
  2542  0144B4  0E00                movlw   0
  2543  0144B6  5AED                subwfb  postdec0,f,c
  2544  0144B8  50EE                movf    postinc0,w,c
  2545  0144BA  10ED                iorwf   postdec0,w,c
  2546  0144BC  E1F5                bnz l242

The compiler loads a pointer to the variable, not even using the LFSR instruction (which would take two words) but a MOVLW/MOVWF combination (taking four), and then uses that pointer to do the decrement and compare. While I'll admit that do{}while(--wordvar); cannot, in principle, yield code as nice as do{}while(wordvar--); the code above is still better than what the latter form actually generates. Doing a separate decrement and while-test (e.g. while(--lpw, lpw)) yields sensible code, but it seems a bit ugly. The post-decrement operator could yield the best code for a down-counting loop:

  decf _lpw
  btfss _STATUS,0 ; Skip next inst if carry (i.e. wasn't zero)
   decf _lpw+1
  bc    loop  ; Carry will be clear only if lpw was zero

but it instead generates worse code than --lpw. The best code would be for an up-counting loop:

  infsnz  _lpw
   incfsz _lpw+1
   bra loop

but the compiler doesn't generate that.
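
In principle one can coax an up-counting loop out of C by negating the count and incrementing toward zero (16-bit int assumed), though I haven't verified what PICC-18 does with it:

  unsigned int i = 0u - lpw;   /* i = -lpw, modulo 65536 */
  do
  {
      local_test++;
  } while(++i);                /* runs lpw times (65536 times if lpw == 0) */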

EDIT 2 Another approach I might use: allocate a global 16-bit variable for the number of bytes, and write the functions so that the counter is always zeroed before exit. Then if only an 8-bit value is required, it would only be necessary to load 8 bits. I'd use macros for stuff so they could be tweaked for best efficiency. On the PIC, using |= on a variable which is known to be zero is never slower than using =, and is sometimes faster. For example, intvar |= 15 or intvar |= 0x300 would be two instructions (each case only has to bother with one byte of the result and can ignore the other); intvar |= 4 (or any power of 2) is one instruction. Obviously on some other processors, intvar = 0x300 would be faster than intvar |= 0x300; if I use a macro it could be tweaked as appropriate.
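
A rough sketch of what I have in mind (names are placeholders):

  unsigned int xfer_count;           /* global; every routine zeroes it before returning */

  /* xfer_count is known to be zero here, so |= is never worse than = on the
     PIC and is sometimes better; on another processor the macro could use = */
  #define SET_XFER_COUNT(n)  (xfer_count |= (n))

  SET_XFER_COUNT(0x300);             /* two instructions on the PIC */
  flash_do_stuff();                  /* placeholder; zeroes xfer_count before exit */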

Attrahent answered 19/8, 2010 at 16:00 Comment(7)
Is there any real danger of reading 0 bytes except that you may waste time?Pratincole
If the amount of data to read is stored in a byte, and there needs to be the ability to read 256 bytes, then it's reasonable to interpret 0 as 256. In which case "reading 0 bytes" would actually read 256 bytes that potentially aren't ready to be read, and could seriously mess things up.Thaliathalidomide
Could you please share the info about your platform? It is quite interesting in this day and age to find something byte-oriented. Most embedded CPUs opt for strict alignment, as that allows them to spare a few wires in the critical silicon parts.Bivalve
@Dummy00001, the PIC18 is an 8 bit processor; the RAM is typically only a couple K bytes, if that.Franzoni
An unsigned char can ONLY hold values from 0 to 255 (0xFF). You can never get a value of 256 into an unsigned char!Frequentation
@Zardoz89: True, but a do {...} while(--byteVar); loop (or the ASM equivalent) will loop 256 times if passed a value of zero. On 1980's computers, there were probably as many routines written where a loop count of zero would request 256 loops as ones where it would skip the loop altogether. The idea that a value of 0 represents a "larger" value than the maximum passable value is a little odd, to be sure, but it's not inconsistent with some OS delay routines which accept an unsigned timeout value but use 0 as a special case for "wait indefinitely for an event".Attrahent
@Zardoz89: In the particular scenario at issue, the hardware can usefully program 1-256 bytes with a single command. Programming zero would be useless, and attempting to program more than 256 would cause a buffer overrun.Attrahent

FWIW, I'd choose some variant of option #1. The function's interface remains sensible, intuitive, and seems less likely to be called incorrectly (you might want to think about what you want to do if a value larger than 256 is passed in - a debug-build-only assertion might be appropriate).

I don't think the minor 'hack'/micro-optimization to loop the correct number of times using an 8-bit counter would really be a maintenance problem, and it seems you've done considerable analysis to justify it.

I wouldn't argue against wrappers if someone preferred them, but I'd personally lean toward option 1 ever-so-slightly.

However, I would argue against having the public interface require the caller to pass in a value one less than they wanted to read.
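
For the debug-build-only check, something along these lines would do (assuming your toolchain has assert() or an equivalent; the names are made up):

 #include <assert.h>

 void read_bytes(int count)
 {
     assert(count >= 1 && count <= 256);  /* compiled out when NDEBUG is defined */
     /* ... 8-bit counter loop as described in the question ... */
 }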

Ignitron answered 19/8, 2010 at 18:20 Comment(1)
@Michael Burr: Supplying the actual number of bytes to process, whatever that may be, "feels" right, though I hate wasting code space. Certainly it has a better "feel" than passing n-1. It might not be unreasonable to handle an arbitrary number of bytes, even though I don't expect any buffers larger than 256 bytes. Some routines like blank-check-range could well be used with counts larger than 256 bytes, and if the inner loop only uses an 8-bit variable, the cost of accepting a larger one wouldn't be huge. I'll post some further thoughts above.Attrahent

Your inner function should copy count + 1 bytes, e.g.,

 do { /* copy one byte */ } while(count-- != 0);

If the post-decrement is slow, other alternatives are:

 ... /* copy one byte */
 while (count != 0) { /* copy one byte */; count -= 1; }

or

 for (;;) { /* copy one byte */; if (count == 0) break; count -= 1; }

The caller/wrapper can do:

if (count > 0 && count <= 256) inner((uint8_t)(count - 1));

or

if ((unsigned)(count - 1) < 256u) inner((uint8_t)(count - 1));

if that's faster with your compiler.
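
Putting the pieces together, a sketch of the wrapper plus inner function might look like this (names are placeholders):

 #include <stdint.h>

 static void inner(uint8_t count_minus_1)
 {
     do
     {
         /* copy one byte */
     } while(count_minus_1-- != 0);     /* executes count_minus_1 + 1 times */
 }

 void copy_bytes(unsigned count)        /* public interface: 1..256 */
 {
     if ((unsigned)(count - 1) < 256u)  /* rejects 0 and anything above 256 */
         inner((uint8_t)(count - 1));
 }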

Franzoni answered 19/8, 2010 at 16:28 Comment(1)
You can wrap this in a macro so that the exported interface is sane.Blench

If an int parameter costs 3 instructions and a char parameter costs 1, you could pass an extra char parameter carrying the one extra bit you're missing. It seems pretty silly that your (presumably 16-bit) int takes more than twice as many instructions as an 8-bit char.
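
I.e. something like the following sketch, where the second byte carries the bit that doesn't fit (names made up):

 /* count_lo carries bits 0-7 of the count; is_256 is nonzero only for the
    256-byte case, in which count_lo is passed as 0 */
 void copy_bytes2(unsigned char count_lo, unsigned char is_256)
 {
     if (count_lo == 0 && !is_256)
         return;                   /* nothing to do */
     do
     {
         /* copy one byte */
     } while(--count_lo);          /* count_lo == 0 at entry gives 256 passes */
 }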

Blench answered 19/8, 2010 at 17:49 Comment(2)
The compiler will allow one char parameter to be passed in the W register; all other parameters have RAM locations allocated to them. In many cases, it would be advantageous if the compiler could pass the LSB of a word parameter in W, but it doesn't. Actually, if I were designing a compiler, I would probably specify that every function that doesn't accept a char but does take a word would have two entry points; one entry point would store W to the LSB and then fall through to the other. Other routines could call whichever one was most advantageous.Attrahent
That's nuts. I thought PICs were supposed to be RISCy. Oh well.Blench
