Note: As @Damien_The_Unbeliever pointed out in the comments, the upperHigh
and lowerLow
tables are identical. So they can be combined into a single table. However, that optimization will make the code harder to read, and the explanation harder to write, so combining the tables is left as an exercise for the reader.
The code below shows how to generate the quotient and remainder when dividing a 16-bit unsigned value by 7. The simplest way to explain the code (IMO) is with an example, so let's consider dividing 0xa732
by 7. The expected result is:
quotient = 0x17e2
remainder = 4
We start by considering the input as two 8-bit values, the upper
byte and the lower
byte. The upper
byte is 0xa7
and the lower
byte is 0x32
.
We compute a quotient and remainder from the upper
byte:
0xa700 / 7 = 0x17db
0xa700 % 7 = 3
So we need three tables:
upperHigh
stores the high byte of the quotient: upperHigh[0xa7] = 0x17
upperLow
stores the low byte of the quotient: upperLow[0xa7] = 0xdb
upperRem
stores the remainder: upperRem[0xa7] = 3
And we compute the quotient and remainder from the lower
byte:
0x32 / 7 = 0x07
0x32 % 7 = 1
So we need two tables:
lowerLow
stores the low byte of the quotient: lowerLow[0x32] = 0x07
lowerRem
stores the remainder: lowerRem[0x32] = 1
Now we need to assemble the final answer. The remainder is the sum of two remainders. Since each remainder is in the range [0,6] the sum is in the range [0,12]. So we can use two 13 byte lookups to convert the sum into a final remainder and a carry.
The low byte of the quotient is the sum of that carry and the values from the lowerLow
and upperLow
tables. Note that the sum may generate a carry into the high byte.
The high byte of the quotient is the sum of that carry and the value from the upperHigh
table.
So, to complete the example:
remainder = 1 + 3 = 4 // simple add (no carry in)
lowResult = 0x07 + 0xdb = 0xe2 // add with carry from remainder
highResult = 0x17 // add with carry from lowResult
The assembly code to implement this consists of 7 table lookups, an add-without-carry instruction and two add-with-carry instructions.
#include <stdio.h>
#include <stdint.h>
uint8_t upperHigh[256]; // index:(upper 8 bits of the number) value:(high 8 bits of the quotient)
uint8_t upperLow[256]; // index:(upper 8 bits of the number) value:(low 8 bits of the quotient)
uint8_t upperRem[256]; // index:(upper 8 bits of the number) value:(remainder when dividing the upper bits by 7)
uint8_t lowerLow[256]; // index:(lower 8 bits of the number) value:(low 8 bits of the quotient)
uint8_t lowerRem[256]; // index:(lower 8 bits of the number) value:(remainder when dividing the lower bits by 7)
uint8_t carryRem[13] = { 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 };
uint8_t combinedRem[13] = { 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5 };
void populateLookupTables(void)
{
for (uint16_t i = 0; i < 256; i++)
{
uint16_t upper = i << 8;
upperHigh[i] = (upper / 7) >> 8;
upperLow[i] = (upper / 7) & 0xff;
upperRem[i] = upper % 7;
uint16_t lower = i;
lowerLow[i] = lower / 7;
lowerRem[i] = lower % 7;
}
}
void divideBy7(uint8_t upperValue, uint8_t lowerValue, uint8_t *highResult, uint8_t *lowResult, uint8_t *remainder)
{
uint8_t temp = upperRem[upperValue] + lowerRem[lowerValue];
*remainder = combinedRem[temp];
*lowResult = upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp];
uint8_t carry = (upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp]) >> 8; // Note this is just the carry flag from the 'lowResult' calcaluation
*highResult = upperHigh[upperValue] + carry;
}
int main(void)
{
populateLookupTables();
uint16_t n = 0;
while (1)
{
uint8_t upper = n >> 8;
uint8_t lower = n & 0xff;
uint16_t quotient1 = n / 7;
uint16_t remainder1 = n % 7;
uint8_t high, low, rem;
divideBy7(upper, lower, &high, &low, &rem);
uint16_t quotient2 = (high << 8) | low;
uint16_t remainder2 = rem;
printf("n=%u q1=%u r1=%u q2=%u r2=%u", n, quotient1, remainder1, quotient2, remainder2);
if (quotient1 != quotient2 || remainder1 != remainder2)
printf(" **** failed ****");
printf("\n");
n++;
if (n == 0)
break;
}
}
022222
like this. I think you can also use a LUT for the shift 3 in the above approach because you don't have a barrel shifter. This way you can also do a modulo 7 in parallel because it's just the modulo 7 of the sum of digits in base 8 – Sydelle