As @harold points out, a ^ b = (a | b) - (a & b)
.
This is also an answer on XOR from only OR and AND - union minus intersection, no borrow.
This instruction-set can only AND or OR with a memory destination (or with an immediate into the accumulator), not between it's primary and secondary accumulators (A
and B
). It looks like instructions that access B
just treat it as the high half of a 16-bit integer in B:A, exchange B with A, or inc/dec it.
So we need some scratch space in the "internal" RAM pointed to by P
, because ANMA
[P] and A -> [P]
is the only bitwise AND where both operands are runtime-variable, not immediate. (Or else we need self-modifying code.) Same for or
with ORMA
(OR Memory Accumulator; [P] |= A
). Actually same for subtraction (-
); there's no add/sub between two register values either, only into internal memory.
## Preconditions: one value (x) in A, other value (y) pointed-to by DP
## post-condition: A = XOR of the inputs
## clobbers: P and 2 bytes at scratch_space. DP unmodified
xor_A_DP:
lp scratch_space
mvmd # [P] = [DP]
orma # [P] |= A # scratch_space[0] = x|y
incp
mvmd # [P] = [DP]
anma # [P] &= A # scratch_space[1] = x&y
ldm # A = [P] # x&y
decp
sbm # [P] -= A # (x|y) - (x&y)
ldm # A = x^y
# rtn if this is a function
The comments reiterate what the instruction does because I've never worked with this ISA before; it's not generally a good comment style. (Usually you want to describe the higher-level thing that's being implemented by each instruction.)
Reading [DP]
twice feels wasteful, but "external" memory is not slow (mvmd
is 1 byte, 3 cycles, same as instructions that RMW internal memory), and I think avoiding it would cost more instructions. There isn't an instruction that does [P] = A
other than film
which takes a repeat count to Fill Memory, and exam
to exchange. So to copy I guess one can do it with exam
;ldm
which is slightly faster than anma
; orma
. Feels weird there's no inverse for ldm
that goes the other direction like there is for std
([DP] = A
)
Patrick's version implementing (x+y) - 2*(x&y)
is 21 instructions (including the rtn
and some pointer setup to point DP at one input and at a DP output, which is something I'm already assuming.) Mine is 11 instructions (including an rtn
) all single-byte, but I think that's mostly due to choosing to use 2 bytes of internal memory for scratch space instead of 1 rather than push/pop and exchanges, and taking input in a register and an already-set pointer. (Counting bytes or cycles would be more useful than instructions, but would take more effort.)
Subtracting twice with 3-cycle SBM
, or using an SL
instruction to Shift Left before an SBM
, would also be an efficiency and size improvement to Patrick's code. This ISA doesn't seem to ADD or SUB any more flexibly than it can AND or OR (except for BCD ops over ranges of memory, or 16-bit add/sub), so I think the best implementation of either expressions would be the same except for an extra sl
for (x+y) - 2*(x&y)
. Unless self-modifying code is even better.
Is it useful or better to start by filling 2 bytes of memory with copies of A
, using film
with I=1?
lii 1 # 2 bytes
lp scratch_space
film # [P++] = A twice - for(d=I ; decrement d until it wraps)
# P points past the end of 2 copies of the first input
lp scratch_space
ldd # A = [DP] so A holds second input
orma # scratch_space[0] = x|y
lp scratch_space+1 # incp is also one byte and just as fast
anma # x&y
ldm # A = [P] # A = x&y
decp
sbm # [P] = (x|y) - (x&y)
ldm
# rtn # 13 total instructions including this
So no, this sucks, and lii 1
is 2 bytes, and film
is slower than most.