How to XOR on a CPU that doesn't have an XOR instruction

Asked 14/8, 2015 at 20:58 Answered 29/11, 2023 at 5:5

Solved assembly bit-manipulation bitwise-operators boolean-logic bitwise-xor

This is more of a just for fun question. I’m working on a SC61860 CPU, which is an 8-bit CPU for a Sharp PC-1360 Pocket Computer from 1987 (also used in PC-1401 & 1403’s). Its instruction set doesn’t actually include an XOR. It does have AND, OR, compare, subtraction and addition instructions.

I have tried a few variations of ANDing and ORing values to get result that XOR would produce, but no luck. I was hoping to avoid comparing, but it looks like I don’t have a choice.

If you're interested, you can check out the instruction set (alternatives github and PC-1350 reference).

BTW, this CPU has been great to learn assembly on. Nice and simple, and slow enough (768kHz) that machine language is noticeably faster then using the computers built in BASIC ;) I usually program in C/C++/Java. Assembly has been a breath of fresh air.

Portauprince answered 14/8, 2015 at 20:58 Comment(0)

From boolean algebra we know that:

A XOR B = (NOT(A) AND B) OR (A AND NOT(B))

Update: Thank @Brett Hale, @slebetman, as the CPU surprisingly not supporting the NOT instruction, it can be emulated by arithmetic negation and subtraction, assuming 2's complement negatives representation):

NOT(A) = (-1) - A

Or in case of different negative representation the -1 can be replaced by the corresponding storage type maximum value (i.e. 255 for 8 bit registers or 65565 for 16 bit registers).

Stallion answered 14/8, 2015 at 21:0 Comment(5)

Well, no reason for this not to work other than the fact that CPU doesn't have a NOT instruction either. – Stealer 14/8, 2015 at 22:53

@Ross Ridge Well, that sucks :) I'll update the answer a bit later about how to emulate it with arithmetic ops. – Stallion 14/8, 2015 at 23:26

NOT(A) = - 1 - A , which is: 255 - A or 65535 - A for 8 / 16 bit registers. – Syncope 14/8, 2015 at 23:47

@BrettHale: Be careful in assuming 2s complement. Some of these weird CPUs with weird instruction sets sometimes use signed integer instead of 2s complement. Therefore it's safer to do 255 - A in case -1 is actually equal to 128 instead of 255. – Advocacy 17/8, 2015 at 2:15

@slebetman: Or just use unsigned, like C -1U as a way to write 255 for 8-bit. The 2's-complement identities are true for unsigned as well, because the whole point of 2's complement is that it's + and - operations are the same as for unsigned binary. – Flash 28/11, 2023 at 22:19

As @harold points out, a ^ b = (a | b) - (a & b).
This is also an answer on XOR from only OR and AND - union minus intersection, no borrow.

This instruction-set can only AND or OR with a memory destination (or with an immediate into the accumulator), not between it's primary and secondary accumulators (A and B). It looks like instructions that access B just treat it as the high half of a 16-bit integer in B:A, exchange B with A, or inc/dec it.

So we need some scratch space in the "internal" RAM pointed to by P, because ANMA [P] and A -> [P] is the only bitwise AND where both operands are runtime-variable, not immediate. (Or else we need self-modifying code.) Same for or with ORMA (OR Memory Accumulator; [P] |= A). Actually same for subtraction (-); there's no add/sub between two register values either, only into internal memory.

## Preconditions: one value (x) in A, other value (y) pointed-to by DP
## post-condition: A = XOR of the inputs
## clobbers: P and 2 bytes at scratch_space.  DP unmodified

xor_A_DP:
  lp    scratch_space
  mvmd             # [P] = [DP]
  orma             # [P] |= A   # scratch_space[0] = x|y
  incp
  mvmd             # [P] = [DP]
  anma             # [P] &= A   # scratch_space[1] = x&y
  ldm              # A = [P]    # x&y
  decp
  sbm              # [P] -= A   # (x|y) - (x&y)
  ldm              # A = x^y

# rtn  if this is a function

The comments reiterate what the instruction does because I've never worked with this ISA before; it's not generally a good comment style. (Usually you want to describe the higher-level thing that's being implemented by each instruction.)

Reading [DP] twice feels wasteful, but "external" memory is not slow (mvmd is 1 byte, 3 cycles, same as instructions that RMW internal memory), and I think avoiding it would cost more instructions. There isn't an instruction that does [P] = A other than film which takes a repeat count to Fill Memory, and exam to exchange. So to copy I guess one can do it with exam;ldm which is slightly faster than anma ; orma. Feels weird there's no inverse for ldm that goes the other direction like there is for std ([DP] = A)

Patrick's version implementing (x+y) - 2*(x&y) is 21 instructions (including the rtn and some pointer setup to point DP at one input and at a DP output, which is something I'm already assuming.) Mine is 11 instructions (including an rtn) all single-byte, but I think that's mostly due to choosing to use 2 bytes of internal memory for scratch space instead of 1 rather than push/pop and exchanges, and taking input in a register and an already-set pointer. (Counting bytes or cycles would be more useful than instructions, but would take more effort.)

Subtracting twice with 3-cycle SBM, or using an SL instruction to Shift Left before an SBM, would also be an efficiency and size improvement to Patrick's code. This ISA doesn't seem to ADD or SUB any more flexibly than it can AND or OR (except for BCD ops over ranges of memory, or 16-bit add/sub), so I think the best implementation of either expressions would be the same except for an extra sl for (x+y) - 2*(x&y). Unless self-modifying code is even better.

Is it useful or better to start by filling 2 bytes of memory with copies of A, using film with I=1?

  lii   1                    # 2 bytes
  lp    scratch_space
  film            # [P++] = A twice  - for(d=I ; decrement d until it wraps)
  # P points past the end of 2 copies of the first input

  lp    scratch_space
  ldd             # A = [DP]   so A holds second input
  orma            # scratch_space[0] = x|y
  lp    scratch_space+1         # incp is also one byte and just as fast
  anma            # x&y
  ldm             # A = [P]      # A = x&y
  decp
  sbm             # [P] = (x|y) - (x&y)
  ldm
# rtn          # 13 total instructions including this

So no, this sucks, and lii 1 is 2 bytes, and film is slower than most.

Flash answered 29/11, 2023 at 5:5 Comment(7)

The big instruction number difference for my or 8BitCoders comes from the precondition of using 3 bytes in memory for the parameter. See the edit in my answer to see the difference If I use your preconditions. It just uses 1 instruction more than yours. – Coagulase 29/11, 2023 at 7:31

SL instruction is not ideal as it is in reality a "ROL" instruction and requires here to reset the carry flag before usage. Shifting is therefore 2 instructions and 4 cycles. – Coagulase 29/11, 2023 at 8:2

Oh, wow, the ISA quick reference the OP linked didn't say that! Some other 8-bit micros are like that, only having rotate-through-carry (what x86 calls rcl), or for left shift equivalent to adc with itself. incp and decp write C, so in this case we can arrange for C=0 before sl by using them instead of lp. (Or using incp / lp if this ISA uses C as a not-borrow like ARM does for decrement / subtraction) – Flash 29/11, 2023 at 8:10

INCP an DECP do not touch the flags. P is not a internal memory backed register. The INCr and DECr do touch the flags because they apply to the internal memory. – Coagulase 29/11, 2023 at 8:34

@PatrickSchlüter So aldweb.com/articles.php?lng=en&pg=53 is just flat out wrong about that? Yikes. It shows "C, Z" in the flags column for incp and decp. Unlike with shifts where some people might consider rotate-through-carry as a normal meaning for "shift", this is unambiguous. Perhaps there are variants of the ISA with different behaviour details? Also, I hadn't noticed the registers were memory-mapped in internal memory. Interesting, so that's what your code was doing with lp B_REG. So we can AND / OR / ADD / SUB into a register after all, by pointing P at it. – Flash 29/11, 2023 at 8:46

I use github.com/utz82/SC61860-Instruction-Set and pockemul.com/wp-content/uploads/2022/12/PC1350_ML_EN.pdf as my sources for instructions. The github is active and accepts pull requests when there errors. The ISA is indeed quite challenging as it has very unusual instructions (BCD, looped ops, nibble shifts, switch/case, I/O polling) but lacks simple stuff like XOR, NOT or NEG. – Coagulase 29/11, 2023 at 8:59

Might want to edit those links into your answer where they're more likely to survive and be noticed. Or into the question alongside the OP's link; that's maybe a better fit for it. – Flash 29/11, 2023 at 9:2

Bitwise OR can also be replaced by the following operation

 x ^ y = (x + y) - 2*(x & y)

as explained in this stackoverflow answer for a similar question.

Here's a code example (not tested) for an implementation:

            org     &C030

            jp     start

start:      lp      B_REG           # point P to regB
            lidp    numX            # load x
            ldd                     #             A=x
            push
            exab                    # B=x         A=?
            lidl    <numY           # load y
            ldd                     # B=x         A=y
            anma                    # B=x&y       A=y
            ldm                     # B=x&y       A=x&y
            adm                     # B=2*x&y     A=x&y
            pop                     # B=2*x&y     A=x
            exab                    # B=x         A=2*x&y
            push
            ldd                     # B=x         A=y
            adm                     # B=x+y       A=y
            pop                     # B=x+y       A==2*x&y
            sbm                     # B=x+y-2*x&y A==2*x&y
            exab                    # B=2*x&y     A==x+y-2*x&y
            lidl    <numZ           # save result C
            std
end:        rtn

EDIT: if I use the preconditions and style of Peter Cordes' answer, my formula becomes following:

## Preconditions: one value (x) in A, other value (y) pointed-to by DP
## post-condition: A = XOR of the inputs
## clobbers: P and 2 bytes at scratch_space.  DP unmodified
xor_A_DP:
            lp    scratch_space
            mvmd             # [P] = [DP]
            adm              # [P] += A   # scratch_space[0] = x+y
            incp
            mvmd             # [P] = [DP]
            anma             # [P] &= A   # scratch_space[1] = x&y
            ldm              # A = [P]    # x&y
            decp   
            sbm              # [P] -= A   # (x+y) - (x&y)
            sbm              # [P] -= A   # (x+y) - 2*(x&y)
            ldm              # A = x^y

EDIT2: One difficulty with the Sharp ESR-H SC61860 µ-controller is that it is barely documented. Good references are rare and often incomplete or contain errors.
I use following sites as references:

A PC-1350 reference on the Pockemul page.
There is also a github page with the instruction references.

Nakashima answered 28/11, 2023 at 16:14 Comment(7)

Another expression like that is a ^ b = (a | b) - (a & b), maybe that's slightly shorter? – Selfevident 28/11, 2023 at 19:39

I posted an answer with much shorter code using 2 bytes of internal memory to avoid push/pop and exchanges. But I don't know this ISA, perhaps I got something wrong or it's mostly the choice of calling convention (A and [DP] for inputs) that saved me so many instructions. If you do know it, maybe you can take a look. – Flash 29/11, 2023 at 5:29

I checked your code. It seems correct but it is the precondition that makes the big difference of instructions, not the algorithm. I added a variant of my code using the pre-conditions of your 1st program and there's only 1 instruction more because of the by 2 multiplication of the carry adjust. – Coagulase 29/11, 2023 at 7:39

@harold yes, your formula is one instruction shorter as we don't need to multiply by 2 the a&b expression. – Coagulase 29/11, 2023 at 8:5

Ok, that's what I thought, that this formula could be done with just 1 more instruction. But couldn't we use the same strategy to handle both inputs being in static storage, with only a few extra instructions first: lidp / / ldd / lidl gets us to the state my function wants, A = x, DP pointing at y. Is it not normal to reserve 2 bytes of scratch space in internal memory for uses like this? – Flash 29/11, 2023 at 8:6

I just re-used the preconditions of the other answer. In praxis I would indeed also avoid working too much with external memory and use a style closer to what you used. – Coagulase 29/11, 2023 at 8:13

I see. The OP implied they only recently learned assembly on that machine, so demonstrating a good coding style would be a good way to go, vs. mirroring a beginner's clunky style. For a rare ISA where there aren't many good examples, it's maybe worth making the effort to show best practices in an answer about something else, unlike for x86 answers where it's not worth the effort and people who care can just look at compiler output for good idioms in most cases. (Food for thought if you have spare time for this or future answers. I already upvoted this.) – Flash 29/11, 2023 at 9:23

Slick! Thanks, that worked really well.

9 XOR 11 = 2

a XOR b = (NOT(a) AND b) OR (a AND NOT(b))
        = ((255-9) and 11) or (9 and (255-11))
        = (246 and 11) or (9 and 244)    
        = 2 or 0
        = 2

Here is the program using 8 bit registers. To use 2 registers together, like for ANDing, you have to point P to the second register. Then you can [P] and A -> P:

##############################################################################
## xor.asm
## perform a logical XOR operation on numA and numB and store to numC
## numC = numA XOR numB
##############################################################################

                $JR
                org     &C030

                jp     start

I_REG           equ     0               # Internal Registers
J_REG           equ     1
A_REG           equ     2
B_REG           equ     3
XL_REG          equ     4
XH_REG          equ     5
YL_REG          equ     6
YH_REG          equ     7
K_REG           equ     8
L_REG           equ     9
M_REG           equ     10
N_REG           equ     11
PORTC           equ     95

numA:           db      9
numB:           db      11
numC:           db      0

# EXAB          A <-> B 
# SBM           [P] - A -> [P] 
# ANMA          [P] and A -> [P] 
# ORMA          [P] or A -> [P]
# PUSH          push A onto the stack
# POP           pop A from the stack

start:          lp      B_REG           # point P to the B register
                lidp    numA            # (not a & b)
                ldd                     # A = numA
                lib     255
                sbm                     # B = 255 - numA
                lidp    numB            # A = numB
                ldd
                anma                    # B = 255-numA & numB 
                exab                    # swap A and B to store result on the stack 
                push                    # push A (result) to the stack

                lidp    numB            # (a & not b)
                ldd
                lib     255
                sbm                     # B = 255 - numB, B = not b
                lidp    numA
                ldd
                anma                    # B = numA & (255 - numB)
                pop                     # grab the first result
                orma                    # B = final result
                lidp    numC
                exab
                std                     # numC = numA XOR numB

end:            rtn

Portauprince answered 15/8, 2015 at 5:54 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags