Game Boy: Half-carry flag and 16-bit instructions (especially opcode 0xE8)

Asked 16/9, 2019 at 14:8 Answered 17/9, 2019 at 20:52

Solved assembly embedded emulation cpu-architecture gameboy

Like so many others, I am writing a Game Boy emulator and I have a couple of questions regarding the instruction 0xE8 (ADD SP, n with an 8-bit immediate).

It is claimed here that in 16-bit instructions the half-carry flag is set if a carry occurs from bit 7 to bit 8, whereas here it is said that the half-carry flag indicates carry from bit 11 to bit 12. In this Reddit thread there seems to be a bit of confusion regarding the issue, and the (notoriously flawed, I hear) Game Boy CPU manual doesn't seem to have anything useful to say either.

My questions are the following:

How does the half-carry flag behave in opcode 0xE8?
How is the opcode 0xE8 implemented in the physical hardware?
Which is right, that half-carry occurs from bit 7 to bit 8 or that half-carry occurs from bit 11 to bit 12 (in the case of 16-bit instructions)?

Memorialist answered 16/9, 2019 at 14:8 Comment(16)

While I don't know the answer to this, perhaps Blargg's CPU test ROM could be of use to you. They are pretty thorough. – Passageway 16/9, 2019 at 14:45

@Passageway Thanks for commenting. I am aware of said ROM and will happily use it once my emulator is in the required shape (it is not a coincidence that the question has to do with an operation starting with the letter "A" ;)). – Memorialist 16/9, 2019 at 15:21

I guess it makes sense to have an 8-bit immediate operand in ADD SP,n. Makes the instruction smaller (saves you a byte), faster (saves you a fetch), while sufficient for its typical use case (push/pop a stack frame). Why are you interested in its effect on the half-carry flag? I would expect that flag to be relevant only for DAA. – Propaedeutic 16/9, 2019 at 15:42

@RuudHelderman It does make a lot of sense in the hardware, just not in the light of the documentation we have. And yes, as far as I know the half-carry flag is only there for the DAA instruction. However, I wish to make my emulator as accurate as possible (within reason, obviously), so I would very much like to implement this instruction the way it actually works in hardware. – Memorialist 16/9, 2019 at 15:47

Was 0xE8 a custom instruction that was not part of the documented instruction set? If not, what does the documentation say about that instruction? – Nonalignment 16/9, 2019 at 17:15

@Nonalignment That depends on what you mean. As far as I can tell, it's not in the Z80 so it would seem to be "custom" in that sense. The Game Boy CPU manual only says that the H and C flags are "set or reset according to operation" and I can't seem to find anything better in any of the other formal pieces of documentation. – Memorialist 16/9, 2019 at 17:34

The 7-to-8 claim is not backed by references. The 11-to-12 claims are backed by some very good references, but unfortunately, these all apply to the original Zilog Z-80 implementation. Gameboy's CPU adopted (part of) the same instruction set, but that doesn't guarantee it has the same 4-bit ALU. Sharp may well have taken the liberty to wire the half-carry flag in a totally different way, especially when it concerns instructions where half-carry is deemed irrelevant. Only a real Gameboy can tell. – Propaedeutic 16/9, 2019 at 17:55

@ihonen Yeah, that's what I meant: was it added by [I can't remember the chip vendor]? – Nonalignment 16/9, 2019 at 19:52

maybe you have to determine it experimentally. – Nonalignment 16/9, 2019 at 19:53

@RuudHelderman: DAA might be the only instruction that reads the half-carry flag, but lots of instructions can write it. (Assuming it's anything like x86's AF nibble-carry flag). Anyway yes, add reg16, sign_extended_imm8 is obviously good for code density and not weird at all. Most ISAs with registers wider than 8-bit have immediates narrower than operand-size, and only a few (like x86) have encodings with full-width immediates (like add eax, imm32 as an option vs. add r/m32, imm8). IMO the question would be better without that sidetrack. – Willowwillowy 17/9, 2019 at 2:15

Typically you don't check the flags after moving the stack pointer. For this reason I even doubt that the behavior of C and H flags after the E8 operation was well specified - which would mean that different versions of the CPU might even set the two flags differently. I doubt that any existing software is using one of these two flags after the E8 operation. So it would not be a problem if your emulator does not emulate the flags of an E8 operation. – Signora 17/9, 2019 at 5:43

@MartinRosenau I'm aware of that, it's just that this issue bothers me and I'd like to know the technically correct answer. – Memorialist 17/9, 2019 at 6:40

@PeterCordes As I pointed out, the instruction is weird in terms of the documentation we have, since the semantics regarding the half-carry flag are very non-obvious, not necessarily in terms of having operands of different sizes. – Memorialist 17/9, 2019 at 6:45

Yes, that's why I upvoted your question. You're agreeing with me that the tangent about the 8-bit immediate is unrelated to your question, so I'll remove it for you. – Willowwillowy 17/9, 2019 at 6:58

github.com/mamedev/mame/blob/… – Gaffer 17/9, 2019 at 7:40

Just for future reference, this site might be a better fit for this kind of question: retrocomputing.stackexchange.com. – V2 17/9, 2019 at 9:33

The SM83 CPU core used in the Game Boy almost certainly has an 8-bit ALU, which means 16-bit ALU operations are actually composed of two 8-bit operations. Like a normal Z80 CPU, it also has a dedicated 16-bit increment/decrement/load unit, which can handle certain 16-bit operations quickly but can't update the flags. Basically:

if flags are updated, a 16-bit operation definitely involves the ALU, so it actually uses two 8-bit ALU operations under the hood
if flags are not updated, and the 16-bit operation is just +1 / -1 / load, it's done with the 16-bit incrementer unit

So, whenever you're dealing with flags, try to think in terms of 8-bit operations (low byte first, then the high byte) if you want to reason about the operation.

How does the half-carry flag behave in opcode 0xE8?

As pointed out in the other answer, H is set when there's a carry from bit 3. (And C is set when there's a carry from bit 7).

Here's an interesting thought exercise: if SP=$FFFF and you execute ADD SP, -1, you get SP=$FFFE and both H and C are set. Can you understand why?

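Due to how signed numbers work, the low-byte operation is in this case just a normal unsigned addition: -1 = $FF, so it calculates $FF + $FF = $1FE. The low nibbles ($F + $F = $1E) carry out of bit 3, which sets H, and the full byte carries out of bit 7, which sets C.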
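
To make the rule concrete, here is a minimal Python sketch of the flag behaviour described above. The function name add_sp_e and its return shape are made up for illustration. It models only the observable result (new SP, H, C), not the internal timing of the hardware.

```python
def add_sp_e(sp, e):
    """Model ADD SP, e. `e` is the raw immediate byte (0..255).

    Returns (new_sp, h, c). Hypothetical helper, not taken from any emulator.
    """
    e &= 0xFF
    # H and C come from the unsigned addition of the low byte only.
    h = (sp & 0x0F) + (e & 0x0F) > 0x0F  # carry out of bit 3
    c = (sp & 0xFF) + e > 0xFF           # carry out of bit 7
    # The 16-bit result uses the sign-extended immediate.
    signed = e - 0x100 if e & 0x80 else e
    return (sp + signed) & 0xFFFF, h, c


sp, h, c = add_sp_e(0xFFFF, 0xFF)  # ADD SP, -1 with SP=$FFFF
print(f"SP=${sp:04X} H={int(h)} C={int(c)}")  # SP=$FFFE H=1 C=1
```

The same computation should apply to LD HL, SP+e, with the result going to HL instead of SP.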

How is the opcode 0xE8 implemented in the physical hardware?

We don't yet have a full understanding of it at the lowest possible level, but I know that there are two 8-bit operations. With my Game Boy testbench system I've confirmed that there's first an ALU operation that updates the flags (H, C) but not SP, then some other operation, and finally SP is updated atomically in one go. This suggests that ADD SP, e might actually calculate the result into some temporary register (for example, a real Z80 has an invisible WZ temporary register for some ALU operations) in two separate 8-bit operations, and then load SP from it.

I think ADD HL, BC is a slightly more interesting example. With my testbench I've confirmed that it updates L first and then H, and that the flags are updated twice. This means that it literally executes something like

ADD L, C
ADC H, B

The latter 8-bit operation updates the flags, so we never see the resulting flags of ADD L, C. But the half-carry flag might be temporarily set if there's a carry from L bit 3!
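
As a rough illustration of that two-step view, the following Python sketch composes ADD HL, rr from two 8-bit additions. The helper names alu_add8 and add_hl_rr are hypothetical, and only H and C are modelled. The extra h_low return value exposes the intermediate half-carry that the second step overwrites.

```python
def alu_add8(a, b, carry_in=0):
    """8-bit addition with optional carry-in. Returns (result, h, c)."""
    h = (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F  # carry out of bit 3
    total = a + b + carry_in
    return total & 0xFF, h, total > 0xFF           # c: carry out of bit 7


def add_hl_rr(hl, rr):
    """Model ADD HL, rr as ADD L, lo(rr) followed by ADC H, hi(rr).

    Returns (new_hl, h, c, h_low), where h_low is the half-carry of the
    first (low byte) step, which is never visible in the final flags.
    """
    low, h_low, c_low = alu_add8(hl & 0xFF, rr & 0xFF)
    high, h, c = alu_add8(hl >> 8, rr >> 8, int(c_low))
    return (high << 8) | low, h, c, h_low


# Carry from bit 11 to bit 12: the final H is set.
print(add_hl_rr(0x0FFF, 0x0001))  # (4096, True, False, True)
# Carry from bit 3 to bit 4 only: H is set after the low step, then cleared.
print(add_hl_rr(0x000F, 0x0001))  # (16, False, False, True)
```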

Which is right, that half-carry occurs from bit 7 to bit 8 or that half-carry occurs from bit 11 to bit 12 (in the case of 16-bit instructions)?

It depends on the instruction, but the flags are always updated based on the same bit positions if you think in terms of 8-bit values. It just varies whether we're talking about the high or the low byte of the 16-bit value. Bit 11 is just bit 3 of the high byte.

ADD SP, e: H from bit 3, C from bit 7 (flags from low byte op)
LD HL, SP+e: H from bit 3, C from bit 7 (flags from low byte op)
ADD HL, rr: H from bit 11, C from bit 15 (flags from high byte op)
INC rr: no flag updates (executed by the 16-bit inc/dec unit)
DEC rr: no flag updates (executed by the 16-bit inc/dec unit)

Eruptive answered 17/9, 2019 at 20:52 Comment(3)

"Game Boy almost certainly has an 8-bit ALU" It has a 4-bit ALU – Millenarianism 24/9, 2019 at 1:7

@Millenarianism Please supply any source to confirm this claim; I could only find some related to the Zilog Z80, none to the Sharp LR35902. Binary compatibility does not necessarily imply identical architecture; keep in mind that the trade-off between transistors and microcode gradually changed between 1976 and 1989. – Propaedeutic 1/10, 2019 at 14:48

Oh man, it's been a long time... but I used to study the reverse engineering of the z80 and the dmg CPU. I will look for the actual reference, but for now please consider my comment BS (or personal revelation) – Millenarianism 11/10, 2019 at 16:6

TL;DR: For ADD SP,n, the H-flag is set when carry occurs from bit 3 to bit 4.

I decided to test this on real hardware, so I wrote a simple test ROM in GB-Z80 assembly that tests the following scenarios:

[SP = $000F]
ADD SP,$01

[SP = $00F0]
ADD SP,$10

[SP = $0FF0]
ADD SP,$10

For each case I store the value of register F after the ADD in memory, and I later display bit 5 (the H-flag) of each of those bytes on the screen.

I ran this on 3 different models (Gameboy Pocket, Gameboy Color, and Gameboy Advance SP), and got the following output on all 3 devices: 1 0 0. So a carry from bit 3->4 caused H to be set, while a carry from 7->8 or 11->12 did not.
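
The reasoning behind that conclusion can be checked with a short Python sketch: for each of the three candidate rules (carry out of bit 3, bit 7, or bit 11), it predicts the H flag for the three test cases and compares the prediction with the observed output 1 0 0. The names CASES, OBSERVED and carry_out_of are made up for this example. All immediates here are positive, so sign extension plays no role.

```python
# (SP, immediate) pairs from the three test scenarios above.
CASES = [(0x000F, 0x01), (0x00F0, 0x10), (0x0FF0, 0x10)]
OBSERVED = [1, 0, 0]  # H flag reported on all three devices


def carry_out_of(bit, sp, n):
    """Return 1 if adding n to sp produces a carry out of the given bit."""
    mask = (1 << (bit + 1)) - 1
    return int((sp & mask) + (n & mask) > mask)


for bit in (3, 7, 11):
    predicted = [carry_out_of(bit, sp, n) for sp, n in CASES]
    verdict = "matches" if predicted == OBSERVED else "does not match"
    print(f"carry out of bit {bit}: {predicted} {verdict}")
```

Only the bit 3 rule predicts 1 0 0. The bit 7 rule would give 0 1 1 and the bit 11 rule 0 0 1.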

For ADD HL,rr (where rr is BC/DE/HL/SP) it appears to be a different story. Based on my testing, H is set if carry occurs from bit 11 to bit 12.

Passageway answered 17/9, 2019 at 16:36 Comment(1)

As a former professional GBC/DMG developer, I am impressed and horrified that you were able to answer this :) But yes, bits 3 to 4 are the answer as far as I remember as well, though I cannot remember ever using it. – Bohun 17/9, 2019 at 16:44
