Reading from memory in 8086 real mode while using 'ORG 0x0000'
Asked Answered
S

1

2

I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine. Here's the code:

ORG 0
BITS 16

push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds                  ; ds = cs
pop es                  ; es = 0xB800
jmp start

; input = di (position*2), ax (character and attributes)
putchar:
    stosw
    ret

; input = si (NUL-terminated string)
print:
    cli
    cld
    .nextChar:
        lodsb   ; mov al, [ds:si] ; si += 1
        test al, al
        jz .finish
        call putchar
        jmp .nextChar
    .finish:
        sti
        ret

start:
    mov ah, 0x0E
    mov di, 8

    ; should print P
    mov al, byte [msg]
    call putchar

    ; should print A
    mov al, byte [msg + 1]
    call putchar

    ; should print O
    mov al, byte [msg + 2]
    call putchar

    ; should print !
    mov al, byte [msg + 3]
    call putchar

    ; should print X
    mov al, 'X'
    call putchar

    ; should print Y
    mov al, 'Y'
    call putchar

    cli
    hlt

msg: db 'PAO!', 0

; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0

; Header
db 0x55
db 0xAA

The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.

It prints something like: VirtualBox

Siam answered 22/12, 2015 at 8:21 Comment(7)
You need to use ORG 0x7c00 for if you set DS to 0, which is what you're doing with push cs; pop dsPropinquity
That worked wonders, thank you! Can you explain why?Siam
With ORG 0 the assembler assumes that the resulting binary is loaded at offset 0 of some segment.. However the BIOS actually loads it at offset 0x7c00 of segment 0. This means with ORG 0 the assembler thinks msg is somewhere between offset 0x0000 and 0x0200 but really it's somewhere between offset 0x7c00 and 0x7e00. With ORG 0x7c00 the assembler now knows where its actually loaded so it can use the correct value for msg.Propinquity
ah, alright that cleared a lot up and actually answered another question I had.Siam
if Ross Ridge isn't willing to post his comment as an answer, you (DUUI) should post the answer and accept it. thanks!Dvina
@RossRidge : What happens when you hit an emulator or hardware that jumps to far address 0x07c0:0x0000 (which many environments still do). I'd really recommend setting your ORG and coding the value into DS that you want. So if you want to use [ORG 0x7c00] your bootloader can start with xor ax, ax mov ds, ax . If you want to use [ORG 0x0000] then start with mov ax, 0x07c0 mov ds, ax . The idea would be that you explicitly set DS based on what satisfies the ORG you use.Revitalize
I cover some of the issue in this the question in my General Bootloader Tips Stackoverflow Answer. Recently I wrote a section in another Stack Overflow Answer Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00 that may be useful for writing bootloaders that target a wider variety of real and emulated hardware.Revitalize
R
13

The ORG Directive

When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .

If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.

We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.

Example using ORG 0x0000:

BITS 16
ORG 0x0000

start:
    push cs
    pop  ds      ; DS=CS
    push 0xb800
    pop es       ; ES = 0xB800 (video memory)
    mov ah, 0x0E ; AH = Attribute (yellow on black)

    mov al, byte [msg]
    mov [es:0x00], ax   ; This should print letter 'P'
    mov al, byte [msg+1]
    mov [es:0x02], ax   ; This should print letter 'A'
    mov al, 'O'
    mov [es:0x04], ax   ; This should print letter 'O'
    mov al, '!'
    mov [es:0x06], ax   ; This should print letter '!'

    cli
    hlt

msg: db "PA"

; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55

If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.


VirtualBox / CS:IP / Segment:Offset Pairs

In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .

If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .

Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.


What is Going Wrong with Your Bootloader?

Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:

objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin

00000000 <.data>:
   0:   0e                      push   cs
   1:   1f                      pop    ds
   2:   68 00 b8                push   0xb800
   5:   07                      pop    es
   6:   b4 0e                   mov    ah,0xe
   8:   a0 24 00                mov    al,ds:0x24
   b:   26 a3 00 00             mov    es:0x0,ax
   f:   a0 25 00                mov    al,ds:0x25
  12:   26 a3 02 00             mov    es:0x2,ax
  16:   b0 4f                   mov    al,0x4f
  18:   26 a3 04 00             mov    es:0x4,ax
  1c:   b0 21                   mov    al,0x21
  1e:   26 a3 06 00             mov    es:0x6,ax
  22:   fa                      cli
  23:   f4                      hlt
  24:   50                      push   ax          ; Letter 'P'
  25:   41                      inc    cx          ; Letter 'A'
        ...
 1fe:   55                      push   bp
 1ff:   aa                      stos   BYTE PTR es:[di],al

Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).

If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:

   0:   0e                      push   cs
   1:   1f                      pop    ds

Since CS was zero, then DS is zero. Now let us look at this line:

   8:   a0 24 00                mov    al,ds:0x24

ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!

There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:

    push cs
    pop  ds      ; DS=CS

With:

    push 0x07c0
    pop  ds      ; DS=0x07c0 

This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:

   8:   a0 24 00                mov    al,ds:0x24

Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.

Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.


Why Do Immediate Values Print?

With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?

Let us examine the disassembly of the 3rd character O more carefully:

  16:   b0 4f                   mov    al,0x4f        ; 0x4f = 'O'

Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.


Ross Ridge's Suggestion and Why it Works in VirtualBox

Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?

Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:

objdump -Mintel -mi8086 -D -b binary  --adjust-vma=0x7c00 boot.bin

boot.bin:     file format binary   
Disassembly of section .data:

00007c00 <.data>:
    7c00:       0e                      push   cs
    7c01:       1f                      pop    ds
    7c02:       68 00 b8                push   0xb800
    7c05:       07                      pop    es
    7c06:       b4 0e                   mov    ah,0xe
    7c08:       a0 24 7c                mov    al,ds:0x7c24
    7c0b:       26 a3 00 00             mov    es:0x0,ax
    7c0f:       a0 25 7c                mov    al,ds:0x7c25
    7c12:       26 a3 02 00             mov    es:0x2,ax
    7c16:       b0 4f                   mov    al,0x4f
    7c18:       26 a3 04 00             mov    es:0x4,ax
    7c1c:       b0 21                   mov    al,0x21
    7c1e:       26 a3 06 00             mov    es:0x6,ax
    7c22:       fa                      cli
    7c23:       f4                      hlt
    7c24:       50                      push   ax          ; Letter 'P'
    7c25:       41                      inc    cx          ; Letter 'A'
        ...
    7dfe:       55                      push   bp
    7dff:       aa                      stos   BYTE PTR es:[di],al

VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:

    7c08:       a0 24 7c                mov    al,ds:0x7c24

Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.

Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:

    7c08:       a0 24 7c                mov    al,ds:0x7c24

DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Ut-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!

So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:

ORG 0x7c00

start:
    push cs
    pop  ds      ; DS=CS

to:

ORG 0x7c00

start:
    xor ax, ax   ; ax=0x0000
    mov ds, ax   ; DS=0x0000

Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.

By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.


Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00

In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:

  • When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.

The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.

We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.


Summary

Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.

Your code could be fixed to use either ORG 0x0000 as you originally had it, if we set DS appropriately to 0x07c0 as previously discussed. That could look like:

ORG 0
BITS 16

push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds                  ; DS=0x07c0 since we use ORG 0x0000
pop es

Alternatively we could have used ORG 0x7c00 like this:

ORG 0x7c00
BITS 16

push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds                  ; DS=0x0000 since we use ORG 0x7c00
pop es
Revitalize answered 22/12, 2015 at 22:28 Comment(3)
@DUUI Your question gets asked indirectly in other questions, but yours was the first in the last few months that I had an opportunity to write a generic enough answer that would help you, and hopefully help someone with a similar (but maybe not duplicate) question. Often what will happen is that you move your code to another environment (something other than VirtualBox) and all of a sudden it doesn't work for inexplicable reasons. Felt an answer for that was needed.Revitalize
push with an immediate operand is an 186+ level instruction.Defeatist
@Defeatist that is very true. This is an old answer, but reviewing the question it is clear my code was based on the fact the OP was pushing an immediate in his own code.Revitalize

© 2022 - 2024 — McMap. All rights reserved.