Custom bootloader booted via USB drive produces incorrect output on some computers

I am fairly new to assembly, but I'm trying to dive into the world of low-level computing. I'm trying to learn how to write assembly code that runs as bootloader code, that is, independently of any OS such as Linux or Windows. After reading this page and a few other lists of x86 instruction sets, I came up with some assembly code that is supposed to print 10 A's on the screen and then 1 B.

    BITS 16
start: 
    mov ax, 07C0h       ; Set up 4K stack space after this bootloader
    add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096

    mov ax, 07C0h       ; Set data segment to where we're loaded
    mov ds, ax

    mov cl, 10          ; Use this register as our loop counter
    mov ah, 0Eh         ; This register holds our BIOS instruction

.repeat:
    mov al, 41h         ; Put ASCII 'A' into this register
    int 10h             ; Execute our BIOS print instruction
    cmp cl, 0           ; Find out if we've reached the end of our loop
    dec cl              ; Decrement our loop counter
    jnz .repeat         ; Jump back to the beginning of our loop
    jmp .done           ; Finish the program when our loop is done

.done:
    mov al, 42h         ; Put ASCII 'B' into this register
    int 10h             ; Execute BIOS print instruction
    ret


times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
dw 0xAA55

So the output should look like this:

AAAAAAAAAAB

I assembled the code using the nasm assembler running on the Windows 10 Ubuntu Bash program. After it produced the .bin file, I opened it using a hex editor. I used the same hex editor to copy the contents of that .bin file into the first 512 bytes of a flash drive. Once I had written my program to the flash drive, I disconnected it and plugged it into a computer with an Intel Core i3-7100. On bootup, I selected my USB flash drive as the boot device, only to get the following output:

A

After changing various things in the program, I finally got frustrated and tried the program on another computer. The other computer was a laptop with an i5-2520m. I followed the same process as I mentioned before. Sure enough, it gave me the expected output:

AAAAAAAAAAB

I immediately tried it on my original computer with the i3, but it still didn't work.

So my question is: Why does my program work with one x86 processor but not the other? They both support the x86 instruction set. What gives?


Solution:
Ok, I've been able to track down the real solution with some help. If you read Michael Petch's answer below, you'll find a solution that will fix my problem, and another problem of a BIOS looking for a BPB.

Here was the problem with my code: I was writing the program to the first bytes of my flash drive. Those bytes were loaded into memory, but the BIOS then used some of them for its own purposes, so part of my program was being overwritten by the BIOS. To prevent this, you can add a BPB description as shown below. If your BIOS works the same way mine does, it will simply overwrite the BPB in memory, but not your program. Alternatively, you can add the following code to the top of your program:

jmp start
resb 0x50

start: 
;enter code here

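This code (courtesy of Ross Ridge) reserves 0x50 bytes at the start of the boot sector and jumps over them. Your program then starts at about offset 0x52 from 0x7c00 (the jump itself takes 2 bytes), so the BIOS overwrites only the reserved area rather than your code during execution.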
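As a rough sketch of how the padding approach could be applied to the whole program (this is not Ross Ridge's exact code): it uses times 0x50 db 0 in place of resb 0x50, on the assumption that NASM's flat binary output warns about resb inside the code section and zero-fills it anyway. It also saves cx around the BIOS call and ends with a halt loop instead of ret. The label names and the 0x50 size are illustrative.

```nasm
bits 16

    jmp short start         ; 2-byte jump over the padding
    times 0x50 db 0         ; Scratch area the BIOS may overwrite (assumed size)

start:
    mov ax, 07C0h           ; Set data segment to where we're loaded
    mov ds, ax

    mov cx, 10              ; Loop counter
.repeat:
    mov ax, 0E41h           ; AH=0Eh (teletype output), AL='A'
    xor bx, bx              ; BH = video page 0
    push cx                 ; Don't trust the BIOS to preserve CX
    int 10h
    pop cx
    dec cx                  ; DEC sets ZF when the counter reaches 0
    jnz .repeat

    mov ax, 0E42h           ; AH=0Eh, AL='B'
    xor bx, bx
    int 10h

    cli                     ; Nothing to return to, so stop here
.hang:
    hlt
    jmp .hang

times 510-($-$$) db 0       ; Pad remainder of boot sector with 0s
dw 0xAA55
```

Assembled with nasm -f bin, this should still produce exactly 512 bytes, with the code beginning at offset 0x52.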

Also keep in mind that whenever you call any subroutine, the values of the registers you were using could be overwritten. Make sure you either use push and pop, or save your values to memory, before calling a subroutine. Look at Martin Rosenau's answer below to read more about that.

Thank you to all who replied to my question. I now have a better understanding of how this low-level stuff works.

Pownall answered 14/11, 2017 at 4:31 Comment(21)
It's not always just the code/instruction set but the hardware, how it boots, etc.Pretermit
Crashing with ret instead of using cli /hlt may possibly have prevented your BIOS from actually getting all the characters displayed. (I think I've read that int 10h can return before the characters are actually displayed. I might be totally wrong here, but ending with ret looks obviously wrong. Why would there be a return address on the stack?)Gautea
Try adding jmp start and resb 0x50 after the BITS 16 line and before the start: line.Photomap
As others indicated, x86 Assembly instructions, setting aside some assembler pseudo code, etc., are not necessarily the same across all hardware. You will also encounter this in x86-64 and ARMv7 based CPUs. The key is to read the documentation for the target hardware; especially regarding boot-up.Voiced
Although all the comments have been good ones, there is an issue that stands out to me the most that I think is the likely cause. I think @RossRidge is thinking same thing I am. If you are booting from USB with floppy drive emulation it is quite possible that your BIOS has overwritten part of your code with data it thinks should be in the BIOS Parameter block. You can add a fake BPB to the beginning of your boot sector so your code doesn't get arbitrarily clobbered by the BIOS.Quartersaw
In my other SO Answer: https://mcmap.net/q/190561/-int-13h-42h-doesn-39-t-load-anything-in-bochs I have a section Real Hardware / USB / Laptop Issues . It shows how you can create a fake BPB and you just add your code to the bottom. I'd be curious if this resolves your issue. My belief is that is what @RossRidge may have been getting at (his suggestion is to create an area that the BIOS may overwrite thinking it is a BPB)Quartersaw
Your update makes it sound even more likely that your BIOS is overwriting your code as described by Michael Petch in the previous two comments.Photomap
WHOAH. Sorry @MichaelPetch, I didn't see your comment before. I tried adding that snippet of code and it worked! I would like to do more research to see why that is...Pownall
USB emulation of drives is peculiar. The BIOS after reading your boot sector (and before transferring control to it) may update the beginning of the boot sector with drive geometry info (maximum number of heads, sectors etc). Simple BIOSes may just assume the BIOS Parameter Block always exists and blindly update the bytes. In your case those bytes happen to be your actual code. Other BIOSes may attempt to detect the presence of a BPB and only write to that area if it thinks the BPB exists. More often than not on modern hardware BIOSes will simply assume it exists and blindly write.Quartersaw
Did that answer below actually solve your problem, or just shift it? I'd be hard pressed to believe the accepted answer fixed it properly. Maybe I am just being arrogant, or skeptical (or both)Quartersaw
More on The BIOS Parameter Block can be found here: en.wikipedia.org/wiki/BIOS_parameter_blockQuartersaw
Here's what I would do. Forget you saw Martin's answer and take your original code as is and append it to my header (with a fake BPB) and see if it works. I should point out that replacing ret with the endless loop should be done to avoid problems, but I doubt it caused the issue you saw. The push cx/pop cx likely fixes nothing and was a guess by Martin. Although setting the stack is a good idea, the BIOS sets one up that is designed to be big enough for most BIOS call needs. Usually you need to set your own stack when you start writing to memory outside the 512 bytes of the bootloader.Quartersaw
Michael, Ross, and Martin, all of your responses have been very helpful. After playing around some more, I did realize that Michael's fix worked, but so did Ross's. So one thing's for sure: my code was being overwritten while it was in memory. In my case, I don't think the BIOS was looking for a BPB, but including one allowed for some space to be allocated that wasn't my code. Instead of overwriting my program, the BIOS was overwriting the fake BPB. I think Martin's points are valid, which is why I accepted it. Should I just upvote it instead? I just want to give credit where it's due.Pownall
@MichaelPetch, I can't notify multiple people, so please look at my most recent comment.Pownall
@RossRidge I can't notify multiple people, so please look at my most recent comment.Pownall
@MartinRosenau I can't notify multiple people, so please look at my most recent comment.Pownall
@RossRidge 's answer and mine are related (I suggested that Ross and I were thinking the same thing earlier). I just filled in the bootloader BPB as a data structure with real values. Ross suggested just creating an area of memory the size of a BPB and jumping over it. We both were assuming the BIOS was overwriting the area at the beginning of your bootloader. We were just suggesting ways to prevent it.Quartersaw
@MichaelPetch I totally agree. Thank you for your answer! Is there some way I can help your reputation? You didn't exactly post your answer as a SO answer... I apologize for my lack of understanding of SO. I didn't really feel the need to create an account and post anything until now, so I'm still learning the platform. Usually I can find a similar question to mine.Pownall
@MichaelPetch Aww man, I didn't see that. I'm as blind as a bat. I'm sorry, I feel like a jerk. I will accept your answer.Pownall
I didn't post the answer until I was sure that the issue was what Ross and I thought it might be, so you may have missed it. No problem.Quartersaw
I've updated my answer with some new test code that may be able to tell you what bytes your BIOS actually changed. Might be interesting to see which ones (although usually it will at least be the ones involving drive geometry)Quartersaw

This could probably be made into a canonical answer on this subject.

Real Hardware / USB / Laptop Issues

If you are attempting to use USB to boot on real hardware, you may encounter another issue even if you get it working in Bochs and QEMU. If your BIOS is set to do USB FDD emulation (and not USB HDD or something else), you may need to add a BIOS Parameter Block (BPB) to the beginning of your bootloader. You can create a fake one like this:

org 0x7c00
bits 16

boot:
    jmp main
    TIMES 3-($-$$) DB 0x90   ; Support 2 or 3 byte encoded JMPs before BPB.

    ; Dos 4.0 EBPB 1.44MB floppy
    OEMname:           db    "mkfs.fat"  ; mkfs.fat is what OEMname mkdosfs uses
    bytesPerSector:    dw    512
    sectPerCluster:    db    1
    reservedSectors:   dw    1
    numFAT:            db    2
    numRootDirEntries: dw    224
    numSectors:        dw    2880
    mediaType:         db    0xf0
    numFATsectors:     dw    9
    sectorsPerTrack:   dw    18
    numHeads:          dw    2
    numHiddenSectors:  dd    0
    numSectorsHuge:    dd    0
    driveNum:          db    0
    reserved:          db    0
    signature:         db    0x29
    volumeID:          dd    0x2d7e5a1a
    volumeLabel:       db    "NO NAME    "
    fileSysType:       db    "FAT12   "

main:
    [insert your code here]

Adjust the ORG directive to what you need or omit it if you just need the default 0x0000.

If you were to modify your code to have the layout above, the Unix/Linux file command may be able to dump out the BPB data that it thinks makes up your VBR in the disk image. Run the command file disk.img and you may get this output:

disk.img: DOS/MBR boot sector, code offset 0x3c+2, OEM-ID "mkfs.fat", root entries 224, sectors 2880 (volumes <=32 MB) , sectors/FAT 9, sectors/track 18, serial number 0x2d7e5a1a, unlabeled, FAT (12 bit)


How the Code in this Question Could be Modified

The OP's original code could have been modified to look like this:

bits 16

boot:
    jmp main
    TIMES 3-($-$$) DB 0x90   ; Support 2 or 3 byte encoded JMPs before BPB.

    ; Dos 4.0 EBPB 1.44MB floppy
    OEMname:           db    "mkfs.fat"  ; mkfs.fat is what OEMname mkdosfs uses
    bytesPerSector:    dw    512
    sectPerCluster:    db    1
    reservedSectors:   dw    1
    numFAT:            db    2
    numRootDirEntries: dw    224
    numSectors:        dw    2880
    mediaType:         db    0xf0
    numFATsectors:     dw    9
    sectorsPerTrack:   dw    18
    numHeads:          dw    2
    numHiddenSectors:  dd    0
    numSectorsHuge:    dd    0
    driveNum:          db    0
    reserved:          db    0
    signature:         db    0x29
    volumeID:          dd    0x2d7e5a1a
    volumeLabel:       db    "NO NAME    "
    fileSysType:       db    "FAT12   "

main:
    mov ax, 07C0h       ; Set up 4K stack space after this bootloader
    add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096

    mov ax, 07C0h       ; Set data segment to where we're loaded
    mov ds, ax

    mov cl, 10          ; Use this register as our loop counter
    mov ah, 0Eh         ; This register holds our BIOS instruction

.repeat:
    mov al, 41h         ; Put ASCII 'A' into this register
    int 10h             ; Execute our BIOS print instruction
    cmp cl, 0           ; Find out if we've reached the end of our loop
    dec cl              ; Decrement our loop counter
    jnz .repeat         ; Jump back to the beginning of our loop
    jmp .done           ; Finish the program when our loop is done

.done:
    mov al, 42h         ; Put ASCII 'B' into this register
    int 10h             ; Execute BIOS print instruction
    ret

times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
dw 0xAA55

Other Suggestions

As has been pointed out - you can't ret to end a bootloader. You can put it into an infinite loop or halt the processor with cli followed by hlt.
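A minimal sketch of the cli/hlt ending, assuming the fake-BPB layout described above (here just 59 zero bytes as a placeholder). The jmp after hlt is a guard in case something such as a non-maskable interrupt wakes the processor; the label names are illustrative.

```nasm
bits 16
org 0x7c00

    jmp short main
    nop
    times 59 db 0           ; Placeholder for the BPB area

main:
    mov ax, 0x0e42          ; AH=0Eh (teletype output), AL='B'
    xor bx, bx              ; BH = video page 0
    int 0x10

    cli                     ; Mask interrupts so HLT is not woken by them
.halt:
    hlt                     ; Stop until the next (non-maskable) wake-up
    jmp .halt               ; If we are woken anyway, halt again

times 510-($-$$) db 0
dw 0xAA55
```

The simpler alternative is a bare jmp to itself, which keeps the CPU busy instead of putting it into a halted state.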

If you ever allocate a large amount of data on the stack or start writing to data outside the 512 bytes of your bootloader, you should set your own stack pointer (SS:SP) to a region of memory that won't interfere with your own code. The original code in this question does set up a stack pointer; this is a general observation for anyone else reading this Q/A. I have more information on that in my Stack Overflow answer that contains General Bootloader Tips.
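To illustrate, here is a hedged sketch that places the stack just below the bootloader at 0x0000:0x7c00 and then writes to a scratch buffer above the boot sector. The buffer address 0x7e00 and the label names are assumptions chosen for the example, and the BPB area is only a zero-filled placeholder.

```nasm
bits 16
org 0x7c00

buffer equ 0x7e00           ; Scratch memory just past the 512-byte sector (assumed free)

    jmp short main
    nop
    times 59 db 0           ; Placeholder for the BPB area

main:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax              ; Stack segment 0 ...
    mov sp, 0x7c00          ; ... growing down from just below the bootloader
    cld                     ; Forward direction for string instructions

    mov di, buffer          ; Build "AAAAAAAAAAB",0 outside the boot sector
    mov al, 'A'
    mov cx, 10
    rep stosb
    mov al, 'B'
    stosb
    xor al, al
    stosb                   ; NUL terminator

    mov si, buffer
.print:
    lodsb                   ; Next character in AL
    test al, al
    jz .done
    mov ah, 0x0e            ; BIOS teletype output
    xor bx, bx
    int 0x10
    jmp .print

.done:
    cli
.halt:
    hlt
    jmp .halt

times 510-($-$$) db 0
dw 0xAA55
```

Because the stack grows downward from 0x7c00 and the buffer sits at 0x7e00, neither overlaps the code loaded at 0x7c00 to 0x7dff.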


Test Code to See if Your BIOS is Overwriting the BPB

If you want to know if the BIOS might be overwriting data in the BPB and to determine what values it wrote you could use this bootloader code to dump the BPB as the bootloader sees it after control is transferred to it. Under normal circumstances the first 3 bytes should be EB 3C 90 followed by a series of AA. Any value that isn't AA was likely overwritten by the BIOS. This code is in NASM and can be assembled into a bootloader with nasm -f bin boot.asm -o boot.bin

; Simple bootloader that dumps the bytes in the BIOS Parameter
; Block BPB. First 3 bytes should be EB 3C 90. The rest should be 0xAA
; unless you have a BIOS that wrote drive geometry information
; into what it thinks is a BPB.

; Macro to print a character out; the character is passed in AL
%macro print_char 1
    mov al, %1
    call bios_print_char
%endmacro

org 0x7c00
bits 16

boot:
    jmp main
    TIMES 3-($-$$) DB 0x90   ; Support 2 or 3 byte encoded JMPs before BPB.

    ; Fake BPB filled with 0xAA
    TIMES 59 DB 0xAA

main:
    xor ax, ax
    mov ds, ax
    mov ss, ax              ; Set stack just below bootloader at 0x0000:0x7c00
    mov sp, boot
    cld                     ; Forward direction for string instructions

    mov si, sp              ; Print bytes from start of bootloader
    mov cx, main-boot       ; Number of bytes in BPB
    mov dx, 8               ; Initialize column counter to 8
                            ;     So first iteration prints address
.tblloop:
    cmp dx, 8               ; Every 8 hex values print CRLF/address/Colon/Space
    jne .procbyte
    print_char 0x0d         ; Print CRLF
    print_char 0x0a
    mov ax, si              ; Print current address
    call print_word_hex
    print_char ':'          ; Print ': '
    print_char ' '
    xor dx, dx              ; Reset column counter to 0
.procbyte:
    lodsb                   ; Get byte to print in AL
    call print_byte_hex     ; Print the byte (in AL) in HEX
    print_char ' '
    inc dx                  ; Increment the column count
    dec cx                  ; Decrement number of  bytes to process
    jnz .tblloop

    cli                     ; Halt processor indefinitely
.end:
    hlt
    jmp .end

; Print the character passed in AL
bios_print_char:
    push bx
    xor bx, bx              ; Attribute=0/Current Video Page=0
    mov ah, 0x0e
    int 0x10                ; Display character
    pop bx
    ret

; Print the 16-bit value in AX as HEX
print_word_hex:
    xchg al, ah             ; Print the high byte first
    call print_byte_hex
    xchg al, ah             ; Print the low byte second
    call print_byte_hex
    ret

; Print the 8-bit value in AL as HEX
print_byte_hex:
    push bx
    push cx
    push ax

    lea bx, [.table]        ; Get translation table address

    ; Translate each nibble to its ASCII equivalent
    mov ah, al              ; Make copy of byte to print
    and al, 0x0f            ;     Isolate lower nibble in AL
    mov cl, 4
    shr ah, cl              ; Isolate the upper nibble in AH
    xlat                    ; Translate lower nibble to ASCII
    xchg ah, al
    xlat                    ; Translate upper nibble to ASCII

    xor bx, bx              ; Attribute=0/Current Video Page=0
    mov ch, ah              ; Make copy of lower nibble
    mov ah, 0x0e
    int 0x10                ; Print the high nibble
    mov al, ch
    int 0x10                ; Print the low nibble

    pop ax
    pop cx
    pop bx
    ret
.table: db "0123456789ABCDEF", 0

; boot signature
TIMES 510-($-$$) db 0
dw 0xAA55

Output should look like this for any BIOS that didn't update the BPB before transferring control to the bootloader code:

7C00: EB 3C 90 AA AA AA AA AA
7C08: AA AA AA AA AA AA AA AA
7C10: AA AA AA AA AA AA AA AA
7C18: AA AA AA AA AA AA AA AA
7C20: AA AA AA AA AA AA AA AA
7C28: AA AA AA AA AA AA AA AA
7C30: AA AA AA AA AA AA AA AA
7C38: AA AA AA AA AA AA
Quartersaw answered 16/11, 2017 at 1:31 Comment(4)
If one will be implementing the ext2 file system, the BIOS Parameter Block(BPB) is not required, correct? If this is correct, then how do you prevent the BIOS from expecting a BPB?Javed
@DanHoynoski : If booting from USB using FDD emulation you will still need to allocate space for a BPB (because the BIOS will overwrite part of that area) but it could be all zeroes as EXT2 doesn't rely on the data in the BPB whereas FAT does.Quartersaw
@MichaelPetch in .tblloop you mov the si to bx, this is wrong. print_word_hex uses the ax register.Estrada
@Estrada : in the time between your first comment a few mins ago that you removed and your new comment I had actually fixed the issue and updated the answer ;-). The original version of the function I used passed the parameter in BX and at some point I standardized it with AX but didn't update the code in the question. Thanks for identifying a problem.Quartersaw

Assembly code only works on one of my two x86 processors

It is not the processors but the BIOSes:

The int instruction actually is a special variant of the call instruction. The instruction calls some sub-routine (typically written in assembler).

(You can even replace that sub-routine by your own one - which is actually done by MS-DOS, for example.)
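As a rough illustration of replacing such a sub-routine, the sketch below installs its own handler by writing an offset and a segment into the real-mode interrupt vector table, where entry n lives at 0000:n*4. It assumes vector 0x60 is free (0x60 to 0x67 are conventionally left for user programs); the handler and label names are made up for the example.

```nasm
bits 16
org 0x7c00

    jmp short main
    nop
    times 59 db 0               ; Placeholder for the BPB area

main:
    xor ax, ax
    mov ds, ax                  ; DS = 0 so we can address the vector table

    cli                         ; No interrupts while the vector is half-written
    mov word [0x60*4], handler  ; Offset of our routine
    mov word [0x60*4+2], ax     ; Segment 0 (matches org 0x7c00)
    sti

    int 0x60                    ; Invokes handler through the vector table

    cli
.halt:
    hlt
    jmp .halt

; Our own "sub-routine": prints '!' and returns with IRET
handler:
    push ax
    push bx
    mov ax, 0x0e21              ; AH=0Eh (teletype output), AL='!'
    xor bx, bx
    int 0x10
    pop bx
    pop ax
    iret                        ; Restores IP, CS and FLAGS pushed by INT

times 510-($-$$) db 0
dw 0xAA55
```

The handler saves and restores the registers it uses, which is exactly the courtesy a BIOS routine is not guaranteed to extend to its callers.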

On two computers you have two different BIOS versions (or even vendors) which means that the sub-routine called by the int 10h instruction has been written by different programmers and therefore does not exactly do the same.

only to get the following output

The problem I suspect here is that the sub-routine called by int 10h on the first computer does not save the register values while the routine on the second computer does.

In other words:

On the first computer the routine called by int 10h may look like this:

...
mov cl, 5
mov ah, 6
...

... so after the int 10h call the ah register no longer contains the value 0Eh, and the cl register may even have been modified (which would then result in an endless loop).

To avoid the problem you could save the cl register using push (you have to push the entire cx register) and restore it after the int instruction. You also have to set the value of the ah register before each int 10h call because you cannot be sure that it has not been modified in the meantime:

push cx
mov ah, 0Eh
int 10h
pop cx
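One way to generalize this, sketched under the assumption that the code only needs to run on an 80186 or later (pusha and popa do not exist on the original 8086/8088), is to wrap the BIOS call in a small routine that preserves every general-purpose register. The routine name print_char and the zero-filled BPB placeholder are illustrative.

```nasm
bits 16
org 0x7c00

    jmp short main
    nop
    times 59 db 0           ; Placeholder for the BPB area

main:
    mov cx, 10              ; Loop counter
.repeat:
    mov al, 'A'
    call print_char         ; CX is unchanged on return
    loop .repeat            ; Decrement CX, jump while CX != 0

    mov al, 'B'
    call print_char

    cli
.halt:
    hlt
    jmp .halt

; Print the character in AL; all general-purpose registers are preserved
print_char:
    pusha                   ; Save AX, CX, DX, BX, SP, BP, SI, DI
    mov ah, 0Eh             ; Set the BIOS function on every call
    xor bx, bx              ; BH = video page 0
    int 10h
    popa                    ; Restore everything the BIOS may have changed
    ret

times 510-($-$$) db 0
dw 0xAA55
```

With the save and restore in one place, the calling loop no longer depends on how a particular BIOS treats cx or ah.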

mov sp, ... ... ret

Please think about Peter Cordes' comment:

How does the ret instruction work and how is it related to the sp and ss registers?

The ret instruction here will definitely not do what you expect!

On floppy disks the boot sectors typically contain the following code instead:

mov ax, 0  ; (may be written as "xor ax, ax")
int 16h
int 19h

int 19h does exactly what you expect from the ret instruction.

However the BIOS will boot the computer again which means that it will load the code from your USB stick and execute it again.

You'll get the following result:

AAAAAAAAAABAAAAAAAAAABAAAAAAAAAAB...

Therefore the int 16h instruction is inserted first. With the ax register set to 0 (so ah = 0), the int 16h sub-routine waits for the user to press a key on the keyboard before int 19h boots again.

Alternatively you can simply add an endless loop:

.endlessLoop:
    jmp .endlessLoop

mov ss, ...

When an interrupt occurs between these two instructions:

mov ss, ax
    ; <--- Here
mov sp, 4096

... ss already holds the new value while sp still holds the old one, so the combination of the ss and sp registers does not point to a valid stack.

If you are unlucky the interrupt will write data somewhere to memory where you don't want it. It may even overwrite your program!

Therefore you typically disable interrupts while modifying the ss and sp registers:

cli          ; Forbid interrupts
mov ss, ax
mov sp, 4096
sti          ; Allow interrupts again
Shumway answered 14/11, 2017 at 7:0 Comment(8)
Disabling interrupts while setting up the machine might be a good idea. But unless you have an original 8086, the CPU will not allow an interrupt to slip in between MOV SS and MOV SP. Already my old 286 manual has wording for that.Recommend
To be more specific, mov ss, ... disables interrupts until after the next instruction, so you only need cli / sti if that instruction isn't mov sp, ... or you're on buggy hardware that doesn't implement that correctly (some did exist, I think).Gautea
Where do you recommend disabling the interrupts? I noticed that I have mov ss, ax and mov sp, 4096 at the top of my program. Should I put that somewhere else? I also noticed that when I put xor cx, cx right before mov cl, 10, cl seems to just stay as 0. Why is that do you think?Pownall
@PeterCordes : I think you are probably referring to this bug on some 8088 CPUs . I've spoken of it before.Quartersaw
@MichaelPetch: Yes, thanks, that's what I remembered reading. I'd forgotten it was only on 8088. These days it makes sense to just omit the cli/sti and not worry about those old 8088 chips, I'd guess. There are some real 8086 chips still being made as microcontrollers, but not buggy 8088 I hope.Gautea
@PeterCordes : Back in the old day (80s early 90s) bootloaders pretty much didn't take any chances so once the bug was identified even IBM-DOS and MS-DOS bootloaders would use CLI/STI. But almost 30 years later the chances of running into an 8088 (and specifically a defective one) is probably pretty low.Quartersaw
@AMFTech: The hardware already makes it safe to set SS and then SP the way you do, so you don't need cli / sti there. You can put cli / hlt at the end to put the CPU in a low-power state with interrupts disabled.Gautea
@AMFTech: if xor cx,cx / mov cl,10 doesn't produce cl=10, you're using your debugger or some other tools incorrectly >.< Are you sure you actually stepped past the mov to cl, instead of stopping before it executed? Are you sure you rebuild your source after editing? The only way it could not work is if an interrupt handler zeroed cx / cl after the mov instruction. XOR-zero the whole register and then write the low part is a well known way to zero-extend.Gautea
