GDB-remote + qemu reports unexpected memory address for static C variable

I'm remote-debugging code running in QEMU with GDB, based on an OS-dev tutorial.
My version is here. The problem only happens when remote-debugging code inside QEMU, not when I build a normal executable and run it directly under GDB on the host OS.

The code looks something like this:

#define BUFSIZE 255
static char buf[BUFSIZE];

void foo() {
  // Making sure it's all zero.
  for (int i = 0; i < BUFSIZE; i++) buf[i] = 0;

  // Setting first char:
  buf[0] = 'a';

  // >> insert breakpoint right after setting the char <<

  // Prints 'a'.
  printf("%s", buf);
}

If I place a breakpoint at the marked spot and print the buffer with p buf, I get seemingly random values, apparently taken from my code section. The address reported by p &buf also looks wrong, for two reasons:

  1. If I add char* p_buf = buf and check the address with p p_buf, it gives a totally different address, which is stable across executions (the other one is not). When I inspect that memory with x /255b 0x____, I can see the a followed by zeros (97 0 0 0 ... 0).

  2. The next statement (printf("%s", buf);) does actually print a.

This leads me to believe that GDB does not know the correct location of the static variable when I inspect it directly.

Where should I start debugging this?


Details about the compile conditions:

  • Compile flags: -g -Wall -Wextra -pedantic -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
  • qemu-system-i386
  • GCC: i386-elf target

Example output from GDB:

(gdb) p buf
$1 = "dfghjkl;'`\000\\zxcvbnm,./\000*\000 ", '\000' <repeats 198 times>...
(gdb) p p_buf
$2 = 0x40c0 <buf+224> "a"
(gdb) p &buf
$3 = (char (*)[255]) 0x3fe0 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x3fe0.

Update 2:

Here is the disassembly of a version of the code that shows the discrepancy:

; void foo
0x19f1 <foo>            push   %ebp
0x19f2 <foo+1>          mov    %esp,%ebp
0x19f4 <foo+3>          sub    $0x10,%esp

; char* p_buf = char_buf; --> `p &char_buf` is 0x4040 (incorrect) but `p p_buf` is 0x4100
0x19f7 <foo+6>          movl   $0x4100,-0x4(%ebp)

; void* p_p_buf = (void*)p_buf; --> `p p_p_buf` gives 0x4100
0x19fe <foo+13>         mov    -0x4(%ebp),%eax
0x1a01 <foo+16>         mov    %eax,-0x8(%ebp)

; void* p_char_buf = (void*)&char_buf; --> `p p_char_buf` gives 0x4100
0x1a04 <foo+19>         movl   $0x4100,-0xc(%ebp)

; char_buf[0] = 'a'; --> correct address
0x1a0b <foo+26>         movb   $0x61,0x4100

; char_buf[1] = 'b'; --> correct address (asking `p &char_buf` here is still incorrectly 0x4040)
0x1a12 <foo+33>         movb   $0x62,0x4101

; void foo return
0x1a19 <foo+40>         nop
0x1a1a <foo+41>         leave
0x1a1b <foo+42>         ret

My Makefile for building the project looks like this:

C_SOURCES = $(wildcard kernel/*.c drivers/*.c)
C_HEADERS = $(wildcard kernel/*.h drivers/*.h)
OBJ = ${C_SOURCES:.c=.o kernel/interrupt_table.o}
CC = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gcc
# GDB = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gdb
GDB = /usr/bin/gdb
CFLAGS = -g -Wall -Wextra -ffreestanding -fno-exceptions -pedantic -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
QEMU = qemu-system-i386

os-image.bin: boot/boot.bin kernel.bin
    cat $^ > $@

kernel.bin: boot/kernel_entry.o ${OBJ}
    i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary

kernel.elf: boot/kernel_entry.o ${OBJ}
    i386-elf-ld -o $@ -Ttext 0x1000 $^

kernel.dis: kernel.bin
    ndisasm -b 32 $< > $@

run: os-image.bin
    ${QEMU} -drive format=raw,media=disk,file=$<,index=0,if=floppy

debug: os-image.bin kernel.elf
    ${QEMU} -s -S -drive format=raw,media=disk,file=$<,index=0,if=floppy &
    ${GDB} -ex "target remote localhost:1234" -ex "symbol-file kernel.elf" -ex "tui enable" -ex "layout split" -ex "focus cmd"

%.o: %.c ${C_HEADERS}
    ${CC} ${CFLAGS} -c $< -o $@

%.o: %.asm
    nasm $< -f elf -o $@

%.bin: %.asm
    nasm $< -f bin -o $@

build: os-image.bin
    echo Pass

clean:
    rm -rf *.bin *.o *.dis *.elf
    rm -rf kernel/*.o boot/*.bin boot/*.o
Wen answered 5/12, 2020 at 6:17 Comment(8)
That "Making sure it's all zero" is not needed; it's guaranteed by the C standard. – Imputation
Please provide a minimal reproducible example in your question, and explain how you compiled your code. With a recent GCC, use at least gcc -Wall -Wextra -g – Garland
I wish I could. I'm following an OS tutorial (hence QEMU and remote GDB); on the local CPU everything works as expected, hence my hunch about GDB. I'm aware this heavily limits how much help I can get. – Wen
@Wen the OS tutorial is very important information and should be mentioned in the question, along with its name and a link to it. It is very likely that the tutorial itself is wrong. – Danelledanete
Sure, what I'm following is github.com/cfenollosa/os-tutorial/blob/master/21-shell/drivers/… and my version is github.com/itarato/VamOS/blob/master/kernel/shell.c#L18; I'm definitely not expecting people to fix my code or do the homework I should do. – Wen
Can you edit your question to show the output of disas /s foo? – Baeda
@MarkPlotnick Added. – Wen
Yesss, that was it: ld to ELF, then objcopy to binary. – Wen

This is an interesting problem. It comes down to the fact that the code LD (the linker) generates for the ELF executable kernel.elf differs from the code it generates for kernel.bin when the --oformat binary option is used. While one would expect the two to be the same, they are not.

Put more simply, these two Makefile rules do not produce the same code, even though you might expect them to:

kernel.elf: boot/kernel_entry.o ${OBJ}
        i386-elf-ld -o $@ -Ttext 0x1000 $^

and

kernel.bin: boot/kernel_entry.o ${OBJ}
        i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary

The difference appears to be in how the linker aligns the sections with and without --oformat binary. The ELF file (and therefore the symbols GDB uses for debugging) places code and data at one set of addresses, while the binary actually running in QEMU has them at different offsets.

I had never observed this issue because I use my own linker scripts, and I always generate the binary file from the ELF executable with OBJCOPY rather than linking twice with LD. OBJCOPY can take an ELF executable and convert it to a flat binary file. The Makefile rules could be amended to look like this:

kernel.bin: kernel.elf
        i386-elf-objcopy -O binary $^ $@

kernel.elf: boot/kernel_entry.o ${OBJ}
        i386-elf-ld -o $@ -Ttext 0x1000 $^

Doing it this way will ensure the binary file that is generated matches what was produced for the ELF executable.

Pragmatist answered 16/12, 2020 at 13:53 Comment(0)

For me, this doesn't seem to happen:

Breakpoint 1, main () at test65.c:16
16    printf("%s", buf);
(gdb) p buf
$2 = "a", '\000' <repeats 253 times>

Where should I start debugging this?

It seems like there are two things that might go wrong:

1. GDB might be reading from the wrong location

I'm not sure what could cause this, but it is easy enough to verify. Check what address p &buf gives you, then compare it with the value of p_buf and with what info address buf reports.

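Note that in a position-independent executable (PIE), the address of static variables changes when the process starts, because the program is relocated at load time. (GDB disables address space layout randomization by default, so the load address is the same on every run.) So before the run command the address could be, e.g., 0x4040 and then change to 0x555555558040 once the code is running: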

(gdb) info address buf
Symbol "buf" is static storage at address 0x4040.
(gdb) run
....
Breakpoint 1, main () at test65.c:16
16    printf("%s", buf);
(gdb) p &buf
$1 = (char (*)[255]) 0x555555558040 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x555555558040.

2. GDB is reading the correct place, but the data is not there yet

This would be a typical debugging problem caused by compiler optimizations. For example, the compiler might move the store buf[0] = 'a' past the point where your breakpoint lands, although it must perform it before printf() is called. You could try compiling with -O0 to see if that changes anything.

You can also check the disassembly with the disas command to see what has executed up to that point:

(gdb) disas
Dump of assembler code for function main:
   0x000055555555517b <+50>:    movb   $0x61,0x2ebe(%rip)        # 0x555555558040 <buf>
=> 0x0000555555555182 <+57>:    lea    0x2eb7(%rip),%rsi        # 0x555555558040 <buf>
   0x0000555555555189 <+64>:    lea    0xe74(%rip),%rdi        # 0x555555556004
   0x0000555555555190 <+71>:    mov    $0x0,%eax
   0x0000555555555195 <+76>:    callq  0x555555555050 <printf@plt>

For me, the breakpoint lands right after the movb instruction stores 0x61 (the letter a) into buf.

If you use the stepi command until you reach the callq printf instruction, you can be sure you see the buffer exactly as printf will see it.

Zed answered 5/12, 2020 at 6:35 Comment(2)
Added example output to the question showing the two different addresses. I printed them well after both variables were used, and confirmed in the asm view that their use had already passed. – Wen
@jpa: -O0 is the GCC default, and the OP's gcc options didn't include any -O. It's worth mentioning anyway for future readers, or in case that was inaccurate, but an explicit -O0 won't help the OP. (The edit to add the asm confirms it's a -O0 build for consistent debugging.) – Ammo

© 2022 - 2024 — McMap. All rights reserved.