Why does my data section appear twice in the compiled binary? Ubuntu, x86, nasm, gdb, readelf

A prior related question was answered. Thank you! However, the answer raises a new question for me: why does nasm put the data bytes at two different memory locations? I include the program information and a memory dump below.

---------- code snippet compiled with nasm, ld -----------------
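section .text
...
zero: jmp short two         ; jump forward to the call
one:  pop ebx               ; ebx = return address pushed by "call one"
      xor eax, eax          ; eax = 0
      mov [ebx+12], eax     ; dword at ebx+12 = 0 (NULL terminator for argv/envp)
      mov [ebx+8], ebx      ; dword at ebx+8 = ebx (argv[0] -> "/bin/sh")
      mov [ebx+7], al       ; overwrite the '0' at ebx+7 with a NUL byte
      lea ecx, [ebx+8]      ; ecx = argv
      lea edx, [ebx+12]     ; edx = envp
      mov al, 11            ; eax = 11, the execve syscall number on 32-bit Linux
      int 0x80              ; execve(ebx, ecx, edx)
two:  call one              ; push the address of the byte after this call, jump to one
section .data align=1
msg:   db '/bin/sh0argvenvp'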
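As background on what the snippet does: it is the jmp/call/pop idiom. "call one" pushes the address of the byte that follows it, "pop ebx" retrieves that address, and the three stores turn the 16 bytes found there into the pathname, argv and envp for execve (eax = 11). Note that the code never references the msg label; it uses whatever bytes follow the call in memory. The following is a minimal Python simulation of just those stores, assuming a 32-bit little-endian target and taking the address 0x804809d from the gdb session below; the helper name patch_msg is hypothetical, and nothing is actually executed.

```python
import struct

# Assumptions: 32-bit little-endian x86, and the bytes live at 0x804809d
# (the EBX value seen in gdb). Only the three memory stores are simulated.
MSG_ADDR = 0x804809D

def patch_msg(msg: bytes, addr: int) -> bytearray:
    """Hypothetical helper: apply the snippet's three stores to a copy of msg."""
    buf = bytearray(msg)
    eax = 0                                  # xor eax, eax
    struct.pack_into("<I", buf, 12, eax)     # mov [ebx+12], eax -> argv[1] = envp[0] = NULL
    struct.pack_into("<I", buf, 8, addr)     # mov [ebx+8], ebx  -> argv[0] = address of "/bin/sh"
    buf[7] = eax & 0xFF                      # mov [ebx+7], al   -> NUL-terminate "/bin/sh"
    return buf

patched = patch_msg(b"/bin/sh0argvenvp", MSG_ADDR)
path = bytes(patched[:patched.index(0)])        # ebx: C string up to the first NUL
argv = struct.unpack_from("<II", patched, 8)    # ecx = ebx+8:  {argv[0], argv[1]}
envp = struct.unpack_from("<I", patched, 12)    # edx = ebx+12: {envp[0]}
print(path, [hex(p) for p in argv], envp)
```

The argv and envp arrays overlap on purpose: the NULL dword at offset 12 terminates argv and is also the whole of envp.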

-------- readelf output to show load locations --------
readelf -Wl myshdb

Elf file type is EXEC (Executable file)
Entry point 0x8048080
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x0009d 0x0009d R E 0x1000
  LOAD           0x00009d 0x0804909d 0x0804909d 0x00010 0x00010 RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .text 
   01     .data 

-------------- run with gdb and debug step to mov instructions ----------
---------------registers--------------
EAX: 0x0 
EBX: 0x804809d ("/bin/sh0argvenvp")

----------- memory address checks ------------
gdb-peda$ p zero
$15 = {<text variable, no debug info>} 0x8048080 <zero>
gdb-peda$ p one
$16 = {<text variable, no debug info>} 0x8048082 <one>
gdb-peda$ p two
$17 = {<text variable, no debug info>} 0x8048098 <two>
gdb-peda$ p $ebx
$18 = 0x804809d
gdb-peda$ p msg
$19 = 0x6e69622f
gdb-peda$ x 0x804809d
0x804809d:  "/bin/sh0argvenvp"
gdb-peda$ x msg
0x6e69622f: <error: Cannot access memory at address 0x6e69622f>

In other words, the string is available at the memory location directly after the code (0x804809d). Yet the msg label, which is the label for my data, maps to 0x6e69622f. How can I use gdb to see the data at the second address? Is nasm putting the data at two different locations? If so, why?

Slice answered 13/7, 2020 at 22:40 Comment(4)
That's a side effect of the mappings: the same bytes from the file are mapped twice, with different protection settings. I believe newer toolchain versions pad the sections so this doesn't happen. Note that 0x6e69622f is not a label or an address; it's the actual string (the bytes "/bin" read as a little-endian integer). That is due to a data type mismatch. p/s msg will probably work. - Mclendon
See also my answer here. - Mclendon
@Jester, I will read up on the type mismatch and p/s msg. I can see the data in the .text section (read-only), but I still hope to find the address of the copy in the .data section (writable). In practice, I marked .text as writable when linking and am using the copy of the data there. - Slice
The default linker script of more recent ld versions avoids this duplication, so the only non-zero bytes in executable pages are ones that need to be there, and .data initializers can't be part of ROP or Spectre gadgets. See "Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?" (ELF segments are padded to page boundaries.) - Shy
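To see where 0x6e69622f comes from: without debug info, gdb printed the four bytes stored at msg as an integer rather than msg's address. A short Python check, assuming little-endian x86 byte order. In gdb itself, x/s &msg should examine the string at the symbol's actual address (0x804909d) instead.

```python
import struct

# The first four bytes of msg, read as a little-endian 32-bit integer,
# which is how an int-typed view of msg sees them on x86.
first_dword = struct.unpack("<I", b"/bin/sh0argvenvp"[:4])[0]
print(hex(first_dword))

# Going the other way: the value gdb printed is just ASCII text.
print(struct.pack("<I", 0x6E69622F))
```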

Let's look at the LOAD segments:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x0009d 0x0009d R E 0x1000
  LOAD           0x00009d 0x0804909d 0x0804909d 0x00010 0x00010 RW  0x1000

The first one instructs the loader to mmap 0x9d bytes from file offset 0 into virtual memory at address 0x08048000.

The loader can't do exactly that, because memory mapping works only at page (4096-byte) granularity. So it maps .text, together with everything that follows it in the file up to the end of the page, at address 0x08048000.

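This means that the .data bytes that follow .text in the file (from offset 0x9d on) will also appear at address 0x0804809d and later, but with the wrong permissions (Read and Execute). That is exactly where EBX points: call one at 0x8048098 is 5 bytes long, so the return address it pushes, which pop ebx then retrieves, is 0x804809d, the first byte past .text.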
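The page rounding for the first LOAD segment can be checked with a little arithmetic. This is a Python sketch using the numbers from the readelf output; the helper names page_mapping and addr_of are hypothetical, and it models only which file bytes land at which addresses, not permissions.

```python
PAGE = 0x1000  # x86 page size: mmap works in whole 4096-byte pages

def page_mapping(offset, vaddr, filesz, page=PAGE):
    """Hypothetical helper: page-rounded mapping for one LOAD segment.
    Returns (map_start, map_end, file_start)."""
    delta = vaddr % page                                  # a valid ELF has offset % page == delta too
    map_start = vaddr - delta                             # address rounded down to a page
    file_start = offset - delta                           # file offset rounded down by the same amount
    map_end = (vaddr + filesz + page - 1) // page * page  # end rounded up to a page
    return map_start, map_end, file_start

def addr_of(file_off, mapping):
    """Address at which a file offset appears in the mapping, or None if outside it."""
    map_start, map_end, file_start = mapping
    addr = map_start + (file_off - file_start)
    return addr if map_start <= addr < map_end else None

# First LOAD segment: offset 0, vaddr 0x08048000, filesz 0x9d.
text_map = page_mapping(0x0, 0x08048000, 0x9D)
data_bytes_at = addr_of(0x9D, text_map)   # .data begins at file offset 0x9d
ret_addr = 0x8048098 + 5                  # "call one" at label two: E8 + rel32 = 5 bytes
print([hex(x) for x in text_map], hex(data_bytes_at), hex(ret_addr))
```

Both printed addresses come out as 0x804809d, which is why the code finds the string right after the call even though the string was assembled into .data.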

The second LOAD segment instructs the loader to mmap 0x10 bytes of file contents, starting at offset 0x9d, at virtual address 0x0804909d.

The loader can't do exactly that either, for the same page-granularity reason.

Instead, it will round down the offset and the address, and mmap file contents starting from offset 0 at address 0x08049000.

That means that whatever .text precedes .data in the file will appear at addresses before 0x0804909d (from 0x08049000 on), again with the wrong permissions (Read and Write this time).
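Applying that round-down to both LOAD segments shows why every byte in the first page of the file has two addresses, 0x1000 apart. A Python sketch with the numbers from the readelf output; the helpers round_down_segment and addr_in are hypothetical, and they ignore where each mapping ends because both offsets used here lie in the first page of the file.

```python
PAGE = 0x1000

def round_down_segment(offset, vaddr, page=PAGE):
    """Hypothetical helper: round a LOAD segment's file offset and address
    down to a page boundary, as the loader must. Returns (file_start, map_start)."""
    if offset % page != vaddr % page:
        raise ValueError("ELF requires offset and vaddr to agree modulo the page size")
    return offset - offset % page, vaddr - vaddr % page

def addr_in(seg, file_off):
    """Address of a file offset inside a rounded mapping (first page only here)."""
    file_start, map_start = seg
    return map_start + (file_off - file_start)

text_seg = round_down_segment(0x000000, 0x08048000)   # mapped R E
data_seg = round_down_segment(0x00009D, 0x0804909D)   # mapped RW

for name, off in [("zero (.text)", 0x80), ("msg (.data)", 0x9D)]:
    print(f"{name:13} file 0x{off:03x} -> {addr_in(text_seg, off):#x} (R E), "
          f"{addr_in(data_seg, off):#x} (RW)")
```

Both segments round down to file offset 0, so the entry point shows up again at 0x8049080 and the string at 0x804809d.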

You can confirm that this is what's happening in GDB: x/10i 0x8049080 shows exactly the same instructions as x/10i 0x8048080. Likewise, x/s 0x804909d shows the string at its .data address.

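You can also inspect the resulting mappings with GDB's info proc mappings. (strace won't show them as mmap calls here: for a statically linked executable like this one, the kernel maps the segments itself during execve.)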
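The mappings the kernel created can also be read from /proc/<pid>/maps on Linux. A minimal Python sketch that parses that file; the function name read_maps is hypothetical, and it is Linux-only.

```python
def read_maps(pid="self"):
    """Parse /proc/<pid>/maps (Linux only) into a list of
    (start, end, perms, file_offset, path) tuples."""
    entries = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            perms, offset = fields[1], int(fields[2], 16)
            path = " ".join(fields[5:])   # empty for anonymous mappings
            entries.append((start, end, perms, offset, path))
    return entries

# Print this process's own file-backed mappings as a demonstration.
for start, end, perms, offset, path in read_maps():
    if path and not path.startswith("["):
        print(f"{start:#010x}-{end:#010x} {perms} off={offset:#x} {path}")
```

To look at myshdb, call read_maps with the pid of the process stopped in gdb. Per the explanation above, one would expect two mappings of the same file, both at file offset 0: 0x8048000-0x8049000 with r-xp and 0x8049000-0x804a000 with rw-p.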

Emlynn answered 14/7, 2020 at 5:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.