How do I add contents of text file as a section in an ELF file?
Asked Answered
F

1

20

I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux).

There is a text file, and I want the contents of the text file to appear in the resulting binary (as a string, basically). The binary is an ELF executable.

My plan is to create a new readonly data section in the ELF file (equivalent to the conventional .rodata section).

Ideally, there would be a tool to add a file verbatim as a new section in an elf file, or a linker option to include a file verbatim.

Is this possible?

Fasciate answered 14/2, 2017 at 20:2 Comment(0)
V
53

This is possible and most easily done using OBJCOPY found in BINUTILS. You effectively take the data file as binary input and then output it to an object file format that can be linked to your program.

OBJCOPY will even produce a start and end symbol as well as the size of the data area so that you can reference them in your code. The basic idea is that you will want to tell it your input file is binary (even if it is text); that you will be targeting an x86-64 object file; specify the input file name and the output file name.

Assume we have an input file called myfile.txt with the contents:

the
quick
brown
fox
jumps
over
the
lazy
dog

Something like this would be a starting point:

objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    myfile.txt myfile.o

If you wanted to generate 32-bit objects you could use:

objcopy --input binary \
    --output elf32-i386 \
    --binary-architecture i386 \
    myfile.txt myfile.o

The output would be an object file called myfile.o . If we were to review the headers of the object file using OBJDUMP and a command like objdump -x myfile.o we would see something like this:

myfile.o:     file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 g       .data  0000000000000000 _binary_myfile_txt_start
000000000000002c g       .data  0000000000000000 _binary_myfile_txt_end
000000000000002c g       *ABS*  0000000000000000 _binary_myfile_txt_size

By default it creates a .data section with contents of the file and it creates a number of symbols that can be used to reference the data.

_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size

This is effectively the address of the start byte, the end byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY will base the symbols on the input file name. myfile.txt is mangled into myfile_txt and used to create the symbols.

One problem is that a .data section is created which is read/write/data as seen here:

Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA

You specifically are requesting a .rodata section that would also have the READONLY flag specified. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You could add this to the command line:

--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA

Of course if you want to call the section something other than .rodata with the same flags as a read only section you can change .rodata in the line above to the name you want to use for the section.

The final version of the command that should generate the type of object you want is:

objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
    myfile.txt myfile.o

Now that you have an object file, how can you use this in C code (as an example). The symbols generated are a bit unusual and there is a reasonable explanation on the OS Dev Wiki:

A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.

The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)

Keeping this in mind we could create this C file called main.c:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];

int main()
{
    char *data_start     = _binary_myfile_txt_start;
    char *data_end       = _binary_myfile_txt_end;
    size_t data_size  = (size_t)_binary_myfile_txt_size;

    /* Print out the pointers and size */
    printf ("data_start %p\n", data_start);
    printf ("data_end   %p\n", data_end);
    printf ("data_size  %zu\n", data_size);

    /* Print out each byte until we reach the end */
    while (data_start < data_end)
        printf ("%c", *data_start++);

    return 0;
}

You can compile and link with:

gcc -O3 main.c myfile.o

The output should look something like:

data_start 0x4006a2
data_end   0x4006ce
data_size  44
the
quick
brown
fox
jumps
over
the
lazy
dog

A NASM example of usage is similar in nature to the C code. The following assembly program called nmain.asm writes the same string to standard output using Linux x86-64 System Calls:

bits 64
global _start

extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size

section .text

_start:
    mov eax, 1                        ; SYS_Write system call
    mov edi, eax                      ; Standard output FD = 1
    mov rsi, _binary_myfile_txt_start ; Address to start of string
    mov rdx, _binary_myfile_txt_size  ; Length of string
    syscall

    xor edi, edi                      ; Return value = 0
    mov eax, 60                       ; SYS_Exit system call
    syscall

This can be assembled and linked with:

nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o

The output should appear as:

the
quick
brown
fox
jumps
over
the
lazy
dog
Varices answered 14/2, 2017 at 23:51 Comment(4)
Very educational and a "gem" to store for that rainy day!Gonzalo
I fail to understand why the _size parameter comes out so huge, following this method my original .dat file is 973 bytes, the objcopy o file is 1584 bytes, and the (size_t)_binary_myfile_txt_size is 94570554139597 :\ (_end - _start) is the correct size of 973. What am i misunderstanding?Blooming
@Blooming : Can you put your entire C code in a pastebin? pastebin.comVarices
Such a good answer!Framing

© 2022 - 2024 — McMap. All rights reserved.