How to get arguments value in _start and call main() using inline assembly in C, without Glibc or CRT start files?
C

3

6

How can I get the argument values (argc and argv) using inline assembly in C, without Glibc?

I need this code for Linux on x86_64 and i386. If you know how to do it on Mac OS X or Windows, please post that as well.

void exit(int code)
{
    // This function is not important here.
    //...
}
void _start()
{
    // How can I get the argument values using inline assembly
    // in C, without Glibc?
    //argc
    //argv
    exit(0);
}

New Update

https://gist.github.com/apsun/deccca33244471c1849d29cc6bb5c78e

and

#define ReadRdi(To) asm("movq %%rdi,%0" : "=r"(To));
#define ReadRsi(To) asm("movq %%rsi,%0" : "=r"(To));
long argcL;
long argvL;
ReadRdi(argcL);
ReadRsi(argvL);
int argc = (int) argcL;
//char **argv = (char **) argvL;
exit(argc);

But it still returns 0, so this code is wrong. Please help.

Cacie answered 9/5, 2018 at 19:36 Comment(9)
Possible duplicate of Linux equivalent of GetCommandLine and CommandLineToArgv?Arezzo
when you enter into _start you have nothing, the only way to get the command line is to ask the OS for it. Either way if you're going to do this in ASM you'll need to make a system call.Arezzo
sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/i386/… for x86, sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/… for x86_64; in particular, argv and argc are on the stack, not in registers.Toshikotoss
@Arezzo, I know how to work with syscalls. Please point me to documentation or a guide for this. Thank you.Cacie
The main problem there is that you cannot do it in a "regular" C function, as gcc will manipulate the stack before you have a chance to do anything. The simplest thing is to do as glibc does: implement _start straight in assembly and perform the handoff to a C function using the regular C calling convention. From there onwards, it's all C. Otherwise, I think you may be able to do something with a naked function, but then again, you want to get the arguments and forward them to a "normal" C function using the regular ABI immediately.Toshikotoss
@Arezzo This is not correct for Linux. The argument, environment, and auxiliary vectors are already present in the new executable's address space when control reaches _start.Sahara
@Arezzo, is reading from the file /proc/self/cmdline the real way to do this?Cacie
@CPerfomance No, it is not. Hold your horses, I'm still typing.Sahara
@Sahara , Thankful. Waiting.Cacie
13

As noted in the comments, argc and argv are provided on the stack, so you cannot use a regular C function to get them, even with inline assembly, because the compiler will adjust the stack pointer to allocate local variables, set up the stack frame, and so on. Hence, _start must be written in assembly, as is done in glibc (x86; x86_64). A small stub can grab the values and forward them to your "real" C entry point according to the regular calling convention.

Here is a minimal example of a program (for both x86 and x86_64) that reads argc and argv, prints every value in argv on stdout (one per line) and exits using argc as the status code. It can be compiled with the usual gcc -nostdlib (and -static to make sure ld.so isn't involved; not that it does any harm here).

#ifdef __x86_64__
asm(
        ".global _start\n"
        "_start:\n"
        "   xorl %ebp,%ebp\n"       // mark outermost stack frame
        "   movq 0(%rsp),%rdi\n"    // get argc
        "   lea 8(%rsp),%rsi\n"     // the argument pointers follow argc, so argv = %rsp + 8
        "   call bare_main\n"       // call our bare_main
        "   movq %rax,%rdi\n"       // take the main return code and use it as first argument for...
        "   movl $60,%eax\n"        // ... the exit syscall
        "   syscall\n"
        "   int3\n");               // just in case

asm(
        "bare_write:\n"             // write syscall wrapper; the calling convention is pretty much ok as is
        "   movq $1,%rax\n"         // 1 = write syscall on x86_64
        "   syscall\n"
        "   ret\n");
#endif
#ifdef __i386__
asm(
        ".global _start\n"
        "_start:\n"
        "   xorl %ebp,%ebp\n"       // mark outermost stack frame
        "   movl 0(%esp),%edi\n"    // argc is on the top of the stack
        "   lea 4(%esp),%esi\n"     // as above, but with 4-byte pointers
        "   sub $8,%esp\n"          // the stack starts 16-byte aligned, we have to push 2*4 bytes; "waste" 8 bytes
        "   pushl %esi\n"           // to keep it aligned after pushing our arguments
        "   pushl %edi\n"
        "   call bare_main\n"       // call our bare_main
        "   add $8,%esp\n"          // fix the stack after call (actually useless here)
        "   movl %eax,%ebx\n"       // take the main return code and use it as first argument for...
        "   movl $1,%eax\n"         // ... the exit syscall
        "   int $0x80\n"
        "   int3\n");               // just in case

asm(
        "bare_write:\n"             // write syscall wrapper; convert the user-mode calling convention to the syscall convention
        "   pushl %ebx\n"           // ebx is callee-preserved
        "   movl 8(%esp),%ebx\n"    // just move stuff from the stack to the correct registers
        "   movl 12(%esp),%ecx\n"
        "   movl 16(%esp),%edx\n"
        "   mov $4,%eax\n"          // 4 = write syscall on i386
        "   int $0x80\n"
        "   popl %ebx\n"            // restore ebx
        "   ret\n");                // notice: the return value is already ok in %eax
#endif

int bare_write(int fd, const void *buf, unsigned count);

unsigned my_strlen(const char *ch) {
    const char *ptr;
    for(ptr = ch; *ptr; ++ptr);
    return ptr-ch;
}

int bare_main(int argc, char *argv[]) {
    for(int i = 0; i < argc; ++i) {
        int len = my_strlen(argv[i]);
        bare_write(1, argv[i], len);
        bare_write(1, "\n", 1);
    }
    return argc;
}

Notice that here several subtleties are ignored - in particular, the atexit bit. All the documentation about the machine-specific startup state has been extracted from the comments in the two glibc files linked above.

Toshikotoss answered 9/5, 2018 at 20:49 Comment(4)
Could you help with writing the number (argc) using the write syscall?Cacie
@CPerfomance: StackOverflow doesn't work this way. The original question has been answered conclusively, if you have other questions you should post them separately. Besides, what you are asking is not difficult at all - you'll just have to convert the integer to a string and then write it out using the write syscall (that I already provided). If you don't know how to convert an integer to string without using the standard library (i.e. how to write by yourself your itoa), probably you should study basic algorithms before working with this kind of ultra-specific/low-level stuff.Toshikotoss
I know that. I just wanted to make sure. Because I would probably have a way to print a direct number! Thank you for your complete answer.Cacie
You could implement bare_write as a regular function using GNU C Extended-asm, like long bare_write(int fd, const char *buf, unsigned long len) { long retval; asm volatile("syscall" : "=a"(retval) : "a"(__NR_write), "D"(fd), "S"(buf), "d"(len), "m"(*(const char(*)[])buf) : "rcx", "r11"); return retval; } (godbolt.org/g/a2W8zS). Then the compiler can take advantage of it only clobbering a couple registers, and not the arg-passing regs, and take advantage of fd = __NR_write.Doris
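The integer-to-string step mentioned in the comments above could be sketched as follows. This is only an illustration: format_uint is a hypothetical helper name, not something from the answer, and it assumes unsigned long is at most 64 bits wide (so at most 20 decimal digits).

```c
/* Hypothetical helper (not part of the original answer): writes the decimal
 * digits of value into buf and returns the number of digits written.
 * buf must have room for at least 21 bytes (20 digits + terminator). */
unsigned format_uint(unsigned long value, char *buf)
{
    char tmp[20];          /* digits are produced least-significant first */
    unsigned n = 0, i;

    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    for (i = 0; i < n; ++i)        /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

With the bare_write wrapper from the answer, printing argc would then look something like `char buf[21]; bare_write(1, buf, format_uint((unsigned long)argc, buf));`.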
7

This answer is for x86-64, 64-bit Linux ABI, only. All the other OSes and ABIs mentioned will be broadly similar, but different enough in the fine details that you will need to write your custom _start once for each.

You are looking for the specification of the initial process state in the "x86-64 psABI", or, to give it its full title, "System V Application Binary Interface, AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models)". I will reproduce figure 3.9, "Initial Process Stack", here:

Purpose                            Start Address                  Length
------------------------------------------------------------------------
Information block, including                                      varies
argument strings, environment
strings, auxiliary information
...
------------------------------------------------------------------------
Null auxiliary vector entry                                  1 eightbyte
Auxiliary vector entries...                            2 eightbytes each
0                                                              eightbyte
Environment pointers...                                 1 eightbyte each
0                                  8+8*argc+%rsp               eightbyte
Argument pointers...               8+%rsp                argc eightbytes
Argument count                     %rsp                        eightbyte

It goes on to say that the initial registers are unspecified except for %rsp, which is of course the stack pointer, and %rdx, which may contain "a function pointer to register with atexit".
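To make the figure concrete, here is a rough C sketch of how the three arrays can be located once the initial value of %rsp is known. The struct and function names (startup_info, parse_initial_stack) are made up for this illustration, and the code assumes an LP64 target where every slot in the figure is one pointer-sized eightbyte. It does not solve the problem of obtaining the initial %rsp; that still needs an assembly stub.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what the initial stack holds. */
struct startup_info {
    int    argc;
    char **argv;
    char **envp;
    void  *auxv;   /* first auxiliary vector entry */
};

/* sp is assumed to be the value %rsp had on entry to _start. */
struct startup_info parse_initial_stack(char **sp)
{
    struct startup_info info;
    char **p;

    info.argc = (int)(intptr_t)sp[0];        /* argument count at %rsp */
    info.argv = sp + 1;                      /* argument pointers at 8+%rsp */
    info.envp = info.argv + info.argc + 1;   /* skip argv[] and its 0 terminator */

    for (p = info.envp; *p != NULL; ++p)     /* walk to the end of envp[] */
        ;
    info.auxv = p + 1;                       /* auxv starts after envp's 0 */
    return info;
}
```

An assembly _start could pass the stack pointer along with something like `movq %rsp, %rdi` before calling a C function that uses this helper.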

So all the information you are looking for is already present in memory, but it hasn't been laid out according to the normal calling convention, which means you must write _start in assembly language. It is _start's responsibility to set everything up to call main with, based on the above. A minimal _start would look something like this:

_start:
        xorl   %ebp, %ebp       #  mark the deepest stack frame

  # Current Linux doesn't pass an atexit function,
  # so you could leave out this part of what the ABI doc says you should do
  # You can't just keep the function pointer in a call-preserved register
  # and call it manually, even if you know the program won't call exit
  # directly, because atexit functions must be called in reverse order
  # of registration; this one, if it exists, is meant to be called last.
        testq  %rdx, %rdx       #  is there "a function pointer to
        je     skip_atexit      #  register with atexit"?

        movq   %rdx, %rdi       #  if so, do it
        call   atexit

skip_atexit:
        movq   (%rsp), %rdi           #  load argc
        leaq   8(%rsp), %rsi          #  calc argv (pointer to the array on the stack)
        leaq   8(%rsp,%rdi,8), %rdx   #  calc envp (starts after the NULL terminator for argv[])
        call   main

        movl   %eax, %edi   # pass return value of main to exit
        call   exit

        hlt                 # should never get here

(Completely untested.)

(In case you're wondering why there's no adjustment to maintain stack pointer alignment, this is because upon a normal procedure call, 8(%rsp) is 16-byte aligned, but when _start is called, %rsp itself is 16-byte aligned. Each call instruction displaces %rsp down by eight, producing the alignment situation expected by normal compiled functions.)

A more thorough _start would do more things, such as clearing all the other registers, arranging for greater stack pointer alignment than the default if desired, calling into the C library's own initialization functions, setting up environ, initializing the state used by thread-local storage, doing something constructive with the auxiliary vector, etc.
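As one example of "doing something constructive with the auxiliary vector", a lookup over its entries might look like the sketch below. The names auxv_entry, auxv_lookup, AUX_NULL and AUX_PAGESZ are invented for this illustration; the numeric values 0 and 6 match AT_NULL and AT_PAGESZ in <elf.h>. It assumes you already have a pointer to the first entry.

```c
/* One auxiliary vector entry: two eightbytes on x86-64. */
struct auxv_entry {
    unsigned long type;
    unsigned long value;
};

/* Illustrative constants; same values as AT_NULL and AT_PAGESZ in <elf.h>. */
enum { AUX_NULL = 0, AUX_PAGESZ = 6 };

/* Scans entries until the null entry; returns 1 and stores the value in
 * *out if an entry of the requested type is found, otherwise returns 0. */
int auxv_lookup(const struct auxv_entry *auxv, unsigned long type,
                unsigned long *out)
{
    for (; auxv->type != AUX_NULL; ++auxv) {
        if (auxv->type == type) {
            *out = auxv->value;
            return 1;
        }
    }
    return 0;
}
```

A startup routine could use this to learn the page size without any system call, for instance `auxv_lookup(auxv, AUX_PAGESZ, &pagesz)`.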

You should also be aware that if there is a dynamic linker (a PT_INTERP program header in the executable), it receives control before _start does. Glibc's ld.so cannot be used with any C library other than glibc itself; if you are writing your own C library, and you want to support dynamic linkage, you will also need to write your own ld.so. (Yes, this is unfortunate; ideally, the dynamic linker would be a separate development project and its complete interface would be specified.)

Sahara answered 9/5, 2018 at 20:31 Comment(0)
5

As a quick and dirty hack, you can make an executable with a compiled C function as the ELF entry point. Just make sure you use exit or _exit instead of returning.

(Link with gcc -nostartfiles to omit CRT but still link other libraries, and write a _start() in C. Beware of ABI violations like stack alignment, e.g. use -mincoming-stack-boundary=2 or an __attribute__ on _start, as in Compiling without libc)

If it's dynamically linked, you can still use glibc functions on Linux (because the dynamic linker runs glibc's init functions). Not all systems are like this, e.g. on cygwin you definitely can't call libc functions if you (or the CRT start code) haven't called the libc init functions in the correct order. I'm not sure it's even guaranteed that this works on Linux, so don't depend on it except for experimentation on your own system.

I have used a C _start(void){ ... } + calling _exit() for making a static executable to microbenchmark some compiler-generated code with less startup overhead for perf stat ./a.out.

Glibc's _exit() works even if glibc wasn't initialized (gcc -O3 -static), or use inline asm to run xor %edi,%edi / mov $60, %eax / syscall (sys_exit(0) on Linux) so you don't have to even statically link libc. (gcc -O3 -nostdlib)
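The inline-asm route could look roughly like this for x86-64 Linux only. It is a sketch using GNU C extended asm; raw_syscall3 and bare_exit are names made up here, and the syscall numbers 1 (write) and 60 (exit) are the x86-64 Linux values already used elsewhere on this page. Other architectures and OSes need different numbers and instructions.

```c
/* x86-64 Linux only: issue a system call with up to three arguments.
 * The syscall instruction clobbers rcx and r11; "memory" tells the
 * compiler that the kernel may read or write buffers we pass in. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* negative errno value on failure */
}

/* write(2) without libc; 1 is the write syscall number on x86-64. */
long bare_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}

/* exit(2) without libc; 60 is the exit syscall number on x86-64. */
__attribute__((noreturn)) void bare_exit(int status)
{
    raw_syscall3(60, status, 0, 0);
    for (;;) { }  /* not reached; keeps the noreturn promise */
}
```

Because the raw syscall returns the kernel's result directly, errors come back as negative errno values rather than through errno.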


With even more dirty hacking and UB, you can access argc and argv by knowing the x86-64 System V ABI that you're compiling for (see @zwol's answer for a quote from the ABI doc), and how the process startup state differs from the function calling convention:

  • argc is where the return address would be for a normal function (pointed to by RSP). GNU C has a builtin for accessing the return address of the current function (or for walking up the stack.)
  • argv[0] is where the 7th integer/pointer arg should be (the first stack arg, just above the return address). It happens to / seems to work to take its address and use that as an array!
// Works only for the x86-64 SystemV ABI; only tested on Linux.
// DO NOT USE THIS EXCEPT FOR EXPERIMENTS ON YOUR OWN COMPUTER.

#include <stdio.h>
#include <stdlib.h>

// tell gcc *this* function is called with a misaligned RSP
__attribute__((force_align_arg_pointer))
void _start(int dummy1, int dummy2, int dummy3, int dummy4, int dummy5, int dummy6, // register args
        char *argv0) {

    int argc = (int)(long)__builtin_return_address(0);  // load (%rsp), casts to silence gcc warnings.
    char **argv = &argv0;

    printf("argc = %d, argv[argc-1] = %s\n", argc, argv[argc-1]);

    printf("%f\n", 1.234);  // segfaults if RSP is misaligned
    exit(0);
    //_exit(0);  // without flushing stdio buffers!
}
   # with a version without the FP printf
peter@volta:~/src/SO$ gcc -nostartfiles _start.c -o bare_start 
peter@volta:~/src/SO$ ./bare_start 
argc = 1, argv[argc-1] = ./bare_start
peter@volta:~/src/SO$ ./bare_start abc def hij
argc = 4, argv[argc-1] = hij
peter@volta:~/src/SO$ file bare_start
bare_start: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=af27c8416b31bb74628ef9eec51a8fc84e49550c, not stripped
 # I could have used  -fno-pie -no-pie to make a non-PIE executable

This works with or without optimization, with gcc7.3. I was worried that without optimization, the address of argv0 would be below rbp where it copies the arg, rather than its original location. But apparently it works.

gcc -nostartfiles links glibc but not the CRT start files.

gcc -nostdlib omits both libraries and CRT startup files.

Very little of this is guaranteed to work, but it does in practice work with current gcc on current x86-64 Linux, and has worked in the past for years. If it breaks, you get to keep both pieces. IDK what C features are broken by omitting the CRT startup code and just relying on the dynamic linker to run glibc init functions. Also, taking the address of an arg and accessing pointers above it is UB, so you could maybe get broken code-gen. gcc7.3 happens to do what you'd expect in this case.

Things that definitely break

  • atexit() cleanup, e.g. flushing stdio buffers.
  • static destructors for static objects in dynamically-linked libraries. (On entry to _start, RDX is a function pointer you should register with atexit for this reason. In a dynamically linked executable, the dynamic linker runs before your _start and sets RDX before jumping to your _start. Statically linked executables have RDX=0 under Linux.)

gcc -mincoming-stack-boundary=3 (i.e. 2^3 = 8 bytes) is another way to get gcc to realign the stack, because the -mpreferred-stack-boundary=4 default of 2^4 = 16 is still in place. But that makes gcc assume under-aligned RSP for all functions, not just for _start, which is why I looked in the docs and found an attribute that was intended for 32-bit when the ABI transitioned from only requiring 4-byte stack alignment to the current requirement of 16-byte alignment for ESP in 32-bit mode.

The SysV ABI requirement for 64-bit mode has always been 16-byte alignment, but gcc options let you make code that doesn't follow the ABI.

// test call to a function the compiler can't inline
// to see if gcc emits extra code to re-align the stack

// like it would if we'd used -mincoming-stack-boundary=3 to assume *all* functions
// have only 8-byte (2^3) aligned RSP on entry, with the default -mpreferred-stack-boundary=4
void foo() {
    int i = 0;
    atoi(NULL);
}

With -mincoming-stack-boundary=3, we get stack-realignment code there, where we don't need it. gcc's stack-realignment code is pretty clunky, so we'd like to avoid that. (Not that you'd really ever use this to compile a significant program where you care about efficiency, please only use this stupid computer trick as a learning experiment.)

But anyway, see the code on the Godbolt compiler explorer with and without -mincoming-stack-boundary=3.

Doris answered 11/5, 2018 at 1:55 Comment(2)
Fun experiment, but this is several levels beyond scary - I wouldn't trust even a toy project to work with this stuff. :-DToshikotoss
@MatteoItalia: Just to repeat the most important part of this answer: If it breaks, you get to keep both pieces. :)Doris
