Find argc and argv from a library

Asked 21/1, 2016 at 5:14 Answered 20/11, 2019 at 14:16

How do I find a program's argc and argv from a shared object? I am writing a library in C that will be loaded via LD_PRELOAD. I've been able to find the stack two different ways:

Read rsp via inline __asm__ call.
Read /proc/<pid>/maps and parse the entry for stack.

I can then create a pointer, point it at the stack segment, then iterate through looking for data. The problem is I can't figure out an efficient way to determine what bytes are argc and the pointer to the pointer to the argv strings.
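For reference, the second approach (parsing the maps file) might look roughly like the sketch below. It assumes Linux with procfs mounted; `find_stack_bounds` is a made-up name, not an existing API. It only finds where the main thread's stack mapping begins and ends, and does nothing to identify argc or argv inside it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the "[stack]" mapping of the calling process.
 * Returns 0 on success and stores the bounds, -1 on failure. */
int find_stack_bounds(unsigned long *start, unsigned long *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line starts with "start-end perms offset dev inode [path]". */
        if (strstr(line, "[stack]") == NULL)
            continue;
        if (sscanf(line, "%lx-%lx", start, end) == 2)
            found = 0;
        break;
    }
    fclose(maps);
    return found;
}
```

The kernel places argc, argv and envp near the high end of that range, so `end` is the more interesting bound when scanning.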

I know that /proc/<pid>/cmdline also contains the arguments, each separated by 0x00, but I'm interested in finding everything in memory.

In gdb I see a DWORD for argc followed by a QWORD which is the first pointer. 20 bytes before the address of argc is a pointer that points back into the main program's code segment. But that's not a deterministic way to identify argc and argv.

I've seen a few posts but no working code:

Battologize answered 21/1, 2016 at 5:14 Comment(6)

Seems a bit of a flaky way to do it, being dependent on how the compiler uses the stack. That is likely to change as soon as someone finds a compiler/runtime optimisation. The application may also want to use the same args in a different sense; this could cause problems if your lib tries to interpret params not targeted at it. Can you not pass these directly to your library in a 'constructor' call? Yes, I understand that what you want to do is avoid this overhead. – Flatiron 21/1, 2016 at 5:28

At what point the argc and argv would be accessed? It is probably impossible during LD_PRELOAD phase. – Catrinacatriona 21/1, 2016 at 5:31

It's also perfectly legitimate for a program to modify the data in argv. I'm not sure what happens to the stack in this case. – Ursala 21/1, 2016 at 5:32

@ChrisR: What overhead, exactly? Calling an initializer? – Pfeifer 21/1, 2016 at 6:51

@Pfeifer - yes, a programming overhead, I should have made that explicit. The code may in fact run faster. – Flatiron 21/1, 2016 at 7:8

@ChrisR: Ah, ok. The programming overhead seems minimal to me, certainly less than scanning the stack for something useful. But everyone has their own aesthetic. – Pfeifer 21/1, 2016 at 7:12

This response in your second link contains working source code which worked fine for me (GNU/Linux ELF-based system), including during LD_PRELOAD.

The code is very short; it consists of a function:

int foo(int argc, char **argv, char **env) {
   // Do something with argc, argv (and env, if desired)
   return 0;
}

and a pointer to that function in the .init_array section:

// "used" keeps the compiler from discarding the otherwise unreferenced static pointer
__attribute__((section(".init_array"), used)) static void *foo_constructor = &foo;

Putting that into a shared library and then LD_PRELOADing the shared library certainly triggered the call to foo when I tried it, and it was clearly called with the argc and argv which would later be passed to main (and also the value of environ).
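If the values are needed later rather than only at load time, a variation is to stash them in globals. The sketch below uses GCC's `constructor` attribute, which also lands in `.init_array`. It assumes glibc, which calls these entries with (argc, argv, envp); other C libraries may not pass the arguments at all. The names `capture_args`, `lib_argc` and `lib_argv` are made up for illustration.

```c
/* Hypothetical globals where the library keeps what it was handed. */
static int saved_argc;
static char **saved_argv;

/* Assumption: glibc calls .init_array entries as fn(argc, argv, envp).
 * This is not guaranteed by the C standard or by every libc. */
__attribute__((constructor))
static void capture_args(int argc, char **argv, char **envp)
{
    (void)envp;          /* unused in this sketch */
    saved_argc = argc;
    saved_argv = argv;
}

/* Accessors the rest of the library could call later. */
int lib_argc(void) { return saved_argc; }
char **lib_argv(void) { return saved_argv; }
```

Built with something like `gcc -shared -fPIC -o libcapture.so capture.c` (file names are placeholders) and loaded via LD_PRELOAD, the accessors should return the same values main receives.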

Pfeifer answered 21/1, 2016 at 6:11 Comment(1)

Great answer! I'd never thought of running a constructor this way in a library. No fiddling with compiler or run time dependencies. love it. I learn every day. – Flatiron 21/1, 2016 at 7:14

This is a bad idea, but I'm not naive enough to say you don't have a valid reason for it.

There's no good way to find argc/argv if all you know is the location of the stack. Luckily, envp is directly after argv on the stack, and every libc that I know of puts envp in the __environ global. So by going backwards from __environ, you can find argc and argv. Here's some example code written in Rust, which should be pretty easy to port to C++:

use std::os::raw::{c_char, c_int};

extern "C" {
    pub static __environ: *const *const c_char;
}

fn raw_args() -> (c_int, *const *const c_char) {
    let mut walk_environ = unsafe { __environ as *const usize };
    walk_environ = walk_environ.wrapping_offset(-1);
    let mut i = 0;

    loop {
        let argc_ptr = walk_environ.wrapping_offset(-1) as *const c_int;
        let argc = unsafe { *argc_ptr };
        if argc == i {
            break (argc, walk_environ as *const *const c_char);
        }
        walk_environ = walk_environ.wrapping_offset(-1);
        i += 1;
    }
}
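A rough C port of the same idea, as a sketch rather than a drop-in solution. It differs from the Rust version in two ways that are my own choices: it compares a full machine word instead of a c_int (on Linux the kernel stores argc in a word-sized slot), and it takes a `max_args` bound so the scan cannot run away. `find_args_from_environ` is a made-up name. It assumes `environ` still points at the original envp array on the initial stack, which stops being true once something calls setenv() or putenv(). It also walks outside any C object, which is formally undefined behaviour.

```c
#include <stddef.h>
#include <stdint.h>

extern char **environ;

/* Hypothetical helper; max_args bounds the backwards scan.
 * Returns argc and stores argv in *argv_out, or returns -1 if no match. */
long find_args_from_environ(char ***argv_out, long max_args)
{
    if (environ == NULL)
        return -1;

    /* Assumption: environ is untouched, so the slot just below it is the
     * NULL pointer that terminates argv. */
    uintptr_t *slot = (uintptr_t *)environ - 1;

    for (long count = 0; count <= max_args; count++) {
        /* If the word just below the candidate argv start equals the number
         * of slots walked so far, treat that word as argc. */
        if (slot[-1] == (uintptr_t)count) {
            *argv_out = (char **)slot;
            return count;
        }
        slot--;
    }
    return -1;
}
```

A caller would do something like `char **av; long ac = find_args_from_environ(&av, 4096);` and check for -1 before touching `av`.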

Eliath answered 20/11, 2019 at 14:16 Comment(0)

The most reliable approach is probably to use /proc/<pid>/cmdline, because that's provided by the kernel and doesn't depend on the C implementation (which itself varies with, for example, the processor you're using).
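A minimal sketch of reading it, assuming Linux with procfs mounted. `read_cmdline` is a made-up helper name. It reads the raw bytes into a caller-supplied buffer and counts the NUL terminators, so an argument cut off by a too-small buffer is not counted.

```c
#include <stdio.h>

/* Hypothetical helper: reads /proc/self/cmdline into buf, stores the number
 * of bytes read in *len_out, and returns the number of complete
 * NUL-terminated arguments found, or -1 on error. */
int read_cmdline(char *buf, size_t bufsize, size_t *len_out)
{
    FILE *f = fopen("/proc/self/cmdline", "rb");
    if (f == NULL)
        return -1;

    size_t len = fread(buf, 1, bufsize, f);
    fclose(f);

    int count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0')   /* each argument ends with a NUL byte */
            count++;
    }
    *len_out = len;
    return count;
}
```

The first argument starts at `buf`, and each following one starts right after the previous NUL, up to `buf + len`.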

The problem is that on some platforms the arguments to a function (e.g. main) are passed on the stack, while on others (e.g. x86-64) they are passed in registers. If they arrive in registers and optimizations are enabled, main will not store them in memory unless it needs to; that is, they are unlikely to remain in memory unless you explicitly save them yourself.

Even if the arguments are passed on the stack, the exact location of main's arguments may differ from version to version of the compiler/implementation. This means there is hardly any reliable method of retrieving them from the stack (and, as someone pointed out, they may be modified during execution of main as part of command-line parsing).

Even the way the kernel passes the arguments to the program doesn't help much as they are passed via registers - which means that where they're going to be stored is entirely up to the CRT init (which in turn may change from version to version).

In short, retrieving argv and argc later on requires explicit support from the CRT you're using (Microsoft's CRT does that, but GNU doesn't AFAIK).

What you could do, of course, is grab the source of GCC and patch the CRT init to store argv and argc somewhere you can later retrieve them. That would of course not work if you need to access them before the program's CRT init has run (e.g. during dynamic linking).

Overijssel answered 21/1, 2016 at 6:33 Comment(0)
