How can I trace all accesses to a memory block?

T

3

5

I am looking to log all accesses to addresses in a heap-allocated block pointed to by a given pointer. In the following example, that would mean monitoring all accesses to the 1024 bytes starting at p.

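#define BLK 1024
#include <stdlib.h>

int monfun(int *p){
    int t = 0;
    /* BLK is a size in bytes, so iterate over BLK / sizeof(int) elements */
    for(size_t i=0; i<BLK/sizeof(int); i++){
        t=p[i];
    }
    return t;
}

int main(void){
    int *p=(int*)malloc(BLK);
    if(p == NULL)
        return 1;
    monfun(p);
    free(p);
    return 0;
}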

And obtain something like:

w 4b 0x12345
r 4b 0x12346
r 8b 0x12348
...

Performance is not a problem: this will likely slow down execution by a good deal and require a lot of extra memory, and that is acceptable. What I am looking for is precise measurements.

I have been exploring valgrind; its massif tool gives me a tree with allocation/free pairs. I keep reading online that valgrind can do what I need, but I can only get access counters rather than the actual sequence of touched addresses.

I have read about other tools, such as radare2, qemu-mtrace, a tracegrind plugin for valgrind (for v3.15; I have v3.21) and Intel's Pin, but at this point I am not sure which tool/manual to invest my learning in to get this done.

If this can be done with valgrind, could you please guide me on how to achieve it?

Tenure answered 6/9, 2023 at 22:10 Comment(3)

This could be very expensive in terms of performance and memory use, but I agree there are times when it would be handy even with those caveats. There used to be a thing called hardware breakpoints where you could set the CPU up to generate a break for accessing a particular range of memory, but I have no idea if/how you could invoke such a feature on contemporary operating systems. – Valleau 6/9, 2023 at 22:18

Performance is not an issue, but thanks for the name. I will read about it. I wonder how I could know beforehand which address to associate the breakpoint with. Perhaps good ol' gdb is the way to go. – Tenure 6/9, 2023 at 22:22

There's some discussion about doing it from GDB in this thread. – Valleau 6/9, 2023 at 22:26

V

4

Using a debugger could be a solution, but definitely a clunky one. What you really need is either some kind of compile-time instrumentation (not sure how, though; maybe an LLVM plugin, which is a pretty hardcore solution) or emulation/dynamic binary instrumentation (QEMU, Intel PIN, DynamoRIO, etc.).
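For code you are free to modify, a much cruder relative of compile-time instrumentation is to route every access to the block through logging helpers by hand. The following is only a minimal sketch of that idea, not an LLVM plugin: traced_read(), traced_write() and traced_count are hypothetical names, the output format imitates the one asked for in the question, and only accesses that actually go through the helpers are seen.

```c
#include <stdio.h>
#include <stddef.h>

/* Number of accesses logged so far (handy for sanity checks). */
static unsigned long traced_count;

/* Log and perform a read of p[i], in the "r <size>b <address>" style. */
int traced_read(const int *p, size_t i) {
    fprintf(stderr, "r %zub %p\n", sizeof p[i], (const void *)&p[i]);
    traced_count++;
    return p[i];
}

/* Log and perform a write of v to p[i], in the "w <size>b <address>" style. */
void traced_write(int *p, size_t i, int v) {
    fprintf(stderr, "w %zub %p\n", sizeof p[i], (void *)&p[i]);
    traced_count++;
    p[i] = v;
}
```

Replacing t=p[i] with t=traced_read(p, i) in the monitored code would log one line per access. Anything that touches the block without the helpers (library calls such as memcpy(), for instance) stays invisible, which is why the binary-level tools below are the more robust choice.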

QEMU user

You can achieve this by using the tracing events memory_region_ops_{read,write} or by implementing your own. See QEMU doc about tracing for more info. You probably need a recent version of QEMU for this, for example, the QEMU user on my Debian 11 system does not support these events. Compiling QEMU from source is pretty easy anyway, you could try with the latest. The downside of this approach is that it could be quite slow and you will need to filter out the output since this will log all reads and writes.

Intel PIN

Assuming you are using an x86 CPU, with PIN you can do this and much more once you get familiar enough with the tool. Both the documentation and the PIN kit obtainable from the download page have examples that do almost exactly what you want. Download the latest version of the PIN kit and look at these files:

source/tools/ManualExamples/pinatrace.cpp for memory R/W tracing
source/tools/ManualExamples/malloctrace.cpp for tracing malloc() calls

It should be pretty easy to combine the two into a single tracing tool. The only functionality to add is dumping values from memory, which can be done with PIN_SafeCopy().

I wrote a tracing tool combining code from the two examples above that should do what you want. It should contain enough comments to understand what is going on, and the examples contain even more comments, so you can read those if something is unclear.

It expects a call to a dummy function trace_next_allocation() to enable tracing of the chunk allocated by the next malloc() call. The chunk is then automatically un-tracked when freed through free().

Here's the code (see below for a usage example):

/**
 * tracechunk.cpp
 * 
 * Copyright (C) 2004-2021 Intel Corporation.
 * Copyright (C) 2023 Marco Bonelli.
 * SPDX-License-Identifier: MIT
 */

#include "pin.H"
#include <iostream>
#include <fstream>
#include <iomanip> // std::setw, std::setfill

struct tracked_malloc_chunk {
    ADDRINT start;
    ADDRINT end;
    ADDRINT size;
};

struct last_write_info {
    ADDRINT addr;
    UINT32 size;
};

bool tracking = false;
bool track_next = false;
struct tracked_malloc_chunk tracked_chunk;
struct last_write_info last_write;

static PIN_LOCK pin_lock;
std::ofstream trace_file;
KNOB<std::string> trace_output_fname(KNOB_MODE_WRITEONCE, "pintool", "o", "tracechunk.out", "specify trace file name");

/**
 * Dump `size` bytes of tracee memory starting at `addr`.
 */
void hexdump(ADDRINT addr, UINT32 size) {
    char old_fill = trace_file.fill();
    static UINT8 data[512];
    size_t actual;

    actual = PIN_SafeCopy(data, (void *)addr, size);

    trace_file << std::noshowbase << std::setfill('0');
    for (UINT32 i = 0; i < actual; i++)
        trace_file << std::setw(2) << (unsigned)data[i];

    if (actual != (size_t)size)
        trace_file << " (err: could only read " << actual << " bytes)";

    trace_file << std::endl << std::showbase << std::setfill(old_fill);
}

/**
 * Executed *before* a malloc() call: save the allocation size for later.
 */
VOID malloc_before(ADDRINT size) {
    if (!track_next)
        return;

    tracked_chunk.size = size;
}

/**
 * Executed *after* a malloc() call: save the chunk address and start tracing.
 */
VOID malloc_after(ADDRINT retval) {
    // Ignore allocations we were not asked to track
    if (!track_next || !tracked_chunk.size)
        return;

    if (retval == 0) {
        trace_file << "ERROR: malloc() call to track failed!" << std::endl;
        return;
    }

    tracked_chunk.start = retval;
    tracked_chunk.end = retval + tracked_chunk.size;

    trace_file << "START tracking memory R/W for chunk ["
        << tracked_chunk.start << ","
        << tracked_chunk.end << ") of size "
        << tracked_chunk.size << std::endl;

    tracking = true;
    track_next = false;
}

/**
 * Executed *before* a free() call: stop tracing if the chunk we are currently
 * tracking is freed.
 */
VOID free_before(ADDRINT addr) {
    if (!tracking || addr != tracked_chunk.start)
        return;

    trace_file << "STOP  tracking memory R/W for chunk ["
        << tracked_chunk.start << ","
        << tracked_chunk.end << ") of size "
        << tracked_chunk.size << std::endl;

    tracking = false;
    tracked_chunk = {0};
}

/**
 * Enable tracking of R/W operations for the chunk returned by the next malloc()
 * invocation.
 */
VOID enable_track_next(void) {
    track_next = true;
}

/**
 * Executed *before* a read operation: dump instruction pointer, address, size
 * and memory content.
 */
VOID trace_read(ADDRINT ip, ADDRINT addr, UINT32 size) {
    ADDRINT offset;

    if (!tracked_chunk.size || !size)
        return;

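    // ADDRINT is unsigned, so an address below the chunk start wraps around to
    // a huge offset and is rejected by the upper-bound check alone
    offset = addr - tracked_chunk.start;
    if (offset >= tracked_chunk.size)
        return;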

    trace_file << ip << ": READ  of size " << size << " at " << addr
        << " (offset " << offset << "): ";

    PIN_GetLock(&pin_lock, ip);
    hexdump(addr, size);
    PIN_ReleaseLock(&pin_lock);
}

/**
 * Executed *before* a write operation: dump instruction pointer, address and
 * size, then save write address and size for later.
 */
VOID trace_write_before(ADDRINT ip, ADDRINT addr, UINT32 size) {
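    // ADDRINT is unsigned, so an address below the chunk start wraps around to
    // a huge offset and is rejected by the upper-bound check alone
    ADDRINT offset = addr - tracked_chunk.start;

    if (!tracked_chunk.size || !size || offset >= tracked_chunk.size)
        return;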

    last_write.addr = addr;
    last_write.size = size;

    trace_file << ip << ": WRITE of size " << size << " at " << addr
        << " (offset " << offset << "): ";
}

/**
 * Executed *after* a write operation: dump memory content (bytes in memory
 * order, lowest address first) from previously saved address and size.
 */
VOID trace_write_after(ADDRINT ip) {
    if (!last_write.size)
        return;

    PIN_GetLock(&pin_lock, ip);
    hexdump(last_write.addr, last_write.size);
    PIN_ReleaseLock(&pin_lock);

    last_write.size = 0;
}

VOID Image(IMG img, VOID* v) {
    // Instrument malloc() to save the allocation size and the chunk address
    // when we want to trace the next allocation
    RTN mallocRtn = RTN_FindByName(img, "malloc");
    if (RTN_Valid(mallocRtn)) {
        RTN_Open(mallocRtn);
        RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)malloc_before,
            IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
        RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)malloc_after,
            IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
        RTN_Close(mallocRtn);
    }

    // Instrument free() to stop tracing the chunk
    RTN freeRtn = RTN_FindByName(img, "free");
    if (RTN_Valid(freeRtn)) {
        RTN_Open(freeRtn);
        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)free_before,
            IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
        RTN_Close(freeRtn);
    }

    // Instrument the dummy trace_next_allocation() function to enable tracing
    // the next malloc() allocation
    RTN triggerRtn = RTN_FindByName(img, "trace_next_allocation");
    if (RTN_Valid(triggerRtn)) {
        RTN_Open(triggerRtn);
        RTN_InsertCall(triggerRtn, IPOINT_BEFORE, (AFUNPTR)enable_track_next,
            IARG_END);
        RTN_Close(triggerRtn);
    }
}

VOID Instruction(INS ins, VOID* v) {
    UINT32 n = INS_MemoryOperandCount(ins);

    for (UINT32 i = 0; i < n; i++) {
        // Instrument read operations to dump address, size and memory content
        if (INS_MemoryOperandIsRead(ins, i)) {
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)trace_read,
                IARG_INST_PTR, IARG_MEMORYOP_EA, i, IARG_MEMORYOP_SIZE, i,
                IARG_END);
        }

        // Instrument write operations to dump address, size and memory content.
        // This needs to be done in two steps as we can't get both the effective
        // address and the memory content at the same time (the effective
        // address and the size are not provided at IPOINT_AFTER).
        if (INS_MemoryOperandIsWritten(ins, i)) {
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE,
                (AFUNPTR)trace_write_before, IARG_INST_PTR, IARG_MEMORYOP_EA, i,
                IARG_MEMORYOP_SIZE, i, IARG_END);

            if (INS_IsValidForIpointAfter(ins)) {
                INS_InsertPredicatedCall(ins, IPOINT_AFTER,
                    (AFUNPTR)trace_write_after, IARG_END);
            }
        }
    }
}

VOID Fini(INT32 code, VOID* v) {
    trace_file.close();
}

INT32 Usage() {
    std::cerr << "This tool produces a trace of memory read/write operations"
        << " on specific malloc() chunks" << std::endl;
    std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
    return 1;
}

int main(int argc, char **argv) {
    PIN_InitSymbols();

    if (PIN_Init(argc, argv))
        return Usage();

    // Write to a file since stdout and stderr may be closed by the application
    trace_file.open(trace_output_fname.Value().c_str());
    trace_file << std::hex << std::showbase;

    IMG_AddInstrumentFunction(Image, 0);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);

    PIN_StartProgram();

    return 0;
}

Here's an example program to trace:

// example.c
#include <stdlib.h>

#define N 4

// Do not optimize away, we need this function to be called to enable tracing
void __attribute__((optimize("O0"))) trace_next_allocation(void) {}

int main(void){
    volatile int *chunk;

    trace_next_allocation();
    chunk = malloc(N * sizeof(int));

    for(unsigned i = 0; i < N; i++) {
        chunk[i] += 123;
    }

    // Cast away volatile: free() takes a plain void *
    free((void *)chunk);

    return 0;
}

The tool can then be used like this (where /path/to/pin-xxx is the path to the extracted PIN kit obtained from the download page):

cd /path/to/pin-xxx/source/tools/ManualExamples

# Write the tool code in a file named tracechunk.cpp inside this directory...

# Compile the tool
make obj-intel64/tracechunk.so

# Compile example program
gcc -o example example.c

# Trace with PIN
../../../pin -t obj-intel64/tracechunk.so -- ./example

# Show trace output
cat tracechunk.out

I tested it with PIN 3.28 and the output looks like this:

START tracking memory R/W for chunk [0x55ec376bb2a0,0x55ec376bb2b0) of size 0x10
0x55ec3591d07a: READ  of size 0x4 at 0x55ec376bb2a0 (offset 0): 00000000
0x55ec3591d083: WRITE of size 0x4 at 0x55ec376bb2a0 (offset 0): 7b000000
0x55ec3591d07a: READ  of size 0x4 at 0x55ec376bb2a4 (offset 0x4): 00000000
0x55ec3591d083: WRITE of size 0x4 at 0x55ec376bb2a4 (offset 0x4): 7b000000
0x55ec3591d07a: READ  of size 0x4 at 0x55ec376bb2a8 (offset 0x8): 00000000
0x55ec3591d083: WRITE of size 0x4 at 0x55ec376bb2a8 (offset 0x8): 7b000000
0x55ec3591d07a: READ  of size 0x4 at 0x55ec376bb2ac (offset 0xc): 00000000
0x55ec3591d083: WRITE of size 0x4 at 0x55ec376bb2ac (offset 0xc): 7b000000
STOP  tracking memory R/W for chunk [0x55ec376bb2a0,0x55ec376bb2b0) of size 0x10
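The hex dump at the end of each trace line lists the bytes in memory order, lowest address first, so on little-endian x86 the value 123 (0x7b) shows up as 7b000000. As a minimal sketch for post-processing the trace, the hypothetical helper decode_le() below turns such a dump back into an integer, assuming a little-endian tracee and accesses of at most 8 bytes.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode a memory-order hex dump such as "7b000000" into an integer,
 * treating the first byte as the least significant one (little endian).
 * Returns false on malformed input or dumps longer than 8 bytes.
 */
bool decode_le(const char *dump, uint64_t *value) {
    size_t len = strlen(dump);
    if (len == 0 || len % 2 != 0 || len > 16)
        return false;

    uint64_t v = 0;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_digit(dump[2 * i]);
        int lo = hex_digit(dump[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        /* Byte i of the dump contributes bits 8*i .. 8*i+7 of the value. */
        v |= (uint64_t)(hi * 16 + lo) << (8 * i);
    }
    *value = v;
    return true;
}
```

Applied to the WRITE lines above, decode_le("7b000000", &v) yields 123, matching the chunk[i] += 123 in the example program.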

Vietnam answered 7/9, 2023 at 1:5 Comment(3)

Although my laptop is Intel, the final experiments will run on an AMD Threadripper. Given your caveat "assuming you are using Intel", I am wondering what problems I could face with a non-Intel CPU. – Tenure 7/9, 2023 at 15:25

@Tenure yeah it works for AMD too, I should have stated "assuming you are using an x86 CPU", my bad there. Fixed! – Vietnam 7/9, 2023 at 16:25

Amazing. I will dig the docs and start playing with it. I will come back and mark this as the answer once I get numbers myself. Thanks a lot for the time and knowledge sharing. My deepest respects. – Tenure 7/9, 2023 at 17:48

C

2

Apart from valgrind and debugger watches, another option is ASan, the address sanitizer (available in GCC and LLVM/clang).

The address sanitizer usually looks for memory bugs, but it's possible to:

manually poison a region of memory,
compile with -fsanitize-recover=address,
and then run your code with ASAN_OPTIONS=halt_on_error=0
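A minimal sketch of these steps, assuming the ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION macros from <sanitizer/asan_interface.h>; sum_block() and run_poisoned() are hypothetical names standing in for the code being monitored.

```c
#include <stdlib.h>
#include <sanitizer/asan_interface.h>

#define BLK 1024

/* Hypothetical stand-in for the code whose accesses should be reported. */
int sum_block(const int *p, size_t n) {
    int t = 0;
    for (size_t i = 0; i < n; i++)
        t += p[i];   /* under ASan, reading a poisoned address is reported */
    return t;
}

/* Allocate a zeroed block, poison it, and run the monitored code on it. */
int run_poisoned(void) {
    size_t n = BLK / sizeof(int);
    int *p = calloc(n, sizeof(int));
    if (p == NULL)
        return -1;

    /* From here on, ASan treats any access to these BLK bytes as an error.
     * Without -fsanitize=address the macros expand to no-ops. */
    ASAN_POISON_MEMORY_REGION(p, BLK);
    int t = sum_block(p, n);

    /* Unpoison before handing the block back to the allocator. */
    ASAN_UNPOISON_MEMORY_REGION(p, BLK);
    free(p);
    return t;
}
```

Built with -fsanitize=address -fsanitize-recover=address and run with ASAN_OPTIONS=halt_on_error=0, the program keeps going after each report instead of aborting. Two caveats, both worth verifying against your toolchain: the output is a full ASan error report per access rather than a compact trace line, and as far as I know ASan deduplicates reports coming from the same code location in this mode unless suppress_equal_pcs=0 is added to ASAN_OPTIONS, so a loop like the one above may otherwise be reported only once.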

Collagen answered 7/9, 2023 at 0:13 Comment(0)

W

2

Have you tried Valgrind lackey?

It sounds like you want to use the options --basic-counts=no --trace-mem=yes

The manual is here

Slightly unhelpfully, it says to look at the source to understand the format (the tool is intended as an example rather than an end-user tool). The output includes instruction reads. For example, part of the output I get when running the true command is

I  04a5dca6,5
 S 1ffeffe498,8
I  04ae9d70,3
I  04ae9d73,7
 L 04deae50,8
I  04ae9d7a,6

That's I for instruction read, S for data store and L for data load. That's followed by the address and the size.

You can filter the instruction reads using something like '| grep -v ^I' (if your font doesn't make that clear it is "pipe grep -v caret capital-i"). There will still be a large amount of output since Valgrind starts instrumenting from the very beginning of program startup. There aren't any client requests to turn instrumentation on and off.
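Since Lackey traces the whole run, the accesses to one particular block have to be picked out afterwards. A minimal sketch of such a filter in C follows, assuming the line format is exactly the one shown above and that the block's start address and length are known (for example because the traced program prints them); parse_lackey_line() and touches_block() are hypothetical helper names.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* One data access parsed from a Lackey --trace-mem=yes line. */
struct access {
    char kind;        /* 'L' load, 'S' store, 'M' modify */
    uint64_t addr;    /* accessed address */
    unsigned size;    /* access size in bytes */
};

/*
 * Parse a line such as " S 1ffeffe498,8". Returns true for data accesses
 * and false for instruction reads ("I  ...") or anything unrecognised.
 */
bool parse_lackey_line(const char *line, struct access *out) {
    char kind;
    unsigned long long addr;
    unsigned size;

    /* Address is hexadecimal without a 0x prefix, size is decimal. */
    if (sscanf(line, " %c %llx,%u", &kind, &addr, &size) != 3)
        return false;
    if (kind != 'L' && kind != 'S' && kind != 'M')
        return false;

    out->kind = kind;
    out->addr = addr;
    out->size = size;
    return true;
}

/* True if the access overlaps the block [start, start + len). */
bool touches_block(const struct access *a, uint64_t start, uint64_t len) {
    return a->addr < start + len && a->addr + a->size > start;
}
```

A small driver would read the trace line by line with fgets() and print only the lines for which both functions return true. Valgrind writes to stderr by default, so the trace needs to be redirected (or sent to a file with --log-file) before it can be filtered.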

Waspish answered 8/9, 2023 at 7:47 Comment(1)

this looks great. I will absolutely give it a try after the Pin approach. thanks. – Tenure 8/9, 2023 at 19:4
