Finding the address range of the data segment
S

5

22

As a programming exercise, I am writing a mark-and-sweep garbage collector in C. I wish to scan the data segment (globals, etc.) for pointers to allocated memory, but I don't know how to get the range of the addresses of this segment. How could I do this?

Seignior answered 29/11, 2010 at 23:3 Comment(4)
I agree, but is there any way to get this inside the program, like with a system call? - Seignior
How can we answer that if you don't tell us what the system is? - Waterspout
I am running the latest version of Ubuntu Linux. But I thought system calls were a sort of interface (i.e. they might be implemented differently but they still exist)? - Seignior
No, basically. Which OS you run on is the most important question: Windows vs. POSIX. As far as I know, all Linux variants are POSIX. Once you say "I'm on Windows" or "I'm on POSIX", then yes, you are talking about a known interface, but if you don't know the system, you don't know the interface. - Waterspout
J
23

The bounds for text (program code) and data on Linux (and other Unixes):

#include <stdio.h>
#include <stdlib.h>

/* These symbols are declared in no header file, and on some
   systems they have a _ prepended.
   They must be given a type to keep the compiler happy.
   Also check out brk() and sbrk() for information
   about the heap. */

extern char  etext, edata, end; 

int
main(int argc, char **argv)
{
    printf("First address beyond:\n");
    /* %p expects a void *, so cast the addresses explicitly */
    printf("    program text segment(etext)      %10p\n", (void *)&etext);
    printf("    initialized data segment(edata)  %10p\n", (void *)&edata);
    printf("    uninitialized data segment (end) %10p\n", (void *)&end);

    return EXIT_SUCCESS;
}

Where those symbols come from: Where are the symbols etext ,edata and end defined?
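
Regarding the range to scan: etext marks the end of the text segment, not the start of the data, so scanning from &etext would also cover whatever the linker placed in between. As a rough sketch, assuming Linux with glibc (whose startup code defines a `__data_start` symbol at the beginning of the initialized data), a conservative root scan from `__data_start` to `end` might look like the following. The names `root_visitor`, `scan_range_for_roots`, and `scan_static_data` are hypothetical, and the heap bounds are whatever your allocator tracks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: glibc's startup object defines __data_start at the beginning
   of .data; other C libraries or linkers may not provide it. */
extern char __data_start[];
/* Provided by the linker, as in the answer above: first address past .bss. */
extern char end[];

/* Hypothetical callback: receives each word that looks like a heap pointer. */
typedef void (*root_visitor)(void *candidate, void *ctx);

/* Conservatively scan [lo, hi) one aligned word at a time and report every
   word whose value falls inside [heap_lo, heap_hi).
   Returns the number of candidate pointers found. */
size_t scan_range_for_roots(const char *lo, const char *hi,
                            const void *heap_lo, const void *heap_hi,
                            root_visitor visit, void *ctx)
{
    const uintptr_t word = sizeof(void *);
    uintptr_t p = ((uintptr_t)lo + word - 1) & ~(word - 1); /* round up to a word boundary */
    uintptr_t limit = (uintptr_t)hi;
    size_t found = 0;

    assert((uintptr_t)lo <= limit);

    for (; p + word <= limit; p += word) {
        uintptr_t value;
        /* memcpy avoids aliasing problems when reading arbitrary objects */
        memcpy(&value, (const void *)p, sizeof value);
        if (value >= (uintptr_t)heap_lo && value < (uintptr_t)heap_hi) {
            found++;
            if (visit)
                visit((void *)value, ctx);
        }
    }
    return found;
}

/* Scan the executable's own .data and .bss (not those of shared libraries). */
size_t scan_static_data(const void *heap_lo, const void *heap_hi,
                        root_visitor visit, void *ctx)
{
    return scan_range_for_roots(__data_start, end, heap_lo, heap_hi, visit, ctx);
}
```

For example, `scan_static_data(heap_lo, heap_hi, mark, NULL)` would call a hypothetical `mark` function for every candidate. Because the scan is conservative, an integer that merely looks like a heap address is reported as well.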

Jeffries answered 30/11, 2010 at 0:56 Comment(3)
brk() and sbrk() are ubiquitous, but not POSIX, by design. That is because the arguments are implementation defined. Check your man page for specifics. - Jeffries
So since these represent the end of each of the three segments, to search through the data segment we would search from &etext to &edata, right? - Seignior
How does that scale to shared objects? If I load an .so, it also has a data and a bss segment. These symbols won't work in that case. Or do they? Can you enlighten me? - Erelia
K
33

If you're working on Windows, there are Windows API functions that can help you.

// store the base address of the loaded module
char *dllImageBase = (char*)hModule; // hModule is assumed to be the handle to the loaded module (.exe or .dll)

//get the address of NT Header
IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);

// the section table comes right after the NT headers, so get the address of the section table
IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *) (pNtHdr + 1);

ImageSectionInfo *pSectionInfo = NULL;

// iterate through the list of all sections and check the section name in the if condition
for ( int i = 0 ; i < pNtHdr->FileHeader.NumberOfSections ; i++ )
{
     char *name = (char*) pSectionHdr->Name;
     if ( memcmp(name, ".data", 5) == 0 )
     {
          pSectionInfo = new ImageSectionInfo(".data");
          pSectionInfo->SectionAddress = dllImageBase + pSectionHdr->VirtualAddress;

          // size of the data section: together with SectionAddress, this is the range you're looking for
          pSectionInfo->SectionSize = pSectionHdr->Misc.VirtualSize;
          break;
      }
      pSectionHdr++;
}

Define ImageSectionInfo as:

struct ImageSectionInfo
{
      char SectionName[IMAGE_SIZEOF_SHORT_NAME];//the macro is defined WinNT.h
      char *SectionAddress;
      int SectionSize;
      ImageSectionInfo(const char* name)
      {
            strcpy(SectionName, name); // name must fit in IMAGE_SIZEOF_SHORT_NAME bytes, including the terminator
       }
};

Here's a complete, minimal WIN32 console program you can run in Visual Studio that demonstrates the use of the Windows API:

#include <stdio.h>
#include <string.h> // strncpy
#include <Windows.h>
#include <DbgHelp.h>
#pragma comment( lib, "dbghelp.lib" )

void print_PE_section_info(HANDLE hModule) // hModule is the handle to a loaded Module (.exe or .dll)
{
   // get the location of the module's IMAGE_NT_HEADERS structure
   IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);

   // section table immediately follows the IMAGE_NT_HEADERS
   IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *)(pNtHdr + 1);

   const char* imageBase = (const char*)hModule;
   char scnName[sizeof(pSectionHdr->Name) + 1];
   scnName[sizeof(scnName) - 1] = '\0'; // enforce nul-termination for scn names that are the whole length of pSectionHdr->Name[]

   for (int scn = 0; scn < pNtHdr->FileHeader.NumberOfSections; ++scn)
   {
      // Note: pSectionHdr->Name[] is 8 bytes long. If the scn name is 8 bytes long, ->Name[] will
      // not be nul-terminated. For this reason, copy it to a local buffer that's nul-terminated
      // to be sure we only print the real scn name, and no extra garbage beyond it.
      strncpy(scnName, (const char*)pSectionHdr->Name, sizeof(pSectionHdr->Name));

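      // %p expects a void pointer, and VirtualSize is a DWORD (unsigned long on Windows)
      printf("  Section %3d: %p...%p %-10s (%lu bytes)\n",
         scn,
         (const void*)(imageBase + pSectionHdr->VirtualAddress),
         (const void*)(imageBase + pSectionHdr->VirtualAddress + pSectionHdr->Misc.VirtualSize - 1),
         scnName,
         (unsigned long)pSectionHdr->Misc.VirtualSize);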
      ++pSectionHdr;
   }
}

// For demo purposes, create an extra constant data section whose name is exactly 8 bytes long (the max)
#pragma const_seg(".t_const") // begin allocating const data in a new section whose name is 8 bytes long (the max)
const char const_string1[] = "This string is allocated in a special const data segment named \".t_const\".";
#pragma const_seg() // resume allocating const data in the normal .rdata section

int main(int argc, const char* argv[])
{
   print_PE_section_info(GetModuleHandle(NULL)); // print section info for "this process's .exe file" (NULL)
}

This page may be helpful if you're interested in additional uses of the DbgHelp library.

You can read about the PE image format here to learn it in detail. Once you understand the PE format, you'll be able to work with the above code and even modify it to meet your needs.

  • PE Format

Peering Inside the PE: A Tour of the Win32 Portable Executable File Format

An In-Depth Look into the Win32 Portable Executable File Format, Part 1

An In-Depth Look into the Win32 Portable Executable File Format, Part 2

  • Windows API and Structures

IMAGE_SECTION_HEADER Structure

ImageNtHeader Function

IMAGE_NT_HEADERS Structure

I think this will help you to a great extent, and the rest you can research yourself :-)

By the way, you can also see this thread, as all of these are somehow related to this:

Scenario: Global variables in DLL which is used by Multi-threaded Application

Kinescope answered 30/11, 2010 at 17:50 Comment(2)
Perfect!!!!!!!!!!!!!, should be marked as the answer not that garbage answer that's accepted. - Incendiarism
@SSpoke: Glad that it helped you, even after a decade! :O - Kinescope
P
1

Since you'll probably have to make your garbage collector the environment in which the program runs, you can get the segment ranges from the ELF file directly.
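
On Linux with glibc, the same program headers can also be read from the already loaded image at run time, without opening the file. The sketch below assumes glibc's `dl_iterate_phdr` (a GNU extension declared in `<link.h>`) and collects every writable `PT_LOAD` segment of the executable and of each loaded shared object, which is where their .data and .bss live. The names `mem_range`, `collect_state`, `collect_from_object`, and `collect_writable_segments` are hypothetical.

```c
#define _GNU_SOURCE /* dl_iterate_phdr is a GNU extension declared in <link.h> */
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical helper type: one writable, loaded region of an ELF object. */
struct mem_range {
    char  *start;
    size_t size;
};

struct collect_state {
    struct mem_range *out;   /* caller-provided array */
    size_t            max;   /* capacity of the array */
    size_t            count; /* total number of segments seen */
};

/* Called by dl_iterate_phdr once per loaded object (executable, libraries). */
static int collect_from_object(struct dl_phdr_info *info, size_t info_size, void *data)
{
    struct collect_state *st = data;
    (void)info_size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        /* Only loadable segments with the write flag hold .data and .bss. */
        if (ph->p_type != PT_LOAD || !(ph->p_flags & PF_W))
            continue;

        if (st->count < st->max) {
            /* dlpi_addr is the load bias; p_memsz includes the zero-filled .bss. */
            st->out[st->count].start = (char *)(info->dlpi_addr + ph->p_vaddr);
            st->out[st->count].size  = ph->p_memsz;
        }
        st->count++;
    }
    return 0; /* zero tells dl_iterate_phdr to continue with the next object */
}

/* Stores up to max ranges in out and returns how many segments exist in total. */
size_t collect_writable_segments(struct mem_range *out, size_t max)
{
    struct collect_state st = { out, max, 0 };

    assert(out != NULL || max == 0);
    dl_iterate_phdr(collect_from_object, &st);
    return st.count;
}
```

Each returned range could then be scanned word by word for values that fall inside the collector's heap. If the return value exceeds `max`, the array was too small and the call can be repeated with a larger one.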

Psychoneurosis answered 29/11, 2010 at 23:10 Comment(0)
W
0

For Win32, load the file that the executable came from and parse the PE headers. I've no idea about other OSes. Remember that if your program consists of multiple files (e.g. DLLs), you may have multiple data segments.

Waterspout answered 29/11, 2010 at 23:9 Comment(0)
S
0

For iOS you can use this solution. It shows how to find the text segment range but you can easily change it to find any segment you like.

Surefire answered 24/8, 2014 at 11:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.