Do zero-initialized variables in the .bss section occupy space in an ELF file?

4

46

If I understand correctly, the .bss section in ELF files is used to allocate space for zero-initialized variables. Our toolchain produces ELF files, hence my question: does the .bss section actually have to contain all those zeroes? It seems an awful waste of space if, say, allocating a global ten-megabyte array results in ten megabytes of zeroes in the ELF file. What am I getting wrong here?

Enamelware answered 4/3, 2009 at 14:0 Comment(1)
Quick way to answer it: make a hello world with an int is[1000000] and another without, compile both and compare the compiled sizes :-) Then, to really understand, disassemble it with binutils, or compile to assembly code with -S. - Impudent
69

It has been some time since I worked with ELF, but I think I still remember this stuff. No, it does not physically contain those zeros. If you look at an ELF file's program headers, you will see that each header has two numbers: one is the size in the file, and the other is the size the segment has when allocated in virtual memory (readelf -l ./a.out):

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x00454 0x00454 R E 0x1000
  LOAD           0x000454 0x08049454 0x08049454 0x00104 0x61bac RW  0x1000
  DYNAMIC        0x000468 0x08049468 0x08049468 0x000d0 0x000d0 RW  0x4
  NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Headers of type LOAD describe the segments that are copied into virtual memory when the file is loaded for execution. Other headers contain other information, like the shared libraries that are needed. As you see, FileSiz and MemSiz differ significantly for the header that contains the .bss section (the second LOAD one):

0x00104 (file-size) 0x61bac (mem-size)

For this example code:

int a[100000];
int main() { }

The ELF specification says that if a segment's mem-size is greater than its file-size, the extra bytes are filled with zeros in virtual memory. The segment-to-section mapping of the second LOAD header is like this:

03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

So there are some other sections in there too: .ctors and .dtors for C++ constructors/destructors, and .jcr, which does the same kind of thing for Java. Then it contains the .dynamic section and other stuff useful for dynamic linking (I believe this is the place that records the needed shared libraries, among other things). After that comes the .data section, which contains initialized globals and local static variables. At the end, the .bss section appears, which is filled with zeros at load time because file-size does not cover it.
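To make the file-size versus mem-size rule concrete, here is a minimal sketch of what a loader does with one LOAD segment. The struct phdr32 is a hand-written stand-in for the 32-bit ELF program header, and the names zero_fill_bytes, load_segment and SKETCH_PT_LOAD are made up for this illustration; a real loader maps pages rather than calling memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 32-bit ELF program header layout. */
struct phdr32 {
    uint32_t p_type;
    uint32_t p_offset;  /* where the segment starts in the file */
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;  /* bytes stored in the file (FileSiz) */
    uint32_t p_memsz;   /* bytes occupied in memory (MemSiz) */
    uint32_t p_flags;
    uint32_t p_align;
};

enum { SKETCH_PT_LOAD = 1 };  /* PT_LOAD has the value 1 in the ELF spec */

/* Bytes of a LOAD segment that exist only in memory, not in the file. */
static uint32_t zero_fill_bytes(const struct phdr32 *ph)
{
    assert(ph != NULL);
    if (ph->p_type != SKETCH_PT_LOAD || ph->p_memsz <= ph->p_filesz)
        return 0;
    return ph->p_memsz - ph->p_filesz;
}

/* Simplified "loader": copy p_filesz bytes from the file image into
 * memory, then zero the remaining p_memsz - p_filesz bytes. */
static void load_segment(unsigned char *dst,
                         const unsigned char *file_image,
                         const struct phdr32 *ph)
{
    assert(dst != NULL && file_image != NULL && ph != NULL);
    assert(ph->p_memsz >= ph->p_filesz);
    memcpy(dst, file_image + ph->p_offset, ph->p_filesz);
    memset(dst + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
}
```

For the second LOAD header above (FileSiz 0x104, MemSiz 0x61bac), zero_fill_bytes would return 0x61aa8, i.e. 400040 bytes that never appear in the file. That is just a little more than the 400000 bytes of the array a.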

By the way, you can see into which output section a particular symbol is going to be placed by using the -M linker option. With gcc, use -Wl,-M to pass the option through to the linker. The map output below, for the example above, shows that a is allocated within .bss. It may help you verify that your uninitialized objects really end up in .bss and not somewhere else:

.bss            0x08049560    0x61aa0
 [many input .o files...]
 *(COMMON) 
 *fill*         0x08049568       0x18 00
 COMMON         0x08049580    0x61a80 /tmp/cc2GT6nS.o
                0x08049580                a
                0x080ab000                . = ALIGN ((. != 0x0)?0x4:0x1) 
                0x080ab000                . = ALIGN (0x4) 
                0x080ab000                . = ALIGN (0x4) 
                0x080ab000                _end = .

GCC keeps uninitialized globals in a COMMON section by default, for compatibility with old compilers that allow globals to be defined twice in a program without multiple-definition errors (since GCC 10, -fno-common is the default instead). Use -fno-common to make GCC use the .bss section in object files. This makes no difference for the final linked executable because, as you see, the data ends up in a .bss output section anyway; that is controlled by the linker script, which you can display with ld -verbose. But that shouldn't scare you, it's just an internal detail. See the gcc manpage.
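The language feature behind COMMON is what C calls a tentative definition: a file-scope declaration without an initializer. A rough sketch, with the made-up names shared_counter, bump_counter and read_counter; it only shows the rule inside a single source file, whereas COMMON extends the same leniency across object files.

```c
#include <assert.h>

int shared_counter;  /* tentative definition */
int shared_counter;  /* second tentative definition of the same object;
                        legal C within one translation unit */

/* Add a non-negative amount to the counter. */
static void bump_counter(int by)
{
    assert(by >= 0);
    shared_counter += by;
}

/* The object starts out as zero, like any other static-storage
 * object without an initializer. */
static int read_counter(void)
{
    return shared_counter;
}
```

This compiles as C but not as C++, where the second line would be a redefinition error.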

Unplug answered 4/3, 2009 at 14:21 Comment(6)
I assume the NOBITS section type must be set to permit this? - Enamelware
Wouter. Hmm, I never used that flag. My machine header file for gcc looks like #define BSS_SECTION_ASM_OP "\t.section\t.bss, \"aw\"" - Unplug
You're both right: for gcc it's enough to know the symbol should go into .bss. ld will take care of setting NOBITS on this section and thus indicate "there's no data in the file, zero it". - Kylakylah
Well explained. I have another question: do you know which tool to use to dump the addresses of all static or global variables? For example, in your code above, the tool should give the address of 'a' as '0x08049580'. I tried readelf -s, but it works only for a single object file, while I want to get those addresses from the final executable. Thanks! - Compeer
Nice explanation. Where did you get all this info? Is there any man/doc/book recommended? - Escalera
@kikeenrique: ELF file format specification + experience. This guy has a lot of it. - Torytoryism
21

The .bss section in an ELF file is used for static data which is not initialized programmatically but guaranteed to be set to zero at runtime. Here's a little example that will explain the difference.

int main() {
    static int bss_test1[100];
    static int bss_test2[100] = {0};
    return 0;
}

In this case bss_test1 is placed into .bss since it is uninitialized. With the toolchain used here, bss_test2 is instead placed into the .data section along with a bunch of zeros; note that GCC by default also puts zero-initialized variables into .bss, unless -fno-zero-initialized-in-bss is given. The runtime loader basically allocates the amount of space reserved for .bss and zeroes it out before any userland code begins executing.
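Whichever section the two arrays end up in, the program itself cannot tell the difference: both are guaranteed by the C language to read as zero at startup. A small sketch, with the arrays moved to file scope and a helper all_zero invented for this example:

```c
#include <assert.h>
#include <stddef.h>

static int bss_test1[100];        /* no initializer */
static int bss_test2[100] = {0};  /* explicit zero initializer */

/* Return 1 if every element of p[0..n-1] is zero, 0 otherwise. */
static int all_zero(const int *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling all_zero(bss_test1, 100) and all_zero(bss_test2, 100) at the start of main would both return 1. Only the size of the file on disk can differ between the two forms.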

You can see the difference using objdump, nm, or similar utilities:

moozletoots$ objdump -t a.out | grep bss_test
08049780 l     O .bss   00000190              bss_test1.3
080494c0 l     O .data  00000190              bss_test2.4

This is usually one of the first surprises that embedded developers run into: with such a toolchain, never initialize statics to zero explicitly. The runtime loader (usually) takes care of that. As soon as you initialize anything explicitly, you may be telling the compiler/linker to include the data in the executable image.

Proverb answered 4/3, 2009 at 14:28 Comment(2)
On my platform gcc put bss_test2 into the .bss section. You could have mentioned the -fno-zero-initialized-in-bss compiler option that controls this. - Arcade
From the manual: "If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS." - Qadi
3

A .bss section is not stored in an executable file. Of the most common sections (.text, .data, .bss), only .text (actual code) and .data (initialized data) are present in an ELF file.

Guyette answered 4/3, 2009 at 14:19 Comment(3)
That's not what readelf on an arbitrary executable tells me. There's a horseload of sections in the file, including the .bss section. - Enamelware
It does not depend on ELF itself but on your compilation chain (languages, tools, options such as debug, ...). You can also have your own custom sections. - Guyette
The .bss section is stored in the executable file, at least for ELF. But its contents are not stored, so the size of .bss in the file is a small constant. On operating systems with memory protection the .bss section needs to be recorded in some way so the loader can arrange for writable memory at that location. Of course it is conceivable that, in some formats, all that is left of .bss is a contribution to an "allocated but not copied" size field. - Ism
1

That is correct: .bss is not physically present in the file. Only the information about its size is present, so that the dynamic loader can allocate the .bss section for the application program. As a rule of thumb, only the LOAD and TLS segments get memory in the application program; the rest are used by the dynamic loader.

For a static executable file, the .bss section can also be given space in the executable.

This is common in embedded applications, where there is no loader.
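Where there is no loader, something else has to produce the zeros: either they are stored in the image, or startup code clears the region before main runs. A minimal sketch of the second approach follows. The function name zero_bss is made up, and in a real project the two pointers would come from symbols defined in your linker script (names such as __bss_start__ and __bss_end__ are an assumption and vary between toolchains).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear the word-aligned region [start, end), as startup code would
 * do for .bss before calling main. */
static void zero_bss(uint32_t *start, uint32_t *end)
{
    assert(start != NULL && end != NULL && start <= end);
    for (uint32_t *p = start; p < end; p++)
        *p = 0;
}
```

With this approach the image only needs to carry the start and end addresses of the region, not the zeros themselves.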

Suman

Commensal answered 27/3, 2009 at 17:38 Comment(1)
You say TLS segments are loaded too, like PT_LOAD? I see that PT_TLS is included in PT_LOAD. - Leiva
