This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The difference is that the previous question deals with merging an object file to a full linked executable which is even harder because it means I need to manually deal with relocations.
What I have are the following files:
example-target.c
:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
puts("1234");
return EXIT_SUCCESS;
}
example-embed.c
:
#include <stdlib.h>
#include <stdio.h>
/*
* Fake main. Never used, just there so we can perform a full link.
*/
int main(void)
{
return EXIT_SUCCESS;
}
void func1(void)
{
puts("asdf");
}
My goal is to merge these two executables to produce a final executable which is the same as example-target
, but additionally has another main
and func1
.
From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses (such that if I was to merge them, the sections would overlap).
What I did to solve this was to analyse example-target
programmatically to get a list of the load address and sizes of each of its sections. I then did the same for example-embed
and used this information to dynamically generate a linker command for example-embed.c
which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target
. Hence example-embed
is actually fully linked twice in this process: once to determine how many sections and what sizes they are, and once again to link with a guarantee that there are no section clashes with example-target
.
On my system, the linker command produced is:
-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000
(Note that I prefixed section names with .new
using objcopy --prefix-sections=.new example-embedobj
to avoid section name clashes.)
I then wrote some code to generate a new executable (borrowed some code both from objcopy
and Security Warrior
book). The new executable should have:
- All the sections of
example-target
and all the sections ofexample-embed
- A symbol table which contains all the symbols from
example-target
and all the symbols ofexample-embed
The code I wrote is:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>
struct COPYSECTION_DATA {
bfd * obfd;
asymbol ** syms;
int symsize;
int symcount;
};
void copy_section(bfd * ibfd, asection * section, PTR data)
{
struct COPYSECTION_DATA * csd = data;
bfd * obfd = csd->obfd;
asection * s;
long size, count, sz_reloc;
if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
return;
}
/* get output section from input section struct */
s = section->output_section;
/* get sizes for copy */
size = bfd_get_section_size(section);
sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);
if(!sz_reloc) {
/* no relocations */
bfd_set_reloc(obfd, s, NULL, 0);
} else if(sz_reloc > 0) {
arelent ** buf;
/* build relocations */
buf = xmalloc(sz_reloc);
count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);
/* set relocations for the output section */
bfd_set_reloc(obfd, s, count ? buf : NULL, count);
free(buf);
}
/* get input section contents, set output section contents */
if(section->flags & SEC_HAS_CONTENTS) {
bfd_byte * memhunk = NULL;
bfd_get_full_section_contents(ibfd, section, &memhunk);
bfd_set_section_contents(obfd, s, memhunk, 0, size);
free(memhunk);
}
}
void define_section(bfd * ibfd, asection * section, PTR data)
{
bfd * obfd = data;
asection * s = bfd_make_section_anyway_with_flags(obfd,
section->name, bfd_get_section_flags(ibfd, section));
/* set size to same as ibfd section */
bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));
/* set vma */
bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));
/* set load address */
s->lma = section->lma;
/* set alignment -- the power 2 will be raised to */
bfd_set_section_alignment(obfd, s,
bfd_section_alignment(ibfd, section));
s->alignment_power = section->alignment_power;
/* link the output section to the input section */
section->output_section = s;
section->output_offset = 0;
/* copy merge entity size */
s->entsize = section->entsize;
/* copy private BFD data from ibfd section to obfd section */
bfd_copy_private_section_data(ibfd, section, obfd, s);
}
void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
struct COPYSECTION_DATA * csd)
{
/* set obfd */
csd->obfd = obfd;
/* get required size for both symbol tables and allocate memory */
csd->symsize = bfd_get_symtab_upper_bound(ibfd) /********+
bfd_get_symtab_upper_bound(embedbfd) */;
csd->syms = xmalloc(csd->symsize);
csd->symcount = bfd_canonicalize_symtab (ibfd, csd->syms);
/******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
csd->syms + csd->symcount); */
/* copy merged symbol table to obfd */
bfd_set_symtab(obfd, csd->syms, csd->symcount);
}
bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
struct COPYSECTION_DATA csd = {0};
if(!ibfd || !embedbfd || !obfd) {
return FALSE;
}
/* set output parameters to ibfd settings */
bfd_set_format(obfd, bfd_get_format(ibfd));
bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
bfd_applicable_file_flags(obfd));
/* set the entry point of obfd */
bfd_set_start_address(obfd, bfd_get_start_address(ibfd));
/* define sections for output file */
bfd_map_over_sections(ibfd, define_section, obfd);
/******** bfd_map_over_sections(embedbfd, define_section, obfd); */
/* merge private data into obfd */
bfd_merge_private_bfd_data(ibfd, obfd);
/******** bfd_merge_private_bfd_data(embedbfd, obfd); */
merge_symtable(ibfd, embedbfd, obfd, &csd);
bfd_map_over_sections(ibfd, copy_section, &csd);
/******** bfd_map_over_sections(embedbfd, copy_section, &csd); */
free(csd.syms);
return TRUE;
}
int main(int argc, char **argv)
{
bfd * ibfd;
bfd * embedbfd;
bfd * obfd;
if(argc != 4) {
perror("Usage: infile embedfile outfile\n");
xexit(-1);
}
bfd_init();
ibfd = bfd_openr(argv[1], NULL);
embedbfd = bfd_openr(argv[2], NULL);
if(ibfd == NULL || embedbfd == NULL) {
perror("asdfasdf");
xexit(-1);
}
if(!bfd_check_format(ibfd, bfd_object) ||
!bfd_check_format(embedbfd, bfd_object)) {
perror("File format error");
xexit(-1);
}
obfd = bfd_openw(argv[3], NULL);
bfd_set_format(obfd, bfd_object);
if(!(merge_object(ibfd, embedbfd, obfd))) {
perror("Error merging input/obj");
xexit(-1);
}
bfd_close(ibfd);
bfd_close(embedbfd);
bfd_close(obfd);
return EXIT_SUCCESS;
}
To summarise what this code does, it takes 2 input files (ibfd
and embedbfd
) to generate an output file (obfd
).
- Copies format/arch/mach/file flags and start address from
ibfd
toobfd
- Defines sections from both
ibfd
andembedbfd
toobfd
. Population of the sections happens separately because BFD mandates that all sections are created before any start to be populated. - Merge private data of both input BFDs to the output BFD. Since BFD is a common abstraction above many file formats, it is not necessarily able to comprehensively encapsulate everything required by the underlying file format.
- Create a combined symbol table consisting of the symbol table of
ibfd
andembedbfd
and set this as the symbol table ofobfd
. This symbol table is saved so it can later be used to build relocation information. - Copy the sections from
ibfd
toobfd
. As well as copying the section contents, this step also deals with building and setting the relocation table.
In the code above, some lines are commented out with /******** */
. These lines deal with the merging of example-embed
. If they are commented out, what happens is that obfd
is simply built as a copy of ibfd
. I have tested this and it works fine. However, once I comment these lines back in the problems start occurring.
With the uncommented version which does the full merge, it still generates an output file. This output file can be inspected with objdump
and found to have all the sections, code and symbol tables of both inputs. However, objdump
complains with:
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
On my system, 1708
of elf.c
is:
BFD_ASSERT (elf_dynsymtab (abfd) == 0);
elf_dynsymtab
is a macro in elf-bfd.h
for:
#define elf_dynsymtab(bfd) (elf_tdata(bfd) -> dynsymtab_section)
I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps saying it's not present). For the time, I am trying to avoid having to reach down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong either in my code or conceptually?
If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.
I realise that this is a very large question and for this reason, I would like to properly reward anyone who is able to help me with it. If I am able to solve this with the help of someone, I am happy to award a 500+ bonus.
example-embed
) and then statically detour the code of the new binary to link up the original code to the injected code (I have already written a disassembler/CFG analysis engine and I can also edit arbitrary instructions so this injection is the final piece of the puzzle). For the usecases I need to care about, it can be assumed we have access to the source code of the user defined routines but not the target. – Shick.dynsymtab
section? (readelf -WS exename
) – Rangel.dynsym
section and.new.dynsym
section.readelf -WS
results in 'readelf: Error: File contains multiple dynamic symbol tables'. So perhaps I have to manually merge those sections. – ShickSHT_REL
orSHT_RELA
when iterating through sections then reach down into the ELF layer in those cases and perform the merge manually. – Shick