How to include data object files (images, etc.) in program and access the symbols?
M

4

9

I've turned a couple of resource files into .obj files using objcopy, and I link them with my program's source code. I can access the symbols inside the object file from my program with the following code, but only with GCC/G++ (Cygwin):

extern uint8_t data[]   asm("_binary_Resources_0_png_start");
extern uint8_t size[]   asm("_binary_Resources_0_png_size");
extern uint8_t end[]    asm("_binary_Resources_0_png_end");

The code doesn't work in Visual Studio, probably because VS has its own __asm command. I want to include my program's resources (images, shaders, etc.) in my final executable's .data section by linking them in.

But how can I access the symbols defined in the object file in VC++? I tried extern uint8_t _binary_Resources_0_png_start[] and extern "C" uint8_t _binary_Resources_0_png_start[] without the assembly command, but I get unresolved symbol link errors.

Major answered 21/11, 2017 at 13:42 Comment(3)
maybe calling the symbols _data, _size ... would help. you could get rid of the asm part then. I did the same thing, but created asm files from binary instead of using objcopy, that gives control on the symbol names.Faubion
@Jean-FrançoisFabre I tried having the variables in my program have the same name as their corresponding symbol name, but to no avail.Major
You tagged this question C and C++. I assumed "C". I modified my answer to assume you really mean C++ since the bottom of your answer suggests that is what you are really using. My answer was amended to add extern "C" to each external variable.Hobson
M
1

After experimenting with different things, I came back to my original approach (linking) and it worked like magic. Here are the details:

In order to include data in the final executable's .data section, you first need to turn the data file (which can be any arbitrary binary file) into a linkable file format, also known as an object file.

The tool objcopy, which is included in GNU Binutils and is available on Windows through Cygwin or MinGW, takes a file and produces an object file. objcopy needs to know two things before generating the object file: the output file format and the output architecture. To determine these two things, I inspect a valid linkable object file with the tool objdump:

objdump -f main.o

This gives me the following information:

main.o:     file format pe-x86-64
architecture: i386:x86-64, flags 0x00000039:
HAS_RELOC, HAS_DEBUG, HAS_SYMS, HAS_LOCALS
start address 0x0000000000000000

With this knowledge I can now create the object file:

objcopy -I binary -O pe-x86-64 -B i386 data_file.data data_file_data.o

To handle a large number of files, batch files can come in handy.

I then simply link the produced object file(s) together with my program's source and access the data through the symbols that objcopy generated, whose names can easily be queried with:

objdump -t data_file_data.o

Which results in:

data_file_data.o:     file format pe-x86-64

SYMBOL TABLE:
[  0](sec  1)(fl 0x00)(ty  0)(scl  2) (nx 0) 0x0000000000000000 _binary_data_file_data_start
[  1](sec  1)(fl 0x00)(ty  0)(scl  2) (nx 0) 0x0000000000000006 _binary_data_file_data_end
[  2](sec -1)(fl 0x00)(ty  0)(scl  2) (nx 0) 0x0000000000000006 _binary_data_file_data_size

Practically speaking, the following code works with GCC/G++:

extern uint8_t data[]   asm("_binary_data_file_data_start");
extern uint8_t end[]    asm("_binary_data_file_data_end");

And the following with MSVC++:

extern "C" uint8_t _binary_data_file_data_start[]; // Same name as symbol
extern "C" uint8_t _binary_data_file_data_end[];   // Same name as symbol

The size of each file is calculated with:

_binary_data_file_data_end - _binary_data_file_data_start

You could for example write the data back into a file:

FILE* file = fopen("data_file_reproduced.data", "wb");
if (file != NULL) {  // Only write if the file could be opened
    fwrite(_binary_data_file_data_start,                               // Pointer to data
           1,                                                          // Write block size
           _binary_data_file_data_end - _binary_data_file_data_start,  // Data size
           file);
    fclose(file);
}
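
The start/end pair can also be wrapped in a small view type so that the size arithmetic lives in one place. This is only a sketch: EmbeddedBlob and write_blob are hypothetical names that do not come from the answer above, and the only thing assumed about the symbols is that the end address is not before the start address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: a read-only view over data delimited by two symbols.
struct EmbeddedBlob {
    const std::uint8_t* begin;
    const std::uint8_t* end;

    // Size in bytes, computed as the distance between the two addresses.
    std::size_t size() const {
        assert(end >= begin);
        return static_cast<std::size_t>(end - begin);
    }
};

// Writes the blob to `path`; returns true only if every byte was written
// and the file was closed without error.
bool write_blob(const EmbeddedBlob& blob, const char* path) {
    std::FILE* file = std::fopen(path, "wb");
    if (!file) return false;
    const std::size_t written = std::fwrite(blob.begin, 1, blob.size(), file);
    const bool closed = (std::fclose(file) == 0);
    return closed && written == blob.size();
}
```

With the extern "C" declarations shown earlier, a call could look like write_blob(EmbeddedBlob{_binary_data_file_data_start, _binary_data_file_data_end}, "data_file_reproduced.data").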
Major answered 22/11, 2017 at 6:3 Comment(2)
For read-only data, you should use the example from the man page: --rename-section .data=.rodata,alloc,load,readonly,data,contents. That will put the symbols in section .rodataAlla
Your question originally didn't say this was for 64-bit code or 32-bit code. I have redone my answer to point out the difference between 32-bit and 64 WinPE formats and the little difference in name decorating (namely that pe-x86-64 format doesn't name decorate by adding an additional _ to global labels). There is still no need for asm directive when using G++ either.Hobson
S
5

The trick with objcopy isn't meant as a full-featured way to embed resources and isn't portable at all, as you have seen.

Microsoft has its own mechanism for resources, so if you're specifically targeting Windows, you could use a Windows resource file and the RCDATA resource.

If you want something completely portable, your only option is to format the file as C source code, e.g.

const uint8_t my_binary[] = { 0x00, 0x01, ... };

It's straightforward to write your own conversion tool for that.
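
As a rough idea of what the core of such a conversion tool might look like, here is a minimal sketch. The function name to_c_array, the 12-values-per-line layout, and the use of unsigned char in the generated source (so it needs no extra header) are all assumptions for illustration; the input is assumed to be non-empty and the name to be a valid C identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generator: renders `bytes` as a C array definition named `name`.
std::string to_c_array(const std::vector<std::uint8_t>& bytes, const std::string& name) {
    // An empty initializer list would not be valid C, so reject empty input.
    assert(!name.empty() && !bytes.empty());
    static const char hex[] = "0123456789abcdef";
    const std::size_t bytes_per_line = 12;

    std::ostringstream out;
    out << "const unsigned char " << name << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Start a new indented line every `bytes_per_line` values.
        out << (i % bytes_per_line == 0 ? "\n  " : " ");
        out << "0x" << hex[bytes[i] >> 4] << hex[bytes[i] & 0x0f] << ',';
    }
    out << "\n};\n";  // The trailing comma before the brace is valid in C and C++.
    return out.str();
}
```

The generated text can be written to a .c or .inc file by the build, and the consuming code can use sizeof on the array to get the data size.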

Sailesh answered 21/11, 2017 at 14:3 Comment(4)
@Major please keep it in english :) It's the only portable way and the size in the resulting binary is the same, so why care? Source sizes can be huge, oh well ...Sailesh
And once you have the source-code generator you can generate the C source on the fly anyway, so the big ugly files never need to be added to source control. If they're only local and transient, it really doesn't matter.Hairstyle
objcopy is portable, the way the externs is done in this code (and some tutorials) is non-portable and not the correct way of doing them. The ASM directive isn't needed at all if done properly.Hobson
@MichaelPetch I don't consider objcopy "portable". It creates a plain object file in several supported formats (e.g. not including the format my C64 compiler uses g -- but probably some others as well) using symbol names that might have to be referenced differently on different platforms. Maybe call it limited portability.Sailesh
L
4

This may be a completely different approach, but it provides a rather simple and portable solution:

We use a small tool to load a binary file and output it as C (or C++) source. Actually, I saw things like this in XPM and GIMP, but it can be used for almost any binary data.

Including such a tool in the build chain is not difficult in VS, and even simpler in make and cmake.

Such a tool could look like this:

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char **argv)
{
  if (argc < 2) {
    cerr << "Usage: " << argv[0] << " FILE [FILE...]" << endl;
    return -1;
  }
  for (int i = 1; i < argc; ++i) {
    fstream fIn(argv[i], ios::in | ios::binary);
    if (!fIn.good()) {
      cerr << "ERROR: Cannot open '" << argv[i] << "'!" << endl;
      continue;
    }
    // make name
    string name = argv[i];
    name = name.substr(0, name.find('.'));
    /// @todo more sophisticated name mangling?
    // print preface
    cout << "struct { const char *data; size_t size; } " << name << " = {" << endl
      << "  \"";
    // print data
    const char hex[] = "0123456789abcdef";
    enum { BytesPerLine = 16 };
    size_t n = 0;
    for (unsigned char byte; fIn.get((char&)byte); ++n) {
      if (n && !(n % BytesPerLine)) cout << "\"\n  \"";
      cout << "\\x" << hex[byte / 16] << hex[byte % 16];
    }
    // print size
    cout << "\",\n"
      "  " << n << "\n"
      "};" << endl;
  }
  return 0;
}

Compile and test:

$ g++ -std=c++11 -o binToC binToC.cc

$ ./binToC
Usage: ./binToC FILE [FILE...]

More testing with fluffy_cat.png:

$ ./binToC fluffy_cat.png > fluffy_cat.inc

$ cat >fluffy_cat_test.cc <<'EOF'
> #include <fstream>
> 
> using namespace std;
> 
> #include "fluffy_cat.inc"
> 
> int main()
> {
>   ofstream fOut("fluffy_cat_test.png", ios::out | ios::binary);
>   fOut.write(fluffy_cat.data, fluffy_cat.size);
>   fOut.close();
>   return 0;
> }
> EOF

$ g++ -std=c++11 -o fluffy_cat_test fluffy_cat_test.cc

$ ./fluffy_cat_test

$ diff fluffy_cat.png fluffy_cat_test.png

$

As the diff shows – the C source reproduces the original exactly.
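
The tool above derives the variable name with name.substr(0, name.find('.')), which only works for simple file names in the current directory. One possible shape for the "more sophisticated name mangling" mentioned in the @todo is sketched below; make_identifier and the res_ prefix are hypothetical choices, not part of the original tool.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical replacement for the simple substr() step:
// maps a file path to a name that is usable as a C identifier.
std::string make_identifier(const std::string& path) {
    assert(!path.empty());
    // Keep only the file name, dropping any directory part.
    const std::size_t slash = path.find_last_of("/\\");
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);
    // Drop the last extension, if any.
    const std::size_t dot = base.rfind('.');
    if (dot != std::string::npos && dot != 0) base.erase(dot);
    // Replace every character that is not allowed in an identifier.
    for (char& c : base) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') c = '_';
    }
    // Identifiers must not be empty or start with a digit.
    if (base.empty() || std::isdigit(static_cast<unsigned char>(base[0]))) {
        base.insert(0, "res_");
    }
    return base;
}
```

For example, make_identifier("./res/fluffy-cat.v2.png") yields fluffy_cat_v2, while the plain fluffy_cat.png case still yields fluffy_cat.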

Btw. I used the same technique (in similar form) in my answer to SO: Paint a rect on qglwidget at specifit times.

Languishing answered 21/11, 2017 at 14:10 Comment(2)
As there is some non-empty intersection in the answer of Felix Palmen and mine, I've invested some additional effort and added a code sample.Priestley
this doesn't work with non-trivial files since MSVC has a 64k character limit on literals.Jose
H
2

Your question originally didn't state whether this is for 64-bit Cygwin G++/MSVC++ or 32-bit. There is a subtle difference when it comes to name decorations.


x86 (32-bit Windows PE) solution with OBJCOPY

I'll assume you had a resource file called Resources_0.png. You can generate a 32-bit Windows PE object file with:

objcopy --prefix-symbols=_ --input-target binary --output-target \
    pe-i386 --binary-architecture i386 Resources_0.png Resources_0.obj

The --prefix-symbols=_ option prepends an additional underscore (_) to each label. Name decoration with an additional _ is standard for Win32/PE external objects. The resulting object file contains these labels:

__binary_Resources_0_png_start
__binary_Resources_0_png_end
__binary_Resources_0_png_size

MSVC++ and Cygwin G++ targeting 32-bit executables can reference these labels as:

extern "C" uint8_t _binary_Resources_0_png_start[];
extern "C" uint8_t _binary_Resources_0_png_end[];
extern "C" uint8_t _binary_Resources_0_png_size[];

x86-64 (64-bit Windows PE) solution with OBJCOPY

You can generate a 64-bit Windows PE object file with:

objcopy --input-target binary --output-target pe-x86-64 --binary-architecture i386 \
    Resources_0.png Resources_0.obj

This is similar to the 32-bit case; however, we no longer add an additional underscore (_) before each label, because in 64-bit PE code the names aren't decorated with an additional underscore.

The resulting object file contains these labels:

_binary_Resources_0_png_start
_binary_Resources_0_png_end
_binary_Resources_0_png_size

MSVC++ and Cygwin G++ targeting 64-bit Windows PE executables can reference these labels in exactly the same way as the 32-bit Windows PE version above:

extern "C" uint8_t _binary_Resources_0_png_start[];
extern "C" uint8_t _binary_Resources_0_png_end[];
extern "C" uint8_t _binary_Resources_0_png_size[];

Special note: When compiling with MSVC++ as 64-bit code you may end up with this linking error when using the size label:

absolute symbol '_binary_Resources_0_png_size' used as target of REL32 relocation in section 4

With 64-bit code you can avoid this by computing the size in your C++ code by using the difference between the start and end labels like this:

size_t binary_Resources_0_png_size = _binary_Resources_0_png_end - \
                                     _binary_Resources_0_png_start;

Other Observations

Even if using G++/GCC this is bad form:

extern uint8_t data[]   asm("_binary_Resources_0_png_start");
extern uint8_t size[]   asm("_binary_Resources_0_png_size");
extern uint8_t end[]    asm("_binary_Resources_0_png_end");

There is little need for doing this and it is less portable. See the solutions above that don't use asm directive on variables for G++ code.


The question is tagged both C and C++ and the question contains code with extern "C". The answer above assumes you are compiling .cpp files with G++/MSVC++. If compiling .c files with GCC/MSVC, change extern "C" to extern.


If you want to generate Windows PE objects with OBJCOPY where the data is placed in the read-only .rdata section rather than .data section, you can add this option to the OBJCOPY commands above:

--rename-section .data=.rdata,CONTENTS,ALLOC,LOAD,READONLY,DATA

I discuss this option in this Stack Overflow answer. The difference is that in Windows PE the read-only section is usually called .rdata, whereas with ELF objects it is .rodata.

Hobson answered 21/11, 2017 at 17:23 Comment(11)
The ABI on Linux ELF doesn't prepend a leading _, but it's inconvenient to get objcopy to not prepend it . --remove-leading-char doesn't do anything when copying from elf64-x86-64 to elf64-x86-64, or when creating from a binary. --redefine-sym old=new does work, but you need to explicitly rename all three symbols.Alla
Another way might be to first create an object file in a format that does normally use leading underscore, then --remove-leading-char would remove it when copying to elf64-x86-64? I can see why people use asm("_binary_Resources_0_png_start"), even though that's ugly and not portable outside of GNU C/C++, because it does make your code portable to any platform with GNU tools.Alla
@PeterCordes : He isn't using Linux ELF, he's using Windows PE and Cygwin tools. Using objcopy puts the _ on, the Cygwin GCC/LD use PE and implicitly prepend the underscore by default.Hobson
Sorry, I forgot to say what I was talking about. I meant that your recommendation to never use asm("_binary_Resources_0_png_start") isn't as easy to follow on ELF platforms, because objcopy still prepends the leading _. So the same source isn't portable between Windows and Linux, without the asm("_blah") or with a more complicated way to use objcopy depending on the ABI.Alla
@PeterCordes : You can actually create a C macro that checks the GCC you are using and determines whether an underscore is required, and then wrap externs of this nature with it if you so choose. It is not necessary to use ASM here if you actually try. But given that this isn't what was asked, I didn't produce code for that as it doesn't answer the question. The reality is that in this case, where the user is using Cygwin G++ or MSVC++ (not Linux), what my answer has is correct. I don't recall seeing this being about Linux vs Windows interoperability.Hobson
Hmm, yeah that could work, too, I guess. I was hoping there was some option to objcopy that I was missing; I don't think I'd ever used it before trying it just now.Alla
@PeterCordes : If you are going to work across platforms then the option is simple IMHO. The objcopy command has to have parameters appropriate to the target platform (you need to specify the pe-i386 target when using Windows anyway); all you need to do on Windows builds is add --prefix-symbols=_ to the objcopy command line used to generate the resource file with the extra underscore.Hobson
Ah yes, that works. --prefix-symbols=foo gives me stuff like foo_P1000006_JPG_end on a no-underscore platform, and I could use _foo on underscore platforms to get the same C name. (I'd rather not prepend an extra _ and always have C variable names that start with _.)Alla
For read-only data, the man page has an example: --rename-section .data=.rodata,alloc,load,readonly,data,contents. That will put the symbols in section .rodata, and set the section metadata appropriately.Alla
@PeterCordes : There is already an SO question and answer about that: #42235675 . I should also point out that on Windows/PE the read-only section is .rdata (not .rodata), and there may be a number of them that start with .rdata.Hobson
A portable way to do this using GNU tools would be to use ld to make the object and then objcopy to rename/redefine, etc.: ld -r -b binary -o $@ $< and then objcopy --rename-section .data... $@ $@Peag

© 2022 - 2024 — McMap. All rights reserved.