What is the easiest way to find the sizeof a type without compiling and executing code?
S

5

5

I wrote a bash script to determine the size of gcc's datatypes (e.g. ./sizeof int double outputs the respective sizes of int and double) by wrapping each of its arguments in the following P() macro and then compiling and running the code.

#define P(x) printf("sizeof(" #x ") = %u\n", (unsigned int)sizeof(x))

The problem is that this is relatively slow (it takes a whole second!), and almost all of that time is spent in the linking step: compiling with -c or -S takes virtually no time, and neither does running the resulting binary. One second is not that slow by itself, but if I were to use this script in other scripts, it would add up.

Is there a faster, less roundabout way to find out what sizes gcc uses for datatypes?

Sears answered 4/2, 2015 at 22:31 Comment(2)
Just out of curiosity, when would you ever need to know the size of a C type when you're not already writing & compiling C code?Hagans
run it once and output to a file, read that file when you need to knowGrania
P
7

For standard types, you can get this using only GCC's preprocessor, because it predefines a macro for each of them:

__SIZEOF_INT__
__SIZEOF_LONG__
__SIZEOF_LONG_LONG__
__SIZEOF_SHORT__
__SIZEOF_POINTER__
__SIZEOF_FLOAT__
__SIZEOF_DOUBLE__
__SIZEOF_LONG_DOUBLE__
__SIZEOF_SIZE_T__
__SIZEOF_WCHAR_T__
__SIZEOF_WINT_T__
__SIZEOF_PTRDIFF_T__

So, by using code like the following:

#ifndef TYPE_TO_CHECK   /* may be overridden with -DTYPE_TO_CHECK=... */
#define TYPE_TO_CHECK __SIZEOF_INT__
#endif
#define VAL_TO_STRING(x) #x
#define V_TO_S(x) VAL_TO_STRING(x)
#pragma message V_TO_S(TYPE_TO_CHECK)
#error "terminate"

you will be able to get the value of __SIZEOF_INT__ from the preprocessor itself without even starting the compilation. In your script you can define the TYPE_TO_CHECK (with -D) to whatever you need and pass it to gcc. Of course you will get some junk output, but I believe you can deal with that.
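As a hedged cross-check of the macros listed above, the sketch below assumes a GCC-compatible compiler that predefines them and compares a few against sizeof at compile time. The function name macros_match_sizeof is hypothetical and exists only so the file has something to call. Because _Static_assert is evaluated during compilation, a syntax-only pass such as gcc -fsyntax-only should be enough to run the checks, with no linking or execution.

```c
#include <assert.h>
#include <stddef.h>

/* Each assertion fails at compile time if a predefined macro
   disagrees with the sizeof operator for the matching type. */
_Static_assert(__SIZEOF_INT__ == sizeof(int), "int");
_Static_assert(__SIZEOF_LONG__ == sizeof(long), "long");
_Static_assert(__SIZEOF_POINTER__ == sizeof(void *), "pointer");
_Static_assert(__SIZEOF_DOUBLE__ == sizeof(double), "double");
_Static_assert(__SIZEOF_SIZE_T__ == sizeof(size_t), "size_t");

/* Hypothetical helper: if this file compiled at all,
   every check above held, so it simply reports success. */
int macros_match_sizeof(void)
{
    return 1;
}
```

If any macro disagreed with sizeof, the compiler would stop with the short label given as the second argument.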

Puncture answered 4/2, 2015 at 22:53 Comment(14)
I like the idea, although you could just do echo __SIZEOF_INT__ | cpp | tail -1.Normannormand
This wouldn't be able to calculate the size of arbitrary structuresGoggler
@Goggler I can't see a use case based on the question that will inquire about arbitrary structurePuncture
@EugeneSh.: Could just be that you're curious about the effects of padding, no? Maybe you'd like to estimate memory consumption of some algorithm? Who knows.Slimsy
@EugeneSh. true based on original question as long as strictly only the very basic types are needed. Another minor issue with your approach is that you need to manually translate between the real type names that C uses and the names of the macros ("void *" = POINTER). Not a problem if you know what you are looking for, but possibly an added complication if the basic type names are passed in from an external source.Goggler
@Normannormand "cpp" isn't guaranteed to be the same preprocessor used by gcc ... on some systems it is not and potentially could yield different results. You may want "gcc -E -" in place of "cpp"Goggler
@Goggler - just to add, the same strategy works with clang: clang -E -Hyphen
As wojtow pointed out, querying arbitrary structure types would definitely be a plus.Sears
@Sears The question is asking about making a script/program callable from command line ./sizeof <type>. How can you pass an arbitrary struct as the <type> here?Puncture
In my version of the script, you can do ./sizeof '(struct{int a,b,c,d;char*e[8];}){}'. I haven't figured out why it needs to be instantiated, though. You can even do ./sizeof 3L or any other literal.Sears
Well, I don't see a way of doing this without compilation or some smart parsing.Puncture
I was hoping gcc might have some kind of built-in command line option for this. Another idea I was thinking about was to use gcc -S and figure out from the assembly code what size things are. That way, no linking would be involved (but a lot of parsing would be).Sears
You can do it, and it is actually not that complicated. sizeof is a compile-time operator, so if you put in your file something like volatile const s = sizeof(struct arbStruct); then in the assembly file you will get something like: _s: .long 16Puncture
But there is a danger, that your program will run on a platform with totally different assembly directives, so this logic will become invalid.Puncture
H
6

You can use the 'negative array size' trick that autoconf (see: AC_COMPUTE_INT) uses. That way, you don't need to link or execute code. Therefore, it also works when cross compiling. e.g.,

int n[1 - 2 * !(sizeof(double) == 8)];

fails to compile if: sizeof(double) != 8

The downside is, you might have to pass -DCHECK_SIZE=8 or something similar in the command line, since it might take more than one pass to detect an unusual value. So, I'm not sure if this will be any faster in general - but you might be able to take advantage of it.
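A minimal sketch of such a probe file follows. The macro names CHECK_TYPE and CHECK_SIZE and both typedef names are assumptions made for illustration, and the defaults assume a platform where double is 8 bytes. The second typedef uses <= instead of ==, which is the kind of test a driver script could repeat to narrow a range by bisection, in the spirit of what AC_COMPUTE_INT is described as doing.

```c
#include <assert.h>

/* Both macros are assumptions; a script would override them with
   -DCHECK_TYPE=... and -DCHECK_SIZE=... on the command line. */
#ifndef CHECK_TYPE
#define CHECK_TYPE double
#endif
#ifndef CHECK_SIZE
#define CHECK_SIZE 8
#endif

/* The array length is 1 when the condition holds and -1 otherwise,
   and a negative length is a compile error. */
typedef char size_is_exact[1 - 2 * !(sizeof(CHECK_TYPE) == CHECK_SIZE)];

/* Same probe with <=, so a script can ask "is the size at most N?". */
typedef char size_is_at_most[1 - 2 * !(sizeof(CHECK_TYPE) <= CHECK_SIZE)];
```

Compiling with -c (or a syntax-only pass) is sufficient; only the compiler's exit status matters, so nothing is linked or run.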

Edit: If you are using gcc exclusively, I think @wintermute's comment is probably the best solution.

Hyphen answered 4/2, 2015 at 22:52 Comment(2)
This doesn't output the size, it just checks whether it is 8.Sears
@Sears - it's up to you to provide a range of values - AC_COMPUTE_INT itself implements a binary search to converge on the correct value.Hyphen
S
4

Here are three possible solutions.

The first one will work with any type whose size is less than 256. On my system, it takes about 0.04s (since it doesn't need headers or libraries other than the basic runtime). One downside is that it will only do one at a time, because of the small size of the output channel. Another problem is that it doesn't compensate for slow linking on some systems (notably MinGW):

howbig() {
  gcc -x c - <<<'int main() { return sizeof ('$*'); }' && ./a.out
  echo $?
}

$ time howbig "struct { char c; union { double d; int i[3];};}" 
24

real    0m0.041s
user    0m0.031s
sys     0m0.014s

$ time howbig unsigned long long
8

real    0m0.044s
user    0m0.035s
sys     0m0.009s

If you wanted to be able to do larger types, you could get the size one byte at a time, at the cost of a couple more centiseconds:

howbig2 () 
{ 
    gcc -x c - <<< 'int main(int c,char**v) {
                      return sizeof ('$*')>>(8*(**++v&3)); }' &&
    echo $((0x$(printf %02x $(./a.out 3;echo $?) $(./a.out 2;echo $?) \
                            $(./a.out 1;echo $?) $(./a.out 0;echo $?)) ))
}

$ time howbig2 struct '{double d; long long u[12];}([973])'
101192

real    0m0.054s
user    0m0.036s
sys     0m0.019s
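The byte-at-a-time arithmetic in howbig2 can be sketched in plain C as follows. The function names exit_status_for and reassemble are hypothetical; they only model what the compiled program returns for each argument and how the shell glues the four exit statuses back together. Like howbig2, this sketch covers only the low 32 bits of the size.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the expression in howbig2: the low two bits of the selector
   character ('0'..'3') choose how many bytes to shift out, and the
   shell only ever sees the low 8 bits of the exit status. */
unsigned exit_status_for(size_t size, char selector)
{
    assert((selector & ~3) == '0');   /* sketch-only guard: '0'..'3' */
    return (unsigned)((size >> (8 * (selector & 3))) & 0xff);
}

/* Rebuilds the value from the four statuses, most significant byte
   first, which is the order the printf %02x pipeline uses. */
unsigned long reassemble(size_t size)
{
    unsigned long value = 0;
    for (char selector = '3'; selector >= '0'; selector--)
        value = (value << 8) | exit_status_for(size, selector);
    return value;
}
```

For the 101192-byte example above (0x18B48), the four statuses are 0x00, 0x01, 0x8B and 0x48.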

If you are compiling for x86, the following will probably work, although I'm not in a position to test it thoroughly on a wide variety of architectures and platforms. It avoids the link step (notoriously slow on MinGW, for example), by analyzing the compiled assembly output. (It would probably be slightly more robust to analyze the compiled object binary, but I fear that binutils on MinGW are also slow.) Even on Ubuntu, it is significantly faster:

howbig3 () { 
  gcc -S -o - -x c - <<< 'int hb(void) { return sizeof ('$*'); }' |
  awk '$1~/movl/&&$3=="%eax"{print substr($2,2,length($2)-2)}'
}

$ time howbig3 struct '{double d; long long u[12];}([973])'
101192

real    0m0.020s
user    0m0.017s
sys     0m0.004s
Slimsy answered 4/2, 2015 at 23:10 Comment(7)
For me, this is not any faster than what I was doing, since it doesn't eliminate the linking step.Sears
In case it wasn't clear, it's not linking with the standard library that is the bottleneck, it is the invocation of the linker itself (on my system at least). A simple int main() { return 0; } program takes just as long to link.Sears
@Sears interesting. what system do you have? I could extract the return value directly from the unlinked object file, but it would be somewhat fragile.Slimsy
Windows with cygwin in this case.Sears
@Matt: MinGW is notoriously slow to link. I added a solution which analyzes the assembly output, but I have no idea how robust it is. Good luck.Slimsy
I still think Eugene's solution in the comments to his answer is better, namely: volatile const s = sizeof(struct arbStruct);, which would output something like _s: .long 16 in the assembly (in my case .comm s, 16, 4).Sears
@Matt:Yes, that would be another alternative. It would be harder to detect in the object file. You'd have to try both of them in a lot of compile environments to figure out which one is more robust. Anyway, it's your question :) Select the answer you prefer.Slimsy
V
1

Using nm with no code

Just make your thing a global variable. nm can report its size.

// getsize.c

struct foo {
    char str[3];
    short s;     // expect padding galore...
    int i;
} my_struct;

Compile but don't link, then use nm:

$ gcc -c getsize.c
$ nm getsize.o --format=posix
my_struct C 000000000000000c 000000000000000c

The last column is the size (in hex). Here is how to extract it:

$ nm getsize.o -P | cut -d ' ' -f 4
000000000000000c

# or in decimal
$ printf %d 0x`nm getsize.o -P | cut -d ' ' -f 4`
12
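To see where the 12 bytes come from, the sketch below measures the padding in struct foo with sizeof and offsetof. The helper name foo_padding is hypothetical. On a typical ABI with a 2-byte short and a 4-byte int, one byte of padding follows str and two follow s, giving 3 padding bytes and a total of 12; other ABIs may lay the struct out differently.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    char str[3];
    short s;
    int i;
};

/* Bytes the compiler added beyond the members themselves. */
size_t foo_padding(void)
{
    return sizeof(struct foo)
         - (sizeof(char[3]) + sizeof(short) + sizeof(int));
}

/* Members are laid out in declaration order without overlapping. */
_Static_assert(offsetof(struct foo, s) >= 3, "s follows str");
_Static_assert(offsetof(struct foo, i)
               >= offsetof(struct foo, s) + sizeof(short), "i follows s");
```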

 

Using objdump with no code

If nm doesn't work for some reason, you can store the size itself in a global variable.

Start with this C file:

// getsize.c
struct foo { char str[3]; short s; int i; };

unsigned long my_sizeof = sizeof(struct foo);

Now we have to find the value of this variable from the object file.

$ gcc -c getsize.c
$ objdump -Sj .data getsize.o

getsize.o:     file format elf64-x86-64


Disassembly of section .data:

0000000000000000 <my_sizeof>:
   0:   0c 00 00 00 00 00 00 00                             ........

Darn, little endian! You could write a script to parse this, but the following solution (assuming GCC extensions) will force it to always be big endian:

// getsize.c
struct foo { char str[3]; short s; int i; };

struct __attribute__ ((scalar_storage_order("big-endian"))) {
    unsigned long v;
} my_sizeof = { sizeof(struct foo) };

This yields:

0000000000000000 <my_sizeof>:
   0:   00 00 00 00 00 00 00 0c                             ........
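If the GCC attribute is not available, an alternative is to decode the little-endian dump directly. The sketch below shows the arithmetic in C; decode_le is a hypothetical helper, and it assumes the bytes have already been parsed out of the objdump line in the order they are printed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combines n bytes, least significant first,
   which is the order objdump prints them for a little-endian target. */
unsigned long long decode_le(const unsigned char *bytes, size_t n)
{
    unsigned long long value = 0;
    assert(n <= sizeof value);          /* more bytes would not fit */
    for (size_t k = n; k > 0; k--)      /* walk from the last byte back */
        value = (value << 8) | bytes[k - 1];
    return value;
}
```

Fed the eight bytes 0c 00 00 00 00 00 00 00 from the first dump, this yields 12.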

Watch out! You can't just strip out all non-hex characters, because sometimes the "...." stuff on the right will be valid ASCII. But the first character of it should always be a "." (a dot). The following command keeps what lies between the ":" and the first ".".

$ gcc -c getsize.c
$ objdump -Sj .data getsize.o |
        sed '$!d                     # keep last line only
             s/\s//g                 # remove tabs and spaces
             s/.*:\([^.]*\)\..*/\1/' # keep only what lies between : and the first .

000000000000000c
Vibraculum answered 13/8, 2020 at 3:25 Comment(0)
U
0

If you happen to be in an IDE like VS2019, you can just type char foo[sizeof(MyType)] anywhere in the code, hover over foo and get the answer :)

Unsought answered 18/11, 2021 at 23:14 Comment(0)
