How can I initialize a flexible array in rodata and create a pointer to it?

In C, the code

char *c = "Hello world!";

stores Hello world!\0 in rodata and initializes c with a pointer to it. How can I do this with something other than a string?

Specifically, I am trying to define my own string type

typedef struct {
   size_t Length;
   char Data[];
} PascalString;

I then want some sort of macro so that I can write

const PascalString *c2 = PASCAL_STRING_CONSTANT("Hello world!");

And have it behave the same, in that \x0c\0\0\0Hello world! is stored in rodata and c2 is initialized with a pointer to it.

I tried using

#define PASCAL_STRING_CONSTANT(c_string_constant) \
    &((const PascalString) { \
        .Length=sizeof(c_string_constant)-1, \
        .Data=(c_string_constant), \
    })

as suggested in these questions, but it doesn't work because Data is a flexible array member. GCC reports "error: non-static initialization of a flexible array member", and Clang gives a similar error.

Is this possible in C? And if so, what would the PASCAL_STRING_CONSTANT macro look like?

To clarify

With a C string, the following code-block never stores the string on the stack:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    const char *c = "Hello world!";

    printf("test %s", c);

    return 0;
}

As we can see by looking at the assembly, line 5 compiles to just loading a pointer into a register.

I want to get that same behavior with Pascal strings, and with GNU extensions it is possible. The following code also never stores the Pascal string on the stack:

#include <inttypes.h>
#include <stdio.h>

typedef struct {
   size_t Length;
   char Data[];
} PascalString;

#define PASCAL_STRING_CONSTANT(c_string_constant) ({\
        static const PascalString _tmpstr = { \
            .Length=sizeof(c_string_constant)-1, \
            .Data=c_string_constant, \
        }; \
        &_tmpstr; \
    })

int main(void) {
    const PascalString *c2 = PASCAL_STRING_CONSTANT("Hello world!");

    printf("test %.*s", (int)c2->Length, c2->Data);

    return 0;
}

Looking at its generated assembly, line 18 is also just loading a pointer.

However, the best code I've found to do this in standard C (C99, without extensions) copies the entire string onto the stack:

#include <inttypes.h>
#include <stdio.h>

typedef struct {
   size_t Length;
   char Data[];
} PascalString;

#define PASCAL_STRING_CONSTANT(initial_value) \
    (const PascalString *)&(const struct { \
        size_t Length; \
        char Data[sizeof(initial_value)]; \
    }){ \
        .Length = sizeof(initial_value)-1, \
        .Data = initial_value, \
    }

int main(void) {
    const PascalString *c2 = PASCAL_STRING_CONSTANT("Hello world!");

    printf("test %.*s", (int)c2->Length, c2->Data);

    return 0;
}

In the generated assembly for this code, line 19 copies the entire struct onto the stack then produces a pointer to it.

I'm looking for either standard C code (no compiler extensions) that produces the same assembly as my second example, or an explanation of why that's not possible in standard C.

Blastosphere answered 26/9, 2019 at 1:17 Comment(7)
Have you considered writing it directly in assembly and inlining it into C with the asm keyword? - Un
@AkutaHinako That is less portable than using GNU extensions that are supported by all major compilers... - Blastosphere
What's the motivation for having the structure in .rodata? What exactly are you trying to accomplish by doing so? - Lobel
@Lobel I'm working with an embedded device with not that much memory. Large strings ending up on the stack cause the stack to overflow. - Blastosphere
@g.rocket If you're on an embedded device then you probably have a specific build chain, meaning that using implementation-specific functionality should be acceptable. - Lobel
Yes. And I'm using a compiler that, while it technically supports GNU statement-expressions, has very poor support for debugging them, so I'd like to get rid of them if possible. - Blastosphere
@g.rocket What compiler is that? You should mention this in your question. - Cascarilla
B
1

This can be done with the statement-expressions GNU extension, although it is nonstandard.

#define PASCAL_STRING_CONSTANT(c_string_constant) ({\
        static const PascalString _tmpstr = { \
            .Length=sizeof(c_string_constant)-1, \
            .Data=c_string_constant, \
        }; \
        &_tmpstr; \
    })

The extension allows you to have multiple statements in a block as an expression which evaluates to the value of the last statement by enclosing the block in ({ ... }). Thus, we can declare our PascalString as a static const value, and then return a pointer to it.
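As a rough illustration of why the pointer stays valid, here is a minimal sketch that wraps the macro in a function. It assumes GCC or Clang, since both statement-expressions and static initialization of a flexible array member are GNU extensions. The function name greeting is hypothetical and not part of the original answer.

```c
#include <stddef.h>

typedef struct {
    size_t Length;
    char Data[];
} PascalString;

/* GNU statement-expression: the block evaluates to the value of its last
   expression, here the address of a static const object. */
#define PASCAL_STRING_CONSTANT(c_string_constant) ({ \
        static const PascalString _tmpstr = { \
            .Length = sizeof(c_string_constant) - 1, \
            .Data = c_string_constant, \
        }; \
        &_tmpstr; \
    })

/* Hypothetical helper: returning the pointer is safe because _tmpstr has
   static storage duration and outlives the call. */
const PascalString *greeting(void) {
    return PASCAL_STRING_CONSTANT("Hello world!");
}
```

Every call to greeting() yields the same address, because the macro expands to a single static object at that call site rather than a fresh temporary.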

For completeness, we can also make a stack buffer if we want to modify it:

#define PASCAL_STRING_STACKBUF(initial_value, capacity) \
    (PascalString *)&(struct { \
        size_t Length; \
        char Data[capacity]; \
    }){ \
        .Length = sizeof(initial_value)-1, \
        .Data = initial_value, \
    }
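The compound literal in PASCAL_STRING_STACKBUF lives on the stack because C99 gives a compound literal inside a function body automatic storage duration. Outside any function it has static storage duration instead. The following is a minimal sketch of the file-scope case in standard C; the names HelloString, hello_ptr and get_hello are hypothetical, and the fixed-size struct is an assumption made to avoid initializing a flexible array member.

```c
#include <stddef.h>

/* Hypothetical fixed-size layout for one particular string. */
struct HelloString {
    size_t Length;
    char Data[sizeof("Hello world!")];   /* 13 bytes, keeps the '\0' */
};

/* At file scope a compound literal has static storage duration, so its
   address is a constant expression and nothing is copied at run time. */
static const struct HelloString *const hello_ptr =
    &(const struct HelloString){
        .Length = sizeof("Hello world!") - 1,
        .Data = "Hello world!",
    };

/* Hypothetical accessor: only hands out the pointer. */
const struct HelloString *get_hello(void) {
    return hello_ptr;
}
```

This only helps when the constant can be hoisted to file scope. It does not give an expression-level macro usable inside a function, which is what the statement-expression version provides.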
Blastosphere answered 26/9, 2019 at 18:21 Comment(6)
This works, but I'd love to know if there is a way to do this without GNU extensions. - Blastosphere
I don't think so, at least not the way you want it. Maybe ask a question about how to debug statement-expressions with your compiler? - Cascarilla
@JL2210 Honestly at this point I don't think so either, but I can't figure out why every C compiler I've tried insists on putting const data that's only accessed via a pointer on the stack and I can't quite shake the feeling that there should be some way to do what I want. Maybe some sort of nasty preprocessor hack that inserts the length of a string into the first 8 characters of it? - Blastosphere
I could maybe see that but it would be really nasty. - Cascarilla
It would. I put a bit of research into trying to find a way to do that but since sizeof is resolved after the preprocessor runs I can't even figure out how to get the length at preprocess-time. - Blastosphere
Well, you could pass each character individually and have a length limit. But even with a 7 char limit that's still terrible to write. - Cascarilla
L
1

You can use this macro, which takes the name of the variable and its contents:

#define PASCAL_STRING(name, str) \
    struct { \
        unsigned char len; \
        char content[sizeof(str) - 1]; \
    } name = { sizeof(str) - 1, str }

Use it like this:

const PASCAL_STRING(c2, "Hello world!");
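If the goal is to avoid a per-call copy, one possible variant is to give the object static storage duration. The following is a minimal sketch, not the macro from this answer: the name PASCAL_STRING_STATIC and the two demo functions are hypothetical, and the array keeps the terminating '\0' (an assumption made here for convenience).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: `static const` gives the object static storage
   duration, so it is initialized once rather than on every call. */
#define PASCAL_STRING_STATIC(name, str) \
    static const struct { \
        size_t len; \
        char content[sizeof(str)]; /* includes the '\0' */ \
    } name = { sizeof(str) - 1, str }

/* Hypothetical demo: reads the stored length. */
size_t demo_length(void) {
    PASCAL_STRING_STATIC(c2, "Hello world!");
    return c2.len;
}

/* Hypothetical demo: compares the stored bytes with a C string. */
int demo_matches(const char *expected) {
    PASCAL_STRING_STATIC(c2, "Hello world!");
    return strlen(expected) == c2.len &&
           memcmp(c2.content, expected, c2.len) == 0;
}
```

The trade-off is that this is a declaration, not an expression, so it cannot be used inline as a function argument the way a string literal can.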
Lobel answered 26/9, 2019 at 18:30 Comment(4)
That's a pretty cool idea. Unfortunately, it doesn't behave quite the way I want: looking at the generated assembly it stores c2 on the stack and generates code to copy the string onto the stack: godbolt.org/z/NgAhZK - Blastosphere
I was hoping for something that would behave exactly like a C string constant: We get a pointer to the PascalString, which is all stored in rodata like any other static constant. - Blastosphere
How about if you add const before the word struct? I'm not a readelf expert, but it did seem to change the size of .rodata when I tried it. - Institutive
@Institutive If you look at lines 12 and 13 of godbolt.org/z/nCvmCz, the ldm and stmia instructions are copying this struct from rodata onto the stack. I want a way to use Pascal strings without the overhead of copying them onto the stack, just like you can do with normal C strings. - Blastosphere
F
-2

I am not sure why you would want to do it, but you could do it this way. This method will store your string in the data segment and gives you a way to access it as a structure. Note that I create a packed structure to ensure that the mapping into the structure always works since I have essentially hard coded the data fields in the const expression below.

#include <stdio.h>

#pragma pack(1)
typedef struct {
   unsigned char Length;
   char Data[];
} PascalString;
#pragma pack()

const unsigned char HELLO[7] = { 
0x05, /* length of "HELLO", not counting the terminating '\0' */
'H','E','L','L','O','\0'
};


int main(void) {
        const PascalString *myString = (const PascalString *)HELLO;
        printf("I say: %s \n", myString->Data);
}
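The layout assumption behind the hard-coded bytes can be checked rather than assumed. The following is a minimal sketch, assuming a typical ABI in which unsigned char and char both have alignment 1; it declares the struct without any pack pragma and reports where Data starts. The helper names data_offset and struct_size are hypothetical.

```c
#include <stddef.h>

/* Same members as above, but deliberately without a pack pragma. */
typedef struct {
    unsigned char Length;
    char Data[];
} PascalString;

/* Hypothetical helper: byte offset of the flexible array member. */
size_t data_offset(void) {
    return offsetof(PascalString, Data);
}

/* Hypothetical helper: size of the struct header alone. */
size_t struct_size(void) {
    return sizeof(PascalString);
}
```

If data_offset() returns 1 on the target, the byte array lines up with the struct with or without packing. With C11, the same check could be written as a _Static_assert so that a mismatch fails at compile time.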
Frozen answered 15/10, 2019 at 1:2 Comment(3)
This does store myString in rodata, but only because it's defined in file scope. - Blastosphere
Also, packing the struct does absolutely nothing, and Length should probably be larger than a char (I have been using size_t). - Blastosphere
If you use size_t then it may be 4 bytes or 8 bytes long depending on your architecture. If you want this code to be portable you need to ensure your size variable is a known length since you are making an assumption about alignment when you create the hard coded constant. The reason to use pack is to ensure that the compiler does not use a word for the char variable. Pack tells it that you want the data laid out exactly as you have defined it. - Frozen
