dynamically sized classes in c++
Asked Answered
I

4

5

I'd like to create a class Word, which houses a word. I'll have an instance of the class for pretty much every word in the dictionary (so lots) -- and no, I cannot use a tree structure to store this for my particular application. Of course, the size of the strings may vary, and I don't want to kill memory. I'd like to do this in a class, as so:

class Word {
    public:
        ...
    private:
        int         len;
        LetterData  letters[];
};

And then dynamically allocate Word using:

Word *pNewWord = malloc(sizeof(Word)+sizeof(LetterData)*len);   

I realize this isn't very C++'ish. So my questions are: first, is there a better way of doing this, and if not, is this going to cause problems? Word will not inherit any other class types (I'm fairly certain that inheritance would kill this...).

Note: memory usage and speed are very important -- I'd like to avoid an extra pointer per word, and I'd like to avoid an extra pointer deference per access...

Intarsia answered 16/10, 2017 at 12:12 Comment(8)
use std::strings to store your text rather than your class which doesn't seem to do anything more. Also what are you really trying to do here as there is probably a much better way of doing thisEdroi
LetterData letters[] is same as LetterData *lettersDeterrent
1. that malloc won't compile because you would need to cast 2. Mixing malloc and new sounds like a recipe for disaster (malloc does not call constructors and free does not call destructors) 3. std::string and std::vector are the best way to represent strings and dynamically sizing collections.Carpic
What C++ features do you need for this class? Would it be possible to encapsulate it to an opaque pointer with C interface, and hide the implementation in a .c file? I believe C supports the behavior you want to achieve, but C++ does not (unless you are relying on specific compiler support).Holt
@Semyon -- If you do letters[], and then access letters[3], the compiler knows where letters[0] is, so it would hard code a reference to that plus 3. If you do *letters, it would need to read the pointer stored *letters, add three, and deference that (which, unless the array is right next to it, would likley result in a cache miss...).Intarsia
@Intarsia no, Semyon is correct. Those declarations are semantically identical. C++ does not have flexible length structs, what you are attempting is undefined behaviour.Portland
This is not possible in C++. Not in a poetable way. Some compilers may support this as an extension.Rebeccarebecka
You should consider that std::string is a lot smarter than your class. If you have normal sized natural language words, most string implementations would use the Short String Optimization and not have any dynamic allocation at all. Could save one malloc call per word.Paugh
L
5

Using open sized array as the last member of struct is quite common practice in C. And it was standardized in C99 (2.6.7."flexible array member").

C++ standard says nothing about "flexible array members". So it is not guarantee that it works in C++. But the most of the popular compilers support this feature even for C++:

  • gcc
  • msvc
  • clang "Note that clang does support flexible array members (arrays with a zero or unspecified size at the end of a structure)"

So actually it is possible to use it, but this is very dangerous practice. You should try to avoid it, and you need to be very careful if you decide to use it.

There are several potential issues if you are using this technique:

  • Any reallocation or deletion of memory buffer makes your object invalid.
  • Your class must be non-copyable. Any copy (usually implicit copy when you pass your object as an argument) makes the object invalid. You must forbid copy constructor and assignment operator.
  • Inheritance is not possible, you must forbid it.
  • You should initialize all data correctly to avoid garbage (maybe fill buffer with zeros and than use placement new operators)
  • This method is often used for serialization, saving and restoring the objects, but it is not portable. You need to think about sizes of members and big/little endians.

So if it is possible it is better to use something else. The solution with for example vector of LetterData as a member of class.

Lurlene answered 16/10, 2017 at 13:29 Comment(1)
I can't find any reference to "flexible array members" in the C++11 or C++17 standard. Are you sure this is actually available in C++ and not a C-specific feature?Sacramentalism
T
6

My starting point would be to build what would be, for now, little more than a proxy class for std::string:

class Word
{
public:
    std::size_t getSize() const; // in place of your 'len' member.

private:
    std::string m_data;
};

I would then build the rest of my program and only then if there is a performance issue with my Word class cf. other areas of the program would I attempt to refactor it, replacing m_data with something else.

My money is on that being unnecessary and your suspicions around the C++ standard library objects not having adequate performance unfounded.

Trimetric answered 16/10, 2017 at 12:23 Comment(3)
Then why not just std::vector<LetterData>?Tingley
@RustyX because its clearly a piece of text, not a general collectionPortland
Also, std::string usually has small buffer optimization, but std::vector usually doesn't.Amblyoscope
L
5

Using open sized array as the last member of struct is quite common practice in C. And it was standardized in C99 (2.6.7."flexible array member").

C++ standard says nothing about "flexible array members". So it is not guarantee that it works in C++. But the most of the popular compilers support this feature even for C++:

  • gcc
  • msvc
  • clang "Note that clang does support flexible array members (arrays with a zero or unspecified size at the end of a structure)"

So actually it is possible to use it, but this is very dangerous practice. You should try to avoid it, and you need to be very careful if you decide to use it.

There are several potential issues if you are using this technique:

  • Any reallocation or deletion of memory buffer makes your object invalid.
  • Your class must be non-copyable. Any copy (usually implicit copy when you pass your object as an argument) makes the object invalid. You must forbid copy constructor and assignment operator.
  • Inheritance is not possible, you must forbid it.
  • You should initialize all data correctly to avoid garbage (maybe fill buffer with zeros and than use placement new operators)
  • This method is often used for serialization, saving and restoring the objects, but it is not portable. You need to think about sizes of members and big/little endians.

So if it is possible it is better to use something else. The solution with for example vector of LetterData as a member of class.

Lurlene answered 16/10, 2017 at 13:29 Comment(1)
I can't find any reference to "flexible array members" in the C++11 or C++17 standard. Are you sure this is actually available in C++ and not a C-specific feature?Sacramentalism
B
2

The pointers aren't your problem, it's the lack of cache hits from having one object allocated over here and another one allocated elsewhere on the heap for the actual string. If you could allocate it all consecutively you would be in optimal cache hit territory.

I know you said no trees but if you wrote a custom allocator and plugged it into std::map or std::set you could achieve this functionality.

You should check out the cppcon talk on arena allocators.

"CppCon 2017: John Lakos “Local ('Arena') Memory Allocators”

A "simpler" way to achieve this is to maintain two vectors, for Word and a "insert chunk aligned vector for allocating range for LetterData" respectively. Word would get one more field which would be a offset into std::vector (or whatever segment size you choose). You would just placement new into that range and you could also just ignore the destructor, which will speed things up.

I bet once you fatten up your context object it will take just as much space as a map/set node.

Since you'll be stuffing tons of data into this it makes sense to preallocate the memory upfront. Perhaps you can achieve high performance by allocating sets of consecutive ranges in slabs as a opposed to one large consecutive range that will need to be "resized" eventually, which requires creating a new range and copying everything to it.

With all things, measure, measure, and measure before you go off and write a custom stack. Generate a range of random word lengths, search a subset, time it etc.

Bekelja answered 16/10, 2017 at 13:13 Comment(0)
E
0

At a compile time, the compiler must be aware of the class size, so no matter what the class contains, the compiler needs to know the size of each member of that class.

Note, however, that the member letters is a pointer, which has a known size, so even if it points to a memory which can change it's size, it only needs to allocate space to hold that pointer.

If you want to use malloc to allocate the memory, which I don't recommend in C++, it's enough to give it the size of the class:

Word *pNewWord = reinterpret_cast<Word*>(malloc(sizeof(Word)));

However, if you sticked with C++ and used the new operator, the line reduces to:

Word *pNewWord = new Word;

Which not only allocates the memory needed, but also calls the class constructor, which is default in your case.

Ethnology answered 16/10, 2017 at 12:20 Comment(1)
@UnholySheep Thanks for the note, fixed.Ethnology

© 2022 - 2024 — McMap. All rights reserved.