How to load/save C++ class instance (using STL containers) to disk

O

7

6

I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children. Abstracted, it looks something like:

struct node {
public:
    node_list_iterator parent;              // iterator to a single parent node
    double node_data_array[X];
    map<int,node_list_iterator> children;   // iterators to child nodes
};

class strategy {
private:
    list<node> tree;        // hierarchically linked list of nodes
    struct some_other_data;
public:
    void build();           // build the tree
    void save();            // save the tree from disk
    void load();            // load the tree from disk
    void use();             // use the tree
};

I would like to implement the load() and save() to disk, and it should be fairly fast, however the obvious problems are:

I don't know the size in advance;
The data contains iterators, which are volatile;
My ignorance of C++ is prodigious.

Could anyone suggest a pure C++ solution please?

Omar answered 26/4, 2010 at 14:55 Comment(3)

How future-proof do you need to be? Will you be looking at this data in a few years? How about someone else-- will they need to as well? How modifiable is this structure? If it's just for some one-off work, then that's different from a maintainable datastructure (ie, a word file or something, used by lots of people across many versions of a program). – Symptomatic 26/4, 2010 at 15:21

@ "std::list? Yuck! :) – Billy ONeal" -- why? @Symptomatic - It's a one-off project and the data is generated on-the-fly. – Omar 26/4, 2010 at 16:34

if you're not worried about future-proofness, then any of these solutions would work. If you were worried about future-proofness, then you'd also have to think about extensibility and how the needs of the program could change. – Symptomatic 26/4, 2010 at 17:8

C

1

It seems like you could save the data in the following syntax:

File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)

That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.

Reading isn't that much harder. std::list<node> iterators are stable. Once you've inserted the root node, its iterator will not change, not even when inserting its children. Hence, when you're reading each node you can already set the parent iterator. This of course leaves you with the child iterators, but those are trivial: each node is a child of its parents. So, after you've read all nodes you'll fix up the child iterators. Start with the second node, the first child (The first node one was the root) and iterate to the last child. Then, for each child C, get its parent and the child to its parent's collection. Now, this means that you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs in the respective parents, you can discard the vector.

Crashing answered 26/4, 2010 at 15:12 Comment(1)

Not explicitly. You'd effectively derive that from the position in the file: following its parent node. This is easy to realize when writing, by starting with the root node. And when reading, you recover the child relations by inverting all parent relations. – Crashing 27/4, 2010 at 9:18

S

1

You can use boost.serialization library. This would save entire state of your container, even the iterators.

Simplicidentate answered 26/4, 2010 at 14:58 Comment(3)

I don't think boost::serialization is a help because the members of the list contain iterators to inside the list. Massive recursion problems ahead :) – Freetown 26/4, 2010 at 15:0

it's not a problem if appropriate serialization routine is implemented. – Simplicidentate 26/4, 2010 at 15:10

The fundamental issue with boost::serialization will be the iterator. The rest is already done for the OP, but there isn't any serialization support for iterators in the library, so that would have to be rolled custom. There is periodic chatter on how to do it on the boost lists, but nothing has been done that I can find. – Sepsis 26/4, 2010 at 16:31

M

1

boost.serialization is a solution, or IMHO, you can use SQLite + Visitor pattern to load and save these nodes, but it won't be easy as it sounds.

Mule answered 26/4, 2010 at 15:2 Comment(0)

C

1

It seems like you could save the data in the following syntax:

File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)

That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.

Reading isn't that much harder. std::list<node> iterators are stable. Once you've inserted the root node, its iterator will not change, not even when inserting its children. Hence, when you're reading each node you can already set the parent iterator. This of course leaves you with the child iterators, but those are trivial: each node is a child of its parents. So, after you've read all nodes you'll fix up the child iterators. Start with the second node, the first child (The first node one was the root) and iterate to the last child. Then, for each child C, get its parent and the child to its parent's collection. Now, this means that you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs in the respective parents, you can discard the vector.

Crashing answered 26/4, 2010 at 15:12 Comment(1)

Not explicitly. You'd effectively derive that from the position in the file: following its parent node. This is easy to realize when writing, by starting with the root node. And when reading, you recover the child relations by inverting all parent relations. – Crashing 27/4, 2010 at 9:18

J

1

Boost Serialization has already been suggested, and it's certainly a reasonable possibility.

A great deal depends on how you're going to use the data -- the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can re-constitute the same tree when needed, or whether you want something like a database so you can load parts of the information into memory as needed, and update records as needed.

If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?

If you do want to be able to traverse the structure on disk, one obvious possibility would be as you write it out, substitute the file offset of the appropriate data in place of the iterator you're using in memory.

Alternatively, since it looks like (at least most of) the data in an individual node has a fixed size, you might create a database-like structure of fixed-size records, and in each record record the record numbers of the parent/children.

Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it was known in advance).

Jovian answered 26/4, 2010 at 15:15 Comment(0)

F

1

Actually, I think your best option is to move the entire data structure into database tables. That way you get the benefit of people much smarter then you (or me) having dealt with issues of serialization. It will also prevent you from having to worry about whether the structure can fit into memory.

Fragmental answered 26/4, 2010 at 16:12 Comment(0)

D

0

I've answered something like this on SO before, so I will summarize:
1. Use a database.
2. Substitute file offsets for links (pointers).
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to create the tree structure, using node names instead of links.
5. This would be soooo much easier if you used a database like SqLite or MySQL.

When you spend too much time on the "serialization" and less on the primary purpose of your project, you need to use a database.

Doublure answered 26/4, 2010 at 16:55 Comment(1)

My application involves traversing the tree ~10^8 times in a few seconds (as part of an optimisation). So, I need to have the whole tree in memory. I could save & load using a database, but not working with the tree in memory would be too slow for my application. – Omar 26/4, 2010 at 17:12

C

-1

If you're doing it for persistence then there are several solutions you can use from the web i.e. google "persist std::list" or you can roll your own using mmap to create a file backed memory area.

Clincher answered 28/7, 2020 at 20:40 Comment(1)

Please make sure to provide complete and useful answers at best with example code in the future. Also google provides different result f.e. depending on your country, your search history and your general activity and other criteria. If you found something useful make sure to provide links or better write a summering answer. If you put work in it and it helps/solve the question it will reflect on your reputation. I hope this helps - please edit your answer. – Malformation 28/7, 2020 at 23:16

Recommended topics

Hot tags