Serialize Data Structures in C
S

11

45

I'd like a C library that can serialize my data structures to disk, and then load them again later. It should accept arbitrarily nested structures, possibly with circular references.

I presume that this tool would need a configuration file describing my data structures. The library is allowed to use code generation, although I'm fairly sure it's possible to do this without it.

Note I'm not interested in data portability. I'd like to use it as a cache, so I can rely on the environment not changing.

Thanks.


Results

Someone suggested Tpl which is an awesome library, but I believe that it does not do arbitrary object graphs, such as a tree of Nodes that each contain two other Nodes.

Another candidate is Eet, which is a project of the Enlightenment window manager. Looks interesting but, again, seems not to have the ability to serialize nested structures.

Slashing answered 16/12, 2008 at 14:0 Comment(0)
C
17

Check out tpl. From the overview:

Tpl is a library for serializing C data. The data is stored in its natural binary form. The API is small and tries to stay "out of the way". Compared to using XML, tpl is faster and easier to use in C programs. Tpl can serialize many C data types, including structures.

Chickpea answered 16/12, 2008 at 14:9 Comment(2)
Tpl doesn't seem to support nested structures, e.g. a Node that can contain two sub-Nodes. - Slashing
Another problem with tpl is the limited portability of floats. - Manfred
K
10

I know you're asking for a library. If you can't find one (::boggle::, you'd think this was a solved problem!), here is an outline for a solution:

You should be able to write a code generator[1] to serialize trees/graphs without (run-time) pre-processing fairly simply.

You'll need to parse the node structure (typedef handling?), and write the included data values in a straight ahead fashion, but treat the pointers with some care.

  • For pointers to other objects (e.g. char *name;) which you know are singly referenced, you can serialize the target data directly.

  • For objects that might be multiply referenced, and for the other nodes of your tree, you'll have to represent the pointer structure. Each object is assigned a serialization number, which is written out in place of the pointer. Maintain a translation structure between current memory position and serialization number. On encountering a pointer, check whether its target already has a number; if not, give it one and queue that object up for serialization.

Reading back also requires a node-#/memory-location translation step, and might be easier to do in two passes: regenerate the nodes with the node numbers in the pointer slots (bad pointer, be warned) to find out where each node gets put, then walk the structure again fixing the pointers.
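A minimal sketch of the numbering scheme described above, under stated assumptions: the Node type, the fixed MAX_NODES capacity, the three-int record layout and all function names are made up for illustration, and the file is only read back on the machine that wrote it. It is a variation on the two-pass read: the first pass reads the records (which tells us how many nodes there are and where each will live), the second turns numbers back into pointers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type: a payload and two links that may be NULL, shared or cyclic. */
typedef struct Node {
    int value;
    struct Node *left, *right;
} Node;

#define MAX_NODES 1024   /* fixed capacity keeps the sketch short */
#define NIL (-1)         /* on-disk stand-in for a NULL pointer */

/* Translation table: index is the serialization number, entry is the address. */
typedef struct { Node *addr[MAX_NODES]; int count; } Table;

/* Return the number for p, assigning a new one (which also queues p) if unseen. */
static int id_of(Table *t, Node *p) {
    if (!p) return NIL;
    for (int i = 0; i < t->count; i++)
        if (t->addr[i] == p) return i;
    assert(t->count < MAX_NODES);
    t->addr[t->count] = p;
    return t->count++;
}

/* Write every node reachable from root; returns the node count or -1 on error. */
int save_graph(FILE *f, Node *root) {
    Table t = { .count = 0 };
    id_of(&t, root);
    /* The table doubles as the work queue: it grows while we walk it. */
    for (int i = 0; i < t.count; i++) {
        Node *n = t.addr[i];
        int rec[3];
        rec[0] = n->value;
        rec[1] = id_of(&t, n->left);
        rec[2] = id_of(&t, n->right);
        if (fwrite(rec, sizeof rec, 1, f) != 1) return -1;
    }
    return t.count;
}

/* Read the records back. All nodes live in one block, so free(root) frees the graph. */
Node *load_graph(FILE *f) {
    static int rec[MAX_NODES][3];
    int n = 0;
    while (n < MAX_NODES && fread(rec[n], sizeof rec[n], 1, f) == 1) n++;  /* pass 1 */
    if (n == 0) return NULL;
    Node *nodes = calloc((size_t)n, sizeof *nodes);
    if (!nodes) return NULL;
    for (int i = 0; i < n; i++) {                                          /* pass 2 */
        int l = rec[i][1], r = rec[i][2];
        if (l < NIL || l >= n || r < NIL || r >= n) { free(nodes); return NULL; }
        nodes[i].value = rec[i][0];
        nodes[i].left  = (l == NIL) ? NULL : &nodes[l];
        nodes[i].right = (r == NIL) ? NULL : &nodes[r];
    }
    return nodes;   /* node number 0 is the root */
}

/* Round-trip a small graph with a shared node and a cycle; returns 1 on success. */
int roundtrip_demo(void) {
    Node a = {1, NULL, NULL}, b = {2, NULL, NULL}, c = {3, NULL, NULL};
    a.left = &b; a.right = &c; b.left = &c; c.right = &a;
    FILE *f = tmpfile();
    if (!f) return 0;
    int written = save_graph(f, &a);
    rewind(f);
    Node *r = load_graph(f);
    fclose(f);
    int ok = written == 3 && r && r->value == 1 && r->left->value == 2
          && r->left->left == r->right     /* shared node stays shared */
          && r->right->right == r;         /* cycle is restored */
    free(r);
    return ok;
}
```

The linear search in id_of is quadratic in the number of nodes; a hash table keyed on the address would be the obvious replacement for larger graphs.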

I don't know anything about tpl, but you might be able to piggy-back on it.


The on-disk/network format should probably be framed with some type information. You'll need a name-mangling scheme.


[1] ROOT uses this mechanism to provide very flexible serialization support in C++.


Late addition: It occurs to me that this is not always as easy as I implied above. Consider the following (contrived and badly designed) declaration:

enum {
   mask_none = 0x00,
   mask_something = 0x01,
   mask_another = 0x02,
   /* ... */
   mask_all = 0xff
};
typedef struct mask_map {
   int mask_val;
   char *mask_name;
} mask_map_t;
mask_map_t mask_list[] = {
   {mask_something, "mask_something"},
   {mask_another, "mask_another"},
   /* ... */
};
struct saved_setup {
   char* name;
   /* various configuration data */
   char* mask_name;
   /* ... */
};

and assume that we initialize our struct saved_setup items so that mask_name points at mask_list[foo].mask_name.

When we go to serialize the data, what do we do with struct saved_setup.mask_name?

You will need to take care in designing your data structures and/or bring some case-specific intelligence to the serialization process.
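One possible form of such case-specific handling, sketched under assumptions: when a pointer is known to alias an entry of a static table, write the entry's index rather than the string, and map the index back on load. The helper names below are hypothetical, and the table is cut down to two entries with literal values.

```c
#include <assert.h>
#include <stddef.h>

typedef struct mask_map {
    int mask_val;
    const char *mask_name;
} mask_map_t;

static const mask_map_t mask_list[] = {
    {0x01, "mask_something"},
    {0x02, "mask_another"},
};
enum { MASK_COUNT = sizeof mask_list / sizeof mask_list[0] };

/* Saving: return the index of the table entry that p aliases, or -1 if p
   points somewhere else (in which case the string itself must be written). */
int mask_name_to_index(const char *p) {
    for (int i = 0; i < MASK_COUNT; i++)
        if (mask_list[i].mask_name == p)   /* compare addresses, not contents */
            return i;
    return -1;
}

/* Loading: turn a stored index back into a pointer into the table. */
const char *mask_name_from_index(int i) {
    return (i >= 0 && i < MASK_COUNT) ? mask_list[i].mask_name : NULL;
}
```

A generic code generator cannot know that mask_name aliases the table, so this knowledge has to come from an annotation or a hand-written hook for that field.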

Kurdish answered 16/12, 2008 at 17:55 Comment(1)
Thanks for a great writeup. I had essentially this plan in my head as the 'existence proof' of the library. The theory is that every library that can be imagined in C has been written. Can't believe it's turning out not to exist. I might try writing this thing over Christmas. - Slashing
P
6

This is my solution. It uses my own implementations of malloc and free, together with the mmap and munmap system calls. Follow the example code given at the reference. Ref: http://amscata.blogspot.com/2013/02/serialize-your-memory.html

In my approach I create a char array as my own RAM space. Then there are functions to allocate memory from it and to free it again. After creating the data structure, I write the char array to a file using mmap.

Whenever you want to load it back into memory, there is a function which uses munmap to put the data structure back into the char array. Since it has virtual addresses for your pointers, you can reuse your data structure. That means you can create a data structure, save it, load it, edit it again, and save it again.
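The core idea, keeping every object in one contiguous block so the whole block can be written and read back in a single call, can be sketched without a custom malloc or mmap. The following is not the code from the linked post: it assumes a fixed pool of hypothetical PNode records and stores links as pool indices rather than raw pointers, so the saved image does not depend on where the pool is loaded. All names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define POOL_CAP 256
#define NONE (-1)          /* index value meaning "no link" */

typedef struct { int value; int left, right; } PNode;      /* links are pool indices */
typedef struct { int used; PNode nodes[POOL_CAP]; } Pool;  /* the whole "heap" */

/* Allocate the next free slot; returns its index, or NONE if the pool is full. */
int pool_new(Pool *p, int value) {
    if (p->used >= POOL_CAP) return NONE;
    PNode *n = &p->nodes[p->used];
    n->value = value;
    n->left = n->right = NONE;
    return p->used++;
}

/* Save and load copy the pool image verbatim (same machine, same build assumed). */
int pool_save(const Pool *p, FILE *f) {
    return fwrite(p, sizeof *p, 1, f) == 1 ? 0 : -1;
}

int pool_load(Pool *p, FILE *f) {
    if (fread(p, sizeof *p, 1, f) != 1) return -1;
    return (p->used >= 0 && p->used <= POOL_CAP) ? 0 : -1;
}

/* Build a two-node cycle, round-trip it through a temp file; returns 1 on success. */
int pool_demo(void) {
    static Pool a, b;      /* static: zero-initialised and kept off the stack */
    a.used = 0;
    int root = pool_new(&a, 10);
    int kid  = pool_new(&a, 20);
    a.nodes[root].left = kid;
    a.nodes[kid].right = root;          /* circular reference */
    FILE *f = tmpfile();
    if (!f) return 0;
    int ok = pool_save(&a, f) == 0;
    rewind(f);
    ok = ok && pool_load(&b, f) == 0;
    fclose(f);
    return ok && b.used == 2
        && b.nodes[b.nodes[0].left].value == 20
        && b.nodes[b.nodes[1].right].value == 10;
}
```

The trade-off against real pointers is that every access goes through the pool (pool.nodes[i]), but nothing needs fixing up after loading.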

Palenque answered 25/2, 2013 at 9:52 Comment(2)
This worked very well even for nested tree structures, though some changes were needed to make the code compile: a missing include of 'time.h' and a presumed typedef for ushort, which is not a standard type in C. - Fellah
However, it does not handle pointer arrays, so a graph structure containing those is not serialised correctly; only the first pointer is serialised. - Fellah
B
4

You can take a look at eet, a library from the Enlightenment project for storing C data types (including nested structures). Although nearly all libraries of the Enlightenment project are in a pre-alpha state, eet has already been released. I'm not sure, however, whether it can handle circular references. Probably not.

Bolter answered 16/12, 2008 at 22:46 Comment(0)
H
3

http://s11n.net/c11n/

HTH

Huntingdon answered 17/12, 2008 at 14:10 Comment(0)
T
3

You should check out gwlib. The serializer/deserializer is extensive, and there are extensive tests available to look at. http://gwlib.com/

Tutti answered 15/2, 2011 at 7:21 Comment(0)
S
2

I'm assuming you are talking about storing a graph structure; if not, disregard this.

If you're storing a graph, I personally think the best idea would be to implement a function that converts your graph into an adjacency matrix. You can then write a function that converts an adjacency matrix back into your graph data structure.

This has three benefits (that may or may not matter in your application):

  • Adjacency matrices are a very natural way to create and store a graph.
  • You can create adjacency matrices and import them into your applications.
  • You can store and read your data in a meaningful way.

I used this method during a CS project, and it is definitely how I would do it again.

You can read more about adjacency matrices here: http://en.wikipedia.org/wiki/Modified_adjacency_matrix
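As a rough sketch of the two conversion functions, assuming a toy graph of N vertices held in one array (the Vertex type and all names here are hypothetical): the matrix is a plain array with no pointers in it, so it can be written to disk with a single fwrite and read back with fread.

```c
#include <assert.h>
#include <string.h>

#define N 4   /* number of vertices in this toy example */

typedef struct Vertex {
    int degree;               /* number of outgoing edges */
    struct Vertex *out[N];    /* targets of those edges */
} Vertex;

/* m[i][j] becomes 1 iff vertex i has an edge to vertex j.
   All vertices live in the array v, so a vertex's index is its identity. */
void graph_to_matrix(const Vertex v[N], unsigned char m[N][N]) {
    memset(m, 0, N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < v[i].degree; k++)
            m[i][v[i].out[k] - v] = 1;
}

/* Rebuild the pointer structure in v from the matrix. */
void matrix_to_graph(unsigned char m[N][N], Vertex v[N]) {
    for (int i = 0; i < N; i++) {
        v[i].degree = 0;
        for (int j = 0; j < N; j++)
            if (m[i][j])
                v[i].out[v[i].degree++] = &v[j];
    }
}

/* Convert a 3-cycle plus one isolated vertex to a matrix and back; 1 on success. */
int matrix_demo(void) {
    Vertex g[N] = {0}, h[N];
    unsigned char m[N][N];
    g[0].out[0] = &g[1]; g[0].degree = 1;
    g[1].out[0] = &g[2]; g[1].degree = 1;
    g[2].out[0] = &g[0]; g[2].degree = 1;   /* closes the cycle 0 -> 1 -> 2 -> 0 */
    graph_to_matrix(g, m);
    matrix_to_graph(m, h);
    return m[2][0] == 1 && m[0][2] == 0
        && h[2].degree == 1 && h[2].out[0] == &h[0]
        && h[3].degree == 0;
}
```

The matrix takes N*N bytes regardless of how many edges exist, and it records only the links; per-node payload would have to be stored separately, for example as a parallel array.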

Sharpe answered 16/12, 2008 at 14:12 Comment(3)
I'm not talking about a graph structure in that sense. My structures are more like a tree, though I don't want to rule out circular references. There is no reason a C library can't serialize that without any processing on my behalf. - Slashing
A tree is just a special case of a graph, and with circular references, one could argue that in fact, you have a graph, not a tree. - Sharpe
Fine: all C data structures can be viewed as in-memory graphs. Now is there a library that can serialize this data? - Slashing
M
1

Another option is Avro C, an implementation of Apache Avro in C.

Mistral answered 12/6, 2012 at 15:28 Comment(0)
B
1

Here is an example using the Binn library (my creation):

  binn *obj;

  // create a new object
  obj = binn_object();

  // add values to it
  binn_object_set_int32(obj, "id", 123);
  binn_object_set_str(obj, "name", "Samsung Galaxy Charger");
  binn_object_set_double(obj, "price", 12.50);
  binn_object_set_blob(obj, "picture", picptr, piclen);

  // send over the network
  send(sock, binn_ptr(obj), binn_size(obj), 0);

  // release the buffer
  binn_free(obj);

If you don't want to use strings as keys you can use a binn_map which uses integers as keys.

There is also support for lists, and all these structures can be nested:

  binn *list;

  // create a new list
  list = binn_list();

  // add values to it
  binn_list_add_int32(list, 123);
  binn_list_add_double(list, 2.50);

  // add the list to the object
  binn_object_set_list(obj, "items", list);

  // or add the object to the list
  binn_list_add_object(list, obj);
Breve answered 13/11, 2015 at 6:15 Comment(0)
H
0

In theory YAML should do what you want: http://code.google.com/p/yaml-cpp/

Please let me know if it works for you.

Hudgins answered 3/11, 2009 at 21:47 Comment(1)
True. However, YAML C++ will serialize C data structures, despite requiring a C++ compiler. Other SO readers may find this useful. - Hudgins
S
0

You can use https://github.com/souzomain/Packer. This library serializes data and returns a buffer.

Example:

PPACKER protocol = packer_init();
packer_add_data(protocol, yourstructure, sizeof(yourstructure));
send(fd, protocol->buffer, protocol->offset, 0); //use the buffer and the size
packer_free(protocol);

You can read the data back using:

recv(fd, buffer, size, 0);
size_t offset = 0;
yourstructure *data = (yourstructure *)packer_get_data(buffer, sizeof(yourstructure), &offset);
Septempartite answered 19/4, 2023 at 4:53 Comment(0)
