Persisting a trie to a file - C
Asked Answered
H

4

9

I have a trie which I am using to do some string processing. I have a simple compiler which generates trie from some data. Once generated, my trie won't change at run time.

I am looking for an approach where I can persist the trie in a file and load it effectively. I have looked at sqllite to understand how they are persisting b-treebut their file format looks bit advanced and I may not need all of those.

It'd be helpful if someone can provide some ideas to persist and read the trie. I am programming using C.

Hollowell answered 3/4, 2010 at 17:37 Comment(0)
B
12

I did some research and found the following little gems online:

  1. trie.h
  2. trie.c

A working trie with serialization and deserialization. It was originally written for use in Python (there's a corresponding triemodule.c for tying it to Python), but it's pure C; you could mine it for ideas or use it as you wish.

Update:

It appears the links are no longer working. I'll keep the originals up, but here are the links in the wayback machine:

  1. trie.h
  2. trie.c
Barros answered 3/4, 2010 at 17:53 Comment(2)
Looks promising.Let me give it a try.Hollowell
You can find those files in this git: github.com/jamestbrown/biopython/tree/master/BioGlochidiate
T
5

Assuming your entire data structure fits in memory, a recursive serialization approach is simplest. Sqllite works with data structures that won't fit in memory, so it is probably overkill to try copying their methods.

Here is example pseudocode for reading/writing a node. It works by recursively reading/writing the child nodes. It has nothing trie-specific, and should work for other tree data structures as well.

void writeNode(Node *node)
    write node data to file
    write node.numOfChildren to file
    for each child:
        writeNode(child)

Node *readNode()
    Node *node = allocateNewNode()
    read node data from file
    read node.numOfChildren from file
    for (i=0; i<node.numOfChildren; i++)
        Node *child = readNode()
        node.addChild(child)
Tachograph answered 3/4, 2010 at 17:54 Comment(0)
Z
1

If all of your nodes are the same size then you can just enumerate your nodes (root = 0) and write each of them to a file at their index. While writing them you will have to change their references to other nodes to those nodes' indexes, though. You will probably also need a NULL value. You could use -1 or you could use (root = 1) and (NULL = 0).

You will probably also be able to compress these nodes somewhat by changing their pointer fields to be smaller types.

If your nodes are different sizes then it's more complicated.

Zoniazoning answered 5/4, 2010 at 7:46 Comment(0)
M
-1

Once you generated a trie in memory, it's not complex to write it into a file, with mmap, it's like that we re-generate the trie in a continous memory block, or you can write each node into a file in a recursive function.

Metalworking answered 29/10, 2021 at 3:5 Comment(2)
"it's not complex to write it into a file, with nmap" is not a very useful answer. Could you please show how to do it? Same thing for the recursive function approach: what would that look like in practice?Vergara
This answer is not helpful. More details on either of your suggestions would make it a useful answer.Kirbie

© 2022 - 2024 — McMap. All rights reserved.