Trie implementation [closed]

B

12

67

Is there any speed- and cache-efficient implementations of trie in C/C++? I know what a trie is, but I don't want reinvent the wheel, implementing it myself.

Bridgeport answered 24/6, 2009 at 5:17 Comment(1)

code.google.com/p/simple-trie – Gymnasium 14/4, 2013 at 9:16

H

38

if you are looking for an ANSI C implementation you can "steal" it from FreeBSD. The file you are looking for is called radix.c. It's used for managing routing data in kernel.

Homebrew answered 24/6, 2009 at 9:35 Comment(2)

@Homebrew Link doesn't work. – Anjaanjali 12/6, 2014 at 13:37

try this link for radix.c : opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/net/radix.c – Cuyp 16/7, 2014 at 23:41

C

19

I realize the question was about ready implementations, but for reference...

Before you jump on Judy you should have read "A Performance Comparison of Judy to Hash Tables". Then googling the title will likely give you a lifetime of discussion and rebutals to read.

The one explicitly cache-conscious trie I know of is the HAT-trie.

The HAT-trie, when implemented correctly, is very cool. However, for prefix search you need a sorting step on the hash buckets, which somewhat clashes with the idea of a prefix structure.

A somewhat simpler trie is the burst-trie which essentially gives you an interpolation between a standard tree of some sort (like a BST) and a trie. I like it conceptually and it's much easier to implement.

Carpetbagger answered 14/8, 2009 at 23:38 Comment(0)

M

7

I've had good luck with libTrie. It may not be specifically cache optimized but the performance has always been decent for my applications.

Mandy answered 24/6, 2009 at 5:21 Comment(0)

A

5

GCC ships with a handful of data structures as part of its "Policy-based data structures". This includes a few trie implementations.

http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

Aba answered 17/5, 2011 at 19:20 Comment(2)

Could you, please, provide a sample with #include? Thank you for the answer. – Lathi 14/9, 2016 at 18:38

Sample use of trie github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/… – Mammon 9/7, 2017 at 10:53

T

4

References,

A Double-Array Trie implementation article (includes a C implementation)
TRASH - A dynamic LC-trie and hash data structure -- (a 2006 PDF reference describing a dynamic LC-trie used in the Linux kernel to implement address lookup in the IP routing table

Tatianna answered 24/6, 2009 at 5:28 Comment(0)

M

4

Judy arrays: Very fast and memory efficient ordered sparse dynamic arrays for bits, integers and strings. Judy arrays are faster and more memory efficient than any binary-search-tree (incl. avl & red-black-trees).

Maximo answered 24/6, 2009 at 11:33 Comment(0)

P

3

Cedar, HAT-Trie, and JudyArray is quite awesome, you can find the benchmark here.

benchmark result

Pena answered 27/2, 2015 at 5:44 Comment(0)

M

2

You can also try TommyDS at http://tommyds.sourceforge.net/

See the benchmarks page on the site for a speed comparison with nedtries and judy.

Minatory answered 10/1, 2011 at 20:2 Comment(1)

(FYI I'm the author of nedtries) Note that the benchmarks above used the C version of nedtries. The C++ version is about 15% faster and if I'm reading the graph right, would be only slightly slower than TommyDS's version if built as C++. That said, he has far lower overhead per node than I do. I deliberately go overboard in metadata to enable really deep assertion checking during debug operation :) – Bouilli 17/4, 2012 at 14:17

D

1

Cache optimizations are something you'll probably are going to have to do, because you'll have to fit the data into a single cacheline which generally is something like 64 bytes (which will probably work if you start combining data, such as pointers). But it's tricky :-)

Dragoman answered 24/6, 2009 at 10:9 Comment(0)

C

0

Burst Trie's seem to be a bit more space efficient. I'm not sure how much cache-efficiency you can get out of any index since CPU-caches are so tiny. However, this kind of trie is plenty compact enough to keep large data sets in RAM (where a regular Trie would not).

I wrote a Scala implementation of a burst trie that also incorporates some space-saving techniques that I found in GWT's type-ahead implementation.

https://github.com/nbauernfeind/scala-burst-trie

Chimerical answered 29/4, 2014 at 18:48 Comment(1)

My downvote is unintentional and is now locked by StackOverflow. I do not know how to undo it. Sorry. – Gisela 20/3, 2016 at 9:34

M

0

I've had very good results (very good balance between performance and size) with marisa-trie compared to several TRIE implementations mentioned with my data set.

https://github.com/s-yata/marisa-trie/tree/master

Merci answered 25/10, 2016 at 10:2 Comment(0)

B

0

Sharing my "fast" implementation for Trie, tested just in basic scenario:

#define ENG_LET 26

class Trie {
    class TrieNode {
    public:
        TrieNode* sons[ENG_LET];
        int strsInBranch;
        bool isEndOfStr;

        void print() {
            cout << "----------------" << endl;
            cout << "sons..";
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    cout << " " << (char)('a'+i);
            }
            cout << endl;
            cout << "strsInBranch = " << strsInBranch << endl;
            cout << "isEndOfStr = " << isEndOfStr << endl;
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    sons[i]->print();
            }

        }
        TrieNode(bool _isEndOfStr = false):isEndOfStr(_isEndOfStr), strsInBranch(0) {
            for(int i=0; i<ENG_LET; ++i) {
                sons[i] = NULL;
            }
        }
        ~TrieNode() {
            for(int i=0; i<ENG_LET; ++i) {
                delete sons[i];
                sons[i] = NULL;
            }
        }
    };

    TrieNode* head;
public:
    Trie() { head = new TrieNode();}
    ~Trie() { delete head; }
    void print() {
        cout << "Preorder Print : " << endl;
        head->print();
    }
    bool isExists(const string s) {
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                return false;
            }
            curr = curr->sons[letIdx];
        }

        return curr->isEndOfStr;
    }
    void addString(const string& s) {
        if(isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                curr->sons[letIdx] = new TrieNode();    
            }
            ++curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        ++curr->strsInBranch;
        curr->isEndOfStr = true;
    }
    void removeString(const string& s) {
        if(!isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';

            if(curr->sons[letIdx] == NULL) {
                assert(false);
                return; //string not exists, will not reach here
            }
            if(curr->strsInBranch==1) {    //just 1 str that we want remove, remove the whole branch
                delete curr;
                return;
            }
            //more than 1 son
            --curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        curr->isEndOfStr = false;
    }

    void clear() {
        for(int i=0; i<ENG_LET; ++i) {
            delete head->sons[i];
            head->sons[i] = NULL;
        }
    }

};

Bahaism answered 15/4, 2017 at 22:50 Comment(0)

Recommended topics

Hot tags