Where do I find a standard Trie based map implementation in Java? [closed]
Asked Answered
A

14

80

I have a Java program that stores a lot of mappings from Strings to various objects.

Right now, my options are either to rely on hashing (via HashMap) or on binary searches (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular and quality collections library?

I've written my own in the past, but I'd rather go with something standard, if available.

Quick clarification: While my question is general, in the current project I am dealing with a lot of data that is indexed by fully-qualified class name or method signature. Thus, there are many shared prefixes.

Antecedency answered 8/3, 2009 at 16:57 Comment(1)
are the strings known in advance? Do they need to be accessed by string only?Bombsight
M
35

You might want to look at the Trie implementation that Limewire is contributing to the Google Guava.

Mountebank answered 8/3, 2009 at 17:51 Comment(3)
It looks like Google-Collections has been superceded by Guava code.google.com/p/guava-libraries, and unfortunately I can't see a Trie class in there anywhere. The Patricia Trie seems to have its own project page now: code.google.com/p/patricia-trieHapless
The Limewire/Google links are a bit of a mess now, too. While I did manage to find code.google.com/archive/p/google-collections/issues/5 with the actual files, note that Apache Commons Collections comes with a number of tries (including a patricia trie). That's the one I'd recommend right now.Ersatz
Also the Apache Commons implementation seems to be from the same place as the Limewire contribution, as the summary comments in the Commons docs for PatriciaTrie are identical to the summary comments in the Limewire-contributed implementation.Ersatz
D
10

There is no trie data structure in the core Java libraries.

This may be because tries are usually designed to store character strings, while Java data structures are more general, usually holding any Object (defining equality and a hash operation), though they are sometimes limited to Comparable objects (defining an order). There's no common abstraction for "a sequence of symbols," although CharSequence is suitable for character strings, and I suppose you could do something with Iterable for other types of symbols.

Here's another point to consider: when trying to implement a conventional trie in Java, you are quickly confronted with the fact that Java supports Unicode. To have any sort of space efficiency, you have to restrict the strings in your trie to some subset of symbols, or abandon the conventional approach of storing child nodes in an array indexed by symbol. This might be another reason why tries are not considered general-purpose enough for inclusion in the core library, and something to watch out for if you implement your own or use a third-party library.

Danyelldanyelle answered 16/4, 2012 at 16:59 Comment(2)
This answer assume I want to implement a trie for strings. A trie is a general data structure, capable of holding arbitrary sequences and providing fast prefix lookups.Homoio
@PaulDraper This answer doesn't assume anything about what you want, since you showed up years after the question was asked. And since the question is specifically about character strings, that's the focus of this answer. Although I do spend a lot of time pointing out that a Java trie would need to be generalized to any type of Comparable.Danyelldanyelle
L
10

Apache Commons Collections v4.0 now supports trie structures.

See the org.apache.commons.collections4.trie package info for more information. In particular, check the PatriciaTrie class:

Implementation of a PATRICIA Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric).

A PATRICIA Trie is a compressed Trie. Instead of storing all data at the edges of the Trie (and having empty internal nodes), PATRICIA stores data in every node. This allows for very efficient traversal, insert, delete, predecessor, successor, prefix, range, and select(Object) operations. All operations are performed at worst in O(K) time, where K is the number of bits in the largest item in the tree. In practice, operations actually take O(A(K)) time, where A(K) is the average number of bits of all items in the tree.

Lingam answered 20/10, 2014 at 11:54 Comment(0)
B
7

Also check out concurrent-trees. They support both Radix and Suffix trees and are designed for high concurrency environments.

Bylaw answered 21/8, 2013 at 19:53 Comment(1)
As of 2014, this should be the accepted answer. Looks like well-maintained, well-tested, concurrent implementation of tries.Satiety
V
3

I wrote and published a simple and fast implementation here.

Vandal answered 23/7, 2013 at 12:58 Comment(6)
I'd like to like this, but each of your nodes requires 1024 bytes, and represents only one character. Also insertion now takes O(n^2) time because of Java's changed semantics of substring(). This implementation is really not very practical.Kigali
@Stefan Reich, That array space is only for internal nodes which is vanishingly small given how fast Trie trees fan out.Vandal
Thanks for your answer, but I'm not convinced. Tries may not always branch out quickly, in fact they probably won't with real data. Your arrays are also slow to scan for content. We should really use Patricia Tries to have things compact and efficient. I have made my own implementation which I will probably post here shortly. No hard feelings, just trying to optimize :) Many greetingsKigali
My tries can only fan out quickly since redundancies are factored out and stored in the "prefix" member. There is room for lots of different implementations based on what you are trying to optimize. In my case I'm aiming for simple yet practical.Vandal
Ah, I did misunderstand that part of the code. There is so much "Object" and casting that I didn't see it. So it is a Patricia Trie. My bad.Kigali
No problem. At least I learned what a Patricia Trie is. It just seemed like an important optimization even though I was trying to keep it as simple as possible. Best part is that it handles deletions properly, and all of it is well documented.Vandal
H
1

What you need is org.apache.commons.collections.FastTreeMap , I think.

Hospital answered 8/3, 2009 at 17:9 Comment(1)
This doesn't appear to be a trie implementation.Lingam
C
1

Below is a basic HashMap implementation of a Trie. Some people might find this useful...

class Trie {

    HashMap<Character, HashMap> root;

    public Trie() {
        root = new HashMap<Character, HashMap>();
    }

    public void addWord(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter) == false) {
                node.put(currentLetter, new HashMap<Character, HashMap>());
            }
            node = node.get(currentLetter);
        }
    }

    public boolean containsPrefix(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter)) {
                node = node.get(currentLetter);
            } else {
                return false;
            }
        }
        return true;
    }
}
Cosine answered 9/12, 2014 at 12:28 Comment(0)
N
1

Apache's commons collections: org.apache.commons.collections4.trie.PatriciaTrie

Necrology answered 27/11, 2015 at 12:44 Comment(1)
R
1

You can try the Completely Java library, it features a PatriciaTrie implementation. The API is small and easy to get started, and it's available in the Maven central repository.

Raymonraymond answered 5/11, 2016 at 11:44 Comment(0)
G
0

You might look at this TopCoder one as well (registration required...).

Goto answered 8/3, 2009 at 17:44 Comment(1)
i did register but that component is unavialable rite now.Bonds
I
0

If you required sorted map, then tries are worthwhile. If you don't then hashmap is better. Hashmap with string keys can be improved over the standard Java implementation: Array hash map

Inartistic answered 21/5, 2012 at 8:52 Comment(0)
S
0

If you're not worried about pulling in the Scala library, you can use this space efficient implementation I wrote of a burst trie.

https://github.com/nbauernfeind/scala-burst-trie

Soonsooner answered 29/4, 2014 at 19:40 Comment(0)
A
0

here is my implementation, enjoy it via: GitHub - MyTrie.java

/* usage:
    MyTrie trie = new MyTrie();
    trie.insert("abcde");
    trie.insert("abc");
    trie.insert("sadas");
    trie.insert("abc");
    trie.insert("wqwqd");
    System.out.println(trie.contains("abc"));
    System.out.println(trie.contains("abcd"));
    System.out.println(trie.contains("abcdefg"));
    System.out.println(trie.contains("ab"));
    System.out.println(trie.getWordCount("abc"));
    System.out.println(trie.getAllDistinctWords());
*/

import java.util.*;

public class MyTrie {
  private class Node {
    public int[] next = new int[26];
    public int wordCount;
    public Node() {
      for(int i=0;i<26;i++) {
        next[i] = NULL;
      }
      wordCount = 0;
    }
  }

  private int curr;
  private Node[] nodes;
  private List<String> allDistinctWords;
  public final static int NULL = -1;

  public MyTrie() {
    nodes = new Node[100000];
    nodes[0] = new Node();
    curr = 1;
  }

  private int getIndex(char c) {
    return (int)(c - 'a');
  }

  private void depthSearchWord(int x, String currWord) {
    for(int i=0;i<26;i++) {
      int p = nodes[x].next[i];
      if(p != NULL) {
        String word = currWord + (char)(i + 'a');
        if(nodes[p].wordCount > 0) {
          allDistinctWords.add(word);
        }
        depthSearchWord(p, word);
      }
    }
  }

  public List<String> getAllDistinctWords() {
    allDistinctWords = new ArrayList<String>();
    depthSearchWord(0, "");
    return allDistinctWords;
  }

  public int getWordCount(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return 0;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount;
  }

  public boolean contains(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return false;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount > 0;
  }

  public void insert(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        nodes[curr] = new Node();
        nodes[p].next[j] = curr;
        curr++;
      }
      p = nodes[p].next[j];
    }
    nodes[p].wordCount++;
  }
}
Acquiescent answered 26/5, 2014 at 10:42 Comment(0)
D
0

I have just tried my own Concurrent TRIE implementation but not based on characters, it is based on HashCode. Still We can use this having Map of Map for each CHAR hascode.
You can test this using the code @ https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapPerformanceTest.java https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapValidationTest.java

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TrieMap {
    public static int SIZEOFEDGE = 4; 
    public static int OSIZE = 5000;
}

abstract class Node {
    public Node getLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }
    public Node removeLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
}

class Vertex extends Node {
    String key;
    volatile String val;
    volatile Vertex next;

    public Vertex(String key, String val) {
        this.key = key;
        this.val = val;
    }

    @Override
    public boolean equals(Object obj) {
        Vertex v = (Vertex) obj;
        return this.key.equals(v.key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public String toString() {
        return key +"@"+key.hashCode();
    }
}


class Edge extends Node {
    volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile

    public Edge(int size) {
        array = new AtomicReferenceArray<Node>(8);
    }


    @Override
    public Node getLink(String key, int hash, int level){
        int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
        Node returnVal = array.get(index);
        for(;;) {
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                Vertex node = (Vertex) returnVal;
                for(;node != null; node = node.next) {
                    if(node.key.equals(key)) {  
                        return node; 
                    }
                } 
                return null;
            } else { //instanceof Edge
                level = level + 1;
                index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
                Edge e = (Edge) returnVal;
                returnVal = e.array.get(index);
            }
        }
    }

    @Override
    public Node createLink(int hash, int level, String key, String val) { //Remove size
        for(;;) { //Repeat the work on the current node, since some other thread modified this node
            int index =  Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node nodeAtIndex = array.get(index);
            if ( nodeAtIndex == null) {  
                Vertex newV = new Vertex(key, val);
                boolean result = array.compareAndSet(index, null, newV);
                if(result == Boolean.TRUE) {
                    return newV;
                }
                //continue; since new node is inserted by other thread, hence repeat it.
            } 
            else if(nodeAtIndex instanceof Vertex) {
                Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
                int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level+1);
                int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level+1);
                Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask()+1);
                if(newIndex != newIndex1) {
                    Vertex newV = new Vertex(key, val);
                    edge.array.set(newIndex, vrtexAtIndex);
                    edge.array.set(newIndex1, newV);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return newV;
                    }
                   //continue; since vrtexAtIndex may be removed or changed to Edge already.
                } else if(vrtexAtIndex.key.hashCode() == hash) {//vrtex.hash == hash) {       HERE newIndex == newIndex1
                    synchronized (vrtexAtIndex) {   
                        boolean result = array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex); //Double check this vertex is not removed.
                        if(result == Boolean.TRUE) {
                            Vertex prevV = vrtexAtIndex;
                            for(;vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
                                prevV = vrtexAtIndex; // prevV is used to handle when vrtexAtIndex reached NULL
                                if(vrtexAtIndex.key.equals(key)){
                                    vrtexAtIndex.val = val;
                                    return vrtexAtIndex;
                                }
                            } 
                            Vertex newV = new Vertex(key, val);
                            prevV.next = newV; // Within SYNCHRONIZATION since prevV.next may be added with some other.
                            return newV;
                        }
                        //Continue; vrtexAtIndex got changed
                    }
                } else {   //HERE newIndex == newIndex1  BUT vrtex.hash != hash
                    edge.array.set(newIndex, vrtexAtIndex);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return edge.createLink(hash, (level + 1), key, val);
                    }
                }
            }               
            else {  //instanceof Edge
                return nodeAtIndex.createLink(hash, (level + 1), key, val);
            }
        }
    }




    @Override
    public Node removeLink(String key, int hash, int level){
        for(;;) {
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node returnVal = array.get(index);
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                synchronized (returnVal) {
                    Vertex node = (Vertex) returnVal;
                    if(node.next == null) {
                        if(node.key.equals(key)) {
                            boolean result = array.compareAndSet(index, node, null); 
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex may be changed to Edge
                        }
                        return null;  //Nothing found; This is not the same vertex we are looking for. Here hashcode is same but key is different. 
                    } else {
                        if(node.key.equals(key)) { //Removing the first node in the link
                            boolean result = array.compareAndSet(index, node, node.next);
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex(node) may be changed to Edge, so try again.
                        }
                        Vertex prevV = node; // prevV is used to handle when vrtexAtIndex is found and to be removed from its previous
                        node = node.next;
                        for(;node != null; prevV = node, node = node.next) {
                            if(node.key.equals(key)) {
                                prevV.next = node.next; //Removing other than first node in the link
                                return node; 
                            }
                        } 
                        return null;  //Nothing found in the linked list.
                    }
                }
            } else { //instanceof Edge
                return returnVal.removeLink(key, hash, (level + 1));
            }
        }
    }

}



class Base10ToBaseX {
    public static enum Base {
        /**
         * Integer is represented in 32 bit in 32 bit machine.
         * There we can split this integer no of bits into multiples of 1,2,4,8,16 bits
         */
        BASE2(1,1,32), BASE4(3,2,16), BASE8(7,3,11)/* OCTAL*/, /*BASE10(3,2),*/ 
        BASE16(15, 4, 8){       
            public String getFormattedValue(int val){
                switch(val) {
                case 10:
                    return "A";
                case 11:
                    return "B";
                case 12:
                    return "C";
                case 13:
                    return "D";
                case 14:
                    return "E";
                case 15:
                    return "F";
                default:
                    return "" + val;
                }

            }
        }, /*BASE32(31,5,1),*/ BASE256(255, 8, 4), /*BASE512(511,9),*/ Base65536(65535, 16, 2);

        private int LEVEL_0_MASK;
        private int LEVEL_1_ROTATION;
        private int MAX_ROTATION;

        Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
            this.LEVEL_0_MASK = levelZeroMask;
            this.LEVEL_1_ROTATION = levelOneRotation;
            this.MAX_ROTATION = maxPossibleRotation;
        }

        int getLevelZeroMask(){
            return LEVEL_0_MASK;
        }
        int getLevelOneRotation(){
            return LEVEL_1_ROTATION;
        }
        int getMaxRotation(){
            return MAX_ROTATION;
        }
        String getFormattedValue(int val){
            return "" + val;
        }
    }

    public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
        if(level > base.getMaxRotation() || level < 1) {
            return 0; //INVALID Input
        }
        int rotation = base.getLevelOneRotation();
        int mask = base.getLevelZeroMask();

        if(level > 1) {
            rotation = (level-1) * rotation;
            mask = mask << rotation;
        } else {
            rotation = 0;
        }
        return (on & mask) >>> rotation;
    }
}
Doorkeeper answered 28/10, 2015 at 8:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.