How to create a simple prefix index in Java?

4

5

I have a big set of URLs and I want to implement autocompletion. I don't like the complexity of the naive approach, as it is linear in the size of the set:

for (String url : urls) if (url.startsWith(input)) { doSomething(); }

I know that in a HashSet, contains() runs in O(1), but there is no containsPrefix(). Is there a simple way to do this without using a big library like Lucene or coding it myself? I would have no problem writing it, but that seems like overkill for such a simple problem, so I want to know whether a simple solution already exists :-)

From my computer science classes I remember a tree that consists of string fragments, but I forget what it was called. It worked like this:

[car, care, carrot, carrotville] ->

car
|
-/
-e
-rot
  |
  ----ville

P.S.: What should I call a method that returns all strings that a given string is a prefix of? That is, if a is a prefix of b, what is b to a?

Hamitic answered 27/3, 2012 at 10:26 Comment(2)
What do you want to do? Automatically add some text at the beginning of every String? - Leap
I want to know which strings my string is a prefix of, so I can offer them as autocompletion suggestions. - Epidaurus
2

If you need to efficiently find prefixes of strings, use a Trie, a data structure designed precisely for that purpose:

A trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string.

Two links with sample implementations.
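
The structure described in the quote can be sketched in a few lines of plain Java. This is only a minimal illustration, not one of the linked implementations: the class name PrefixIndex and the method names add and withPrefix are made up for this example, and it stores one node per character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Minimal per-character trie. PrefixIndex is a hypothetical name for this sketch. */
final class PrefixIndex {
    private static final class Node {
        // TreeMap keeps children sorted, so results come back in lexicographic order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    /** Inserts a string, creating one node per character as needed. */
    public void add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    /** Returns every stored string that starts with the given prefix. */
    public List<String> withPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        // Walk down to the node that represents the prefix, if it exists.
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return result; // no stored string has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    /** Depth-first walk that rebuilds each stored string from the path taken. */
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```

Finding the prefix node costs time proportional to the length of the prefix, independent of how many strings are stored; collecting the matches then costs time proportional to the size of the subtree below that node.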

Bellanca answered 27/3, 2012 at 10:36 Comment(1)
Perfect! I used the one from forums.oracle.com/forums/thread.jspa?messageID=8787521 and it worked on the first try! - Epidaurus
1

A long time ago I put a simple trie implementation here:

http://code.google.com/p/triebag/source/browse/trunk/src/triebag/tries/SimpleTrie.java

However, this is not a compact trie, so it creates one node per character. Creating a compact one is a bit trickier.
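
To illustrate what "compact" means here, the following is a rough sketch of a trie whose nodes hold string fragments instead of single characters. It is not the triebag code; the class CompactTrie and all of its members are hypothetical names chosen for this example. The tricky part mentioned above is the split step in add, where an existing fragment has to be cut in two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a compact (radix) trie. All names here are illustrative only. */
final class CompactTrie {
    private static final class Node {
        String label;   // string fragment on the edge leading to this node
        boolean isWord; // true if a stored string ends at this node
        // Children keyed by the first character of their label.
        final Map<Character, Node> children = new TreeMap<>();

        Node(String label, boolean isWord) {
            this.label = label;
            this.isWord = isWord;
        }
    }

    private final Node root = new Node("", false);

    public void add(String s) {
        Node node = root;
        String rest = s;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                node.children.put(rest.charAt(0), new Node(rest, true));
                return;
            }
            int common = commonPrefixLength(rest, child.label);
            if (common < child.label.length()) {
                // Split: a new intermediate node takes the shared fragment,
                // and the old child keeps only the remainder of its label.
                Node mid = new Node(child.label.substring(0, common), false);
                child.label = child.label.substring(common);
                mid.children.put(child.label.charAt(0), child);
                node.children.put(mid.label.charAt(0), mid);
                child = mid;
            }
            node = child;
            rest = rest.substring(common);
        }
        node.isWord = true;
    }

    public List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        String rest = prefix;
        StringBuilder path = new StringBuilder();
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) return out;
            int common = commonPrefixLength(rest, child.label);
            // The prefix and the fragment diverge before either one ends.
            if (common < rest.length() && common < child.label.length()) return out;
            path.append(child.label);
            rest = rest.substring(common);
            node = child;
        }
        collect(node, path, out);
        return out;
    }

    /** Number of nodes including the root, useful for comparing with a per-character trie. */
    public int nodeCount() {
        return count(root);
    }

    private static int count(Node node) {
        int n = 1;
        for (Node child : node.children.values()) n += count(child);
        return n;
    }

    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) out.add(path.toString());
        for (Node child : node.children.values()) {
            int len = path.length();
            path.append(child.label);
            collect(child, path, out);
            path.setLength(len);
        }
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```

For the four words from the question (car, care, carrot, carrotville), this sketch ends up with five nodes including the root, whereas a one-node-per-character trie needs thirteen. How much is saved in practice depends on the data.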

Winepress answered 27/3, 2012 at 10:44 Comment(3)
This is great! I don't mind if it's one node per character, but I will leave the question open in case someone has one with multiple characters per node. - Epidaurus
No problem. The compact version uses about 50% fewer nodes (at least for Turkish words in a dictionary). This is the test code, so you can see it in action. I hope there are no bugs :) code.google.com/p/triebag/source/browse/trunk/test/triebag/… - Winepress
I tried out your SimpleTrie, but it doesn't seem to work for me. First, the constructor was not public, and after I changed that, the following test printed nothing: SimpleTrie<String> trie = new SimpleTrie<>(); trie.add("x", "x"); trie.add("xy", "xy"); Iterator it = trie.getItemsWithPrefix("x"); while (it.hasNext()) System.out.println(it.next()); - Epidaurus
1

A great alternative is a ternary search tree, which is more memory-efficient: https://github.com/varunpant/TernaryTree/tree/master/TernaryTree

Here is a trie in Java: http://algs4.cs.princeton.edu/52trie/TrieST.java.html
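
For readers who have not seen a ternary search tree, here is a rough sketch of the idea. Each node holds one character and three links: one for smaller characters, one for larger characters, and one for the next character of the same string. The class and method names are made up for this example and are not taken from either linked project.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a ternary search tree for prefix lookups. Names are illustrative only. */
final class TernarySearchTree {
    private static final class Node {
        final char c;
        Node lo, eq, hi; // smaller char, next char of the same string, larger char
        boolean isWord;  // true if a stored string ends at this node

        Node(char c) {
            this.c = c;
        }
    }

    private Node root;

    public void add(String s) {
        if (s.isEmpty()) return; // this sketch does not store the empty string
        root = add(root, s, 0);
    }

    private Node add(Node node, String s, int i) {
        char ch = s.charAt(i);
        if (node == null) node = new Node(ch);
        if (ch < node.c) node.lo = add(node.lo, s, i);
        else if (ch > node.c) node.hi = add(node.hi, s, i);
        else if (i < s.length() - 1) node.eq = add(node.eq, s, i + 1);
        else node.isWord = true;
        return node;
    }

    /** Returns every stored string that starts with the given prefix, in sorted order. */
    public List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            collect(root, new StringBuilder(), out);
            return out;
        }
        Node node = find(prefix);
        if (node == null) return out;
        if (node.isWord) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    /** Finds the node that matches the last character of s, or null if there is none. */
    private Node find(String s) {
        Node node = root;
        int i = 0;
        while (node != null) {
            char ch = s.charAt(i);
            if (ch < node.c) node = node.lo;
            else if (ch > node.c) node = node.hi;
            else if (i < s.length() - 1) { node = node.eq; i++; }
            else return node;
        }
        return null;
    }

    /** In-order walk: smaller siblings, then this character's subtree, then larger siblings. */
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node == null) return;
        collect(node.lo, path, out);
        path.append(node.c);
        if (node.isWord) out.add(path.toString());
        collect(node.eq, path, out);
        path.setLength(path.length() - 1);
        collect(node.hi, path, out);
    }
}
```

Compared with a trie that keeps a map or array of children in every node, each node here has a fixed three links, which is where the memory saving is expected to come from. The lookup pays for it with extra comparisons per character.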

Avicenna answered 18/6, 2013 at 20:57 Comment(0)
1

The regular expression implementation java.util.regex.Pattern can efficiently test whether a string starts with any of a set of prefixes:

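// Join all prefixes into a single alternation: p1|p2|...
StringBuilder buffer = new StringBuilder();
for (String prefix : prefixes) {
    if (buffer.length() > 0)
        buffer.append("|");
    buffer.append(prefix);
}
// Anchor the alternation at the start of the input.
Pattern prefixPattern = Pattern.compile("^(" + buffer + ")");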

You can then test a string against all prefixes at once:

boolean containsPrefix = prefixPattern.matcher(stringToTest).find();

Note: for simplicity, prefix strings are not escaped. The regexp metacharacters [, ], \, ., *, +, ?, $, ^, (, ), {, } and | have to be prefixed by \, or each prefix can be passed through Pattern.quote() instead.
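
Since URLs usually contain characters such as . and ?, escaping matters in practice. One possible way to handle it, sketched here with the standard Pattern.quote method, is to quote each prefix before joining. The class name PrefixPatterns and the method startsWithAny are hypothetical helpers for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch: builds a prefix-matching pattern with every prefix treated as a literal. */
final class PrefixPatterns {
    /**
     * Returns a pattern that matches at the start of any input beginning with
     * one of the given prefixes. Note that an empty list yields a pattern
     * that matches every input.
     */
    static Pattern startsWithAny(List<String> prefixes) {
        String alternation = prefixes.stream()
                .map(Pattern::quote) // wraps each prefix so metacharacters are literal
                .collect(Collectors.joining("|"));
        return Pattern.compile("^(?:" + alternation + ")");
    }
}
```

Usage is the same as above: startsWithAny(prefixes).matcher(stringToTest).find() returns true when stringToTest starts with at least one of the prefixes.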

Couperin answered 9/7, 2014 at 14:23 Comment(0)
