Which data structure is most suitable to implement a Dictionary? [closed]
Asked Answered
B

1

6

I have to write a Dictionary program as a semester project for an undergraduate course on Data Structures and Algorithms, and I am expected to find the most suitable solution (Data Structure) to the problem.

I considered using either a hash table or a trie. I have been suggested to use treaps by someone but have not been able to look into them yet.

My database has about 100k distinct words and their meanings. The basic functionalities the program is expected to provide are insert, update, remove and search a word/definition. If I manage to squeeze in auto-completion and spell correction, that'd be an added bonus.

So, my question is, keeping in mind my requirements, which data structure would be best suited for my purposes. When I say 'best', I am asking for the data structure which has best runtime complexity and low cost (memory requirements).

Also, I wanted to be able to have an algorithm which returned all words starting with the given prefix. For example, say I make a function call dictionary.getWordsStartingWith("fic") it should return a list of all words that start with fic such as fiction, fictitious,fickle etc. I know I can do this if I implement my dictionary as a trie, I could do this, but is this possible to do it with a hash table?

Biota answered 26/12, 2015 at 17:38 Comment(6)
Did you look into a balanced search tree? Like red-black or avlStela
The auto-completion requirement points in the direction of some form of tree or trie, and away from hash table. A hash table won't help in the task of finding all words that start with a given prefix, while a tree-like structure would give this more or less for free.Paresthesia
@Smac89 Yes, I looked at AVL trees but I think they might not be as efficient as a hash table. But again, hash table limits my functionality.Biota
@IgorTandetnik So, should I decide to use a trie then, or are there any other tree-like structures that may be better than a trie?Biota
@سیفخان A hash-table is only as good as it's hash function, and as IgorTandetnik has pointed out, hash-table is not suitable for auto-completion.Stela
Spell correction is a dynamic programming algorithm, but you can use a trie for checking the candidates in auto-completion.Driskell
C
3

You almost certainly want a trie if you want to do auto completion/prefix matching. Hash tables don't really make this possible; in fact good hash functions are designed such that even very similar keys (e.g. same prefix) map to completely different parts of the array. For hashing purposes this is considered a feature.

Treaps are basically binary search trees that use stochasticity + heap property to do their balancing. In general the interface is the standard BST tree interface; so it's really just an implementation detail that only leads to moderately different properties than a red black tree or an AVL tree.

BST's aren't nearly as suited to the problems that you seem to be looking to solve as a trie. BST's tend to be all about following inequalities downwards, whereas trie's are about following equalities downward. When you're dealing with numeric data, inequality comparisons are everything because equality is very rare (since the space of possibilities is huge). With strings, each character has very few possibilities so it makes more sense to exploit equalities, leading to optimizations like not actually storing keys at most nodes.

In summary, I'd recommending proceeding with tries. They're very heavily used for exactly this sort of thing, and you can find a ton of resources on optimizing them (especially for space) since they're particularly used for text input on mobile where space/cycles are at a premium. It's also a very interesting data structure to learn IMHO, compared to BST's which you a) probably learned about heavily in freshman data structures, and b) Isn't really that interesting of a data structure; everything other than the balancing scheme is trivial and the balancing schemes are more tedious than anything else (RB trees have something like 7 truly distinct cases for balancing or something like that, pretty hard to code a RB tree and get them all exactly right).

The wikipedia page has some good info: https://en.wikipedia.org/wiki/Trie. Bitwise tries look especially interesting.

Concettaconcettina answered 26/12, 2015 at 19:57 Comment(1)
Skip lists are also a good candidate as they "are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space." en.wikipedia.org/wiki/Skip_listNiacin

© 2022 - 2024 — McMap. All rights reserved.