I'm looking to implement a typeahead/autocomplete search for fun. My MongoDB schema has several attributes, but I only want to be able to search by category, title, preview, or date.
This is my MongoDB schema for a single article (I'm using Mongoose as the ODM):
{
title: { type: String, required: true}
, preview: { type: String, required: true}
, body: { type: String, required: true}
, category: {type: String}
, created_at: { type: Date, default: Date.now }
}
Each time I create, update, or destroy an article, I have to re-index so the search stays up to date. The search should autocomplete: for example, if I have two articles titled "Welcome to stackoverflow" and "How to avoid stackoverflow" and the user types 't', I'd display both articles using AJAX, since both have the character 't' in their titles. I'd also like to highlight every single 't' (the 't' in 'to' and the 't' in 'stackoverflow') to indicate that the query hit something. (I expect it to look similar to searching for particular tags here at stackoverflow.com.)
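To make the highlighting part concrete, here is a minimal sketch of what I have in mind. The function names `escapeRegExp` and `highlight` and the choice of `<b>` tags are my own assumptions for illustration, and it assumes the title has already been HTML-escaped before it is passed in.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that user input such as "c++" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every case-insensitive occurrence of `query` in `text` with <b> tags.
// Assumption: `text` is already HTML-escaped.
function highlight(text, query) {
  if (!query) return text; // nothing typed yet, nothing to highlight
  const pattern = new RegExp(escapeRegExp(query), "gi");
  return text.replace(pattern, (match) => "<b>" + match + "</b>");
}

console.log(highlight("Welcome to stackoverflow", "t"));
// prints: Welcome <b>t</b>o s<b>t</b>ackoverflow
```

This would run in the browser on each title returned by the AJAX call, so the server only has to return plain strings.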
The question now is: should I use a different schema for indexing, or just stick with my existing schema? It seems that I won't be using the 'body' attribute, which contains the full article and has thousands of words in it, since I'm not looking to do full-text search right now.
- Title attributes probably have only ~45 characters and 3 or 4 words on average.
- Category is mostly 1 word, averaging 9-15 characters.
- Preview would be the largest field, with ~150 characters and 20 words on average.
I'd probably like to implement this using a trie data structure. Off the top of my head, one way of doing this is to make an AJAX request on every keystroke, routed to a node.js handler, which then queries MongoDB and returns, as JSON, every entry containing a word with a letter that matches what the user typed. I would then parse that JSON and display each entry.
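A rough sketch of the query side of that plan, without any trie yet. `buildSearchFilter` and `SEARCHABLE_FIELDS` are hypothetical names of my own; the filter uses MongoDB's `$regex` and `$or` operators. I left `created_at` out here as an assumption, since `$regex` only matches string values and a Date field would need a range query instead.

```javascript
// String fields from the schema above that can be matched with $regex.
const SEARCHABLE_FIELDS = ["title", "preview", "category"];

// Escape regex metacharacters so the typed text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a MongoDB filter that matches documents where any searchable
// field contains `query` (case-insensitive). Returns null for empty
// input so the caller can skip the database round trip.
function buildSearchFilter(query) {
  const pattern = escapeRegExp(query.trim());
  if (pattern === "") return null;
  return {
    $or: SEARCHABLE_FIELDS.map((field) => ({
      [field]: { $regex: pattern, $options: "i" },
    })),
  };
}

console.log(JSON.stringify(buildSearchFilter("t")));
```

In the node.js handler the result would be passed to something like `Article.find(filter).limit(10)`, where `Article` is a hypothetical name for the Mongoose model. As far as I understand, an unanchored, case-insensitive regex like this cannot make good use of an index, which is part of why I'm considering a trie.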
The question then is: how would I fit the trie into my plan? The other thing is that I need to rebuild the index each time I create, update, or destroy an article.
I would appreciate any suggestions or pointers in the right direction, or any articles that would help me do this. (I'm looking for the best-practice, most performant way.) Thanks. Let me know if the question needs to be clarified.
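For reference, this is roughly the kind of trie I am picturing: a minimal in-memory sketch that lives in the node.js process, not anything MongoDB provides. `ArticleTrie` and `tokenize` are hypothetical names, and the ids stand in for MongoDB `_id` values. Note that a word trie like this only matches prefixes of whole words, so typing 't' would hit 'to' but not the 't' inside 'stackoverflow'; covering that case would need something extra, such as indexing every suffix of each word.

```javascript
// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function newNode() {
  return { children: new Map(), ids: new Set() };
}

class ArticleTrie {
  constructor() {
    this.root = newNode();
  }

  // Index every word of `text`: each node on a word's path remembers `id`,
  // so any prefix of that word leads straight to the article.
  add(id, text) {
    for (const word of tokenize(text)) {
      let node = this.root;
      for (const ch of word) {
        if (!node.children.has(ch)) node.children.set(ch, newNode());
        node = node.children.get(ch);
        node.ids.add(id);
      }
    }
  }

  // Remove an article by walking the same words it was indexed with.
  // Pass the complete old text; empty nodes are left in place for simplicity.
  remove(id, text) {
    for (const word of tokenize(text)) {
      let node = this.root;
      for (const ch of word) {
        node = node.children.get(ch);
        if (!node) break;
        node.ids.delete(id);
      }
    }
  }

  // Return ids of articles containing a word that starts with `prefix`.
  search(prefix) {
    let node = this.root;
    for (const ch of prefix.toLowerCase()) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    return [...node.ids];
  }
}

const trie = new ArticleTrie();
trie.add(1, "Welcome to stackoverflow");
trie.add(2, "How to avoid stackoverflow");
console.log(trie.search("t"));   // [ 1, 2 ]
console.log(trie.search("wel")); // [ 1 ]
trie.remove(1, "Welcome to stackoverflow");
console.log(trie.search("t"));   // [ 2 ]
```

With this shape, a create would call `add`, a destroy would call `remove`, and an update would be a `remove` with the old text followed by an `add` with the new text, so the whole index would not need rebuilding on every change.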