MongoDB fulltext search + workaround for partial word match
S

5

27

Since MongoDB full-text search cannot find "blueberry" from the word "blue", I want to help my users complete the word "blue" to "blueberry". To do so, is it possible to query all the words in a MongoDB full-text index, so that I can use them as suggestions, e.g. for typeahead.js?

Serb answered 9/1, 2014 at 11:19 Comment(4)
I don't believe this is there yet. You might be able to use query conditions, but I believe they are not scored: docs.mongodb.org/manual/reference/command/text/… - Heliotrope
Who said partial matching of strings is not possible in MongoDB? #3306061 - Tough
@DenizZoeteman I am not sure you understood the question; this is about full-text search (FTS), not general querying. - Heliotrope
You can try the solution mentioned here: https://mcmap.net/q/212475/-mongoose-text-search-with-partial-string - Tyrr
U
12

Language stemming in text search uses an algorithm to try to relate words derived from a common base (e.g. "running" should match "run"). This is different from the prefix match (e.g. "blue" matching "blueberry") that you want to implement for an autocomplete feature.
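
To make the distinction concrete, here is a minimal sketch of prefix matching in plain JavaScript. The word list and the `prefixMatches` helper are hypothetical names used for illustration; they are not part of MongoDB or typeahead.js.

```javascript
// Hypothetical helper: return the words that start with the typed prefix.
// This is the behaviour an autocomplete needs; stemming does not provide it.
function prefixMatches(words, prefix) {
  const needle = prefix.toLowerCase();
  return words.filter((word) => word.toLowerCase().startsWith(needle));
}

// Assumed sample vocabulary, for illustration only.
const words = ["blueberry", "blue", "blackberry", "running"];
console.log(prefixMatches(words, "blue")); // [ 'blueberry', 'blue' ]
```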

To most effectively use typeahead.js with MongoDB text search I would suggest focusing on the prefetch support in typeahead:

  • Create a keywords collection which has the common words (perhaps with usage frequency count) used in your collection. You could create this collection by running a Map/Reduce across the collection you have the text search index on, and keep the word list up to date using a periodic Incremental Map/Reduce as new documents are added.

  • Have your application generate a JSON document from the keywords collection with the unique keywords (perhaps limited to "popular" keywords based on word frequency to keep the list manageable/relevant).
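
The two steps above could be prototyped roughly as follows. This is only a sketch of the counting logic in plain JavaScript, not the Map/Reduce job itself; the `buildKeywords` helper, the naive tokenizer, the sample texts and the flat array output are all assumptions made for illustration.

```javascript
// Hypothetical helper: count word frequencies across document texts and
// return the most frequent words, ready to be serialized as keywords.json.
function buildKeywords(texts, limit) {
  const counts = new Map();
  for (const text of texts) {
    // Naive tokenizer (an assumption): lowercase, split on non-letters.
    for (const word of text.toLowerCase().split(/[^a-z]+/)) {
      if (word.length < 3) continue; // skip empty strings and very short words
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // frequency, then A-Z
    .slice(0, limit)
    .map(([word]) => word);
}

// Assumed sample data standing in for the indexed text fields.
const sampleTexts = ["Blueberry pie recipe", "Fresh blueberry jam", "Apple pie"];
console.log(JSON.stringify(buildKeywords(sampleTexts, 3)));
// ["blueberry","pie","apple"]
```

On a real collection the same counting would run on the server (for example with the Map/Reduce approach described above), and the `limit` plays the role of the "popular keywords" cutoff.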

You can then use the generated keywords JSON for client-side autocomplete with typeahead's prefetch feature:

$('.mysearch .typeahead').typeahead({
  name: 'mysearch',
  prefetch: '/data/keywords.json'
});

typeahead.js will cache the prefetch JSON data in localStorage for client-side searches. When the search form is submitted, your application can use the server-side MongoDB text search to return the full results in relevance order.
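
For the server-side step, a query along these lines might be used, assuming MongoDB 2.6 or later where the `$text` query operator and the `textScore` metadata are available. The `buildTextSearch` helper and the `articles` collection name are hypothetical.

```javascript
// Hypothetical helper: build the pieces of a MongoDB $text query that
// returns results ordered by relevance (textScore).
function buildTextSearch(term) {
  return {
    filter: { $text: { $search: term } },
    projection: { score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
  };
}

// In the mongo shell, with a collection assumed to be named `articles`:
//   var q = buildTextSearch("blueberry");
//   db.articles.find(q.filter, q.projection).sort(q.sort);
const q = buildTextSearch("blueberry");
console.log(JSON.stringify(q.filter)); // {"$text":{"$search":"blueberry"}}
```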

Ushijima answered 9/1, 2014 at 13:39 Comment(1)
This is more effort than I wanted, but a neat idea! Thanks. - Serb
H
4

A simple workaround I am using right now is to break the text into individual characters and store them as an array covered by a text index.

Then, when you run the $search query, you break the query string into characters in the same way.

Please note that this only works for short strings, say shorter than 32 characters. Otherwise building the index takes a long time, and insert performance drops significantly.
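
The answer gives no code, so the following is only one possible reading of it, sketched in plain JavaScript. The helper names and the `name` and `chars` field names are assumptions, not something the answer specifies.

```javascript
// One possible reading of the workaround: store each string together with
// an array of its individual characters, and put the text index on that array.
function toChars(text) {
  return Array.from(text.toLowerCase()).filter((ch) => ch.trim() !== "");
}

// Hypothetical document shape; `name` and `chars` are assumed field names.
function toDocument(name) {
  return { name: name, chars: toChars(name) };
}

// The user's input is split the same way and joined with spaces, because
// $search treats whitespace-separated terms as separate words.
function toSearchString(input) {
  return toChars(input).join(" ");
}

console.log(toDocument("blueberry").chars.join("")); // blueberry
console.log(toSearchString("blue")); // b l u e
```

Since `$text` combines space-separated terms with a logical OR, this reading acts as a loose filter: any document sharing at least one character can match, so the ordering depends on the text score. Single-letter terms may also be affected by the stop-word handling of the index language.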

Hyla answered 28/7, 2015 at 4:55 Comment(1)
Downvoted, perhaps because this response makes little attempt to explain what the solution is about. It raises more questions and confusion, with no examples of how to achieve what is being described. - Anastigmat
E
1

You cannot query for all the words in the index, but you can of course query the original document's fields. The words in the search index are also not always full words; they are stemmed. So you probably wouldn't find "blueberry" in the index, just "blueberri".

Emphatic answered 9/1, 2014 at 12:40 Comment(0)
F
1

Don't know if this might be useful to some new people facing this problem.

Depending on the size of your collection and how much RAM you have available, you can search with $regex if you create the proper index, e.g.:

db.collection.find( {query : {$regex: /querywords/}}).sort({'criteria': -1}).limit(limit)

You would need an index as follows:

db.collection.createIndex( { "query": 1, "criteria" : -1 } )

This could be really fast if you have enough memory.
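
MongoDB can use an index efficiently for a case-sensitive regular expression only when the pattern is anchored to the start of the string. For the autocomplete case that is what is wanted anyway, so a sketch like the following may help. The `prefixRegex` helper is hypothetical; the field names are taken from the query above.

```javascript
// Hypothetical helper: escape regex metacharacters in user input and
// anchor the pattern at the start of the string (a prefix expression).
function prefixRegex(input) {
  const escaped = input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped);
}

// Shell usage, reusing the field names from the query above:
//   db.collection.find({ query: { $regex: prefixRegex("blue") } })
console.log(prefixRegex("blue").test("blueberry")); // true
console.log(prefixRegex("berry").test("blueberry")); // false
```

Escaping the input keeps characters such as "." or "(" typed by a user from being interpreted as regex syntax.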

Hope this helps.

Fiske answered 4/7, 2014 at 11:49 Comment(1)
If you do not want to search from the beginning of the text, the index will not be used anyway. In that case the index is just a waste of resources. - Korn
T
1

For those who have not yet started implementing any database architecture and are here for a solution, go for Elasticsearch. It's a JSON document-oriented store, structurally similar to MongoDB. It has an "edge-ngram" analyzer, which is very efficient and quick at giving "did you mean" suggestions for misspelled searches. You can also search partially.
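
To illustrate why edge n-grams enable partial matching, here is a small plain-JavaScript sketch of the tokens such an analyzer would produce for one word. The `edgeNgrams` function is a hypothetical stand-in, not Elasticsearch code; its two numeric parameters mirror the `min_gram` and `max_gram` settings.

```javascript
// Hypothetical helper mimicking what an edge n-gram tokenizer emits:
// every prefix of the word between minGram and maxGram characters long.
function edgeNgrams(word, minGram, maxGram) {
  const grams = [];
  const upper = Math.min(maxGram, word.length);
  for (let len = minGram; len <= upper; len++) {
    grams.push(word.slice(0, len));
  }
  return grams;
}

console.log(edgeNgrams("blueberry", 2, 5)); // [ 'bl', 'blu', 'blue', 'blueb' ]
```

Because "blue" is stored as one of the indexed tokens for "blueberry", a search for "blue" becomes an ordinary exact-token lookup.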

Tesstessa answered 26/4, 2017 at 10:38 Comment(0)
