MongoDB full-text searching strategy
O

3

13

We are trying to develop a strategy for using Elasticsearch for full-text searching on our MongoDB instance. It appears that every key we want to use as a filter must be included in the Elasticsearch index. Potentially we could want to use every key in Mongo as a filter, e.g. full-text search on description, filtered by date and telephone number. Does anyone have any real-world experience of adding full-text search to Mongo that they can share?

Maybe we can just use Elasticsearch as the database?

Oligocene answered 7/2, 2012 at 15:55 Comment(0)
S
14

I do not see any reason to use ElasticSearch in conjunction with MongoDB; just use ElasticSearch as separate document storage for the documents that have to be searched. And yes, you can even use it as your whole database. Of course, it depends on your domain model and other factors.
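If the searchable documents live in ElasticSearch, the full-text match and the filters from the question can go into one request. Below is a minimal sketch of such a request body using the `bool` query. The field names `description`, `date` and `telephone` are taken from the question's example, `buildSearchBody` is a hypothetical helper, and the `term` filter assumes `telephone` is indexed as an exact (non-analyzed) value.

```javascript
// Hypothetical helper: builds an ElasticSearch request body that combines
// a full-text match on "description" with exact filters on other fields.
function buildSearchBody(text, fromDate, telephone) {
  return {
    query: {
      bool: {
        // scored full-text part
        must: [{ match: { description: text } }],
        // non-scored filters; assumes "telephone" is indexed as an exact value
        filter: [
          { range: { date: { gte: fromDate } } },
          { term: { telephone: telephone } },
        ],
      },
    },
  };
}

const body = buildSearchBody("red bicycle", "2012-01-01", "555-0100");
console.log(JSON.stringify(body, null, 2));
```

The object can be sent as the JSON body of a search request against the index. Only the fields used in `must` or `filter` need to be indexed, which is the constraint the question describes.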

If you don't need stemming, fuzzy search, or complicated wildcard search, you can do the search with MongoDB. When a new document is inserted, split its text into lower-case words and add them to an array field, "words" for example. Later you can run search requests against this array with a regex. Note that you cannot use the i (ignore case) option in this regex, and you can search only with a prefix wildcard (like SQL's LIKE 'term%') or with no wildcard at all; otherwise the search will not use the MongoDB index.
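The approach above could be sketched roughly as follows. This is only an illustration: `toWords`, `escapeRegex` and `prefixQuery` are hypothetical helper names, the `description` field is borrowed from the question, and the tokenization rule (split on anything that is not a letter or digit) is an assumption.

```javascript
// Split text into unique lower-case words to store in the "words" array.
function toWords(text) {
  const tokens = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return [...new Set(tokens)];
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-sensitive filter anchored at the start of the word.
// Only this prefix form (no "i" option) lets MongoDB use the index on "words".
function prefixQuery(term) {
  return { words: { $regex: "^" + escapeRegex(term.toLowerCase()) } };
}

const text = "Lorem Ipsum dolor sit amet";
const doc = { description: text, words: toWords(text) };
const query = prefixQuery("Dol");

console.log(doc.words); // [ 'lorem', 'ipsum', 'dolor', 'sit', 'amet' ]
console.log(query);     // { words: { '$regex': '^dol' } }
```

With the official Node.js driver, the array would be indexed once with `collection.createIndex({ words: 1 })`, documents such as `doc` inserted as usual, and the search run with `collection.find(query)`. Lower-casing both the stored words and the search term stands in for the missing ignore-case option.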

One more option: you can try to find an ElasticSearch river for MongoDB.

Another option is to use Lucene, if you are using Java. You will probably be able to extend the Directory class in such a way that Lucene stores its index in MongoDB instead of on the file system or in RAM. I have not done any research in this area, but I think it is possible.

Spurlock answered 7/2, 2012 at 16:18 Comment(3)
Thanks Umar, we're going to give your approach a try. - Oligocene
While this is an option, there comes a point where the weight of data is large enough to make regex searches an inefficient choice. That is in fact why search indexers exist. They supplement and augment persistent storage for the express purpose of keeping expensive search operations off the database. - Ethban
Even with regex, MongoDB can use indexes, as I mentioned in the answer; it depends on the type of regex. - Spurlock
N
9

I experimented with full-text search in MongoDB by splitting the string into words, as @Umar suggested. Honestly though, it's a database and not a search engine, so I would use Mongo for persistent storage and ElasticSearch for the search engine part. As a matter of fact, I would stick with something like PostgreSQL for persistent storage and then push the data you want to search out to the search engine. http://gdal.org/ogr/drv_elasticsearch.html is a driver that will allow you to quickly export your data from an RDBMS to ElasticSearch. The data does not have to be geospatial in order to use GDAL, as long as there is a way to connect to the input source.

Adam

Nidifugous answered 8/2, 2012 at 3:18 Comment(0)
H
0

I wrote a library based on @Anton's answer that intercepts the reads and writes of a MongoClient instance. It automatically adds the text array needed when writing and removes it when reading.

An example:

import { MongoClient } from "mongodb";
import { proxyClient } from "mongodb-middleware-utils";

const mongoServer = new MongoClient('mongodb://127.0.0.1:27017');

proxyClient({
    'my_database_name': { // name of the database you want to intercept
        'my_collection_name': { // name of collection you want to intercept
           random: true,
           fulltext: ['name', 'des']
        }
    },
    // you can have as many map as needed
    ...otherMapping
})(mongoServer);

// connect (await so the client is ready before it is used)
await mongoServer.connect();

const db = mongoServer.db('my_database_name');

// insert document
await db.collection('my_collection_name').insertOne({
    _id: 'doc1',
    name: 'ademola onabanjo',
    date: Date.now(),
    des: 'Lorem ipsum dolor sit amet consectetur'
});

Then you can later perform a full-text search as follows:

const descSearch = await db.collection('my_collection_name').find({ $text: { $search: 'dolor sit am', $field: 'des' } }).toArray();

// outputs: 
// [{
//     _id: 'doc1',
//     name: 'ademola onabanjo',
//     date: Date.now(),
//     des: 'Lorem ipsum dolor sit amet consectetur'
// }]
console.log('searchResult: ', descSearch);
Hectometer answered 29/6, 2024 at 2:16 Comment(0)
