Full-text search in NoSQL databases [closed]
  • Does anyone here have experience deploying a real online system with full-text search in any of the NoSQL databases?
  • For example, how does full-text search compare in MongoDB, Riak and CouchDB?
  • The metrics I am looking for are ease of deployment and maintenance and, of course, speed.
  • How mature are they? Can any of them replace a Lucene-based infrastructure?
Disharoon asked 28/3, 2011 at 1:45. Comments:
RavenDB uses Lucene by default; see ravendb.net/faq/lucene-queries-examples. That means it has built-in full-text search. I've used it in the past, but it seemed to me 'not production ready'. - Substantialize
MarkLogic is a NoSQL database built with real-time full-text search at its core. See developer.marklogic.com/products/marklogic-server/which-nosql - Heise
eXist-db, an open source XML database comparable to MarkLogic's product, has a great full-text implementation, and I found it really easy to use once it's set up. You can find it here: exist-db.org/exist/apps/homepage/index.html - Hazelton
See also #13175127. - Tuberculate
Dgraph says it supports full-text search: dgraph.io/tour/search/5 - Santalaceous

None of the existing "NoSQL" databases provides a reasonable implementation of something that could be called "fulltext search". MongoDB in particular has barely nothing so far (matching with regular expressions is not fulltext search, and searching with the $in or $all operators on a keyword list is just a very poor implementation of "fulltext search"). Using Solr, ElasticSearch or Sphinx is straightforward: you implement and integrate them at the application level. Your choice largely depends on your requirements and current setup.

Apocynthion answered 28/3, 2011 at 2:19. Comments:
Lucene/Solr is awesomely fast compared to using $in. - Utah
Wrong: Riak has Riak Search (see below), which is an integral part of the product and provides a good implementation of Lucene, imho. We tested it and plan on using it in production for a very large web app. - Burdelle
@Eland: Can you please share your test results with us, and what made you choose Riak? Thank you very much. - Disharoon
Agreed. Just because MongoDB has "barely nothing" doesn't mean people using Riak, BigCouch, or CouchDB-Lucene have to stop having fun. - Trilobite
We at MarkLogic have had this for some time now. Sorry I missed this earlier discussion. - Heise
I would have to disagree. RavenDB has excellent full-text search capabilities. - Volumeter
Who said Solr is not NoSQL? Awesome performance, a perfect API and a really good storage model with the ability to control storage and indexing options. Stemmers are available, plus faceting and highlighting. Moreover, it's open source Java. I haven't used a better tool for full-text search. - Lanark

Yes. See CouchDB-Lucene, a CouchDB extension that supports full Lucene queries over the data.
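For reference, couchdb-lucene indexes are declared in a design document under a `fulltext` key, each entry holding an `index` JavaScript function that returns a Lucene `Document`. The sketch below only builds such a design document in Python, following the example shape in the couchdb-lucene README (verify it against the version you deploy); the `foo` design name and `subject` field are placeholders, and the query URL differs between couchdb-lucene versions, so it is left out here.

```python
import json


def couchdb_lucene_design_doc(design_name: str, field: str) -> dict:
    """Build a design doc whose "fulltext" section indexes one document field.

    The "index" value is JavaScript source that couchdb-lucene runs for every
    document; it adds the chosen field to a new Lucene Document.
    """
    index_fn = (
        "function(doc) { var ret = new Document(); "
        f"ret.add(doc.{field}); return ret; }}"
    )
    return {
        "_id": f"_design/{design_name}",
        "fulltext": {
            f"by_{field}": {"index": index_fn},
        },
    }


if __name__ == "__main__":
    doc = couchdb_lucene_design_doc("foo", "subject")
    # PUT this JSON to /<your-db>/_design/foo with any HTTP client.
    print(json.dumps(doc, indent=2))
```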

Trilobite answered 28/3, 2011 at 4:10

I'm involved in the development of an application using Solandra (Apache Solr on a Cassandra backend). In my experience the system is quite stable and able to handle TB+ of data. I'm personally quite happy with the software for the following reasons:

  1. Automated partitioning of data, thanks to the Cassandra backend.
  2. Rich querying capabilities (due to Solr and Lucene).
  3. Fast reads and writes (writes are significantly faster than reads).

However, I believe Solandra currently does not support batch mutations. That is, I can insert 100 columns in a single insertion into Cassandra, but Solandra does not support this.

Dayflower answered 4/10, 2011 at 16:06

For MongoDB, there isn't a complete full-text indexing feature yet; however, one is possibly in the pipeline, perhaps due in v2.2.

In the meantime, you can create a simple inverted index by storing keywords in a string array field and putting an index on it, as described here: Full Text Search in Mongo
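A minimal sketch of that keyword-array approach, kept in plain Python so it runs without a server. `extract_keywords` and the tiny `STOP_WORDS` list are illustrative assumptions, the `keywords` field name is hypothetical, and `matches` only mimics how MongoDB evaluates `$all` (including that an empty `$all` list matches nothing).

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def extract_keywords(text: str) -> list:
    """Lower-case, split on non-alphanumerics, drop stop words, de-duplicate."""
    keywords = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


def keyword_query(search: str) -> dict:
    """Mongo filter requiring every search keyword to be in the keywords array."""
    return {"keywords": {"$all": extract_keywords(search)}}


def matches(doc: dict, query: dict) -> bool:
    """In-memory stand-in for the server's $all check (empty list matches nothing)."""
    wanted = query["keywords"]["$all"]
    return bool(wanted) and all(word in doc["keywords"] for word in wanted)


if __name__ == "__main__":
    docs = [
        {"_id": 1, "title": "Full-text search in MongoDB"},
        {"_id": 2, "title": "Replicating the oplog to Solr"},
    ]
    for d in docs:
        # Store keywords next to the text; in MongoDB you would also call
        # collection.create_index("keywords") to get a multikey index.
        d["keywords"] = extract_keywords(d["title"])
    q = keyword_query("MongoDB search")
    print([d["_id"] for d in docs if matches(d, q)])  # [1]
```

There is no stemming and no ranking here, so a query for "searching" will not match a document that only contains "search", which is why this approach is a poor substitute for real full-text search.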

Alternatively, you could maintain a parallel full-text index in a dedicated Solr or Lucene index and, if you're feeling really ambitious, replicate directly to your full-text store from the Mongo oplog. Otherwise, populate both and keep them in sync from your application logic.
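A minimal sketch of the oplog route, assuming you already receive oplog-style entries (an `op` of `"i"`, `"u"` or `"d"`, with `o`/`o2` holding the document or its `_id`); tailing the oplog itself is omitted. `InMemoryIndex` and `fetch_document` are hypothetical stand-ins for a Solr/Lucene index and a lookup back into Mongo. Updates are handled by re-reading the document, because update entries do not always carry the full document.

```python
import re
from collections import defaultdict
from typing import Callable, Optional


def tokenize(text: str) -> set:
    """Lower-case and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InMemoryIndex:
    """Hypothetical stand-in for a full-text core: term -> set of _ids."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.postings = defaultdict(set)  # term -> {_id, ...}
        self.doc_terms = {}               # _id -> terms, so deletes can clean up

    def upsert(self, doc: dict) -> None:
        self.delete(doc["_id"])  # drop stale terms before re-indexing
        terms = tokenize(doc.get(self.field, ""))
        self.doc_terms[doc["_id"]] = terms
        for term in terms:
            self.postings[term].add(doc["_id"])

    def delete(self, doc_id) -> None:
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def search(self, word: str) -> set:
        return set(self.postings.get(word.lower(), set()))


def apply_oplog_entry(index: InMemoryIndex, entry: dict,
                      fetch_document: Callable[[object], Optional[dict]]) -> None:
    op = entry["op"]
    if op == "i":                     # insert: "o" holds the whole document
        index.upsert(entry["o"])
    elif op == "u":                   # update: "o2" identifies the document
        doc_id = entry["o2"]["_id"]
        doc = fetch_document(doc_id)  # re-read instead of interpreting "o"
        if doc is None:
            index.delete(doc_id)
        else:
            index.upsert(doc)
    elif op == "d":                   # delete: "o" holds just the _id
        index.delete(entry["o"]["_id"])
    # Other op types (commands, no-ops) are ignored in this sketch.


if __name__ == "__main__":
    store = {1: {"_id": 1, "body": "NoSQL full text search"}}
    idx = InMemoryIndex("body")
    apply_oplog_entry(idx, {"op": "i", "o": store[1]}, store.get)
    store[1] = {"_id": 1, "body": "Solr replication"}
    apply_oplog_entry(idx, {"op": "u", "o2": {"_id": 1},
                            "o": {"$set": {"body": "Solr replication"}}}, store.get)
    print(idx.search("search"), idx.search("solr"))  # set() {1}
```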

Utah answered 28/3, 2011 at 1:58. Comments:
MongoDB has had text indexes since version 2.4. - Mouthwash
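For later readers: text indexes arrived in MongoDB 2.4 as a beta feature queried through the `text` command, and from 2.6 onward they are queried with the `$text` operator. The sketch below only builds the index spec, filter, projection and sort as plain Python values; the `body` field name is a placeholder, and the pymongo calls in the comment show where you would pass them on a real collection.

```python
# Plain-Python building blocks; with pymongo you would use them as
#   coll.create_index(TEXT_INDEX)
#   coll.find(text_query("nosql search"), SCORE_PROJECTION).sort(SCORE_SORT)
TEXT_INDEX = [("body", "text")]                        # pymongo.TEXT == "text"
SCORE_PROJECTION = {"score": {"$meta": "textScore"}}   # expose relevance score
SCORE_SORT = [("score", {"$meta": "textScore"})]       # best matches first


def text_query(terms: str) -> dict:
    """Filter for the $text operator; space-separated words are OR-ed."""
    return {"$text": {"$search": terms}}


if __name__ == "__main__":
    print(text_query("nosql search"))
```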

I've just finished implementing this with data stored in MongoDB and Sphinx Search as my full-text engine. I know Mongo has a votable issue for adding full-text search to a future release; however, at this point they don't have it.

There are several ways of getting your Mongo data into Sphinx; however, the one I've had the most luck with (and which has been extremely easy) is xmlpipe2. It took me a while to fully understand how to use it; however, this article, Sphinx xmlpipe2 in PHP, has an outstanding walkthrough which shows (at least in PHP) how to build the document and then how to insert it into Sphinx.
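The linked walkthrough is in PHP; as a rough Python equivalent, this sketch prints the kind of xmlpipe2 stream such a generator script writes to stdout: an embedded schema, then one `<sphinx:document>` per record. It assumes each record already carries a unique positive integer `id` (Sphinx document ids are integers, so MongoDB ObjectIds need some mapping); the field names and sample row are made up.

```python
from xml.sax.saxutils import escape


def xmlpipe2(docs, fields, int_attrs=()):
    """Return an xmlpipe2 stream: a schema, then one document element per record.

    docs: iterable of dicts with an integer "id" plus the listed fields/attrs.
    """
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    for f in fields:
        out.append(f'<sphinx:field name="{f}"/>')            # full-text fields
    for a in int_attrs:
        out.append(f'<sphinx:attr name="{a}" type="int"/>')  # integer attributes
    out.append("</sphinx:schema>")
    for doc in docs:
        out.append(f'<sphinx:document id="{int(doc["id"])}">')
        for name in (*fields, *int_attrs):
            # escape() protects &, < and > inside the element text
            out.append(f"<{name}>{escape(str(doc.get(name, '')))}</{name}>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical rows; in practice they would come from a Mongo cursor.
    rows = [{"id": 1, "title": "NoSQL & search", "body": "Sphinx via xmlpipe2", "year": 2011}]
    print(xmlpipe2(rows, fields=("title", "body"), int_attrs=("year",)))
```

Embedding the schema in the stream keeps the field list next to the code that produces the documents, so the Sphinx source definition only needs the command to run.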

Essentially my config ends up looking like this:

source my_source {
     type = xmlpipe2
     xmlpipe_command = /usr/bin/php /www/generateSphinXml.php identifierForMyTable
}

with my index then looking like this:

index my_index {
     source = my_source
     path = /usr/local/sphinx/var/data/my_index
     docinfo = extern
     min_word_len = 1
     mlock = 0
     morphology = stem_en
     # charset_type is a requirement here
     charset_type = utf-8
     enable_star = 1
     html_strip = 0
     min_prefix_len = 2
}

I've had excellent success with this; hopefully you will find it useful too.

Philtre answered 16/11, 2011 at 16:17

Couchbase 5.0 is releasing full-text search capabilities built on the open source Bleve engine. You enable full-text indexing and can start querying your existing JSON documents in the database.

Here are slides and a presentation video covering the topic, which also mention Elasticsearch and Lucene: https://www.slideshare.net/Couchbase/fulltext-search-how-it-works-and-what-it-can-do

Rule answered 13/9, 2017 at 19:01

If you are using PHP, there is a great solution for full-text search in the NoSQL database MongoDB called MongoLantern: http://sourceforge.net/projects/mongolantern/

Previously I used Sphinx + MongoDB for full-text search; the performance was great, but the result quality was very poor. With MongoLantern, my search has improved a lot.

MongoLantern is also listed on the MongoDB site.

Please let me know if you try it yourself.

Donela answered 30/12, 2011 at 7:22

Solr can be used with 10gen's Mongo Connector, which can push data there (among other targets):

https://github.com/10gen-labs/mongo-connector/tree/master/mongo-connector

From their example:

python mongo_connector.py -m localhost:27217 -t http://localhost:8080/solr
Columbous answered 29/10, 2012 at 0:54

The CLucene project. Also Xapian, which isn't mentioned above. I use Sphinx, and it's very good but somewhat clumsy to set up. I actually prefer piping data from Mongo into Sphinx via XMLPIPE2 instead of using Sphinx's SQL source in the sphinx.conf file.

Northman answered 21/7, 2011 at 0:08

Definitely Solr. It is NoSQL.

It has:

  • awesome performance
  • awesome storage options
  • stemmers
  • highlighting
  • faceting
  • distributed search (SolrCloud)
  • perfect API
  • web admin
  • HTML, PDF, DOC indexing
  • many other features
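To illustrate two of those features, here is a sketch that only builds a Solr `/select` URL requesting highlighting (`hl`, `hl.fl`) and faceting (`facet`, `facet.field`) in a single query. The core name `articles` and the `body`/`category` fields are hypothetical; 8983 is Solr's default port.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, text: str,
                    facet_field: str, hl_field: str) -> str:
    """Build a /select URL that asks for highlighting and faceting at once."""
    params = [
        ("q", text),
        ("hl", "true"),               # turn on highlighting
        ("hl.fl", hl_field),          # field(s) to highlight
        ("facet", "true"),            # turn on faceting
        ("facet.field", facet_field), # field to count values of
        ("wt", "json"),               # JSON response writer
    ]
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "articles",
                          "body:nosql", "category", "body"))
```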
Lanark answered 1/10, 2013 at 13:41
