Full-text search on MongoDB GridFS?

Say I want to store PDF or ePub files using MongoDB's GridFS: is it possible to perform full-text search on the contents of those files?

Stop asked 8/5, 2012 at 1:29

You can't currently do real full-text search within MongoDB: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo

Feel free to vote for it here: https://jira.mongodb.org/browse/SERVER-380

MongoDB is more of a general-purpose, scalable data store, and as of yet it doesn't have any full-text search support. Depending on your use case, you could store an array of all the words in the text and put a standard B-tree index on that array field, but that won't do stemming, fuzzy matching, etc.
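
A minimal sketch of that keyword-array idea in Python, assuming the plain text has already been pulled out of the PDF/ePub (that step is not shown). The "keywords" field name and the helper functions are hypothetical. The query uses MongoDB's $all operator, so a document matches only if its array contains every search word; an ordinary index on the array field (MongoDB makes it a multikey index) can serve that query.

```python
import re

# Hypothetical field name used in this sketch; not part of any MongoDB API.
KEYWORDS_FIELD = "keywords"


def extract_keywords(text: str) -> list[str]:
    """Lowercase the text, split it into ASCII word tokens and return the
    unique words in sorted order. No stemming or fuzzy matching is done,
    which is exactly the limitation of this approach."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(words))


def build_document(title: str, text: str) -> dict:
    # The plain text is assumed to have been extracted from the file already.
    return {"title": title, KEYWORDS_FIELD: extract_keywords(text)}


def build_query(search: str) -> dict:
    # $all: the stored array must contain every word of the search string.
    return {KEYWORDS_FIELD: {"$all": extract_keywords(search)}}
```

With pymongo you would call collection.create_index("keywords") once, insert the output of build_document(...), and pass build_query(...) to collection.find(). A search for "run" will not match "running", which is the missing stemming mentioned above.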

However, I would recommend combining MongoDB with a Lucene-based application (Elasticsearch is popular). You can store all of your data in MongoDB (binary data, metadata, etc.) and then index the plain text of your documents in Lucene. Or, if your use case is pure full-text search, you might consider just using Elasticsearch instead of MongoDB.
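
A minimal sketch of that MongoDB + Elasticsearch split, assuming the plain text has already been extracted from each PDF/ePub (not shown) and a recent Elasticsearch version whose _bulk endpoint takes _index and _id in each action line. The index name and the gridfs_id, filename and text fields are hypothetical. It only builds the newline-delimited request body; sending it (a POST to /_bulk with Content-Type: application/x-ndjson) is left to whatever HTTP client you use.

```python
import json


def build_bulk_body(index_name: str, docs: list[dict]) -> str:
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    Each doc is assumed to look like
    {"gridfs_id": "...", "filename": "...", "text": "..."}. The GridFS file id
    is reused as the Elasticsearch _id, so a search hit can be mapped back to
    the binary file stored in MongoDB."""
    lines = []
    for doc in docs:
        # Action line: which index to write to and under which id.
        action = {"index": {"_index": index_name, "_id": doc["gridfs_id"]}}
        # Source line: only the searchable text and a little metadata.
        source = {"filename": doc["filename"], "text": doc["text"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The design choice is that the binaries never leave MongoDB; Elasticsearch only holds the text plus the id needed to fetch the original file back from GridFS.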

Update (April 2013): MongoDB 2.4 now supports a basic full-text index! Some useful resources below.

http://docs.mongodb.org/manual/applications/text-search/

http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text

http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/
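
For the text index mentioned in the update, a rough sketch of the index key and query documents as plain Python data, ready to hand to a driver such as pymongo. The "content" and "score" field names are hypothetical. The $text query operator shown here is the 2.6+ form; on 2.4 the separate text command (linked above) was needed instead.

```python
# Index keys for a text index on a hypothetical "content" field.
# The string "text" is the same value as pymongo.TEXT.
TEXT_INDEX_KEYS = [("content", "text")]


def text_search(terms: str) -> tuple[dict, dict, list]:
    """Return (filter, projection, sort) for a relevance-ranked text search
    using the $text operator (MongoDB 2.6+)."""
    query_filter = {"$text": {"$search": terms}}
    # Expose the relevance score computed by the text index...
    projection = {"score": {"$meta": "textScore"}}
    # ...and sort the results by that same score.
    sort = [("score", {"$meta": "textScore"})]
    return query_filter, projection, sort
```

With pymongo this would be collection.create_index(TEXT_INDEX_KEYS) once, then collection.find(query_filter, projection).sort(sort). The index only covers string fields of ordinary documents, so the extracted text still has to live in a regular collection, not inside the GridFS file.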

Fiver answered 8/5, 2012 at 5:40
Convulsant: MongoDB 2.4 now supports full-text search.
Ralleigh: MongoDB 2.6 now has full-text search as part of its regular query operators (in 2.4 you had to use db.runCommand). However, you can't do any kind of search inside a GridFS file: its contents are just binary chunks, and MongoDB treats them no differently whether they are parts of an image or chapters of a textbook.
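
To illustrate the point about binary chunks, here is a small sketch of how a GridFS driver lays out a file: one fs.files document plus fs.chunks documents of the form {files_id, n, data}. It is a simulation only and does not talk to MongoDB; the function name is hypothetical, and 255 KiB is the default chunk size in current drivers (older releases used 256 KiB).

```python
# Default GridFS chunk size in current drivers: 255 KiB.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(files_id, data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[dict]:
    """Cut raw file bytes into fs.chunks-style documents.

    Each chunk records the owning file's id, its sequence number n and a
    slice of the raw bytes; nothing about the content is interpreted."""
    chunks = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "files_id": files_id,
            "n": n,
            "data": data[start:start + chunk_size],
        })
    return chunks
```

For example, split_into_chunks(1, b"full text search", chunk_size=6) yields the data b"full t", b"ext se" and b"arch", so even plain text gets cut mid-word, and a PDF's compressed streams or an ePub (a ZIP archive) contain no searchable plain text at all. That is why the text has to be extracted and indexed separately.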

Not with the MongoDB APIs, as far as I know. GridFS seems to be designed more like a simplified file system, with APIs that provide straightforward key-value semantics. On their project ideas page they list two things that would help you if they existed in a production-ready state:

  • GridFS FUSE, which would allow you to mount GridFS as a local file system and then index it like any other files on your disk
  • Real-time full-text search integration with tools like Lucene and Solr. There are some projects on GitHub and Bitbucket that you might want to check out.

Also look at Elasticsearch. I have seen some integration with MongoDB, but I am not sure how much has been done to tap into GridFS (GridFS attachment support is mentioned, but I haven't worked with it enough to know for sure). Maybe you will be the one to build it and then open-source it? It should be a fun adventure.

Hachure answered 8/5, 2012 at 2:2
Denominational: GridFS FUSE is hopelessly outdated.
