Say, if I want to store PDFs or ePub files using MongoDB's GridFS, is it possible to perform full-text searching on the data files?
You can't currently do real full text search within mongo: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
Feel free to vote for it here: https://jira.mongodb.org/browse/SERVER-380
Mongo is more of a general purpose scalable data store, and as of yet it doesn't have any full text search support. Depending on your use case, you could use the standard b-tree indexes with an array of all of the words in the text, but it won't do stemming or fuzzy matches, etc.
However, I would recommend combining mongodb with a lucene-based application (elastic search is popular). You can store all of your data in mongodb (binary data, metadata, etc.), and then index the plain text of your documents in lucene. Or, if your use case is pure full text search, you might consider just using elastic search instead of mongodb.
Update (April 2013): MongoDB 2.4 now supports a basic full-text index! Some useful resources below.
http://docs.mongodb.org/manual/applications/text-search/
http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text
http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/
Not using MongoDB APIs, not that I know of. GridFS seems to be designed to be more like a simplified file system with APIs that provides a straightforward key-value semantic. On their project ideas page they list two things that would help you if existed in production-ready state:
- GridFS FUSE that would allow you to mount GridFS as a local file system and then index it like you would index stuff on your disk
- Real-Time Full Text search integration with tools like Lucene and Solr. There are some projects on github and bitbucket that you might want to check out.
Also look at ElasticSearch. I have seen some integration with Mongo but I am not sure how much has been done to tap into the GridFS (GridFS attachment support is mentioned but I haven't worked with it to know for sure). Maybe you will be the one to build it and then opensource it? should be a fun adventure
© 2022 - 2024 — McMap. All rights reserved.