There are several plugin options for building a search engine into your Ruby on Rails application. Which of these is the best?
Thinking Sphinx has more concise syntax to define which fields and which models are indexed.
Both UltraSphinx and (recently) Thinking Sphinx have an ultra-cool feature that takes the geographical proximity of objects into account.
UltraSphinx has annoying problems with how it loads models (it does not load the entire Rails stack, so you could get strange and hard-to-diagnose errors, which are handled by adding explicit require statements).
We use Thinking Sphinx on new projects, and UltraSphinx on projects which use geo content.
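For readers unfamiliar with proximity search, the underlying idea is to rank results by great-circle distance from a point. The sketch below is plain Ruby using the haversine formula; it is not the UltraSphinx or Thinking Sphinx API, and `Place`, `nearest` and the sample coordinates are made up for illustration.

```ruby
# Plain-Ruby sketch: rank places by great-circle (haversine) distance.
# Place, nearest and the coordinates are hypothetical, not any plugin's API.
EARTH_RADIUS_KM = 6371.0

Place = Struct.new(:name, :lat, :lng)

def to_rad(degrees)
  degrees * Math::PI / 180.0
end

# Distance in kilometres between two latitude/longitude pairs (in degrees).
def haversine_km(lat1, lng1, lat2, lng2)
  dlat = to_rad(lat2 - lat1)
  dlng = to_rad(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Return the `limit` places closest to the given point.
def nearest(places, lat, lng, limit = 10)
  places.sort_by { |place| haversine_km(lat, lng, place.lat, place.lng) }.first(limit)
end

places = [
  Place.new("Toronto", 43.6532, -79.3832),
  Place.new("Montreal", 45.5017, -73.5673),
  Place.new("Ottawa", 45.4215, -75.6972)
]
puts nearest(places, 45.5, -73.6, 2).map(&:name).inspect
# prints ["Montreal", "Ottawa"]
```

A search engine does this inside the index rather than in Ruby, which is what makes it usable on large data sets; the sketch only shows what is being computed.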
A solid option used by one of my friends is Solr, a search engine built on the original Java-based Lucene. To use it with Rails there is, of course, an acts_as plugin: acts_as_solr.
He presented the combo recently at Montreal on Rails and gives a nice and thorough overview of how to use acts_as_solr on his blog.
It apparently supports French accents very well, too.
I'm going through this exact process right now so while I don't have actual experience, I've spent many hours researching all the options. Here's what I've learned so far:
- *Sphinx - good reputation for speed and functionality but Sphinx needs integer keys and my model uses GUID; ThinkingSphinx recently announced support for GeoSpatial
- Acts_As_Solr - recommended by a friend with a high-volume site; original creators have stopped working on it and documentation is hard to find; requires a Java servlet
- Acts_As_Ferret - looks easy to use, but lots of detractors say it's unstable
- Two others with limited information are Acts_As_Indexed and Acts_As_Searchable
I have a spreadsheet with my attempt at documenting the advantages and disadvantages of all of them. If anyone is interested in seeing it and/or helping me correct it, just contact me. I'll post it somewhere once I know it's accurate.
My recommendation would be to try UltraSphinx or Thinking Sphinx if you have normal primary keys. I'm going to try Acts_As_Xapian based on the good documentation, feature set, and how active the project seems to be.
I have only used the Ferret/acts_as_ferret combo (legacy decision) on a client project. I strongly recommend looking at the other options first.
aaf is very fragile and can bring your Rails app to a screeching halt if you make a mistake in the config or if for some reason you hit a bug in aaf.
In such a case, instead of simply having the search functionality crapping out, any controller action touching an indexed model will completely fail and raise an exception. Which is baaad, hmkay?
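One general way to limit that kind of damage is to rescue errors in the indexing step so the save itself still goes through. The sketch below is plain Ruby under that assumption; `FlakyIndex` and `Article` are hypothetical stand-ins, and nothing here is acts_as_ferret's own API or a documented fix for it.

```ruby
# Sketch: keep a failing search index from breaking the save itself.
# FlakyIndex and Article are hypothetical, not acts_as_ferret classes.
class FlakyIndex
  def add(_doc)
    raise IOError, "index unavailable"
  end
end

class Article
  attr_reader :title, :index_errors

  def initialize(title, index)
    @title = title
    @index = index
    @index_errors = []
    @saved = false
  end

  def saved?
    @saved
  end

  def save
    @saved = true # stands in for the database write
    update_index
    true
  end

  private

  def update_index
    @index.add(title)
  rescue StandardError => e
    # Record the failure instead of letting it propagate to the caller.
    @index_errors << e.message
  end
end

article = Article.new("Hello", FlakyIndex.new)
puts article.save                 # prints true
puts article.index_errors.inspect # prints ["index unavailable"]
```

The trade-off is that the index can silently fall out of date, so the recorded errors need to be logged or retried somewhere.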
In case anyone is still interested, the latest thing to use now is Elasticsearch. There are gems available for it, such as tire or elasticsearch-rails. Like Solr, it is Java-based and built on Lucene. Solr is actually integrated with the Lucene project now...
If you are using a shared hosting service like me (Bluehost), your options may be limited to what the provider offers. In my case, I couldn't find a good and reliable way to start and keep a separate server running, such as Lucene or Solr.
Therefore, I went with Xapian and it's been working well for me. There are two plugins for Rails that I've researched: acts_as_xapian and xapian_fu. The first will get you going quickly, but it doesn't seem to be maintained anymore. I've just begun working with xapian_fu.
I use the acts_as_xapian plugin. I followed this tutorial:
http://locomotivation.com/2008/07/23/simple-ruby-on-rails-full-text-search-using-xapian
Works very well.
I'm using acts_as_ferret. It's easy to configure and generally fast. The built-in active record find functionality is quite useful: you can apply any conditions or join other models after your search finds the matching records.
Unlike sphinx, you don't have to re-index ALL of your records when you add new data. There are after_save and after_update hooks that will insert your new record into the ferret db. This was one of the big selling points for me.
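To make the idea of hook-driven incremental indexing concrete, here is a minimal plain-Ruby sketch in which each save updates only that record's entries in a tiny inverted index. `TinyIndex` and `Post` are hypothetical; this is not Ferret's API, and a real model would get its `after_save` hook from ActiveRecord rather than calling it by hand.

```ruby
require "set"

# Sketch of incremental indexing: each save touches only that record's terms.
# TinyIndex and Post are hypothetical; this is not Ferret's API.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
    @terms_by_id = {}
  end

  # Replace whatever was indexed for this id with the new text.
  def upsert(id, text)
    remove(id)
    terms = text.downcase.scan(/\w+/).uniq
    terms.each { |term| @postings[term] << id }
    @terms_by_id[id] = terms
  end

  def remove(id)
    (@terms_by_id.delete(id) || []).each { |term| @postings[term].delete(id) }
  end

  def search(term)
    key = term.downcase
    @postings.key?(key) ? @postings[key].to_a.sort : []
  end
end

class Post
  INDEX = TinyIndex.new
  attr_reader :id
  attr_accessor :body

  def initialize(id, body)
    @id = id
    @body = body
  end

  def save
    # A real model would write to the database here.
    after_save
    true
  end

  private

  def after_save
    INDEX.upsert(id, body)
  end
end

first = Post.new(1, "Rails search plugins")
first.save
Post.new(2, "Ruby search").save
first.body = "Rails plugins"
first.save
puts Post::INDEX.search("search").inspect # prints [2]
```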
When you do have to mass index your data, ferret is definitely slower than acts_as_sphinx (by a factor of 3). I ended up writing my own method to re-index models which works as fast as sphinx -- it basically preloads all the data from the DB instead of going record by record to create the new index.
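The general shape of a batch-preloading re-index can be sketched as follows. This is not the method described above, only an illustration of the technique in plain Ruby: `FakeDB` is a hypothetical stand-in that counts "queries" so the difference between record-by-record and batched loading is visible, and the batch size is an arbitrary assumption.

```ruby
# Sketch: rebuild an index by loading rows in batches, not one at a time.
# FakeDB counts "queries"; all names here are hypothetical.
class FakeDB
  attr_reader :queries

  def initialize(rows)
    @rows = rows # { id => text }
    @queries = 0
  end

  def ids
    @queries += 1
    @rows.keys.sort
  end

  def find(id)
    @queries += 1
    @rows.fetch(id)
  end

  def find_all(ids)
    @queries += 1
    ids.map { |id| [id, @rows.fetch(id)] }
  end
end

# One query per record.
def reindex_one_by_one(db)
  index = {}
  db.ids.each { |id| index[id] = db.find(id).downcase.split }
  index
end

# One query per batch of records.
def reindex_in_batches(db, batch_size = 1000)
  index = {}
  db.ids.each_slice(batch_size) do |batch|
    db.find_all(batch).each { |id, text| index[id] = text.downcase.split }
  end
  index
end

rows = (1..25).map { |i| [i, "Doc #{i}"] }.to_h
slow = FakeDB.new(rows)
fast = FakeDB.new(rows)
reindex_one_by_one(slow)
reindex_in_batches(fast, 10)
puts "#{slow.queries} vs #{fast.queries} queries" # prints 26 vs 4 queries
```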
The ferret documentation is good for the basics, but it's a bit sparse once you get into more complex searches, sorts and using a dRb server to host a remote index. That being said, it feels like a much more mature product than acts_as_sphinx, although I have limited experience with sphinx.
I've been looking for the perfect solution as well. At first I went with Thinking Sphinx, which worked fine. But since I intend to host my webapp on Heroku, the only option is to use Solr. The biggest drawback, however, is that development of the main acts_as_solr gem seems to have stopped after May 2008, which is too old for my taste. I just found Sunspot, a more advanced alternative with recent updates, so that's one I'm going to consider.
Another option Heroku offers is to go for a hosted index server based on Solr, named Websolr. The required gem websolr-acts_as_solr is also luckily very much up-to-date.
I recommend acts_as_ferret. The tough part is getting it up and running successfully on your server, but once that's done you'll hardly have any problems, as the ferret server runs as a separate background process and updates your index every time there is a new update. Also, it's working great with Mongrel and Apache for us.
Thinking Sphinx is a better alternative than Ultrasphinx, which seems abandoned, but, in general, Xapian has a more powerful engine than Sphinx and is easier for implementing realtime search.
I'm using a different option which has worked out amazingly well. I'm using JRuby and talking to Lucene directly.
I've used acts_as_solr in the past and ran into some issues. Mainly, it makes a synchronous call for each AR save. This isn't too bad, but in my situation a save sometimes caused many synchronous calls to Solr and would occasionally take longer than Mongrel would allow, and I'd get a Mongrel timeout exception (or something like that).
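A common way around synchronous index calls in general is to push updates onto a queue and let a background worker talk to the search server. The sketch below shows that pattern in plain Ruby with the standard `Queue` class; `SlowIndex` and `IndexWorker` are hypothetical, and this is not something acts_as_solr provides as far as the text above says.

```ruby
require "thread"

# Sketch: queue index updates so the caller does not wait on the search server.
# SlowIndex and IndexWorker are hypothetical, not acts_as_solr classes.
class SlowIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def add(id, text)
    sleep 0.01 # stands in for a network round trip
    @docs[id] = text
  end
end

class IndexWorker
  def initialize(index)
    @index = index
    @queue = Queue.new
    # A single background thread drains the queue until it sees :stop.
    @thread = Thread.new do
      while (job = @queue.pop) != :stop
        @index.add(*job)
      end
    end
  end

  # Returns immediately; the worker thread does the slow part.
  def enqueue(id, text)
    @queue << [id, text]
  end

  # Process everything already queued, then stop the worker.
  def shutdown
    @queue << :stop
    @thread.join
  end
end

index = SlowIndex.new
worker = IndexWorker.new(index)
5.times { |i| worker.enqueue(i, "doc #{i}") }
worker.shutdown
puts index.docs.size # prints 5
```

The cost is that the index lags slightly behind the database, and queued updates are lost if the process dies before they are flushed.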
I've used Thinking Sphinx and it seems pretty good, but I haven't had the time to evaluate all of the options.
I recommend Thinking Sphinx. It is the fastest option in my opinion.
I've used Ferret and it worked well for my purposes, but I have not evaluated the other options.
An option I haven't tried is the C++-based Xapian.
We're using http://hyperestraier.sourceforge.net/, which was inherited. I haven't looked into other engines, but hyperestraier provides all the hooks necessary. Setting up the search index is complicated, though. There are probably easier options available.
It depends on what database you are using. I would recommend using Solr as it offers up a lot of nice options for fuzzy search and has a great query parser. The downside is you have to run a separate process for it. I have used Ferret as well, but found it to be less stable in terms of multi-threaded access to the index. I haven't tried Sphinx because it only works with MySQL and Postgres.