How to implement Solr in Sitecore

I have to implement a Solr index in Sitecore and I would like to know what the best approach is.

I have looked at the following approaches:

  1. Capture the publish:end event (or other events) and then push the item to the Solr index.
  2. Implement a custom database crawler and get all changes from the history table, then push the data to Solr using a custom index (see the sketch after this list).
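
For what it's worth, here is a minimal sketch of what approach 2 could look like, assuming Sitecore's HistoryManager.GetHistory API for reading the history table; the PushToSolr/RemoveFromSolr helpers and the way changes are mapped to documents are hypothetical placeholders, not a definitive implementation:

    using System;
    using Sitecore.Data;
    using Sitecore.Data.Engines;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;

    public class HistoryTableCrawler
    {
        // Reads changes recorded in the history table since the last run and
        // pushes them to Solr. lastRun would be persisted between runs.
        public void IndexChangesSince(DateTime lastRun)
        {
            Database master = Database.GetDatabase("master");
            HistoryEntryCollection entries = HistoryManager.GetHistory(master, lastRun, DateTime.UtcNow);

            foreach (HistoryEntry entry in entries)
            {
                Item item = master.GetItem(entry.ItemId);

                if (entry.Action == HistoryAction.Deleted || item == null)
                {
                    RemoveFromSolr(entry.ItemId);   // hypothetical helper: delete by unique key via SolrNet
                }
                else
                {
                    PushToSolr(item);               // hypothetical helper: map the item to a Solr document and add it
                }
            }
        }

        private void PushToSolr(Item item) { /* map and add via SolrNet */ }
        private void RemoveFromSolr(ID itemId) { /* delete from the Solr index */ }
    }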

The second approach sounds like the way to go (in my opinion). In that case, do I need to create a new search index or a search manager?

If anyone has done this before, can you point me in the right direction? Links to articles about Sitecore–Solr integration would also be appreciated.

UPDATE: After reading the Sitecore documentation, this is what I came up with:

  1. Create a custom SolrConfiguration class where you can set properties like the Solr service URL and add indexes and their definitions (custom Solr indexes).

  2. Create a SolrIndex and add it (in the config file) to your SolrConfiguration. When instantiated, the SolrIndex should subscribe to the AddEntry event of the Sitecore history manager and communicate with the Solr crawlers.

  3. Create a custom processor and hook it into the Sitecore initialize pipeline. The processor should initialize the SolrConfiguration (from step 1).

  4. Since everything in the config file will be built using reflection, you can get an instance of your configuration based on the config file (a sketch of such a processor follows below).
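
To illustrate steps 3 and 4, a minimal sketch of the initialize processor, assuming the SolrConfiguration class and a <solrConfiguration> config node as described above (both hypothetical, as is the SolrContext holder); Factory.CreateObject is the Sitecore API that builds an object graph from config via reflection:

    using Sitecore.Configuration;
    using Sitecore.Pipelines;

    // Hypothetical types from the steps above.
    public class SolrConfiguration { /* Solr service URL, index definitions, ... */ }
    public static class SolrContext { public static SolrConfiguration Configuration { get; set; } }

    // Registered as a processor in the <initialize> pipeline via config.
    public class InitializeSolrConfiguration
    {
        public void Process(PipelineArgs args)
        {
            // Factory.CreateObject instantiates the type declared on the
            // <solrConfiguration> node (including nested index definitions) using reflection.
            var configuration = Factory.CreateObject("solrConfiguration", true) as SolrConfiguration;

            // Keep the instance somewhere accessible to the crawlers/handlers.
            SolrContext.Configuration = configuration;
        }
    }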

How does that sound? Any comments would be appreciated.

Edmondedmonda answered 21/8, 2012 at 9:34 Comment(0)

We've done this on a few sites and tend to have a "published" Solr index and an "unpublished" index.

We hook into:

OnItemSaving

Event to push things into the unpublished index (you may not need this; it depends on whether you want items available in preview mode)

OnPublishItemProcessed

We process additions and updates to the published index here. I'm not sure what we do about deletions at this point without digging into the code, but we certainly deal with deletions in OnItemDelete (mentioned below).

OnItemDelete

We hook in here to remove things from both the published and unpublished indexes (I think we remove from the published index here because Sitecore makes you publish the parent node in order to push deletions out to the web database).
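
For illustration, a minimal sketch of wiring up handlers like these, using the stock item:saved and item:deleted events as stand-ins for the hooks above and SolrNet for the Solr calls; the handler class, document type and field mapping are hypothetical, and it assumes SolrNet's Startup.Init<SolrItemDocument>(solrUrl) was called at application start:

    using System;
    using Microsoft.Practices.ServiceLocation;
    using Sitecore.Data.Items;
    using Sitecore.Events;
    using SolrNet;
    using SolrNet.Attributes;

    // Hypothetical document type matching the Solr schema.
    public class SolrItemDocument
    {
        [SolrUniqueKey("id")] public string Id { get; set; }
        [SolrField("path_s")] public string Path { get; set; }
    }

    // Registered in config, e.g.:
    //   <event name="item:saved">
    //     <handler type="MySite.Search.SolrIndexEvents, MySite" method="OnItemSaved" />
    //   </event>
    public class SolrIndexEvents
    {
        public void OnItemSaved(object sender, EventArgs args)
        {
            var item = Event.ExtractParameter(args, 0) as Item;
            if (item == null) return;

            // Push into the (unpublished) index; the published index would get a
            // similar handler on a publish-related event.
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrItemDocument>>();
            solr.Add(new SolrItemDocument { Id = item.ID.Guid.ToString(), Path = item.Paths.FullPath });
            solr.Commit();
        }

        public void OnItemDeleted(object sender, EventArgs args)
        {
            var item = Event.ExtractParameter(args, 0) as Item;
            if (item == null) return;

            // Remove from the index by unique key.
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrItemDocument>>();
            solr.Delete(item.ID.Guid.ToString());
            solr.Commit();
        }
    }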

I hope that helps, I'd post the code if I could (but I'd be scowled at).

Concatenation answered 21/8, 2012 at 11:55 Comment(3)
Hi, I like this approach; however, the recommendation says that events should be used for simple, fast item-related operations (correct me if I'm wrong). I know it works fine if you subscribe to events and update your Solr index, but does that cause any performance issues?Edmondedmonda
We've not had any performance issues reported back, and it's implemented on a few big websites we've worked on (which have a LOT of content).Concatenation
I've not looked at this in any detail yet, but github.com/jerrong/Sitecore-Item-Buckets looks very, very interesting indeed and might be worth investigating. (Ah, it looks like it's Sitecore 6.5 only, but that might still be good for you?)Concatenation

In addition to the already posted answer (which I think is a good way to do things) I'll share how we do it.

We basically took a look at the Sitecore database crawler and decided to do things in much the same way it does.

We utilize a significantly modified version of the Custom Item Generator to facilitate mapping between strongly typed objects and an object that has properties that correspond to our Solr schema. For actual communication with Solr we use SolrNet.

The general idea is that we loop through all the items recursively (starting with the site root) and map each one to the appropriate type based on its template. Then we run the item through an indexing process (in our implementation, some items need to index multiple documents to Solr); a rough sketch follows below.
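
As a rough sketch of this recursive loop (not their actual code), assuming SolrNet and a hypothetical PageDocument type; the Solr URL, site root path and field mapping are example values only:

    using Microsoft.Practices.ServiceLocation;
    using Sitecore.Data;
    using Sitecore.Data.Items;
    using SolrNet;
    using SolrNet.Attributes;

    // Hypothetical document type matching a Solr schema.
    public class PageDocument
    {
        [SolrUniqueKey("id")] public string Id { get; set; }
        [SolrField("title_t")] public string Title { get; set; }
        [SolrField("path_s")] public string Path { get; set; }
    }

    public class SiteCrawler
    {
        // Assumes SolrNet was initialised once at application start, e.g.
        //   Startup.Init<PageDocument>("http://localhost:8983/solr/sitecore");  // example URL
        public void Reindex()
        {
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<PageDocument>>();
            Database web = Database.GetDatabase("web");
            Item root = web.GetItem("/sitecore/content/home");   // example site root

            IndexRecursive(root, solr);
            solr.Commit();
        }

        private void IndexRecursive(Item item, ISolrOperations<PageDocument> solr)
        {
            // A real implementation would pick a mapping based on item.TemplateID
            // and might emit more than one Solr document per item.
            solr.Add(new PageDocument
            {
                Id = item.ID.Guid.ToString(),
                Title = item["Title"],
                Path = item.Paths.FullPath
            });

            foreach (Item child in item.Children)
            {
                IndexRecursive(child, solr);
            }
        }
    }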

This approach is working very well for us, although I will note that because we index everything at once, it introduces a slight lag between publishing and the site reflecting the changes made to the index. One oversight we made in the beginning, which we will be working to fix soon, is that we don't have an "unpublished" index (meaning we need to publish the site to see updates). It doesn't impact our solution that much, but I can definitely see where it would affect others, so keep that in mind.

We didn't particularly want to get into deleting items from the index, so we do the indexing on the publish:end event.

I hope this additional insight helps you. As far as I know there's not a whole lot of information out there about this specific combination of products, but I can tell you it's definitely possible and quite useful.

Mckeehan answered 21/8, 2012 at 18:56 Comment(6)
That also works (I tested this approach), but it doesn't sound right: the database crawler has to be added to an index (which is a Lucene index), which means you would keep updating two indexes... or do you cancel the job after updating your Solr index, so the record is not added to your Sitecore Lucene index?Edmondedmonda
The Solr implementation is operating independently of the Sitecore Lucene index. To clarify, we only took the database crawler as inspiration (specifically the approach to gathering the items for indexing). We're not actively using the Sitecore Lucene index for anything. As I said before, we only really have one index (will be adding a second one in the future) and via configuration we tell it to index the web database.Mckeehan
Did you notice any performance issues? Also, how did you handle rebuilding the index?Edmondedmonda
The only performance issue is the lag time between the time we publish and the time the index is updated (we will probably look into fixing this by introducing the history engine into the indexing process). Rebuilding the index is (for now) done by (1) deleting the current index, (2) indexing all the items, and (3) committing the additions to Solr.Mckeehan
How are you indexing all the items (step 2)? Do you have a custom mechanism for that? I checked, and rebuilding the index did not trigger AddItem (or DeleteItem) on the crawler?Edmondedmonda
The indexing is all custom; the only integration with Sitecore is a handler on publish:end to trigger the indexing. The Sitecore crawler is not involved in this process.Mckeehan
