I would recommend planning your escape from Lucene as soon as you start thinking about multiple CD (content delivery) servers. Here is why:
A) Each server has to maintain its own index copy:
- Any unexpected restart might leave a few documents out of the index on one box, so indexes start to differ from server to server.
That would lead to the same page being rendered differently by different CDs
- Each server must perform its own index updates, which uses CPU and disk space; the response rate drops right after a publish operation completes =/
- According to the security guide, CDs should have the Sitecore Shell UI removed, so the index cannot easily be rebuilt from the Control Panel =\
B) Lucene is not designed for large volumes of content. Each search operation does roughly the following:
- Create an array with a size equal to the total number of documents in the index
- If a document matches the search, set its flag in the array
While this works like a charm for small indexes (~10K documents), performance degrades badly once the volume of content grows.
The allocated array ends up in the Large Object Heap (LOH), which is not compacted by default and therefore fragments quickly.
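The per-search allocation described above can be sketched roughly as follows. This is a simplified Python illustration, not Lucene's actual code: the `search` function, the `bytearray` flags, and the sample documents are all assumptions made for the example. The point it shows is that the array is sized by the whole index, not by the number of hits.

```python
# Simplified illustration only; names and data are hypothetical.

def search(index_docs, predicate):
    """Return matching document ids plus the size of the flag array used."""
    # One flag per indexed document, allocated on every search.
    flags = bytearray(len(index_docs))
    for doc_id, doc in enumerate(index_docs):
        if predicate(doc):
            flags[doc_id] = 1  # mark the document as a match
    hits = [doc_id for doc_id, flag in enumerate(flags) if flag]
    return hits, len(flags)


docs = ["home page", "news item", "product page", "contact"]
hits, allocated = search(docs, lambda text: "page" in text)
```

Here two documents match, yet the allocation covers all four. With an index of 100K or more documents, every concurrent search pays for an array of that full size.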
Scenario:
A search runs against an index of 100K documents -> a huge array is created in memory
One more search runs in another thread -> one more huge array is created
The index is updated -> it now holds 100K + 10 documents
The first search completes; the LOH now has a free block sized for a 100K array
A search is triggered again -> an array of 100K + 10 is needed; the freed memory 'hole' is not large enough, so more RAM is requested.
- The w3wp.exe process keeps consuming more and more RAM
This is a common case for Analytics Aggregation, since its index is populated by multiple threads at once.
After a while you will see a lot of RAM in use on the processing instance.
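The fragmentation scenario above can be reproduced with a toy model. The sketch below is a hypothetical first-fit heap that never moves live blocks; it is not how the .NET LOH is actually implemented, and the class name, the one-unit-per-document sizing, and the lack of free-block merging are all simplifying assumptions.

```python
class NonCompactingHeap:
    """Toy first-fit heap that never relocates blocks (illustrative only)."""

    def __init__(self):
        self.blocks = []  # blocks in address order

    def allocate(self, size):
        # Reuse the first free block that is large enough.
        for index, block in enumerate(self.blocks):
            if not block["in_use"] and block["size"] >= size:
                leftover = block["size"] - size
                block["size"] = size
                block["in_use"] = True
                if leftover:
                    self.blocks.insert(index + 1, {"size": leftover, "in_use": False})
                return block
        # No hole fits, so the heap has to grow.
        block = {"size": size, "in_use": True}
        self.blocks.append(block)
        return block

    def free(self, block):
        block["in_use"] = False  # leaves a hole; nothing is compacted

    def committed(self):
        return sum(block["size"] for block in self.blocks)


def run_scenario(docs_before, docs_after):
    heap = NonCompactingHeap()
    first = heap.allocate(docs_before)  # search in thread 1
    heap.allocate(docs_before)          # search in thread 2, still running
    heap.free(first)                    # first search completes
    heap.allocate(docs_after)           # next search, after the index update
    return heap.committed()


grown = run_scenario(100_000, 100_010)      # index grew by 10 documents
unchanged = run_scenario(100_000, 100_000)  # index size did not change
```

In this model `grown` ends at 300,010 units because the 100,000-unit hole cannot hold the slightly larger array, while `unchanged` stays at 200,000 because the hole is reused. Ten extra documents are enough to make the heap grow by a full array.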
C) The last Lucene.NET release was 5 years ago,
whereas Solr is actively developed.
The sooner you make the switch to Solr, the easier it will be.