How to optimize a Solr index. I want to optimize my Solr indexing. When I change settings in solrconfig.xml the data still gets indexed, but I want to know how to verify that the index is optimized, and which things are involved in index optimization.
Check the size of the respective core before you start.
Open Terminal 1:
watch -n 10 "du -sh /path/to/core/data/*"
Open Terminal 2 and Execute:
curl "http://hostname:8980/solr/<core>/update?optimize=true"
Replace <core> with the actual name of your core.
You will see the size of the core grow gradually to roughly double the size of your indexed data, then drop suddenly. How long this takes depends on how much data Solr holds.
For instance, 50 GB of indexed data spikes to nearly 90 GB and then drops to 25 GB once optimized; that normally takes 30-45 minutes for this amount of data.
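If you'd rather script the size watch than keep a `watch`/`du` terminal open, here is a minimal Python sketch that computes a directory's total file size the way `du -sb` would and polls it on an interval. The path below is a placeholder, just as in the command above.

```python
import os
import time

def dir_size_bytes(path):
    """Total size of all regular files under `path`, like `du -sb`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def watch_index(path, interval=10):
    """Print the index size every `interval` seconds; Ctrl-C to stop."""
    while True:
        print(f"{dir_size_bytes(path) / 1024**3:.2f} GiB")
        time.sleep(interval)

# Usage (placeholder path, replace with your core's data directory):
#   watch_index("/path/to/core/data")
```

Run this in one terminal while the optimize request runs in the other, and you should see the spike-then-drop pattern described above.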
I find this to be the easiest way to optimize a Solr index. In my context "optimize" means to merge all index segments.
curl "http://localhost:8983/solr/<core_name>/update" -F stream.body='<optimize/>'
You need to pass optimize=true in the update request to optimize the index.
There are different ways to optimize an index. You could trigger one of the solr basic scripts: http://wiki.apache.org/solr/SolrOperationsTools#optimize
You could also set optimize=true on a (full) import or while adding new data, or simply trigger a commit with optimize=true.
Maybe also this could be interesting for your needs: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
By "optimise" here I mean forceMerge. The optimize operation re-organizes all the segments in a core (or per shard) and merges them into a single segment (the default is 1 segment).
To optimise: you can specify a MergePolicy in solrconfig.xml so that Solr merges segments on its own. To trigger the optimise manually: http://hostname:port/solr/<COLLECTION_NAME>/update?optimize=true&maxSegments=1
To answer your next question - how to verify whether the optimise is done? Check the Core/Shard Overview tab in the Solr UI, which shows the segment count. You can also compare the size of the segment files in the /data/index folder before and after the optimise.
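The segment count can also be checked programmatically via Solr's Luke request handler (`/solr/<core>/admin/luke`). The sketch below assumes the handler's JSON response carries the count under `index` / `segmentCount`, which is what recent Solr versions return, but verify the field names against your version.

```python
import json
from urllib.request import urlopen

def parse_segment_count(luke_json):
    """Pull the segment count out of a Luke handler JSON response."""
    return json.loads(luke_json)["index"]["segmentCount"]

def segment_count(solr_base, core):
    """Query /solr/<core>/admin/luke and return the core's segment count."""
    url = f"{solr_base}/solr/{core}/admin/luke?numTerms=0&wt=json"
    with urlopen(url) as resp:
        return parse_segment_count(resp.read())

# After a successful optimize this should report 1 (or your maxSegments):
#   segment_count("http://localhost:8983", "mycore")
```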
Optimize/forceMerge are better behaved, but still expensive operations.
https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations:
“Optimizing is very expensive, and if the index is constantly changing, the slight performance boost will not last long.”
To test how much a given change improves indexing, write a custom indexer and add randomly generated content. Add a large number of documents (500,000 or 1,000,000) and measure the time it takes.
Following the articles shared above, I wrote a custom indexer of my own and managed to cut the time it took to index documents by 80%.
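The custom-indexer idea can be sketched as a small harness that generates random documents and times a batched run. The `send` callable here is a stand-in for whatever client actually posts batches to Solr (pysolr, or a plain HTTP POST to /update); it is not a real Solr API.

```python
import random
import string
import time

def random_doc(doc_id):
    """Generate one synthetic document with random text content."""
    text = " ".join(
        "".join(random.choices(string.ascii_lowercase, k=8))
        for _ in range(50)
    )
    return {"id": str(doc_id), "body_t": text}

def benchmark_indexing(send, num_docs=500_000, batch_size=1000):
    """Time how long it takes to push `num_docs` generated docs.

    `send(batch)` is whatever actually posts a batch to Solr --
    stubbed out here so the harness itself stays self-contained.
    """
    start = time.perf_counter()
    batch = []
    for i in range(num_docs):
        batch.append(random_doc(i))
        if len(batch) == batch_size:
            send(batch)
            batch = []
    if batch:
        send(batch)  # flush the final partial batch
    return time.perf_counter() - start
```

Run it once before and once after each solrconfig.xml change and compare the elapsed times on the same document count.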
When it comes to optimization of Solr core/shard data it is as easy as running a command like this:
curl "http://hostname:8980/solr/<COLLECTION_NAME>/update?optimize=true"
But be aware that this doesn't come for free - if you have a lot of data you may end up with quite a lot of I/O on Solr nodes and the process itself taking a lot of time. In most cases, you want to start with tuning the merge process, not force merging the index itself.
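Tuning the merge process happens in the `<indexConfig>` section of solrconfig.xml. As a sketch (Solr 6+ syntax; TieredMergePolicy is already the default, and the values shown are illustrative, not recommendations):

```xml
<indexConfig>
  <!-- Let background merges keep the segment count bounded,
       instead of relying on periodic forced optimizes -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```

Lowering `segmentsPerTier` means fewer, larger segments at the cost of more merge I/O during indexing.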
I did a talk on that topic during Lucene/Solr revolution - if you would like to have a look at the slides and the video here is a link: https://sematext.com/blog/solr-optimize-is-not-bad-for-you-lucene-solr-revolution/
If you have access to the Solr web-based UI, this can be done there by navigating to the core you want to optimize, then:
- Open the "Documents" page
- Set the Request-Handler to /update (which is the default) and the Document Type to XML (this may also be possible with JSON)
- Enter <optimize/> into the "Document(s)" text area
- Submit the document
This will kick-off the optimization process.