Can Apache Solr Handle Terabytes of Data?

I have been an Apache Solr user for about a year. So far I have used Solr for simple search tools, but now I want to use it with 5TB of data. I assume that the 5TB of data will become about 7TB once Solr indexes it, given the filters I use. After that, I will add nearly 50MB of data per hour to the same index.

1- Is there any problem with using a single Solr server for 5TB of data (without shards)?

  • a- Can the Solr server answer queries in an acceptable time?

  • b- What is the expected time for committing 50MB of data to a 7TB index?

  • c- Is there an upper limit on index size?

2- What do you suggest?

  • a- How many shards should I use?

  • b- Should I use Solr cores?

  • c- What commit frequency would you recommend? (Is 1 hour OK?)

3- Are there any test results for data this large?


I do not have 5TB of data available; I just want to estimate what the result would be.

Note: You can assume that hardware resources are not a problem.
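
As a rough back-of-the-envelope check of the growth rate described above (50MB per hour on top of an estimated 7TB index), the arithmetic can be sketched in Python. The 30-day month and the decimal units (1GB = 1000MB) are assumptions made for illustration, and the sketch treats the 50MB as added index size, which may not hold if indexing inflates the data.

```python
# Back-of-the-envelope growth estimate for the index described in the question.
# Assumptions (not from the question): 30-day months, decimal units (1 GB = 1000 MB),
# and that each 50 MB of new data adds roughly 50 MB to the index.

MB_PER_HOUR = 50          # ingest rate stated in the question
INITIAL_INDEX_TB = 7      # index size estimated in the question


def growth_gb(hours: int, mb_per_hour: float = MB_PER_HOUR) -> float:
    """Return the amount of data added over `hours`, in GB."""
    return hours * mb_per_hour / 1000


per_day_gb = growth_gb(24)
per_month_gb = growth_gb(24 * 30)
per_year_gb = growth_gb(24 * 365)

# Yearly growth relative to the initial index size, as a percentage.
yearly_growth_pct = per_year_gb / (INITIAL_INDEX_TB * 1000) * 100

print(f"per day:   {per_day_gb:.1f} GB")
print(f"per month: {per_month_gb:.1f} GB")
print(f"per year:  {per_year_gb:.1f} GB ({yearly_growth_pct:.1f}% of the initial index)")
```

Under these assumptions the hourly additions are small next to the initial load, so the sizing question is dominated by the first 5TB rather than by the ongoing ingest.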

Posthaste answered 12/1, 2012 at 14:34 Comment(1)
A question for you. Assuming you are indexing 5TB of raw data, why do you think it will grow to 7TB? Should I take this to mean that you are storing the full document content in the index as well, as opposed to just storing the search fields? If so, I would suggest only storing what you need for searching in Solr. The raw documents themselves belong elsewhere. – Dawna
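
To illustrate the suggestion in the comment above, here is a minimal sketch of a field definition that is indexed for search but not stored, written as a JSON payload of the kind accepted by Solr's Schema API `add-field` command (available in Solr versions that provide the Schema API). The field names `body` and `doc_path` are hypothetical, and the field types `text_general` and `string` are assumed to exist in the schema, as they do in Solr's default configsets. The sketch only builds and prints the payload; it does not contact a Solr server.

```python
import json

# Hypothetical field names; "text_general" and "string" are assumed to be
# defined in the target schema (they are in Solr's default configsets).
payload = {
    "add-field": [
        {
            # Full text: searchable, but the original content is not kept in the index.
            "name": "body",
            "type": "text_general",
            "indexed": True,
            "stored": False,
        },
        {
            # Small pointer back to where the raw document actually lives.
            "name": "doc_path",
            "type": "string",
            "indexed": False,
            "stored": True,
        },
    ]
}

# This JSON would be POSTed to the collection's /schema endpoint.
schema_json = json.dumps(payload, indent=2)
print(schema_json)
```

With a layout like this, search results return only the pointer, and the raw document is fetched from wherever it is kept outside Solr.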

If your sizes are for text rather than binary files (whose extracted text is usually much smaller), then I don't think you can expect to do this on a single machine.

This sounds a lot like Loggly, and they use SolrCloud to handle this amount of data.

OK, if all of them are rich documents, then the total text size to index will be much smaller (for me it is about 7% of the starting size). Anyway, even with that reduced amount, I think you still have too much data for a single instance.
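
To make the sizing argument concrete, here is a rough sketch that turns the figures mentioned in this thread into a shard count. The 7% text ratio is only the answerer's own experience, the extracted text size is used as a stand-in for index size, and the per-shard target of 100GB is a purely hypothetical value chosen for illustration; a real target has to come from benchmarks on your own hardware and queries.

```python
import math

RAW_TB = 5                # raw data size from the question
ESTIMATED_INDEX_TB = 7    # index size the asker expects if everything is text
TEXT_RATIO = 0.07         # ~7% extractable text (the answerer's experience, not a rule)
TARGET_SHARD_GB = 100     # hypothetical per-shard size; tune from your own benchmarks


def estimate_shards(index_gb: float, target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Return the number of shards needed so that no shard exceeds the target size."""
    return max(1, math.ceil(index_gb / target_shard_gb))


# Case 1: the 5 TB really is text and the index grows to about 7 TB.
all_text_shards = estimate_shards(ESTIMATED_INDEX_TB * 1000)

# Case 2: the 5 TB is rich documents and only ~7% of it is indexable text.
text_gb = RAW_TB * 1000 * TEXT_RATIO
rich_doc_shards = estimate_shards(text_gb)

print(f"all text:       ~{ESTIMATED_INDEX_TB * 1000} GB index, {all_text_shards} shards")
print(f"rich documents: ~{round(text_gb)} GB of text, {rich_doc_shards} shards")
```

The two cases differ by more than an order of magnitude, which is why the answer depends on whether the 5TB is plain text or binary files.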

Iridissa answered 12/1, 2012 at 14:39 Comment(4)
But 50MB per hour does not mean approximately 0.75TB per month; it is about 0.036TB, i.e. roughly 36GB per month. – Posthaste
Sorry, I'm not sure how I got my calculations so wrong. Anyway, the initial data is too large for a single Solr instance, I think... – Iridissa
In your opinion, what is the optimal data size for a single Solr server? – Posthaste
The 5TB, is it just text or binary files (.doc, .pdf...)? – Iridissa
