SolrCloud: does it matter if I have an even or odd number of shards?

I have a few questions about choosing the exact number of shards for a collection and the number of nodes in the cloud:

  • Is there any impact on search or ingestion if I choose an even or odd number of shards?
  • Is there any rule of thumb or guideline for deciding the number of shards and nodes in the cloud?

It would be really helpful if you could suggest how to plan the SolrCloud cluster and the collection (number of shards) for the following requirements:

Data type: structured
Expected data load: 3 TB
Ingestion strategy: 2 million records (INSERT/UPDATE/DELETE requests) every 3 hours
Max size of a record: 100 KB
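A quick back-of-the-envelope check of these requirements helps when sizing the cluster. The sketch below reads the ingestion figure as 2,000,000 records per 3-hour cycle and treats 100 KB as 100 × 1024 bytes (both assumptions); the function names are illustrative only.

```python
# Back-of-the-envelope numbers for the stated ingestion requirement.
# Assumptions: 2,000,000 records per cycle, 3-hour cycle, 100 KB = 100 * 1024 bytes.
RECORDS_PER_CYCLE = 2_000_000
CYCLE_SECONDS = 3 * 60 * 60       # 3 hours
MAX_RECORD_BYTES = 100 * 1024     # upper bound per record


def required_docs_per_second(records: int, seconds: int) -> float:
    """Average update rate needed to finish one cycle on time."""
    return records / seconds


def worst_case_cycle_bytes(records: int, max_bytes: int) -> int:
    """Upper bound on raw payload per cycle if every record hits the max size."""
    return records * max_bytes


if __name__ == "__main__":
    rate = required_docs_per_second(RECORDS_PER_CYCLE, CYCLE_SECONDS)
    raw = worst_case_cycle_bytes(RECORDS_PER_CYCLE, MAX_RECORD_BYTES)
    print(f"~{rate:.0f} docs/s sustained")
    print(f"<= {raw / 1024**3:.1f} GiB of raw input per cycle")
```

In other words, the cluster has to sustain roughly 185 updates per second on average, with up to about 190 GiB of raw input per cycle in the worst case.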

Hardware: I have 5 VMs, each with 4 cores and 24 GB of RAM. CPU architecture:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              0
CPU MHz:               2600.000
BogoMIPS:              5200.00
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-3
Mcclanahan answered 21/1, 2016 at 23:0

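To your first point: having an odd or even number of shards has no impact. However, more shards can increase query time when documents are randomly distributed, because every query has to be sent to all shards and their partial results merged before the response is returned.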

For the second point: shards are like database partitions. Decide on the number of shards based on the data you have and how you want to access it. There is no way to re-shard a collection once it has been created, but you can split a shard if required.
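Splitting a shard is done through the Collections API `SPLITSHARD` action, which by default splits one existing shard into two sub-shards. The sketch below only builds the request URL; the base URL, collection name, and shard name are hypothetical placeholders, and you would send the request with any HTTP client.

```python
from urllib.parse import urlencode


def split_shard_url(base_url: str, collection: str, shard: str) -> str:
    """Build a Collections API SPLITSHARD request URL for one existing shard.

    base_url, collection, and shard are caller-supplied (hypothetical) values.
    """
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    # Strip a trailing slash so we never produce '//admin/...'.
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(split_shard_url("http://localhost:8983/solr", "mycollection", "shard1"))
```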

In general, it is probably best to randomly distribute documents to your shards.
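To see why an odd or even shard count makes no difference when documents are distributed randomly, here is a minimal sketch of hash-based routing: each document id is hashed, and the 32-bit hash space is cut into equal ranges, one per shard. Solr's default compositeId router works on this principle but uses MurmurHash3; this sketch swaps in the first four bytes of SHA-256 purely for illustration, and `shard_for` and `distribution` are hypothetical helpers.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2**32  # size of the 32-bit hash range that is split among shards


def hash32(doc_id: str) -> int:
    """Stand-in 32-bit hash: first 4 bytes of SHA-256 (Solr itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:4], "big")


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map an id to one of num_shards equal, contiguous ranges of the hash space."""
    return hash32(doc_id) * num_shards // HASH_SPACE


def distribution(num_docs: int, num_shards: int) -> Counter:
    """Count how many synthetic ids ('doc-0', 'doc-1', ...) land on each shard."""
    return Counter(shard_for(f"doc-{i}", num_shards) for i in range(num_docs))


if __name__ == "__main__":
    for n in (3, 4):  # one odd and one even shard count
        counts = distribution(100_000, n)
        print(f"{n} shards:", [counts[s] for s in range(n)])
```

With 100,000 synthetic ids, both the 3-shard and the 4-shard runs should put roughly the same share of documents on every shard; the parity of the shard count plays no role.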

As for sizing: since every case is different, it is best to index a sample of 100 or 1,000 documents and check the size of the index, because index size depends on the schema definition. You can then extrapolate from the sample to the data volume you expect. You can check the index size with /solr/admin/cores?action=STATUS&memory=true.
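The extrapolation step is simple arithmetic. This sketch assumes you have measured the raw input size and the resulting index size of your sample; the per-shard size limit is a number you choose for your own hardware, not a Solr setting, and all function names are hypothetical.

```python
import math


def extrapolate_index_bytes(sample_raw_bytes: int, sample_index_bytes: int,
                            total_raw_bytes: float) -> float:
    """Apply the index-size/raw-size ratio measured on a sample to the full load."""
    if sample_raw_bytes <= 0:
        raise ValueError("sample_raw_bytes must be positive")
    return total_raw_bytes * sample_index_bytes / sample_raw_bytes


def min_shards(total_index_bytes: float, max_shard_bytes: float) -> int:
    """Smallest shard count keeping each shard at or below max_shard_bytes,
    assuming documents are spread evenly across shards."""
    return math.ceil(total_index_bytes / max_shard_bytes)


def per_node_bytes(total_index_bytes: float, replicas_per_shard: int,
                   num_nodes: int) -> float:
    """Average index bytes per node when every shard is stored replicas_per_shard times."""
    return total_index_bytes * replicas_per_shard / num_nodes
```

Plugging in your own sample measurements and the 3 TB load, `min_shards` gives a lower bound on the shard count, and `per_node_bytes` shows how replication multiplies disk usage across the 5 VMs.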

I have a 5-VM cluster with 3 shards and 4 replicas per shard. But again, every system is different!
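As a rough check of how 3 shards × 4 replicas fit on 5 VMs, the sketch below places the 12 replicas round-robin so that no VM holds two copies of the same shard. This is a simplified illustration, not Solr's actual replica placement logic, and the node names are hypothetical.

```python
from collections import defaultdict


def round_robin_placement(num_shards: int, replicas_per_shard: int,
                          nodes: list[str]) -> dict[str, list[str]]:
    """Assign replicas to nodes in round-robin order.

    A shard's replicas occupy consecutive slots, so as long as
    replicas_per_shard <= len(nodes) they all land on different nodes.
    """
    if replicas_per_shard > len(nodes):
        raise ValueError("need at least as many nodes as replicas per shard")
    placement: dict[str, list[str]] = defaultdict(list)
    slot = 0
    for s in range(1, num_shards + 1):
        for r in range(1, replicas_per_shard + 1):
            node = nodes[slot % len(nodes)]
            placement[node].append(f"shard{s}_replica{r}")
            slot += 1
    return dict(placement)


if __name__ == "__main__":
    vms = [f"vm{i}" for i in range(1, 6)]  # hypothetical node names
    for vm, replicas in round_robin_placement(3, 4, vms).items():
        print(vm, replicas)
```

For 3 shards and 4 replicas on 5 nodes, two VMs end up with 3 replicas and three VMs with 2, and every document is stored four times, which is worth factoring into the disk estimate.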

Membership answered 5/2, 2016 at 18:11
