What is the maximum number of Apache Nutch crawler instances that can run at the same time with one master node?
Maximum number of Apache Nutch worker instances
It is not clear what you mean by crawler instances. If you want to run the crawl script several times in parallel, e.g. because you have distinct crawls with separate configs, seeds, etc., then those crawls will compete for slots on the Hadoop cluster. The limit then comes down to how many mapper/reducer slots are available on your cluster, which in turn depends on how many slave (worker) nodes it has. There is no fixed maximum tied to the single master node.
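To make the slot arithmetic concrete, here is a rough back-of-the-envelope sketch. The function names and all the numbers are made up for illustration; real capacity depends on how your Hadoop scheduler and per-node resources are configured, so treat this as an estimate, not as something Nutch or Hadoop computes for you.

```python
def total_slots(worker_nodes: int, slots_per_node: int) -> int:
    """Total task slots in the cluster, assuming identical worker nodes."""
    return worker_nodes * slots_per_node


def crawls_without_queueing(worker_nodes: int, slots_per_node: int,
                            tasks_per_crawl: int) -> int:
    """How many crawls can run at once before their tasks start waiting.

    tasks_per_crawl is a hypothetical figure: the number of map (or reduce)
    tasks a single crawl step wants to run concurrently.
    """
    if tasks_per_crawl <= 0:
        raise ValueError("tasks_per_crawl must be positive")
    return total_slots(worker_nodes, slots_per_node) // tasks_per_crawl


# Example with assumed numbers: 4 worker nodes, 8 slots each,
# and each crawl step wanting 10 concurrent tasks.
print(total_slots(4, 8))                  # 32
print(crawls_without_queueing(4, 8, 10))  # 3
```

Any crawl beyond that number still runs, but its tasks wait for free slots, which is where the inefficiency comes from.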
Handling multiple Nutch crawls in parallel can get very tricky and resource-inefficient. Instead, rethink your architecture so that all the logical crawlers run as a single physical one, or have a look at StormCrawler, which should be a better fit for this.
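One possible way to fold several logical crawls into a single physical one is to merge their seed lists into one seed file and tag every URL with the crawl it belongs to. The sketch below assumes the Nutch seed file format of one URL per line, optionally followed by tab-separated key=value metadata. The `crawl_name` key, the function name, and the example URLs are hypothetical, and keeping only the first occurrence of a duplicate URL is just one possible policy.

```python
def merge_seed_lists(seeds_by_crawl: dict[str, list[str]]) -> list[str]:
    """Build seed-file lines for one combined crawl.

    Each URL is tagged with the (hypothetical) metadata key crawl_name so
    the logical crawl it came from can still be told apart later.
    """
    lines = []
    seen = set()
    for crawl_name, urls in seeds_by_crawl.items():
        for url in urls:
            url = url.strip()
            # Skip blank entries and URLs already claimed by an earlier crawl.
            if not url or url in seen:
                continue
            seen.add(url)
            lines.append(f"{url}\tcrawl_name={crawl_name}")
    return lines


# Example with made-up seeds for two logical crawls.
seeds = {
    "news": ["https://example.com/a"],
    "blogs": ["https://example.org/b", "https://example.com/a"],
}
for line in merge_seed_lists(seeds):
    print(line)
```

The resulting lines can be written to a single seed file and injected once, so that only one set of jobs competes for the cluster.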