How to prevent Googlebot from overwhelming site?
Asked Answered
I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server.

Occasionally, Googlebot will stampede us, resulting in Apache maxing out its memory, and causing the server to crash.

How can I avoid this?

Sewerage answered 25/8, 2009 at 13:55 Comment(1)
This might not be Google at all. Identify the IP address(es) of the offending bots and do a reverse lookup. Check whether it resolves to Google's domain. I've seen very aggressive bots that used the Googlebot user-agent. – Blanketyblank
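
For reference, that check can be scripted. Here is a minimal sketch in Python (the sample IP address is purely illustrative): it does a reverse lookup on the client IP, confirms the name belongs to Google, then does a forward lookup to make sure the name maps back to the same IP, which is the verification Google itself recommends.

    import socket

    def is_real_googlebot(ip: str) -> bool:
        """Verify a client claiming to be Googlebot via reverse, then forward, DNS."""
        try:
            host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup: IP -> hostname
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]   # forward lookup: hostname -> IPs
        except socket.gaierror:
            return False
        return ip in forward_ips                             # hostname must map back to the same IP

    # Example call; 66.249.66.1 is only an illustrative address seen in crawler logs.
    print(is_real_googlebot("66.249.66.1"))
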
  • register at Google Webmaster Tools, verify your site, and throttle Googlebot down
  • submit a sitemap
  • read the Google guidelines (in particular, support the If-Modified-Since HTTP header)
  • use robots.txt to restrict the bot's access to parts of the website
  • write a script that changes robots.txt every $[period of time], so the bot can never crawl too many pages at the same time while still being able to crawl all the content overall (see the sketch below)
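
A minimal sketch of that last idea, assuming a Python script run periodically from cron; the section paths and the robots.txt location are made up for illustration. Each period it allows one section of the site and disallows the rest, so the bot only ever sees part of the site at once:

    #!/usr/bin/env python3
    """Rotate robots.txt so only one site section is crawlable per day (illustrative sketch)."""
    from datetime import date

    # Hypothetical site sections; each day, all but one are disallowed.
    SECTIONS = ["/archive/", "/gallery/", "/forum/", "/articles/"]
    ROBOTS_PATH = "/var/www/html/robots.txt"   # adjust to your document root

    allowed = SECTIONS[date.today().toordinal() % len(SECTIONS)]  # pick today's section
    rules = "\n".join(f"Disallow: {s}" for s in SECTIONS if s != allowed)

    with open(ROBOTS_PATH, "w") as f:
        f.write(f"User-agent: *\n{rules}\n")

Run it from cron (for example once a day); over a full cycle every section becomes crawlable, but never all of them at the same time.
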
Symon answered 25/8, 2009 at 14:19 Comment(1)
I added a condition in nginx.conf and also added a robots.txt rule intended for Googlebot: User-agent: AhrefsBot / Disallow: /. But this won't work for Googlebot – only the other bot (AhrefsBot) is getting excluded. – Maggoty
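
For what it's worth, an nginx condition like that usually keys on the User-Agent header. A minimal sketch (the bot name is whatever crawler you want to turn away; the rest of the server block is omitted):

    # inside the relevant server { } block in nginx.conf
    if ($http_user_agent ~* "AhrefsBot") {
        return 403;   # refuse the request outright
    }

Note that robots.txt rules only apply to the user-agent named in the User-agent line, so a block listing AhrefsBot does nothing to Googlebot.
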
You can set how your site is crawled using Google's Webmaster Tools. Specifically, take a look at this page: Changing Google's crawl rate

You can also restrict the pages that Googlebot crawls using a robots.txt file. There is a Crawl-delay directive available, but it appears that Google does not honor it.
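
For example, a robots.txt along these lines (the paths are placeholders) keeps the crawler out of expensive URLs; the Crawl-delay line is respected by some other crawlers, but as noted, Googlebot ignores it and only follows the rate set in Webmaster Tools:

    # robots.txt (example paths are placeholders)
    User-agent: Googlebot
    Disallow: /search/      # keep the bot out of expensive, low-value URLs
    Disallow: /print/

    User-agent: *
    Crawl-delay: 10         # seconds between requests; ignored by Googlebot
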

Outoftheway answered 25/8, 2009 at 14:0 Comment(0)
Register your site with Google Webmaster Tools, which lets you set how often, and at how many requests per second, Googlebot should try to index your site. Google Webmaster Tools can also help you create a robots.txt file to reduce the load on your site.

Gavrila answered 25/8, 2009 at 13:59 Comment(0)
Note that you can set the crawl speed via Google Webmaster Tools (under Site Settings), but Google only honours the setting for six months, so you have to log in every six months to set it again.

Google has since changed this: the setting is now only kept for 90 days (roughly three months, not six).

Primogenitor answered 14/10, 2014 at 20:1 Comment(0)
You can configure the crawl speed in Google's Webmaster Tools.

Rattat answered 25/8, 2009 at 13:58 Comment(0)
To limit the crawl rate:

  • On the Search Console Home page, click the site that you want.

  • Click the gear icon (Settings), then click Site Settings.

  • In the Crawl rate section, select the option you want and then limit the crawl rate as desired.

The new crawl rate will be valid for 90 days.

Brisson answered 29/1, 2019 at 4:34 Comment(0)
