Throttling requests by IP address on Apache? [closed]
H

2

6

I want to throttle requests to my web server to thwart web scraping and denial-of-service attacks against my site. I'm willing to be relatively lax; the key thing is that no single client makes so many requests that it slows things down for everyone else.

I was thinking of setting up throttling by IP address, so that requests from a given IP would be slowed if too many requests were made in a short period of time.

Some questions I have--

  • Is this the right way to go about dealing with web scrapers and DoS attacks at the web server level?
  • What's a good limit so that I don't inconvenience regular users who may be working on shared IP networks?
  • How specifically should I set up the throttling? I'm using Apache/2.2
Horology answered 19/9, 2011 at 6:26 Comment(2)
Have you already thought of using mod_cache? If your content is not too dynamic, mod_cache might help a lot with handling too many requests. As for DoS, mod_throttle might help at the web server level. But watch out: if your pages have lots of objects (one HTML file but 80 images plus some JS and CSS), a single page view produces many requests, so your limit might be too conservative. I can help you out with mod_cache, but I'm not too confident in mod_throttle, mostly because the documentation is sparse. – Maggio
My website scrapes data live from APIs, and I'm worried about being not only copied, but also DoS'ed if someone spiders my site. So yeah my site is pretty dynamic. – Horology
I
4

"Is this the right way ... at the web server level?" It's probably the best option you have. It might be good to have different thresholds on different parts of your site: you may be more willing to throttle certain kinds of traffic than others. But ideally these kinds of settings would be managed at the network level.

"What's a good limit ... ?" It completely depends on your traffic. How much you expect, where your real users come from, etc.

How to do it? It is possible to write rules to handle this sort of thing in ModSecurity, which also defends against some other stuff. As with the mod_evasive answer, this won't fully protect you against attackers with a lot of resources at their disposal, but it would force them to step up their game.
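To make the idea concrete, here is a minimal Python sketch of the per-IP counting that such rules boil down to. It is not ModSecurity syntax and not how any Apache module is actually implemented; the class name `SlidingWindowLimiter`, its parameters, and the example addresses are hypothetical, chosen only for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within any window_seconds span.

    Hypothetical illustration only; not part of Apache or ModSecurity.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # ip -> timestamps of the requests that were allowed
        self._hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True if the request may proceed, False if it should be throttled."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10)
    for t in (0, 1, 2, 3, 10):
        print(t, limiter.allow("203.0.113.5", now=t))
```

Each IP is tracked separately, so one noisy client does not affect others. Users behind a shared IP do share a single budget, which is why the limit has to be chosen from your real traffic.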

I don't think there's anything "built into" Apache httpd that will facilitate this. The expectation would be that issues with an abusive IP address (i.e., network traffic issues) are managed at the network level.

EDIT:

Since you comment elsewhere that you are using Rackspace for hosting, you might want to check out their load balancer API.

Ignacioignacius answered 24/9, 2011 at 17:8 Comment(3)
Really, there's nothing built into Apache for this type of situation? – Horology
Why should that be so surprising? httpd is built to serve HTTP quickly in most of the usual scenarios, and be a platform for development of more advanced features ... such as per-client throttling. As I said, this is something that is usually done at a different layer (IP stack on the box, or at the network gateway). You say you're on Rackspace ... I'll edit my answer to provide a related link. – Ignacioignacius
Thanks for educating me on this (esp. the rackspace link). Answer/bounty awarded! – Horology
L
0

To mitigate DoS attacks and web scraping you can explore mod_evasive, which provides various configuration options for blocking requests. http://www.zdziarski.com/blog/?page_id=442

It can be useful for basic protection. However, it won't be sufficient against a determined and experienced attacker, who can attack from an internal network or use an array of proxy servers to hide their IP.
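As a rough illustration of the general count-then-block approach (not mod_evasive's actual code or configuration), the Python sketch below refuses an IP for a fixed period once it exceeds a hit count within an interval. The class `TemporaryBlocker` and all of its parameter names are hypothetical.

```python
class TemporaryBlocker:
    """Block an IP for block_seconds once it exceeds max_hits within interval_seconds.

    Hypothetical illustration of count-then-block; not mod_evasive itself.
    """

    def __init__(self, max_hits, interval_seconds, block_seconds):
        self.max_hits = max_hits
        self.interval_seconds = interval_seconds
        self.block_seconds = block_seconds
        self._window = {}         # ip -> (interval_start, hit_count)
        self._blocked_until = {}  # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if the request is served, False if the IP is blocked."""
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]  # block has expired

        start, count = self._window.get(ip, (now, 0))
        if now - start >= self.interval_seconds:
            start, count = now, 0  # start a fresh counting interval
        count += 1
        self._window[ip] = (start, count)

        if count > self.max_hits:
            self._blocked_until[ip] = now + self.block_seconds
            return False
        return True


if __name__ == "__main__":
    blocker = TemporaryBlocker(max_hits=2, interval_seconds=5, block_seconds=60)
    for t in (0, 1, 2, 30, 62):
        print(t, blocker.allow("203.0.113.5", now=t))
```

Because the counters are keyed by IP, an attacker who spreads requests across many proxies can keep each address under the threshold, which is the limitation described above.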

Leigha answered 22/9, 2011 at 18:48 Comment(1)
This requires a separate installation, and I'm on shared hosting. What can I use that's built into Apache? – Horology
