Practical non-image based CAPTCHA approaches?


It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!

We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:

http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs

The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!

However, for people with JavaScript disabled, we still need a fallback, and this is where it gets tricky.

I have written a traditional CAPTCHA control for ASP.NET which we can re-use.

[Image: CaptchaImage]

However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.

I've seen things like:

- ASCII text captcha: \/\/(_)\/\/
- math puzzles: what is 7 minus 3 times 2?
- trivia questions: what tastes better, a toad or a popsicle?
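A minimal sketch of the math-puzzle idea as plain text, assuming a Node.js server. The function names makeMathCaptcha and checkMathCaptcha are hypothetical and not part of any existing control. It spells the numbers out as words and adds explicit parentheses, so there is only one reasonable reading of the question:

```javascript
const crypto = require("crypto");

// Hypothetical helper: builds a small arithmetic question rendered as text.
// randInt(max) must return an integer in [0, max); it is injectable for testing.
function makeMathCaptcha(randInt = (max) => crypto.randomInt(max)) {
  const words = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"];
  const a = randInt(10), b = randInt(10), c = randInt(10);
  return {
    // Parentheses remove the "1 or 8?" precedence ambiguity.
    question: `What is (${words[a]} plus ${words[b]}) times ${words[c]}?`,
    answer: (a + b) * c,
  };
}

// Accepts an integer typed as digits, ignoring surrounding whitespace.
function checkMathCaptcha(expected, userInput) {
  const trimmed = String(userInput).trim();
  return /^-?\d+$/.test(trimmed) && Number(trimmed) === expected;
}
```

The expected answer should be kept on the server (for example, in the user's session) rather than in the form; otherwise a bot can simply read it back. As the comments point out, worded arithmetic is easy to solve automatically, so this is a speed bump rather than a strong test.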

Maybe I'm just tilting at windmills here, but I'd like to have a less resource-intensive, non-image-based, <noscript>-compatible CAPTCHA if possible.

Ideas?

Dactylic answered 12/8, 2008 at 4:59

How about subkismet? – Granoff 12/8, 2008 at 5:32

Do you intend to use CAPTCHAs for registering or for every post? Personally, I think anyone who manages to get over a certain threshold of rep should be immune to CAPTCHAs... – Filmer 6/10, 2008 at 11:47

There is no need to actually create an image on the server. You just need to handle the request. For example <img src="generateImage.aspx?guid=blah"> – Coverdale 19/10, 2008 at 4:44

Trivia questions are prone to cultural bias (think of a French guy answering your question...). Furthermore, they can trip up users who aren't native English speakers. Also, they can easily be broken using brute force (you only have ~2^#_OfQuestions options). – Card 26/1, 2009 at 9:29

@jeff - definitely the former - en.wikipedia.org/wiki/Toad_in_the_hole – Spatz 5/2, 2009 at 4:3

Careful: Google Calc can solve worded maths puzzles like that. google.co.uk/… – Directorate 14/3, 2009 at 2:4

Also, what on earth is a popsicle? – Directorate 14/3, 2009 at 2:6

@Directorate I think they mean what I would call an iceblock: en.wikipedia.org/wiki/Popsicle – Gaylordgaylussac 9/7, 2009 at 8:16

I would imagine these kinds of defences will be easily bypassed by scripts that just defer submitting the page for 30 seconds. There's no cost in terms of the bot's throughput; it only has to wait 30 seconds when it first starts loading URLs to spam, before it starts submitting. Whilst this works today, I imagine it will be easily bypassable. – Clupeid 27/8, 2009 at 12:16

So... what is 7 minus 3 times 2: 1 or 8? – Vories 9/1, 2010 at 3:44

According to Wolfram Alpha, "what is 7 minus 3 times 2" is 1. I thought it was 8. I think you just invented the anti-captcha. – Salon 14/1, 2010 at 22:55

It's one. Remember PEMDAS? OOP here people, OOP. Could also read like this... 7-(3*2), giving a value of 13. Definitely not the way to go... – Leesaleese 28/1, 2010 at 0:14

@Mike Robinson: I think programmers should know about operator precedence in NORMAL day use =) – Heater 10/2, 2010 at 10:4

As anyone who studied basic arithmetic knows, multiplication is higher than subtraction in the order of operations, so 7 - 3 * 2 = 1; with or without parentheses. Are you smarter than a fifth grader? :) – Quincy 4/3, 2010 at 1:28

@mwarenr: Sure, but when written with words, "7 minus 3 times 2", my first instinct would be to calculate it sequentially. – Lucillelucina 30/3, 2010 at 11:11

I think @Sosh is correct. Algebraic order probably only applies to equations in algebraic notation. – Diegodiehard 16/4, 2010 at 23:27

IMHO "7 minus 3, times 2" = 8. Without the comma, it's hard to guess the intent without hearing it spoken. – Chide 19/4, 2010 at 16:30

@Brian To send the image, you have to create it first. Yes, you can keep it in memory and throw it away after sending it... but this would kill the performance. – Grisly 21/4, 2010 at 10:17

What I want to see is the CAPTCHA equivalent of those old 90s computer game copy protection schemes. "Go to google, search for 'foobar', what is the first word of the third result listing?" :) – Khudari 21/4, 2010 at 10:35

7 minus 3 times 2 IS 1. You are supposed to do times and divide first. So (7-3*2) is calculated as (7-(3*2)). Remember your Algebra. – Endblown 5/5, 2010 at 15:59
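For reference, here is how the two readings debated in these comments evaluate in JavaScript, where `*` binds tighter than `-`. The ambiguity is a good argument for writing explicit parentheses (or a comma) into any worded math CAPTCHA; the function names are just labels for illustration:

```javascript
// Standard operator precedence: multiplication first, so 7 - 6.
function standardPrecedence() {
  return 7 - 3 * 2;
}

// Reading the words strictly left to right: (7 - 3) first, then * 2.
function leftToRight() {
  return (7 - 3) * 2;
}

console.log(standardPrecedence()); // 1
console.log(leftToRight());        // 8
```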

result = eval("7 minus 3 times 2".toLowerCase().replace("minus", "-").replace("times", "*")). Don't get me wrong, but how is that supposed to stop a bot? – Hashim 7/8, 2010 at 8:1

I vote for the trivia captchas. The main problem people seem to have with them is that they're hard to solve for someone who doesn't speak English. But I have to say: "Isn't that a good thing?". Do we want people who don't understand English posting questions on StackOverflow? – Innovation 25/10, 2010 at 8:30

I would argue that making this <noscript> compatible is a pointless requirement. The amount of people disabling javascript is negligible and those that do know how to enable it for the sites that need it. – Transistorize 26/10, 2010 at 22:3

@Jeff: Curious, why are you implementing this in the first place? It seems there's already an active community that quickly squashes spam, a distributed NI (Natural Intelligence) solution. – Phemia 15/11, 2011 at 19:41


A method that I have developed, and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value, e.g.:

<input type="hidden" id="antiSpam" name="antispam" value="lalalala" />

I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:

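// Once a second, replace the bogus value with the number of seconds the page has been open.
// The field is looked up by id, so the hidden input needs id="antiSpam".
var antiSpam = function() {
        var a = document.getElementById("antiSpam");
        if (a) {
                if (isNaN(parseInt(a.value, 10))) {
                        a.value = 0;    // first tick: overwrite "lalalala"
                } else {
                        a.value = parseInt(a.value, 10) + 1;
                }
        }
        setTimeout(antiSpam, 1000);    // pass the function itself, not a string to eval
};

antiSpam();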

Then, when the form is submitted, if the antispam value is still "lalalala", I mark it as spam. If the value is an integer, I check whether it has reached something like 10 (seconds). If it's below 10, I mark it as spam; if it's 10 or more, I let it through.

If AntiSpam is an Integer
    If AntiSpam >= 10
        Comment = Approved
    Else
        Comment = Spam
Else
    Comment = Spam
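The pseudocode above could look like this on the server. This is a sketch assuming a Node.js back end (the answer doesn't say what its server runs); classifyComment is a hypothetical name, and it receives the raw string posted in the antispam field:

```javascript
// Mirrors the pseudocode: non-integer -> spam, integer below the threshold -> spam.
// minSeconds defaults to 10, the threshold the answer uses.
function classifyComment(antispamValue, minSeconds = 10) {
  const value = String(antispamValue ?? "").trim();
  // The script overwrites "lalalala" with 0, 1, 2, ...; anything that is not
  // a plain non-negative integer means the script never ran.
  if (!/^\d+$/.test(value)) {
    return "spam";
  }
  return Number(value) >= minSeconds ? "approved" : "spam";
}
```

For example, classifyComment("lalalala") and classifyComment("3") both return "spam", while classifyComment("42") returns "approved".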

The theory is that:

- A spam bot will not support JavaScript and will submit what it sees.
- If the bot does support JavaScript, it will submit the form instantly.
- The commenter has at least read some of the page before posting.

The downside to this method is that it requires JavaScript: if you don't have JavaScript enabled, your comment will be marked as spam. However, I do review comments marked as spam, so this is not a problem.

Response to comments

@MrAnalogy: The server-side approach sounds like quite a good idea and is exactly the same as doing it in JavaScript. Good call.

@AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.

Mastersinger answered 12/8, 2008 at 4:59

VERSION THAT WORKS WITHOUT JAVASCRIPT: How about doing this with ASP, etc., recording a timestamp when the form page is loaded and comparing it to the time when the form is submitted? If ElapsedTime < 10 sec, then it's likely spam. – Ankerite 9/9, 2008 at 16:48

Very obviously bypassable, if a malicious user bothers to look at it. While I'm sure you're aware of this, I guess you're assuming that they won't bother... Well, if it's not a site of any value, then you're right and they won't bother, but if it is, then they will, and they'll get around it easily enough... – Blakeney 20/9, 2008 at 17:44

The spammer could use some very old page load too. – Arcograph 7/2, 2009 at 16:59

Here's a twist on this that I use. Make the hidden value an encrypted time set to now. Upon post back, verify that between 10 seconds and 10 minutes has elapsed. This foils tricksters who would try to plug in some always-valid value. – Swipple 7/2, 2009 at 22:41
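A sketch of that twist, assuming a Node.js server and Node's built-in crypto module. Instead of encrypting the time, it signs it with an HMAC (a swapped-in technique: the timestamp stays readable but cannot be altered without the secret). SECRET, issueToken and verifyToken are hypothetical names; the 10-second and 10-minute limits come from the comment above:

```javascript
const crypto = require("crypto");

// Hypothetical secret; in practice load a long random value from configuration.
const SECRET = "replace-with-a-long-random-secret";

function sign(payload) {
  return crypto.createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Value for the hidden field, generated when the form page is rendered.
function issueToken(nowMs = Date.now()) {
  const issued = String(nowMs);
  return `${issued}.${sign(issued)}`;
}

// On post back: reject malformed or tampered tokens, then accept only if
// between minMs and maxMs have elapsed since the form was rendered.
function verifyToken(token, nowMs = Date.now(), minMs = 10_000, maxMs = 600_000) {
  const [issued, mac] = String(token).split(".");
  if (!issued || !mac || !/^\d+$/.test(issued)) return false;
  const expected = Buffer.from(sign(issued), "hex");
  const actual = Buffer.from(mac, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return false;
  }
  const elapsed = nowMs - Number(issued);
  return elapsed >= minMs && elapsed <= maxMs;
}
```

A bot that fetches the form and simply waits before posting still gets through, as the next comment points out; the signature only stops it from supplying its own timestamp.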

@GateKiller: Iny is saying that a spam bot could delay their response. I.e. cache the page for a few seconds and then submit the postback later (meanwhile trawling other sites and caching those pages). – Rolandrolanda 5/3, 2009 at 5:31

To all who have pointed out that bots could get past... This I know as I pointed out in the answer. It's a very simple method to stop your average bot and bored users. I am currently using it on my blog and so far, it has been 100% successful. – Mastersinger 5/3, 2009 at 9:21

I think it's better to start with easy-to-bypass tests to see if they are adequate. – Flyman 6/7, 2009 at 14:7

An approach such as this is easily circumvented by a custom bot that understands how to correctly submit the mangled form. Stack Overflow receives enough traffic that this would be worthwhile for a spammer to write. – Aggrade 9/7, 2009 at 6:14

I would just like to point out that if I were to write a spambot, I wouldn't be loading the entry-form, I would be submitting to the POST submission page. – Hintze 14/7, 2010 at 13:53

@user257493: You're right, but this type of CAPTCHA is only designed to stop casual bots, not focused attacks. – Mastersinger 16/7, 2010 at 12:5

Take a look at Watin, you could write a C# unit test to work your way around it very easily with that – Redheaded 6/11, 2010 at 18:33

-1: Question specifically asked for a fallback for when JavaScript is disabled. – Anastomose 16/9, 2011 at 16:10

I have filled out forms and pressed "Submit", only to find I had typed mismatching email addresses or something similar, and the new page had removed my email address (and sometimes all the information) from the form. I then pressed the back button to go back to just before I pressed Submit, with all my information still in, corrected my mistake (usually a missing dot in .co.uk) and submitted again. So AntiSpam would be very low and I would be marked as a bot. – Annunciata 5/12, 2012 at 13:6


My favourite CAPTCHA ever:

[Image: Captcha]

Heeley answered 12/8, 2008 at 4:59

That one is great. The link to the site is random.irb.hr/signup.php. Sometimes it's a lot easier – Mohammadmohammed 11/9, 2008 at 8:45

Only problem is that it is really hard for the majority of humans, but computers will usually have no problem with this. – Turman 22/12, 2008 at 12:14

I believe the answer to that problem is -3? – Jewess 22/12, 2008 at 20:21

@Erik, not really. It also keeps those who have PhDs in computer science but don't want to bother out. – Decillion 23/12, 2008 at 3:10

-3 seems correct. I remember using this website for research a while ago and when I got to the Captcha I was so happy because it was fun and different. It is for access to a quantum random number generator using an actual radioactive decaying source. – Primitivism 27/12, 2008 at 22:20

This is from that site that provides quantum random numbers, isn't it? I nearly shat a brick when I got one of these questions when I tried to register. – Janelljanella 4/5, 2009 at 2:35

The whole point is that if you can't solve it, refresh it. Computers probably won't refresh. ;) – Radu 4/5, 2009 at 3:7

The other problem with this is that a determined attacker could just parse this and plug it into Mathematica, surely? wolfram.com/products/mathematica/index.html – Winded 9/7, 2009 at 7:23

@therefromhere - I don't think this sort of CAPTCHA is prevalent enough to dedicate that much effort to OCR, buying Mathematica, etc. :-) – Heeley 9/7, 2009 at 12:43

I would like to see more sites with captchas like that, it would prevent not only automatic spam but also silly people from posting xD – Satin 21/4, 2010 at 10:15

I like this. It not only filters out bots but also people unlikely to make good use of the service – Leeth 26/5, 2010 at 22:1

YAY. I reloaded 4 times and got one that does not involve integrals! :-D ("Note: If you do not know the answer to this question, reload the page and you'll (probably) get another, easier, question.") – Hluchy 20/12, 2010 at 15:23
