Has reCaptcha been cracked / hacked / OCR'd / defeated / broken? [closed]

Have any programming methods been used to defeat reCAPTCHA?

I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods.

To clarify, I'm not looking for reCAPTCHA-cheating solutions that involve humans in any way, whether teams tasked with filling out CAPTCHAs, porn-seekers, or Mechanical Turk.

I'm also not looking for alternatives to reCAPTCHA, like picking the type of animal, background fields, or JavaScript trickery.

Ipoh answered 15/1, 2009 at 23:32 Comment(15)
Norman Ramsey... How do you close a topic about CAPTCHA and say it's not programming related? I will never understand.Hild
This is odd to me as well, given the recent programming work to develop client-side javascript-based OCR. I guess I have to hope some others might click 'reopen.'Ipoh
+1 to reopen the question; it is programming related, very much so.Schwerin
the amount of misinformation in these answers is ASTONISHING. If ReCaptcha has been "broken", then someone better tell Facebook, Craigslist, and TicketMaster, stat! :pIntromission
Jeff, they HAVE been told, and the only misinformation is referring to CAPTCHA as a valid security mechanism. It has been empirically broken, both in common implementations AND in theory (not just reCAPTCHA, but the very concept of CAPTCHA). On the other hand, it's not COMPLETELY valueless. I've actually referred to this very site as a valid use-case for CAPTCHA: in addition to the many other mechanisms, it can work together with them to cost the "attackers" just a little bit more.Gautier
Another viewpoint... ha.ckers.org/blog/20090420/google-whats-up-captchaGautier
I'm disappointed that the subject doesn't have pwned in itOllieollis
Some more research on the topic: schneier.com/blog/archives/2010/10/analyzing_captc.html. Actually I found the comments more interesting than the post or research itself...Gautier
Oo! Best CAPTCHA ever! xkcd.com/810Gautier
Disagree with @Flexo that this question "only affects a small geographic area". CAPTCHA is used on the Internet which is worldwide. I think this moderator doesn't understand the internet, or doesn't understand how to moderate - he seems to just close questions for random reasons even if there's nothing wrong with them.Devlin
@Devlin the pertinent part of that close message, in my view, is "a specific moment in time". Was reCAPTCHA broken yesterday, when the question was asked? Maybe, maybe not, if Captcha Killer is down. Is it broken today? Maybe, maybe not; it totally depends on the flip-flopping of the continual game of whack-a-mole that's being played out. That's why I closed this question: it's temporal and not particularly well suited to the Q&A format.Pneumato
@Pneumato The moment of time is irrelevant. He's asking if it has been hacked. Once it's been hacked then the answer for-evermore is "yes". You cannot go back in time and undo the hacking and knowing if it was ever hacked/cracked/OCRd is very useful to determine why it appears to have been bypassed on your own system.Devlin
@Pneumato Just to chime in here several years later, the point of my asking is that if it has ever been hacked then it is not a suitable security measure. Whether it is, at this moment, hacked isn't particularly relevant. (If a bank is regularly robbed, it's not secure, even if it isn't currently being robbed.)Ipoh
Whether a particular remote CAPTCHA implementation has been hacked is difficult to answer, since if it has, the implementation can be updated, and then the answer is negative again. The answer to this question is therefore: yes, no, yes, no, etc. I believe the current answer is "no", but this is subject to change.Goring
"This question is unlikely to help any future visitors; it is only relevant to a small geographic area..." blah blah. And the question has 152 upvotes and 63 favorites. It should be reopened.Furan

I notice that almost all the answers here relate to the ineffectiveness of the concept of CAPTCHA in principle. I very much agree with them (in fact, I gave a talk at OWASP a few months ago explaining just that), but the question is very specific, so I will provide a demonstration.
But first, demonstration aside, I will reiterate: re-read the other answers, since it's true that CAPTCHA is pointless and unhelpful, regardless of implementation.

But really, check out CAPTCHA Killer. You can upload a CAPTCHA image, and it will automatically, if not immediately, provide the OCR'd answer. It also provides an API (REST, I think, but maybe also SOAP). I personally tried numerous reCAPTCHA images, and they were actually among the easiest (or at least the quickest) to break.

UPDATE: CAPTCHA Killer's website is now taken down, apparently under legal pressure. See http://captcha.org/ for a complete overview of the topic.

And yeah, OCR is not the best way to break a CAPTCHA protected site - there are many other better ways.

Gautier answered 14/2, 2009 at 21:43 Comment(9)
I wonder how captcha killer works. Somehow it looks to me like it's using cheap labour and making money with the advertisement on the website. (And merchandising.)Crashland
I'm pretty sure it's OCR, but I could be wrong.Gautier
Useful answer about captchas in general, but the question was about reCAPTCHA specifically.Laband
Just tried Captcha Killer with three reCAPTCHAs. All three expired without returning an answer.Subtilize
@Mike, reCAPTCHA is not necessarily MORE broken than CAPTCHAs in general, but of course all of that applies to it too... Also, as I mentioned, reCAPTCHA images were the quickest to break. @Ifaraone, I find that odd; it's worked fine for me before, and as I've said, reCAPTCHA images specifically were the quickest to break... Though I haven't done it in quite a while, I'm going to check it out again.Gautier
+1 @AviD, thanks. I am overwhelmed reading advice from spam victims teaching others ("how to become spam victims"). Can you give me a link to the hotCaptcha you mentioned in #449463? I could not find any hotCaptcha by search. HotOrNot.com uses reCaptcha. Can you give a link? BTW, read my #8972.Drummer
@vgv8, I don't think hotCaptcha is still around, it was pretty much for entertainment value only. I don't think it was even ever operational, it was mainly a demo to show another possible form of captcha...Gautier
CAPTCHA Killer seems to have be killed: it has been violently destroyed by multinational corporations seeking to spread their overlord dominion and eliminate the freedom of creative expression! Such a beautiful killer, such an early death!Chain
I think it's just a change of domain, and it has become a paid version now; check this: bypasscaptcha.com/captchakiller.phpSalad

You might be interested in this detailed report on how 4chan defeated reCAPTCHA, and used it to manipulate Time.com's annual TIME 100 Poll results.

Hacking Recaptcha (aka ‘The Penis Flood’)

The next tactic used was to see if they could find a flaw in the reCAPTCHA implementation. One thing they discovered about reCAPTCHA was that it always presents two words to a user for decoding: one word is a control word known by the reCAPTCHA system, while the other is an unknown word (reCAPTCHA uses the humans to help correct OCR errors). Wikipedia describes the process: “Scanned text is subjected to analysis by two different optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. The word is displayed along with a control word already known and is labeled by the human. Those words that are consistently given a single label by human judges are recycled as control words”. What Anonymous realized was that if they always labeled the unknown scanned text with the same word, and if they did this thousands and thousands of times, eventually a large percentage of the unknown words would be mislabeled with their word. All they had to do was look at the two words in the captcha, enter the proper label for the ‘easy’ one (presumably that would be the one that the two optical scanners would agree upon) and enter the word “penis” for the hard one. If they did this often enough, then soon a significant percentage of the images would be labeled as ‘penis’ and the ability to autovote would be restored. (One side effect, not lost on Anonymous, was the notion that for years to come there would be a number of digital books with the word ‘penis’ randomly inserted throughout the text.)

Update: I asked Ben Maurer, chief engineer of reCAPTCHA, about this ‘penis flood’ attack. Ben says that they’ve anticipated this type of attack and they have numerous protections that will keep the penises from penetrating the reCAPTCHA barrier.

Optimizing reCAPTCHA

As appealing as the notion of sprinkling the word ‘penis’ into texts was, the Anonymous team knew that the clock was ticking, and if they were going to restore the Message they didn’t have time to wait for the autovoters to come back online; they were going to have to vote manually, many, many times. And so they needed to be able to enter captchas as fast as they could. They developed a set of guidelines that allowed them to quickly decide which reCAPTCHA words they could skip. For example:

You will be given 2 words: 1 real, 1 fake.

For [REAL FAKE] or [FAKE REAL], you can just type in REAL and it should be accepted.

If it’s [LOOKSREAL LOOKSREAL] or [LOOKSFAKE LOOKSFAKE], it’s usually quicker to just type in both words. Don’t waste precious time deciding which one of them is real.

Use both the appearance and the type of word to identify a fake word. Don’t rely on just one of them.

The whole ruleset is here: fake captcha.
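The control-word/unknown-word scheme in the quoted report can be modelled in a few lines. This is a hypothetical sketch, not reCAPTCHA's code: the names `grade`, `control_words`, `PROMOTE_AFTER` and `AGREEMENT` are invented for illustration, and the promotion rule is an assumption based on the Wikipedia description ("consistently given a single label by human judges"). Only the control word is graded; a passing user's reading of the unknown word is merely recorded as a vote.

```python
from collections import Counter, defaultdict

# Hypothetical promotion rule: an unknown word becomes a control word once
# at least PROMOTE_AFTER answers are recorded and AGREEMENT of them match.
PROMOTE_AFTER = 5
AGREEMENT = 0.8

control_words: dict[str, str] = {}            # image id -> known label (lowercase)
votes: dict[str, Counter] = defaultdict(Counter)  # image id -> answer tallies


def grade(control_id: str, control_answer: str,
          unknown_id: str, unknown_answer: str) -> bool:
    """Grade a two-word challenge. Only the control word is checked."""
    expected = control_words.get(control_id)
    if expected is None or expected != control_answer.strip().lower():
        return False  # failed the control word: reject and record nothing
    # The user passed, so their reading of the unknown word counts as a vote.
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    _maybe_promote(unknown_id)
    return True


def _maybe_promote(image_id: str) -> None:
    """Recycle an unknown word as a control word once answers agree."""
    tally = votes[image_id]
    total = sum(tally.values())
    label, top = tally.most_common(1)[0]
    if total >= PROMOTE_AFTER and top / total >= AGREEMENT:
        control_words[image_id] = label
```

With this naive rule, five coordinated answers of "penis" are indistinguishable from five honest ones, which is exactly the weakness the flood targeted. The protections Ben Maurer mentions are not described in the source, so the sketch does not attempt to model them.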

Deanadeanda answered 29/4, 2009 at 9:17 Comment(4)
But is not the point of that story that they did not break reCAPTCHA? They instead succeeded by streamlining the manual voting process to allow determined volunteers to vote thousands of times each.Gratulation
@pdc, just because they didn't OCR the images (though this could also have been done) doesn't mean they didn't break reCAPTCHA. Think about it like this: is the purpose of reCAPTCHA to present undecipherable images? Or is it to prevent automated flooding? If it's the first, you might be able to argue that it was not broken (arguable, but I would not agree with you), but if it's the second, then you have empirical proof that reCAPTCHA does not work. I also think it should be quite clear that, aside from entertainment value, the SECOND purpose is the real one, and the only one that counts.Gautier
@Gautier Huh? According to the article, automated flooding was no longer possible. Rather, dedicated people were able to vote several times faster than they otherwise could (and various non-captcha-related techniques were used to thwart ineffective measures against such heavy voting by humans). Basically equivalent to using cheap human labor - which reCAPTCHA of course doesn't claim to stop.Tedmann
@Tedmann that's exactly the issue, reCAPTCHA doesn't try to stop the real problem. CAPTCHA tries to solve the wrong problem, badly.Gautier

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

Not much you can do about that really.

It's also far cheaper than trying to do image recognition, OCR, etc. on the actual image (with human solvers you may get a response for under $0.01).

Annabal answered 15/1, 2009 at 23:34 Comment(13)
Or even better, they grab the captcha off your site, and show it to some wanker (literally) as a requirement to showing them some porn.Willis
Man... that's clever (credit where credit is due).Annabal
not only that but Amazon Web Services can enable such things. aws.amazon.com/mturkVacuous
Note that this doesn't make it an ineffective tool. It merely means that if your site is popular enough then this might happen. For the other 99.99% of the websites in the world, a simple captcha will do.Pentacle
Hell, CodingHorror's captcha doesn't even change, nor is it obfuscated, and it manages to do the job all right!Pentacle
@Robert, I wonder how that word was chosen.Marcin
Servers can and will ban your IP after too many account registrations. So a good sized and/or growing botnet is needed as well.Niven
@Paul: Spam and the like is pure evil, but that solution is so remarkably cool...if only they could be turned to the power of good!Basilius
@Paul, that is hilariously brilliant.Wilmoth
Actually, that's not entirely true. Although there are examples of this, it is FAR cheaper to OCR-crack a CAPTCHA. Using sweat shops are usually NOT economically feasible for the spammers.Codger
+10 for Robert P's comment if I could. It's a very, very practical observation (and useful to many webmasters prematurely losing sleep over their CAPTCHAs being broken).Odelle
Sources? This sounds like an urban (read: internet) legend.Hullda
Death By Captcha is already doing that.Span

Before giving in to the pressure of using CAPTCHA, consider creative workarounds such as having a field labeled "Your Comments" that is hidden by CSS. If the field is filled in, the server drops the request. Most bots will fall for it. There is still no good way to defeat a room full of underpaid laborers, but CAPTCHA does not help with that anyway.
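The hidden-field trap described above needs only a server-side check. A minimal sketch, assuming the form posts a field named `your_comments` (a hypothetical name) that is wrapped in a CSS-hidden element so humans never fill it in:

```python
# Hypothetical field name; the form renders it inside an element hidden
# with CSS, e.g. <div style="display:none"><input name="your_comments"></div>.
HONEYPOT_FIELD = "your_comments"


def is_probable_bot(form: dict[str, str]) -> bool:
    """Return True if the CSS-hidden honeypot field was filled in.

    Humans never see the field, so any non-blank value suggests a bot
    that blindly fills every input it finds.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def handle_comment(form: dict[str, str]) -> str:
    if is_probable_bot(form):
        return "dropped"   # silently discard; don't tell the bot why
    return "accepted"      # continue with normal processing
```

As the comments below point out, a script written specifically for your site will simply leave the field empty, so this only filters generic bots.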

UPDATE: Just read a case study where removing CAPTCHA increased conversion rates by almost 10%. That would indicate to me that it is rather broken if you are losing 10% of your leads just to filter out bots. Imagine what 10% means to most businesses.
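A note on the arithmetic: if removing the CAPTCHA raised conversions by 10%, then with the CAPTCHA in place you were getting 1/1.1 of the conversions, i.e. losing about 9.1%, which is consistent with "almost 10%". A quick check (the function name is illustrative):

```python
def lost_share(increase_after_removal: float) -> float:
    """Fraction of conversions a CAPTCHA was costing, given the relative
    increase observed after removing it (0.10 means +10%)."""
    with_captcha = 1.0                                   # baseline rate with CAPTCHA
    without_captcha = with_captcha * (1 + increase_after_removal)
    return 1 - with_captcha / without_captcha
```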

Kegler answered 15/1, 2009 at 23:53 Comment(6)
This is very smart but doesn't work if you're sufficiently popular. Yahoo or Google, for example, could never use this.Cherokee
The question here is whether your site is valuable enough to attack specifically. Most aren't, and having little idiosyncrasies will do some good.Spleen
I would +1 for the update re 10% loss - VERY important point. (but I can't +1 cuz of the hidden field suggestion - this is less than useless.)Gautier
There are 2 problems "targeted attack" and "random spam". Your solution might save your ass for random spam, a targeted attack will flood your system within a day though.Halfon
@webdtc, it's as Slough said. Useless because it's absolutely trivial for a script to get around this, and less than useless because of the false sense of security.Gautier
@dreeves: didn't Google just acquire reCAPTCHA?Andy

My favorite captcha is from Microsoft: http://research.microsoft.com/en-us/um/redmond/projects/asirra/

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but our user studies have shown that people can accomplish it quickly and accurately. Many even think it's fun!

It is a free service and they have example code to get you started.

I wonder how long it will be before it is cracked.

Marcin answered 15/1, 2009 at 23:40 Comment(7)
Unfortunately cletus's answer above shows how such a service will be ineffectual in the greater fight against spam.Salvador
i failed that one 2 out of 4 times, a badly lit picture of a Pomeranian can look like a cat :(Unconstitutional
I took the test and it feels good to know that I am a human. :)Marcin
Actually, the best captcha used to be HotCaptcha, but it was offline the last time I checked. Based on HotOrNot.com, it wasn't horribly effective, but it was VERY popular with the users :-)Gautier
The issue here is that it would be very easy to brute force due to a small key space. If you start adding more objects to name, you get into ambiguity in naming (for example, is it a Kangaroo, a Joey, or a Baby Kangaroo?). You would need to make sure you had a one-to-many relation between objects to be named and their possible names.Develop
Argh... I hope people don't start using it. It's so tedious for the user.Odelle
Doesn't numenta.com/legacysoftware.php (Numenta Vision Toolkit) specifically do dog vs. cat for free as of, like, 1995?Expectant
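The brute-force concern about a small key space can be put in numbers. Assuming a challenge shows a fixed set of photos (12 below, an assumption) and passes only if every one is labelled correctly, and that each photo is judged independently, the pass rate is the per-photo accuracy raised to the number of photos:

```python
def guess_success(images: int, per_image_accuracy: float = 0.5) -> float:
    """Probability of labelling every photo in one challenge correctly,
    assuming each cat/dog photo is classified independently."""
    return per_image_accuracy ** images
```

Random guessing (`guess_success(12)`) passes about once in 4,096 tries, while a classifier that is right 80% of the time per photo (`guess_success(12, 0.8)`) passes roughly 7% of challenges.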

reCAPTCHA isn't broken, and it won't be for a very long time. The thing is, if you implement your own CAPTCHA and it gets broken, it will probably take you a long time to fix it.

This is taken from the page about reCAPTCHA security:

reCAPTCHA is a Web service. That means that all the images are generated and graded by our servers. (…) this also provides an extra level of protection: our CAPTCHAs can be automatically updated whenever a security vulnerability is found.

For example, if somebody writes a program that can read our distorted images, we can add more distortions in very little time, and without Web masters having to change anything on their side.

I believe that, since they specialize in CAPTCHAs, they have improved versions ready to be deployed at short notice if needed. (Why should they deploy stronger security when the weaker version isn't broken yet?)
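The quoted design means the site never grades the answer itself: it forwards the user's response to reCAPTCHA's servers and trusts the verdict, which is why the images can change without webmasters touching their code. The sketch below uses today's `siteverify` endpoint (documented for reCAPTCHA v2/v3), not the 2009-era API this answer was written against; the secret and token values are placeholders.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Current verification endpoint; the 2009-era API used a different URL and fields.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> Request:
    """Build the POST the site sends to reCAPTCHA to grade one answer."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    return Request(VERIFY_URL, data=urlencode(fields).encode(), method="POST")


def is_human(body: bytes) -> bool:
    """Interpret the JSON verdict; anything but success=true is a failure."""
    return json.loads(body).get("success") is True


def verify(secret: str, token: str, remote_ip: Optional[str] = None) -> bool:
    """Network call: the grading itself happens on Google's servers."""
    with urlopen(build_verify_request(secret, token, remote_ip), timeout=10) as resp:
        return is_human(resp.read())
```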

Mordancy answered 19/2, 2009 at 23:55 Comment(0)

Not only has it been defeated, but a useful application has also been successfully built on top of it, becoming the most amazing tool for defeating all kinds of free-account protections on a big list of direct download sites (not only Megaupload and Rapidshare).

Jdownloader is open source and written in Java, so a peek at the source code can answer not only whether it is broken but also how.

Edit: Most direct download sites do not use reCaptcha, but a simpler Captcha method (3 capital letters in different colors). Nonetheless, Jdownloader and Cryptload (a program similar to Jdownloader) are the only working implementations I know of that have effectively broken a Captcha method. I have not heard of any implementation that cracks reCaptcha.

Update: It seems that at least one implementation of reCaptcha (not whole reCaptcha itself) has been cracked too.

Update Dec 2010: Jdownloader seems at last to be defeating reCaptcha. The plugin is still experimental and works only on Windows versions of Jdownloader, but, as I have been told by a mate who tried it, it does work.

Speedboat answered 16/2, 2009 at 7:58 Comment(2)
Do you know which of those file hosters use reCAPTCHA? Rapidshare and Megaupload don't.Halfon
@dr.evil it covered almost every hoster you could name; the list contained many we might never have heard of. The program was smart enough to break most CAPTCHAs, and when it couldn't, it prompted the user to solve them. Isn't that useful? I have used it personally in the past. It was one of the best downloaders, in some cases better than IDM. Please note: I am not a promoter of jDownloader. Thank you.Salad

There was a talk at Defcon last year that went into the problems with CAPTCHAs in general. One of the things they did was use multiple free OCR engines and have them vote on the best words. Doing this, they were able to achieve a somewhat decent chance of success. For one kind of CAPTCHA it was 40% or so; I don't think that one was reCaptcha, though.
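Combining engines by voting can be sketched as a plain majority vote over their outputs. This assumes each OCR engine has already returned a string for the same word image; the engines themselves, and whatever the Defcon speakers actually used, are out of scope:

```python
from collections import Counter


def vote(readings: list[str]) -> tuple[str, float]:
    """Pick the reading most engines agree on.

    `readings` holds one string per OCR engine for the same word image.
    Returns the winning string and the fraction of engines that produced it
    (blank outputs count as dissenting votes).
    """
    normalized = [r.strip().lower() for r in readings if r.strip()]
    if not normalized:
        return "", 0.0
    word, count = Counter(normalized).most_common(1)[0]
    return word, count / len(readings)
```

The agreement fraction can serve as a confidence score: a bot can skip low-agreement images and request a new challenge instead.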

Mycah answered 15/1, 2009 at 23:59 Comment(1)
That's an important point: a spam bot doesn't have to break all CAPTCHAs. 1% would do if it can keep trying.Vitiligo
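The comment's point follows from basic probability: with a per-attempt success rate p, the chance of at least one success in n independent attempts is 1 - (1 - p)^n. A short sketch (function names are illustrative):

```python
import math


def success_after(p: float, n: int) -> float:
    """Chance of at least one success in n independent attempts at rate p."""
    return 1 - (1 - p) ** n


def attempts_needed(p: float, target: float) -> int:
    """Smallest n with success_after(p, n) >= target, for 0 < p, target < 1."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))
```

At p = 0.01, 459 attempts give a 99% chance of at least one success, so rate limiting and IP bans matter more than the raw solve rate.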
  • "In fact, it [reCAPTCHA] became pretty useless on 4 January [2011] when spammers apparently got their collective hands on a piece of software that circumvents reCAPTCHA and allows for a fully automated registration process. The bots have been busy, very busy indeed, ever since" [ 1 ]

2-3 years ago, text-typing CAPTCHAs crossed the line where they lost the battle: further complications just make them relatively easier for machines (since computing power keeps increasing while human ability doesn't) and more repugnant and repelling, if not completely impossible, for humans. This contradicts the original paradigm of CAPTCHA as a test to ensure that the response is not generated by a computer.

Update:
Note that reCAPTCHA is owned by Google Inc., but Google does not use it for its own services.
Here is a link to a webpage with the CAPTCHA Google uses internally, e.g. for Gmail registration:

Note that Google's reCAPTCHA always has 2 words.
Here is the link to an image of the reCAPTCHA that Google offers for use by others.

And reCAPTCHA's screenshot:

I leave the obvious conclusions to the reader.

Cited: [ 1 ]
vBulletin forums hit by reCAPTCHA cracking spam bot | PC Pro blog
Posted on January 12th, 2011 by Davey Winder

Siloum answered 13/1, 2011 at 10:32 Comment(0)

I'm seeing blog comments on a system protected by reCAPTCHA where the page loads and 1 second later the post was made successfully. The User-Agent was nonsense (in this particular case it claimed to be running Ubuntu 9.25/Firefox 3.8), the referrer was from a completely unrelated site with no link to us.

This is clearly automated.
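The one-second gap in this report is itself a usable signal. A minimal sketch of a time-to-submit check, assuming the server embeds a signed timestamp in a hidden form field when it renders the page; `SECRET`, `MIN_SECONDS` and the token format are hypothetical choices, not part of reCAPTCHA:

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"change-me"   # hypothetical server-side key
MIN_SECONDS = 3.0       # hypothetical threshold; tune for your form


def _sign(ts: str) -> str:
    return hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Timestamp to embed in a hidden form field when the page is rendered."""
    ts = f"{time.time() if now is None else now:.3f}"
    return f"{ts}:{_sign(ts)}"


def too_fast(token: str, now: Optional[float] = None) -> bool:
    """True if the token is malformed/forged or the form came back too quickly."""
    try:
        ts, sig = token.split(":", 1)
        issued = float(ts)
    except ValueError:
        return True  # missing separator or non-numeric timestamp
    if not hmac.compare_digest(sig, _sign(ts)):
        return True  # timestamp was tampered with
    elapsed = (time.time() if now is None else now) - issued
    return elapsed < MIN_SECONDS
```

The HMAC keeps a bot from simply back-dating the timestamp; a patient bot can still wait out the threshold, so this complements rather than replaces other checks.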

Fitton answered 2/3, 2010 at 14:49 Comment(0)

reCAPTCHA has not been defeated. If it had been, then why did Google just buy it and announce they will be applying the technology within Google to increase fraud and spam protection for Google products?

from Google Acquires reCAPTCHA posted to the Google Blog on 9/16/09:

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.

Laband answered 3/10, 2009 at 2:39 Comment(0)

The easiest way to defeat Captchas is Amazon Mechanical Turk. There's a guy named Kermit Welda who pays people a nickel each to register Hotmail, AOL and Gmail accounts. That's 6,000 fake email accounts at 5 cents = $300 a day. The cost of doing business is pretty cheap when you have other people do the dirty work for you. No wonder our server's spam filters want to reject anything from Hotmail.

Wormseed answered 9/2, 2011 at 5:6 Comment(3)
Is this really an answer...?Hal
Make sense, some similar concept to Death By Captcha.Span
OP has clearly stated this is not what he is looking for.Souvaine

AFAIK, in practice there is no tool to crack the reCAPTCHA implementation; however, I assume someone will eventually manage it.

Funnily enough, if someone manages it, the whole reCAPTCHA project becomes pointless, because reCAPTCHA was designed to digitize book text that can't be read in an automated way.

BTW:

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job it is is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

You can't secure a system thinking like that; it's like saying "your web application is not secure enough if your host is not in an old military bunker, because now people can steal your machine".

Halfon answered 16/2, 2009 at 8:1 Comment(2)
Your sentiment is spot on, but the application of it is misplaced: the thinking (of the comment you quoted) is that CAPTCHA does not solve the problem it intends to. Or as I often say, "CAPTCHA (in general) is a bad solution to the wrong problem." The problem CAPTCHA tries to solve (by definition) is: how do I know that the user is a person, not a computer? Whether or not CAPTCHA solves this (it doesn't), the REAL problem is: how can I prevent mass flooding of my service? CAPTCHA farms and proxies show the exact difference. It's why any security solution should start with the threats.Gautier
You're right, it all comes down to "Why are you using CAPTCHA?". For some systems it's just enough security; for others it's not even close. But just as key size in crypto protects something by making brute forcing take years (eventually they are going to crack it, but not in this lifetime or not in the next 10 years), CAPTCHA can provide enough security for some systems in the very same way. So, as you said, it all comes down to what you are using CAPTCHA for.Halfon

There are lots of methods used to crack reCAPTCHA. While it's hard to use neural-network-enabled programs to solve them automatically, it's possible to grab the image and have Amazon's Mechanical Turk or an equivalent service solve it.

http://codemagician.wordpress.com/2010/01/22/solving-recaptcha/

Fabri answered 30/1, 2010 at 21:46 Comment(0)
