Hunting cheaters in a voting competition
D

19

64

We are currently running a competition, and it is going very well. Unfortunately the cheaters are back in business, running scripts that automatically vote for their own entries. We already caught some of them by looking at the database entries by hand: 5-star ratings from the same browser exactly every 70 minutes, for example. As the user base grows, it gets harder and harder to identify them.

What we have done so far:

  1. We store the IP and the browser and block that combination for one hour. Cookies won't help against these guys.
  2. We also use a Captcha, which has been broken.

Does anyone know how we could find patterns in our database with a PHP script or how we could block them more efficiently?

Any help would be much appreciated...

Draft answered 25/2, 2010 at 9:51 Comment(5)
A combination of IP and browser is not a unique combination. I'm behind a proxy server, and use a standard browser. If the guy sitting next to me is using the same browser, one of us can't vote (our IP is the same to your web site).Eudocia
Maybe you've been Pharyngulated.Sloane
@David: sacrificing access for some users may be justified when spam is a big problemLilly
You're in an arms race with them at this point. You may have to step up to authentication (making them prove they are a specific person). Doing that requires a persistent identity which is valuable to them to maintain. If you allow anonymous voting, the best you can do is combat their tricks one by one--and they have more free time than you do. Why do you think Stack Overflow only lets people with established identities vote?Broadbrim
As mentioned a little further down, the captcha hasn't been broken... the users just sit in front of their PCs typing in the captcha each time... THEY SEEM TO HAVE A LOT OF TIME... The solution for now - JUST FOR THE MOMENT - was to block IPs as they are... we don't care anymore about enterprise networks with one IP. And we don't give anyone any information about their position in the contest. We just say: Bottom Line, Midfield, Stars... And in the best-of view we just show random users with good ratings... the rest must be done by hand. I really wish it were easier ;-)Draft
L
80

Direct feedback elimination

This is more of a general strategy that can be combined with many of the other methods. Don't let the spammer know if he succeeds.

You can hide the current results altogether, show only percentages without the absolute number of votes, or delay the display of the votes.

  • Pro: good against all methods
  • Con: if the fraud is massive, percentage display and delay won't be effective

Vote flagging

Also a general strategy. If you have some reason to assume that a vote comes from a spammer, still accept the vote but mark it as invalid, then delete the invalid votes at the end.

  • Pro: good against all detectable spam attacks
  • Con: skews the vote, harder to set up, false positives
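A minimal sketch of vote flagging, in Python for illustration. The vote layout (dicts with `entry_id`, `stars`, and a `suspect` flag) and the function name are assumptions, not part of the original setup. Suspect votes are stored like any other vote, so the spammer gets no feedback, and they are only dropped when the final result is computed.

```python
def final_tally(votes):
    """Return the average star rating per entry, ignoring flagged votes.

    `votes` is an iterable of dicts with the hypothetical keys
    'entry_id', 'stars' and 'suspect'.
    """
    totals = {}  # entry_id -> (vote count, sum of stars)
    for vote in votes:
        if vote.get("suspect", False):
            # Flagged votes were accepted earlier but are not counted now.
            continue
        count, star_sum = totals.get(vote["entry_id"], (0, 0))
        totals[vote["entry_id"]] = (count + 1, star_sum + vote["stars"])
    return {entry: star_sum / count for entry, (count, star_sum) in totals.items()}


votes = [
    {"entry_id": 1, "stars": 5, "suspect": False},
    {"entry_id": 1, "stars": 3, "suspect": False},
    {"entry_id": 1, "stars": 5, "suspect": True},
    {"entry_id": 2, "stars": 5, "suspect": True},
]
result = final_tally(votes)
```

How a vote gets its `suspect` flag is the hard part and is left open here; any of the detection ideas in this list could set it.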

Captcha

Use a CAPTCHA. If your Captcha is broken, use a better one.

  • Pro: good against all automated scripts.
  • Con: useless against Pharyngulation

IP checking

Limit the number of votes an IP address can cast in a timespan.

  • Pro: Good against random dudes who constantly hit F5 in their browser
  • Pro: Easy to implement
  • Con: Useless against Pharyngulation and elaborate scripts which use proxy servers.
  • Con: An IP address sometimes maps to many different users
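As a rough illustration of limiting votes per IP in a timespan, here is a small in-memory sketch in Python. The class name, the default limits, and the use of plain numeric timestamps are assumptions; a real site would keep this state in its database.

```python
from collections import defaultdict, deque


class VoteRateLimiter:
    """Allow at most `max_votes` per key within `window_seconds`."""

    def __init__(self, max_votes=1, window_seconds=3600):
        self.max_votes = max_votes
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # key -> timestamps of accepted votes

    def allow(self, key, now):
        """Return True and record the vote if `key` is under its limit."""
        timestamps = self._history[key]
        # Forget votes that have fallen out of the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_votes:
            return False
        timestamps.append(now)
        return True


limiter = VoteRateLimiter(max_votes=2, window_seconds=3600)
```

The key can be the IP alone or an (IP, browser) pair, which matches what the question already stores.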

Referrer checking

Check the HTTP Referer header of each vote request and only accept votes that were submitted from your own voting page. (The assumption behind IP checking, that one user maps to one IP address, usually only holds true for private households.)

  • Pro: Easy to implement
  • Pro: Good against simple pharyngulation to some extent
  • Con: Very easy to circumvent by automated scripts

Email Confirmation

Use Email confirmation and only allow one vote per Email. Check your database manually to see if they are using throwaway-emails.

Note that with many mail providers you can add +foo to the username in an email address. The plain address and the +foo variant will both deliver the mail to the same account, so remember that when checking if somebody has already voted.

  • Pro: good against simple spam scripts
  • Con: harder to implement
  • Con: Some users won't like it
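One way to handle the +foo trick when checking for duplicate voters is to normalize addresses before comparing them. The sketch below is in Python; the function name is hypothetical, and lowercasing the whole address is an assumption that holds for most providers but is not guaranteed by the mail standards.

```python
def normalize_email(address):
    """Strip a '+tag' suffix from the local part and lowercase the address."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    # Keep everything before the first '+'; fall back to the full local
    # part if the address starts with '+'.
    base = local.split("+", 1)[0] or local
    return f"{base.lower()}@{domain.lower()}"


already_voted = set()


def register_vote(address):
    """Return True if this (normalized) address has not voted yet."""
    key = normalize_email(address)
    if key in already_voted:
        return False
    already_voted.add(key)
    return True
```

This does nothing against throwaway domains, so the manual check mentioned above is still needed.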

HTML Form Randomization

Randomize the order of choices. This might take a while for them to find out.

  • Pro: nice to have anyways
  • Con: once detected, very easy to circumvent

HTTPS

One method of vote faking is to capture the HTTP request from a valid browser like Firefox and mimic it with a script. This doesn't work as easily when you use encryption.

  • Pro: nice to have anyway
  • Pro: good against very simple scripts
  • Con: more difficult to set up

Proxy checking

If the spammer votes via proxy, you can check for the X-Forwarded-For header.

  • Pro: good against more advanced scripts that use proxies
  • Con: some legitimate users can be affected
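A small Python sketch of reading the X-Forwarded-For header, assuming the request headers are available as a plain dict (the function name is made up for illustration). Keep in mind that anonymizing proxies simply omit the header and that clients can forge it, so it is a hint rather than proof.

```python
def forwarded_chain(headers):
    """Return the list of addresses in X-Forwarded-For, or [] if absent."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-forwarded-for":
            return [part.strip() for part in value.split(",") if part.strip()]
    return []


chain = forwarded_chain({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"})
```

A non-empty result means the request passed through at least one proxy that announced itself; the first entry is the address the proxy claims the client had.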

Cache checking

Try to see if the client loads all the uncached resources. Many spambots don't do this. I have never tried this myself; I just know that voting sites don't usually check for it.

An example would be embedding <img src="a.gif" /> in your HTML, with a.gif being some 1x1 pixel image. Then you have to set the HTTP response header for the request GET /a.gif to Cache-Control: "no-cache, must-revalidate". You can set the HTTP headers in Apache with your .htaccess file like this. (thanks Jacco)

  • Pro: uncommon method as far as I know
  • Con: slightly harder to set up
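The server-side bookkeeping for this check could look roughly like the Python sketch below. Everything in it is an assumption: the class and method names are invented, and it presumes the pixel URL carries a per-page token (for example a.gif?t=TOKEN) so that the image request can be matched to the later vote.

```python
import secrets


class ResourceLoadTracker:
    """Remember which page views actually fetched the tracking pixel."""

    def __init__(self):
        self._fetched = {}  # token -> True once the pixel was requested

    def issue_token(self):
        """Call when rendering the voting page; embed the token in the pixel URL."""
        token = secrets.token_urlsafe(16)
        self._fetched[token] = False
        return token

    def record_pixel_request(self, token):
        """Call from the handler that serves the 1x1 image."""
        if token in self._fetched:
            self._fetched[token] = True

    def vote_looks_like_browser(self, token):
        """Call when the vote arrives; each token is valid only once."""
        return self._fetched.pop(token, False)


tracker = ResourceLoadTracker()
browser_token = tracker.issue_token()
tracker.record_pixel_request(browser_token)
bot_token = tracker.issue_token()  # the bot never requests the image
```

Votes that fail the check are probably best flagged rather than rejected, since some legitimate users browse with images disabled.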

[Edit 2010-09-22]

Evercookie

  • A so-called evercookie can be useful to track browser-based spammers
Lilly answered 25/2, 2010 at 9:51 Comment(8)
5. Proxy / Dynamic IP / Overkilling for multiple computers sharing IP via NAT.Advanced
Option 6 is a winner: If the 'browser' does not request any additional resources/does not make any additional request, consider discarding the vote. (put in some resource with must-revalidate).Eurystheus
There are plenty of tools these days that will plug into the browser and capture a request even for https. 6 isn't as good as it sounds.Skiles
Randomizing the order is probably a good idea anyway - it avoids the bias for real voters, who might select an earlier option over a later one.Marin
Email confirmation is only minor protection. Mailinator exists, and it's really not that hard to set up a domain name with a billion addresses. Of course, you can detect that too, but it's whack-a-mole time at best.Broadbrim
Hey, thank you for your long list of all the possibilities. Actually we don't have any problems with very advanced users yet. We found out that the cheaters filled out the captcha by hand each time. All they did beyond that was reset their router, use different browsers and always clear their cookies. Right now we have fallen back to a pure IP limitation. The competition only works if we don't force users to be registered. So we investigate by hand and observe all the top entries. If someone obviously cheats, he will be banned. Right now we have the "luck" that they go about it rather unprofessionally.Draft
HTML Form Randomization <- I thought this also had to do with randomizing form submit fields and matching with the session key.. I use it and it stops bots/scripts dead in their tracks. (i.e. <input id="ZX23PSD21" ... > and is only valid for cookie id "PCASD41234")Albescent
Great list! Just wanted to say though, that using Evercookie or other techniques to store data on the client with the intention of making said data hard for the user to remove, is likely illegal in many countries.Terriss
I
6

Have you tried browser fingerprinting? Check out this open-source project from the EFF: https://panopticlick.eff.org/ Could be used to identify one person similar to 500-1500 in the world (!).

Infielder answered 25/2, 2010 at 11:2 Comment(0)
A
4

You could add a captcha to the voting form. Requiring e-mail confirmation would also be useful.

Amused answered 25/2, 2010 at 9:56 Comment(2)
AHhh i forgot... Sure we are using a captcha... but this has been broken at least by two users....Draft
reCAPTCHA gets solved at a rate of $1 per 1000 captchasCraniology
M
2

If you're really worried about it then you have to do something like email verification, which might be sufficient to block most cheaters.

Also it depends whether multiple people behind a NAT are likely to want to vote for the same option (e.g. favourite school).

Any scheme you create can be gamed.

EDIT: As everyone else has suggested, you can use a CAPTCHA such as reCAPTCHA to block automated bots and make humans less likely to repeat vote, at the cost of making humans less likely to vote at all.

Marin answered 25/2, 2010 at 9:57 Comment(0)
W
2

The Vote to Promote pattern (you may be aware of it) has a section on how to mitigate gaming, but it is tricky to avoid altogether. Given your actions to date, I would consider using weighting. For example, decide on a reasonable level of voting over a time period, say 10 votes per thing per hour (just an example, not a guide), and for surplus votes weight the next 10 at 90% (i.e. only count 9), the next 10 at 80%, and so on. This is Yahoo's advice on gaming within this pattern:

Community voting systems do present a number of challenges. Particularly the possibility that members of the community may try to game the system, out of any number of motivations:

  • malice - perhaps against another member of the community and that member's contributions.

  • gain - to realize some reward, monetary or otherwise, from influencing the placement of certain items in the pool.

  • or an overarching agenda - always promoting certain viewpoints or political statements, with little regard for the actual quality of the content being voted for.

There are a number of ways to attempt to safeguard against this type of abuse. Though nothing can stop gaming altogether. Here are some ways to minimize or hinder abusers in their efforts:

  • Vote for things, not people. In keeping with Yahoo's general strategy, don't offer users the ability to directly vote on another user: their looks, their likeability, intelligence, or anything else. It's OK for the community to vote on a person's contributions, but not on the quality of their character.

  • Consider rate-limiting of votes.

    • Only allow the user a certain number of votes within a given time-period.

    • Limit the number of times (or the rate at which) a user votes down a particular user's content. (To prevent ad-hominem attacks.)

  • Weigh other factors besides just the number of votes. Digg, for instance, does not calculate their Digg-score solely on the number of votes a submission receives. Their algorithm also considers: "story source (is it a blog repost, or the original story), user history, traffic levels of the category the story falls under, and user reports." They update this algorithm frequently. Consider keeping the exact algorithm a secret from the community, or only discuss the factored inputs in general terms.

  • If relationship information is available consider weighting user votes accordingly. Perhaps prohibit users with formal relationships from voting for each other's submissions.

While this is currently a popular pattern on the Web, it is important to consider the contexts in which we use it. Very active and popular communities (Digg is an excellent example) that enable community-voting can also engender a certain negativity of spirit (mean comments, opinionated cliques, group attacks on 'outlier' viewpoints).
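The weighting scheme suggested at the top of this answer (full weight for the first 10 votes per hour, 90% for the next 10, 80% for the 10 after that, and so on) could be sketched in Python as follows. The function name and parameters are illustrative, and the numbers are the example values from the answer, not recommendations.

```python
def weighted_vote_count(votes_in_hour, band_size=10, step_percent=10):
    """Return the effective vote count after diminishing weights.

    The first `band_size` votes count fully; each further band of
    `band_size` votes counts `step_percent` percentage points less than
    the band before it, down to zero.
    """
    total_percent = 0  # accumulated in whole percent to avoid float drift
    remaining = votes_in_hour
    band = 0
    while remaining > 0:
        weight_percent = 100 - step_percent * band
        if weight_percent <= 0:
            break  # all further votes in this hour count for nothing
        in_band = min(band_size, remaining)
        total_percent += in_band * weight_percent
        remaining -= in_band
        band += 1
    return total_percent / 100
```

With the default values, 20 raw votes in an hour count as 19 and 25 count as 23, and an entry can never gain more than 55 effective votes per hour no matter how many raw votes arrive.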

Womanish answered 25/2, 2010 at 10:31 Comment(0)
T
2

Check out Asirra: http://research.microsoft.com/en-us/um/redmond/projects/asirra/ It's still in beta, but it's pretty cool.

Terriss answered 25/2, 2010 at 17:29 Comment(0)
I
1

To prevent the bots from voting you can use CAPTCHA.

Invective answered 25/2, 2010 at 9:57 Comment(0)
B
1

The only thing that comes to mind is using a Captcha. Either an elaborate one with pictures and noise like the ReCaptcha service, or a very simple and unobtrusive one like "What is seven plus three?" or (If you're located in the US), "What is the last name of our President", simple common sense questions everybody can answer. If you change them often enough, this could even be more effective than a classic image-based CAPTCHA.

Bridgetbridgetown answered 25/2, 2010 at 9:58 Comment(1)
AHhh i forgot... Sure we are using a captcha... but this has been broken at least by two users....Draft
O
1

CAPTCHAs aren't a silver bullet: the user could have their script display the CAPTCHAs to them and solve them manually, for at least several votes per minute.

You need to use them in combination with other techniques mentioned here.

Orfurd answered 25/2, 2010 at 10:54 Comment(0)
O
1

You could add a honeypot field like in Django. Most likely this will not protect you from cheaters who deliberately want to manipulate your competition, but at least you will have fewer 'drive-by' spammers to take care of as well.
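The honeypot idea is framework-independent: add a form field that is hidden from humans with CSS and reject any submission that fills it in. A minimal Python sketch, assuming the submitted form arrives as a dict; the field name `website` and the function name are made up for illustration and are not Django's API.

```python
HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans via CSS


def is_probable_bot(form_data):
    """Return True if the hidden honeypot field was filled in.

    Real visitors never see the field, so it stays empty; naive bots
    tend to fill in every input they find.
    """
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


human_submission = {"entry_id": "42", "stars": "5", "website": ""}
bot_submission = {"entry_id": "42", "stars": "5", "website": "http://spam.example"}
```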

Orbiculate answered 25/2, 2010 at 13:17 Comment(0)
T
1

Sorry for the double post, but I wasn't allowed to post two URLs in the same post...

If you're looking at building your own tracking, maybe this link might provide some inspiration: https://panopticlick.eff.org/ Turns out that a lot of browsers can be uniquely identified, even without any form of tracking cookies. I'm guessing a vote-bot might give a very specific fingerprint?

Terriss answered 25/2, 2010 at 17:41 Comment(0)
D
1

So if anyone ever wants to run a competition where people can win something and wants to use a community-driven rating system... here I share some experiences:

The bad:
1) First, it can't be made 100% secure
2) Reaching a mass of users large enough to filter out all the nonsense ratings is very hard
3) Forget about star ratings in that case... there is always either 5 stars or 1 star

The good:
1) Don't give them any orientation about where they stand... We replaced the "Order by place" view with a random presentation of the TOP 100 (only the top 30 will win a prize)... This really helped because a lot of users lost interest as soon as they couldn't see where they stood.

2) Don't allow voting patterns like: 1x5_Stars 40x1_Star... Only count users who vote in a fair way...

3) Most of them act a little bit stupid... You'll see them in your logs and can track down who votes fairly and who doesn't... Search for patterns...

**GOOD LUCK ;-) **

Draft answered 18/3, 2010 at 22:10 Comment(0)
T
0

CAPTCHA is always good, might be "disturbing" for some users though.

reCAPTCHA is a fairly widely used service.

Trinitrophenol answered 25/2, 2010 at 9:57 Comment(0)
J
0

How about only allowing users who have logged in with OpenID and solved a reCAPTCHA to submit a vote, and monitoring the submitter list for votes from the same IP address?

Joses answered 25/2, 2010 at 10:10 Comment(0)
V
0

We use a combination of CAPTCHA and email. The user receives a link with a GUID by mail. The GUID must be unique for each user who tries to vote. www.votesite.com/vote.aspx?guid=..... Following this link confirms the vote. In the database we check that the combination of email address and GUID is unique.
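A rough Python sketch of such a confirmation flow, with an in-memory store standing in for the database. The class and method names are assumptions, and sending the mail itself is left out.

```python
import uuid


class ConfirmationStore:
    """Track pending and confirmed votes, one per email address."""

    def __init__(self):
        self._pending = {}       # guid -> email awaiting confirmation
        self._confirmed = set()  # emails whose vote has been confirmed

    def request_vote(self, email):
        """Return a fresh GUID to mail out, or None if the email was already used."""
        email = email.strip().lower()
        if email in self._confirmed or email in self._pending.values():
            return None
        guid = str(uuid.uuid4())
        self._pending[guid] = email
        return guid

    def confirm(self, guid):
        """Confirm the vote behind `guid`; each GUID works only once."""
        email = self._pending.pop(guid, None)
        if email is None:
            return False
        self._confirmed.add(email)
        return True


store = ConfirmationStore()
guid = store.request_vote("voter@example.com")
```

The vote itself would only be counted once `confirm` returns True for the GUID in the mailed link.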

Veldaveleda answered 25/2, 2010 at 12:54 Comment(0)
O
0

I use a combination of CAPTCHA, IP verification and LSO (Flash Local Shared Objects, hard to find and delete for common people).

Op answered 25/2, 2010 at 13:56 Comment(0)
I
0

1. Use reCAPTCHA
2. Yes, randomize your voting options, but not like this:
      -> from vote_id_1 to asdsasd_1, grdsgsdg_2,
      Instead, use session variables to set a mask from vote_id_1 to asgjdas87th2ad in the vote form.
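One possible reading of the session-mask idea, sketched in Python with a plain dict standing in for the session. The function names and the `field_mask` session key are assumptions. Each rendered form gets fresh random field names, and the mapping back to the real names lives only on the server.

```python
import secrets


def mask_field_names(session, real_names):
    """Create random per-form field names and remember the mapping in the session."""
    mask = {secrets.token_hex(8): name for name in real_names}
    session["field_mask"] = mask
    return mask  # render the form using the masked names (the dict keys)


def unmask_submission(session, form_data):
    """Translate a submitted form back to real field names.

    Fields that are not in the mask are dropped, and the mask is removed
    from the session so that a captured request cannot be replayed.
    """
    mask = session.pop("field_mask", {})
    return {mask[key]: value for key, value in form_data.items() if key in mask}


session = {}
mask = mask_field_names(session, ["vote_id_1"])
masked_name = next(iter(mask))
first = unmask_submission(session, {masked_name: "5", "vote_id_1": "5"})
replay = unmask_submission(session, {masked_name: "5"})
```

A script that posts the real field name, or replays an old request, ends up with an empty submission.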

Iulus answered 25/2, 2010 at 16:31 Comment(0)
R
0

What about some post hoc stochastic analysis, like time series analysis - looking for periodicity in events of particular (ip, browser, vote)? You could then assign probability to each such group of events that it belongs to 1 person and either discard all such groups of events beyond some probability level, or use some kind of weighting to lower the weight according to the probability.

Look at R; it contains A LOT of useful analysis packages.
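As a very simple stand-in for a proper time series analysis, the following Python sketch flags (ip, browser) groups whose votes arrive at suspiciously regular intervals, such as the "exactly every 70 minutes" pattern from the question. The function name, the thresholds, and the input format are assumptions; it uses the coefficient of variation of the gaps between votes rather than a real periodicity test.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_periodic_voters(votes, min_votes=5, max_cv=0.05):
    """Return {(ip, browser): average gap in seconds} for regular voters.

    `votes` is an iterable of (ip, browser, timestamp_seconds) tuples.
    A group is flagged when the standard deviation of the gaps between
    its votes is at most `max_cv` times the average gap.
    """
    by_client = defaultdict(list)
    for ip, browser, timestamp in votes:
        by_client[(ip, browser)].append(timestamp)

    suspects = {}
    for client, stamps in by_client.items():
        if len(stamps) < min_votes:
            continue  # too few votes to say anything about regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap > 0 and pstdev(gaps) / average_gap <= max_cv:
            suspects[client] = average_gap
    return suspects


scripted = [("203.0.113.7", "Firefox", i * 4200) for i in range(6)]
human = [("198.51.100.1", "Chrome", t) for t in (0, 500, 9000, 9100, 40000, 90000)]
suspects = find_periodic_voters(scripted + human)
```

Instead of discarding flagged groups outright, the result could feed the weighting idea described above, lowering the weight as the pattern gets more regular.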

Rhearheba answered 23/7, 2011 at 9:48 Comment(0)
G
0

Check the domain details of the email they are using. I had the same problem and found that all of them were registered to the same registrant. I wrote it up here: http://tincan.co.uk/659/news/competition-spammers.html

Now, I filter on the DNS information for the email used in the registration.

Goodrum answered 11/10, 2011 at 13:49 Comment(0)
