How to protect e-mail addresses on a website from modern day JS-enabled bots?

This is a recurring question on the website, but after spending 20 minutes browsing through old questions I was unable to find a modern day solution.

I've previously employed this JS-based method to protect addresses. Before the JS-method I was using image and flash-based solutions. Below is my old way.

Animated example codepen: http://codepen.io/anon/pen/kIjKe/

HTML:

<span class="reverse eml">moc.niamod@tset</span><br>

CSS:

.reverse {
  unicode-bidi: bidi-override;
  direction: rtl;
}

.eml {
  display: inline;
}

JS:

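// Replace each reversed address with a clickable mailto link.
function reverseEmails() {
  // .each() is a no-op on an empty set, so no length check is needed.
  jQuery(".eml.reverse").each(function() {
    var that  = jQuery(this);
    // The markup stores the address backwards; flip it to get the real one.
    var email = that.text().split("").reverse().join("");
    // Drop the RTL styling so the restored text is not displayed reversed.
    that.removeClass("reverse");
    that.html("<a href='mailto:" + email + "'>" + email + "</a>");
  });
}

// Run once the DOM is ready.
jQuery(reverseEmails);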

None of these methods seems to work nowadays, since Node.js-based scrapers can render an image of the page they are scraping and then read any human-readable data from that image. You can guess the rest.

Is there any method that works nowadays, with which users can still easily read, click, and copy-paste e-mail addresses, but JS-enabled bots cannot?

Almeta answered 5/3, 2015 at 14:30 Comment(3)
IMHO you shouldn’t bother with that at all anymore … just use a good spam filter. You will get spam anyways, even on addresses that are not published on the web. - Fetishism
Accepting spam as inevitable is just wrong. I can't claim I have 100% spam-free addresses, but I have managed to keep it down to the level where I can still send an abuse complaint about each and every one I receive. - Krystalkrystalle
My e-mail is over 10 years old and I still receive just a minute amount of spam. It's doable if you aren't careless :) - Almeta
K
2

Put the email address on a separate page which is only reachable by solving a CAPTCHA.

Granted, then the security is only as good as the security of the CAPTCHA.
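As a rough illustration of the CAPTCHA-gated idea, here is a minimal server-side sketch. It assumes the address is kept only on the server and is returned after the CAPTCHA check succeeds. The names `CONTACT_ADDRESS`, `revealResponse`, `handleRevealRequest`, and `verifyCaptcha` are all hypothetical; `verifyCaptcha` stands in for whatever verification call your CAPTCHA provider actually offers.

```javascript
// Assumption: the address lives only in server-side code and is never
// embedded in the HTML that is served to every visitor.
const CONTACT_ADDRESS = "test@domain.com"; // placeholder address from the question

// Decide what to send back once the CAPTCHA result is known.
function revealResponse(captchaPassed) {
  if (!captchaPassed) {
    return { status: 403, body: { error: "CAPTCHA not solved" } };
  }
  return { status: 200, body: { email: CONTACT_ADDRESS } };
}

// Hypothetical glue code: verifyCaptcha is a function you supply that
// resolves to true only when the provider confirms the token.
async function handleRevealRequest(captchaToken, verifyCaptcha) {
  const passed = await verifyCaptcha(captchaToken);
  return revealResponse(passed === true);
}
```

The point of the split is that the page itself never contains the address; a scraper has to get past the CAPTCHA before the server hands anything over.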

Using your own obfuscations may be a serious alternative if you only have a limited number of addresses you want to protect. Some ideas I have used in the past:

  • Crossword puzzle. Make it really easy, with cues like famous song titles with one word missing (easy to google and no debate about possible second interpretations). You can fill in many letters to make it even easier.
  • Audio recording with background noise. I didn't want to use my own voice so I used a speech synthesizer, with a German accent (-: AT&T web demo IIRC) and mixed in a couple of seconds of music in the background (Frank Zappa's Peaches en Regalia worked very well for me, but tastes differ).
  • Hand drawn image. I like to draw letter outlines but I doubt they are regular enough to pass any OCR.

The real beef here is not the stellar brilliance of these solutions, but the different approaches which I hope can inspire you to think in new directions. In the end, you will always be safer if you come up with your own unique solution; anything resembling a "new de facto standard" will be the lowest-hanging fruit that the scrapers will spend time trying to pluck.

Incidentally, I tried to think about usability for people with disabilities, so I actually deployed the audio version as a fallback for people who have issues with interacting with the other two, which are based on visual layout.

By the by, very few people want to send me email these days anyway (or maybe they do, but end up being rejected as spam?) which is frankly a relief. Those who do typically use the whois registration info for my domain name (which uses an anonymized address provided by the whois registrar) or are good guessers.

Krystalkrystalle answered 29/3, 2015 at 6:14 Comment(1)
Guess I will go with a click-to-activate modal dialog CAPTCHA that puts the e-mail address on the page if you succeed. - Almeta
M
10

This is personally my favorite method, and it has worked for me so far. It's not bulletproof: in theory, a bot that parses CSS3 and then performs a text search could still find the address. Likewise, a spambot that triggers events in order to harvest addresses would have to load the page in what is basically a headless browser and somehow determine which content might be a JS-obfuscated email address. Both scenarios are an enormous amount of work for possibly no benefit, so no spammer is likely to bother. The fact is that I have had no spam to date, and it works great for humans, both to read and to click on:

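<style>
  .email:after { content: '@mydomain.com'; }
</style>
Contact me at: <div class="email">myemail</div>
<script>
  // Requires jQuery. The domain appears only in the CSS and in this handler,
  // never in the markup next to the local part.
  $('.email').click(function () {
    window.location.href = 'mailto:' + $(this).text() + '@mydomain.com';
  });
</script>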

The thing is that the email is not a link, so bots never trigger the click event; they don't even know it will do anything.
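For pages that don't load jQuery, the same idea can be sketched in plain JavaScript. This is only an illustration under the same assumptions as the snippet above (elements with class `email` hold the local part, and the domain is `mydomain.com`); the helper names `buildMailto` and `attachEmailHandlers` are made up for this example.

```javascript
// Join the separately stored parts into a mailto: URL.
// The local part is percent-encoded so unusual characters stay valid in the URL.
function buildMailto(localPart, domain) {
  return "mailto:" + encodeURIComponent(localPart) + "@" + domain;
}

// Attach a click handler to every .email element under root.
// textContent does not include the CSS-generated '@mydomain.com' suffix,
// so the domain has to be passed in explicitly.
function attachEmailHandlers(root, domain) {
  root.querySelectorAll(".email").forEach(function (el) {
    el.addEventListener("click", function () {
      window.location.href = buildMailto(el.textContent.trim(), domain);
    });
  });
}

// Only wire things up when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    attachEmailHandlers(document, "mydomain.com");
  });
}
```

As with the jQuery version, the full address only exists in memory at the moment of the click, which is the property the approach relies on.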

Mcclendon answered 29/3, 2015 at 22:12 Comment(1)
Thanks for this tip, seems pretty solid. - Almeta
O
1

I suspect your intuition is correct: if an email address is displayed on a page, then a bot can scrape it.

Your best bet is to involve the server-side in some way.

For example, if you just want visitors to be able to reach you, then you can add a "contact us" form like this one: https://store.theonion.com/t-contact.aspx
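To make the contact-form route a bit more concrete, here is a small sketch of the server-side step that turns a form submission into a message addressed to a recipient the visitor never sees. Everything here is an assumption for illustration: the `RECIPIENT` constant, the `buildContactMessage` helper, and the `replyTo` and `message` field names are hypothetical, and actually sending the message is left to whatever mail library or service you already use.

```javascript
// Assumption: the real address is configured on the server only.
const RECIPIENT = "test@domain.com"; // placeholder address from the question

// Validate a submitted form and build a message object for your mailer.
function buildContactMessage(form) {
  const replyTo = String(form.replyTo || "").trim();
  const text = String(form.message || "").trim();

  // Require a single whitespace-free "local@domain" value. Rejecting
  // whitespace also rejects CR/LF, which guards against header injection.
  if (!/^[^\s@]+@[^\s@]+$/.test(replyTo)) {
    return { ok: false, error: "invalid reply address" };
  }
  if (text.length === 0) {
    return { ok: false, error: "empty message" };
  }

  return {
    ok: true,
    to: RECIPIENT,
    replyTo: replyTo,
    subject: "Website contact form",
    text: text
  };
}
```

Because the recipient is filled in on the server, nothing a scraper can fetch from the page or its scripts contains the address.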

If you want visitors to be able to reach each other, then you might need to build an anonymization system like the one Craigslist uses.

Olmos answered 29/3, 2015 at 0:5 Comment(0)
