Hide Email Address from Bots - Keep mailto:
C

12

115

tl;dr

Hide email address from bots without using scripts and maintain mailto: functionality. Method must also support screen-readers.


Summary

  • Email obfuscation without using scripts or contact forms

  • Email address needs to be completely visible to human viewers and maintain mailto: functionality

  • Email Address must not be in image form.

  • Email address must be "completely" hidden from spam-crawlers and spam-bots and any other harvester type


Desired Effect:

  • No scripts, please. There are no scripts used in the project and I'd like to keep it that way.

  • Email address is either displayed on the page or can be easily displayed after some sort of user interaction, like opening a modal.

  • The user can click on the email address, which in turn triggers the mailto: functionality.

  • Clicking the email will open the user's email application.

    In other words, mailto: functionality must work.

  • The email address is not visible or not identifiable as an email address to bots (this includes the page source).

  • I don't have an inbox that's full of spam


What does NOT Work

  • Adding a contact form - or anything similar - instead of the email address

I hate contact forms. I rarely fill out a contact form. If there's no email address, I look for a phone number, and if that's not there, I start looking for an alternative service. I would only fill out a contact form if I absolutely had to.

  • Replacing the address with an image of the address

This creates a HUGE disadvantage for someone using a screen reader (please remember the visually impaired in your future projects).

It also removes the mailto: functionality unless you make the image clickable and then add the mailto: functionality as the href for the link, but that defeats the purpose and now the email is visible to bots.


What might work:

  • Clever usage of pseudo-elements in CSS

  • Solutions that make use of base64 encoding

  • Breaking up the email address and spreading the parts across the document then putting them back together in a modal when the user clicks a button (This will probably involve multiple CSS classes and the usage of anchor tags)

  • Altering HTML attributes via CSS

@MortezaAsadi graciously brought up this possibility in the comments below. This is the link to the full article, which is from 2012:

What if We Could Use CSS to Alter HTML Attributes?

  • Other creative solutions that are beyond my scope of knowledge.

Similar Questions / Fixes

This is a great fix suggested by Joe Maller. It works well, but it's script-based. Here's what it looks like:

<SCRIPT TYPE="text/javascript">
  emailE = 'example.com'
  emailE = ('yourname' + '@' + emailE)
  document.write('<A href="mailto:' + emailE + '">' + emailE + '</a>')
</script>
<NOSCRIPT>
  Email address protected by JavaScript
</NOSCRIPT>

(JavaScript fix)

The selected answer works. It actually works really well. It involves encoding the email as html entities. Can it be improved?

Here's what it looks like;

<A HREF="mailto:&#121;&#111;&#117;&#114;&#110;&#097;&#109;&#101;&#064;&#100;&#111;&#109;&#097;&#105;&#110;&#046;&#099;&#111;&#109;">
&#121;&#111;&#117;&#114;&#110;&#097;&#109;&#101;&#064;&#100;&#111;&#109;&#097;&#105;&#110;&#046;&#099;&#111;&#109;
</A>

The selected answer to this SuperUser question is great; it presents a study of the amount of spam received when using different obfuscation methods.

It seems that manipulating the email address with CSS to make it rtl does work. This is the same method used in the first question I linked to in this section.

I am uncertain what effects adding mailto: functionality to the fix would have on the results.

  • There are also many other questions on SO which all have similar answers. I have not found anything that fits my desired effect

The Question:

Would it be possible to increase the effectiveness (i.e. as little spam as possible) of the email obfuscation methods above by combining two or more of the fixes (or even adding new fixes) while:

A- Maintaining mailto: functionality; and

B- Supporting screen-readers


Many of the answers and comments below pose a very good question while indicating the impossibility of doing this without some sort of js

The question that's asked/implied is:

Why not use js?

The answer is that I am allergic to js

Joking aside though,

The three main reasons I asked this question are:

  • Contact forms are becoming more and more accepted as a replacement for providing an email address - which they should not.

  • If it can be done without scripting then it should be done without scripting.

  • Curiosity: (as I am in fact using one of the js fixes currently) I wanted to see if discussing the matter would lead to a better way of doing it.

Campstool answered 25/12, 2016 at 5:51 Comment(18)
I think that if you're looking to maintain mailto: functionality and you're not willing to use Javascript, then it's just not possible.Malady
Do you want Use CSS to Alter HTML Attributes?Incandescence
@Rishav I agree with you that it might be quite tricky to get the desired effect without using js; however, I would say that the existence of methods such as the one I highlighted, where you encode the email as HTML entities, might negate the impossibility of it.Campstool
@MortezaAsadi Can you post an example of what you referred to as an answer?Campstool
@i-love-css take a look to this article: andydavies.me/blog/2012/08/13/…Incandescence
@MortezaAsadi Thank you very much for that link, very interesting idea I will add it to the questionCampstool
What you might do is create your own tag: <mail-to href="[BASE64 ENCODED MAIL]" subject="Contact me">Contact me</mail-to>. Ideas: blog.teamtreehouse.com/create-custom-html-elements-2. NOTICE: your tag needs to have a dash in it, for example: some-tag. Basically, the tag must include the minus symbol.Evelyne
@RichardOlsenSandberg. Thanks for the article. I may have misunderstood, but wouldn't that require the use of js to "define" the custom HTML tags?Campstool
@ILoveCSS, Yes, it would..Evelyne
@ILoveCSS I will happily make that class for, for example, <mail-to href="[Encrypted Mail With Between Encryption]" subject="Contact Request" [etc]>. I have never made such a class, but I am experienced in that area because I have worked with such things before. It would be great to collaborate on such a nice class.Evelyne
Spam filters are so good nowadays that I would just use a regular mailto with no obfuscation rather than going to war with bots. If you have spam issues, get a better spam filter. Perhaps not satisfying, and certainly not an answer, but in all honesty probably the best thing to do.Denunciatory
You cannot solve this without a script, your demand to avoid them makes no sense. I have a way to deal with this, but with a script.Orjonikidze
https://mcmap.net/q/152098/-hide-email-address-from-bots-keep-mailtoEvelyne
What is the problem with the solution that makes use of HTML entities that you mentioned?Zea
@Zea Because that is readable for a botEvelyne
Couldn't you use a picture with an onclick that decodes a base64 string?Sanguinary
I don't have a screen reader to test; but there is no reason why CSS cannot be used to reorder chars. For example raw source doesn't have an email address domain, but this is injected with CSS:content. A bot catching 'myname@' from your page will probably then mail itself.Steffen
I've used this in the past and have had success when comparing the amount of spam I receive from sites that do not use that method and ones that do.Advertent
T
44

The issue with your request is specifically the "Supporting screen-readers", as by definition screen readers are a "bot" of some sort. If a screen-reader needs to be able to interpret the email address, then a page-crawler would be able to interpret it as well.

Also, the point of the mailto: scheme is to be the standard way of linking email addresses on the web. Asking if there is a second way to do that is sort of asking if there is a second standard.

Doing it through scripts will still have the same issue as once the page is loaded, the script would have been run and the email address rendered in the DOM (unless you populate the email address on click or something). Either way, screen readers will still have issues with this since it's not already loaded.

Honestly, just get an email service with a half decent spam filter and specify a default subject line that is easy for you to sort in your inbox.

<a href="mailto:[email protected]?subject=Something to filter on">Email me</a>

What you're asking for is if the standard has two ways to do something, one for bots and the other for non-bots. The answer is it doesn't, and you have to just fight the bots as best you can.

Trod answered 25/12, 2016 at 7:55 Comment(7)
It sucks fighting the robots but one day we will win the war... or go extinctTrod
Sorry, but when a bot runs, it finds all strings containing @, then splits the text at ? and checks whether the first part matches a regex. In the end it saves both versions.Antietam
Not sure what you meant. My point of adding the subject line is in hopes that the end user doesn't change it. That way you could create an email filter to put all those subject specific emails into a specific folder. The point wasn't to prevent bots but instead to aid email rules. As I said before, preventing bots is a never ending battle.Trod
Thank you for the coherent and detailed answer. I think you raise a valid point when you say that screen-readers are essentially bots; however, the idea is to keep the email concealed until the user takes some sort of action like press a button to open a modal. Once the user interaction has occurred then the email address is revealed. If this can be achieved without the use of scripts, then that would be the answer to my question. I am giving this answer a +1 because the method where you add a subject line to the emails and then filter the messages based on it is pure genius.Campstool
Also, regarding the mailto: attribute, while it may be true that the point of having it is to standardize the way email addresses are presented on the web - I don't know for sure if that's the case - I care more about it because it's easy to use: you click the email address, a new message opens in your email client, and all you have to do is type and send. Finally, the "standard" often takes time to catch up with reality and not the other way around.Campstool
"the idea is to keep the email concealed until the user takes some sort of action like press a button to open a modal". If that user action doesn't change the DOM (which is what screen readers would actually read) then the screen reader is still going to be able to see it. The best way to change the DOM due to user action is through JavaScript. Webpages are loosely MVC patterns, where HTML, CSS and JavaScript are analogous to Model, View and Controller respectively. This means that any modifications really go through JavaScript, and to do it elsewhere is a bit difficult to impossible.Trod
Also, I'm fairly sure the mailto attribute was specifically created to produce hyperlinks that would activate email messages, you can read it in the RFC referenced off the wikipedia page: en.wikipedia.org/wiki/MailtoTrod
U
43

Defeating email bots is a tough one. You may want to check out the Email Address Harvesting countermeasures section on Wikipedia.

My back-story is that I've written a search bot. It crawled 105,000+ URLs during its initial run many years ago. What I learned from doing that is that web-crawling bots literally see EVERYTHING that is text on a web page. Bots read everything except images.

Spam can't be easily stopped via code for these reasons:

  1. CSS & JS are irrelevant when using the mailto: scheme. Bots specifically look at HTML pages for that "mailto:" keyword. Everything from that colon to the next single quote or double quote (whichever comes first) is seen as an email address. HTML entity email addresses - like the example above - can be quickly translated using a reverse ASCII method/function. Running the JavaScript code snippet above quickly turns the string which starts with &#121;&#111;&#117;&#114;... into yourname@domain.com. (My search bot threw away hrefs with mailto: email addresses, as I wanted URLs for web pages & not email addresses.)

  2. If a page crashes a bot, the bot author will tune the bot to fix the crash with that page in mind, so that the bot won't crash at that page again in the future. Thus making their bot smarter.

  3. Bot authors can write bots, which generate all known variations of email addresses... without crawling pages & never using any starter email addresses. While it may not be feasible to do that, it's not inconceivable with today's high-core count CPUs (which are hyper-threaded & run at 4+ GHz), plus the availability of using distributed cloud-based computing & even super computers. It's conceivable that someone can now create a bot-farm to spam everyone, without knowing anyone's email address. 20 years ago, that would have been incomprehensible.

  4. Free email providers have had a history of selling their free user accounts to their advertisers. In the past, simply signing up for a free email account automatically guaranteed them a green light to start delivering spam to that email address... without ever using that email address online. I've seen that happen multiple times, with famous company names. (I won't mention any names.)

  5. The mailto: keyword is part of this IETF RFC, where browsers are built to automatically launch the default email clients, from links with that keyword in them. JavaScript has to be used to interrupt that application launching process, when it happens.
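To make point 1 concrete, here is a minimal sketch of how little work it takes to undo entity encoding. It is not taken from any real harvester; the function names decodeNumericEntities and harvestMailto are made up for illustration, and the regular expression only covers quoted href values.

```javascript
// Decode hexadecimal (&#x40;) and decimal (&#064;) character references.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Collect everything between "mailto:" and the closing quote of each href,
// then decode it. This mirrors the "colon to the next quote" rule in point 1.
function harvestMailto(html) {
  const found = [];
  const pattern = /href\s*=\s*(["'])\s*mailto:([^"']*)\1/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(decodeNumericEntities(match[2]).trim());
  }
  return found;
}

// The entity-encoded link from the question, as a bot would see it in the source.
const sample =
  '<A HREF="mailto:&#121;&#111;&#117;&#114;&#110;&#097;&#109;&#101;&#064;' +
  '&#100;&#111;&#109;&#097;&#105;&#110;&#046;&#099;&#111;&#109;">mail</A>';

console.log(harvestMailto(sample)); // logs the decoded address(es) found
```

The point is only that entity encoding is a reversible text transformation, so any bot author who cares to handle it can do so in a few lines.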

I don't think it's possible to stop 100% of spam while using traditional email servers, without using filters on the email server and possibly using images.

There is one alternative... You can also build a chat-like email client, which runs internally on a website. It would be like Facebook's chat client. It's "kind of like email", but not really email. It's simply 1-to-1 instant messaging with an archiving feature... that auto-loads upon login. Since it has document attachment + link features, it works kind of like email... but without the spam. As long as you don't build an externally accessible API, then it's a closed system where people can't send spam into it.

If you're planning to stick with strictly traditional email, then your best bet may be to run something like Apache's SpamAssassin on a company's email server.

You can also try combining multiple strategies as you've listed above, to make it harder for email harvesters to glean email addresses from your web pages. They won't stop 100% of the spam, 100% of the time... while also allowing 100% of the screen readers to work for blind visitors.

You've created a really good starting look at what's wrong with traditional email! Kudos to you for that!

A good screen reader is JAWS from Freedom Scientific. I've used that before to listen to how my webpages are read by blind users. (If you hear a male voice reading both actions [like clicking on a link] & text, try changing 1 voice to female so that 1 voice reads actions & another reads text. That makes it easier to hear how the web page is read for the visually impaired.)

Good luck with your Email Address Harvesting countermeasure endeavours!

Urinary answered 27/12, 2016 at 9:16 Comment(2)
Thank you very much for the very thorough answer. You have shared a wealth of information. The information helps further hone-in on the issue and may lead to eventually figuring out a way to fix it.Campstool
You're welcome! It's been a pleasure trying to help you with the additional experiential insight. I appreciate the award. It was a surprise. Thank you for it!Urinary
Z
40

Here is an approach that does make use of JavaScript, but with a rather small footprint. It's also very "ghetto", and generally I would not recommend an approach with inline JS in the HTML unless you have an extreme reluctance to use JS at all.

<a
  href="#"
  data-contact="bGUtZW1haWxAdGhlLWRvbWFpbi5jb20="
  data-subj="QW4gQW1hemluZyBTdWJqZWN0"
  onfocus="this.href = 'mailto:' + atob(this.dataset.contact) + '?subject=' + atob(this.dataset.subj || '')"
  >
  Send an email
</a>

data-contact is the base64 encoded email address. And, data-subj is an optional base64 encoded subject.
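For anyone wondering how to produce those attribute values, a small sketch follows. The helper names encodeForAttribute and buildMailtoHref are hypothetical, and the percent-encoding of the subject is an addition of this sketch rather than part of the snippet above. It assumes an environment where btoa and atob are available (browsers, recent Node.js) and ASCII-only input.

```javascript
// Encode a plain string to base64 (btoa expects Latin-1/ASCII input).
function encodeForAttribute(text) {
  return btoa(text);
}

// Rebuild the href the way the onfocus handler does, but with the subject
// percent-encoded so that spaces survive inside the mailto: URL.
function buildMailtoHref(contactB64, subjectB64) {
  const address = atob(contactB64);
  const subject = atob(subjectB64 || '');
  return 'mailto:' + address + '?subject=' + encodeURIComponent(subject);
}

const contact = encodeForAttribute('le-email@the-domain.com');
const subj = encodeForAttribute('An Amazing Subject');

console.log(contact); // value to paste into data-contact
console.log(subj);    // value to paste into data-subj
console.log(buildMailtoHref(contact, subj));
```

Running this once (for example in the browser console) gives the strings to paste into the markup; nothing from it needs to ship with the page.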

The main challenge with doing this without JS is that CSS can't alter HTML attributes. (The article you linked is a "pie-in-the-sky" musing and does not have any bearing on what is possible today or in the near future.)

The HTML entities approach you mentioned, or some variation of it, is likely the simplest option that will have some efficacy. Additionally, the iframe approach is clever and the server redirect approach is pretty awesome. But, all three are vulnerable to bots:

  • The HTML entities just need to be converted (and detecting that is simple)
  • The document referenced by the iframe might simply be followed
  • The server redirect might simply be followed, as well

With the approach outlined above, the use of a base64 encoded email address in a data-contact attribute is very "one-off" – as long as the scraper is not specifically designed for your site, it should work.

Zea answered 31/12, 2016 at 12:6 Comment(1)
I like this. If they don't have js enabled they can go mail someone else.Pannikin
S
29

Simple + Lot of @ + Editable without tools

<a href="mailto:user@domain@@com"
   onmouseover="this.href=this.href.replace('@@','.')">
   Send email
</a>
Sumach answered 3/1, 2017 at 0:38 Comment(3)
I like this, neat little snippet!Ontario
love this little thing, @AndyHolmes i used the onclick="..." for this, works on mobile (tested on android / mobile chrome ) too, dunno if it gets more useless that way, since bots probably check for a onclick more than a onmouseover.Shoshone
@Shoshone onclick would work on mobile, onmouseover won't as mobiles don't have a hover stateKhajeh
M
10

Have you considered using Google's reCAPTCHA Mailhide? https://www.google.com/recaptcha/admin#mailhide

The idea is that when a user clicks the checkbox (see nocaptcha below), the full e-mail address is displayed.

While reCAPTCHA has traditionally been hard not only for screen readers but for humans as well, the rollout of Google's No CAPTCHA reCAPTCHA, whose accessibility tests you can read about here, appears to show promise for screen readers, since it renders as a traditional checkbox from their point of view. Nocaptcha reCAPTCHA

Example #1 - Not secure but for easy illustration of the idea

Here is some code as an example without using mailhide but implementing something using recaptcha yourself: https://jsfiddle.net/43fad8pf/36/

<div class="container">
    <div id="recaptcha"></div>
</div>
<div id="email">
    Verify captcha to get e-mail
</div>

function createRecaptcha() {
    grecaptcha.render("recaptcha", {sitekey: "6LcgSAMTAAAAACc2C7rc6HB9ZmEX4SyB0bbAJvTG", theme: "light", callback: showEmail});
}
 createRecaptcha();

function showEmail() {
    // ideally you would do server side verification of the captcha and then the server would return the e-mail
  document.getElementById("email").innerHTML = "[email protected]";
}

Note: In my example I have the e-mail in a JavaScript function. Ideally you would have the recaptcha validated on the server end, and return the e-mail, otherwise the bot can simply get it in the code.

Example #2 - Server side validation and returning of e-mail

If we use an example more like this, we get additional security: https://designracy.com/recaptcha-using-ajax-php-and-jquery/

function showEmail() {
    /* Check if the captcha is complete */
    if ($("#g-recaptcha-response").val()) {
        $.ajax({
            type: 'POST',
            url: "verify.php", // The file we're making the request to
            dataType: 'html',
            async: true,
            data: {
                captchaResponse: $("#g-recaptcha-response").val() // The generated response from the widget sent as a POST parameter
            },
            success: function (data) {
                alert("everything looks ok. Here is where we would take 'data' which contains the e-mail and put it somewhere in the document");
            },
            error: function (XMLHttpRequest, textStatus, errorThrown) {
                alert("You're a bot");
            }
        });
    } else {
        alert("Please fill the captcha!");
    }
}

Where verify.php is:

$captcha = filter_input(INPUT_POST, 'captchaResponse'); // get the captchaResponse parameter sent from our ajax

/* Check if captcha is filled */
if (!$captcha) {
    http_response_code(401); // Return error code if there is no captcha
    exit;
}
$response = file_get_contents("https://www.google.com/recaptcha/api/siteverify?secret=YOUR-SECRET-KEY-HERE&response=" . urlencode($captcha));
$result = json_decode($response); // the verification endpoint answers with JSON
if (!$result || !$result->success) {
    http_response_code(401); // It's SPAM! RETURN SOME KIND OF ERROR
    echo 'SPAM';
} else {
    // Everything is ok, should output this in json or something better, but this is an example
    echo '[email protected]';
}
Motta answered 3/1, 2017 at 4:39 Comment(1)
If you're having a bad feeling about using a google product, you can have more or less the same functionality with hCaptcha.Empty
D
4

People who write scrapers want to make their scrapers as efficient as possible. Therefore, they won't download styles, scripts, and other external resources. There's no method that I know of to set a mailto link using CSS. In addition, you specifically said you didn't want to set the link using Javascript.

If you think about what other types of resources there are, there's also external documents (i.e. HTML documents using iframes). Almost no scrapers would bother downloading the contents of iframes. Therefore, you can simply do:

index.html:

<iframe src="frame.html" style="height: 1em; width: 100%; border: 0;"></iframe>

frame.html:

My email is <a href="mailto:[email protected]" target="_top">[email protected]</a>

To human users, the iframe looks just like normal text. Iframes are inline and transparent by default, so we just need to set its border and dimensions. You can't make the size of the iframe match its content's size without using Javascript, so the best we can do is give it predefined dimensions.

Detest answered 27/12, 2016 at 7:1 Comment(1)
I agree with you on your 1st paragraph, but your 2nd paragraph about iframe contents is incorrect. Bots want as much HTML content as possible. They will download the contents of iframes, as they are looking for links, text, etc.... Bots don't care if it's an iframe tag or not. They will simply crawl pages. If the URL is in the src section of an iframe or a javascript tag, it will be crawled.Urinary
H
3

First, I don't think doing anything with CSS will work. All bots (except Google's crawler) simply ignore all styling on websites. Any solution has to work with JS or server-side.

A server-side solution could be making an <a> that links to a new tab, which simply redirects to the desired mailto:
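A rough sketch of what that redirect could look like on a Node.js server is below. The /contact path, the handleContactRequest name and the contact@example.com address are all placeholders invented for this sketch, and whether a browser follows a 302 to a mailto: URL is something to test rather than assume.

```javascript
// Hypothetical address, used only for illustration.
const CONTACT_ADDRESS = 'contact@example.com';

// Request handler compatible with Node's http.createServer callback signature.
// The page itself only contains <a href="/contact" target="_blank">Email me</a>,
// so the address never appears in the HTML source.
function handleContactRequest(req, res) {
  if (req.url === '/contact') {
    // Answer with a redirect whose target is the mailto: URL.
    res.writeHead(302, { Location: 'mailto:' + CONTACT_ADDRESS });
    res.end();
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not found');
  }
}

// To try it locally (assumption: Node.js with the built-in http module):
//   require('http').createServer(handleContactRequest).listen(8080);
```

A bot that follows redirects will still see the address in the Location header, so this only raises the bar; it does not hide the address completely.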

That's all my ideas for now. Hope it helps.

Hallo answered 29/12, 2016 at 0:36 Comment(2)
While when I tested this about a year ago, all major browsers supported it, I could see handling mailto: as a location in a 302 redirect going away for "security" reasons, much as you already can't have file:s anymore. (That being said, we use this redirect as a fallback when javascript is disabled.)Mouldy
That's true. Good thinkingHallo
H
2

Short answer to fulfill all your requirements is that it's impossible

Some of the script-based options answered here may work for certain bots, but you wanted no-script, so, no, you can't.

Hospodar answered 2/1, 2017 at 15:4 Comment(1)
They could use some sort of encryption on the email, and decrypt it dynamically in JavaScript. Even a simple +1 cipher would do the trick. Theoretically breakable, but no bot would break it.Sori
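The +1 cipher mentioned in the comment above could look roughly like this. It is a sketch under the assumption of a plain ASCII address; shiftText is a made-up helper name and user@example.com is a placeholder. It is obfuscation, not real encryption.

```javascript
// Shift every character code by a fixed offset (ASCII input assumed).
function shiftText(text, offset) {
  return Array.from(text, ch => String.fromCharCode(ch.charCodeAt(0) + offset)).join('');
}

// Done once, offline: produce the string that goes into the page source.
const stored = shiftText('user@example.com', 1);

// Done in the browser, e.g. inside a click handler: undo the shift.
const restored = shiftText(stored, -1);

console.log(stored);   // contains no "@" followed by a dotted domain
console.log(restored); // the original address again
```

In a page, the restored value would be assigned to the link's href right before navigation, much like the onfocus and onmouseover tricks in the other answers.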
F
2

based on the code of MaanooAk, here is my version:

<a href="mailto: Mike Myers"
onclick="this.href=this.href.replace(' Mike ','MikeMy'); this.href=this.href.replace('Myers','[email protected]')">&#9993; Send Email</a>

The difference from MaanooAk's version is that on hover you don't see mailto: and a broken email address, but mailto: and the name of the contact. When you click on it, the name is replaced by the email address.

In the code, the email address is split into two parts. Nowhere in the code is the complete email address visible.

Florenceflorencia answered 21/5, 2021 at 8:12 Comment(2)
Someone suggested that I should change ' Mike ' in the code into '%20Mike%20'. But in my browser this only works if I also change "mailto: Mike Myers" into "mailto:%20Mike%20Myers". I don't know if it is really necessary to change all white spaces here into %20.Florenceflorencia
Unfortunately, some browsers follow the href before running the onclick handler, SO THIS IS NOT A GOOD SOLUTION. A better solution is to put both the correction of the email address and the mailto: navigation, in the right order, into a function that is called, for example, by "onclick". I will post this solution.Florenceflorencia
Q
0

PHP solution

function printEmail($email){
    $email = '<a href="mailto:'.$email.'">'.$email.'</a>';
    $a = str_split($email);
    return "<script>document.write('".implode("'+'",$a)."');</script>";
}

Use

echo printEmail('test@gmail.com');

Result

<script>document.write('<'+'a'+' '+'h'+'r'+'e'+'f'+'='+'"'+'m'+'a'+'i'+'l'+'t'+'o'+':'+'t'+'e'+'s'+'t'+'@'+'g'+'m'+'a'+'i'+'l'+'.'+'c'+'o'+'m'+'"'+'>'+'t'+'e'+'s'+'t'+'@'+'g'+'m'+'a'+'i'+'l'+'.'+'c'+'o'+'m'+'<'+'/'+'a'+'>');</script>

P.S. Requirement: user must have JavaScript enabled

Qatar answered 7/8, 2020 at 6:7 Comment(0)
F
0

Here is my new solution for this. I first build the email address string by concatenating small pieces and then also use this string as the title:

adress = 'mailt' + 'o:MikeM' + 'yers@v' + 'wx.yz';
document.getElementsByClassName('Email')[0].title = adress;
function mail(){window.location.href = adress;}
<a class='Email' onclick='mail()'>&#9993; Send Email</a>

I use this in a footer of a website. Many pages with all the same footer.

Florenceflorencia answered 12/6, 2021 at 15:18 Comment(0)
E
-4

The one method I found effective is using it with CSS like below:

<a href="mailto:[email protected]">myemail@<span style="display:none;">ignore-</span>example.com</a>

and then write JavaScript to remove the ignore- word from the href="mailto:..." attribute with a regex. This hides the email from bots, because ignore- is prepended to the real domain, and it still works with screen readers. When the user clicks the link, a custom JS function removes ignore- from the href attribute, so the link opens the real email address.

This method has been working very effectively for me to date. You can read more on this - http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/

Eustache answered 19/1, 2017 at 7:22 Comment(2)
Sorry but this method is not a good one because most if not all good bots look within the anchor text and the a href. Using display:none isn't going to make the cut.Load
That method is akin to the Ostrich EffectCrider
