Code Golf: Email Address Validation without Regular Expressions
C

15

17

(Edit: What is Code Golf: Code golf challenges ask you to solve a specific problem in the fewest characters of code, in whichever language you prefer. More info here on Meta StackOverflow.)

Code Golfers, here's a challenge on string operations.

Email Address Validation, but without regular expressions (or similar parsing library) of course. It's not so much about the email addresses but how short you can write the different string operations and constraints given below.

The rules are the following (yes, I know, this is not RFC compliant, but these are going to be the 5 rules for this challenge):

  • At least 1 character out of this group before the @:

    A-Z, a-z, 0-9, . (period), _ (underscore)
    
  • @ has to exist, exactly one time

    [email protected]
        ^
    
  • Period (.) has to exist exactly one time after the @

    [email protected]
              ^
    
  • At least 1 character between @ and the following . (period), and only [A-Z, a-z] characters there

    [email protected]
         ^
    
  • At least 2 characters after the final . (period), and only [A-Z, a-z] characters there

    [email protected]
               ^^
    

Please post the method/function only, which would take a string (proposed email address) and then return a Boolean result (true/false) depending on the email address being valid (true) or invalid (false).

Samples:
[email protected]    (valid/true)          @w.org     (invalid/false)    
b@[email protected]  (invalid/false)       test@org   (invalid/false)    
test@%.org (invalid/false)       s%[email protected]  (invalid/false)    
[email protected] (invalid/false)       [email protected]  (valid/true)
[email protected]  (valid/true)          foo@a%.com (invalid/false)
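
For reference, the five rules can also be written out ungolfed. The following is only a readable Python sketch, not a competitive entry. The function name `is_valid` is hypothetical, and it checks the challenge rules above, not real-world email syntax.

```python
import string

LETTERS = set(string.ascii_letters)
LOCAL_CHARS = LETTERS | set(string.digits) | {".", "_"}


def is_valid(address):
    """Check an address against the five challenge rules (not RFC rules)."""
    # Rule 2: exactly one "@".
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    # Rule 1: at least one character before the "@", all from the allowed group.
    if not local or not set(local) <= LOCAL_CHARS:
        return False
    # Rule 3: exactly one "." after the "@".
    if domain.count(".") != 1:
        return False
    name, tld = domain.split(".")
    # Rule 4: at least one character between "@" and ".", letters only.
    if not name or not set(name) <= LETTERS:
        return False
    # Rule 5: at least two characters after the ".", letters only.
    return len(tld) >= 2 and set(tld) <= LETTERS
```

With this sketch, `is_valid("test@org")` and `is_valid("foo@a%.com")` are both rejected, while an illustrative address such as `a@b.cd` is accepted.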

Good luck!

Creath answered 7/9, 2009 at 17:25 Comment(31)
@JaredPar: Challenges to write code with the shortest amount of characters possible (in whatever language you prefer). Browse the code-golf tag for more info and challenges if you like!Creath
I don't think email address is standardized by IEEE.Braunstein
@Jared: meta.stackexchange.com/questions/20736/…Lacker
@Mehrdad: Meant RFC ;) corrected.Creath
@Mehrdad, RFC 2822 tools.ietf.org/html/rfc2822#section-3.4.1Turtleneck
where you say "at least ... character(s)", are you intending that the presence of any other characters be invalid? if so, you should say so.Look
Well that's rather strange. I've never seen this before.Alberich
and where and what is the input/output? recommend you say "write a function that takes a single parameter and returns true or false (in a way natural to the language you use)"Look
@ysth: "At least 1" would mean one or more. The characters allowed are specified (e.g. A-Z, a-z) in each part of the rules. Other characters are invalid. To your second question: Yes, just the function that takes a string input and returns Boolean output. I will add that info to the question.Creath
Are we allowed to write one or more support functions?Blackheart
Can we have a convention like 0 - False, 1 - True for, say, C? Please? :)Postlude
@Alex: you might want to replace your parenthetical comment at the top (defining code golf) with a link to the meta.SO question created by yshuditelu.Burse
@strager: Yes, you can write your own support functions. Try to keep the character count as low as you can! @Michael: You can return 0 for false and 1 for true of course.Creath
When you say "At least 1 [A-Z, a-z] character between @ and the following . (period)" does that mean foo@a%.com is valid? Or must all characters between the @ and . be alpha-nums?Exhilarative
Why are people voting to close this?Blackheart
@jeffamaphone: foo@a%.com would be invalid. Only A-Z and a-z is allowed in the 'between @ and final period' rule. I will add your example to the samples section.Creath
@strager, because brainfuck is not allowed by tags :)Turtleneck
@Kirill: Brainf*ck would be allowed but I don't think anybody would be able to solve it using it. If anyone does, I'll gladly take that person out for a beer :DCreath
Code Crazy golf would be fun. This is more like Code TrickshotReinold
Too many [code-golf]s of late. If this continues I will reluctantly join Pax and start voting to close.Contribution
Only the rules above apply, so chris@localhost would be invalid. This doesn't try to be 100% RFC compliant or include all cases of allowable email addresses. The challenge is more about string operations and constraints.Creath
dmckee: What, is stackoverflow running out of question numbers?Seeseebeck
@caf: Like best [joke|comic|...] and similar questions these lie outside the remit of SO. That is not a problem as long as they are rare. Indeed, they serve as diversions and provide a sense of community. But if they grow too common they will give new-comers the wrong impression about the culture and purpose of the site; they will give the appearance of a lot of drivel. Which is a shame, because I like code golf, enjoy playing with some of the problems that come up, and am quite proud of some of my entries.Contribution
@dmckee - While I'm a fan of code golf, I'm inclined to agree that we're seeing a deluge of golf questions recently.Cluster
If you want to discuss [code-golf] questions, do so on meta.stackoverflow.comSilviasilviculture
[code-golf] questions are being discussed on meta.stackexchange.com/questions/20912/so-weekly-code-golfSilviasilviculture
I stayed up an extra hour to finish an answer for this question, only to discover it had been closed just over half an hour earlier.Silviasilviculture
3 more people needed for reopening. What is amazing though, is that those 5 gentlemen that voted to close it classified it as not a real question, but still participated (at least in comments). And, what else is this, if not a question. A statement of fact? Unbelievable.Creath
Even better, the "not a real question" closers have really great questions themselves. Steven A. Lowe: "What is the fascination with code metrics", Mehrdad: "Computer science undergraduate project ideas"... nice.Creath
I have a 152 char Python solution ready. Just waiting for the fifth reopen-vote.Detour
Why is [email protected] considered invalid? Many domain names contain two dots. e.g. [email protected]Eardrop
B
20

C89 (166 characters)

#define B(c)isalnum(c)|c==46|c==95
#define C(x)if(!v|*i++-x)return!1;
#define D(x)for(v=0;x(*i);++i)++v;
v;e(char*i){D(B)C(64)D(isalpha)C(46)D(isalpha)return!*i&v>1;}

Not re-entrant, but can be run multiple times. Test bed:

#include<stdio.h>
#include<assert.h>
main(){
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(!e("b@[email protected]"));
    assert(!e("test@%.org"));
    assert(!e("[email protected]"));
    assert(!e("@w.org"));
    assert(!e("test@org"));
    assert(!e("s%[email protected]"));
    assert(!e("foo@a%.com"));
    puts("success!");
}
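
Expanded out of its macros, the function alternates between consuming a run of characters from a class and expecting one literal character. A rough Python rendering of that control flow, assuming ASCII input and using hypothetical names (`run`, `scan_email`) that do not appear in the C code:

```python
import string

ALPHA = set(string.ascii_letters)
LOCAL = ALPHA | set(string.digits) | {".", "_"}


def run(text, pos, allowed):
    """Return the index just past the run of `allowed` characters starting at `pos`."""
    while pos < len(text) and text[pos] in allowed:
        pos += 1
    return pos


def scan_email(text):
    pos = 0
    # D(B) C(64): a non-empty run of local characters, then "@".
    # D(isalpha) C(46): a non-empty run of letters, then ".".
    for allowed, literal in ((LOCAL, "@"), (ALPHA, ".")):
        end = run(text, pos, allowed)
        if end == pos or text[end:end + 1] != literal:
            return False
        pos = end + 1
    # Final D(isalpha): the rest must be letters only, at least two of them.
    end = run(text, pos, ALPHA)
    return end == len(text) and end - pos > 1
```

The Python version keeps the position in a local variable, so unlike the C original it has no global counter.
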
Blackheart answered 7/9, 2009 at 17:25 Comment(12)
+1, love the nested macros and declaring a global variable without a type!Institutive
I just have to ask, did you come up with this 100% by yourself, or did you have any clues from somewhere else? Not suggesting that you weren't capable of coming up with it yourself :) I'm just really amazed by the shortness of your solution.Creath
@Alex, Completely self-made. If you expand the macros, it's pretty straightforward. A returns 1 if c is a letter. B returns 1 if c is a letter, a digit, or _ or .. D iterates to the next character not matching x (A or B), while counting characters. C returns from the function with 0 if no characters were iterated over (!v) or if the current character is not x (@ or .). The final return is 1 if the full string has been parsed and the count is not 1.Blackheart
I just realized there's a bug (foo@abc. passes). I'll fix it soon.Blackheart
Managed to save a character by fixing the bug. Cool! Also found another optimization, saving yet another character.Blackheart
Really this should be C89+ASCII, I'm pretty sure it'd fail on a C89 implementation that used EBCDIC ;)Seeseebeck
@Seeseebeck - Most C code golfs assume ASCII. I know the "C" locale is generally defined to be ASCII. I almost believe it's in the standard, but I don't know where it would be if it was. Gots to get me a copy of that sometime soon.Cluster
@strager: isalpha() is considerably shorter than c<91&c>64|c<123&c>96 and works without an #include. In fact, then you would just need #define A isalpha. Hmm... Come to think of it, it would be the same length to just use isalpha wherever you have A now, and leave the #define A out all together.Buroker
+1 for a really cool solution, by the way. I love the functional nature of this.Buroker
@caf, I assume ASCII in all my code-golf answers. =] On isalpha: this page (seems to be a good reference; bookmarking now...) shows <ctype.h> is required for isalpha.Blackheart
<ctype.h> declares isalpha. That doesn't mean it's required. Without the prototype, the compiler will assume it returns an int, which it does, and will not check the number or types of arguments, but that's okay.Buroker
@P Daddy, Oh, you're right... I thought for a second it took char*. I'll update my answer in a bit. Thanks!Blackheart
B
12

J

:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]
Buroker answered 7/9, 2009 at 17:25 Comment(2)
That's about the fifth J program I've seen that started with :[[ and ended with :]] - what gives?Cluster
It's extra sad at the beginning, but by the end it gets really happy.Buroker
I
6

C89, 175 characters.

#define G &&*((a+=t+1)-1)==
#define H (t=strspn(a,A
t;e(char*a){char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;return H))G 64&&H+12))G 46&&H+12))>1 G 0;}

I am using the standard library function strspn(), so I feel this answer isn't as "clean" as strager's answer which does without any library functions. (I also stole his idea of declaring a global variable without a type!)

One of the tricks here is that by putting . and _ at the start of the string A, it's possible to include or exclude them easily in a strspn() test: when you want to allow them, use strspn(something, A); when you don't, use strspn(something, A+12). Another is assuming that sizeof (short) == 2 * sizeof (char), and building up the array of valid characters 2 at a time from the "seed" pair Aa. The rest was just looking for a way to force subexpressions to look similar enough that they could be pulled out into #defined macros.
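
Both tricks can be illustrated outside C. The sketch below is Python, not the submitted code: `span` is a hypothetical stand-in for `strspn()`, slicing `A[12:]` plays the role of the pointer `A+12`, and the 16-bit addition is emulated with `int.from_bytes`, assuming ASCII.

```python
def build_table():
    """Grow "_.0123456789Aa" into "...AaBbCc...Zz" by adding 257 to each 2-byte pair."""
    table = bytearray(b"_.0123456789Aa")
    for _ in range(25):
        pair = int.from_bytes(table[-2:], "little")
        # 257 == 0x0101, so both bytes of the pair go up by one: "Aa" -> "Bb".
        table += (pair + 257).to_bytes(2, "little")
    return table.decode("ascii")


def span(text, accept):
    """Length of the leading run of `text` made only of characters in `accept`."""
    count = 0
    for ch in text:
        if ch not in accept:
            break
        count += 1
    return count


A = build_table()
```

Here `span("j_r@x", A)` counts three characters because `_` is allowed, while `span("j_r@x", A[12:])` stops after one because the slice holds letters only.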

To make this code more "portable" (heh :-P) you can change the array-building code from

char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;

to

char*A="_.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for a cost of 5 additional characters.

Institutive answered 7/9, 2009 at 17:25 Comment(4)
I think the #include<string.h> should be included. Otherwise, it's not portable. (Your short thing isn't portable either but at least you provide a cheap alternative.)Blackheart
size_t strspn(); is less characters than the #include and will do the job (and also doesn't require a newline).Seeseebeck
@Seeseebeck - On many platforms (and by "many" I mean "mine"), size_t is only defined in <stddef.h> but if you said to hell with portability you could maybe get away with letting it be implicitly declared as returning int since it's the same size on many (once again, "my") platforms.Cluster
@strager: Point taken, but I think that since most of us are assuming ASCII anyway, portability is already out the window. Surely if it compiles (and it does on at least MSVC++9 and Linux gcc 4.1.2), it's OK?Institutive
B
5

C (166 characters)

#define F(t,u)for(r=s;t=(*s-64?*s-46?isalpha(*s)?3:isdigit(*s)|*s==95?4:0:2:1);++s);if(s-r-1 u)return 0;
V(char*s){char*r;F(2<,<0)F(1=)F(3=,<0)F(2=)F(3=,<1)return 1;}

The single newline is required, and I've counted it as one character.

Buroker answered 7/9, 2009 at 17:25 Comment(2)
Nice! Calling a macro with fewer arguments than declared is interesting -- I find it compiles (with warnings) on MSVC++ but not on gcc 4.1.2. Any idea what is "officially" allowed in the language spec?Institutive
@j_random_hacker: I'm not sure what the spec says, but gcc doesn't like this code at all. Putting commas in those problematic macro calls (F(1=,) and F(2=,)) fixes the "macro 'F' requires 2 arguments, but only 1 given" error, but my version (3.4.6) still blows up with "syntax error before '=' token" and "syntax error before ')' token".Buroker
C
5

Python (181 characters including newlines)

def v(E):
 import string as t;a=t.ascii_letters;e=a+"1234567890_.";t=e,e,"@",e,".",a,a,a,a,a,"",a
 for c in E:
  if c in t[0]:t=t[2:]
  elif not c in t[1]:return 0>1
 return""==t[0]

Basically just a state machine using obfuscatingly short variable names.
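
Ungolfed, the idea is a list of (advance, stay) pairs: a character found in the first string consumes the pair and moves on, one found in the second keeps the current pair, and anything else rejects. The Python 3 sketch below uses hypothetical names and is a variation rather than a transcription. In particular, it has a separate pair that demands a letter immediately after the `@`.

```python
import string

LETTERS = string.ascii_letters
LOCAL = LETTERS + string.digits + "_."

# (advance, stay) pairs: a character in `advance` moves to the next pair,
# one in `stay` keeps the current pair, anything else is rejected.
PAIRS = [
    (LOCAL, ""),      # first character of the local part
    ("@", LOCAL),     # more local characters until the "@"
    (LETTERS, ""),    # first letter after the "@"
    (".", LETTERS),   # more letters until the "."
    (LETTERS, ""),    # first letter after the "."
    (LETTERS, ""),    # second letter after the "."
    ("", LETTERS),    # accepting state: only letters may follow
]


def valid(address):
    state = 0
    for ch in address:
        advance, stay = PAIRS[state]
        if ch in advance:
            state += 1
        elif ch not in stay:
            return False
    # Only the last pair is an accepting state.
    return state == len(PAIRS) - 1
```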

Camarillo answered 7/9, 2009 at 17:25 Comment(5)
You can drop ~10 characters by making t into a flat list, and incrementing by two. t[s][1] becomes t[s+1] Also, the last return is one space too far.Fluidize
@ACoolie: Thanks! It actually appears to put my^H^Hour solution in the lead so far.Camarillo
Nevermind, it's only in the lead if I cheat on the count. Oh well.Camarillo
I golfed it a little further, by reordering the list, changing it to a tuple, eliminating spaces, eliminating the list index, etc.Calender
It's possible to save two more spaces by changing the indentation inside the loop to tabs.Calender
D
4

Python, 149 chars (after putting the whole for loop into one semicolon-separated line, which I haven't done here for "readability" purposes):

def v(s,t=0,o=1):
 for c in s:
   k=c=="@"
   p=c=="."
   A=c.isalnum()|p|(c=="_")
   L=c.isalpha()
   o&=[A,k|A,L,L|p,L,L,L][t]
   t+=[1,k,1,p,1,1,0][t]
 return(t>5)&o

Test cases, borrowed from strager's answer:

assert v("[email protected]")
assert v("[email protected]")
assert v("[email protected]")
assert not v("b@[email protected]")
assert not v("test@%.org")
assert not v("[email protected]")
assert not v("@w.org")
assert not v("test@org")
assert not v("s%[email protected]")
assert not v("foo@a%.com")
print "Yeah!"

Explanation: When iterating over the string, two variables keep getting updated.

t keeps the current state:

  • t = 0: We're at the beginning.
  • t = 1: We were at the beginning and have found at least one legal character (letter, number, underscore, period)
  • t = 2: We have found the "@"
  • t = 3: We have found at least one legal character (i.e. letter) after the "@"
  • t = 4: We have found the period in the domain name
  • t = 5: We have found one legal character (letter) after the period
  • t = 6: We have found at least two legal characters after the period

o as in "okay" starts as 1, i.e. true, and is set to 0 as soon as a character is found that is illegal in the current state. Legal characters are:

  • In state 0: letter, number, underscore, period (change state to 1 in any case)
  • In state 1: letter, number, underscore, period, at-sign (change state to 2 if "@" is found)
  • In state 2: letter (change state to 3)
  • In state 3: letter, period (change state to 4 if period found)
  • In states 4 thru 6: letter (increment state when in 4 or 5)

When we have gone all the way through the string, we return whether t==6 (t>5 is one char less) and o is 1.
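
To watch the states change, the same two tables can be wrapped in a small tracing helper. This is a Python 3 sketch with a hypothetical name (`trace`), assuming ASCII input; it records `t` after every character instead of returning only the verdict.

```python
def trace(s):
    """Return (states, ok): the state after each character and the final verdict."""
    t, ok, states = 0, True, []
    for c in s:
        at = c == "@"
        dot = c == "."
        local = c.isalnum() or dot or c == "_"
        letter = c.isalpha()
        # Characters that are legal in states 0..6.
        ok = ok and [local, at or local, letter, letter or dot, letter, letter, letter][t]
        # Whether this character moves on to the next state.
        t += [True, at, True, dot, True, True, False][t]
        states.append(t)
    return states, t > 5 and ok
```

For example, `trace("test@org")` ends in state 3 because no period is ever seen, so the address is rejected even though every character was legal where it appeared.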

Detour answered 7/9, 2009 at 17:25 Comment(1)
Quite a bit shorter than the other Python solution here! +1.Institutive
G
2

C89 character set agnostic (262 characters)

#include <stdio.h>

/* the 'const ' qualifiers should be removed when */
/* counting characters: I don't like warnings :) */
/* also the 'int ' should not be counted. */

/* it needs only 2 spaces (after the returns), should be only 2 lines */
/* that's a total of 262 characters (1 newline, 2 spaces) */

/* code golf starts here */

#include<string.h>
int v(const char*e){
const char*s="0123456789._abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(e=strpbrk(e,s))
  if(e=strchr(e+1,'@'))
    if(!strchr(e+1,'@'))
      if(e=strpbrk(e+1,s+12))
        if(e=strchr(e+1,'.'))
          if(!strchr(e+1,'.'))
            if(strlen(e+1)>1)
              return 1;
return 0;
}

/* code golf ends here */

int main(void) {
  const char *t;
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "b@[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "test@%.org"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "@w.org"; printf("%s ==> %d\n", t, v(t));
  t = "test@org"; printf("%s ==> %d\n", t, v(t));
  t = "s%[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "foo@a%.com"; printf("%s ==> %d\n", t, v(t));

  return 0;
}

Version 2

Still C89 character set agnostic, bugs hopefully corrected (303 chars; 284 without the #include)

#include<string.h>
#define Y strchr
#define X{while(Y
v(char*e){char*s="0123456789_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(*e!='@')X(s,*e))e++;if(*e++=='@'&&!Y(e,'@')&&Y(e+1,'.'))X(s+12,*e))e++;if(*e++=='.'
&&!Y(e,'.')&&strlen(e)>1){while(*e&&Y(s+12,*e++));if(!*e)return 1;}}}return 0;}

That #define X is absolutely disgusting!

Test as for my first (buggy) version.

Golter answered 7/9, 2009 at 17:25 Comment(2)
Instead of the if chain, why not a, &&/|| chain? That should remove quite a number of characters.Blackheart
Seems we came up with the same idea of using suffixes of a single string as arguments to str...() functions... And actually I noticed a bug in my code after seeing yours!Institutive
H
2

Not the greatest solution no doubt, and pretty darn verbose, but it is valid.

Fixed (All test cases pass now)

static bool ValidateEmail(string email)
{
    var numbers = "1234567890";
    var uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var lowercase = uppercase.ToLower();
    var arUppercase = uppercase.ToCharArray();
    var arLowercase = lowercase.ToCharArray();
    var arNumbers = numbers.ToCharArray();
    var atPieces = email.Split(new string[] { "@"}, StringSplitOptions.RemoveEmptyEntries);
    if (atPieces.Length != 2)
        return false;
    foreach (var c in atPieces[0])
    {
        if (!(arNumbers.Contains(c) || arLowercase.Contains(c) || arUppercase.Contains(c) || c == '.' || c == '_'))
            return false;
    }
    if(!atPieces[1].Contains("."))
        return false;
    var dotPieces = atPieces[1].Split('.');
    if (dotPieces.Length != 2)
        return false;
    foreach (var c in dotPieces[0])
    {
        if (!(arLowercase.Contains(c) || arUppercase.Contains(c)))
            return false;
    }
    var found = 0;
    foreach (var c in dotPieces[1])
    {
        if ((arLowercase.Contains(c) || arUppercase.Contains(c)))
            found++;
        else
            return false;
    }
    return found >= 2;
}
Hautboy answered 7/9, 2009 at 17:25 Comment(4)
Maybe try to also post a compressed solution (single character variable names, least amount of white space etc.) so you can compete on the character count. Keep the longer one as well though, it's nice to see how you did it! +1Creath
Just noticed it fails 2 of the test cases! I'll be back with an update in a sec. :)Hautboy
What language is this? Also, you understand that the purpose of code golf is the smallest possible program? :)Calender
That would be C#. I didn't realize it was shortest solution, but I just did it out of a desire to see if I could. I added "code-golf" to my preferred tags after seeing this post. :)Hautboy
C
2

Whatever version of C++ MSVC2008 supports.

Here's my humble submission. Now I know why they told me never to do the things I did in here:

#define N return 0
#define I(x) &&*x!='.'&&*x!='_'
bool p(char*a) {
 if(!isalnum(a[0])I(a))N;
 char*p=a,*b=0,*c=0;
 for(int d=0,e=0;*p;p++){
  if(*p=='@'){d++;b=p;}
  else if(*p=='.'){if(d){e++;c=p;}}
  else if(!isalnum(*p)I(p))N;
  if (d>1||e>1)N;
 }
 if(b>c||b+1>=c||c+2>=p)N;
 return 1;
}
Cuddy answered 7/9, 2009 at 17:25 Comment(2)
Assumes a is properly NULL-terminated. <shrug/>Exhilarative
It's nice to provide a character count in your answers, as well as the language used.Blackheart
A
1

Haskell (GHC 6.8.2), 144 Characters


Using pattern matching, elem, span and all:

a=['A'..'Z']++['a'..'z']
e=f.span(`elem`"._0123456789"++a)
f(_:_,'@':d)=g$span(`elem`a)d
f _=False
g(_:_,'.':t@(_:_:_))=all(`elem`a)t
g _=False

The above was tested with the following code:

main :: IO ()
main = print $ and [
  e "[email protected]",
  e "[email protected]",
  e "[email protected]",
  not $ e "b@[email protected]",
  not $ e "test@%.org",
  not $ e "[email protected]",
  not $ e "@w.org",
  not $ e "test@org",
  not $ e "s%[email protected]",
  not $ e "foo@a%.com"
  ]
Ardenardency answered 7/9, 2009 at 17:25 Comment(0)
C
1

'Using no regex': PHP 47 Chars.

<?=filter_var($argv[1],FILTER_VALIDATE_EMAIL);
Cockeye answered 7/9, 2009 at 17:25 Comment(0)
F
1

Ruby, 225 chars. This is my first Ruby program, so it's probably not very Ruby-like :-)

def v z;r=!a=b=c=d=e=f=0;z.chars{|x|case x when'@';r||=b<1||!e;e=!1 when'.'
e ?b+=1:(a+=1;f=e);r||=a>1||(c<1&&!e)when'0'..'9';b+=1;r|=!e when'A'..'Z','a'..'z'
e ?b+=1:f ?c+=1:d+=1;else r=1 if x!='_'||!e|!b+=1;end};!r&&d>1 end
Franny answered 7/9, 2009 at 17:25 Comment(0)
I
1

Erlang 266 chars:

-module(cg_email).

-export([test/0]).

%%% golf code begin %%%
-define(E,when X>=$a,X=<$z;X>=$A,X=<$Z).
-define(I(Y,Z),Y([X|L])?E->Z(L);Y(_)->false).
-define(L(Y,Z),Y([X|L])?E;X>=$0,X=<$9;X=:=$.;X=:=$_->Z(L);Y(_)->false).
?L(e,m).
m([$@|L])->a(L);?L(m,m).
?I(a,i).
i([$.|L])->l(L);?I(i,i).
?I(l,c).
?I(c,g).
g([])->true;?I(g,g).
%%% golf code end %%%

test() ->
  true  = e("[email protected]"),
  false = e("b@[email protected]"),
  false = e("test@%.org"),
  false = e("[email protected]"),
  true  = e("[email protected]"),
  false = e("test@org"),
  false = e("s%[email protected]"),
  true  = e("[email protected]"),
  false = e("foo@a%.com"),
  ok.
Icelander answered 7/9, 2009 at 17:25 Comment(0)
O
1

VBA/VB6 - 484 chars

Explicit off
usage: VE("[email protected]")

Function V(S, C)
V = True
For I = 1 To Len(S)
 If InStr(C, Mid(S, I, 1)) = 0 Then
  V = False: Exit For
 End If
Next
End Function

Function VE(E)
VE = False
C1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
C2 = "0123456789._"
P = Split(E, "@")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1 & C2) Then GoTo X
E = P(1): P = Split(E, ".")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1) Or Len(P(1)) < 2 Or Not V(P(1), C1) Then GoTo X
VE = True
X:
End Function
Outgo answered 7/9, 2009 at 17:25 Comment(0)
F
1

Java: 257 chars (not including the 3 end of lines for readability ;-)).

boolean q(char[]s){int a=0,b=0,c=0,d=0,e=0,f=0,g,y=-99;for(int i:s)
d=(g="@._0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm".indexOf(i))<0?
y:g<1&&++e>0&(b<1|++a>1)?y:g==1&e>0&(c<1||f++>0)?y:++b>0&g>12?f>0?d+1:f<1&e>0&&++c>0?
d:d:d;return d>1;}

Passes all the tests (my older version was incorrect).

Franny answered 7/9, 2009 at 17:25 Comment(0)
