There are three RFCs that lay down the foundation for the "Internet Message Format".
- RFC 822
- RFC 2822 (Supersedes RFC 822)
- RFC 5322 (Supersedes RFC 2822)
RFC 5322, however, defines e-mail addresses and their syntax in a highly technical manner. That level of detail suits the foundation of an Internet Standard, which must be liberal enough to allow every use case yet conservative enough to bind it in some formalism.
The software developer community, however, needs e-mail validation for the following reasons:
- to stave off unwanted spammers
- to ensure the user does not make inadvertent mistakes
- to ensure that the e-mail ID belongs to the actual person inputting it
Developers are not particularly interested in implementing a technically all-encompassing definition that allows every form of e-mail address (IP-address literals and all). A solution suited to their use case only needs to ensure that every legitimate e-mail holder can get through. The definition of "legitimate" differs vastly between the technical standpoint (the RFC 5322 way) and the usability standpoint (this solution). From the usability standpoint, validation aims to ensure that every e-mail address it accepts belongs to an actual person who uses it to communicate. This introduces another angle to the validation process: ensuring that the address is actually "in use", a requirement for which the RFC 5322 definition is clearly not sufficient.
In practice, then, the actual requirements boil down to two:
- To perform some very basic validation checks
- To ensure that the entered e-mail address is in use
The second requirement typically involves sending a standard confirmation e-mail to the entered address and authenticating the user based on the action it asks for (for example, clicking a link). This is the most widely used mechanism for verifying that an address is "in use". It does involve a round trip through the back-end server and is not a straightforward single-screen implementation; however, one cannot do away with it.
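A minimal sketch of the server side of that confirmation step, assuming a one-time random token embedded in the link. The names `start_verification`, `confirm`, `_pending` and the one-day expiry are hypothetical choices for illustration, and the in-memory dictionary stands in for whatever database a real back end would use. Actually sending the e-mail (e.g. via `smtplib`) is not shown.

```python
from __future__ import annotations

import secrets
import time

# Hypothetical in-memory store: token -> (e-mail address, expiry timestamp).
# A real back end would persist this in a database.
_pending: dict[str, tuple[str, float]] = {}

TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumption: links expire after one day


def start_verification(email: str) -> str:
    """Create a one-time token for `email`.

    The caller is expected to e-mail a link containing this token to the address.
    """
    token = secrets.token_urlsafe(32)  # cryptographically strong, URL-safe
    _pending[token] = (email, time.time() + TOKEN_TTL_SECONDS)
    return token


def confirm(token: str) -> str | None:
    """Return the verified address if the token is known and unexpired, else None."""
    entry = _pending.pop(token, None)  # pop: each token can be used only once
    if entry is None:
        return None
    email, expires_at = entry
    if time.time() > expires_at:
        return None
    return email
```

Calling `confirm` with a token from `start_verification` returns the address once; a second call with the same token, or a made-up token, returns `None`.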
The first requirement stems from developers not wanting totally "non-e-mail-like" strings to pass as an e-mail address. Such strings typically include blanks and strings without an "@" sign or without a domain name. Because domain names can have punycode (internationalized) representations, anyone who needs domain validation must engage in a full-fledged implementation that ensures a valid domain name. Given how basic this requirement is, validating for "<something>@<something>.<something>" is the only apt way of satisfying it.
A typical regex that can satisfy this requirement is:
^[^@\s]+@[^@\s]+\.[^@\s.]+$
The above regex uses Perl-compatible regular-expression syntax, which most programming languages support. Note that the dot must be escaped (\.); an unescaped dot matches any character. In words, the pattern reads:
<anything except whitespace and "@">@<anything except whitespace and "@">.<anything except whitespace, "@" and ".">
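A minimal Python sketch of this basic check, written with the dot escaped (`\.`) so it matches only a literal period, and with the middle part allowing dots so that subdomains pass, as the verbal description says. The function name `looks_like_email` is hypothetical. `re.fullmatch` is used because in Python `$` alone would also match just before a trailing newline.

```python
import re

# <local part>@<domain part>.<last label>
# \s excludes any whitespace; the last label also excludes dots.
BASIC_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def looks_like_email(value: str) -> bool:
    """Basic sanity check only: it does not prove the address exists or is in use."""
    # fullmatch requires the whole string to match, so "a@b.c\n" is rejected.
    return BASIC_EMAIL_RE.fullmatch(value) is not None
```

For example, `looks_like_email("first.last@mail.example.co.uk")` is `True`, while `"user@examplecom"` (no dot in the domain) and `"user example@example.com"` (contains a space) are rejected.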
Those who want to go one step deeper, toward a more complete implementation, can follow this validation methodology:
<e-mail local part>@<domain name>
For <e-mail local part>, follow the guidelines of the Universal Acceptance Steering Group (UASG) in UASG-026.
For <domain name>, use any domain-validation methodology available in the standard libraries of your programming language. For recent studies on the subject, see UASG-018A.
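A minimal sketch of the split-then-validate approach in Python, covering only the domain side. It splits at the last "@" (a quoted local part may itself contain "@"), converts the domain to its ASCII (punycode) form with Python's built-in `"idna"` codec, and then checks the usual letter-digit-hyphen label rules and length limits. The helper names are hypothetical, and this is a plausibility check, not full validation. Python's built-in codec implements IDNA 2003 (RFC 3490); for IDNA 2008 behaviour consider the third-party `idna` package instead.

```python
from __future__ import annotations

import re

# LDH rule for one ASCII label: letters, digits and hyphens,
# 1-63 characters, no leading or trailing hyphen.
_LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def split_address(address: str) -> tuple[str, str] | None:
    """Split into (local part, domain) at the last '@'; None if either is missing."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return local, domain


def domain_is_plausible(domain: str) -> bool:
    """Check the domain's ASCII form against basic DNS syntax rules."""
    try:
        # Built-in IDNA 2003 codec: "bücher.example" -> "xn--bcher-kva.example".
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:  # empty/over-long labels, prohibited characters
        return False
    if not ascii_domain or len(ascii_domain) > 253:
        return False
    labels = ascii_domain.split(".")
    if len(labels) < 2:  # require at least "<name>.<tld>"
        return False
    return all(_LABEL_RE.fullmatch(label) for label in labels)
```

Typical use is `parts = split_address(value)` followed by `domain_is_plausible(parts[1])` when `parts` is not `None`, with the local part checked separately against the UASG-026 guidance.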
Those interested in the overall process, and in the challenges and issues one may come across while implementing internationalized e-mail, can also go through the following RFCs:
- RFC 6530 (Overview and Framework for Internationalized Email)
- RFC 6531 (SMTP Extension for Internationalized Email)
- RFC 6532 (Internationalized Email Headers)
- RFC 6533 (Internationalized Delivery Status and Disposition Notifications)
- RFC 6855 (IMAP Support for UTF-8)
- RFC 6856 (Post Office Protocol Version 3 (POP3) Support for UTF-8)
- RFC 6857 (Post-Delivery Message Downgrading for Internationalized Email Messages)
- RFC 6858 (Simplified POP and IMAP Downgrading for Internationalized Email)