Is FILTER_SANITIZE_EMAIL pointless if already using FILTER_VALIDATE_EMAIL?

I am just creating a registration form, and I am looking only to insert valid and safe emails into the database.

Several sites (including w3schools) recommend running FILTER_SANITIZE_EMAIL before FILTER_VALIDATE_EMAIL to be safe. However, this could turn an invalid submission into a valid email that might not be what the user intended. For example:

The user has the email address [email protected], but accidentally inserts jeff"@gmail.com.

FILTER_SANITIZE_EMAIL would remove the ", making the email jeff@gmail.com, which FILTER_VALIDATE_EMAIL would accept as valid even though it is not the user's actual email address.
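
To make the scenario concrete, here is a minimal sketch of the sanitize-then-validate order applied to the mistyped address from the example. The variable names are illustrative only.

```php
<?php
// The mistyped input from the example above.
$submitted = 'jeff"@gmail.com';

// Validating the raw input rejects it: filter_var() returns false.
$rawResult = filter_var($submitted, FILTER_VALIDATE_EMAIL);

// Sanitizing first strips the stray quote...
$sanitized = filter_var($submitted, FILTER_SANITIZE_EMAIL);

// ...so validating the sanitized copy accepts an address the user never typed.
$sanitizedResult = filter_var($sanitized, FILTER_VALIDATE_EMAIL);

var_dump($rawResult);        // bool(false)
var_dump($sanitized);        // string(14) "jeff@gmail.com"
var_dump($sanitizedResult);  // string(14) "jeff@gmail.com"
```

The first result is the one a registration form would want to act on: the input as typed is not a valid address.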

To avoid this problem, I plan to run only FILTER_VALIDATE_EMAIL (assuming I don't intend to output or process any emails declared invalid).

This will tell me whether or not the email is valid. If it is, there should be no need to pass it through FILTER_SANITIZE_EMAIL, because any illegal or unsafe characters would already have caused the email to be rejected as invalid, correct?

I also don't know of any email accepted by FILTER_VALIDATE_EMAIL that could be used for injection/XSS, because whitespace, parentheses () and semicolons would invalidate the email. Or am I wrong?

(note: I will be using prepared statements to insert the data in addition to this, I just wanted to clear this up)
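
As a hedged illustration of why prepared statements still matter here: to my knowledge, FILTER_VALIDATE_EMAIL accepts an apostrophe in the local part, and FILTER_SANITIZE_EMAIL leaves it in place too. The address below is a made-up example, not one from the question.

```php
<?php
// Hypothetical address containing an apostrophe (legal in the local part).
$email = "o'brien@example.com";

$validated = filter_var($email, FILTER_VALIDATE_EMAIL);
$sanitized = filter_var($email, FILTER_SANITIZE_EMAIL);

// Both filters hand the address back unchanged, apostrophe included, so
// neither one makes the value safe to concatenate into SQL or HTML.
var_dump($validated === $email);
var_dump($sanitized === $email);
```

In other words, the filters answer "is this shaped like an email address?", not "is this safe in every context?".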

Hix answered 3/9, 2011 at 1:52 Comment(4)
So, all of the answers below have actually proven FILTER_SANITIZE_EMAIL is indeed pointless, because we can just use FILTER_VALIDATE_EMAIL alone.Conciliar
Yes it is, because it tests whether the string matches the format of an email string. It returns false on failure, so any risk is neutralized at the point of attack.Stoller
I'd say the greatest contention with this question is characterizing the Sanitization filter as "pointless" in general (see question title). It is indeed useful, even necessary, to have this filter. The question would be better phrased to ask whether it is necessary to run the Sanitization filter on a string that already passes the Validation filter, and to remove language characterizing the filter itself as "useless". For clarity, I arrived here because I had the same curiosity. I am keeping both JIC. Also, if the validation routine ever changes, I don't want the backend to be compromised.Imprimis
@Imprimis international email addresses maybe? I'm pretty sure those would fail with FILTER_VALIDATE_EMAIL, but if you sanitized the address string to check first, then the validation check should pass. But I guess that might lead to false positives since you are technically now validating something different from the original string.Hawes

Here's how to insert only valid emails.

<?php
$original_email = 'jeff"@gmail.com';

$clean_email = filter_var($original_email,FILTER_SANITIZE_EMAIL);

if ($original_email === $clean_email && filter_var($original_email, FILTER_VALIDATE_EMAIL) !== false) {
   // Sanitizing changed nothing and the address validated,
   // so the original email is safe to insert.
   // Database insert code goes here.
}

FILTER_VALIDATE_EMAIL and FILTER_SANITIZE_EMAIL are both valuable filters, and they have different uses.

Validation tests whether the email is in a valid format. Sanitizing strips disallowed characters out of the email.

<?php
$email = "[email protected]"; 
$clean_email = "";

if (filter_var($email,FILTER_VALIDATE_EMAIL)){
    $clean_email =  filter_var($email,FILTER_SANITIZE_EMAIL);
} 

// Another implementation, added by request. This is the way I would
// suggest using the filters: clean the content, then make sure it is
// valid before you use it.

$email = "[email protected]"; 
$clean_email = filter_var($email,FILTER_SANITIZE_EMAIL);

if (filter_var($clean_email,FILTER_VALIDATE_EMAIL)){
    // email is valid and ready for use
} else {
    // email is invalid and should be rejected
}

PHP is open source, so questions like these are easily answered by reading the source.

Source for FILTER_SANITIZE_EMAIL:

/* {{{ php_filter_email */
#define SAFE        "$-_.+"
#define EXTRA       "!*'(),"
#define NATIONAL    "{}|\\^~[]`"
#define PUNCTUATION "<>#%\""
#define RESERVED    ";/?:@&="

void php_filter_email(PHP_INPUT_FILTER_PARAM_DECL)
{
    /* Check section 6 of rfc 822 http://www.faqs.org/rfcs/rfc822.html */
    const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]";
    filter_map     map;

    filter_map_init(&map);
    filter_map_update(&map, 1, allowed_list);
    filter_map_apply(value, &map);
}    
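
For readers who do not read C, the allowed-list step above can be approximated in PHP. This is only a sketch based on the character list in the quoted source; the function name is hypothetical, and the list in your PHP version may differ.

```php
<?php
/**
 * Hypothetical re-creation of the allowed-list step: drop every byte that is
 * not a letter, a digit, or one of !#$%&'*+-=?^_`{|}~@.[]
 */
function sanitize_email_sketch(string $value): string
{
    $pattern = '/[^A-Za-z0-9!#$%&\'*+\-=?^_`{|}~@.\[\]]/';
    // preg_replace() returns null only on a regex error; fall back to ''.
    return preg_replace($pattern, '', $value) ?? '';
}

var_dump(sanitize_email_sketch('jeff"@gmail.com'));
var_dump(sanitize_email_sketch('je ff(1)@gmail.com'));
```

Note that the step only deletes characters; it never checks that what remains is a well-formed address.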

Source for FILTER_VALIDATE_EMAIL:

void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

pcre       *re = NULL;
pcre_extra *pcre_extra = NULL;
int preg_options = 0;
int         ovector[150]; /* Needs to be a multiple of 3 */
int         matches;


/* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
if (Z_STRLEN_P(value) > 320) {
    RETURN_VALIDATION_FAILED
}

re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
if (!re) {
    RETURN_VALIDATION_FAILED
}
matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

/* 0 means that the vector is too small to hold all the captured substring offsets */
if (matches < 0) {
    RETURN_VALIDATION_FAILED
}

}
Overscrupulous answered 13/5, 2013 at 19:22 Comment(6)
ok, fair enough @Izkata. Answer updated so he can insert only valid emails.Overscrupulous
Can you please show an example that clearly shows how sanitizing an email would pass-through along with a validation? It seems your first block makes the email pass-through ONLY IF it can't be sanitized, or did I miss something obvious? And if this is so, why sanitize in the first place? Looks redundant.Parisi
ok, added an example that clearly shows how sanitizing should be used. In the first example I was focused on what they return vs how they should be used. Thanks for the feedback and suggestion.Overscrupulous
What's the point of altering the address entered by the user with filter_var($email,FILTER_SANITIZE_EMAIL)? If the provided e-mail address is invalid, it should be rejected, not blindly "fixed" by sanitization and validated afterwards.Hew
Read the code, it only inserts emails that are valid and match the original email address.Overscrupulous
This is interesting. Thank you for your detailed examples. I guess, though, that this does still beg the question: is there ever an example email that would pass validation, yet contain unsafe characters, such that it would not pass through sanitization unchanged? I think that is what OP is asking. To be fair, I came here because, though I am using both in my code, I had the same curiosity. BTW that validation regex (thanks for including it) is not trivial to decipher, and lends little insight to determining the answer to this curiosity. Programming is full of collectible gotchas,Imprimis

I read the same article and thought the same thing: Simply changing an invalid variable is not good enough. We need to actually tell the user that there was a problem, instead of just ignoring it. The solution, I think, is to compare the original to the sanitized version. I.e. to use the w3schools example, just add:

$cleanfield = filter_var($field, FILTER_SANITIZE_EMAIL);

if ($cleanfield !== $field){
    return FALSE;
}
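
A slightly fuller sketch of the same idea, returning a message that can be shown to the user instead of a bare FALSE. The helper name and the message wording are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: returns null when the address is acceptable,
 * or a message for the user when it is not.
 */
function email_problem(string $field): ?string
{
    $cleanfield = filter_var($field, FILTER_SANITIZE_EMAIL);

    // Sanitization changed the input, so it contained disallowed characters.
    if ($cleanfield !== $field) {
        return 'The email address contains characters that are not allowed.';
    }

    // The characters are fine, but the overall format may still be wrong.
    if (filter_var($field, FILTER_VALIDATE_EMAIL) === false) {
        return 'The email address is not in a valid format.';
    }

    return null;
}

var_dump(email_problem('jeff"@gmail.com'));
var_dump(email_problem('jeff@gmail'));
var_dump(email_problem('jeff@gmail.com'));
```

The input is never modified; the sanitized copy is used only to detect that something was wrong.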
Nels answered 7/3, 2013 at 13:22 Comment(0)

The "proper" way of doing this is asking for the user's email two times (which is common/good practice). But to answer your question, FILTER_SANITIZE_EMAIL is not pointless. It's a filter that sanitizes emails and it does its job well.

You need to understand that a filter that validates either returns true or false whereas a filter that sanitizes actually modifies the given variable. The two do not serve the same purpose.
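
A small sketch of what the two kinds of filter hand back. One caveat worth flagging: with filter_var(), a validate filter returns the filtered value on success and false on failure, rather than a strict true/false. The sample inputs are made up for illustration.

```php
<?php
// On success a validate filter returns the (possibly converted) value...
$email = filter_var('jeff@gmail.com', FILTER_VALIDATE_EMAIL); // a string
$int   = filter_var('42', FILTER_VALIDATE_INT);               // an integer

// ...and false on failure.
$bad = filter_var('not an email', FILTER_VALIDATE_EMAIL);

// Because a valid result can be falsy (integer 0 here), compare against
// false strictly instead of relying on truthiness.
$zero = filter_var('0', FILTER_VALIDATE_INT);
$zeroIsValid = ($zero !== false);

// A sanitize filter returns the input with disallowed characters removed.
$cleaned = filter_var('jeff"@gmail.com', FILTER_SANITIZE_EMAIL);

var_dump($email, $int, $bad, $zeroIsValid, $cleaned);
```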

Kurth answered 3/9, 2011 at 2:3 Comment(3)
The two do not serve the same purpose. Which is exactly what was stated in part of the question. This does not address what is being asked.Gregoriagregorian
filter_var with a validate filter such as FILTER_VALIDATE_EMAIL does not return true or false.Ulcerative
This is misleading. VALIDATE filters don't all return booleans. The numeric ones return numbers, for instance.Eyeless

Always use validation filters early, at the moment of input. Sanitization is better used late, at output, because it cleans the value before it reaches the user.
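
A rough sketch of that split, assuming the output context is HTML. It uses htmlspecialchars() as the output-side step, which is my reading of "sanitization at output" rather than something stated above; the function names are hypothetical.

```php
<?php
// Input boundary: reject anything that is not a well-formed address.
function accept_email(string $input): ?string
{
    $result = filter_var($input, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

// Output boundary: escape for the context being written to (HTML here).
function render_email(string $email): string
{
    return htmlspecialchars($email, ENT_QUOTES, 'UTF-8');
}

$email = accept_email("o'brien@example.com");
if ($email !== null) {
    echo render_email($email), PHP_EOL;
}
```

The stored value stays exactly as the user entered it; only the copy written into the page is escaped.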

Hippo answered 22/11, 2022 at 4:41 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Graphomotor
