PHP trim and space not working

I have some data imported from a csv. The import script grabs all email addresses in the csv and after validating them, imports them into a db.

A client has supplied this csv, and some of the emails seem to have a space at the end of the cell. No problem, trim that sucker off... nope, won't work.

The space seems not to be a regular space and isn't being removed, so a bunch of the emails fail validation.

Question: Any way I can actually detect what this erroneous character is, and how I can remove it?

Not sure if it's some funky encoding or something else going on, but I don't fancy going through and removing them all manually! If I UTF-8 encode the string first, it shows this character as a:

Â

Stave answered 18/8, 2013 at 14:4 Comment(2)
Have you tried var_dump(ord(substr($email, -1))); and then passing that character (using \xHEX syntax) to trim()? – Caudle
There is also such a thing as the ideographic space (U+3000). – Corbicula
G
44

If that "space" is not affected by trim(), the first step is to identify it.

Use urlencode() on the string. It percent-encodes every byte except ASCII letters, digits, and the characters -_. (a plain space becomes +), so the hex code of the offending character shows up immediately. Depending on what you discover, you can act accordingly or update your question to get additional help.
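
As a rough illustration of that inspection step, here is a minimal sketch. The sample addresses are made up for the example; they carry a trailing non-breaking space in two encodings that commonly show up in CSV exports, and bin2hex() and ord() are shown as alternative ways to look at the raw byte.

```php
<?php
// Hypothetical sample values: the same address with a trailing
// non-breaking space (NBSP) in two different encodings.
$latin1 = "user@example.com\xA0";      // NBSP as a single ISO-8859-1 byte
$utf8   = "user@example.com\xC2\xA0";  // NBSP (U+00A0) encoded as UTF-8

echo urlencode($latin1), "\n"; // user%40example.com%A0
echo urlencode($utf8), "\n";   // user%40example.com%C2%A0

// bin2hex() and ord() expose the raw value of the last byte directly.
echo bin2hex(substr($latin1, -1)), "\n"; // a0
echo ord(substr($latin1, -1)), "\n";     // 160
```

Seeing %A0 on its own suggests a single-byte encoding such as ISO-8859-1, while %C2%A0 suggests the data is already UTF-8.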

Guimpe answered 18/8, 2013 at 14:56 Comment(4)
Okay, so it's coming out as %A0 after a urlencode, which is an nbsp, which should get trimmed, right? But it isn't :( – Stave
The non-breaking space is not on the list of characters that get trimmed. If you want it removed, you have to add it to the list of characters yourself. See the documentation of trim: de1.php.net/trim – Guimpe
I didn't know how to add the non-breaking space to the list of characters. I tried trim($value, urlencode("%A0")) and it worked. – Refrigeration
That should probably read urldecode("%A0"), since it is the inverse operation. PHP lets you put raw bytes into double-quoted strings with the \x escape sequence, so here the string would be "\xA0": trim($value, "\xA0"). The chr() function, which takes an integer, does the same: trim($value, chr(0xA0)). 0xA0 is the hexadecimal notation for the integer whose decimal notation is 160. Note that this simply adds the byte and does not respect any character encoding. – Guimpe
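
To make the byte-level point from the comments concrete, here is a small sketch. The variable names and sample strings are invented for the example. It keeps trim()'s default characters and adds the 0xA0 byte, which is reasonable for single-byte data, and then shows why the same character list can be risky on UTF-8 input.

```php
<?php
// trim()'s default characters plus the single NBSP byte (0xA0).
// Passing a character list replaces the defaults, so they are repeated here.
$chars = " \t\n\r\0\x0B\xA0";

$latin1 = "user@example.com\xA0";
var_dump(trim($latin1, $chars)); // string(16) "user@example.com"

// chr(0xA0) and chr(160) build the same one-byte string as "\xA0".
var_dump(chr(0xA0) === "\xA0" && chr(160) === "\xA0"); // bool(true)

// Caveat for UTF-8 input: trim() works on bytes, so it can cut a
// multi-byte character in half. "à" is the bytes C3 A0 in UTF-8.
$utf8 = "voil\xC3\xA0";
var_dump(bin2hex(trim($utf8, $chars))); // string(10) "766f696cc3"
```

In the last case the trailing A0 byte of "à" is stripped and a dangling C3 is left behind, so for UTF-8 data an encoding-aware approach is probably safer.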
R
11

Replace all UTF-8 spaces with standard spaces and then do the trim!

$string = preg_replace('/\s/u', ' ', $string);
echo trim($string);

This is it.
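
A variation on the same idea, as a sketch only: the helper name unicode_trim() is hypothetical, and the pattern strips Unicode whitespace and separators from the ends instead of replacing them everywhere. One thing to watch for is that the /u modifier requires valid UTF-8; given a lone \xA0 byte (as in the question), preg_replace() returns null rather than a string.

```php
<?php
/**
 * Hypothetical helper: strip Unicode whitespace from both ends of a string.
 * Returns null when $value is not valid UTF-8, because preg_replace()
 * fails on such input when the /u modifier is used.
 */
function unicode_trim(string $value): ?string
{
    // \s covers ordinary whitespace; \p{Z} covers Unicode separators
    // such as U+00A0 (no-break space) and U+3000 (ideographic space).
    return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $value);
}

var_dump(unicode_trim("user@example.com\xC2\xA0"));      // string(16) "user@example.com"
var_dump(unicode_trim("\xE3\x80\x80user@example.com ")); // string(16) "user@example.com"
var_dump(unicode_trim("user@example.com\xA0"));          // NULL (invalid UTF-8)
```

A null result is a hint that the data needs converting to UTF-8 before any Unicode-aware cleanup.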

Richmound answered 19/5, 2022 at 11:51 Comment(0)
C
3

I had a similar problem, also loading emails from CSVs and having issues with "undetectable" whitespaces.

Resolved it by replacing the most common urlencoded whitespace chars with ''. This might help if you can't use mb_detect_encoding() and/or iconv().

    $urlEncodedWhiteSpaceChars = '%81,%7F,%C5%8D,%8D,%8F,%C2%90,%C2,%90,%9D,%C2%A0,%A0,%C2%AD,%AD,%08,%09,%0A,%0D';
    $temp = explode(',', $urlEncodedWhiteSpaceChars); // turn them into a temp array so we can loop across
    $email_address = urlencode($row['EMAIL_ADDRESS']);
    foreach ($temp as $v) {
        $email_address = str_replace($v, '', $email_address); // replace the current sequence with nothing
    }
    $email_address = urldecode($email_address); // undo the urlencode()

Note that this does NOT strip the 'normal' space character and that it removes these whitespace chars from anywhere in the string - not just start or end.

Conceit answered 19/1, 2016 at 2:3 Comment(0)
G
1

In most of the cases a simple strip_tags($string) will work.

If the above doesn't work, then you should try to identify the characters resorting to urlencode() and then act accordingly.

Gerard answered 10/11, 2014 at 11:48 Comment(2)
strip_tags() will never remove whitespace. – Guimpe
In this scenario trim() is not working! So using strip_tags() solves this particularly strange problem. – Gerard
U
0

I see a couple of possible solutions:

1) Get last char of string in PHP and check if it is a normal character (with regexp for example). If it is not a normal character, then remove it.

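// Drop the last byte if it is not an ASCII letter or digit.
if ($string !== '' && !preg_match('/[A-Za-z0-9]\z/', $string)) {
    $string = substr($string, 0, -1);
}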

2) Convert the character from UTF-8 to the encoding of your CSV file and use str_replace(). For example, if your CSV is encoded in ISO-8859-2:

echo iconv('UTF-8', 'ISO-8859-2', "Â"); 
Uund answered 18/8, 2013 at 14:14 Comment(1)
It seems to be an encoding issue. You should find out which character it is, otherwise you might get other errors too. Remember to be consistent in your use of encodings, for the database and so on. – Calise
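
One way to be consistent about encoding is to convert each value to UTF-8 at import time and clean it up afterwards. The sketch below assumes the CSV is ISO-8859-1; that is a guess based on the %A0 byte reported in this thread, so substitute the real source encoding. It also shows where the "Â" in the question comes from: the 0xA0 byte becomes the UTF-8 pair C2 A0, and 0xC2 displayed as Latin-1 is "Â".

```php
<?php
// Assumption: the CSV is ISO-8859-1 (Latin-1); replace with the real encoding.
$raw = "user@example.com\xA0"; // hypothetical cell value with a trailing NBSP byte

$utf8 = iconv('ISO-8859-1', 'UTF-8', $raw);
echo bin2hex(substr($utf8, -2)), "\n"; // c2a0

// Remove the UTF-8 NBSP sequence (anywhere in the string), then trim
// ordinary whitespace before validating.
$email = trim(str_replace("\xC2\xA0", '', $utf8));
var_dump($email);                                              // string(16) "user@example.com"
var_dump(filter_var($email, FILTER_VALIDATE_EMAIL) !== false); // bool(true)
```

Note that str_replace() removes the sequence anywhere in the value, not only at the ends, which is usually acceptable for email addresses.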
