I can see this is quite an old question, but it is still among the top results when searching for a way to check whether a string contains only whitespace characters.
PHP's trim function doesn't actually handle all possible whitespace characters, and neither do other functions like ctype_space. By default, trim strips only:
" " (ASCII 32 (0x20)), an ordinary space.
"\t" (ASCII 9 (0x09)), a tab.
"\n" (ASCII 10 (0x0A)), a new line (line feed).
"\r" (ASCII 13 (0x0D)), a carriage return.
"\0" (ASCII 0 (0x00)), the NUL-byte.
"\v" (ASCII 11 (0x0B)), a vertical tab.
(all copied from https://www.php.net/manual/en/function.trim.php)
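A quick way to see the gap, assuming PHP 8+ (where the LC_CTYPE locale defaults to "C", so ctype_space() only recognises the ASCII bytes " \t\n\r\v\f"): a form feed passes ctype_space() but is not stripped by trim(), and UTF-8 encoded spaces such as the no-break space or the em space are missed by both.

```php
<?php
// Sketch: compare trim() and ctype_space() on a few whitespace characters.
// Assumes PHP 8+, where LC_CTYPE defaults to "C", so ctype_space() only
// recognises the single ASCII bytes " \t\n\r\v\f".
$samples = [
    'form feed'      => "\f",
    'no-break space' => "\u{00A0}", // two bytes in UTF-8: C2 A0
    'em space'       => "\u{2003}", // three bytes in UTF-8: E2 80 83
];

foreach ($samples as $name => $s) {
    printf(
        "%-15s trim() empties it: %-4s ctype_space(): %s\n",
        $name,
        trim($s) === '' ? 'yes' : 'no',
        ctype_space($s) ? 'yes' : 'no'
    );
}
```

With the default settings, all three samples survive trim() unchanged, and only the form feed passes ctype_space().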
Wikipedia lists 31 whitespace-related characters: 25 with the Unicode property White_Space=yes, plus 6 related characters that are marked White_Space=no. That's why the table below is split into two groups; decide for yourself whether you need the second one.
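For comparison, if you are happy to rely on PCRE and expect valid UTF-8 input, both groups can also be written as regular-expression character classes. This is a minimal sketch, not the approach used in the rest of this answer; isWhitespaceRegex() is a hypothetical name, not a PHP built-in.

```php
<?php
/**
 * Sketch of a PCRE-based check, assuming the input is UTF-8.
 * isWhitespaceRegex() is a hypothetical helper, not part of PHP.
 *
 * Group 1: the 25 characters with White_Space=yes.
 * Group 2 (optional): the 6 related characters with White_Space=no.
 */
function isWhitespaceRegex(string $string, bool $includeRelated = false): bool
{
    // U+0009..U+000D covers tab, line feed, vertical tab, form feed and carriage return;
    // U+2000..U+200A covers en quad through hair space.
    $class = '\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}'
        . '\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}';
    if ($includeRelated) {
        // U+200B..U+200D are zero width space, non-joiner and joiner.
        $class .= '\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}';
    }
    // \A and \z anchor the whole string; the u modifier makes PCRE work on
    // UTF-8 code points, and preg_match() returns false on invalid UTF-8.
    return preg_match('/\A[' . $class . ']+\z/u', $string) === 1;
}
```

Passing true as the second argument also accepts the zero-width characters from the second group. Like the function below, it returns false for an empty string because of the + quantifier.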
To check whether a string contains only whitespace, iterate over it character by character:
// BE AWARE: this assumes the text is UTF-8 encoded (which is probably true in most cases)
/**
 * Find the end of the UTF-8 character that starts at $pointer
 * https://mcmap.net/q/281060/-how-to-iterate-utf-8-string-in-php
 * @param string $string UTF-8 string
 * @param int $pointer Start of the character (byte offset)
 * @param int $nextLetter Reference in which to save the offset where the next character starts
 * @return string|false The whole character, or false if $pointer is past the end of the string
 */
function get(string $string, int $pointer, int &$nextLetter): string|false
{
    if (!isset($string[$pointer])) {
        return false;
    }
    $char = ord($string[$pointer]);
    if ($char < 128) {
        // Single-byte (ASCII) character
        $nextLetter = $pointer + 1;
        return $string[$pointer];
    }
    // The lead byte tells how many bytes the character occupies
    if ($char < 224) {
        $bytes = 2;
    } elseif ($char < 240) {
        $bytes = 3;
    } else {
        $bytes = 4;
    }
    $str = substr($string, $pointer, $bytes);
    $nextLetter = $pointer + $bytes;
    return $str;
}
function isWhitespace(string $string): bool
{
    if (strlen($string) === 0) {
        return false;
    }
    // https://en.wikipedia.org/wiki/Whitespace_character
    $table = [
        // Unicode characters with property White_Space=yes
        "\u{0009}" => true, "\u{000A}" => true, "\u{000B}" => true, "\u{000C}" => true,
        "\u{000D}" => true, "\u{0020}" => true, "\u{0085}" => true, "\u{00A0}" => true,
        "\u{1680}" => true, "\u{2000}" => true, "\u{2001}" => true, "\u{2002}" => true,
        "\u{2003}" => true, "\u{2004}" => true, "\u{2005}" => true, "\u{2006}" => true,
        "\u{2007}" => true, "\u{2008}" => true, "\u{2009}" => true, "\u{200A}" => true,
        "\u{2028}" => true, "\u{2029}" => true, "\u{202F}" => true, "\u{205F}" => true,
        "\u{3000}" => true,
        // Related Unicode characters with property White_Space=no
        "\u{180E}" => true, "\u{200B}" => true, "\u{200C}" => true, "\u{200D}" => true,
        "\u{2060}" => true, "\u{FEFF}" => true,
    ];
    $nextLetter = $i = 0;
    // Iterate over the string and cut it into proper UTF-8 characters.
    // We have to do this to get whole characters rather than characters chopped into bytes:
    // multi-byte characters occupy several bytes, so splitting the text one byte at a time
    // with substr() only returns garbage for anything longer than 1 byte.
    //
    // If you have the mbstring extension installed you might want to use mb_substr(), but it
    // will be a lot slower, because for each cut it has to walk the string again from the start
    // to find the right character offset. Have a read:
    // https://www.php.net/manual/en/function.mb-substr.php - note from `qbolec at gmail dot com`
    while (($letter = get($string, $i, $nextLetter)) !== false) {
        // Quick exit as soon as any character is not whitespace, to avoid further iterations
        if (!isset($table[$letter])) {
            return false;
        }
        $i = $nextLetter;
    }
    return true;
}
Usage:
isWhitespace("\t\n\r"); // true
isWhitespace("\td\n\r"); // false
isWhitespace(""); // false
// em space, en quad and thin space (stored as UTF-8)
isWhitespace("\u{2003}\u{2000}\u{2009}"); // true
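If you'd rather not decode the bytes by hand and don't have mbstring, PCRE can split a UTF-8 string into characters in a single pass (with mbstring on PHP 7.4+, mb_str_split() is another single-pass option). A minimal sketch, assuming invalid UTF-8 should simply count as "not whitespace"; isWhitespaceSplit() is a hypothetical name, and only the first group of the table is included.

```php
<?php
/**
 * Alternative sketch that lets PCRE do the UTF-8 decoding instead of a
 * hand-written get(). isWhitespaceSplit() is a hypothetical name.
 * preg_split('//u', ...) cuts the string into code points in one pass and
 * returns false if the input is not valid UTF-8.
 */
function isWhitespaceSplit(string $string): bool
{
    if ($string === '') {
        return false;
    }
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // invalid UTF-8
    }
    // Only the White_Space=yes group; append the second group if you need it.
    $table = array_fill_keys([
        "\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}", "\u{0020}",
        "\u{0085}", "\u{00A0}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
        "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}", "\u{2008}",
        "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}", "\u{202F}", "\u{205F}",
        "\u{3000}",
    ], true);

    foreach ($chars as $char) {
        // Same early exit as in isWhitespace()
        if (!isset($table[$char])) {
            return false;
        }
    }
    return true;
}
```

Unlike the byte-walking version, this one rejects malformed UTF-8 instead of chopping it into arbitrary chunks.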
To check whether a string is empty you can use the empty language construct (note that it also treats the string "0" as empty):
if (empty($string)) {
    // Is empty!
}
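A short sketch of why a strict comparison can be safer than empty() for strings: empty() treats several values as empty, including the string "0", while comparing with '' only matches a zero-length string.

```php
<?php
// Sketch: empty() vs. a strict comparison with the empty string.
$inputs = ['', '0', ' ', "\t"];

foreach ($inputs as $s) {
    printf(
        "%-6s empty(): %-5s === '': %s\n",
        json_encode($s),
        var_export(empty($s), true),
        var_export($s === '', true)
    );
}
```

Only the first input is a zero-length string, yet empty() also reports true for "0".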
If, like me, you don't want to split strings into proper UTF-8 characters manually in every project, you might want to use my library: https://github.com/Mortimer333/Content
empty() is nearly useless for such things; empty(0) is true. Try strlen() == 0 instead. – Xylol