I want to be able to detect (using regular expressions) whether a string contains Hebrew characters, in both UTF-8 and ISO-8859-8, in PHP. Thanks!
According to a map of the ISO-8859-8 character set, the byte range 0xE0 to 0xFA holds the 27 Hebrew letters (the 22 letters plus the 5 final forms). You could check for those bytes in a character class:
[\xE0-\xFA]
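A minimal sketch of that ISO-8859-8 check, assuming the string still holds raw ISO-8859-8 bytes (it has not been converted to UTF-8). The function name contains_hebrew_iso8859_8 is hypothetical. Note that bytes 0xE0 to 0xFA also occur inside ordinary UTF-8 text (for example as lead bytes of three-byte sequences such as the euro sign), so this check is only meaningful when you already know the input is ISO-8859-8.

```php
<?php
// Hypothetical helper: detect Hebrew letters in a string of ISO-8859-8 bytes.
// Assumption: one byte == one character (input is NOT UTF-8).
// No /u modifier, so PCRE matches raw bytes and \xE0-\xFA are byte values.
function contains_hebrew_iso8859_8(string $bytes): bool
{
    return preg_match('/[\xE0-\xFA]/', $bytes) === 1;
}

// "\xF9\xEC\xE5\xED" is shin, lamed, vav, final mem ("shalom") in ISO-8859-8.
var_dump(contains_hebrew_iso8859_8("shalom \xF9\xEC\xE5\xED")); // bool(true)
var_dump(contains_hebrew_iso8859_8("plain ASCII"));             // bool(false)
```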
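For UTF-8 text, the Hebrew characters (cantillation marks, vowel points, letters, ligatures and punctuation) sit at Unicode code points U+0591 to U+05F4. PCRE has no \u escape, so in a PHP regex the code points are written as \x{...}, and you could detect that range with:
[\x{0591}-\x{05F4}]
Here's an example of a regex match in PHP. The u modifier is required so that PCRE reads the subject as UTF-8 and accepts code points above 0xFF:
echo preg_match('/[\x{0591}-\x{05F4}]/u', $string);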
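A small sketch wrapping that range check, under the assumption that the input is supposed to be UTF-8. The name contains_hebrew_range is hypothetical. With the u modifier, preg_match returns false (not 0) when the subject is not valid UTF-8, for example when ISO-8859-8 bytes are passed in by mistake; throwing an exception in that case is a design choice here, not something the original answer specifies.

```php
<?php
// Hypothetical helper: detect code points U+0591..U+05F4 in a UTF-8 string.
function contains_hebrew_range(string $utf8): bool
{
    $result = preg_match('/[\x{0591}-\x{05F4}]/u', $utf8);
    if ($result === false) {
        // With /u, preg_match returns false for invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result === 1;
}

// "\u{05E9}\u{05DC}\u{05D5}\u{05DD}" is the UTF-8 string "שלום".
var_dump(contains_hebrew_range("Hello \u{05E9}\u{05DC}\u{05D5}\u{05DD}")); // bool(true)
var_dump(contains_hebrew_range("Hello"));                                  // bool(false)
```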
If your PHP file is encoded in UTF-8 (as it should be when it contains Hebrew), you can use the Unicode script property \p{Hebrew} with the u modifier:
$string="אבהג";
echo preg_match("/\p{Hebrew}/u", $string);
// output: 1
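One difference between the two UTF-8 approaches, shown as a small sketch: \p{Hebrew} matches by Unicode script, so it also covers most of the Hebrew presentation forms around U+FB1D to U+FB4F (for example U+FB2A, HEBREW LETTER SHIN WITH SHIN DOT), which fall outside the explicit U+0591 to U+05F4 range. Which behaviour you want depends on your data; the variable name below is just for illustration.

```php
<?php
// U+FB2A HEBREW LETTER SHIN WITH SHIN DOT (a presentation form).
$shinWithDot = "\u{FB2A}";

// Script property: matches any character whose Unicode script is Hebrew.
var_dump(preg_match('/\p{Hebrew}/u', $shinWithDot));          // int(1)

// Explicit range from the Hebrew block only: misses the presentation form.
var_dump(preg_match('/[\x{0591}-\x{05F4}]/u', $shinWithDot)); // int(0)
```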
Here's a small function to check whether the first character of a UTF-8 string is a Hebrew letter:
function IsStringStartsWithHebrew($string)
{
    return (strlen($string) > 1 &&                           // a Hebrew letter takes two bytes in UTF-8
        ord($string[0]) == 215 &&                            // first byte is 0xD7 (binary 110-10111)
        ord($string[1]) >= 144 && ord($string[1]) <= 170     // second byte 0x90..0xAA: U+05D0 (alef) to U+05EA (tav)
    );
}
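Since the question asks for regular expressions, here is a sketch of the same test written as regexes; both function names are hypothetical. The byte version is a direct translation of the ord() checks (no u modifier, so \xD7 and [\x90-\xAA] are raw bytes). The code point version matches the same letters U+05D0 to U+05EA, but only for valid UTF-8 input; on invalid UTF-8, preg_match returns false and the function returns false.

```php
<?php
// Byte-level translation of IsStringStartsWithHebrew: first byte 0xD7,
// second byte 0x90..0xAA.
function starts_with_hebrew_bytes(string $s): bool
{
    return preg_match('/^\xD7[\x90-\xAA]/', $s) === 1;
}

// Code point version: first character is U+05D0 (alef) .. U+05EA (tav).
function starts_with_hebrew_letter(string $utf8): bool
{
    return preg_match('/^[\x{05D0}-\x{05EA}]/u', $utf8) === 1;
}

var_dump(starts_with_hebrew_letter("\u{05E9}alom")); // bool(true)
var_dump(starts_with_hebrew_letter("a\u{05E9}"));    // bool(false)
```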
good luck :)
First, would such a string even be useful? It would be a mix of two different character sets.
Both the Hebrew characters in ISO-8859-8 and every byte of a multibyte sequence in UTF-8 have a value ord($char) > 127. So what I would do is find all bytes with a value greater than 127, and then check whether they make sense as ISO-8859-8 or whether they make more sense as a UTF-8 sequence.
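A minimal sketch of that heuristic, assuming that any string which is valid UTF-8 should be treated as UTF-8 and anything else as ISO-8859-8. The function name detect_hebrew_encoding is hypothetical. It relies on the common idiom preg_match('//u', $s), which returns 1 for valid UTF-8 and false otherwise. This is a guess, not proof: in practice ISO-8859-8 Hebrew text is almost never valid UTF-8, because consecutive Hebrew letters (0xE0 to 0xFA) are not valid continuation bytes.

```php
<?php
// Hypothetical helper: returns 'UTF-8' or 'ISO-8859-8' if the string seems to
// contain Hebrew in that encoding, or null if no Hebrew is found.
function detect_hebrew_encoding(string $s): ?string
{
    // No byte above 127: no Hebrew in either encoding.
    if (!preg_match('/[\x80-\xFF]/', $s)) {
        return null;
    }
    // Valid UTF-8? Then look for Hebrew code points.
    if (preg_match('//u', $s) === 1) {
        return preg_match('/\p{Hebrew}/u', $s) === 1 ? 'UTF-8' : null;
    }
    // Otherwise assume single-byte ISO-8859-8 and look for the letter range.
    return preg_match('/[\xE0-\xFA]/', $s) === 1 ? 'ISO-8859-8' : null;
}

var_dump(detect_hebrew_encoding("\u{05E9}\u{05DC}\u{05D5}\u{05DD}")); // string(5) "UTF-8"
var_dump(detect_hebrew_encoding("\xF9\xEC\xE5\xED"));                 // string(10) "ISO-8859-8"
```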
function is_hebrew($string)
{
return preg_match("/\p{Hebrew}/u", $string);
}