I am fetching song lyrics from an API and converting the lyrics string into an array of words, but `preg_replace` is behaving unexpectedly. While debugging with `var_dump`, I noticed that it reports a length of 10 for the string "you", which suggests something is wrong, and after that point `preg_replace` produces strange results.
This is my code:
$source = get_chart_lyrics_data("madonna", "frozen");
$pieces = explode("\n", $source);
$lyrics = array();
for ($i = 0; $i < count($pieces); $i++) {
    if ($i > 10) {
        $words = explode(" ", $pieces[$i]);
        foreach ($words as $_word) {
            if ($_word == "")
                continue;
            var_dump($_word);
            $word = strtolower($_word);
            var_dump($word);
            $word = trim($word);
            var_dump($word);
            $word = preg_replace("/[^A-Za-z ]/", '', $word);
            var_dump($word);
            // Initialise missing keys to avoid "undefined index" notices.
            $lyrics[$word] = isset($lyrics[$word]) ? $lyrics[$word] + 1 : 1;
        }
    }
}
These are the first 4 lines of output from this code:
string(10) “You”
string(10) “you”
string(10) “you”
string(8) “lyricyou”
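One input that reproduces all four numbers exactly is a string with markup in front of the word. The sketch below assumes, without confirmation from the post, that the API response contains an XML tag such as `<Lyric>` glued to "You". The tag adds 7 bytes (3 + 7 = 10), an HTML page hides the unknown tag so only "You" is visible, `trim()` leaves it alone because it only strips whitespace, and the letters-only pattern removes just the angle brackets, leaving "lyricyou" (8 bytes).

```php
<?php
// Hypothetical raw token (an assumption): an XML tag fused to the first word.
// In an HTML page the browser treats <Lyric> as an unknown tag and hides it,
// so the var_dump output appears to show just "You".
$_word = "<Lyric>You";

var_dump($_word);                 // string(10) "<Lyric>You"
$word = strtolower($_word);
var_dump($word);                  // string(10) "<lyric>you"
$word = trim($word);              // trim() only removes surrounding whitespace
var_dump($word);                  // string(10) "<lyric>you"
$word = preg_replace("/[^A-Za-z ]/", '', $word);  // removes only '<' and '>'
var_dump($word);                  // string(8) "lyricyou"
```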
Why does `var_dump` report a length of 10 for "you"? And why is `preg_replace` behaving like that?
Thanks.
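When `var_dump` and the rendered page disagree, it helps to look at the string in several ways at once. Below is a minimal debugging sketch with a hypothetical helper, `inspect_string()`: `strlen()` gives the byte count that `var_dump` shows, a `/u` regex counts UTF-8 characters without needing the optional mbstring extension, and double `htmlentities()` makes hidden tags and special characters print as visible text in a browser. If bytes and characters agree (as for the hypothetical `<Lyric>You` below), the extra length is not a multi-byte issue but real characters that the page is hiding.

```php
<?php
// Hypothetical debugging helper: report what a string really contains.
function inspect_string(string $s): array
{
    return array(
        // Byte length, the same number var_dump() reports.
        'bytes'   => strlen($s),
        // UTF-8 code points: with /u, "." matches one whole character
        // (/s lets it match newlines too). Returns 0 for invalid UTF-8.
        'chars'   => (int) preg_match_all('/./su', $s),
        // First pass turns markup and special characters into entities,
        // second pass escapes the "&" so a browser shows the entity names as text.
        'escaped' => htmlentities(htmlentities($s)),
    );
}

print_r(inspect_string("<Lyric>You"));  // bytes 10, chars 10: hidden markup
print_r(inspect_string("caf\u{e9}"));   // bytes 5, chars 4: multi-byte "é"
```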
`var_dump` returns the number of bytes, not the number of characters. Can you show the original string, or better, where it comes from? – Ames
Try `echo htmlentities(htmlentities($word));` to see if there are any special characters or something. – C
`var_dump` output; I'd assume they just turned "fancy" while copy and pasting into SO. – Relational
`preg_replace('/[^\pL ]+/u', '', $word);` – Ames
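Ames's pattern is the Unicode-aware form of the original character class: `\pL` matches any letter and the `/u` modifier tells PCRE to treat the subject as UTF-8, so accented letters are kept instead of being dropped. On its own it would still glue a tag name onto the word ("<lyric>you" still becomes "lyricyou"), so the sketch below also runs `strip_tags()` over the whole text before splitting. The helper name `count_words`, the `strip_tags()` step and the sample strings are assumptions added here, not part of the thread.

```php
<?php
// Hypothetical helper: turn raw lyrics text into a word => count map.
function count_words(string $source): array
{
    // Assumption: the text may contain XML/HTML tags. Remove them from the whole
    // string first so a tag name is never glued onto a word.
    $text = strip_tags($source);

    $lyrics = array();
    // Split on any run of whitespace (spaces, tabs, newlines), skipping empty pieces.
    foreach (preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $_word) {
        // With the default locale, strtolower() only changes ASCII letters;
        // multi-byte letters pass through unchanged.
        $word = strtolower($_word);
        // Ames's pattern: drop everything that is not a Unicode letter or a space.
        $word = preg_replace('/[^\pL ]+/u', '', $word);
        if ($word === '') {
            continue; // the token was only punctuation or digits
        }
        // Initialise missing keys to avoid "undefined index" notices.
        $lyrics[$word] = isset($lyrics[$word]) ? $lyrics[$word] + 1 : 1;
    }
    return $lyrics;
}

print_r(count_words("<Lyric>You know, you KNOW</Lyric>")); // you => 2, know => 2
print_r(count_words("D\u{e9}j\u{e0} vu!"));                // déjà => 1, vu => 1
```

Stripping tags from the whole string rather than from each word matters for tags with attributes, because splitting on spaces would cut such a tag into pieces that `strip_tags()` no longer recognises.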