PHP explode: split a string into words using a space as the delimiter
G

8

25
$str = "This is a    string";
$words = explode(" ", $str);

This works, but the extra spaces end up in the array as empty strings:

$words === array ('This', 'is', 'a', '', '', '', 'string');//true

I would prefer to have only the words, with no empty entries, and to keep the number of spaces between them in a separate array:

$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true

Just added: (1, 1, 4) means one space after the first word, one space after the second word and 4 spaces after the third word.

Is there any way to do it fast?

Thank you.

Glaciology answered 5/9, 2013 at 14:15 Comment(7)
#3432683 - Pucida
Do you want the number of spaces or the position of each space? - Pamphlet
@JasonMcCreary He wants the number of consecutive spaces in each space group: ' ' (1), ' ' (1), '    ' (4). - Barrios
Thank you. The number of spaces. - Glaciology
@Haradzieniec, I don't think you understand the difference. Number of spaces = 6. Which is not what you want. - Pamphlet
What do you mean by "which is not what you want"? You are right, the total number of spaces is 6. BUT I need the information about the spaces BETWEEN the words. - Glaciology
For the first part of the question, to ignore the spaces just use the trim function. - Applied
G
37

To split the string into an array, use preg_split:

$string = 'This is a    string';
$data   = preg_split('/\s+/', $string);

For the second part (counting the spaces):

$string = 'This is a    string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]
Glauce answered 5/9, 2013 at 14:17 Comment(3)
Where is the number of spaces that the asker expected? - Juliojulis
Thank you for your answer. However, you lose the information about the number of spaces in between. That's what I asked in the question (please see the bolded text). - Glaciology
@Glaciology I was just typing that, yes. Thank you, I've updated. - Glauce
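The two calls in this answer can be wrapped in one helper. This is only a minimal sketch: the function name splitWithGaps is hypothetical, and PREG_SPLIT_NO_EMPTY is added on the assumption that leading or trailing whitespace should not produce empty words.

```php
<?php
// Hypothetical helper: returns [words, lengths of each whitespace run].
function splitWithGaps(string $str): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    // Each match is one run of whitespace; its length is the gap size.
    preg_match_all('/\s+/', $str, $matches);
    $spaces = array_map('strlen', $matches[0]);

    return [$words, $spaces];
}

list($words, $spaces) = splitWithGaps('This is a    string');
print_r($words);  // This, is, a, string
print_r($spaces); // 1, 1, 4
```

Note that whitespace runs at the very start or end of the input are still counted in the second array, so it can have more entries than there are gaps between words.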
S
3

Here is one way: split the string with a single regex call that also captures the delimiters, then check each segment to see whether it is a captured delimiter (and therefore only whitespace) or a word:

$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$spaces = array();
$words = array_reduce( $temp, function( &$result, $item) use ( &$spaces) {
    if( strlen( trim( $item)) === 0) {
        $spaces[] = strlen( $item);
    } else {
        $result[] = $item;
    }
    return $result;
}, array());

You can see from this demo that $words is:

Array
(
    [0] => This
    [1] => is
    [2] => a
    [3] => string
)

And $spaces is:

Array
(
    [0] => 1
    [1] => 1
    [2] => 4
)
Sirdar answered 5/9, 2013 at 14:30 Comment(1)
Thank you very much for your answer. I've tested both your solution and Alma Do Mundo's / silkfire's. All of them work fine, but Alma Do Mundo's runs about two times faster. Thank you for your solution anyway. You can compare both if you want (please see my reply to my own question in a second). - Glaciology
B
1

You can use preg_split() for the first array:

$str   = 'This is a    string';
$words = preg_split('#\s+#', $str);

And preg_match_all() for the $spaces array:

preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);
Barrios answered 5/9, 2013 at 14:20 Comment(1)
1, 1, 4 means one space after the first word, one space after the second word and 4 spaces after the third word. - Glaciology
I
0

Another way to do it would be to use a foreach loop.

$str = "This is a    string";
$words = explode(" ", $str);
$spaces = array();
$others = array();
foreach ($words as $word) {
    if ($word == ' ') {
        array_push($spaces, $word);
    } else {
        array_push($others, $word);
    }
}
Imbecilic answered 5/9, 2013 at 14:22 Comment(2)
Thank you. However, it collects spaces, but it doesn't contain information about the number of spaces between the words. - Glaciology
This answer is provably incorrect. 3v4l.org/Tq66W - Mauceri
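The loop above never matches, because explode(" ", ...) returns empty strings for consecutive spaces, not ' '. One possible way to make an explode-based loop produce the counts is sketched below. The function name wordsAndGaps is hypothetical, and the sketch assumes that only literal space characters separate words and that leading or trailing spaces should be ignored.

```php
<?php
// Hypothetical helper: explode-based split that also counts spaces per gap.
function wordsAndGaps(string $str): array
{
    $words  = [];
    $spaces = [];
    $empties = 0; // empty pieces seen since the last word

    foreach (explode(' ', $str) as $piece) {
        if ($piece === '') {
            // Every extra space inside a run yields one empty piece.
            $empties++;
            continue;
        }
        if ($words !== []) {
            // n spaces between two words produce n - 1 empty pieces.
            $spaces[] = $empties + 1;
        }
        $words[] = $piece;
        $empties = 0;
    }

    return [$words, $spaces];
}

list($words, $spaces) = wordsAndGaps('This is a    string');
print_r($words);  // This, is, a, string
print_r($spaces); // 1, 1, 4
```

Tabs and newlines are not treated as separators here; they would stay inside the words.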
G
0

Here are the results of the performance tests:

$str = "This is a    string";

var_dump(time());

for ($i = 1; $i < 100000; $i++) {
    // Alma Do Mundo - the winner
    $rgData = preg_split('/\s+/', $str);

    preg_match_all('/\s+/', $str, $rgMatches);
    $rgResult = array_map('strlen', $rgMatches[0]); // [1,1,4]
}
print_r($rgData); print_r($rgResult);
var_dump(time());

for ($i = 1; $i < 100000; $i++) {
    // nickb
    $temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $spaces = array();
    $words = array_reduce($temp, function (&$result, $item) use (&$spaces) {
        if (strlen(trim($item)) === 0) {
            $spaces[] = strlen($item);
        } else {
            $result[] = $item;
        }
        return $result;
    }, array());
}

print_r($words); print_r($spaces);
var_dump(time());

int(1378392870)
Array ( [0] => This [1] => is [2] => a [3] => string ) Array ( [0] => 1 [1] => 1 [2] => 4 )
int(1378392871)
Array ( [0] => This [1] => is [2] => a [3] => string ) Array ( [0] => 1 [1] => 1 [2] => 4 )
int(1378392873)

Glaciology answered 5/9, 2013 at 15:7 Comment(1)
I am very surprised to see that two regex calls are somehow outperforming a single regex call. - Mauceri
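Because time() only has one-second resolution, the timestamps above give a coarse comparison. A finer measurement could look like the following sketch. The helper name timeIt is hypothetical, microtime(true) is used as the clock, and the second closure uses a plain foreach with ctype_space() instead of array_reduce, so it is not an exact rerun of the test above. Results depend on the PHP version and the machine, so no timings are shown.

```php
<?php
// Hypothetical helper: runs $fn $iterations times, returns elapsed seconds.
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = "This is a    string";

// Variant 1: two regex calls (preg_split + preg_match_all).
$twoCalls = timeIt(function () use ($str) {
    $words = preg_split('/\s+/', $str);
    preg_match_all('/\s+/', $str, $m);
    $spaces = array_map('strlen', $m[0]);
});

// Variant 2: one regex call, then sorting the pieces in PHP.
$oneCall = timeIt(function () use ($str) {
    $words = [];
    $spaces = [];
    $parts = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if (ctype_space($part)) {
            $spaces[] = strlen($part);
        } else {
            $words[] = $part;
        }
    }
});

printf("two regex calls: %.4f s\n", $twoCalls);
printf("one regex call:  %.4f s\n", $oneCall);
```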
M
0

Splitting with regex has been demonstrated well by earlier answers, but I think this is a perfect case for calling ctype_space() to determine which result array should receive the encountered value.

Code: (Demo)

$string = "This is a    string";

$words = [];
$spaces = [];

foreach (preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $s) {
    if (ctype_space($s)) {
        $spaces[] = strlen($s);
    } else {
        $words[] = $s;
    }
}

var_export([
    'words' => $words,
    'spaces' => $spaces
]);

Output:

array (
  'words' => 
  array (
    0 => 'This',
    1 => 'is',
    2 => 'a',
    3 => 'string',
  ),
  'spaces' => 
  array (
    0 => 1,
    1 => 1,
    2 => 4,
  ),
)

If you want to replace the piped constants used by preg_split(), you can just use 3 (Demo). This represents PREG_SPLIT_NO_EMPTY, which is 1, plus PREG_SPLIT_DELIM_CAPTURE, which is 2. Be aware that with this reduction in code width, you also lose code readability.

preg_split('~( +)~', $string, -1, 3)
Mauceri answered 14/7, 2021 at 7:40 Comment(0)
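The equivalence between the piped constants and the literal 3 can be checked directly. The snippet below is a small sketch; the variable names $named and $magic are only illustrative.

```php
<?php
// The flag constants are distinct bit values, so OR-ing them equals their sum here.
var_dump(PREG_SPLIT_NO_EMPTY);      // int(1)
var_dump(PREG_SPLIT_DELIM_CAPTURE); // int(2)
var_dump((PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) === 3); // bool(true)

$string = 'This is a    string';

// Same call written with named flags and with the literal value.
$named = preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$magic = preg_split('~( +)~', $string, -1, 3);

var_dump($named === $magic); // bool(true)
```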
C
0

What about this? Does someone care to profile this?

    $str = str_replace(["\t", "\n", "\r", "\0", "\v"], ' ', $str); // same whitespace set as trim(); "\v" is a vertical tab
    $words = explode(' ', $str);
    $words = array_filter($words); // drops the empty strings left by runs of spaces (note: it also drops a word that is exactly "0")
Courage answered 25/10, 2021 at 7:30 Comment(0)
L
-1

$financialYear = '2015-2016';

$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016
Lucia answered 15/4, 2015 at 8:57 Comment(1)
This doesn't resemble the question, and it is YEARS late. - Mauceri
