PHP - iterate on string characters
R

10

177

Is there a nice way to iterate on the characters of a string? I'd like to be able to do foreach, array_map, array_walk, array_filter etc. on the characters of a string.

Type casting/juggling didn't get me anywhere (it put the whole string into the array as a single element), and the best solution I've found is simply using a for loop to construct the array. It feels like there should be something better. I mean, if you can index into a string, shouldn't you be able to iterate over it as well?

This is the best I've got

function stringToArray($s)
{
    $r = array();
    for($i=0; $i<strlen($s); $i++) 
         $r[$i] = $s[$i];
    return $r;
}

$s1 = "textasstringwoohoo";
$arr = stringToArray($s1); //$arr now has character array

$ascval = array_map('ord', $arr);  // so I can do stuff like this
foreach ($arr as $curChar) {....}
// array_filter takes the array first, then the callback
$evenAsciiOnly = array_filter($arr, function($x) {return ord($x) % 2 === 0;});

Is there either:

A) A way to make the string iterable
B) A better way to build the character array from the string (and if so, how about the other direction?)

I feel like I'm missing something obvious here.

Reamonn answered 5/1, 2011 at 5:14 Comment(6)
Maybe you should say more about that you're trying to accomplish... it seems like there might be a better way to do it using normal string operations.Mesoderm
I don't have a real objective here, just a curiosity I was playing with. It seemed weird that even though you can index on strings you can't iterate. I was at a loss to even think up meaningful example uses, but I would still like to know if there is some way to iterate on the string's characters without constructing a character array explicitly.Reamonn
That's a good point though; obviously my examples are pretty shallow, i.e. mostly anything you'd do with array_filter in this sense could be better done with string or regex functions.Reamonn
Solving projecteuler.net/problem=20 might be an example (though somewhat contrived) use case.Means
one note, regarding for($i=0; $i<strlen($s); $i++) I would store the strlen($s) in a variable before looping, this way you won't call strlen() more than 1 timeVelda
String sanitation is a good example of when to use this. if you want to replace all occurrences of '%' with '[%]' you would just use str_replace. But if you want to replace all occurrences of '[' with '[[]' and all occurrences of ']' with '[]]' you would need to iterate through the string to test each character to prevent the replaces from clobbering each-other.Anteroom
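
The bracket-escaping case from the last comment could be sketched like this. The helper name escapeBrackets is hypothetical (not a PHP built-in), and going byte by byte is safe here only because '[' and ']' are single-byte ASCII characters:

```php
// Hypothetical helper: escape '[' as '[[]' and ']' as '[]]' in one pass,
// so the output of one replacement is never touched by the other.
function escapeBrackets(string $s): string
{
    $out = '';
    foreach (str_split($s) as $ch) {
        if ($ch === '[') {
            $out .= '[[]';
        } elseif ($ch === ']') {
            $out .= '[]]';
        } else {
            $out .= $ch;
        }
    }
    return $out;
}

echo escapeBrackets('a[b]'), PHP_EOL;
```

strtr($s, ['[' => '[[]', ']' => '[]]']) should give the same result without an explicit loop, since strtr with an array does not rescan text it has already replaced.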
V
249

Use str_split to iterate ASCII strings (since PHP 5.0)

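If your string contains only single-byte characters (e.g. plain ASCII, i.e. "English" text), then use str_split. It splits by byte, so it is only correct when one byte equals one character.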

$str = 'some text';
foreach (str_split($str) as $char) {
    var_dump($char);
}

Use mb_str_split to iterate Unicode strings (since PHP 7.4)

If your string might contain multibyte characters (e.g. "non-English" text encoded as UTF-8), then use mb_str_split, which splits by character rather than by byte.

$str = 'μυρτιὲς δὲν θὰ βρῶ';
foreach (mb_str_split($str) as $char) {
    var_dump($char);
}
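
Once the string is split, the array functions from the question apply directly, and implode() covers the other direction. A minimal sketch, assuming PHP 7.4+ with mbstring and a UTF-8 source file (the variable names are only for illustration):

```php
$str = 'héllo';

// Character-wise split: 5 elements. A byte-wise str_split() would give 6,
// because 'é' occupies two bytes in UTF-8.
$chars = mb_str_split($str, 1, 'UTF-8');
$bytes = str_split($str);

// array_map: upper-case each character.
$upper = array_map(fn ($c) => mb_strtoupper($c, 'UTF-8'), $chars);

// array_filter: keep only the unaccented vowels (original keys are preserved).
$vowels = array_filter($chars, fn ($c) => in_array($c, ['a', 'e', 'i', 'o', 'u'], true));

// The other direction: join the characters back into a string.
$roundTrip = implode('', $chars);

var_dump(count($chars), count($bytes), implode('', $upper), $roundTrip);
```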
Valedictory answered 5/1, 2011 at 5:20 Comment(6)
@Reamonn I don't know your application, but do take note that each entry in an array has a significant overhead (4bytes IIRC). Skip that, it is 'quite' way more: nikic.github.com/2011/12/12/…Phytohormone
str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string. - So str_split cannot work with UnicodeCarib
mb_str_split would be the multi-byte equivalent. $array = mb_str_split($your_string);Manning
Any reason why the loop isn't simplified to foreach (str_split($your_string) as $char)?Drub
Pay attention that str_split() will produce at least one element even in case of empty strings, which, on your context will produce at least one iteration in that case. This may be a good source of tricky bugs.Epimenides
@DemisPalmaツ True for PHP before 8.2. Since PHP 8.2 this bug is fixed. See PHP 8.2 upgrade notes.Drub
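
If the code has to run on PHP versions both before and after 8.2, the empty-string case from the comments above can be guarded explicitly. A small sketch; the helper name chars is hypothetical:

```php
// Hypothetical helper: always returns an empty array for an empty string,
// regardless of how the installed PHP version's str_split() treats ''.
function chars(string $s): array
{
    return $s === '' ? [] : str_split($s);
}

foreach (chars('') as $char) {
    echo "never reached\n";
}
var_dump(chars('ab'));
```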
A
129

Iterate string:

for ($i = 0; $i < strlen($str); $i++){
    echo $str[$i];
}
Archimage answered 24/10, 2016 at 9:30 Comment(7)
This seems like a better answer because it answers the question - i.e. how to iterate over a string as opposed to 'convert to array'.Stagg
LOL!!!!! Everything @OmarTariq. This is much more efficient than the answer provided.Neoterize
Just note that you're calling strlen() on each iteration. Not a terrible thing, since PHP has the length precalculated, but still a function call. If you have a need for speed, better save that in a variable before starting the loop.Individually
This is not good for multibyte strings, because here we're getting a byte offset, not a symbol.Stricklin
@OmarTariq "This is the answer. What is wrong with the world?" .... The wrong with the world is that the world has other languages than English, this function as alvery said will iterate the bytes in the string, not the characters.Halfblood
The fun thing is $string[-1] will return you 'g'. I thought it should have returned some index not found error. It's not just weird but a blunder in PHP (IMO)Bricklaying
I tried it on a string containing among others an utf8 character and it did not work : it seems to iterate over the bytes of the string instead of the characters of the string.Garfish
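
Following the comments above, a variant of the loop that reads the length once might look like this. It still walks the string byte by byte, so it is a sketch for single-byte text only:

```php
$str = 'some text';
$out = '';

// strlen() is evaluated once, in the loop initialiser, not on every pass.
for ($i = 0, $len = strlen($str); $i < $len; $i++) {
    $out .= $str[$i];   // $str[$i] is one byte, not necessarily one character
}

echo $out, PHP_EOL;

// Why this breaks on multibyte text: 'é' is one character but two bytes in UTF-8.
var_dump(strlen('é'));
```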
I
21

If your strings are in Unicode you should use preg_split with /u modifier

From comments in php documentation:

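# On PHP 7.4+ with mbstring, mb_str_split() is built in and cannot be redeclared,
# so only define this fallback when it is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split( $string ) { 
        # Split at all positions not after the start: ^ 
        # and not before the end: $ 
        return preg_split('/(?<!^)(?!$)/u', $string ); 
    } 
}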
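
A common shorter form of the same idea uses an empty pattern together with the PREG_SPLIT_NO_EMPTY flag. This is a sketch rather than the code from the documentation comment, and the helper name utf8Chars is hypothetical. Note that with the /u modifier preg_split returns false when the input is not valid UTF-8, so that case is worth checking:

```php
// Hypothetical helper: split a UTF-8 string into characters with PCRE.
function utf8Chars(string $s): array
{
    // '//u' matches between every pair of characters; NO_EMPTY drops the
    // empty pieces at the start and end.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $chars;
}

var_dump(utf8Chars('añb'));
```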
Incommodity answered 8/1, 2012 at 18:28 Comment(5)
For multibyte strings, mb_split is more reliable.Figurant
Citation required @FigurantGrayson
@Grayson It's been a couple years (and these days you should probably be using the stdlib mb_str_split if you're on PHP≥7.4 anyway), and I can't really recall what I meant there, but my guess would be that preg_split with /.../u is UTF-8 only (NOT 'Unicode', as OP says) while mb_split allows for arbitrary encoding (additionally, mb_split is explicitly designed for regex-splitting over multibyte strings so it might have some extra optimizations and such? and in general since it's purpose-built my default assumption is that it's more reliable and/or complete than a /u PCRE extension)Figurant
I am not personally aware of any differences between mb_str_split() and preg_split('//u', $string). I am just saying that it is important that we not perpetuate potentially false claims based on assumptions. If one technique is provably inferior to another, we should be able to substantiate this truth.Grayson
Ye! thanks for calling me out on that. Unfortunately it's a bit too late for me to edit the original comment but hopefully the follow up clears up what I meant; info from here and here btw since I hit charlimit on the previous comment.Figurant
G
14

You can also just access $s1 like an array, if you only need to access it:

$s1 = "hello world";
echo $s1[0]; // -> h
Gratianna answered 18/2, 2016 at 21:47 Comment(0)
R
9

Most of the answers forgot about non-English characters!

strlen counts BYTES, not characters. That is why it and its sibling functions work fine with English characters: those are stored in 1 byte each in both UTF-8 and ASCII. For anything else you need to use the multibyte string functions, mb_*.

This will work with any character encoded in UTF-8

// 8 characters in 12 bytes
$string = "abcdأبتث";

$charsCount = mb_strlen($string, 'UTF-8');
for($i = 0; $i < $charsCount; $i++){
    $char = mb_substr($string, $i, 1, 'UTF-8');
    var_dump($char);
}

This outputs

string(1) "a"
string(1) "b"
string(1) "c"
string(1) "d"
string(2) "أ"
string(2) "ب"
string(2) "ت"
string(2) "ث"
Reenareenforce answered 1/9, 2019 at 17:15 Comment(0)
P
8

For those who are looking for the fastest way to iterate over strings in PHP, I've prepared a benchmark.
The first method accesses a character directly by its position in brackets, treating the string like an array:

$string = "a sample string for testing";
$char = $string[4]; // equals m

I expected this first method to be the fastest, but I was wrong.
The second method (which is used in the accepted answer) converts the string to an array first:

$string = "a sample string for testing";
$string = str_split($string);
$char = $string[4]; // equals m

This method turned out to be faster, presumably because it indexes a real array instead of treating a string as one.

Calling the last line of each of the above methods 1,000,000 times led to these benchmark results:

Using string[i]
0.24960017204285 Seconds

Using str_split
0.18720006942749 Seconds

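In this test the second method took about 25% less time per access. Note that only the last line was timed, so the one-time cost of str_split is not included.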
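
For anyone who wants to repeat the measurement, a rough harness could look like the sketch below. It is not the script behind the numbers above. It assumes PHP 7.3+ for hrtime(), and unlike the figures above it counts the str_split() call in the second timing. Results will vary with PHP version and hardware, so no expected timings are shown.

```php
$string = 'a sample string for testing';
$n = 1000000;

// Method 1: read the byte at offset 4 straight from the string.
$t0 = hrtime(true);
for ($i = 0; $i < $n; $i++) {
    $char = $string[4];
}
$direct = hrtime(true) - $t0;

// Method 2: split once, then read element 4 of the resulting array.
$t0 = hrtime(true);
$array = str_split($string);
for ($i = 0; $i < $n; $i++) {
    $char = $array[4];
}
$viaArray = hrtime(true) - $t0;

// hrtime(true) reports nanoseconds.
printf("string offset: %.4f s\n", $direct / 1e9);
printf("str_split:     %.4f s\n", $viaArray / 1e9);
```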

Protohuman answered 2/9, 2016 at 7:36 Comment(0)
O
6

Expanded from @SeaBrightSystems answer, you could try this:

$s1 = "textasstringwoohoo";
$arr = str_split($s1); //$arr now has character array
Overtask answered 15/5, 2015 at 7:55 Comment(1)
I disagree, this answer does add value, it gives a working example of how str_split might work in a PHP application. @Valedictory just links to the documentation, which is sometimes not that helpful when a person is trying to see how a function may work, given an example. Otherwise most SO answers would just be links to php.netMerrow
R
5

Hmm... There's no need to complicate things. The basics work great always.

    $string = 'abcdef';
    $len = strlen( $string );
    $x = 0;

Forward Direction:

while ( $len > $x ) echo $string[ $x++ ];

Outputs: abcdef

Reverse Direction:

while ( $len ) echo $string[ --$len ];

Outputs: fedcba

Retard answered 21/12, 2018 at 6:19 Comment(0)
S
3
// Unicode Codepoint Escape Syntax in PHP 7.0
$str = "cat!\u{1F431}";

// IIFE (Immediately Invoked Function Expression) in PHP 7.0
$gen = (function(string $str) {
    for ($i = 0, $len = mb_strlen($str); $i < $len; ++$i) {
        yield mb_substr($str, $i, 1);
    }
})($str);

var_dump(
    true === $gen instanceof Traversable,
    // PHP 7.1
    true === is_iterable($gen)
);

foreach ($gen as $char) {
    echo $char, PHP_EOL;
}
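
If the generator above is needed in more than one place, it could be wrapped in a named function instead of an IIFE. A sketch, assuming PHP 7.4+ with mbstring; the name iterChars is hypothetical:

```php
// Hypothetical helper: lazily yields one code point at a time.
function iterChars(string $s, string $encoding = 'UTF-8'): Generator
{
    for ($i = 0, $len = mb_strlen($s, $encoding); $i < $len; ++$i) {
        yield mb_substr($s, $i, 1, $encoding);
    }
}

// foreach works on the generator directly; array functions need a real
// array, which iterator_to_array() provides.
$codepoints = array_map(
    fn ($c) => mb_ord($c, 'UTF-8'),
    iterator_to_array(iterChars("cat!\u{1F431}"))
);

var_dump($codepoints);
```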
Screwy answered 29/8, 2016 at 5:4 Comment(0)
G
0

Depending on your needs/definition of "characters", it may be most helpful to keep multibyte "clusters" intact.

From PHP8.2.18, better handling of multi-component emojis has been implemented with grapheme_ functions.

Code: (Demo)

$text = 'Hey 🙇‍♂️ boy';
for ($i = 0, $len = grapheme_strlen($text); $i < $len; ++$i) {
    echo grapheme_substr($text, $i, 1) . "\n";
}

Output:

H
e
y
 
🙇‍♂️
 
b
o
y

Even using mb_ functions would have produced: (Demo)

H
e
y
 
🙇
‍
♂
️
 
b
o
y

To simplify this task, PHP 8.4 has added a new splitting function to the grapheme_ family: grapheme_str_split().

Code:

$text = 'Hey 🙇‍♂️ boy';
foreach (grapheme_str_split($text) as $g) {
    echo $g . "\n";
}
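
On installations older than PHP 8.4, or without the intl extension, one possible fallback is PCRE's \X escape, which matches an extended grapheme cluster. The sketch below uses a hypothetical helper name, graphemes, and prefers the built-in (listed in the PHP manual as grapheme_str_split()) when it exists. How multi-component emoji are grouped depends on the ICU or PCRE version in use, so no output is shown for them.

```php
// Hypothetical helper: split text into grapheme clusters.
function graphemes(string $text): array
{
    // Prefer the intl built-in when it is available (PHP 8.4+).
    if (function_exists('grapheme_str_split')) {
        $parts = grapheme_str_split($text);
        if ($parts !== false) {
            return $parts;
        }
    }
    // Fallback: \X matches one extended grapheme cluster in UTF-8 mode.
    return preg_match_all('/\X/u', $text, $m) ? $m[0] : [];
}

foreach (graphemes('Hey 🙇‍♂️ boy') as $g) {
    echo $g, "\n";
}
```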
Grayson answered 15/4 at 21:23 Comment(0)