PHP explode the string, but treat words in quotes as a single word
How can I explode the following string:

Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor

into

array("Lorem", "ipsum", "dolor sit amet", "consectetur", "adipiscing elit", "dolor")

so that the text in quotation marks is treated as a single word?

Here's what I have for now:

$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
$noquotes = str_replace("%22", "", $mytext);
$newarray = explode(" ", $noquotes);

but this puts every word into its own array element. How can I make the words inside quotation marks be treated as a single word?

Foolery asked 4/2, 2010 at 19:08 Comment(3)
This sounds like a job for Regex. - Julie
See also: An explode() function that ignores characters inside quotes? - Montparnasse
From 2009: Split words in string into an array without breaking phrases wrapped in double quotes - Churinga
90

You could use preg_match_all(...):

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);

which will produce:

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => "dolor sit amet"
            [3] => consectetur
            [4] => "adipiscing \"elit"
            [5] => dolor
        )

)

And as you can see, it also accounts for escaped quotes inside quoted strings.
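Note that the matches still carry their surrounding quotes, while the question asks for them to be gone. A minimal sketch of one way to post-process the matches, assuming a hypothetical helper named split_quoted() and assuming stripslashes() is an acceptable way to undo the backslash escapes:

```php
<?php
// Sketch only: split_quoted() is a hypothetical helper name, not part of PHP.
function split_quoted(string $text): array
{
    // Same pattern as in the answer: a quoted run (with escapes) or a bare word.
    preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        // Only unwrap tokens that start and end with a double quote.
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Drop the outer quotes; stripslashes() removes the escaping
            // backslashes, so \" becomes " and \\ becomes \.
            return stripslashes(substr($token, 1, -1));
        }
        return $token;
    }, $matches[0]);
}

$parts = split_quoted('Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor');
print_r($parts);
```

Be aware that stripslashes() strips every backslash escape inside a quoted token, not just \", so it may be too aggressive if the quoted text contains backslashes you want to keep.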

EDIT

A short explanation:

"           # match the character '"'
(?:         # start non-capture group 1 
  \\        #   match the character '\'
  .         #   match any character except line breaks
  |         #   OR
  [^\\"]    #   match any character except '\' and '"'
)*          # end non-capture group 1 and repeat it zero or more times
"           # match the character '"'
|           # OR
\S+         # match a non-whitespace character: [^\s] and repeat it one or more times
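If you would like to keep such an explanation next to the code, the same pattern can probably be written in PCRE extended mode (the x modifier), where whitespace and # comments inside the pattern are ignored. A sketch, assuming a nowdoc is used for the pattern (so no PHP string escaping applies) and ~ is chosen as the delimiter:

```php
<?php
// Same pattern as in the answer, written with the x (extended) modifier.
// Inside a nowdoc, \\ is two literal backslashes, which the regex engine
// reads as "one literal backslash". The delimiter "~" must not appear in
// the comments, because PHP looks for the closing delimiter first.
$pattern = <<<'REGEX'
~
  "             # opening double quote
  (?:           # group, repeated zero or more times:
      \\ .      #   a backslash followed by any character (an escape pair)
    |           #   or
      [^\\"]    #   any character other than a backslash or a double quote
  )*
  "             # closing double quote
|               # or
  \S+           # a run of non-whitespace characters (an unquoted word)
~x
REGEX;

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

This should match exactly the same tokens as the one-line version; it is only a different way of writing it.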

And in case of matching %22 instead of double quotes, you'd do:

preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
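Applied to the string from the question, that could look roughly like the sketch below. The helper name split_encoded_quotes() is made up for this example, and removing the %22 markers afterwards is my assumption about what the asker wants:

```php
<?php
// Sketch only: split_encoded_quotes() is a hypothetical helper name.
function split_encoded_quotes(string $text): array
{
    // Pattern from the answer: %22 ... %22 is one token, otherwise a bare word.
    preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        // Remove one leading and one trailing %22 from quoted tokens.
        if (preg_match('/^%22(.*)%22$/s', $token, $m)) {
            return $m[1];
        }
        return $token;
    }, $matches[0]);
}

$mytext = 'Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor';
$parts = split_encoded_quotes($mytext);
print_r($parts);
```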
Sadiron answered 4/2, 2010 at 19:18 Comment(15)
Is there a reason not to use preg_split instead of preg_match_all? It seems like a more natural fit IMO. - Aribold
That's awesome! I'll have to study the code for a bit to figure out what just happened! Thanks - Foolery
@prodigitalson: no, using preg_split(...) you cannot account for escaped characters. preg_match_all(...) "behaves" more like a parser, which is the more natural thing to do here. Besides, using preg_split(...), you'll need to look ahead on each space to see how many quotes are ahead of it, making it an O(n^2) operation: no problem for small strings, but it might increase the runtime when larger strings are involved. - Sadiron
@timofey, see my edit. Don't hesitate to ask for more clarification if it's not clear to you: you're the one maintaining the code, so you should understand it (and I'm more than happy to provide extra information if it's needed). - Sadiron
Thanks Bart K.! I was already searching Google for answers on that one :) - Foolery
But if the string is Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor (basically the quotation marks appear as %22), the following doesn't seem to work: preg_match_all('/%22(?:\\\\.|[^\\\\"])*%22|\S+/', $text, $matches); - Foolery
That's beginning to make sense! Thanks - Foolery
In single-quoted PHP strings the '\' won't escape, so you don't need \\\\ for one \. - Apodosis
Oh, that's not true. \ and ' still should be escaped. Sorry - Apodosis
Why is your solution doing this (pastebin.com/bhrnMGST) to this string: this has a \"quoted sentence\" inside - Misleading
@Bart Kiers does your solution apply to my example? - Misleading
@Bart Kiers Thanks! What if I have single quotes? - Misleading
@Bart Kiers Things have changed a little bit. Sorry about this. After using mysql_real_escape_string() I get this: this has a \\\'quoted sentence\\\' inside. So I need to account for those extra slashes (I don't know if it makes a difference) and single or double quotes. - Misleading
@Bart Kiers it wouldn't last 2 minutes. Haha. Give me one more hit of regex and I'll be gone. - Misleading
Preg split alternative: https://mcmap.net/q/339600/-split-string-on-spaces-except-words-in-quotes - Hover
90

This would have been much easier with str_getcsv().

$test = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
var_dump(str_getcsv($test, ' '));

Gives you

array(6) {
  [0]=>
  string(5) "Lorem"
  [1]=>
  string(5) "ipsum"
  [2]=>
  string(14) "dolor sit amet"
  [3]=>
  string(11) "consectetur"
  [4]=>
  string(15) "adipiscing elit"
  [5]=>
  string(5) "dolor"
}
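One thing to watch for: with a space as the delimiter, runs of several spaces come back as empty fields. A small sketch of a wrapper that drops them, assuming a hypothetical function name split_csv_style() and passing the enclosure and escape characters explicitly instead of relying on the defaults:

```php
<?php
// Sketch only: split_csv_style() is a hypothetical wrapper name.
function split_csv_style(string $text): array
{
    // Delimiter, enclosure and escape character are all passed explicitly.
    $fields = str_getcsv($text, ' ', '"', '\\');

    // Consecutive spaces yield empty fields (and an empty input yields a
    // single null), so drop those and reindex the array.
    return array_values(array_filter($fields, function ($field) {
        return $field !== null && $field !== '';
    }));
}

$parts = split_csv_style('Lorem  ipsum "dolor sit amet"   consectetur "adipiscing elit" dolor');
print_r($parts);
```

The filter also throws away a deliberately empty quoted field (""), so leave it out if empty values matter to you.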
Pastor answered 7/7, 2011 at 10:56 Comment(6)
This works on my development machine, but not on my production server. :-/ - Trio
str_getcsv requires PHP 5.3. - Ersatz
Be aware that it "ignores" the quotes. If you need them to be there in the split as well, then this won't work. - Aleksandr
I ran a speed test and preg_match_all is about 3-5 times quicker. Probably not an issue for most people, especially if you don't need the quotes (in which case this is much easier to use), but I think it's worth a mention. - Intussuscept
@Intussuscept care to share your tests? - Pastor
Nothing special, I just wrapped both in a for loop from 1 to 10000 and checked microtime before and after. Both are fast enough for single use, even at the test quantity, hence I mentioned it probably won't be a problem for most of us. - Intussuscept
4

You can also try this multiple explode function:

function multiexplode($delimiters, $string)
{
    // replace every delimiter with the first one, then split on that one
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);
Goa answered 25/5, 2013 at 9:10 Comment(1)
This answer is good, but if you ask it to split on spaces and quotes, it splits on spaces inside the quotes. - Jeffiejeffrey
2

I came here with a complex string splitting problem similar to this, but none of the answers here did exactly what I wanted - so I wrote my own.

I am posting it here just in case it is helpful to someone else.

This is probably a very slow and inefficient way to do it - but it works for me.

function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach($chars as $char)
    {
        if(in_array($char, $openers))
            $depth++;
        elseif(in_array($char, $closers))
            $depth--;
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                // we are outside a toggle block, enter it and increase the depth
                $depth++;

            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        if($depth < 0) $depth = 0;

        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;

    return $parts;
}

Usage is as follows. explode_adv takes 5 arguments:

  1. An array of characters that open a block - e.g. [, (, etc.
  2. An array of characters that close a block - e.g. ], ), etc.
  3. An array of characters that toggle a block - e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to work on.

This method probably has flaws - edits are welcome.
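For illustration, a call could look like the sketch below. The function body is repeated from above (comments trimmed) only so that the snippet runs on its own, and the input string is a made-up example:

```php
<?php
// explode_adv() repeated from the answer so this snippet is self-contained.
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false);
    $depth = 0;
    foreach ($chars as $char) {
        if (in_array($char, $openers)) {
            $depth++;
        } elseif (in_array($char, $closers)) {
            $depth--;
        } elseif (in_array($char, $togglers)) {
            // leave the toggle block if we are inside it, otherwise enter it
            $depth += $toggle_states[$char] ? -1 : 1;
            $toggle_states[$char] = !$toggle_states[$char];
        } else {
            $nextpart .= $char;
        }

        if ($depth < 0) $depth = 0;

        if (in_array($char, $delimiters) && $depth == 0 && !in_array($char, $closers)) {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if (strlen($nextpart) > 0) {
        $parts[] = $nextpart;
    }
    return $parts;
}

// Hypothetical input: quotes toggle a block, parentheses open and close one,
// and a space splits only while we are outside every block.
$parts = explode_adv(['('], [')'], ['"'], [' '], 'Lorem "dolor sit amet" (adipiscing elit) dolor');
print_r($parts);
```

Note that the opener, closer and toggler characters themselves are not copied into the result, so the quotes and parentheses disappear from the parts.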

Jeffiejeffrey answered 20/5, 2015 at 17:13 Comment(0)
1

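In some situations the little-known token_get_all() might prove useful:

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
$tokens = token_get_all("<?php $text ?>");
$separator = ' ';
$items = array();
$item = "";
$last = count($tokens) - 1;
foreach($tokens as $index => $token) {
    // skip the opening and closing PHP tags
    if($index != 0 && $index != $last) {
        // named tokens are arrays (id, text, line); single characters are plain strings
        if(is_array($token)) {
            if($token[0] == T_CONSTANT_ENCAPSED_STRING) {
                // strip the surrounding quotes from a quoted string
                $token = substr($token[1], 1, -1);
            } else {
                $token = $token[1];
            }
        }
        if($token == $separator) {
            $items[] = $item;
            $item = "";
        } else {
            $item .= $token;
        }
    }
}
print_r($items);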

Results:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor sit amet
    [3] => consectetur
    [4] => adipiscing elit
    [5] => dolor
)
Photoplay answered 1/11, 2014 at 20:36 Comment(0)
