Split words in string into an array without breaking phrases wrapped in double quotes

6

0

I want to let the user type in tags: windows linux "mac os x"

and then split them up by white space but also recognizing "mac os x" as a whole word.

Is this possible to combine the explode function with other functions for this?

Rajkot answered 19/12, 2009 at 9:56 Comment(2)
Tell the user to use mac-os-x :)Wortman
I would create a dictionary file of tags and use the levenshtein function to find the best match.Tombaugh
2

As long as there can't be escaped quotes within quotes (e.g. "foo\"bar" isn't allowed), you can do this with a regular expression. Otherwise you need a full parser.

This should do:

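function split_words($input) {
  $matches = array();
  // Group 2 captures a quoted phrase without its quotes; group 3 captures a bare word.
  // [^\s"]+ (instead of \w+) keeps tags such as c++ or mac-os-x in one piece.
  if (preg_match_all('/("([^"]+)")|([^\s"]+)/', $input, $reg)) {
    for ($ii=0,$cc=count($reg[0]); $ii < $cc; ++$ii) {
      // Compare against '' so that a quoted "0" is not treated as empty.
      $matches[] = $reg[2][$ii] !== '' ? $reg[2][$ii] : $reg[3][$ii];
    }
  }
  return $matches;
}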

Usage:

$input = 'windows linux "mac os x"';
var_dump(split_words($input));
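
If escaped quotes do have to be supported, the "full parser" can stay fairly small. What follows is only a rough sketch under my own assumptions: the function name split_tags is made up, a backslash is taken to escape the next character, and an unclosed quote simply runs to the end of the input.

```php
<?php
// Hypothetical helper (not part of the regex answer): a character-by-character
// tokenizer that also accepts backslash-escaped quotes inside a quoted phrase.
function split_tags(string $input): array
{
    $tokens = [];
    $current = '';
    $inQuotes = false;
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // A backslash makes the next character literal, e.g. \" inside quotes.
            $current .= $input[++$i];
        } elseif ($char === '"') {
            // An unescaped quote opens or closes a phrase.
            $inQuotes = !$inQuotes;
        } elseif (!$inQuotes && ctype_space($char)) {
            // Whitespace outside quotes ends the current token.
            if ($current !== '') {
                $tokens[] = $current;
            }
            $current = '';
        } else {
            $current .= $char;
        }
    }

    // Flush whatever is left after the last character.
    if ($current !== '') {
        $tokens[] = $current;
    }
    return $tokens;
}

var_export(split_tags('windows linux "mac os x" "foo\"bar"'));
```

Empty tokens (including an empty "") are dropped, which seemed reasonable for tags.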
Entente answered 19/12, 2009 at 10:28 Comment(0)
8

I would ask the user to enter the tags comma-separated and explode on the comma delimiter:

$string = "windows, linux, mac os x";
// explode() keeps the space after each comma, so trim every piece.
$pieces = array_map('trim', explode(',', $string));

This is the way most tag systems work anyway.

Otherwise you'll need to construct a parser, because explode cannot cope with what you want. Regex is overkill in my opinion.

Scrivens answered 19/12, 2009 at 10:4 Comment(0)
2

Either have the user separate their tag values with commas as Elzo Valugi suggested, or improve your UI so that users enter one tag at a time (similar to Google Wave's or WordPress's tagging UI). I suggest the latter.

If you really want to stick with your proposed entry format (which I don't suggest), you could maintain a list of multi-word tags (those that aren't supposed to be split). Compare the combined tag string provided by the user against this list and make sure that you don't split those terms. If you're set on sticking to this method, I could go into the details more, but I don't think it's a good idea as the entry format itself is flawed.
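
For what it's worth, here is roughly what the list approach could look like. It is only a sketch: the $knownTags list and the function name split_with_known_tags are made up for illustration, the arrow functions need PHP 7.4+, and a phrase is only recognised when the user types it with single spaces.

```php
<?php
// Hypothetical list of tags that must stay together; in practice this
// would come from your own tag table.
$knownTags = ['mac os x', 'windows server', 'red hat'];

function split_with_known_tags(string $input, array $knownTags): array
{
    if ($knownTags === []) {
        // Nothing to protect: a plain whitespace split is enough.
        return preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
    }

    // Try longer phrases first so "mac os x" wins over a shorter "mac os".
    usort($knownTags, fn($a, $b) => strlen($b) <=> strlen($a));

    $alternatives = array_map(fn($tag) => preg_quote($tag, '/'), $knownTags);

    // A known phrase must end at whitespace or end of input: (?!\S).
    // Anything else falls back to a run of non-whitespace characters.
    $pattern = '/(?:' . implode('|', $alternatives) . ')(?!\S)|\S+/i';

    preg_match_all($pattern, $input, $found);
    return $found[0];
}

var_export(split_with_known_tags('windows linux mac os x', $knownTags));
```

The weak point is the same as in the prose: any multi-word tag missing from the list still gets split.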

Rosannarosanne answered 19/12, 2009 at 10:13 Comment(2)
The underscore _ could also be used in tags in place of spaces, then a simple str_replace done.Feverous
That's not really a legitimate request to make of a userRosannarosanne
0

You could use a regex. I'm not the best at writing them, but someone else here should be able to write one that splits the words on spaces that aren't inside quotes.
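
One possible way to write such a regex, offered as a sketch rather than a definitive pattern: split on whitespace only when an even number of double quotes remains to its right. This assumes the quotes in the input are balanced and never escaped; the variable names are mine, and the arrow function needs PHP 7.4+.

```php
<?php
$input = 'windows linux "mac os x"';

// Split on whitespace only when an even number of quotes follows it,
// i.e. when the whitespace is not inside a quoted phrase.
$parts = preg_split(
    '/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/',
    trim($input),
    -1,
    PREG_SPLIT_NO_EMPTY
);

// The quotes are still attached to the phrase, so strip them afterwards.
$tags = array_map(fn($part) => trim($part, '"'), $parts);

var_export($tags);
```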

Gordongordy answered 19/12, 2009 at 10:2 Comment(0)
0

When the user enters the string "mac os x", you can automatically detect the white space and change the string to "mac-os-x". Then you can still explode this way:

$os = "metasys solaris mac-os-x";
$strings = explode(' ', $os);

You can do the replacement with the str_replace() function.
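
A minimal sketch of that replacement, assuming each tag arrives on its own (for example, one per ENTER press) before the tags are joined. The helper name normalise_tag and the $entered array are hypothetical.

```php
<?php
// Hypothetical helper: normalise one tag at a time, as it is entered.
function normalise_tag(string $tag): string
{
    // Replace every space inside the tag with a hyphen.
    return str_replace(' ', '-', trim($tag));
}

// Tags as the user entered them, one per entry.
$entered = ['metasys', 'solaris', 'mac os x'];

// Join with spaces; each tag is now free of inner spaces, so explode() is safe.
$os = implode(' ', array_map('normalise_tag', $entered));
$strings = explode(' ', $os);

var_export($strings);
```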

Transalpine answered 19/12, 2009 at 10:24 Comment(2)
This implies that the user is entering tags one at a time, in which case, it's possible to keep the tags separate from the beginning. Also making the user convert spaces to "-" isn't very usable.Rosannarosanne
Even if the user enters the tags one at a time, you aren't going to be sending them to the server immediately. And by the way, I'm talking about entering a tag and just hitting the enter button. It's user friendly and it is equivalent to spaces in between. Users just have to hit ENTER in my case and SPACE BAR in the other case.
0

You are parsing a delimited string -- that delimiter in this case is a space.

PHP has str_getcsv() which will protect substrings wrapped in a particular character -- the default wrapping character is a double quote (how convenient for you). If your input string was comma-delimited, you could omit the 2nd parameter because that is the default value.

The double quotes will be stripped from the value in the result array.

Code:

$string = 'windows linux "mac os x"';

var_export(
    str_getcsv($string, ' ')
);

Output:

array (
  0 => 'windows',
  1 => 'linux',
  2 => 'mac os x',
)
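
One thing to watch for with user-typed input: every space counts as a delimiter, so repeated or trailing spaces can produce empty fields. A possible cleanup, sketched with my own variable names and with the enclosure and escape arguments passed explicitly (arrow functions need PHP 7.4+):

```php
<?php
$string = 'windows  linux   "mac os x" ';

// Delimiter, enclosure and escape character are all passed explicitly here.
$fields = str_getcsv($string, ' ', '"', '\\');

// Drop empty fields left by repeated or trailing spaces, then reindex.
$tags = array_values(
    array_filter($fields, fn($field) => $field !== null && $field !== '')
);

var_export($tags);
```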
Sparrow answered 17/1 at 9:13 Comment(0)
