Get non-numeric characters then number on each line of a block of text

I have some strings which can be in the following format:

sometext moretext 01 text
text sometext moretext 002
text text 1 (somemoretext)
etc

I want to split these strings into the following:

  • text before the number and
  • the number

For example:

text text 1 (somemoretext)

When split will output:

text = text text
number = 1

Anything after the number can be discarded.

Margy answered 15/1, 2013 at 22:23

preg_match('/[^\d]+/', $string, $textMatch);
preg_match('/\d+/', $string, $numMatch);

$text = $textMatch[0];
$num = $numMatch[0];

Alternatively, you can use preg_match_all with capture groups to do it all in one shot:

preg_match_all('/^([^\d]+)(\d+)/', $string, $match);

$text = $match[1][0];
$num = $match[2][0];
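
For a single string, plain preg_match with the same capture groups is enough. The sketch below is one way to wrap it; the helper name splitTextAndNumber and the choice to return null when no digits are found are assumptions for illustration, not part of the answer above. It also drops the whitespace between the text and the number.

```php
<?php
// Hypothetical helper (not from the original answer): split one string
// into the text before the first run of digits and that run of digits.
function splitTextAndNumber(string $string): ?array
{
    // ^(\D*?) lazily captures the leading non-digits, \s* consumes any
    // whitespace before the number, (\d+) captures the first run of digits.
    if (preg_match('/^(\D*?)\s*(\d+)/', $string, $match) !== 1) {
        return null; // assumption: signal "no number found" with null
    }

    return ['text' => $match[1], 'number' => $match[2]];
}

var_export(splitTextAndNumber('text text 1 (somemoretext)'));
```

The number is returned as a string, so leading zeros such as 01 or 002 are preserved; cast with (int) if you need an integer.
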
Emmyemmye answered 15/1, 2013 at 22:24

Use preg_match_all(), and add the m modifier if you want to match on every line:

$string = 'sometext moretext 01 text
text sometext moretext 002
text text 1 (somemoretext)
etc';
preg_match_all('~^(.*?)(\d+)~m', $string, $matches);

All your results are in the $matches array, which looks like this:

Array
(
    [0] => Array
        (
            [0] => sometext moretext 01
            [1] => text sometext moretext 002
            [2] => text text 1
        )
    [1] => Array
        (
            [0] => sometext moretext 
            [1] => text sometext moretext 
            [2] => text text 
        )
    [2] => Array
        (
            [0] => 01
            [1] => 002
            [2] => 1
        )
)

Output example:

foreach ($matches[1] as $k => $text) {
    $int = $matches[2][$k];
    echo "$text => $int\n";
}
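
If you would rather have one entry per line than one array per capture group, the PREG_SET_ORDER flag may be convenient. This is a minimal sketch using the same pattern and sample text; the $rows variable and the rtrim() call are additions for illustration.

```php
<?php
$string = "sometext moretext 01 text\ntext sometext moretext 002\ntext text 1 (somemoretext)\netc";

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('~^(.*?)(\d+)~m', $string, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as [, $text, $number]) {
    // rtrim() drops the space left between the text and the number.
    $rows[] = ['text' => rtrim($text), 'number' => $number];
}

print_r($rows);
```

The line "etc" contains no digits, so it produces no entry in $rows.
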
Templia answered 15/1, 2013 at 22:26

The other answers do not demonstrate the use of \D to match non-digit characters. \D is the opposite of \d.

* as a quantifier means zero or more and + means one or more. A quantifier immediately followed by ? becomes "lazy": it tries the shortest qualifying match first and expands only as needed. Lazy matching can cost extra steps compared with a greedy negated class such as \D*, so prefer the greedy form when it gives the same result.

Without a flag, ^ matches only at the start of the string; with the m flag it also matches at the start of every line.
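
To see the effect of the m flag in isolation, here is a small sketch. The three-line sample string is made up for illustration.

```php
<?php
// Hypothetical sample: three lines, each with text followed by a number.
$text = "abc 1\ndef 22\nghi 333";

// Without m, ^ anchors only at the very start of the string.
$withoutM = preg_match_all('/^\D*\d+/', $text, $a);

// With m, ^ also anchors right after each newline.
$withM = preg_match_all('/^\D*\d+/m', $text, $b);

var_dump($withoutM, $withM);
```

preg_match_all returns the number of full matches, so the first call finds only the first line while the second finds one match per line.
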

Code:

$text = 'sometext moretext 01 text
text sometext moretext 002
text text 1 (somemoretext)
etc';

preg_match_all('/^(\D*)(\d+)/m', $text, $matches);

var_export([
    'non-digit' => $matches[1],
    'digit' => $matches[2]
]);

Output:

array (
  'non-digit' => 
  array (
    0 => 'sometext moretext ',
    1 => 'text sometext moretext ',
    2 => 'text text ',
  ),
  'digit' => 
  array (
    0 => '01',
    1 => '002',
    2 => '1',
  ),
)

If you want to discard potential spaces at the end of the non-numeric string, add ? to make the first group's quantifier lazy, then match zero or more whitespace characters with \s* outside the capture group.

preg_match_all('/^(\D*?)\s*(\d+)/m', $text, $matches);
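
One thing to watch: \D also matches newline characters, so if a line without digits comes before a line with digits, the first group can span both lines. That does not happen with the sample above because "etc" is the last line. A possible variation, assuming you want every match confined to a single line, is to exclude newlines explicitly with [^\d\n]; the two-line input below is made up to show the difference.

```php
<?php
// Hypothetical input: a line without digits followed by a line with one.
$text = "etc\ntext 7";

// \D also matches "\n", so the first group can run across the line break.
preg_match_all('/^(\D*)(\d+)/m', $text, $loose);

// [^\d\n] excludes both digits and newlines, keeping each match on one line.
preg_match_all('/^([^\d\n]*)(\d+)/m', $text, $strict);

var_export([$loose[1], $strict[1]]);
```

With the first pattern the captured text includes "etc" and the line break; with the second it is only the text on the line that actually contains the number.
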
York answered 15/7, 2022 at 21:14
