Get line number from preg_match_all()

10

21

I'm using PHP's preg_match_all() to search a string imported using file_get_contents(). The regex returns matches but I would like to know at which line number those matches are found. What's the best technique to achieve this?

I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines).

Chumley answered 19/1, 2011 at 1:24 Comment(3)
I'm going to throw out a guess and say that you may not be able to use preg_match_all for this. – Adams
preg_split and count lines in the results? That sounds dumb now that I said it. – Altricial
I don't see any easy way to accomplish what you want to do... – Ascertain
17

Well, it's kind of late and maybe you already solved this, but I had to do it and it's fairly simple. Using the PREG_OFFSET_CAPTURE flag in preg_match returns the byte offset of each match. Let's call that offset $charpos, so:

// fetches all the text before the match; substr() also works when the match starts at offset 0
$before = substr($content, 0, $charpos);

$line_number = substr_count($before, "\n") + 1;

Voilà!

Herpes answered 4/11, 2011 at 17:9 Comment(0)
11

You can't do this with regexes alone, at least not cleanly. What you can do is use the PREG_OFFSET_CAPTURE flag of preg_match_all and then post-process the entire file.

That is, once you have the array of matched strings and the starting offset of each one, just count how many \r\n or \n or \r occur between the beginning of the file and the offset of each match. The line number of the match is the number of distinct EOL terminators (\r\n | \n | \r) plus 1.
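
A minimal sketch of that counting step, assuming the file contents are already in a string. The helper name lineNumberAt() and the sample text are made up for illustration. The alternation lists \r\n first so that a Windows line ending counts as one terminator rather than two.

```php
<?php
// Hypothetical helper: 1-based line number of a byte offset in $text.
function lineNumberAt(string $text, int $offset): int
{
    // Only the text before the match matters.
    $before = substr($text, 0, $offset);
    // \r\n is tried first so it is counted once, not as \r plus \n.
    return preg_match_all('/\r\n|\n|\r/', $before) + 1;
}

// Sample text with mixed line endings (assumption, for illustration only).
$text = "one\r\ntwo\rthree\nfour";
preg_match_all('/t\w+/', $text, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as [$value, $offset]) {
    echo $value, ' is on line ', lineNumberAt($text, $offset), "\n";
}
```

Each call re-scans the text before the match, which is fine for a handful of matches but adds up on large files with many matches.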

Milliary answered 19/1, 2011 at 1:35 Comment(0)
3

Late to the game but I needed this functionality today and I realized that @Javier's and @iguito's answers could be combined into a simple solution. I also replaced the check for \n with PHP_EOL for my use case:

// Get your matches
preg_match_all( '[YOUR REGEX HERE]', $data, $matches, PREG_OFFSET_CAPTURE );

// This is my loop format, yours may need to be different
foreach ( $matches[0] as $match ) {

    // Get the line number for the current match 
    // substr() also handles a match at offset 0, where str_split() would fail
    $before = substr( $data, 0, $match[1] );
    $line_number = substr_count( $before, PHP_EOL ) + 1;
    echo $line_number;

}
Skepticism answered 8/11, 2019 at 0:24 Comment(0)
2
$data = "Abba
Beegees
Beatles";

preg_match_all('/Abba|Beegees|Beatles/', $data, $matches, PREG_OFFSET_CAPTURE);
foreach (current($matches) as $match) {
    $matchValue = $match[0];
    // PREG_OFFSET_CAPTURE offsets are in bytes, so cut with substr(), not mb_substr()
    $lineNumber = substr_count(substr($data, 0, $match[1]), PHP_EOL) + 1;

    echo "`{$matchValue}` at line {$lineNumber}\n";
}

Output

`Abba` at line 1
`Beegees` at line 2
`Beatles` at line 3

(Each match re-scans the text from the beginning, so check your performance requirements if the input is large.)
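
On the performance point, one possible single-pass variant counts only the newlines between the previous match and the current one. This is a sketch that assumes "\n" line endings and relies on $matches[0] being ordered by ascending offset; the variable names $lastOffset and $lines are made up for illustration.

```php
<?php
$data = "Abba\nBeegees\nBeatles\nAbba again";

preg_match_all('/Abba|Beegees|Beatles/', $data, $matches, PREG_OFFSET_CAPTURE);

$lineNumber = 1;  // line of the most recent match
$lastOffset = 0;  // byte offset up to which newlines are already counted
$lines = [];      // collected line numbers, one per match
foreach ($matches[0] as [$value, $offset]) {
    // Count only the newlines between the previous match and this one.
    if ($offset > $lastOffset) {
        $lineNumber += substr_count($data, "\n", $lastOffset, $offset - $lastOffset);
        $lastOffset = $offset;
    }
    $lines[] = $lineNumber;
    echo "`{$value}` at line {$lineNumber}\n";
}
```

Every byte of the input is scanned at most once this way, regardless of how many matches there are.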

Inhaler answered 12/1, 2017 at 15:55 Comment(0)
2

Using preg_match_all with the PREG_OFFSET_CAPTURE flag is necessary to solve this problem. The code comments explain what kind of array preg_match_all returns and how the line numbers can be calculated:

// Given string to do a match with
$string = "\n\nabc\nwhatever\n\ndef";

// Match "abc" and "def" in a string
if(preg_match_all("#(abc).*(def)#si", $string, $matches, PREG_OFFSET_CAPTURE)) {
  // Now $matches[0][0][0] contains the complete matching string
  // $matches[1][0][0] contains the results for the first substring (abc)
  // $matches[2][0][0] contains the results for the second substring (def)
  // $matches[0][0][1] contains the string position of the complete matching string
  // $matches[1][0][1] contains the string position of the first substring (abc)
  // $matches[2][0][1] contains the string position of the second substring (def)

  // First (abc) match line number
  // Cut off the original string at the matching position, then count
  // number of line breaks (\n) for that subset of a string
  $line = substr_count(substr($string, 0, $matches[1][0][1]), "\n") + 1;
  echo $line . "\n";

  // Second (def) match line number
  // Cut off the original string at the matching position, then count
  // number of line breaks (\n) for that subset of a string
  $line = substr_count(substr($string, 0, $matches[2][0][1]), "\n") + 1;
  echo $line . "\n";
}

This will return 3 for the first substring and 6 for the second substring. You can change \n to \r\n or \r if you use different newlines.

Kolk answered 15/10, 2018 at 19:2 Comment(0)
1

You've got a couple of options, but none are "simple":

a) exec() and use the system grep command, which can report line numbers:

exec("grep -n 'your pattern here' file.txt", $output);

b) Slurp in the file using file_get_contents(), split it into an array of lines, then use preg_grep() to find the matching lines.

$dat = file_get_contents('file.txt');
$lines = explode("\n", $dat); // explode() takes the separator first
$matches = preg_grep('/your pattern here/', $lines);

c) Read the file in line-sized chunks, keep a running line count, and do your pattern match on each line.

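$fh = fopen('file.txt', 'rb');
$lineno = 1; // running line count, kept separate from the line text
while (($text = fgets($fh)) !== false) {
     if (preg_match('/your pattern here/', $text)) {
         // ... whatever you need to do with matching lines; $lineno is the line number ...
     }
     $lineno++;
}
fclose($fh);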

Each has its ups and downs:

a) You're invoking an external program, and if your pattern contains any user-supplied data, you're potentially opening yourself up to the shell equivalent of an SQL injection attack. On the plus side, you don't have to slurp in the entire file and will save a bit on memory overhead.

b) You're safe from shell injection attacks, but you have to slurp in the entire file. If your file is large, you'll probably exhaust available memory.

c) You're invoking a regex every line, which would have significant overhead if you're dealing with a large number of lines.
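
For option a), the shell injection risk can be reduced by quoting every argument with escapeshellarg() before it reaches the shell. A minimal sketch, where the helper name buildGrepCommand() and the sample pattern are made up for illustration:

```php
<?php
// Hypothetical helper: build the grep command with both arguments shell-quoted.
function buildGrepCommand(string $pattern, string $file): string
{
    // -e marks the next argument as the pattern and -- ends option parsing,
    // so a pattern or file name starting with "-" is not read as an option.
    return 'grep -n -e ' . escapeshellarg($pattern) . ' -- ' . escapeshellarg($file);
}

// A pattern that would break out of naive single quoting.
$cmd = buildGrepCommand("foo'; echo pwned", 'file.txt');
echo $cmd, "\n";
// exec($cmd, $output); // grep -n prefixes each output line with "<line number>:"
```

Note that grep still works line by line, so this does not help with patterns that span several lines.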

Impost answered 19/1, 2011 at 1:49 Comment(1)
I think you missed this part of my question: I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines). – Chumley
0

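I think that, first of all, you need to read the file named by $String into an array in which each element stands for one line, and then loop over it like this ($Pattern is a placeholder for your regex):

$List = file($String);
for ($i = 0; $i < count($List); $i++) {
    if (preg_match_all($Pattern, $List[$i], $Matches)) { // your work here
        echo $i + 1; // echo the line number where preg_match_all() matched (array keys start at 0)
    }
}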
Urial answered 19/1, 2011 at 1:45 Comment(1)
I think you missed this part of my question: I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines). – Chumley
-1

You can use preg_match_all to find offsets of every linefeed and then compare them against the offsets you already have.

// read file to buffer
$data = file_get_contents($datafile);

// find all linefeeds in buffer
preg_match_all("/\n/", $data, $lfall, PREG_OFFSET_CAPTURE );
$lfs = $lfall[0];

// create an array mapping every offset to its line number
$offsets = array();
$linenum = 1;
$offset = 0;
foreach( $lfs as $lfrow )
{
    $lfoffset = intval( $lfrow[1] );
    for( ; $offset <= $lfoffset; $offset++ )
        $offsets[$offset] = $linenum;   // offset => linenum
    $linenum++;
}
// offsets after the last linefeed belong to the final line
for( ; $offset < strlen( $data ); $offset++ )
    $offsets[$offset] = $linenum;
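
The lookup table above holds one array entry per byte of the file. A lighter alternative, sketched here under the assumption of "\n" line endings, keeps only the linefeed offsets and binary-searches them for each match offset. The helper names newlineOffsets() and lineOfOffset() are made up for illustration.

```php
<?php
// Hypothetical helper: byte offsets of every "\n" in $data, in ascending order.
function newlineOffsets(string $data): array
{
    $offsets = [];
    $pos = -1;
    while (($pos = strpos($data, "\n", $pos + 1)) !== false) {
        $offsets[] = $pos;
    }
    return $offsets;
}

// Hypothetical helper: 1-based line number of $offset, found by binary
// search for the number of linefeeds that precede it.
function lineOfOffset(array $newlines, int $offset): int
{
    $lo = 0;
    $hi = count($newlines);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($newlines[$mid] < $offset) {
            $lo = $mid + 1;   // this linefeed lies before the offset
        } else {
            $hi = $mid;
        }
    }
    return $lo + 1;
}

$data = "alpha\nbeta\n\ngamma";
$newlines = newlineOffsets($data);
echo lineOfOffset($newlines, strpos($data, 'gamma')), "\n";
```

As in the table above, the offset of a linefeed itself is attributed to the line it terminates.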
Malicious answered 9/8, 2011 at 7:46 Comment(0)
-1

This works but performs a new preg_match_all on every line which could be quite expensive.

$file = 'file.txt';

$log = array();

$line = 0;

$pattern = '/\x20{2,}/';

if(is_readable($file)){

    $handle = fopen($file, 'rb');

    if ($handle) {

        while (($subject = fgets($handle)) !== false) {

            $line++;

            if(preg_match_all ( $pattern,  $subject, $matches)){

                $log[] = array(
                    'str' => $subject, 
                    'file' =>  realpath($file),
                    'line' => $line,
                    'matches' => $matches,
                );
            } 
        }
        if (!feof($handle)) {
            echo "Error: unexpected fgets() fail\n";
        }
        fclose($handle);
    }
}

Alternatively, you could read the file once to get the line offsets, then perform the preg_match_all on the entire file and capture the match offsets.

$file = 'file.txt';
$length = 0;
$pattern = '/\x20{2,}/';
$lines = array(0);

if(is_readable($file)){

    $handle = fopen($file, 'rb');

    if ($handle) {

        $subject = "";

        while (($line = fgets($handle)) !== false) {

            $subject .= $line;
            $lines[] = strlen($subject);
        }
        if (!feof($handle)) {
            echo "Error: unexpected fgets() fail\n";
        }
        fclose($handle);

        if($subject && preg_match_all ( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE)){

            $line = 1; // $lines[$line] is the offset at which line $line ends

            foreach ($matches[0] as $key => $value) {

                // continue where we left off (each() was removed in PHP 8)
                while ($line < count($lines) - 1 && $value[1] >= $lines[$line]) {

                    $line++;
                }

                echo "match is on line: " . $line . "\n";
            }
        }
    }
}
Lolalolande answered 19/5, 2016 at 10:53 Comment(0)
-1
//Keep it simple, stupid

$allcodeline = explode(PHP_EOL, $content);

foreach ( $allcodeline as $line => $val ) :
    if ( preg_match("#SOMEREGEX#i",$val,$res) ) {
        echo $res[0] . '!' . ($line + 1) . "\n"; // array keys start at 0, line numbers at 1
    }
endforeach;
Fraudulent answered 23/12, 2016 at 17:18 Comment(1)
I think you missed this part of my question: I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines). – Chumley