Get content between two strings PHP

What is the best way to obtain the content between two strings? For example:

ob_start();
include('externalfile.html'); ## see below
$out = ob_get_contents();
ob_end_clean();

preg_match('/{FINDME}(.|\n*)+{\/FINDME}/',$out,$matches);
$match = $matches[0];

echo $match;

## I have used .|\n* as it needs to check for new lines. Is this correct?

## externalfile.html

{FINDME}
Text Here
{/FINDME}

For some reason this appears to work on one place in my code and not another. Am I going about this in the right way? Or is there a better way?

Also, is output buffering the right way to do this, or should I use file_get_contents?

Thanks in advance!

Marciano answered 18/9, 2009 at 16:6 Comment(1)
If it works in some situations and not others, you should provide examples of when it works and when it does not. - Cawthon
H
51
  • Use # instead of / as the pattern delimiter so you don't have to escape the slashes.

  • The modifier s allows . to match newlines.

  • { may be the start of a {n} or {n,m} quantifier. The closing } has no special meaning, but escaping it doesn't cause an error.

  • The basic version:

      preg_match('#\{FINDME}(.+)\{/FINDME}#s', $out, $matches);
    
  • The advanced version, for arbitrary tags (the site's JavaScript highlighting does not style this nicely):

      $delimiter = '#';
      $startTag = '{FINDME}';
      $endTag = '{/FINDME}';
      $regex = $delimiter . preg_quote($startTag, $delimiter) 
                          . '(.*?)' 
                          . preg_quote($endTag, $delimiter) 
                          . $delimiter 
                          . 's';
      preg_match($regex,$out,$matches);
    

Put this code in a function

  • For any file in which you do not want stray PHP code to be executed, you should use file_get_contents. include/require should not even be an option there.
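
One way to follow the "put this code in a function" advice is sketched below. The name find_between_all is made up for illustration, and the sketch swaps preg_match for preg_match_all so that every occurrence is returned rather than only the first. It assumes the tags are plain strings, not patterns.

```php
// Hypothetical helper (the name is not from the answer above).
// Returns every string found between $startTag and $endTag.
function find_between_all($text, $startTag, $endTag)
{
    $delimiter = '#';
    $regex = $delimiter . preg_quote($startTag, $delimiter)
                        . '(.*?)'   // lazy, so each match stops at the nearest end tag
                        . preg_quote($endTag, $delimiter)
                        . $delimiter
                        . 's';      // s: let . match newlines too
    if (!preg_match_all($regex, $text, $matches)) {
        return array();             // no match (or a regex error)
    }
    return $matches[1];             // contents of the capture group only
}

print_r(find_between_all("{FINDME}one{/FINDME}\n{FINDME}two\nlines{/FINDME}", '{FINDME}', '{/FINDME}'));
```

Returning only $matches[1] drops the tags themselves, which is usually what "content between two strings" means.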
Hallvard answered 18/9, 2009 at 16:11 Comment(3)
I bet {FINDME} is just for illustration - Donnelldonnelly
Doesn't work (no output), dunno why... my startTag: src=¦ my endTag: ¦ - Wish
This is a good solution. By adding the U (ungreedy) modifier (#sU) it's possible to use multiple instances of the same search tags. - Hennessy
X
53

You may as well use substr and strpos for this.

$startsAt = strpos($out, "{FINDME}") + strlen("{FINDME}");
$endsAt = strpos($out, "{/FINDME}", $startsAt);
$result = substr($out, $startsAt, $endsAt - $startsAt);

You'll need to add error checking to handle the case where it doesn't FINDME.
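
A minimal sketch of that error checking might look like the following. The function name between_or_null and the choice to return null on failure are assumptions made for illustration, not part of the answer.

```php
// Hypothetical wrapper around the strpos/substr approach above.
// Returns null when either marker is missing.
function between_or_null($out, $start, $end)
{
    $startsAt = strpos($out, $start);
    if ($startsAt === false) {
        return null;                 // start marker not found
    }
    $startsAt += strlen($start);     // skip past the start marker itself
    $endsAt = strpos($out, $end, $startsAt);
    if ($endsAt === false) {
        return null;                 // end marker not found after the start
    }
    return substr($out, $startsAt, $endsAt - $startsAt);
}

var_dump(between_or_null("{FINDME}\nText Here\n{/FINDME}", "{FINDME}", "{/FINDME}"));
var_dump(between_or_null("no markers here", "{FINDME}", "{/FINDME}"));
```

The strict === false comparisons matter because strpos returns 0 when the marker sits at the very start of the string, and 0 is also falsy.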

Xylotomy answered 18/9, 2009 at 16:10 Comment(2)
This will only find one match. - Tetzel
@Tetzel yes but you can write a wrapper function which performs this code in a while loop or recursively. This answer is a very good base. - Cotinga
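
The while-loop wrapper suggested in the comment could be sketched roughly as below. The name between_all is hypothetical, and the sketch assumes both markers are non-empty strings.

```php
// Hypothetical wrapper: repeat the strpos/substr search until no pair is left.
// Assumes $start and $end are non-empty.
function between_all($out, $start, $end)
{
    $results = array();
    $offset = 0;
    while (($startsAt = strpos($out, $start, $offset)) !== false) {
        $startsAt += strlen($start);         // first character after the start marker
        $endsAt = strpos($out, $end, $startsAt);
        if ($endsAt === false) {
            break;                           // unmatched start marker: stop
        }
        $results[] = substr($out, $startsAt, $endsAt - $startsAt);
        $offset = $endsAt + strlen($end);    // resume after this end marker
    }
    return $results;
}

print_r(between_all("The quick brown {{fox}} jumps over the lazy {{dog}}", "{{", "}}"));
```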
B
8

I like to avoid regex where possible. Here is an alternative solution that fetches all the strings between two strings and returns them as an array.

function getBetween($content, $start, $end) {
    $n = explode($start, $content);
    array_shift($n); // the first piece precedes any $start, so it cannot hold a match
    $result = array();
    foreach ($n as $val) {
        $pos = strpos($val, $end);
        if ($pos !== false) {
            $result[] = substr($val, 0, $pos);
        }
    }
    return $result;
}
print_r(getBetween("The quick brown {{fox}} jumps over the lazy {{dog}}", "{{", "}}"));

Results :

Array
(
    [0] => fox
    [1] => dog
)
Bjorn answered 11/7, 2018 at 3:33 Comment(0)
E
5

I love these two solutions

function GetBetween($content,$start,$end)
{
    $r = explode($start, $content);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}


function get_string_between($string, $start, $end){
    $ini = strpos($string, $start);
    if ($ini === false) return "";      // start marker not found
    $ini += strlen($start);
    $endPos = strpos($string, $end, $ini);
    if ($endPos === false) return "";   // end marker not found
    return substr($string, $ini, $endPos - $ini);
}

I also ran a few benchmarks on both solutions above, and both take almost the same time. You can test it yourself. I gave both functions a file of about 60,000 characters (checked with MS Word's word count), and each took about 0.000999 seconds to find the match.

// $str holds the contents of the test file
$startTime = microtime(true);
GetBetween($str, '<start>', '<end>');
echo "Explode function took: ".(microtime(true) - $startTime) . " seconds to finish<br />";

$startTime = microtime(true);
get_string_between($str, '<start>', '<end>');
echo "Substring function took: ".(microtime(true) - $startTime) . " seconds to finish<br />";
Ecdysiast answered 15/2, 2014 at 22:30 Comment(1)
This is great. Can it be made to work to find multiple matches? So return an array with all the matches? - Albinus
D
1

Line breaks can cause problems in regex; try removing them or replacing them with \n before processing.

Donnelldonnelly answered 18/9, 2009 at 16:12 Comment(1)
Mutating a string so that a regex pattern will work generally means that the regex pattern is not well designed. Don't fault the string, fault the pattern. - Riddle
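
A small sketch of the comment's point, using the sample file contents from the question as an assumed input: the string is left untouched and only the pattern changes, by adding the s modifier.

```php
// Assumed input: the contents of externalfile.html from the question.
$out = "{FINDME}\nText Here\n{/FINDME}";

// Without the s modifier, . stops at newlines, so this pattern cannot span lines.
$plain = preg_match('#\{FINDME\}(.*?)\{/FINDME\}#', $out, $m1);

// With s, the same pattern matches the untouched string.
$dotall = preg_match('#\{FINDME\}(.*?)\{/FINDME\}#s', $out, $m2);

var_dump($plain);   // int(0)
var_dump($dotall);  // int(1)
var_dump($m2[1]);   // the captured text, newlines included
```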
A
0

This is a PHP solution that returns the strings found between tags in a haystack. It works, but I haven't tested for efficiency. I needed this and was inspired by Adam Wright's answer on this page.

Returns an array() containing all the strings found between $tag and $end_symbol.$tag in $haystack, or FALSE if no $end_symbol.$tag was found, in which case no tag pair exists in $haystack.

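function str_between_tags($haystack, $tag, $end_symbol){
    $end_tag = $end_symbol.$tag;
    $c_end_tags = substr_count($haystack, $end_tag);
    if(!$c_end_tags) return FALSE;

    $result = array();
    $offset = 0; // where to resume searching for the next opening tag
    for($i=0; $i<$c_end_tags; $i++){
        $p_s = strpos($haystack, $tag, $offset);
        if($p_s === false) break;   // no further opening tag
        $p_s += strlen($tag);
        $p_e = strpos($haystack, $end_tag, $p_s);
        if($p_e === false) break;   // opening tag without a closing tag
        $result[] = substr($haystack, $p_s, $p_e - $p_s);
        $offset = $p_e + strlen($end_tag);
    }
    return $result;
}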
Asp answered 18/12, 2016 at 16:0 Comment(0)
