PHP Parse HTML code [duplicate]

Possible Duplicate:
Best methods to parse HTML

How can I parse HTML code held in a PHP variable if it is something like:

<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!

I only want to get the text that's between the headings, and I understand that it's not a good idea to use regular expressions for this.

Ascogonium answered 2/9, 2010 at 13:22 Comment(1)
@everyone who closed this as a duplicate: this is different because the OP does not want the text T1, T2, T3 but the text after one heading ends and before the next heading begins, e.g. "Lorem ipsum." Please take a look. Beneficiary

Use PHP's Document Object Model (DOM) extension:

<?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   $DOM = new DOMDocument;
   $DOM->loadHTML($str);

   // Get all <h1> elements
   $items = $DOM->getElementsByTagName('h1');

   // Display the text of each <h1>
   for ($i = 0; $i < $items->length; $i++) {
        echo $items->item($i)->nodeValue . "<br/>";
   }
?>

Rendered in a browser, this outputs:

 T1
 T2
 T3

[EDIT]: After OP Clarification:

If you want the content between the headings, such as "Lorem ipsum.", you can strip the headings directly with this regex:

<?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   // Remove every <h1>...</h1> pair; "i" ignores case, "s" lets "." span newlines
   echo preg_replace('#<h1\b[^>]*>.*?</h1>#is', '', $str);
?>

This outputs:

Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
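The regex above joins all the text into one string. If each piece needs to stay separate (one per heading), a DOM-only approach is possible. The following is a minimal sketch, not part of the original answer: the function name textBetweenHeadings and the returned array shape are hypothetical, and it assumes the headings are top-level <h1> elements and the input is plain ASCII (loadHTML otherwise needs a declared charset).

```php
<?php
// Hypothetical helper: returns one entry per <h1>, holding the heading text
// and the text that follows it up to the next <h1>.
function textBetweenHeadings(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    $sections = [];

    foreach ($body->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'h1') {
            // A new heading starts a new section.
            $sections[] = ['heading' => trim($node->textContent), 'text' => ''];
        } elseif ($sections !== []) {
            // Anything else (bare text, or a wrapper element the parser may
            // add) belongs to the most recent heading.
            $sections[count($sections) - 1]['text'] .= $node->textContent;
        }
    }

    foreach ($sections as &$section) {
        $section['text'] = trim($section['text']);
    }
    unset($section);

    return $sections;
}

$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!';
foreach (textBetweenHeadings($str) as $section) {
    echo $section['heading'], ': ', strlen($section['text']), " characters\n";
}
```

Text that appears before the first heading is ignored in this sketch; whether that is the right choice depends on the input.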

Beneficiary answered 2/9, 2010 at 13:30 Comment(3)
Thanks, but I need to get the text between <h1></h1> as in: "Lorem ipsum.", "The quick red fox..." etc. So not the text between H1 tags, but rather the text between an ending </h1> tag and a starting <h1>. Ascogonium
That's closer, thank you. I'll try to be more clear: I want to get the text between headings, count its length and decide if I want to hide part of it. Your answer is very helpful though. But what I want to do is keep all the text, just add a bit of HTML to hide part of it. Ascogonium
This is a great tip @shamittomar! Thanks for that! One suggestion, maybe a foreach instead of a for loop would be a smidgen cleaner, but this really helped me out. Obsessive
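For the follow-up in the comments (keep all the text, but hide whatever exceeds a length limit), one possible sketch is shown here. It is an illustration only: the function name hideOverflow, the "more" class, and the inline display:none style are assumptions, and the input is assumed to be plain ASCII. It also uses foreach, as suggested in the last comment.

```php
<?php
// Hypothetical helper: for every text node outside an <h1>, keeps the first
// $limit characters visible and wraps the remainder in a hidden <span>.
function hideOverflow(string $html, int $limit): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Snapshot the text nodes first, because the loop modifies the tree.
    $textNodes = iterator_to_array($xpath->query('//body//text()[not(ancestor::h1)]'));

    foreach ($textNodes as $text) {
        if ($text->length <= $limit) {
            continue; // short enough, leave untouched
        }
        // splitText() keeps the first $limit characters in $text and
        // returns a new sibling text node holding the rest.
        $tail = $text->splitText($limit);
        $span = $dom->createElement('span');
        $span->setAttribute('class', 'more');          // assumed class name
        $span->setAttribute('style', 'display:none');  // assumed way of hiding
        $tail->parentNode->replaceChild($span, $tail);
        $span->appendChild($tail);
    }

    // Serialize only the children of <body>, not the wrapper document.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!';
echo hideOverflow($str, 10);
```

The hidden part stays in the markup, so a script or stylesheet can reveal it later; how it is revealed is left open here.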
