Getting DOM elements by classname

I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?

Update: I ended up using Mechanize for PHP which was much easier to work with.

Ras answered 16/6, 2011 at 2:1
Related: PHP dom to get tag class with multiple css class name - Electroplate

Update: XPath version of the *[class~='my-class'] CSS selector

So after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following xpath (untested):

[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]

So the PHP would be:

$dom = new DOMDocument();
$dom->loadHTMLFile($filePath); // parse as HTML; load() expects well-formed XML
$finder = new DOMXPath($dom);
$classname = "my-class";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

Basically, all we do here is normalize the class attribute (collapsing runs of whitespace and trimming the ends), then pad the whole class list with a space on each side, so that every class, even a lone one, is bounded by spaces. The class we are searching for is padded with spaces on both sides as well. This way we match only the whole class my-class, not substrings such as my-class2.
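A small self-contained check of the padding trick described above, assuming the markup comes from an inline string loaded with loadHTML() instead of a file. Only elements whose class list contains the whole token my-class are returned:

```php
<?php
// Demo of whole-token class matching with XPath 1.0.
// Assumption: inline sample markup instead of $filePath.
$html = '<div>'
    . '<p class="my-class">exact</p>'
    . '<p class="intro  my-class ">padded</p>'
    . '<p class="my-class2">not a match</p>'
    . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'my-class';
// normalize-space() collapses whitespace and trims; concat() then pads the list
// so that every class, first or last, is surrounded by single spaces.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
$nodes = $finder->query($query);

foreach ($nodes as $node) {
    echo $node->textContent, PHP_EOL; // prints "exact", then "padded"
}
```

Swapping in the plain `contains(@class, '$classname')` test would also return the my-class2 paragraph.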


Use an xpath selector?

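$dom = new DOMDocument();
$dom->loadHTMLFile($filePath); // parse as HTML; load() expects well-formed XML
$finder = new DOMXPath($dom);
$classname = "my-class";
// Note: contains() is a substring test, so this also matches e.g. "my-class2".
$nodes = $finder->query("//*[contains(@class, '$classname')]");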

If it is only ever one type of element you can replace the * with the particular tagname.
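Since the question asks for elements inside a given DOM node, here is a sketch (with made-up ids and classes) that combines a concrete tag name instead of `*` with a relative path (`.//`) evaluated against a context node, passed as the second argument of DOMXPath::query():

```php
<?php
// Sketch: search only inside one node, and only among <p> elements.
// The ids and class names below are hypothetical sample data.
$html = '<div id="content"><p class="note">inside</p><span class="note">span</span></div>'
    . '<div id="footer"><p class="note">outside</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$context = $dom->getElementById('content');
$classname = 'note';

// The leading "." makes the path relative to $context,
// and "p" replaces "*" so only <p> elements are considered.
$query = ".//p[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
$nodes = $finder->query($query, $context);

foreach ($nodes as $node) {
    echo $node->textContent, PHP_EOL; // prints "inside"
}
```

Without the leading dot, `//p[...]` would search the whole document again, even with a context node.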

If you need to do a lot of this with very complex selectors, I would recommend Zend_Dom_Query, which supports CSS selector syntax (à la jQuery):

$finder = new Zend_Dom_Query($html);
$classname = 'my-class';
$nodes = $finder->query("*[class~=\"$classname\"]");
Squashy answered 16/6, 2011 at 2:7
finds the class my-class2 as well, but pretty sweet. Any way to only pick the first of all elements? - Electroplate
I don't think you can without XPath 2... However, the example for Zend_Dom_Query does exactly that. If you don't want to use that component in your project, then you might want to see how they are translating that CSS selector to XPath. Maybe DomXPath supports XPath 2.0; I'm not sure about that. - Squashy
@prodigitalson: Thanks much for your answer. I went and taught myself XPath and had a question: why do you use contains, rather than simply doing [@class="$classname"]? - Ras
Because the class attribute can hold more than one class, for example: <a class="my-link link-button nav-item">. - Squashy
@prodigitalson: This is incorrect as it does not reflect the spaces, try //*[contains(concat(' ', normalize-space(@class), ' '), ' classname ')] (very informative: CSS Selectors And XPath Expressions). - Electroplate
@hakre: am I mistaken, or is the only difference the leading space? Because technically that wouldn't matter: anything ` classname ` matches will also be matched by `classname `. Good link. Wish I had found that instead of reading the code in Zend_Dom_Query... would have been faster, haha. - Squashy
So... contains would still be the way to go? - Ras
@babonk: yes, you need to use contains in combination with concat... we are just discussing the particulars of padding the spaces on both sides of the class you're searching for or only padding one side. Either should work though. - Squashy
@babonk: also make sure you take a look at the link hakre posted in his comment to my answer. There is a wealth of good info there dealing with XPath in comparison to CSS selectors. - Squashy
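On the question of picking only the first match: DOMXPath evaluates XPath 1.0 (through libxml2), and XPath 1.0 already allows a positional predicate on a parenthesized path. A sketch with made-up markup:

```php
<?php
// Sketch: select only the first element carrying a class, in XPath 1.0.
// The list markup is hypothetical sample data.
$html = '<ul><li class="item">one</li><li class="item">two</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'item';
$match = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

// (expr)[1] is the first node of the whole result set, in document order.
$first = $finder->query("($match)[1]")->item(0);

// Equivalent on the PHP side: run the full query and take item(0).
$alsoFirst = $finder->query($match)->item(0);

echo $first->textContent, PHP_EOL; // prints "one"
```

Without the parentheses, `//*[...][1]` means "first matching child of each parent", which can return several nodes.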

If you wish to get the HTML of the elements with a given class without Zend, you could use this (note that it returns each matching element's outer HTML, including its own tags):

$dom = new DOMDocument();
$dom->loadHTMLFile($filePath); // parse as HTML; load() expects well-formed XML
$classname = 'main-article';
$finder = new DOMXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

// Copy every matching element into a temporary document, then serialize it.
$tmp_dom = new DOMDocument();
foreach ($nodes as $node) {
    $tmp_dom->appendChild($tmp_dom->importNode($node, true));
}
$innerHTML = trim($tmp_dom->saveHTML());
echo $innerHTML;
Progressist answered 1/11, 2012 at 14:47
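The snippet above serializes whole elements, so the result still contains each element's own opening and closing tags. If you want the inner HTML only, one option is to serialize the element's child nodes individually with DOMDocument::saveHTML(), which accepts a node argument. A sketch with a hypothetical innerHTML() helper and an inline sample document:

```php
<?php
// Hypothetical helper: returns the markup of a node's children,
// without the node's own opening and closing tags.
function innerHTML(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumption: inline sample markup instead of $filePath.
$dom = new DOMDocument();
$dom->loadHTML('<div class="main-article"><h1>Title</h1><p>Body</p></div>');
$finder = new DOMXPath($dom);
$classname = 'main-article';
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$results = [];
foreach ($nodes as $node) {
    $results[] = innerHTML($node);
}
echo implode(PHP_EOL, $results), PHP_EOL;
```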

I think the accepted way is better, but I guess this might work as well

function getElementByClass($parentNode, $tagName, $className, $offset = 0) {
    $response = false;

    $childNodeList = $parentNode->getElementsByTagName($tagName);
    $tagCount = 0;
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        // Compare whole class tokens; stripos() would also match "my-class2".
        $classes = preg_split('/\s+/', trim($temp->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            if ($tagCount == $offset) {
                $response = $temp;
                break;
            }

            $tagCount++;
        }
    }

    return $response;
}
Comnenus answered 3/11, 2014 at 4:53
Where is the example for this? It would've been nice. - Rather
That's great. I got the element with the class. Now I want to edit the content of the element, e.g. append a child to the element containing the class. How do I append the child and recreate the whole HTML? Please help. This is what I have done: $classResult = getElementByClass($dom, 'div', 'm-signature-pad'); $classResult->nodeValue = ''; $enode = $dom->createElement('img'); $enode->setAttribute('src', $signatureImage); $classResult->appendChild($enode); - Commendation
For DOM modification in PHP, I think it's better to use phpQuery: github.com/punkave/phpQuery - Comnenus
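The question in the comments above (appending a child and recreating the whole HTML) can also be handled with the native DOM alone: modify the node in place, then call saveHTML() on the document. A minimal sketch, assuming inline sample markup and a placeholder value for $signatureImage:

```php
<?php
// Sketch: find an element by class, replace its content with an <img>,
// and re-serialize the whole document. Markup and image path are placeholders.
$html = '<div class="m-signature-pad"><span>old</span></div>';
$signatureImage = 'signature.png'; // hypothetical image path

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$target = $finder
    ->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' m-signature-pad ')]")
    ->item(0);

if ($target !== null) {
    // Remove the existing children one by one.
    while ($target->firstChild !== null) {
        $target->removeChild($target->firstChild);
    }
    // Create the new element in the same document, then attach it.
    $img = $dom->createElement('img');
    $img->setAttribute('src', $signatureImage);
    $target->appendChild($img);
}

// saveHTML() without an argument returns the complete, modified document.
$output = $dom->saveHTML();
echo $output;
```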

There is also another approach without the use of DomXPath or Zend_Dom_Query.

Based on dav's original function, I wrote the following function that returns all the descendants of the parent node whose tag and class match the parameters.

function getElementsByClass($parentNode, $tagName, $className) {
    $nodes = array();

    $childNodeList = $parentNode->getElementsByTagName($tagName);
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        // Match whole class tokens only; stripos() would let "a" match e.g. "nav".
        $classes = preg_split('/\s+/', trim($temp->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $nodes[] = $temp;
        }
    }

    return $nodes;
}

Suppose you have the following HTML in a variable $html:

<html>
 <body>
  <div id="content_node">
    <p class="a">I am in the content node.</p>
    <p class="a">I am in the content node.</p>
    <p class="a">I am in the content node.</p>    
  </div>
  <div id="footer_node">
    <p class="a">I am in the footer node.</p>
  </div>
 </body>
</html>

Using getElementsByClass is as simple as:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node = $dom->getElementById("content_node");

$p_a_class_nodes = getElementsByClass($content_node, 'p', 'a'); // will contain the three <p> nodes under "content_node"
Saharan answered 24/7, 2015 at 17:54
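The functions above match a single class. When an element must carry several classes at once (as in the CSS selector `p.a.b`), the whole-token XPath test can be repeated once per class and joined with `and`. A sketch; classQuery() is a hypothetical helper name and the sample markup is made up:

```php
<?php
// Hypothetical helper: builds an XPath 1.0 query for elements that carry ALL
// of the given classes, like the CSS selector "p.a.b".
// Assumes class names contain no single quotes (XPath 1.0 cannot escape them).
function classQuery(array $classes, string $tag = '*'): string
{
    $conditions = array_map(function (string $class): string {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
    }, $classes);

    return '//' . $tag . '[' . implode(' and ', $conditions) . ']';
}

$dom = new DOMDocument();
$dom->loadHTML('<div><p class="a b">both</p><p class="a">only a</p><p class="b x a">both again</p></div>');
$finder = new DOMXPath($dom);

$nodes = $finder->query(classQuery(['a', 'b'], 'p'));
foreach ($nodes as $node) {
    echo $node->textContent, PHP_EOL; // prints "both", then "both again"
}
```

The order of the classes in the attribute does not matter, since each class is tested independently.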

DOMDocument is slow to type and phpQuery has bad memory leak issues. I ended up using:

https://github.com/wasinger/htmlpagedom

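To select a class (note that this snippet uses the Simple HTML DOM parser's str_get_html() and find(), loaded from simple_html_dom.php):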

include 'includes/simple_html_dom.php';

$doc = str_get_html($html);
$href = $doc->find('.lastPage')[0]->href;

I hope this helps someone else as well

Sough answered 5/7, 2016 at 23:7
So simple, so beautiful! Usability at its very finest, compared to PHP's native DOM handling! Please upvote, this is the most useful answer. - Shumaker

PHP's native DOM handling is so absurdly bad; do yourself a favour and use this or any other modern HTML parsing package, which can handle this in a few lines:

Install paquettg/php-html-parser with

composer require paquettg/php-html-parser

Then create a .php file in the same folder with this content

<?php

// load dependencies via Composer
require __DIR__ . '/vendor/autoload.php';

use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadFromUrl("https://example.com");
$links = $dom->find('.classname a');

foreach ($links as $link) {
    echo $link->getAttribute('href'), PHP_EOL; // one URL per line
}

P.S. You'll find information on how to install Composer on Composer's homepage.

Shumaker answered 30/4, 2021 at 12:21

I prefer using Symfony for this. Their libraries are pretty nice.

Use the DomCrawler Component (the example below also uses BrowserKit's HttpBrowser and the HttpClient component)

Example:

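use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());
$crawler = $browser->request('GET', 'https://example.com');
$class = $crawler->filter('.class')->first(); // filter() needs symfony/css-selector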
Horribly answered 6/9, 2020 at 8:7
Quite a lot of power between those BrowserKit and DomCrawler components! - Shivers
