XPath: select all text content of a <div> except for a specific tag (<h5>)

Asked 27/2, 2013 at 21:28 Answered 27/2, 2013 at 22:1

I searched for and tried several solutions to this problem, but none of them worked. I have this HTML:

<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>

I want to use XPath to extract only the text content of the first div class="detalhescolunadados_blocos" that is not inside an h5 tag.

I tried: //div[@class='detalhescolunadados_blocos']/[1]/*[not(self::h5)]

Quevedo asked 27/2, 2013 at 21:28

I'm not good at XPath, but I know that to extract only the text you have to use the text() node test... – Brandy 27/2, 2013 at 21:49

Try the following XPath expression:

//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]

This will return:

$ xmllint --html --shell so.html
/ > xpath //div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]    
Object is a Node Set :
Set contains 2 nodes:
1  TEXT
    content=      
2  TEXT
    content=     Sala de estar/jantar,2 vagas de gar...
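Outside the xmllint shell, the same expression can be evaluated from a program. The following is a minimal Python sketch, assuming the third-party lxml package is installed; the `SNIPPET` string and the `first_block_text` helper are illustrative names, not part of the original answer. Whitespace-only text nodes (like the first node in the xmllint output above) are dropped so that only the visible text remains.

```python
# Minimal sketch, assuming the third-party lxml package is installed (pip install lxml).
from lxml import html

# The question's markup with tidied indentation; SNIPPET is a hypothetical name.
SNIPPET = """
<div class="detalhes_colunadados">
  <div class="detalhescolunadados_blocos">
    <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
  </div>
  <div class="detalhescolunadados_blocos">
    <h5>Valores</h5>
    Venda: R$ 600.000,00<br>
    Condomínio: R$ 660,00<br>
  </div>
</div>
"""

# The expression from the answer: every descendant text node of the first block
# that has no <h5> among its ancestors.
XPATH = "//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]"


def first_block_text(markup):
    """Hypothetical helper: return the trimmed, non-blank text nodes matched by XPATH."""
    tree = html.fromstring(markup)
    # For a text() step, lxml returns a list of strings.
    nodes = tree.xpath(XPATH)
    # Drop whitespace-only nodes (indentation between tags) and trim the rest.
    return [t.strip() for t in nodes if t.strip()]


print(first_block_text(SNIPPET))  # prints ['Sala de estar/jantar,2 vagas de garagem cobertas.']
```

Here `[1]` works because both blocks are siblings under the same parent; if they lived under different parents, wrapping the path as `(//div[@class='detalhescolunadados_blocos'])[1]` would be needed to pick the first one in the whole document.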

Mandelbaum answered 27/2, 2013 at 22:1

Why not use xmllint --html --xpath '//foo' file.html? =) – Hengist 27/2, 2013 at 22:3

Thanks for pointing me to the --xpath option. It's actually undocumented. – Mandelbaum 27/2, 2013 at 22:14

Thanks a lot. I was forgetting that the text is a child of the h5; I even tried //text()[not(self::h5)]. – Quevedo 28/2, 2013 at 2:7

It seems to me that this works:

//div[@class="detalhescolunadados_blocos"]/text()
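A hedged check of what this expression matches: `/text()` selects only text nodes that are direct children of the block, so the heading text inside `<h5>` is skipped, but without a positional predicate it matches both blocks. The Python sketch below assumes lxml is installed; `SNIPPET` and `direct_text` are illustrative names. It compares the expression with a variant that adds `[1]`. Note that direct-child `text()` would also skip text inside inline elements such as `<b>`, which `//text()[not(ancestor::h5)]` would keep.

```python
# Minimal sketch, assuming the third-party lxml package is installed.
from lxml import html

# The question's markup with tidied indentation; SNIPPET is a hypothetical name.
SNIPPET = """
<div class="detalhes_colunadados">
  <div class="detalhescolunadados_blocos">
    <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
  </div>
  <div class="detalhescolunadados_blocos">
    <h5>Valores</h5>
    Venda: R$ 600.000,00<br>
    Condomínio: R$ 660,00<br>
  </div>
</div>
"""


def direct_text(markup, expr):
    """Hypothetical helper: evaluate expr and keep the trimmed, non-blank strings."""
    tree = html.fromstring(markup)
    return [t.strip() for t in tree.xpath(expr) if t.strip()]


# Expression from this answer: direct text children of every matching block.
all_blocks = direct_text(SNIPPET, '//div[@class="detalhescolunadados_blocos"]/text()')
# Variant with [1]: only the first matching block among its siblings.
first_block = direct_text(SNIPPET, '//div[@class="detalhescolunadados_blocos"][1]/text()')

print(all_blocks)   # also contains the "Venda" and "Condomínio" lines
print(first_block)  # only the "Sala de estar/jantar" line
```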

Brandy answered 27/2, 2013 at 21:59

Try this:

//div[@class="detalhes_colunadados"]/div/text()
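A hedged check of this expression: starting from the outer `detalhes_colunadados` container, `/div/text()` returns the direct text children of every inner `<div>`, so it picks up the second block as well; writing `div[1]` restricts it to the first inner block. The Python sketch below assumes lxml is installed; `SNIPPET` and `texts` are illustrative names.

```python
# Minimal sketch, assuming the third-party lxml package is installed.
from lxml import html

# The question's markup with tidied indentation; SNIPPET is a hypothetical name.
SNIPPET = """
<div class="detalhes_colunadados">
  <div class="detalhescolunadados_blocos">
    <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
  </div>
  <div class="detalhescolunadados_blocos">
    <h5>Valores</h5>
    Venda: R$ 600.000,00<br>
    Condomínio: R$ 660,00<br>
  </div>
</div>
"""


def texts(markup, expr):
    """Hypothetical helper: evaluate expr and keep the trimmed, non-blank strings."""
    tree = html.fromstring(markup)
    return [t.strip() for t in tree.xpath(expr) if t.strip()]


# Expression from this answer: text children of every inner div.
every_inner_div = texts(SNIPPET, '//div[@class="detalhes_colunadados"]/div/text()')
# Variant with div[1]: text children of the first inner div only.
first_inner_div = texts(SNIPPET, '//div[@class="detalhes_colunadados"]/div[1]/text()')

print(every_inner_div)
print(first_inner_div)
```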

Costar answered 27/2, 2013 at 22:1
