XPath select all text content for a <div> except for a specific tag <h5>
Q

3

9

I searched and tried several solutions for this problem, but none of them worked. I have this HTML:

<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>

I want to extract, using XPath, only the text content of the first div with class="detalhescolunadados_blocos" that is not inside an h5 tag.

I tried: //div[@class='detalhescolunadados_blocos']/[1]/*[not(self::h5)]

Quevedo answered 27/2, 2013 at 21:28 Comment(1)
I'm not good at XPath, but I know that to extract only the text you have to use text(). - Brandy
M
12

Try the following XPath expression:

//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]

This will return:

$ xmllint --html --shell so.html
/ > xpath //div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]    
Object is a Node Set :
Set contains 2 nodes:
1  TEXT
    content=      
2  TEXT
    content=     Sala de estar/jantar,2 vagas de gar...
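To see why filtering on the ancestor works, here is a minimal Python sketch that mimics the same idea with only the standard library. It is not the xmllint run above: xml.etree.ElementTree does not support text() or the ancestor axis, so the helper text_outside (a hypothetical name) walks the tree by hand, and the markup is a well-formed copy of the question's HTML with <br> written as <br/> so the XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's markup (<br> written as <br/>,
# an assumption needed for the stdlib XML parser).
DOC = """<div class="detalhes_colunadados">
  <div class="detalhescolunadados_blocos">
    <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br/>
  </div>
  <div class="detalhescolunadados_blocos">
    <h5>Valores</h5>
    Venda: R$ 600.000,00<br/>
    Condomínio: R$ 660,00<br/>
  </div>
</div>"""


def text_outside(elem, skip_tag):
    """Yield every piece of text under elem that is not inside a skip_tag element."""
    if elem.text:
        yield elem.text
    for child in elem:
        # Text inside <h5> is skipped; this plays the role of not(ancestor::h5).
        if child.tag != skip_tag:
            yield from text_outside(child, skip_tag)
        # The tail is the text that follows the child, so it belongs to the parent.
        if child.tail:
            yield child.tail


root = ET.fromstring(DOC)
# find() returns the first match in document order, like the [1] in this case.
first_block = root.find(".//div[@class='detalhescolunadados_blocos']")
pieces = [t.strip() for t in text_outside(first_block, "h5")]
result = [p for p in pieces if p]  # drop whitespace-only text nodes
print(result)
```

The whitespace-only node that xmllint lists is dropped here by the strip() step; keep it if the exact node set matters to you.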
Mandelbaum answered 27/2, 2013 at 22:1 Comment(3)
Why not use xmllint --html --xpath '//foo' file.html? =) - Hengist
Thanks for pointing me to the --xpath option. It's actually undocumented. - Mandelbaum
Thanks a lot. I was forgetting that the text is a child of the h5; I even tried //text()[not(self::h5)]. - Quevedo
B
0

It seems to me that this works:

//div[@class="detalhescolunadados_blocos"]/text()
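One thing to keep in mind: without a positional predicate such as [1], the expression matches every div with that class, so the text of the second block comes back as well. Here is a rough stdlib-only Python sketch of that behaviour, assuming a well-formed copy of the question's markup (<br> written as <br/>); direct_text is a hypothetical helper standing in for the /text() step, since ElementTree has no text() support.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's markup (assumption: <br/> instead of <br>).
DOC = """<div class="detalhes_colunadados">
  <div class="detalhescolunadados_blocos">
    <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br/>
  </div>
  <div class="detalhescolunadados_blocos">
    <h5>Valores</h5>
    Venda: R$ 600.000,00<br/>
    Condomínio: R$ 660,00<br/>
  </div>
</div>"""


def direct_text(elem):
    """Return the non-blank text that sits directly inside elem (not in its children)."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p.strip() for p in parts if p and p.strip()]


root = ET.fromstring(DOC)
blocks = root.findall(".//div[@class='detalhescolunadados_blocos']")

all_blocks = [t for block in blocks for t in direct_text(block)]  # no [1]
first_only = direct_text(blocks[0])                               # with [1]
print(all_blocks)
print(first_only)
```

So for the question as asked (first block only), the expression would need the [1] predicate after the class test.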
Brandy answered 27/2, 2013 at 21:59 Comment(0)
C
0

Try this:

//div[@class="detalhes_colunadados"]/div/text()
Costar answered 27/2, 2013 at 22:1 Comment(0)
