beautifulsoup: find the n-th element's sibling

I have a complex html DOM tree of the following nature:

<table>
    ...
    <tr>
        <td>
            ...
        </td>
        <td>
            <table>
                <tr>
                    <td>
                        <!-- inner most table -->
                        <table>
                            ...
                        </table>

                        <h2>This is hell!</h2>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

I have some logic to find the innermost table. Once I have found it, I need to get its next sibling element (the h2). Is there any way to do this?

Magical answered 10/4, 2010 at 13:25 Comment(0)
C
10

If tag is the innermost table, then

tag.findNextSibling('h2')

will be

<h2>This is hell!</h2>

To get literally the next sibling, you could use tag.nextSibling, which in this case is u'\n' (the whitespace between the table and the h2).

If you want the next sibling that is not a NavigableString (such as u'\n'), then you could use

tag.findNextSibling(text=None)

If you want the second sibling (no matter what it is), you could use

tag.nextSibling.nextSibling

(but note that if tag does not have a next sibling, then tag.nextSibling will be None, and tag.nextSibling.nextSibling will raise an AttributeError.)
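The calls above use the BeautifulSoup 3 names. As a minimal sketch, assuming the newer bs4 package (where the same operations are spelled next_sibling and find_next_sibling), the following shows the difference between the literal next sibling and the next tag sibling. The sample markup, the id="inner" attribute, and the helper next_tag_sibling are made up for illustration and are not part of the question.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Simplified stand-in for the markup in the question (hypothetical id).
html = (
    '<div><table id="inner"><tr><td>x</td></tr></table>\n\n'
    "<h2>This is hell!</h2></div>"
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("table", id="inner")

# The literal next sibling is the whitespace between the two elements.
literal_next = tag.next_sibling

# Asking for a specific tag name skips the whitespace string.
h2 = tag.find_next_sibling("h2")


def next_tag_sibling(element):
    """Return the first following sibling that is a Tag, or None.

    Hypothetical helper: walks next_siblings and skips NavigableStrings,
    so it never raises AttributeError when there is no further sibling.
    """
    for sibling in element.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


print(repr(literal_next))
print(h2)
print(next_tag_sibling(tag))
print(next_tag_sibling(h2))
```

Looping over next_siblings avoids chaining .next_sibling.next_sibling, which fails as soon as one link in the chain is None.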

Copula answered 10/4, 2010 at 13:44 Comment(3)
I didn't mean to find 'h2' specifically; it could be anything. How do I get whatever is next? - Magical
tag.findNextSibling(text!=u'\n') is not valid Python. You might have meant tag.findNextSibling(text=lambda x: not x.isspace()). - Triangular
@Max: Thanks for pointing out my error. Unfortunately, not x.isspace() doesn't work, because the text keyword argument only applies to NavigableStrings, which the <h2>...</h2> tag is not. So I edited my answer to suggest text=None, which skips all NavigableStrings. - Copula
P
1

Every tag object has a nextSibling attribute that's exactly what you're looking for -- the next sibling (or None for a tag that's the last child of its parent tag, of course).

Pteridophyte answered 10/4, 2010 at 16:42 Comment(0)
