I'm parsing HTML with BeautifulSoup. At the end, I would like to obtain the body contents, but without the body tags. However, BeautifulSoup adds html, head, and body tags. In this Google Groups discussion, one possible solution is proposed:
>>> from bs4 import BeautifulSoup as Soup
>>> soup = Soup('<p>Some paragraph</p>')
>>> soup.body.hidden = True
>>> soup.body.prettify()
u' <p>\n Some paragraph\n </p>'
This solution is a hack. There should be a better, more obvious way to do it.
str(soup.body)[6:-7] or soup.body.prettify()[6:-7]
– Redcap
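For what it's worth, one way to avoid both the hidden attribute and the slicing might be Tag.decode_contents(), which renders a tag's children as a string without the tag itself. This is only a minimal sketch: the helper name body_inner_html is made up for illustration, and it passes "html.parser" explicitly as an assumption, because that parser does not add a body tag to a bare fragment (lxml and html5lib do), so the helper falls back to the whole soup when there is no body.

```python
from bs4 import BeautifulSoup


def body_inner_html(markup, parser="html.parser"):
    """Return the markup inside <body>, without the <body> tags themselves.

    body_inner_html is a hypothetical helper name, not part of bs4.
    """
    soup = BeautifulSoup(markup, parser)
    # Some parsers (e.g. html.parser) do not wrap fragments in <body>,
    # so fall back to the whole document when no body tag exists.
    root = soup.body if soup.body is not None else soup
    # decode_contents() serializes the children only, not the tag itself.
    return root.decode_contents()


full_document = "<html><head></head><body><p>Some paragraph</p></body></html>"
print(body_inner_html(full_document))
```

Unlike the [6:-7] slice, this does not depend on the body tag having no attributes, so it should keep working if the input has something like a class on the body tag.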