In all the examples and tutorials I have seen of BeautifulSoup, an HTML/XML document is passed and a soup object is returned which can then be used to modify the document. However, how can I use BeautifulSoup to create a HTML/XML document from scratch? In other words, I have data that I would like to put in an XML file, but the XML file does not exist yet and I would like to build it from scratch. How can I go about it?
Creating an XML document with BeautifulSoup
Asked Answered
Just create an empty BeautifulSoup()
object:
soup = BeautifulSoup()
and start adding elements:
soup.append(soup.new_tag("a", href="http://www.example.com"))
For XML you could start out with a XML header by using the xml
tree builder:
soup = BeautifulSoup(features='xml')
This requires lxml to be installed first. This sets the .is_xml
flag on the BeautifulSoup
object (which can also be set manually).
What about for XML documents? Will the same code work or will I need to specify different parser when creating BeautifulSoup object? –
Calculation
Here is my use case (creating a htmx response, runnable in python>=3.12
):
from dataclasses import dataclass, field
from typing import LiteralString, TypedDict, Unpack, assert_never
from bs4 import BeautifulSoup
@dataclass(frozen=True, kw_only=True)
class TagModel[T: LiteralString]:
name: T
children: tuple["TagModel | str", ...] = ()
attributes: dict[str, str] = field(default_factory=dict)
def to_html(self, soup: BeautifulSoup | None = None):
if soup is None:
soup = BeautifulSoup()
tag = soup.new_tag(self.name)
for child in self.children:
match child:
case TagModel():
tag.append(child.to_html(soup))
case str():
tag.append(child)
case _:
assert_never(child)
for key, value in self.attributes.items():
tag[key] = value
return tag
class HTMXAttribute(TypedDict, total=False):
id: str
hx_get: str
hx_post: str
hx_select: str
hx_select_oob: str
hx_swap: str
hx_swap_oob: str
hx_target: str
hx_trigger: str
hx_vals: str
@dataclass(frozen=True)
class HTMXTagBuilder[T: LiteralString]:
name: T
def __call__(self, *children: TagModel | str, **attrs: Unpack["HTMXAttribute"]):
hyphen_attrs = {k.replace("_", "-"): f"{v}" for k, v in attrs.items()}
return TagModel(
name=self.name,
children=children,
attributes=hyphen_attrs,
)
Body = HTMXTagBuilder("body")
Div = HTMXTagBuilder("div")
P = HTMXTagBuilder("p")
def main():
print(
Body(
Div("I'm a div!", id="div0", hx_get="/div0"),
"Hello World!",
P("Hello World!"),
)
.to_html()
.prettify()
)
if __name__ == "__main__":
main()
© 2022 - 2025 — McMap. All rights reserved.