Creating an XML document with BeautifulSoup
Asked Answered
C

2

7

In all the examples and tutorials I have seen of BeautifulSoup, an HTML/XML document is passed and a soup object is returned which can then be used to modify the document. However, how can I use BeautifulSoup to create a HTML/XML document from scratch? In other words, I have data that I would like to put in an XML file, but the XML file does not exist yet and I would like to build it from scratch. How can I go about it?

Calculation answered 30/4, 2013 at 17:18 Comment(0)
W
14

Just create an empty BeautifulSoup() object:

soup = BeautifulSoup()

and start adding elements:

soup.append(soup.new_tag("a", href="http://www.example.com"))

For XML you could start out with a XML header by using the xml tree builder:

soup = BeautifulSoup(features='xml')

This requires lxml to be installed first. This sets the .is_xml flag on the BeautifulSoup object (which can also be set manually).

Whiteman answered 30/4, 2013 at 17:20 Comment(1)
What about for XML documents? Will the same code work or will I need to specify different parser when creating BeautifulSoup object?Calculation
C
0

Here is my use case (creating a htmx response, runnable in python>=3.12):

from dataclasses import dataclass, field
from typing import LiteralString, TypedDict, Unpack, assert_never

from bs4 import BeautifulSoup


@dataclass(frozen=True, kw_only=True)
class TagModel[T: LiteralString]:
    name: T
    children: tuple["TagModel | str", ...] = ()
    attributes: dict[str, str] = field(default_factory=dict)

    def to_html(self, soup: BeautifulSoup | None = None):
        if soup is None:
            soup = BeautifulSoup()
        tag = soup.new_tag(self.name)
        for child in self.children:
            match child:
                case TagModel():
                    tag.append(child.to_html(soup))
                case str():
                    tag.append(child)
                case _:
                    assert_never(child)
        for key, value in self.attributes.items():
            tag[key] = value
        return tag


class HTMXAttribute(TypedDict, total=False):
    id: str
    hx_get: str
    hx_post: str
    hx_select: str
    hx_select_oob: str
    hx_swap: str
    hx_swap_oob: str
    hx_target: str
    hx_trigger: str
    hx_vals: str


@dataclass(frozen=True)
class HTMXTagBuilder[T: LiteralString]:
    name: T

    def __call__(self, *children: TagModel | str, **attrs: Unpack["HTMXAttribute"]):
        hyphen_attrs = {k.replace("_", "-"): f"{v}" for k, v in attrs.items()}
        return TagModel(
            name=self.name,
            children=children,
            attributes=hyphen_attrs,
        )


Body = HTMXTagBuilder("body")
Div = HTMXTagBuilder("div")
P = HTMXTagBuilder("p")


def main():
    print(
        Body(
            Div("I'm a div!", id="div0", hx_get="/div0"),
            "Hello World!",
            P("Hello World!"),
        )
        .to_html()
        .prettify()
    )


if __name__ == "__main__":
    main()

Cedell answered 24/9, 2024 at 9:39 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.