Recursive data types in Python
Asked Answered
P

3

6

What might be the thing in Python that is closest to the to the recursive data types in Haskell? (i.e. using the type's own definition while defining itself.)

Edit:

To give a more concrete definition of a recursive type, below is a binary tree in Haskell:

data Tree a             = Leaf a | Branch (Tree a) (Tree a) 

How I read this is like the following: A binary tree can either be a leaf, or can contain two sub-trees which are again the type tree itself.

For further information about recursive types in Haskell, you can refer to here: https://www.haskell.org/tutorial/goodies.html

What I actually had in mind was converting a word tree definition in Haskell to Python. This is the definition of the WordTree from an old project of mine:

data WordTree = Word String | Subword String [WordTree] | Root [WordTree]

A WordTree is a n-arytree structure where common prefixes of words are stored at the parents and remaining parts are stored at the leaf of the trees in a sorted manner. I believe this type definition is somewhat similar to a Trie. Yet, as Haskell is a functional programming language, it allows this type definition to be recursive. What might be the closest thing in Python (or maybe, in object-oriented programming, in general) for this kind of a definition of a type?

Phane answered 1/9, 2020 at 20:14 Comment(10)
perhaps a class where the init has self.self = self :)Repulse
Can you perhaps give an example of such a thing in Haskell and maybe someone can show you how it's done in pythonPolloch
Could you describe a use case?Kropp
@DerekEden That's a recursive value, not a recursive type.Amoy
@Phane adding example of Haskell in your question section might help others to find how it can work in Python.Parkins
Thank you @Parkins I tried to clarify my question above.Draftsman
Also, I agree with @chepner. That seems more like a recursive value.Draftsman
There isn't really anything terribly similar in Python, given the wildly different type system used. One could probably implement Haskell-style algebraic types with a suitable metaclass, but I don't think it's trivial.Amoy
Does this answer your question? Defining a recursive type hint in Python?Clavicorn
Perhaps related: #16259053Clavicorn
C
6

Since Python is dynamically typed, there is no issue defining whatever classes you need.

class Tree:
    left = None
    right = None
    def __init__(self, left, right):
        self.left = left
        self.right = right

Even if you are interested in typing these definitions, you can do that like in any other class-based object oriented language:

from typing import Union

class Tree:
    left: Union['Tree', int]
    right: Union['Tree', int]
    def __init__(self, left: Union['Tree', int], right: Union['Tree', int]) -> None:
        self.left = left
        self.right = right

Note the use of strings for the name of the type (which you can avoid in more recent Python versions).

See this open issue in mypy for direct recursive algebraic types such as

Tree = Union[Tuple['Tree', 'Tree'], int]

The most common (though not necessarily recommended) way of defining the WordTree you describe is using a superclass and a shallow hierarchy:

from typing import List, final

class WordTree: pass

@final
class Word(WordTree):
    word: str

@final
class Subword(WordTree):
    subword: str
    children: List[WordTree]

@final
class Root(WordTree):
    children: List[WordTree]

Using such an implementation might require using isinstance checks (though Python3.10 gives you nice sugar for those). Constructors are omitted in this example to avoid clutter; you might want to use dataclass to get them, and other kinds of behavior, easily.

To date, Python gives you no way to disallow unrelated classes from inheriting from WordTree, thus breaking some of the ability to statically reason about such programs.

Some other OOP languages, such as Scala and Kotlin and (soon) Java, can take such a definition (using sealed classes) and give you type checks and syntactic constructs that are similar to the ones given by functional languages such as Haskell.


For all I know, this kind of design is usually recommended only for pure-data classes, such as ASTs. It is less suited for defining user-facing container such as trie, since it exposes the inner workings of the data structure. So even if you go with that design, you might want to use it as an implementation detail, and use another class, Trie, to be used by client code through a well-defined API. That class can have a WordTree field, or any other way of implementing the same logic.

IMO this is essential to how object-oriented design differs from functional design. The latter focuses on data flow and on static reasoning, whereas the former focuses on APIs, extensibility and decoupling. I think this is helpful to note, when porting between languages and environments - though as noted above, some languages try to enable both design approaches.

Clavicorn answered 1/9, 2020 at 20:19 Comment(0)
H
4

Here's an equivalent implementation of the Haskell binary tree in Python 3.10. Static type checking can be done with mypy.

from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class Branch(Generic[T]):
    value: T
    left: Tree[T]
    right: Tree[T]

@dataclass
class Leaf(Generic[T]):
    value: T

Tree = Branch[T] | Leaf [T]

You can use it like this (note the pattern-matching in the contains function - a new feature of Python 3.10):

def contains(tree: Tree[T], value: T):
    match tree:
        case Leaf(x):
            return x == value
        case Branch(x, left, right):
            return x == value or contains(left, value) or contains(right, value)

tree = Branch(
    1,
    Branch(2, Leaf(3), Leaf(4)),
    Branch(5, Leaf(6), Branch(4, Leaf(7), Leaf(8)))
)

assert contains(tree, 1)
assert contains(tree, 5)
assert contains(tree, 8)

To implement your WordTree, you would do the following:

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Word:
    value: str

@dataclass
class Subword:
    value: str
    trees: list[WordTree]

@dataclass
class Root:
    trees: list[WordTree]

WordTree = Word | Subword | Root

A note on the imports:

  • from __future__ import annotations allows you to annotate with the name of a type that hasn't been defined yet.
  • @dataclass automatically defines a constructor for you.
Harragan answered 19/3, 2022 at 13:15 Comment(0)
L
0

I came across this post looking to implement the following:

... a nested list of integers nestedList. Each element is either an integer or a list whose elements may also be integers or other lists.

And my implementation is as follows.

class NestedInteger(ABC):
    @staticmethod
    # "List" is invariant. https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    def create(data: Sequence[int | list[int]]) -> list[NestedInteger]:
        root: list[NestedInteger] = []
        for x in data:
            match x:
                case int():
                    root.append(NestedInt(x))
                case _:
                    root.append(NestedList(NestedInteger.create(x)))
        return root

    @abc.abstractmethod
    def is_integer(self) -> bool:
        pass

    @abc.abstractmethod
    def get_integer(self) -> int | None:
        pass

    @abc.abstractmethod
    def get_list(self) -> list[NestedInteger] | None:
        pass


@final
@dataclasses.dataclass
class NestedInt(NestedInteger):
    val: int

    def is_integer(self) -> bool:
        return True

    def get_integer(self) -> int | None:
        return self.val

    def get_list(self) -> list[NestedInteger] | None:
        return None


@final
@dataclasses.dataclass
class NestedList(NestedInteger):
    values: list[NestedInteger]

    def is_integer(self) -> bool:
        return False

    def get_integer(self) -> int | None:
        return None

    def get_list(self) -> list[NestedInteger] | None:
        return self.values
    

Given [1, [4, [6]]], the above code creates:

{list: 2}
├── NestedInt(val=1)
└── NestedList
    ├── NestedInt(val=4)
    └── NestedList
        └── NestedInt(val=6)
Lydalyddite answered 20/8, 2023 at 22:59 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.