How to manipulate a builtin's type hinting

Asked 13/7, 2018 at 21:0 Answered 15/7, 2018 at 18:14

I use ElementTree to parse/build a number of slightly complicated but well-defined xml files, and use mypy for static typing. I have .find statements strewn all over the place, which leads to things like this:

from xml.etree.ElementTree import Element
...
root.find('tag_a').append(Element('tag_b'))

# run mypy..
-> type None from Optional[Element] has no attribute append

This makes sense, since find might simply not find the tag I give it. But I know that it's there, and I don't want to add try..except or assert statements whose only purpose is to silence mypy: they add no functionality and make the code less readable. I'd also like to avoid commenting # type: ignore everywhere.

I tried monkey patching Element.find.__annotations__, which would be a good solution in my opinion. But since it's a builtin I can't do that, and subclassing Element feels like too much again.

Is there a good way to solve this?

Those answered 13/7, 2018 at 21:0 Comment(2)

I guess there is no way to be sure that a tag is present. You can write a function that calls the .find method and checks that the result is not None (e.g. assert result is not None), then declare its return type as Element. That may work (less ugly than subclassing, I guess) – Sneaker 13/7, 2018 at 21:22

@AzatIbrakov Ah, like def certain_find(elem, tag) which handles the typing stuff once? Feel free to post it as an answer, it sounds reasonable. – Those 13/7, 2018 at 21:27

We can write a utility function that handles the not-found case internally, either raising an exception or returning a dummy value of the expected type:

from xml.etree.ElementTree import Element


def find(element: Element,
         tag: str) -> Element:
    result = element.find(tag)
    assert result is not None, ('No tag "{tag}" found '
                                'in element "{element}".'
                                .format(tag=tag,
                                        element=element))
    return result

The advantage of assertions (compared to raising an exception manually) is that they can be disabled. If you are working with user-provided data, though, I recommend raising an exception instead:

if result is None:
    raise LookupError('No tag "{tag}" found '
                      'in element "{element}".'
                      .format(tag=tag,
                              element=element))
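
Putting the exception variant together, a minimal self-contained sketch might look like this. The name find_required and the small example tree are made up for illustration; they are not part of ElementTree.

```python
from xml.etree.ElementTree import Element, SubElement


def find_required(element: Element, tag: str) -> Element:
    """Return the first matching child, or raise if there is none.

    Hypothetical helper name, used only for this sketch.
    """
    result = element.find(tag)
    if result is None:
        # An explicit raise still runs under `python -O`, unlike an assert.
        raise LookupError('No tag "{tag}" found in element "{element}".'
                          .format(tag=tag, element=element.tag))
    return result


# Example tree, assumed here so the snippet runs on its own.
root = Element('root')
SubElement(root, 'tag_a')

# The declared return type is Element, not Optional[Element],
# so the chained call needs no extra check at the call site.
find_required(root, 'tag_a').append(Element('tag_b'))
```

The None check lives in one place, and every call site stays a one-liner.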

Digression

I use type annotations because they help the IDE and save a lot of time when reading an API, but I'm not a mypy user because I don't like the idea of checking everything. Take this case: if the caller of a function passes garbage, that is their fault, and we should let them do it instead of complaining that "you have a union of types and are not handling some of them". EAFP, after all.

Sneaker answered 14/7, 2018 at 5:41 Comment(0)

I think here, there are three different options you can take.

The first option is the approach suggested in Azat Ibrakov's answer: create a helper method that explicitly performs a 'None' check at runtime to satisfy mypy. This is the most typesafe option.
The second option is to configure mypy and loosen how it handles values of type 'None'. Currently, mypy will consider 'None' and 'Element' to be two distinct types: if you have a value that's 'None', it can't be an 'Element' and vice-versa. You can actually weaken this by giving mypy the --no-strict-optional flag, which will make mypy treat values of type 'None' as being a member of all types.

Or to put it another way, if you're familiar with languages like Java, it's legal to do things like this:
```
String myString = null;
```
Passing in the --no-strict-optional flag to mypy will make it start accepting code like the above.
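
In Python terms, a rough equivalent for the code in the question might look like the sketch below. The small tree is an assumption, built only so that the snippet runs on its own.

```python
from xml.etree.ElementTree import Element, SubElement

# Example tree, assumed for illustration.
root = Element('root')
SubElement(root, 'tag_a')

# Under mypy's default strict optional checking, this line is reported,
# because find() is typed as returning Optional[Element].
# With --no-strict-optional, None counts as a member of every type,
# so the same line is accepted without any check.
root.find('tag_a').append(Element('tag_b'))
```

The flag only changes what mypy reports. At runtime the call still fails with an AttributeError if the tag is missing.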

This obviously means that your code will be less typesafe: mypy is no longer capable of detecting potential "null pointer exceptions". To help mitigate this, you can try disabling strict-optional locally, rather than globally, by creating a mypy config file.

In a nutshell, you'd create a config file that looks roughly like this:
```
[mypy]
# Global options can go here. We'll leave this empty since we don't
# want to change any of the defaults.

[mypy-mycodebase.my.xml.processing.module]
# We weaken mypy in *just* this module
strict_optional = False
```
The third option is to just stop using static typing for your XML parsing code altogether: cast your root variable to type 'Any' and go to town. (Note that 'object' would not work for this: mypy rejects attribute access such as .find on a value typed as 'object'.) Then, as you collect useful data from your XML, do any necessary runtime checks to validate your data and create (typesafe!) objects to store the relevant info. (You can continue using static typing on the rest of your code, of course).

The observation here is that any runtime input is going to be inherently dynamic: the user could always pass in malformed XML, the data could be structured incorrectly, etc... The only real way of checking these kinds of issues is using runtime checks: static type checking won't be of much help. So, if static type checking provides minimal value in a certain region of code, why continue using it there?

This tactic does have several downsides, of course. In particular, mypy won't be able to detect blatant misuses of the ElementTree API, you'll need to be fairly diligent with your runtime checks to make sure bad data doesn't creep into the typechecked regions of your code, etc...
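
As a rough sketch of this third option: the Book record, the parse_book function, and the XML layout below are all invented for illustration. The parsing code is untyped via Any, and the runtime checks sit at the boundary where a typed object is created.

```python
from dataclasses import dataclass
from typing import Any
import xml.etree.ElementTree as ET


@dataclass
class Book:  # hypothetical typed record for the validated data
    title: str
    year: int


def parse_book(xml_text: str) -> Book:
    # Annotating root as Any turns off static checking for
    # everything reached through it.
    root: Any = ET.fromstring(xml_text)
    try:
        title = root.find('title').text
        year = int(root.find('year').text)
    except (AttributeError, TypeError, ValueError) as exc:
        # A missing tag, empty text, or a non-numeric year ends up here.
        raise ValueError('malformed book XML') from exc
    if not isinstance(title, str):
        raise ValueError('malformed book XML: empty title')
    # From this point on, callers get a fully typed object.
    return Book(title=title, year=year)


book = parse_book('<book><title>Dune</title><year>1965</year></book>')
```

Code that consumes Book is checked by mypy as usual. Only the body of parse_book is exempt.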

Aught answered 15/7, 2018 at 18:14 Comment(3)

Thanks for the in depth answer, I really appreciate that you added reasoning to each route. – Those 16/7, 2018 at 7:43

@Arne: fixed -- I meant to just end at "this is the most typesafe option" and defer to Azat's answer, but clearly I missed the trailing thought while proofreading... Eh, whatever. – Aught 16/7, 2018 at 7:56

I didn't think of option 3, that can be a good solution in some cases! – Leena 18/10, 2019 at 0:20

Mypy does not use __annotations__; that is a runtime construct. Mypy's analysis is completely static.
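
To illustrate the difference with a small sketch (lookup is a made-up pure-Python function, not anything from ElementTree): changing __annotations__ affects runtime introspection only.

```python
from typing import Optional, get_type_hints


def lookup(key: str) -> Optional[str]:  # hypothetical function
    return None


# Rewriting the annotation at runtime is visible to runtime introspection...
lookup.__annotations__['return'] = str
runtime_hint = get_type_hints(lookup)['return']

# ...but mypy only reads the source text of the def above, so it
# still treats lookup() as returning Optional[str].
```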

"builtin" types (aka types from the standard library) are sourced from typeshed. If you wish to modify these types for your own purposes, you can (though I would strongly discourage it as a solution to your problem). To use a custom typeshed with mypy, you can do mypy --custom-typeshed-dir=/path/to/my/typeshed ... and mypy will use your modified typeshed.

A more ergonomic solution would be to do as Azat suggests, and write a wrapper that moves type narrowing to a utility function, so that the local readability does not suffer and you maintain type safety.

Matherne answered 14/7, 2018 at 3:56 Comment(0)
