How to specify the type of pandas series elements in type hints?
N

7

38

My function returns a pandas series, where all elements have a specific type (say str). The following MWE should give an impression:

import pandas as pd 
def f() -> pd.Series:
    return pd.Series(['a', 'b']) 

In the type hints I want to make clear that f()[0] will always be of type str (as opposed to, for example, a function that returns pd.Series([0, 1])). I tried this:

def f() -> pd.Series[str]:

But this raises:

TypeError: 'type' object is not subscriptable

So, how do I specify the type of pandas Series elements in type hints? Any ideas?

Ninepins answered 9/9, 2019 at 13:22 Comment(7)
pd.Series(dtype=str) allows you to specify the data type of a series' elements. My guess is that this also works for type hints.Fathomless
pd.Series(dtype=str) does not work for type hints.Aeonian
Is there a "str" type in pandas? I'm not sure, according to pbpython.com/pandas_dtypes.html (but maybe it's deprecated?)Bibliophile
@ItamarMushkin: just out of curiosity, why do you think pd.Series(dtype=str) does not work for type hints? My 3.7 interpreter at least accepts it syntactically.Aqaba
@Aqaba -- it's not a valid PEP 484 type. So while there's nothing stopping you from writing such a type hint, it would end up causing any tooling designed to analyze PEP 484 type hints to choke. (Static type checkers, linters, autocompletion tools...). Losing access to those tools would greatly diminish the usefulness of type hints to the point where you're probably better off not using them at all.Floccus
@Michael0x2a: ok I see. Thank you for the explanation.Aqaba
Also, it didn't run for me on 3.6.1 (Jupyter notebook if that matters)Aeonian
P
14

You can use pandera for type-hinting and validating DataFrames and Series: https://pandera.readthedocs.io/en/stable/schema_models.html#schema-models

So in this case:

from pandera.typing import Series
import pandas as pd 

def f() -> Series[str]:
    return pd.Series(['a', 'b']) 
Pellikka answered 13/9, 2022 at 8:22 Comment(3)
mypy still complains: error: Incompatible return value type (got "pandas.core.series.Series[str]", expected "pandera.typing.pandas.Series[str]")Quetzalcoatl
You probably need to enable the pandera.mypy plugin. See herePellikka
I did enable the plugin. Maybe it's because I also use pydantic.mypy. With pydantic, pandera and mypy somehow two are ok and the third one is complaining, no matter how you configure it. MyPy complains about valid slices, thinks selecting a MultiIndex df results in a scalar etc. The closest I got was to type functions with comments # type: (pd.Series[float], pd.Series[float]) -> pd.DataFrame but that did not really validate the type. And I ended up with hundreds of # type: ignore. I gave up on mypy with pandas.Quetzalcoatl
E
9

For Python 3.8, try:

def f() -> "pd.Series[str]":
    pass

or:

f_return_type = "pd.Series[str]"
def f() -> f_return_type:
    pass

or # type: pd.Series[str] for variables
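
As a hedged sketch of why the quoted form avoids the TypeError from the question: a string annotation is stored as-is and is not evaluated when the function is defined, so pd.Series[str] is never actually subscripted at runtime. PEP 484 calls such strings forward references. The names annotation and first below are only illustrative.

```python
import pandas as pd


def f() -> "pd.Series[str]":
    # The quoted annotation is never evaluated here, so no subscripting happens.
    return pd.Series(["a", "b"])


# The annotation is kept as the literal string "pd.Series[str]".
annotation = f.__annotations__["return"]

# Nothing is enforced at runtime; this simply reads the first element.
first = f()[0]
print(annotation, type(first).__name__)
```

Static checkers still parse the string, so whether the element type is understood depends on the stubs you have installed. Nothing in this form validates the contents at runtime.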

Epidemiology answered 7/6, 2021 at 8:54 Comment(3)
This works, but where can I find this kind of annotation (inside strings) in the Python docs? I didn't find it in the PEP, and in the typing docs there are only two examples with no information about them: docs.python.org/3/library/typing.htmlEducatory
This does NOT work. def f() -> "pd.Series[float]": return pd.Series(["a", "b"], dtype=str) is happily validated as correctly typed.Quetzalcoatl
@Quetzalcoatl I'm curious what you mean by "validated as correctly typed", since type hinting in Python doesn't validate anything. Similar to TypeScript, it's merely a static type checking solution. In fact, the Pylance language server in VS Code does notify me that there is a type mismatch (at least in the current version 1.91.1). Edit: It seems I've partially misread your comment. The first solution, where only the return type is declared via a string, does not work. Using a type alias, on the other hand, works just fine with the LSP.Beatification
S
2

I want to make clear that f()[0] will always be of type str (compared to a function that would return pd.Series([0, 1]))

This may be a great use-case to annotate the type based on how the value will be used, instead of what it is. ("has method" vs "is type" annotation).

In this case, the indexing behavior is covered by the Sequence type.

from typing import Sequence
import pandas as pd

def returns_str_sequence() -> Sequence[str]:
    return pd.Series(['a', 'b'])

def uses_str_sequence(data: Sequence[str]) -> str:
    for _ in data:
        pass  # Iterable behavior also covered
    return data[0]  # indexing works via __getitem__

For a fuller list of possible types you can use, feel free to review this document for the collections.abc module.

This may have a side benefit of de-coupling your code from 3rd party code/types as well, as your functions will be defined to handle more abstract types.
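
If Sequence turns out to be broader than you need, or your checker does not treat the object you return as a Sequence, a structural Protocol is one way to spell out the "has method" idea directly. This is only a sketch: StrIndexable and first_item are hypothetical names, and a plain list is used as the demo data to keep it self-contained. Whether a given checker accepts a pd.Series here depends on the stubs in use.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class StrIndexable(Protocol):
    """Hypothetical protocol: indexable by int and iterable, yielding str."""

    def __getitem__(self, key: int) -> str: ...

    def __iter__(self) -> Iterator[str]: ...


def first_item(data: StrIndexable) -> str:
    # Only relies on __getitem__, not on any concrete container type.
    return data[0]


result = first_item(["a", "b"])
print(result)
```

Note that isinstance checks against a runtime_checkable protocol only look for the presence of the methods, not their signatures or element types.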

Sloshy answered 23/6, 2023 at 15:49 Comment(0)
P
1

Your example doesn't work currently and will not until PEP 563 is finalized.

To get it to work, add:

from __future__ import annotations

The PEPs around typing continue to evolve, as does Python... See this for more details.

Praline answered 2/9, 2023 at 7:6 Comment(0)
B
0

Unfortunately, Python's type hinting does not support this out of the box. Nonetheless, you can use the dataenforce library (link) to add hints or even enforce validation.

Bucharest answered 8/1, 2020 at 10:24 Comment(1)
Can you show how this would actually be done for pd.Series? Is this done using DatasetMeta (github.com/CedricFR/dataenforce/blob/master/dataenforce/…)? If yes, how?Ninepins
P
-1

You can utilize typing.TypeVar to accomplish this:

from typing import (
    TypeVar
)

SeriesString = TypeVar('pandas.core.series.Series(str)')
def f() -> SeriesString:
    ...
Prawn answered 11/5, 2021 at 20:15 Comment(5)
Isn't the string there just a name (conventionally the same as the variable it's assigned to)?Bother
This is absolutely not a valid type annotation, nor is it the correct use of TypeVar.Pelting
@Bother not just conventionally, it has to be for it to follow PEP 484Pelting
@Pelting insofar as that says "The argument to TypeVar() must be a string equal to the variable name to which it is assigned", yes. But I don't think anything actually breaks if it's not.Bother
@Bother it breaks static type checkersPelting
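
For contrast with the comments above, here is a minimal sketch of how TypeVar is normally used: the string passed to TypeVar matches the variable name, and the variable links an input type to an output type rather than naming a concrete class. The function first is a hypothetical helper, not something from the answer.

```python
from typing import Sequence, TypeVar

# The name passed to TypeVar matches the variable it is assigned to.
T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    # A checker infers the return type from the element type of `items`.
    return items[0]


value = first(["a", "b"])
print(value)
```

This shows what TypeVar is for; it does not by itself describe the element type of a pandas Series.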
H
-4

You can specify it using the dtype parameter:

import pandas as pd
data = pd.Series(['a', 'b'], dtype='str') 

For more information, click here.
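
Since dtype is a runtime property and is not visible to a type checker, one option is an explicit runtime check next to the annotation. The helper below is a sketch under that assumption; require_str_series is a hypothetical name and not part of pandas.

```python
import pandas as pd


def require_str_series(s: pd.Series) -> pd.Series:
    """Raise TypeError unless every element of `s` is a Python str."""
    for value in s:
        if not isinstance(value, str):
            raise TypeError(f"expected str elements, got {type(value).__name__}")
    return s


checked = require_str_series(pd.Series(["a", "b"]))
print(len(checked))
```

This walks every element, so it is best kept to boundaries where data enters your code rather than applied inside hot loops.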

Hurdygurdy answered 18/1, 2021 at 9:39 Comment(2)
Hi Mohan, thank you for your answer. Unfortunately, the solution does not work, since data is an unresolved reference, or am I missing something?Ninepins
Hi @Ninepins, data is the data you want to add to the series. For your problem, the code below will work. The data type will show as object, since the str data type is also an object type: import pandas as pd pd.Series(['a', 'b'], dtype='str')Hurdygurdy
