I am trying to use the pandera library (I am very new to it) to validate a pandas DataFrame. I want to drop the rows that are not valid according to the schema instead of having validation fail. How can I do that?
For example, my pandera schema and DataFrame look like this:
import numpy as np
import pandas as pd
import pandera as pa

schema: pa.DataFrameSchema = pa.DataFrameSchema(columns={
    "Col1": pa.Column(str),
    # Col2 must lie in [0, 1]; in_range is applied to the whole column at once
    "Col2": pa.Column(float, checks=pa.Check.in_range(0, 1), nullable=True),
})

df: pd.DataFrame = pd.DataFrame({
    "Col1": ["1", "2", "3", np.nan],
    "Col2": [0.3, 0.4, 5, 0.2],
})
When I validate df against this schema, I would like to get the following result:
  Col1  Col2
0    1   0.3
1    2   0.4
The other rows, which fail validation (a missing Col1 value, or a Col2 value outside the range 0 to 1), should be dropped.