how to check if a value exists in a dataframe

About

Asked 17/11, 2017 at 10:11 Answered 30/7, 2020 at 15:39

Solved python pandas dataframe data-analysis

hi I am trying to get the column name of a dataframe which contains a specific word,

eg: i have a dataframe,

NA              good    employee
Not available   best    employer
not required    well    manager
not eligible    super   reportee

my_word=["well"]

how to check if "well" exists in a df and the column name which has "well"

thanks in Advance!

Claypool answered 17/11, 2017 at 10:11 Comment(4)

there is only one word? – Borne 17/11, 2017 at 10:12

no,sometimes more than one – Claypool 17/11, 2017 at 10:13

And it should be substring? – Borne 17/11, 2017 at 10:15

yes, list of strings – Claypool 17/11, 2017 at 10:17

Use DataFrame.isin for check all columns and DataFrame.any for check at least one True per row:

m = df.isin(my_word).any()
print (m)
0    False
1     True
2    False
dtype: bool

And then get columns names by filtering:

cols = m.index[m].tolist()
print(cols)
[1]

Data:

print (df)
               0      1         2
0            NaN   good  employee
1  Not available   best  employer
2   not required   well   manager
3   not eligible  super  reportee

Detail:

print (df.isin(my_word))
       0      1      2
0  False  False  False
1  False  False  False
2  False   True  False
3  False  False  False

print (df.isin(my_word).any())
0    False
1     True
2    False
dtype: bool

EDIT After converting get nested lists, so flattening is necessary:

my_word=["well","manager"]

m = df.isin(my_word).any()
print (m)
0    False
1     True
2     True
dtype: bool

nested = df.loc[:,m].values.tolist()
flat_list = [item for sublist in nested for item in sublist]
print (flat_list)
['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']

Borne answered 17/11, 2017 at 10:14 Comment(19)

ok,how to add those column values to my_word, do we need to append one by one or any pandas way to add – Claypool 17/11, 2017 at 10:20

If both are lists, simply use + like my_word += cols or my_word = my_word + cols – Borne 17/11, 2017 at 10:22

no, not only column names, I want to add all the values of the datacolumns in cols – Claypool 17/11, 2017 at 10:24

I can do using a for loop, for item in cols: my_word+=df[item].values.tolist() but it would be better if we have pandas way – Claypool 17/11, 2017 at 10:28

Yes, I add answer ;) – Borne 17/11, 2017 at 10:28

Thanks for the solution :) – Claypool 17/11, 2017 at 10:30

sometimes my_word contains column name itself, in that case your solution is not working, if I read that df with index false then it will work, – Claypool 17/11, 2017 at 10:37

how to consider column names also as a column values, I mean i have to read my df file without column heading – Claypool 17/11, 2017 at 10:42

do you think parameter header=None? – Borne 17/11, 2017 at 10:44

yes, when we are doing df.isin(my_word).any() it should consider column names also, so I want column names =1,2,3,... – Claypool 17/11, 2017 at 10:48

no 0,1,2 but 1,2,3 ? Then use df.columns += 1 – Borne 17/11, 2017 at 10:52

ya sorry it is 0,1,2,.. how to make it – Claypool 17/11, 2017 at 10:53

So if use df = pd.read_csv(file, header=None) get it. But if there are some columns names, use df = pd.read_csv(file, header=None, skiprows=1) - it omit first header row. – Borne 17/11, 2017 at 10:56

I dont want to skip rows, df = pd.read_csv(file, header=None) works for me but when it is reading "NA" it reads as NaN, how to overcome – Claypool 17/11, 2017 at 10:59

Use keep_default_na=False – Borne 17/11, 2017 at 11:1

No it is for all data in df. – Borne 17/11, 2017 at 11:21

Do you think this is the fastest way? df.isin(my_word).any(), my dataframe is huuge... Trying with df.eq(my_word).any(), but too slow... – Bibliotaph 25/10, 2019 at 9:45

@Bibliotaph - what about numpy ? – Borne 25/10, 2019 at 10:8

It isn't numerical unfortunately. Otherwise I'd be working with numpy – Bibliotaph 25/10, 2019 at 15:48

For checking in a specific column,you can simply check as below:

'test' in df.cloumn.values #which returns True or False

For checking in complete df :

df.isin(["test"]).any().any() #which will return True or False

Seville answered 30/7, 2020 at 15:39 Comment(1)

How would you adapt this to get the column name or index which the condition is true for? – Ignoble 26/6, 2022 at 21:56

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags