how to check if a value exists in a dataframe
Asked Answered
C

2

6

hi I am trying to get the column name of a dataframe which contains a specific word,

eg: i have a dataframe,

NA              good    employee
Not available   best    employer
not required    well    manager
not eligible    super   reportee

my_word=["well"]

how to check if "well" exists in a df and the column name which has "well"

thanks in Advance!

Claypool answered 17/11, 2017 at 10:11 Comment(4)
there is only one word?Borne
no,sometimes more than oneClaypool
And it should be substring?Borne
yes, list of stringsClaypool
B
9

Use DataFrame.isin for check all columns and DataFrame.any for check at least one True per row:

m = df.isin(my_word).any()
print (m)
0    False
1     True
2    False
dtype: bool

And then get columns names by filtering:

cols = m.index[m].tolist()
print(cols)
[1]

Data:

print (df)
               0      1         2
0            NaN   good  employee
1  Not available   best  employer
2   not required   well   manager
3   not eligible  super  reportee

Detail:

print (df.isin(my_word))
       0      1      2
0  False  False  False
1  False  False  False
2  False   True  False
3  False  False  False

print (df.isin(my_word).any())
0    False
1     True
2    False
dtype: bool

EDIT After converting get nested lists, so flattening is necessary:

my_word=["well","manager"]

m = df.isin(my_word).any()
print (m)
0    False
1     True
2     True
dtype: bool

nested = df.loc[:,m].values.tolist()
flat_list = [item for sublist in nested for item in sublist]
print (flat_list)
['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']
Borne answered 17/11, 2017 at 10:14 Comment(19)
ok,how to add those column values to my_word, do we need to append one by one or any pandas way to addClaypool
If both are lists, simply use + like my_word += cols or my_word = my_word + colsBorne
no, not only column names, I want to add all the values of the datacolumns in colsClaypool
I can do using a for loop, for item in cols: my_word+=df[item].values.tolist() but it would be better if we have pandas wayClaypool
Yes, I add answer ;)Borne
Thanks for the solution :)Claypool
sometimes my_word contains column name itself, in that case your solution is not working, if I read that df with index false then it will work,Claypool
how to consider column names also as a column values, I mean i have to read my df file without column headingClaypool
do you think parameter header=None?Borne
yes, when we are doing df.isin(my_word).any() it should consider column names also, so I want column names =1,2,3,...Claypool
no 0,1,2 but 1,2,3 ? Then use df.columns += 1Borne
ya sorry it is 0,1,2,.. how to make itClaypool
So if use df = pd.read_csv(file, header=None) get it. But if there are some columns names, use df = pd.read_csv(file, header=None, skiprows=1) - it omit first header row.Borne
I dont want to skip rows, df = pd.read_csv(file, header=None) works for me but when it is reading "NA" it reads as NaN, how to overcomeClaypool
Use keep_default_na=FalseBorne
No it is for all data in df.Borne
Do you think this is the fastest way? df.isin(my_word).any(), my dataframe is huuge... Trying with df.eq(my_word).any(), but too slow...Bibliotaph
@Bibliotaph - what about numpy ?Borne
It isn't numerical unfortunately. Otherwise I'd be working with numpyBibliotaph
S
6

For checking in a specific column,you can simply check as below:

'test' in df.cloumn.values #which returns True or False

For checking in complete df :

df.isin(["test"]).any().any() #which will return True or False
Seville answered 30/7, 2020 at 15:39 Comment(1)
How would you adapt this to get the column name or index which the condition is true for?Ignoble

© 2022 - 2024 — McMap. All rights reserved.