Filter pandas dataframe by list [duplicate]
W

7

10

I have a dataframe that has a column called "Hybridization REF". I would like to filter it so that I only get the rows whose value in that column matches one of the items in my list.

Basically, I'd like to do the following:

dataframe[dataframe["Hybridization REF"].apply(lambda: x in list)] 

but that syntax is not correct.

Whitebait answered 11/7, 2017 at 16:45 Comment(0)
M
4

Update: you can also use reindex,

df.reindex(collist, axis=1)

and

df.reindex(rowlist, axis=0)

and both:

df.reindex(index=rowlist, columns=collist)

You can use .loc or column filtering:

df = pd.DataFrame(data=np.random.rand(5,5),columns=list('ABCDE'),index=list('abcde'))

df
          A         B         C         D         E
a  0.460537  0.174788  0.167554  0.298469  0.630961
b  0.728094  0.275326  0.405864  0.302588  0.624046
c  0.953253  0.682038  0.802147  0.105888  0.089966
d  0.122748  0.954955  0.766184  0.410876  0.527166
e  0.227185  0.449025  0.703912  0.617826  0.037297

collist = ['B','D','E']

rowlist = ['a','c']

Get columns in list:

df[collist]

Output:

          B         D         E
a  0.174788  0.298469  0.630961
b  0.275326  0.302588  0.624046
c  0.682038  0.105888  0.089966
d  0.954955  0.410876  0.527166
e  0.449025  0.617826  0.037297

Get rows in list:

df.loc[rowlist]

          A         B         C         D         E
a  0.460537  0.174788  0.167554  0.298469  0.630961
c  0.953253  0.682038  0.802147  0.105888  0.089966
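
The two approaches above differ when a requested label does not exist. The sketch below illustrates that, assuming a reasonably recent pandas (1.0 or later); the small frame and the name `wanted` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["a", "b", "c"],
)

wanted = ["a", "z"]  # "z" is not in the index

# reindex keeps every requested label and fills the missing ones with NaN
reindexed = df.reindex(wanted)

# .loc with a list raises KeyError when any label is missing
try:
    df.loc[wanted]
    loc_failed = False
except KeyError:
    loc_failed = True

print(reindexed)
print("loc raised KeyError:", loc_failed)
```

So reindex is the more forgiving choice when the list may contain labels that are not in the dataframe, while .loc is stricter.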
Murage answered 11/7, 2017 at 16:59 Comment(0)
K
21

Suppose df is your dataframe and lst is your list of labels.

df.loc[ df.index.isin(lst), : ]

This returns all rows whose index label matches any item in the list. I hope this helps solve your query.
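
A minimal sketch of how this behaves, using a made-up frame; labels in lst that are not in the index (here "z") are simply ignored:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])
lst = ["a", "c", "z"]

# Boolean mask: True where the index label is one of the items in lst
mask = df.index.isin(lst)

# Keep only the rows where the mask is True, and all columns
filtered = df.loc[mask, :]
print(filtered)
```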

Kenley answered 11/7, 2017 at 16:48 Comment(0)
O
11

Is there such a thing as a numpy dataframe? I am guessing it is a pandas dataframe; if so, here is the solution.

df[df['Hybridization REF'].isin(list)]
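
As a rough, self-contained illustration (the sample values and the name `labels` are assumptions, chosen so that the built-in name list is not shadowed), the same mask can also be negated with ~ to get the rows that are not in the list:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "Hybridization REF": ["s1", "s2", "s3", "s4"],
    "value": [0.1, 0.2, 0.3, 0.4],
})
labels = ["s2", "s4", "s9"]  # "s9" does not occur in the column

# Rows whose column value is in the list
kept = df[df["Hybridization REF"].isin(labels)]

# Rows whose column value is NOT in the list
dropped = df[~df["Hybridization REF"].isin(labels)]

print(kept)
print(dropped)
```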
Overstretch answered 11/7, 2017 at 16:55 Comment(1)
This filter works, and I prefer it this way. You can read the docs on how to filter a dataframe this way. - Spiracle
B
1

The same code should work once the lambda is given its parameter x:

dataframe[dataframe["Hybridization REF"].apply(lambda x : x in list)] 
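
For comparison, here is a small sketch (with made-up data and the hypothetical name `labels`) showing that the corrected apply version selects the same rows as isin; isin is usually the more idiomatic choice because it avoids calling a Python function once per row:

```python
import pandas as pd

# Hypothetical data, for illustration only
dataframe = pd.DataFrame({
    "Hybridization REF": ["s1", "s2", "s3"],
    "value": [1, 2, 3],
})
labels = {"s1", "s3"}  # a set makes the `in` test cheap

# Element-wise membership test with a lambda
via_apply = dataframe[dataframe["Hybridization REF"].apply(lambda x: x in labels)]

# The same filter written with isin
via_isin = dataframe[dataframe["Hybridization REF"].isin(labels)]

print(via_apply.equals(via_isin))
```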
Baram answered 19/7, 2022 at 22:32 Comment(0)
C
0

You can try the following:

df.loc[ df.index.intersection(lst), : ]

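This way you only get the rows whose labels appear in both the index and lst; labels in lst that are missing from the index are left out instead of causing an error.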
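
A short sketch with made-up data, assuming lst may contain labels that are not in the index:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])
lst = ["c", "z", "a"]  # "z" is not in the index

# Labels present in both the index and lst
common = df.index.intersection(lst)

# Safe to pass to .loc, because every label in `common` exists in df
subset = df.loc[common, :]
print(subset)
```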

Cns answered 13/1, 2022 at 18:30 Comment(0)
Q
0

Another alternative is to use query:

df.query('`Hybridization REF` == @list')

The backticks before and after Hybridization REF are needed because of the whitespace in the column name. With @ you can reference the variable list.

Keep in mind that Python's built-in list type is named list. So it is a good idea to rename this variable.
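
A minimal sketch with the variable renamed to `labels` (a hypothetical name) and made-up data; it assumes a pandas version that supports backtick-quoted column names in query (0.25 or later):

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "Hybridization REF": ["s1", "s2", "s3"],
    "value": [1, 2, 3],
})
labels = ["s1", "s3"]

# Comparing a column to a list with == behaves like a membership test
result = df.query("`Hybridization REF` == @labels")

# The same filter spelled with the `in` operator
same = df.query("`Hybridization REF` in @labels")

print(result)
```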

Quadruple answered 7/6, 2022 at 20:44 Comment(0)
E
0

For future reference, if you are looking to match just a substring of the index labels, you can also use:

new_df = df.loc[df.index.str.contains('sub_string_you_need'), :]
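
A small sketch of this with made-up index labels. Note that str.contains treats the pattern as a regular expression by default, so regex=False is passed here on the assumption that a literal substring match is wanted:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=["sample_A1", "control_B1", "sample_A2"],
)

# Keep rows whose index label contains the literal text "sample"
new_df = df.loc[df.index.str.contains("sample", regex=False), :]
print(new_df)
```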
Edgardo answered 21/10, 2022 at 17:53 Comment(0)
