pandas - AttributeError 'dataframe' object has no attribute

Asked 24/7, 2018 at 15:18 Answered 1/2, 2023 at 22:30

Solved python pandas dataframe indexing attributeerror

I am trying to filter a dataframe down to the rows whose Product column contains a given product. However, I get the error 'DataFrame' object has no attribute 'str' whenever I run the code.

Here is the line of code:

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Product is an object datatype.

import pandas as pd
import numpy as np

data = pd.read_csv("FILE.csv", header = None)

headerName = ["DRID", "Product", "M24", "M23", "M22", "M21"] 
data.columns = [headerName]

log_df = np.log(1 + data[["M24", "M23", "M22", "M21"]])
copy = data[["DRID", "Product"]].copy()
log_df = copy.join(log_df)

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Here is the head:

       ID  PRODUCT       M24       M23       M22  M21
0  123421        A  0.000000  0.000000  1.098612  0.0   
1  141840        A  0.693147  1.098612  0.000000  0.0   
2  212006        A  0.693147  0.000000  0.000000  0.0   
3  216097        A  1.098612  0.000000  0.000000  0.0   
4  219517        A  1.098612  0.693147  1.098612  0.0

Rankin answered 24/7, 2018 at 15:18 Comment(35)

Your code should work. Are you sure you are not doing log_df.str somewhere (instead of log_df['Product'].str)? Or maybe you have duplicated indexes with this name Product (e.g. two columns with same name) ? – Kestrel 24/7, 2018 at 15:19

@RafaelC yes I am positive. It was working yesterday, but now it is not working anymore. – Rankin 24/7, 2018 at 15:20

What do you see for type(log_df['Product']) ? – Jackhammer 24/7, 2018 at 15:22

@RafaelC no, there are no duplicated indexes. – Rankin 24/7, 2018 at 15:22

@Jackhammer pandas.core.frame.DataFrame – Rankin 24/7, 2018 at 15:23

@DavidLuong, So now what do you see for type(log_df) ? – Jackhammer 24/7, 2018 at 15:23

@Jackhammer pandas.core.frame.DataFrame for type(log_df) – Rankin 24/7, 2018 at 15:25

What version of pandas are you using? – Skylar 24/7, 2018 at 15:28

Are you sure you don't have extra brackets, e.g. log_df[['Product']]? Otherwise, I think you need to share a reproducible example. – Jackhammer 24/7, 2018 at 15:28

@Jackhammer yes i am sure, I will share the entire code with you. – Rankin 24/7, 2018 at 15:30

@RafaelC I can't share that because it is confidential information.. But I get 3.6 million products that have been sold. – Rankin 24/7, 2018 at 15:34

Post only the first five lines of your df and change whatever confidential info to foo, bar, blablabla etc. Just want to understand the structure of your df – Kestrel 24/7, 2018 at 15:35

@RafaelC sorry, kind of new to stackoverflow. I will post it in the body. – Rankin 24/7, 2018 at 15:37

@RafaelC Done, let me know if you need anything else – Rankin 24/7, 2018 at 15:40

@Skylar 0.23.3 – Rankin 24/7, 2018 at 15:41

Add the column names too – Kestrel 24/7, 2018 at 15:41

@RafaelC Got it – Rankin 24/7, 2018 at 15:43

problem lies in the line of code np.log, log_df = np.log(1+data[["M24","M23","M22","M21","M20","M19","M18","M17","M16","M15","M14","M13","M12","M11","M10","M9","M8","M7","M6","M5","M4","M3","M2","M1"]]) – Ricardaricardama 24/7, 2018 at 15:52

@Mr.J do you have any suggestions? – Rankin 24/7, 2018 at 15:58

yes, i found problem but i need your input. can you please print print(data) after line data.columns=[headerName]. give me output result. the problem seems to be column mapping with Product. – Ricardaricardama 24/7, 2018 at 17:2

just show me 3X4 rows – Ricardaricardama 24/7, 2018 at 17:2

@Mr.J i printed it out. it seems whenever i print it out, the product is not actually under the column. – Rankin 24/7, 2018 at 17:13

exactly that is what causing the issue on conversion to str, since right columns names are not getting mapped. – Ricardaricardama 24/7, 2018 at 17:14

@Mr.J can you please help me on how to fix this problem? I am inexperienced with python. Also, it was working fine yesterday. Do you have any idea why it might not work anymore? – Rankin 24/7, 2018 at 17:15

add columns names as parameter to dataframe on creation. or put columns as header to csv file. something like this. data = pd.DataFrame({'ID':[123421,141840,212006],'PRODUCT':['A','A','A'],'M24':[0.000000,0.693147,0.693147],'M23':[0.000000,1.098612,0.693147]},columns=["ID","Product","M24","M23"]) – Ricardaricardama 24/7, 2018 at 17:17

@Mr.J i added it as a parameter and it still is not working. – Rankin 24/7, 2018 at 17:21

@Mr.J i am doing data =pd.read_csv("file.csv" names= "A...") – Rankin 24/7, 2018 at 17:21

@Mr.J whenever I print out 'data' alone, it looks like it is getting mapped, but print(data) still doesnt map it. – Rankin 24/7, 2018 at 17:22

this works for me, try to change your accordingly.

import pandas as pd
import numpy as np
cl = ['ID','PRODUCT','M24','M23']
data = pd.DataFrame({'ID':[123421,141840,212006], 'PRODUCT':["A","A","A"], 'M24':[0.000000,0.693147,0.693147],'M23':[0.000000,1.098612,0.693147]},columns=cl)
data.set_index('ID')
log_df = np.log(1+data[['M24','M23']])
log_df = data[["ID","PRODUCT"]].copy().join(log_df)
include_clique = log_df.loc[log_df['PRODUCT'].str.contains("A")]
include_clique
– Ricardaricardama 24/7, 2018 at 17:35

let me know if it works – Ricardaricardama 24/7, 2018 at 17:35

@Mr.J i might be wrong, but I am still able to call data["product"] .. if this was not mapped, wouldn't an error occur? – Rankin 24/7, 2018 at 17:35

are you getting correct product data? – Ricardaricardama 24/7, 2018 at 17:37

@Mr.J yeah i am, i did data.product.unique() i noticed that there is a bunch of white space, which makes the print(data) look awkward. – Rankin 24/7, 2018 at 17:39

try to strip white spaces on columns – Ricardaricardama 24/7, 2018 at 17:40

okay, i'm not sure what happened but everything works now. i didn't change any code at all. i ignored the white space without stripping it and it works for me... well thank you for your help. this is confusing me.. – Rankin 24/7, 2018 at 17:45

Short answer: change data.columns=[headerName] into data.columns=headerName

Explanation: when you set data.columns=[headerName], the nested list turns the columns into a MultiIndex object. As a result, log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.

When you set data.columns=headerName, log_df['Product'] is a single column (a Series), so the str accessor is available.
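
A minimal sketch of the difference, using a small made-up frame in place of FILE.csv (the row values and the shortened header list are assumptions for illustration only):

```python
import pandas as pd

header_name = ["DRID", "Product", "M24"]
rows = [[123421, "Product A", 0.0], [141840, "Product B", 1.0]]

# Nested list: pandas builds a one-level MultiIndex of 1-tuples.
nested = pd.DataFrame(rows)
nested.columns = [header_name]
nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
nested_type = type(nested["Product"]).__name__

# Flat list: an ordinary Index, so a column lookup returns a Series.
flat = pd.DataFrame(rows)
flat.columns = header_name
flat_type = type(flat["Product"]).__name__
mask = flat["Product"].str.contains("Product A")

print(nested_is_multi, nested_type, flat_type)
print(flat.loc[mask])
```

Checking type(log_df['Product']), as suggested in the comments, is the quickest way to tell which of the two situations you are in.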

If for any reason you need to keep the columns as a MultiIndex, there is another solution: first convert log_df['Product'] into a Series. After that, the str accessor is available.

products = pd.Series(log_df['Product'].values.flatten())
include_clique = products[products.str.contains("Product A")]

However, the first solution is probably what you're looking for.
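
If you do keep the MultiIndex, here is a hedged variation on the second solution. It uses .iloc[:, 0] rather than values.flatten(), which is my substitution and not part of the answer above, so that the Series keeps the frame's row index and the mask can filter the whole frame. The sample rows are made up:

```python
import pandas as pd

log_df = pd.DataFrame(
    [[123421, "Product A"], [141840, "Product B"], [212006, "Product A"]]
)
log_df.columns = [["DRID", "Product"]]  # reproduces the one-level MultiIndex

# log_df["Product"] is a one-column DataFrame; take its only column as a Series.
products = log_df["Product"].iloc[:, 0]

# The Series shares log_df's row index, so the boolean mask selects full rows.
include_clique = log_df.loc[products.str.contains("Product A")]
print(include_clique)
```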

Miguel answered 28/9, 2018 at 9:24 Comment(1)

If entries in the column are NaN, you may still run into problems and require this answer's 2nd solution. I had an unexpected scenario where some of my df['column'] were single columns and others were DataFrames. – Suave 28/8, 2023 at 19:58

You get AttributeError: 'DataFrame' object has no attribute ... when you try to access an attribute your dataframe doesn't have.

A common case is when you try to select a column using . instead of [] when the column name contains white space (e.g. 'col1 ').

df.col1       # <--- error
df['col1 ']   # <--- no error
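
A small sketch of how to spot and remove such stray whitespace (the column names here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"col1 ": [1, 2], " col2": [3, 4]})  # hypothetical names

# Attribute access fails because the real name is "col1 " with a trailing space.
has_col1_before = hasattr(df, "col1")

# repr() makes the stray spaces visible, which print(df) tends to hide.
print([repr(c) for c in df.columns])

# Strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()
has_col1_after = hasattr(df, "col1")
col1_values = df.col1.tolist()
print(has_col1_before, has_col1_after, col1_values)
```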

Another common case is when you try to call a Series method on a DataFrame. For example, tolist() and, before pandas 2.1 added DataFrame.map(), map() are Series-only methods, so they must be called on a column. If you call them on a DataFrame, you'll get

AttributeError: 'DataFrame' object has no attribute 'tolist'

AttributeError: 'DataFrame' object has no attribute 'map'

As the answer above explains, this is what is happening with OP as well. .str is a Series accessor and it's not implemented for DataFrames.
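
As a rough illustration with a made-up frame, the Series method works once it is called on a single column, and a whole frame can be converted through its NumPy array instead:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Calling a Series-only method on the DataFrame raises AttributeError.
try:
    df.tolist()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

col_list = df["a"].tolist()        # Series method on one column
all_rows = df.to_numpy().tolist()  # whole frame, row by row, via NumPy

print(error_message)
print(col_list, all_rows)
```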

Yet another case is if you have a typo and try to call/access an attribute that's simply not defined; e.g. if you try to call rows() instead of iterrows(), you'll get

AttributeError: 'DataFrame' object has no attribute 'rows'

You can check the full list of attributes using the following comprehension.

[x for x in dir(pd.DataFrame) if not x.startswith('_')]
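
Building on that comprehension, a small sketch for tracking down a misremembered name. The helper find_attrs is hypothetical, not a pandas function:

```python
import pandas as pd

public_attrs = [x for x in dir(pd.DataFrame) if not x.startswith("_")]

def find_attrs(fragment):
    """Return public DataFrame attribute names that contain `fragment`."""
    return [name for name in public_attrs if fragment in name]

rows_is_attr = "rows" in public_attrs  # the typo from the example above
candidates = find_attrs("rows")        # names that contain "rows"
print(rows_is_attr, candidates)
```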

When you assign column names as df.columns = [['col1', 'col2']], df now has MultiIndex columns, so to access each column as a Series, you'll need to pass a tuple:

df['col1'].str.contains('Product A')    # <---- error
df['col1',].str.contains('Product A')   # <---- no error; note the trailing comma

In fact, you can pass a tuple to select a column of any MultiIndex dataframe, e.g.

df['level_1_colname', 'level_2_colname'].str.contains('Product A')
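
A sketch with a two-level column index (the level names "info" and "sales" and the data are invented for illustration) showing that a partial key returns a DataFrame while a full tuple returns a Series:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("info", "DRID"), ("info", "Product"), ("sales", "M24")]
)
df = pd.DataFrame(
    [[1, "Product A", 0.0], [2, "Product B", 1.0]], columns=columns
)

top_level = df["info"]           # partial key: DataFrame with both "info" columns
product = df["info", "Product"]  # full key (a tuple): a single Series
mask = product.str.contains("Product A")

print(type(top_level).__name__, type(product).__name__, mask.tolist())
```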

You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:

df.columns = df.columns.map('_'.join)
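
A short sketch of what the flattening produces, for both the accidental one-level case and a two-level case. The frames are made up, and this assumes every level label is a string, since str.join rejects anything else:

```python
import pandas as pd

# One-level MultiIndex, as produced by assigning a nested list.
single = pd.DataFrame([["Product A", 0.0]])
single.columns = [["Product", "M24"]]
single.columns = single.columns.map("_".join)

# Two-level MultiIndex: the labels of each column are joined with "_".
double = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "x"), ("b", "y")])
)
double.columns = double.columns.map("_".join)

single_names = list(single.columns)
double_names = list(double.columns)
print(single_names, double_names)
```

After flattening, single["Product"] is an ordinary Series again, so the .str accessor works on it.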

Teleprinter answered 1/2, 2023 at 22:30 Comment(0)
