pandas - AttributeError: 'DataFrame' object has no attribute

I am trying to filter a DataFrame that contains a list of products down to the rows for one product. However, I get AttributeError: 'DataFrame' object has no attribute 'str' whenever I run the code.

Here is the line of code:

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

The Product column has dtype object.

import pandas as pd
import numpy as np

data = pd.read_csv("FILE.csv", header = None)

headerName = ["DRID", "Product", "M24", "M23", "M22", "M21"] 
data.columns = [headerName]

log_df = np.log(1 + data[["M24", "M23", "M22", "M21"]])
copy = data[["DRID", "Product"]].copy()
log_df = copy.join(log_df)

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Here is the head:

       ID  PRODUCT       M24       M23       M22  M21
0  123421        A  0.000000  0.000000  1.098612  0.0   
1  141840        A  0.693147  1.098612  0.000000  0.0   
2  212006        A  0.693147  0.000000  0.000000  0.0   
3  216097        A  1.098612  0.000000  0.000000  0.0   
4  219517        A  1.098612  0.693147  1.098612  0.0
Rankin answered 24/7, 2018 at 15:18 Comment(35)
Your code should work. Are you sure you are not doing log_df.str somewhere (instead of log_df['Product'].str)? Or maybe you have duplicated column labels named Product (e.g. two columns with the same name)?Kestrel
@RafaelC yes I am positive. It was working yesterday, but now it is not working anymore.Rankin
What do you see for type(log_df['Product']) ?Jackhammer
@RafaelC no, there are no duplicated indexes.Rankin
@Jackhammer pandas.core.frame.DataFrameRankin
@DavidLuong, So now what do you see for type(log_df) ?Jackhammer
@Jackhammer pandas.core.frame.DataFrame for type(log_df)Rankin
What version of pandas are you using?Skylar
Are you sure you don't have extra brackets, e.g. log_df[['Product']]? Otherwise, I think you need to share a reproducible example.Jackhammer
@Jackhammer Yes, I am sure. I will share the entire code with you.Rankin
@RafaelC I can't share that because it is confidential information. But the data covers 3.6 million products that have been sold.Rankin
Post only the first five lines of your df and change whatever confidential info to foo, bar, blablabla etc. Just want to understand the structure of your dfKestrel
@RafaelC Sorry, I'm kind of new to Stack Overflow. I will post it in the body.Rankin
@RafaelC Done, let me know if you need anything elseRankin
@Skylar 0.23.3Rankin
Add the column names tooKestrel
@RafaelC Got itRankin
The problem lies in the np.log line: log_df = np.log(1+data[["M24","M23","M22","M21","M20","M19","M18","M17","M16","M15","M14","M13","M12","M11","M10","M9","M8","M7","M6","M5","M4","M3","M2","M1"]])Ricardaricardama
@Mr.J do you have any suggestions?Rankin
Yes, I think I found the problem, but I need your input. Can you run print(data) right after the line data.columns=[headerName] and post the output? The problem seems to be the column mapping for Product.Ricardaricardama
Just show me 3 rows and 4 columns.Ricardaricardama
@Mr.J I printed it out. Whenever I print it, the products are not actually under the Product column.Rankin
Exactly, that is what is causing the issue with .str: the correct column names are not getting mapped.Ricardaricardama
@Mr.J Can you please help me fix this problem? I am inexperienced with Python. Also, it was working fine yesterday. Do you have any idea why it might not work anymore?Rankin
Add the column names as a parameter when you create the DataFrame, or put the column names as a header row in the CSV file. Something like this: data = pd.DataFrame({'ID':[123421,141840,212006],'PRODUCT':['A','A','A'],'M24':[0.000000,0.693147,0.693147],'M23':[0.000000,1.098612,0.693147]},columns=["ID","PRODUCT","M24","M23"])Ricardaricardama
@Mr.J I added it as a parameter and it is still not working.Rankin
@Mr.J I am doing data = pd.read_csv("file.csv", names="A...")Rankin
@Mr.J Whenever I print out 'data' alone, it looks like it is getting mapped, but print(data) still doesn't map it.Rankin
This works for me; try to change yours accordingly:
import pandas as pd
import numpy as np
cl = ['ID', 'PRODUCT', 'M24', 'M23']
data = pd.DataFrame({'ID': [123421, 141840, 212006], 'PRODUCT': ["A", "A", "A"], 'M24': [0.000000, 0.693147, 0.693147], 'M23': [0.000000, 1.098612, 0.693147]}, columns=cl)
data.set_index('ID')  # result is not assigned, so this line has no effect on data
log_df = np.log(1 + data[['M24', 'M23']])
log_df = data[["ID", "PRODUCT"]].copy().join(log_df)
include_clique = log_df.loc[log_df['PRODUCT'].str.contains("A")]
include_clique
Ricardaricardama
let me know if it worksRicardaricardama
@Mr.J I might be wrong, but I am still able to call data["product"]. If it were not mapped, wouldn't an error occur?Rankin
are you getting correct product data?Ricardaricardama
@Mr.J Yeah, I am. I ran data.product.unique() and noticed a lot of whitespace, which makes print(data) look awkward.Rankin
Try stripping the whitespace from the columns.Ricardaricardama
Okay, I'm not sure what happened, but everything works now. I didn't change any code at all; I ignored the whitespace without stripping it, and it works for me... Well, thank you for your help. This is confusing me.Rankin

Short answer: change data.columns=[headerName] to data.columns=headerName

Explanation: when you set data.columns=[headerName], the extra brackets make the columns a MultiIndex (with a single level). Therefore, log_df['Product'] returns a DataFrame rather than a Series, and a DataFrame has no str attribute.
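A minimal reproduction of the problem, using a small made-up frame in place of FILE.csv (the values and the shortened column list are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for FILE.csv, read with header=None (integer column labels)
data = pd.DataFrame([[123421, "Product A", 0.0],
                     [141840, "Product B", 0.693147]])

# Wrapping the list in another list, as in the question, creates MultiIndex columns
header_name = ["DRID", "Product", "M24"]
data.columns = [header_name]
print(type(data.columns).__name__)   # MultiIndex

# Selecting 'Product' matches the first level and returns a DataFrame, not a Series
product = data["Product"]
print(type(product).__name__)        # DataFrame

try:
    product.str.contains("Product A")
except AttributeError as exc:
    print(exc)                       # 'DataFrame' object has no attribute 'str'
```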

When you set data.columns=headerName, the columns are a flat Index, so log_df['Product'] is a single column (a Series) and you can use the str attribute.
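The fixed assignment can be sketched the same way; again the frame is a made-up stand-in for the CSV data:

```python
import pandas as pd

# Hypothetical stand-in for FILE.csv, read with header=None
data = pd.DataFrame([[123421, "Product A", 0.0],
                     [141840, "Product B", 0.693147]])

# Assign the list itself (no extra brackets) to get a flat column Index
data.columns = ["DRID", "Product", "M24"]

product = data["Product"]
print(type(product).__name__)   # Series

# .str works on a Series, and the boolean mask filters the rows
include_clique = data.loc[product.str.contains("Product A")]
print(include_clique)
```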

If for some reason you need to keep the MultiIndex columns, there is another solution: first convert log_df['Product'] into a Series. After that, the str attribute is available.

products = pd.Series(log_df['Product'].values.flatten(), index=log_df.index)
include_clique = log_df.loc[products.str.contains("Product A")]

However, I guess the first solution is what you're looking for.

Miguel answered 28/9, 2018 at 9:24 Comment(1)
If entries in the column are NaN, you may still run into problems and require this answer's 2nd solution. I had an unexpected scenario where some of my df['column'] were single columns and others were DataFrames.Suave
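A sketch of one way to guard against missing values, assuming a flat Product column and hypothetical sample data. Depending on the column's dtype and the pandas version, str.contains can return NaN for missing entries, and a mask containing NaN cannot be used for boolean indexing; passing na=False treats missing values as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing product name
log_df = pd.DataFrame({"Product": ["Product A", np.nan, "Product B"],
                       "M24": [0.0, 0.693147, 1.098612]})

# na=False makes missing entries count as "no match", so the mask is all booleans
mask = log_df["Product"].str.contains("Product A", na=False)
include_clique = log_df.loc[mask]
print(include_clique)
```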

You get AttributeError: 'DataFrame' object has no attribute ... when you try to access an attribute your DataFrame doesn't have.

A common case is when you try to select a column using . instead of [] when the column name contains white space (e.g. 'col1 ').

df.col1       # <--- error
df['col1 ']   # <--- no error
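A small sketch of the whitespace case with a made-up frame; stripping the labels with df.columns.str.strip() is one way to make attribute access work again:

```python
import pandas as pd

# Hypothetical frame whose first label has a trailing space, e.g. from a CSV header
df = pd.DataFrame({"col1 ": [1, 2], "col2": [3, 4]})

print(hasattr(df, "col1"))    # False: 'col1' does not match the label 'col1 '
print(df["col1 "].tolist())   # [1, 2]

# Strip surrounding whitespace from every column label
df.columns = df.columns.str.strip()
print(df.col1.tolist())       # [1, 2]
```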

Another common case is when you try to call a Series method on a DataFrame. For example, tolist() and (in pandas versions before 2.1, which added DataFrame.map) map() are Series methods, so they must be called on a column. If you call them on a DataFrame, you'll get

AttributeError: 'DataFrame' object has no attribute 'tolist'

AttributeError: 'DataFrame' object has no attribute 'map'

As hoang tran explains, this is what is happening to the OP as well: .str is a Series accessor and it's not implemented for DataFrames.
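A short check of the Series-versus-DataFrame distinction on a hypothetical two-column frame: select the column first, then call the Series method on it.

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

print(hasattr(pd.DataFrame, "tolist"))     # False: tolist is not a DataFrame method
print(df["col1"].tolist())                 # [1, 2]
print(df["col2"].map(str.upper).tolist())  # ['A', 'B']
print(df["col2"].str.upper().tolist())     # ['A', 'B']
```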


Yet another case is if you have a typo and try to call/access an attribute that's simply not defined; e.g. if you try to call rows() instead of iterrows(), you'll get

AttributeError: 'DataFrame' object has no attribute 'rows'

You can check the full list of attributes using the following comprehension.

[x for x in dir(pd.DataFrame) if not x.startswith('_')]
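A sketch of using that list to diagnose the rows()/iterrows() typo (the one-column frame is just a placeholder):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})   # placeholder frame

public = [x for x in dir(pd.DataFrame) if not x.startswith('_')]
print("iterrows" in public)   # True
print("rows" in public)       # False

try:
    df.rows()
except AttributeError as exc:
    print(exc)                # 'DataFrame' object has no attribute 'rows'
```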

When you assign column names as df.columns = [['col1', 'col2']], df now has MultiIndex columns, so to access a column you'll need to pass a tuple:

df['col1'].str.contains('Product A')    # <---- error
df['col1',].str.contains('Product A')   # <---- no error; note the trailing comma

In fact, you can pass a tuple to select a column of any MultiIndex dataframe, e.g.

df['level_1_colname', 'level_2_colname'].str.contains('Product A')
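A minimal sketch of the tuple selection on a one-level MultiIndex like the OP's, with made-up values:

```python
import pandas as pd

# Hypothetical frame with a one-level MultiIndex, built the same way as in the question
df = pd.DataFrame([["Product A", 1], ["Product B", 2]])
df.columns = [['col1', 'col2']]

print(type(df['col1']).__name__)    # DataFrame
print(type(df['col1',]).__name__)   # Series: ('col1',) is a complete column key

mask = df['col1',].str.contains('Product A')
print(df[mask])
```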

You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:

df.columns = df.columns.map('_'.join)
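A sketch of flattening the one-level MultiIndex produced by the question's pattern, after which ordinary string keys and .str work again; the sample frame is hypothetical:

```python
import pandas as pd

# Hypothetical frame with a one-level MultiIndex
df = pd.DataFrame([["Product A", 1], ["Product B", 2]])
df.columns = [['col1', 'col2']]

# Each label is a tuple like ('col1',); '_'.join turns it into the string 'col1'
df.columns = df.columns.map('_'.join)
print(list(df.columns))                                # ['col1', 'col2']
print(df['col1'].str.contains('Product A').tolist())   # [True, False]
```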
Teleprinter answered 1/2, 2023 at 22:30 Comment(0)
