NaN values when new column added to pandas DataFrame

I'm trying to generate a new column in a pandas DataFrame that equals values in another pandas DataFrame. When I attempt to create the new column I just get NaNs for the new column values.

First I use an API call to get some data. The resulting 'mydata' DataFrame is one column of data indexed by dates:

mydata = Quandl.get(["YAHOO/INDEX_MXX.4"],
                    trim_start="2001-04-01", trim_end="2014-03-31",
                    collapse="monthly")

I get the next DataFrame from a CSV with the following code. It contains many columns of data and has the same number of rows as 'mydata':

# Read the CSV, using the first column as a date-parsed index
DWDATA = pandas.read_csv("filename",
                         header=0,
                         sep=',',
                         index_col=0,
                         parse_dates=True)

I then try to generate the new column like this:

DWDATA['MXX'] = mydata.iloc[:,0]

Again, I just get NaN values. Can someone help me understand why this happens and how to resolve it? From what I've read, it looks like something might be wrong with my indexes. The indexes are dates in each DataFrame, but 'mydata' has end-of-month dates while 'DWDATA' has beginning-of-month dates.

Plauen answered 6/10, 2014 at 17:13 Comment(2)
If the indexes do not overlap (as you describe), you will indeed get NaNs. You will have to change the index of one or both frames, or, if you are certain the number of rows is exactly equal, just put the values (without the index) in the new column: mydata.iloc[:,0].values - Enrage
Adding '.values' did work! Thanks @Enrage, and I'll remember that bit about the indexes having to be equal in the future! - Plauen

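When you assign a Series to a DataFrame column, pandas aligns the values on index labels. Because the two indexes share no dates (end-of-month versus beginning-of-month), every row ends up NaN. One or both of the indexes must be changed so that they match. Example: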

mydata = mydata.set_index(DWDATA.index)

The above will change the index of the 'mydata' DataFrame to match the index of the 'DWDATA' DataFrame.
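To make the alignment behaviour concrete, here is a minimal sketch that reproduces the NaN result and then applies the set_index fix. The two small frames, the column names 'A' and 'Close', and all values are made-up stand-ins for the real data, which is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question (values are made up).
DWDATA = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
)
mydata = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"]),
)

# Assignment aligns on index labels; no label matches, so every value is NaN.
DWDATA["MXX"] = mydata.iloc[:, 0]
all_nan = DWDATA["MXX"].isna().all()

# After copying DWDATA's index onto mydata, the labels match and values carry over.
mydata_aligned = mydata.set_index(DWDATA.index)
DWDATA["MXX"] = mydata_aligned.iloc[:, 0]
print(DWDATA)
```

Note that set_index pairs rows purely by position, so it assumes both frames have the same length and are sorted the same way.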

Since the number of rows is exactly equal for the two DataFrames, you can also just pass the values of 'mydata' to the new 'DWDATA' column. This skips index alignment and assigns by position:

DWDATA['MXX'] = mydata.iloc[:,0].values
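Positional assignment pairs row 1 with row 1 regardless of the dates, so it quietly gives wrong results if the two frames are ordered differently. As an alternative that is not part of the original answer, the sketch below matches rows by calendar month instead, by converting both indexes to monthly periods. The sample frames and values are made up for illustration, and the approach assumes 'mydata' has at most one row per month.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question (values are made up).
DWDATA = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
)
# Deliberately in reverse order to show that matching is by month, not position.
mydata = pd.DataFrame(
    {"Close": [30.0, 20.0, 10.0]},
    index=pd.to_datetime(["2014-03-31", "2014-02-28", "2014-01-31"]),
)

# Relabel the source by month so 2014-01-31 becomes the period 2014-01.
monthly = mydata.iloc[:, 0].copy()
monthly.index = monthly.index.to_period("M")

# Look up each DWDATA month; a month missing from mydata would become NaN.
target_months = DWDATA.index.to_period("M")
DWDATA["MXX"] = monthly.reindex(target_months).to_numpy()
print(DWDATA)
```

The original datetime index of 'DWDATA' is left untouched; only the lookup uses periods.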
Plauen answered 6/10, 2014 at 17:52 Comment(3)
This works fine, but the pandas team no longer recommends .values, at least for production code. They recommend using .to_numpy() instead. - Enshrine
@Enshrine - Is this by value or by reference? - Redpencil
@AlaaM. I meant .values, the DataFrame attribute. In our case, rather than using DWDATA['MXX'] = mydata.iloc[:,0].values, according to the documentation it is better to use DWDATA['MXX'] = mydata.iloc[:,0].to_numpy() - Enshrine

I like the accepted solution and just want to add to it. I can't add a comment (not enough rep), but I hit exactly the same problem and eventually solved it with tolist(), which seemed like the most Pythonic way. To copy from @gtnbz2nyt's reply:

DWDATA['MXX'] = mydata.iloc[:,0].tolist()

I hope it covers more datatypes. Edit: to clarify, iloc produces a Series, which is then converted to a list. A DataFrame itself has no tolist() method.
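For comparison, here is a small sketch that puts the three positional options from this thread (.values, .to_numpy() and .tolist()) side by side. The frames, the target column names and the values are invented for the example. All three drop the index, so the lengths must match.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question (values are made up).
DWDATA = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
)
mydata = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"]),
)

source = mydata.iloc[:, 0]  # a Series, which does have tolist()

# Index-free assignment is positional, so guard against a length mismatch.
if len(source) != len(DWDATA):
    raise ValueError("row counts differ; positional assignment would fail")

DWDATA["from_values"] = source.values      # NumPy array (older spelling)
DWDATA["from_numpy"] = source.to_numpy()   # NumPy array (recommended spelling)
DWDATA["from_list"] = source.tolist()      # plain Python list
print(DWDATA)
```

For a numeric column like this one, the three resulting columns hold the same values.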

Eglanteen answered 4/8, 2022 at 9:40 Comment(0)
