Good alternative to Pandas .append() method, now that it is being deprecated?

R

8

158

I use the following method a lot to append a single row to a dataframe. One thing I really like about it is that it allows you to append a simple dict object. For example:

# Creating an empty dataframe
df = pd.DataFrame(columns=['a', 'b'])

# Appending a row
df = df.append({ 'a': 1, 'b': 2 }, ignore_index=True)

Again, what I like most about this is that the code is very clean and requires very few lines. Now I suppose the recommended alternative is:

# Create the new row as its own dataframe
df_new_row = pd.DataFrame({ 'a': [1], 'b': [2] })
df = pd.concat([df, df_new_row])

So what was one line of code before is now two lines with a throwaway variable and extra cruft where I create the new dataframe. :( Is there a good way to do this that just uses a dict like I have in the past (that is not deprecated)?

Rachael answered 24/1, 2022 at 16:46 Comment(2)

pandas issue 35407 explains that df.append was deprecated because: "Series.append and DataFrame.append [are] making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result." – Mnemonics 10/2, 2022 at 12:15

Came across this warning today. However when I used concat as the alternative I got "cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid". So frustrating..... – Injector 13/2, 2022 at 23:16

S

69

Create a list with your dictionaries, if they are needed, and then create a new dataframe with df = pd.DataFrame.from_records(your_list). A list's "append" method is very efficient and won't ever be deprecated. Dataframes, on the other hand, frequently have to be recreated, with all data copied over on every append, due to their design. That is why the method was deprecated.
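A minimal sketch of that pattern, assuming the rows arrive one at a time as plain dicts (the loop and the values are made up for illustration):

```python
import pandas as pd

# Collect rows in an ordinary Python list; list.append works in place.
rows = []
for i in range(3):
    rows.append({'a': i, 'b': i * 2})

# Build the dataframe once, after all rows have been collected.
df = pd.DataFrame.from_records(rows)
print(df)
```

The dataframe is built a single time at the end, so no intermediate copies are made while the rows accumulate.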

Serialize answered 24/1, 2022 at 16:57 Comment(7)

How do you know that it is deprecated? At pandas.pydata.org/docs/reference/api/… (which currently shows version 1.4.0) I don't see any mention about that. Even at the dev tree I don't see any deprecation warning: pandas.pydata.org/docs/dev/reference/api/… – Humiliating 2/2, 2022 at 10:55

I agree; though when you use the append method (with 1.4.0) you run into a "FutureWarning : The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead". You will find the details in the "what's new" page – Paquette 4/2, 2022 at 13:20

@Humiliating the update to the documentation is being dealt with in this pull request: github.com/pandas-dev/pandas/pull/45587 – Mnemonics 10/2, 2022 at 12:21

This made my code ten times faster, thanks so much man – Brigitte 7/5, 2022 at 13:37

That is actually the reason they are deprecating df.append. Thank the Pandas maintainers for that. Still, the "new way to do it" should be more prominent in their docs, for sure. – Serialize 7/5, 2022 at 16:50

creating a dataframe from a huge list took too much time compared to getting data and appending in chunks to the dataframe. However the pd.concat method below worked just fine – Stranglehold 28/6, 2022 at 5:30

I heard append() will insert dictionaries into their correct corresponding dataframe columns for an existing dataframe with named columns. Given that, will from_records() be able to do the same? – Testee 30/7, 2022 at 9:18

C

70

I also like the append method. But you can do it in one line with a list of dicts

df = pd.concat([df, pd.DataFrame.from_records([{ 'a': 1, 'b': 2 }])])

or using loc and tuples for values on DataFrames with incremental ascending indexes

df.loc[len(df), ['a','b']] = 1, 2

or maybe

df.loc[len(df), df.columns] = 3, 4
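One caveat with the loc variants, shown here as a small sketch with made-up data: df.loc[len(df)] only appends when len(df) is not already an index label. If the index has gaps (for example after dropping a row), it can overwrite an existing row instead, whereas concat with ignore_index=True always adds one.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df = df.drop(index=0)  # index labels are now [1, 2]

# len(df) is 2, which is an existing label, so this overwrites that row.
df.loc[len(df)] = [9, 9]
rows_after_loc = len(df)  # still 2 rows

# concat always adds a row; ignore_index=True renumbers the index from 0.
df = pd.concat(
    [df, pd.DataFrame.from_records([{'a': 7, 'b': 8}])],
    ignore_index=True,
)
print(df)
```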

Complainant answered 8/3, 2022 at 13:45 Comment(4)

You can also use ignore_index df = pd.concat([df, pd.DataFrame.from_records([{ 'a': 1, 'b': 2 }])], ignore_index=True) – Complainant 22/4, 2022 at 18:50

first argument must be an iterable of pandas objects, you passed an object of type "DataFrame". What can I do to solve this? – Pterodactyl 28/11, 2022 at 20:7

The arguments are actually inside a list. So the first argument is a list of pandas objects. First item is the original df, second item is the new df generated from records. – Complainant 29/11, 2022 at 20:47

I guess this is the best answer out here, but intuitively, this feels wrong because the old dataframe is still overwritten with the new one. Appending in a sense should have greater data security. – Shipwright 4/12, 2022 at 22:36

S

36

If you want to use concat instead:

append

outputxlsx = outputxlsx.append(df, ignore_index=True)

concat

outputxlsx = pd.concat([outputxlsx, df], ignore_index=True)
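A short sketch of what ignore_index changes, using two small made-up frames in place of the real outputxlsx and df: without it the original index labels are kept (and may repeat), with it the result is renumbered from 0.

```python
import pandas as pd

# Stand-ins for the frames in the answer; the contents are illustrative only.
outputxlsx = pd.DataFrame({'a': [1, 2]})
df = pd.DataFrame({'a': [3, 4]})

kept = pd.concat([outputxlsx, df])                           # labels 0, 1, 0, 1
renumbered = pd.concat([outputxlsx, df], ignore_index=True)  # labels 0, 1, 2, 3

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Which of the two is wanted depends on whether the index labels carry meaning in your data.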

Saskatoon answered 2/6, 2022 at 19:55 Comment(1)

outputxlsx = pd.concat([outputxlsx, df]) is enough since df is a data frame. – Mnemonics 14/6, 2022 at 19:9

D

10

For those, like me, who want a descriptive function rather than lots of one-liners, here is an option based on @Rafael Gaitan above.

def append_dict_to_df(df, dict_to_append):
    df = pd.concat([df, pd.DataFrame.from_records([dict_to_append])])
    return df

# Creating an empty dataframe
df = pd.DataFrame(columns=['a', 'b'])

# Appending a row
df = append_dict_to_df(df,{ 'a': 1, 'b': 2 })

Dre answered 31/8, 2022 at 12:46 Comment(0)

A

9

I was facing a similar issue. The other solutions weren't really working for me. I'm leaving this answer here as an additional way to deal with the issue, since this is the first Google result for certain searches and I have ended up here at least twice myself.

In my case the data is not a dict but just a list of values for a known set of parameters. I want to add the parameter values to a dataframe as rows because this way I can access a series of all the values for one parameter via df[parameter].

I start with an empty DataFrame:

parameters = ['a', 'b', 'c', 'd', 'e', 'f']
df = pd.DataFrame(columns=parameters)

df:

        a   b   c   d   e   f

With append I could add rows very conveniently, like so:

new_row = pd.Series([1,2,3,4,5,6], index=parameters, name='row1')
df = df.append(new_row)

df:

        a   b   c   d   e   f
row1    1   2   3   4   5   6

With pd.concat I found this to deliver the same result in a very similar way:

new_row = pd.DataFrame([1,2,3,4,5,6], columns=['row1'], index=parameters).T
df = pd.concat((df, new_row))

The key was to create an (n,1) dataframe from the 1d data and then transpose it to (1,n) so that it matches the other dataframe.
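If the row already exists as a named Series, as in the append version, a possible shortcut is Series.to_frame() followed by a transpose. This is only a sketch with made-up values; it starts from a one-row frame rather than an empty one to keep the example small.

```python
import pandas as pd

parameters = ['a', 'b', 'c', 'd', 'e', 'f']

row1 = pd.Series([1, 2, 3, 4, 5, 6], index=parameters, name='row1')
row2 = pd.Series([7, 8, 9, 10, 11, 12], index=parameters, name='row2')

# to_frame() gives an (n, 1) frame named after the Series; .T turns it into (1, n).
df = row1.to_frame().T
df = pd.concat([df, row2.to_frame().T])

print(df)
print(df['c'])  # all values collected for parameter 'c'
```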

Alten answered 1/8, 2022 at 17:8 Comment(1)

Or you could create a dictionary out of your list: new = {k: v for k, v in zip(parameters, [1,2,3,4,5,6])} then df = pd.concat([df, pd.DataFrame(new, index=['row1'])]) works – Casket 21/9, 2022 at 6:30

W

3

# Deprecated issue has been resolved

# Creating an empty dataframe
df = pd.DataFrame(columns=['a', 'b'])
print("df columns:", df)

# Appending a row
df = df.append({ 'a': 1, 'b': 2 }, ignore_index=True)
print("df column Values :", df)

# Create the new row as its own dataframe
df_new_row = pd.DataFrame.from_records({ 'a': [3], 'b': [4] })
df = pd.concat([df, df_new_row])
print("pd concat with two df's :", df)

Wherever answered 10/8, 2022 at 9:7 Comment(0)

I

1

You can use the following command

# If your index is a string
df.loc["name of the index"] = pd.Series({"Column 1": Value1, "Column 2": Value2,
                                         "Column 3": Value3, "Column 4": Value4, ...})

# If your index is a number
df.loc[len(df)] = pd.Series({"Column 1": Value1, "Column 2": Value2,
                             "Column 3": Value3, "Column 4": Value4, ...})

Just keep in mind that this modifies the initial dataframe in place instead of returning a new one.
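A small sketch of what that means in practice, with hypothetical helper names (add_row_in_place and add_row_to_copy are not pandas functions): a function that assigns through loc changes the caller's dataframe, unless it works on a copy.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]}, index=['r0'])

def add_row_in_place(frame):
    # Assigning through .loc changes the object the caller passed in.
    frame.loc['r1'] = pd.Series({'a': 3, 'b': 4})

def add_row_to_copy(frame):
    # Working on a copy leaves the caller's dataframe untouched.
    out = frame.copy()
    out.loc['r2'] = pd.Series({'a': 5, 'b': 6})
    return out

add_row_in_place(df)       # df now has rows r0 and r1
df2 = add_row_to_copy(df)  # df2 has r0, r1, r2; df is unchanged by this call

print(df)
print(df2)
```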

Intolerance answered 11/4, 2023 at 15:36 Comment(0)

W

0

If you want to use chained operations in pandas to append new rows, you could use something like this:

import pandas as pd
      
dictionary = {
    "sex": ["male", "female", "male"],
    "age": [10, 25, 36],
    "pclass": ["pclass1", "pclass3", "pclass2"],
    "survived": [True, False, False]
}
df_dict = pd.DataFrame(dictionary)

new_row = pd.DataFrame({"sex": "male", "age": 40, "survived": False, "name": "Alex"}, index=[0])

df = (
  df_dict
  .assign(name=['Alice', 'Bob', 'Charlie'])
  .drop("pclass", axis=1)
  .pipe(lambda df_: pd.concat([df_, new_row], ignore_index=True))
)

For your example, I would use a custom append_row function.

import pandas as pd

def append_row(df1, d):
    df2 = pd.DataFrame(d, index=[0])
    return pd.concat([df1, df2], ignore_index = True)

df = (
  pd.DataFrame(columns=['a', 'b'])
  .pipe(append_row, {'a': 1, 'b': 2 })
)

Warrantable answered 28/7, 2023 at 12:18 Comment(2)

This breaks for several reasons, can you please fix it: you need to remove the stale line .drop("pclass", axis=1), also the assignment of df_dict doesn't work in pandas 2.0. Please retest it. – Menfolk 9/9, 2023 at 1:44

The first part was only an example, not a complete example. You can see both examples working with pandas 2.0.3 in this kaggle notebook: kaggle.com/code/jordirosell/pandas-pipe-to-append-new-rows – Warrantable 10/9, 2023 at 13:54
