How to convert DataFrame.append() to pandas.concat()? [duplicate]

Asked 24/2, 2022 at 21:35 Answered 10/7, 2023 at 18:33

Solved python pandas dataframe append concatenation

In pandas 1.4.0, append() was deprecated, and the docs say to use concat() instead.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Codeblock in question:

def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples: %s", num_samples)
    features = pd.DataFrame()

    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)

    return features

These are the two options I've tried, but neither worked:

features = pd.concat([features],[input_vars])

and

pd.concat([features],[input_vars])

This is the deprecated line that raises the warning:

features = features.append(input_vars)

Algol answered 24/2, 2022 at 21:35 Comment(0)

You can store the DataFrames generated in the loop in a list and concatenate them into features once the loop finishes.

In other words, replace the loop:

for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)        # remove this `DataFrame.append`

with the one below:

tmp = []                                          # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)                        # append to the list (not the DataFrame)
features = pd.concat(tmp)                         # concatenate once, after the loop

You can certainly concatenate inside the loop, but it's more efficient to do it only once, because every concat call copies all the rows accumulated so far.
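As a minimal, self-contained sketch of the pattern above: the `make_row` helper is hypothetical and merely stands in for the question's `get_pixel_within_mask` / `get_pixel_data` calls, which are not shown. It assumes each iteration yields a one-row DataFrame.

```python
import pandas as pd


def make_row(i):
    # Hypothetical stand-in for the code that produces `input_vars`;
    # assumed here to return a one-row DataFrame.
    return pd.DataFrame({"sample": [i], "value": [i * 10]})


def generate_features(num_samples):
    rows = []                                   # plain Python list, cheap to append to
    for count in range(num_samples):
        rows.append(make_row(count))            # list.append, not DataFrame.append
    if not rows:                                # pd.concat raises on an empty list
        return pd.DataFrame()
    return pd.concat(rows, ignore_index=True)   # single concatenation after the loop


features = generate_features(3)
print(features)
```

Passing `ignore_index=True` renumbers the rows 0..n-1; leave it out if the original index labels of each piece should be kept.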

Stonefish answered 24/2, 2022 at 21:43 Comment(2)

From personal experience, each append can individually take almost as long as the entire concat, so the time savings by doing it once at the end can be massive. – Complemental 12/9, 2022 at 7:27

It is very unfortunate that they are deprecating append for dataframes. With my code, creating the dataframe using the temporary list as shown here results in my code running 10X slower. – Verisimilitude 29/11, 2022 at 13:29

This appends input_vars to the (initially empty) DataFrame using concat, which avoids the deprecation warning:

features = pd.concat([features, input_vars])

However, without access to the actual data and data structures, this is hard to test or replicate.
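For reference, a small sketch of the difference between the two call shapes, using a made-up one-column frame (the real `input_vars` is not shown in the question, so its shape here is an assumption). `pd.concat` takes a single list of objects as its first argument; a second positional argument is not treated as more data, so the call from the question fails.

```python
import pandas as pd

features = pd.DataFrame()                       # starts empty, as in the question
input_vars = pd.DataFrame({"band_1": [0.25]})   # made-up one-row frame

# Wrong: two separate lists. The second one is not interpreted as data to join.
try:
    pd.concat([features], [input_vars])
    failed = False
except (TypeError, ValueError):
    failed = True

# Right: one list holding both objects.
features = pd.concat([features, input_vars])
print(failed, len(features))
```

The exact exception type and message depend on the pandas version, which is why the sketch catches both `TypeError` and `ValueError`.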

Shadowgraph answered 24/2, 2022 at 21:39 Comment(3)

On the official Pandas docs for the latest release, you will see that .append() was deprecated. pandas.pydata.org/docs/whatsnew/v1.4.0.html They say I should use concat() instead, but I can't get it to work. I will keep exploring the pandas docs. – Algol 24/2, 2022 at 21:44

I updated my answer to use concat. Thank you for pointing out the docs, and sorry that I missed them before. – Shadowgraph 24/2, 2022 at 21:51

This directly fixes the OP's mistake, i.e. [features],[input_vars] should be [features, input_vars]. However, in the case of a loop like the OP's, the other answer is far more efficient. – Complemental 12/9, 2022 at 7:44

There is another unpleasant edge case here: If input_vars is a series (not a dataframe) that represents one row to be appended to features, the deprecated use of features = features.append(input_vars) works fine and adds one row to the dataframe.

But the concat version, features = pd.concat([features, input_vars]), does something different: it treats the series as a column rather than a row, which produces lots of NaNs. To get this to work, you need to convert the series to a one-row dataframe:

features = pd.concat([features, input_vars.to_frame().T])
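A short sketch of this edge case with made-up numeric data (the column names `a` and `b` are illustrative only):

```python
import pandas as pd

features = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
input_vars = pd.Series({"a": 5, "b": 6})     # one row, stored as a Series

# The Series is stacked as a separate column, so the result is padded with NaN.
wrong = pd.concat([features, input_vars])

# Turn the Series into a one-row DataFrame first, then concatenate.
right = pd.concat([features, input_vars.to_frame().T], ignore_index=True)

print(wrong)
print(right)
```

Note that a Series holding mixed types has object dtype, so the transposed one-row frame will carry object columns; the sketch sidesteps this by using integers only.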

Wotan answered 25/4, 2023 at 15:20 Comment(0)

For example, suppose you have a list of DataFrames called collector (e.g. one per cryptocurrency) and you want the first row of two particular columns from each DataFrame in collector. You can do it as follows:

pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
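To make that runnable, here is a sketch with an invented `collector`; the tickers, numbers and the extra `Price` column are placeholders, since the original data is not shown.

```python
import pandas as pd

# Made-up data standing in for the real `collector` list.
collector = [
    pd.DataFrame({"Ticker": ["AAA", "AAA"], "Market Cap": [100, 90], "Price": [1.0, 0.9]}),
    pd.DataFrame({"Ticker": ["BBB", "BBB"], "Market Cap": [50, 55], "Price": [2.0, 2.2]}),
]

# Take the first row of two columns from every frame, then join them once.
first_rows = pd.concat(
    [cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector],
    ignore_index=True,
)
print(first_rows)
```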

Pinter answered 18/2, 2023 at 12:11 Comment(0)

You can bring append() back by creating a module that patches it onto pd.DataFrame (the test below imports it as lib.pandassupport):

import pandas as pd


def my_append(self, x, ignore_index=False):
    # Mirror the removed DataFrame.append: keep the original index labels
    # unless ignore_index=True, in which case the rows are renumbered 0..n-1.
    return pd.concat([self, x], ignore_index=ignore_index)


# Only patch when the method is missing (it was removed in pandas 2.0).
if not hasattr(pd.DataFrame, "append"):
    setattr(pd.DataFrame, "append", my_append)

This adds the implementation, which can be tested as follows:

import pandas as pd
import lib.pandassupport  # imported for its side effect: installs the patch


def test_append_ignore_index_is_true():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row, ignore_index=True)
    print(df)
    # ignore_index=True renumbers the rows, so the new row gets label 3.
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 3],
        )
    )


def test_append():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row)
    # Without ignore_index the new row keeps its own label, 0.
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 0],
        )
    )

Nide answered 10/7, 2023 at 18:33 Comment(0)
