How to convert DataFrame.append() to pandas.concat()? [duplicate]
A

5

21

As of pandas 1.4.0, DataFrame.append() is deprecated, and the docs say to use pandas.concat() instead.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Codeblock in question:

def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples: %s", num_samples)
    features = pd.DataFrame()

    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)

    return features

These are the two options I've tried, but neither worked:

features = pd.concat([features],[input_vars])

and

pd.concat([features],[input_vars])

This is the line that uses the deprecated method and triggers the warning:

features = features.append(input_vars)
Algol answered 24/2, 2022 at 21:35 Comment(0)
S
18

You can store the DataFrames generated in the loop in a list and concatenate them with features once you finish the loop.

In other words, replace the loop:

for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)        # remove this `DataFrame.append`

with the one below:

tmp = []                                          # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)                        # append to the list (not the DF)
features = pd.concat(tmp)                         # concatenate after loop

You can certainly concatenate inside the loop, but it's more efficient to do it only once: each call to pd.concat builds a new DataFrame and copies all the rows accumulated so far.
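
As a self-contained illustration of the list-then-concat pattern, here is a minimal sketch. `make_row` is a hypothetical stand-in for the question's `get_pixel_data`, and the column names are invented for the example.

```python
import pandas as pd


def make_row(i):
    # Hypothetical stand-in for get_pixel_data(): returns a one-row DataFrame.
    return pd.DataFrame({"row": [i], "value": [i * i]})


num_samples = 5

# Collect the pieces in a plain Python list (cheap), then concatenate once.
pieces = [make_row(i) for i in range(num_samples)]
features = pd.concat(pieces, ignore_index=True)

# For comparison: concatenating inside the loop gives the same result,
# but copies all accumulated rows again on every iteration.
slow = make_row(0)
for i in range(1, num_samples):
    slow = pd.concat([slow, make_row(i)], ignore_index=True)

print(features.equals(slow))
```

Passing `ignore_index=True` renumbers the rows 0..n-1; leave it out if the original index labels of each piece should be kept.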

Stonefish answered 24/2, 2022 at 21:43 Comment(2)
From personal experience, each append can individually take almost as long as the entire concat, so the time savings from doing it once at the end can be massive. - Complemental
It is very unfortunate that they are deprecating append for dataframes. With my code, creating the dataframe using the temporary list as shown here results in my code running 10X slower. - Verisimilitude
S
5

This "appends" input_vars to the (initially empty) features DataFrame using concat, which avoids the deprecation warning:

features = pd.concat([features, input_vars])

However, without access to the actual data and data structures, this is hard to test or replicate.

Shadowgraph answered 24/2, 2022 at 21:39 Comment(3)
On the official Pandas docs for the latest release, you will see that .append() was deprecated (pandas.pydata.org/docs/whatsnew/v1.4.0.html). They say I should use concat() instead, but I can't get it to work. I will keep exploring the pandas docs. - Algol
I updated my answer to use concat. Thank you for pointing out the docs; sorry if I missed them before. - Shadowgraph
This directly fixes the OP's mistake, i.e. [features],[input_vars] should be [features, input_vars]. However, in the case of a loop like the OP's, the other answer is far more efficient. - Complemental
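
To make the difference between the two spellings concrete: pd.concat takes a single list of objects as its first argument, so `[features], [input_vars]` passes `[input_vars]` as a second argument (the slot where `axis` sits in the signature) rather than as another frame. A minimal sketch with invented data follows; the exact exception message depends on the pandas version, so the code only records that the call fails.

```python
import pandas as pd

features = pd.DataFrame({"a": [1], "b": [2]})
input_vars = pd.DataFrame({"a": [3], "b": [4]})

# Two separate lists: the second one is not treated as data.
try:
    pd.concat([features], [input_vars])
    failed = False
except Exception as exc:  # typically a TypeError; the message varies by version
    failed = True
    print(type(exc).__name__, exc)

# One list containing both frames: this is what concat expects.
combined = pd.concat([features, input_vars], ignore_index=True)
print(combined)
```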
W
1

There is another unpleasant edge case here: if input_vars is a Series (not a DataFrame) that represents one row to be appended to features, the deprecated features = features.append(input_vars) works fine and adds one row to the DataFrame.

But the concat version, features = pd.concat([features, input_vars]), does something different: it stacks the Series as a new column, with the Series index becoming extra row labels, and produces lots of NaNs. To get this to work, you need to convert the Series to a one-row DataFrame:

features = pd.concat([features, input_vars.to_frame().T])

See also this question: Why does concat Series to DataFrame with index matching columns not work?
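
The Series edge case can be reproduced with a small sketch. The frame and Series below are invented for illustration.

```python
import pandas as pd

features = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
input_vars = pd.Series({"a": 5, "b": 6})  # one "row", indexed by column name

# Passing the Series directly: its index labels ("a", "b") become row labels,
# so the result gains extra rows and NaN wherever the data does not line up.
naive = pd.concat([features, input_vars])
print(naive)

# Turning the Series into a one-row DataFrame first lines the columns up.
fixed = pd.concat([features, input_vars.to_frame().T], ignore_index=True)
print(fixed)
```

One thing to watch: a Series holding mixed types has object dtype, so the transposed one-row frame has object columns too, and it may be worth checking the dtypes of the result afterwards.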

Wotan answered 25/4, 2023 at 15:20 Comment(0)
P
0

For example, suppose you have a list of DataFrames called collector (e.g. one per cryptocurrency) and you want to harvest the first row of two particular columns from each DataFrame in collector. You can do it as follows:

pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
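
A runnable version of that one-liner might look like the sketch below. The contents of `collector` are made-up placeholder values, and the extra `Price` column is an assumption added only to show that the other columns are dropped.

```python
import pandas as pd

# Hypothetical stand-in for `collector`: a list of DataFrames with shared columns.
collector = [
    pd.DataFrame({"Ticker": ["BTC", "ETH"], "Market Cap": [900, 400], "Price": [45000, 3000]}),
    pd.DataFrame({"Ticker": ["SOL", "ADA"], "Market Cap": [50, 30], "Price": [100, 1]}),
]

# Take the first row of the two columns of interest from every frame,
# then stack the one-row pieces into a single DataFrame.
first_rows = pd.concat(
    [cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector],
    ignore_index=True,
)
print(first_rows)
```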
Pinter answered 18/2, 2023 at 12:11 Comment(0)
N
0

You can bring append back by creating a module that patches it onto DataFrame:

import pandas as pd


def my_append(self, x, ignore_index=False):
    # Mirror the old DataFrame.append: keep the original index labels unless
    # ignore_index is True, in which case the result is renumbered 0..n-1.
    return pd.concat([self, x], ignore_index=ignore_index)


# Only patch when the method is missing (i.e. it has been removed from pandas).
if not hasattr(pd.DataFrame, "append"):
    setattr(pd.DataFrame, "append", my_append)

This adds the implementation, which can be tested as follows:

import pandas as pd
import lib.pandassupport


def test_append_ignore_index_is_true():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row, ignore_index=True)
    print(df)
    # ignore_index=True renumbers the rows, so the new row gets label 3.
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 3],
        )
    )


def test_append():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row)
    # Without ignore_index the new row keeps its own label, 0.
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 0],
        )
    )
Nide answered 10/7, 2023 at 18:33 Comment(0)