Edit pandas dataframe row-by-row

pandas for Python is neat. I'm trying to replace a list of dictionaries with a pandas DataFrame. However, I'm wondering if there's a way to change values row-by-row in a for-loop just as easily?

Here's the non-pandas dict-version:

trialList = [
    {'no':1, 'condition':2, 'response':''},
    {'no':2, 'condition':1, 'response':''},
    {'no':3, 'condition':1, 'response':''}
]  # ... and so on

for trial in trialList:
    # Do something and collect response
    trial['response'] = 'the answer!'

... and now trialList contains the updated values because trial refers back to the dicts in the list. Very handy! But the list-of-dicts is very unhandy, especially because I'd like to be able to compute stuff column-wise, which pandas excels at.

So given trialList from above, I thought I could make it even better by doing something pandas-like:

import pandas as pd    
dfTrials = pd.DataFrame(trialList)  # makes a nice 3-column dataframe with 3 rows

for trial in dfTrials.iterrows():
    # do something and collect response
    # trial is an (index, row) tuple, so trial[1] is the row Series
    trial[1]['response'] = 'the answer!'

... but dfTrials remains unchanged here. Is there an easy way to update values row-by-row, perhaps equivalent to the dict-version? It is important that it's row-by-row, as this is for an experiment where participants are presented with a lot of trials and various data are collected on each single trial.

Fieldfare answered 19/12, 2013 at 21:33 Comment(0)

If you really want row-by-row ops, you could use iterrows and loc:

>>> for i, trial in dfTrials.iterrows():
...     dfTrials.loc[i, "response"] = "answer {}".format(trial["no"])
...     
>>> dfTrials
   condition  no  response
0          2   1  answer 1
1          1   2  answer 2
2          1   3  answer 3

[3 rows x 3 columns]
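
As a minimal sketch of why the loop in the question changed nothing: with mixed column types, each row that iterrows yields is a freshly built Series, so assigning to it does not reach the frame. Writing through the frame itself does. The snake_case names below are illustrative, and using .at (the scalar accessor) instead of .loc is a swapped-in choice, not part of the original answer.

```python
import pandas as pd

trial_list = [
    {"no": 1, "condition": 2, "response": ""},
    {"no": 2, "condition": 1, "response": ""},
    {"no": 3, "condition": 1, "response": ""},
]
df_trials = pd.DataFrame(trial_list)

# Assigning to the row from iterrows() only changes a temporary Series,
# because the columns have mixed dtypes and each row is built as a copy.
for i, row in df_trials.iterrows():
    row["response"] = "lost"
untouched = df_trials["response"].tolist()

# Writing through the frame by index label and column name does persist.
for i in df_trials.index:
    df_trials.at[i, "response"] = "answer {}".format(df_trials.at[i, "no"])

print(untouched)
print(df_trials["response"].tolist())
```

Both .loc[i, col] and .at[i, col] address a single cell by label; .at accepts only scalar lookups.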

Better, though, is to vectorize when you can:

>>> dfTrials["response 2"] = dfTrials["condition"] + dfTrials["no"]
>>> dfTrials
   condition  no  response  response 2
0          2   1  answer 1           3
1          1   2  answer 2           3
2          1   3  answer 3           4

[3 rows x 4 columns]
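
The two styles can also be mixed when data arrives one trial at a time: write each new value row-by-row, then run a column-wise computation over whatever has been collected so far. The sketch below assumes a hypothetical rt column and a hypothetical collect_response function standing in for running a trial; neither comes from the question.

```python
import pandas as pd

df_trials = pd.DataFrame({
    "no": [1, 2, 3],
    "condition": [2, 1, 1],
    "rt": [float("nan")] * 3,  # hypothetical column, filled in trial by trial
})

def collect_response(trial_no):
    # Hypothetical stand-in for presenting a trial and measuring a value.
    return 0.5 * trial_no

running_means = []
for i in df_trials.index:
    # Row-by-row write for the trial that just finished.
    df_trials.at[i, "rt"] = collect_response(df_trials.at[i, "no"])
    # Column-wise summary; mean() skips NaN by default, so trials that
    # have not been run yet are ignored.
    running_means.append(df_trials["rt"].mean())

print(running_means)
```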

And there's always apply:

>>> def f(row):
...     return "c{}n{}".format(row["condition"], row["no"])
... 
>>> dfTrials["r3"] = dfTrials.apply(f, axis=1)
>>> dfTrials
   condition  no  response  response 2    r3
0          2   1  answer 1           3  c2n1
1          1   2  answer 2           3  c1n2
2          1   3  answer 3           4  c1n3

[3 rows x 5 columns]
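
For simple string formatting like f above, a column-wise version is often possible as well. This is a sketch under the assumption that converting the integer columns with astype(str) is acceptable; the variable names are illustrative.

```python
import pandas as pd

df_trials = pd.DataFrame({"no": [1, 2, 3], "condition": [2, 1, 1]})

def f(row):
    return "c{}n{}".format(row["condition"], row["no"])

# Row-wise: f is called once per row.
by_apply = df_trials.apply(f, axis=1)

# Column-wise: convert to strings once, then concatenate whole columns.
by_columns = "c" + df_trials["condition"].astype(str) + "n" + df_trials["no"].astype(str)

print(by_apply.tolist())
print(by_columns.tolist())
```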
Lanoralanose answered 19/12, 2013 at 21:41 Comment(3)
Thanks! The top one is what I needed. Not as beautiful/simple as I'd like, but it works. - Kollwitz
@Jonas the point is that the vectorised solution will always be the fastest. Iterating row-by-row is slow (and can usually be avoided). - Jacobite
In my case, data is updated trial by trial as the subject goes through the experiment, and analysis needs to be done "online" before all data is collected. Therefore, operations on all rows at once are not possible. - Kollwitz
