How do I create a while loop for this df that has moving average in every stage? [duplicate]
Asked Answered
E

1

0

So I want to spread the shipments per ID in the group one by one by looking at avg sales to determine who to give it to.

Here's my dataframe:

ID   STOREID    BAL       SALES   SHIP
1     STR1      50        5       18
1     STR2      6         7       18
1     STR3      74        4       18
2     STR1      35        3       500
2     STR2      5         4       500
2     STR3      54        7       500

While SHIP (grouped by ID) is greater than 0, calculate AVG (BAL/SALES) and the lowest AVG per group give +1 to its column BAL and +1 to its column final. And then repeat the process until SHIP is 0. The AVG would be different every stage which is why I wanted it to be a while loop.

Sample output of first round is below. So do this until SHIP is 0 and SUM of Final per ID is = to SHIP:

ID   STOREID   BAL   SALES  SHIP   AVG        Final
1    STR1      50    5      18     10         0
1    STR2      6     4      18     1.5        1
1    STR3      8     4      18     2          0
2    STR1      35    3      500    11.67      0
2    STR2      5     4      500    1.25       1
2    STR3      54    7      500    7.71       0

I've tried a couple of ways in SQL, I thought it would be better to do it in python but I haven't been doing a great job with my loop. Here's what I tried so far:

df['AVG'] = 0
df['FINAL'] = 0

for i in df.groupby(["ID"])['SHIP']:
    if i > 0:
          df['AVG'] = df['BAL'] / df['SALES']
          df['SHIP'] = df.groupby(["ID"])['SHIP']-1
          total = df.groupby(["ID"])["FINAL"].transform("cumsum")
          df['FINAL'] = + 1
          df['A'] = + 1
    else:
         df['FINAL'] = 0
Earl answered 14/11, 2021 at 20:37 Comment(2)
Can you explain what you mean by "So I want to spread the shipments per ID in the group one by one"? Why are you showing the [Final] column as x in your expected output when your sample code shows an integer value? Can you utilize a different data structure for iterative computations?Defy
So in the final column, I'm trying to allocate the SHIP across the FINAL column, one by one by looking at the lowest AVG column. I just used X as a placeholder of a value. I'll edit that.Earl
M
0

This was challenging because more than one row in the group can have the same average calculation. then it throws off the allocation.

This works on the example dataframe, if I understood you correctly.

d = {'ID': [1, 1, 1, 2,2,2], 'STOREID': ['str1', 'str2', 'str3','str1', 'str2', 'str3'],'BAL':[50, 6, 74, 35,5,54], 'SALES': [5, 7, 4, 3,4,7], 'SHIP': [18, 18, 18, 500,500,500]} 
df = pd.DataFrame(data=d)
df['AVG'] = 0
df['FINAL'] = 0

def calc_something(x):
    # print(x.iloc[0]['SHIP'])
    for i in range(x.iloc[0]['SHIP'])[0:500]:
          x['AVG'] = x['BAL'] / x['SALES']
          x['SHIP'] = x['SHIP']-1
          x = x.sort_values('AVG').reset_index(drop=True)
          # print(x.iloc[0, 2])
          x.iloc[0, 2] = x['BAL'][0] + 1
          x.iloc[0, 6] = x['FINAL'][0] + 1
    return x

df_final = df.groupby('ID').apply(calc_something).reset_index(drop=True).sort_values(['ID', 'STOREID'])
df_final

   ID STOREID  BAL  SALES  SHIP    AVG  FINAL
1   1    STR1   50      5     0 10.000      0
0   1    STR2   24      7     0  3.286     18
2   1    STR3   74      4     0 18.500      0
4   2    STR1  127      3     0 42.333     92
5   2    STR2  170      4     0 42.500    165
3   2    STR3  297      7     0 42.286    243
Maintop answered 15/11, 2021 at 5:13 Comment(10)
It actually gave me a different output, I did the exact same thing? I got similar IDs (which is 1)Earl
sorry, don't undersatnd what this means: "I got similar IDs (which is 1)" it's unclear how it's different unless you are using different input dataframe than what you provided above.Maintop
Yes, I'm using exactly the same, but the loop gets mixed up? Here's my data frame: d = {'ID': [1, 1, 1, 2,2,2], 'STOREID': ['str1', 'str2', 'str3','str1', 'str2', 'str3'],'BAL':[50, 6, 74, 35,5,54], 'SALES': [5, 7, 4, 3,4,7], 'SHIP': [18, 18, 18, 500,500,500]} df = pd.DataFrame(data=d)Earl
And when I tried running the code, I got an output of 5 rows will all IDs as 1 and it seems like all the SHIP was given to the smallest avg instead of spread outEarl
ran it again with your data and get it the same correct result. you aren't moving or adding columns? .iloc[0, 2] is finicky. looking for specific rows/columns. unfortunately it's hard to say why it's not working for you.Maintop
oh, for your logic, the ship goes to the small avg, and for id 1, it all goes to str2. but for id 2 the ships are spread out. is that what you mean? i think you need to look at the logic in the function as it may not actually be what you want, just what i understood you to be asking for.Maintop
That's so weird, I get a different result and I copied what you have exactly - even restarted my jupyter kernel. You are priting the df_final right? My df_final prints 5 rows with all 1 as an ID. Perhaps I need to reset index somewhere? Cos my index shows 1,4,0,3,2,5 instead of 1,0,2,4,5,3 like yoursEarl
But I do think what you have is helpful, I'm just trying to figure out why I'm getting a different output with the exact same dfEarl
updated to include your data. if you copy from d through df_final, you should end up with my result. i'm on the latest pandas and python versions, maybe check that as well.Maintop
Omg yes, it's because of the pandas version. Crazy. Thank you!Earl

© 2022 - 2024 — McMap. All rights reserved.