Need to expand an inventory journal (log) pandas dataframe to include all dates per product id

I have an inventory journal that contains products with their running inventory qty (resulting_qty) as well as the loss/gain recorded every time inventory is added or subtracted (delta_qty).

The issue is that inventory records are not updated daily; they are only updated when a change in inventory occurs. That makes it difficult to extract the total inventory qty for all items on a given day, because some items have no record on certain days even though they still have available inventory (their last recorded resulting_qty was greater than 0). Logically, this means an item simply went without a change in qty for the number of days between the max date and its last recorded date.

My data looks something like this, except that in reality there are thousands of product ids:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A   | 0         | 0.0           |
| 2017-03-31 | 2017-03-31 02:43:11 | A   | 3         | 3.0           |
| 2017-04-08 | 2017-04-08 22:04:35 | A   | -1        | 2.0           |
| 2017-04-12 | 2017-04-12 18:26:39 | A   | -1        | 1.0           |
| 2017-04-19 | 2017-04-19 09:15:38 | A   | -1        | 0.0           |
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-05-09 16:25:41 | C   | 2         | 2.0           |

Essentially, I need to make the data look something more like the table below, so that I can simply pull a date and get the total inventory for that day by grouping on date (e.g. df.groupby('date').resulting_qty.sum(); see the sketch after the table):

Note: I removed the pid = A rows due to character limitations, but I hope you get the idea:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-06 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-07 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-08 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-09 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-10 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-11 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-12 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-13 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-14 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-15 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-16 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-17 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-18 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-19 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-20 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-21 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-24 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-25 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-26 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-27 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-28 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-29 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-30 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-01 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-02 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-03 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-04 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-05 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-06 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-07 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-08 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-10 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
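
Once the data is expanded like this, pulling a day's total should be a one-liner. A minimal sketch of that query (df_expanded is just a placeholder name for the expanded frame above):

# df_expanded: placeholder for the expanded frame with one row per pid per day
daily_totals = df_expanded.groupby('date')['resulting_qty'].sum()
print(daily_totals.loc['2019-04-10'])  # total units on hand that day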

So far, I've created a series of loops that generate a date range between each product's min date and the max date across all products. I then append the last recorded row's values as a new row with the new date whenever there is no information for that date. I append these to lists, and then build a new dataframe from the updated lists. The code is terribly slow and takes 2+ hours to complete on the full dataset:

date_list = []
pid_list = []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []

timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter += 1
    print((counter / timer) * 100)  # crude progress indicator
    # note: string product ids need quoting inside the query expression
    temp_df = test.query(f'product_id == "{product}"', engine='python')
    for idx, date in enumerate(pd.date_range(temp_df.index.min(), test.index.max()).tolist()):
        min_date = temp_df.index.min()
        if date.date() == min_date:
            # first day of the product's lifecycle: copy the recorded row
            date2 = min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:
            if date.date() in temp_df.index:
                # an update was recorded on this date: copy it
                date2 = date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                # no update on this date: carry the last recorded values forward
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass

Can someone please help me understand the right way to approach this? I'm 100% sure this is not the best approach.

Thank you

Asked 12/5/2019 at 19:53

Answer (score: 10)

The idea here is to reindex the DataFrame to fill the gaps you have.

Set up a DataFrame from your sample:

import pandas as pd
from io import StringIO

buffer = StringIO()
buffer.write('''\
date|timestamp|pid|delta_qty|resulting_qty
2017-03-06|2017-03-06 12:24:22|A|0|0.0
2017-03-31|2017-03-31 02:43:11|A|3|3.0
2017-04-08|2017-04-08 22:04:35|A|-1|2.0
2017-04-12|2017-04-12 18:26:39|A|-1|1.0
2017-04-19|2017-04-19 09:15:38|A|-1|0.0
2019-01-16|2019-01-16 23:37:17|B|0|0.0
2019-01-19|2019-01-19 09:40:38|C|0|0.0
2019-04-05|2019-04-05 16:40:32|B|2|2.0
2019-04-22|2019-04-22 11:06:33|B|-1|1.0
2019-04-23|2019-04-23 13:23:17|B|-1|0.0
2019-05-09|2019-05-09 16:25:41|C|2|2.0
''')
buffer.seek(0)

df = pd.read_csv(buffer, sep='|', parse_dates=['date', 'timestamp'])

First, we generate a new, gap-less index between the min and max dates for each product. This has the effect of producing no rows for a product after its last existing update, per your example. However, this step is easy to customize to your exact requirements: e.g., if you want the dates to span from the first product entry to today, you can just set start and end manually (see the sketch after the next snippet).

from itertools import chain, cycle

# first and last journal dates per product
date_ranges = df.groupby('pid').agg({'date': ['min', 'max']})

# one (pid, date) pair for every calendar day in each product's range
pairs = (zip(cycle([pid]), pd.date_range(start, end))
         for pid, (start, end) in date_ranges.iterrows())
new_index = pd.Index(chain.from_iterable(pairs), name=['pid', 'date'])
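
For example, a minimal sketch of the customization mentioned above, extending every product's range from its first entry through today (an assumption about what you might want, not something your sample requires):

# same construction as above, but every product's range ends today
today = pd.Timestamp.today().normalize()
pairs = (zip(cycle([pid]), pd.date_range(start, today))
         for pid, (start, _end) in date_ranges.iterrows())
new_index = pd.Index(chain.from_iterable(pairs), name=['pid', 'date'])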

Then we apply the new index. Here we have two options:

  1. As per your example, we continue filling based on the last update, exactly as is
  2. Fill delta_qty with 0 and the remaining columns per the last update (this deviates from your request, but seemed logical and is only a minor change)

In either case, the two essential concepts are the .reindex method and the .fillna method. We use reindex to expand the dense DataFrame to include all the dates, leaving the new rows sparse, and then fill the NaNs with the proper data. Since we're forward-filling from the last update, we specify method='ffill', per the fillna docs.

Method 1:

# this fills the rows per last update
results = df.set_index(['pid', 'date'])\
    .reindex(new_index).reset_index()
results.fillna(method='ffill', inplace=True)

This returns

    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        3.0            3.0

for pid == 'A'

Method 2:

results = df.set_index(['pid', 'date'])\
    .reindex(new_index).reset_index()
results['delta_qty'].fillna(0, inplace=True)
results.fillna(method='ffill', inplace=True)

This returns:

    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        0.0            3.0
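
As a side note, not part of the original answer: recent pandas releases deprecate fillna(method='ffill') in favour of the dedicated DataFrame.ffill method, so Method 1 can equivalently be written as the sketch below.

# Same result as Method 1, without the deprecated fillna(method=...) form.
# The pid and date columns contain no NaN, so ffill only touches the
# timestamp, delta_qty and resulting_qty columns.
results = (
    df.set_index(['pid', 'date'])
      .reindex(new_index)
      .reset_index()
      .ffill()
)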
Answered 15/5/2019 at 0:48

Comment: Thank you so much for your StringIO code! I've been looking for a way to easily put data people post here into a dataframe a lot faster. This will save me a ton of time :D
