As a minimum example, let's say we have next polars.DataFrame:
df = pl.DataFrame({"sub_id": [1,2,3], "engagement": ["one:one,two:two", "one:two,two:one", "one:one"], "total_duration": [123, 456, 789]})
sub_id | engagement | total_duration |
---|---|---|
1 | one:one,two:two | 123 |
2 | one:two,two:one | 456 |
3 | one:one | 789 |
then, we explode "engagement" column
df = df.with_columns(pl.col("engagement").str.split(",")).explode("engagement")
and receive:
sub_id | engagement | total_duration |
---|---|---|
1 | one:one | 123 |
1 | two:two | 123 |
2 | one:two | 456 |
2 | two:one | 456 |
3 | one:one | 789 |
For visualization I use Plotly, and code would be following:
import plotly.express as px
fig = px.bar(df, x="sub_id", y="total_duration", color="engagement")
fig.show()
Resulting plot:
Now it basically means that subscribers 1 and 2 have their total_duration (total watched time) doubled. How could I remain total_duration per sub, but leaving engagement groups as shown on the plot legend?