How to custom sort rows in Polars

3

8

How do I sort rows in a specific order?

df = pl.DataFrame({"currency": ["EUR","EUR","EUR","USD","USD","USD"], "alphabet": ["A","B","C","A","B","C"]})

I need to sort by currency in descending order and by alphabet in a custom order (C, A, B).

The expected result looks like this:

currency alphabet
USD C
USD A
USD B
EUR C
EUR A
EUR B
Gehenna answered 7/3, 2023 at 4:2 Comment(0)
6

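One option is to define your own ordering for pl.Categorical data using pl.StringCache. The code below seeds the cache with the values in the desired order, casts the string columns to Categorical, and sorts by their physical (integer) representation. Note that this relies on how the cache assigns codes, which may differ between Polars versions.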

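import polars as pl

df = pl.DataFrame({
    "currency": ["EUR", "EUR", "EUR", "USD", "USD", "USD", "USD"],
    "alphabet": ["A", "B", "C", "A", "B", "C", "A"],
})

with pl.StringCache():
    # Seed the cache in the desired order: "C", "A", "B", then "USD", "EUR".
    # The resulting Series is discarded; only the cache state matters.
    currency = sorted(["EUR", "USD"], reverse=True)
    pl.Series(["C", "A", "B", *currency]).cast(pl.Categorical)

    # Cast the string columns to Categorical, then sort by their physical codes.
    df = df.with_columns(
        pl.col(pl.Utf8).cast(pl.Categorical),
    ).sort(
        pl.col(pl.Categorical).to_physical()
    )

    print(df)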
┌──────────┬──────────┐
│ currency ┆ alphabet │
│ ---      ┆ ---      │
│ cat      ┆ cat      │
╞══════════╪══════════╡
│ USD      ┆ C        │
│ USD      ┆ A        │
│ USD      ┆ A        │
│ USD      ┆ B        │
│ EUR      ┆ C        │
│ EUR      ┆ A        │
│ EUR      ┆ B        │
└──────────┴──────────┘
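
To see why seeding the cache gives the desired order, here is a minimal pure-Python sketch of the "first seen gets the next code" idea. It is not Polars' actual implementation; the encode helper and the cache dict are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for a string cache: each new string gets the next free code.
def encode(values, cache):
    for value in values:
        cache.setdefault(value, len(cache))
    return [cache[value] for value in values]

cache = {}
# Seed the cache in the desired order, as the answer does with a throwaway Series.
encode(["C", "A", "B", "USD", "EUR"], cache)

rows = [
    ("EUR", "A"), ("EUR", "B"), ("EUR", "C"),
    ("USD", "A"), ("USD", "B"), ("USD", "C"),
]
# Sorting by the codes of (currency, alphabet) reproduces the custom order.
sorted_rows = sorted(rows, key=lambda row: (cache[row[0]], cache[row[1]]))
print(sorted_rows)
```

Because "USD" was seen before "EUR" and "C" before "A" and "B", the codes alone encode both the descending currency order and the custom alphabet order.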
Garibay answered 7/3, 2023 at 6:45 Comment(0)
5

Create a Polars expression that maps the "alphabet" values to integers reflecting the desired order, using Expr.replace_strict. Then use DataFrame.sort to sort the rows first by "currency" in descending order and second by that expression in ascending order.

import polars as pl

df = pl.DataFrame({
    "currency": ["EUR", "EUR", "EUR", "USD", "USD", "USD"],
    "alphabet": ["A", "B", "C", "A", "B", "C"],
})

# Map each value to its position in the desired order: C -> 0, A -> 1, B -> 2.
abc_order = {val: idx for idx, val in enumerate(["C", "A", "B"])}

res = df.sort(pl.col("currency"),
              pl.col("alphabet").replace_strict(abc_order),
              descending=[True, False])

Output:

>>> res

shape: (6, 2)
┌──────────┬──────────┐
│ currency ┆ alphabet │
│ ---      ┆ ---      │
│ str      ┆ str      │
╞══════════╪══════════╡
│ USD      ┆ C        │
│ USD      ┆ A        │
│ USD      ┆ B        │
│ EUR      ┆ C        │
│ EUR      ┆ A        │
│ EUR      ┆ B        │
└──────────┴──────────┘
Heliometer answered 19/3, 2023 at 22:48 Comment(0)
0

Polars has since added the pl.Enum data type, which simplifies this:

df.sort(
    "currency",
    pl.col("alphabet").cast(pl.Enum(["C", "A", "B"])),
    descending=[True, False],
)
shape: (6, 2)
┌──────────┬──────────┐
│ currency ┆ alphabet │
│ ---      ┆ ---      │
│ str      ┆ str      │
╞══════════╪══════════╡
│ USD      ┆ C        │
│ USD      ┆ A        │
│ USD      ┆ B        │
│ EUR      ┆ C        │
│ EUR      ┆ A        │
│ EUR      ┆ B        │
└──────────┴──────────┘
Thirtythree answered 9/7 at 13:24 Comment(0)
