Rust Polars: Is it possible to explode a list column into multiple columns?

I have a function which returns a list type column. Hence, one of my columns is a list. I'd like to turn this list column into multiple columns. For example:

use polars::prelude::*;
use polars::df;

fn main() {
    let s0 = Series::new("a", &[1i64, 2, 3]);
    let s1 = Series::new("b", &[1i64, 1, 1]);
    let s2 = Series::new("c", &[Some(2i64), None, None]);
    // construct a new ListChunked for a slice of Series.
    let list = Series::new("foo", &[s0, s1, s2]);

    // construct a few more Series.
    let s0 = Series::new("Group", ["A", "B", "A"]);
    let s1 = Series::new("Cost", [1, 1, 1]);
    let df = DataFrame::new(vec![s0, s1, list]).unwrap();

    dbg!(df);
}

At this stage, df looks like this:

┌───────┬──────┬─────────────────┐
│ Group ┆ Cost ┆ foo             │
│ ---   ┆ ---  ┆ ---             │
│ str   ┆ i32  ┆ list [i64]      │
╞═══════╪══════╪═════════════════╡
│ A     ┆ 1    ┆ [1, 2, 3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ [1, 1, 1]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ [2, null, null] │
└───────┴──────┴─────────────────┘

Question: From here, I'd like to get:

┌───────┬──────┬─────┬──────┬──────┐
│ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
│ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
│ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
╞═══════╪══════╪═════╪══════╪══════╡
│ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ 2   ┆ null ┆ null │
└───────┴──────┴─────┴──────┴──────┘

So I need something like .explode(), but column-wise. Is there an existing function for this, or potentially a workaround?

Many thanks

Winterkill answered 3/6, 2022 at 13:24 Comment(0)

Yes, you can. Via polars lazy, we get access to the expression API, and we can use the list() namespace to get elements by index.

let out = df
    .lazy()
    .select([
        all().exclude(["foo"]),
        col("foo").list().get(0).alias("a"),
        col("foo").list().get(1).alias("b"),
        col("foo").list().get(2).alias("c"),
    ])
    .collect()?;
dbg!(out);
┌───────┬──────┬─────┬──────┬──────┐
│ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
│ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
│ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
╞═══════╪══════╪═════╪══════╪══════╡
│ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
│ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
│ A     ┆ 1    ┆ 2   ┆ null ┆ null │
└───────┴──────┴─────┴──────┴──────┘

Subsistent answered 3/6, 2022 at 15:9 Comment(5)
Hey @ritchie, thank you for this great library. I noticed that if my s2 was shorter: let s2 = Series::new("c", &[Some(2i64), None]) then row 3 column C would be "2". What happens if a list len is smaller than the argument of .get()? it seems it iterates over the list once, then once it returns first element of the list, and then returns nulls. So in the above example if we were to expand with many arr().get(x) we'd get 2, null, 2, null, null, null.... Is that expected? – Winterkill
I believe that is a bug we already fixed in master. I will release a new version to crates.io this week. You can use master until then. – Subsistent
I am getting an error 'ExprListNameSpace' object is not callable when using .agg() on py-polars. Switching to col("foo").arr.get(0).alias("a") fixes the issue. – Ripleigh
What if length of lists is variable? – Hyaluronidase
Looks like the arr() function got replaced by the list property/namespace. See pola-rs.github.io/polars-book/user-guide/expressions/lists/… – Propane
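
On the variable-length question raised in the comments above: the following is a plain-Rust model (standard library only, no polars) of what index-based extraction is expected to do. It assumes that an index past the end of a list yields null, which is what the desired output in the question implies; it is a sketch of the semantics, not of how polars implements them. The names ListColumn, get_at and explode_to_columns are made up for this illustration.

```rust
/// Plain-Rust stand-in for a list column: one Vec per row, items nullable.
/// (Hypothetical type alias for illustration; not a polars type.)
type ListColumn = Vec<Vec<Option<i64>>>;

/// Model of "get element `index` of every list":
/// rows whose list is too short contribute None (null).
fn get_at(rows: &ListColumn, index: usize) -> Vec<Option<i64>> {
    rows.iter()
        .map(|row| row.get(index).copied().flatten())
        .collect()
}

/// Split the list column into as many columns as the longest list has items.
fn explode_to_columns(rows: &ListColumn) -> Vec<Vec<Option<i64>>> {
    let width = rows.iter().map(Vec::len).max().unwrap_or(0);
    (0..width).map(|i| get_at(rows, i)).collect()
}

fn main() {
    // Same data as in the question, except the last list has only two items.
    let foo: ListColumn = vec![
        vec![Some(1), Some(2), Some(3)],
        vec![Some(1), Some(1), Some(1)],
        vec![Some(2), None],
    ];

    for (i, column) in explode_to_columns(&foo).iter().enumerate() {
        println!("column {i}: {column:?}");
    }
}
```

With variable-length lists, the number of output columns has to be chosen up front, for example as the length of the longest list, as done here. Shorter rows are then padded with nulls rather than wrapping around.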

This code was tested with Rust v1.67 and polars v0.27.2.

    let out = df
        .lazy()
        .with_columns([
            col("foo").arr().get(lit(0)).alias("a"),
            col("foo").arr().get(lit(1)).alias("b"),
            col("foo").arr().get(lit(2)).alias("c"),
        ])
        .drop_columns(["foo"])
        .collect()?;

    println!("out:\n{out}");

Another way, using a for loop:

    let mut lazyframe = df.lazy();
    // candidate column names: 'a', 'b', 'c', ...
    let column_name: Vec<char> = ('a'..='z').collect();

    // only the first three names are used, one per list position
    for (index, ch) in column_name.iter().enumerate().take(3) {
        lazyframe = lazyframe
            .with_columns([
                // split list into new columns
                col("foo").arr().get(lit(index as i64)).alias(&ch.to_string()),
            ]);
    }

    let out = lazyframe
        .drop_columns(["foo"])
        .collect()?;

    println!("out:\n{out}");
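
Note that taking names from 'a'..='z' caps the loop at 26 columns. If more are needed, one option is to generate indexed names instead. Below is a minimal standard-library sketch; the function indexed_names and the "prefix_index" naming scheme are assumptions made for illustration, not a polars convention.

```rust
/// Build `count` column names of the form "<prefix>_<index>".
/// (Hypothetical helper; the naming scheme is only an example.)
fn indexed_names(prefix: &str, count: usize) -> Vec<String> {
    (0..count).map(|i| format!("{prefix}_{i}")).collect()
}

fn main() {
    // Three names for the three list positions used in the loop above.
    let names = indexed_names("foo", 3);
    println!("{names:?}");
}
```

Each generated name would then be passed to .alias(...) together with its index, in the same way the loop above pairs index with ch.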
Pridgen answered 17/2, 2023 at 1:26 Comment(2)
for the python polars .arr was renamed to .list --> pola-rs.github.io/polars/py-polars/html/reference/expressions/… – Blacking
Renamed to .list() in rust too – Nazario