Rust Polars: iterate over ChunkedArray<StructType>

I was following this answer, but it didn't work for me on my version (polars = { version = "0.42.0", features = ["dtype-struct", "lazy", "polars-io"] }). I see that in the new version into_struct returns ChunkedArray<StructType> instead of StructChunked. It's surprising that iter in the new version yields Option<()>, which seems useless. Does this mean it's impossible to iterate over a ChunkedArray<StructType>? Or is there a different way to do it? Also, if you know the motivation behind this change, I'd be glad to learn about it.

Chantalchantalle answered 9/9, 2024 at 14:47 Comment(4)
pub type StructChunked = ChunkedArray<StructType>; - Suspect
@Suspect thanks, that makes sense. Do you know how to iterate over its values in the new version? - Chantalchantalle
struct_fields to get the fields, fields_as_series to get the data - Suspect
@TimofeyChernikov I have updated my answer. - Temperature
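As the comments point out, a struct column's data lives in one child column per field (which fields_as_series hands back), so "iterating the struct" amounts to walking those child columns in lockstep. The sketch below is a plain-Rust model of that idea, not the Polars API: the StructColumn type and its rows method are hypothetical, and it assumes an Arrow-style layout in which an outer validity mask marks whole struct values as null, separately from nulls inside individual fields.

```rust
/// Hypothetical model of a two-field struct column, stored Arrow-style:
/// one child vector per field plus an outer validity mask.
/// This illustrates the layout; it is NOT the Polars API.
struct StructColumn {
    validity: Vec<bool>,       // false = the whole struct value is null
    id: Vec<Option<u32>>,      // child column for field "id"
    name: Vec<Option<String>>, // child column for field "name"
}

impl StructColumn {
    /// Iterate row by row by zipping the child columns together.
    /// Outer `None` = null struct value; inner `None` = null field.
    fn rows(&self) -> impl Iterator<Item = Option<(Option<u32>, Option<&str>)>> + '_ {
        self.validity
            .iter()
            .zip(self.id.iter())
            .zip(self.name.iter())
            .map(|((&valid, id), name)| {
                if valid {
                    Some((*id, name.as_deref()))
                } else {
                    None
                }
            })
    }
}

fn main() {
    let col = StructColumn {
        validity: vec![true, false, true],
        id: vec![Some(1), Some(2), None],
        name: vec![
            Some("John".to_string()),
            Some("Jane".to_string()),
            Some("Bobby".to_string()),
        ],
    };
    // Prints one line per row: Some((Some(1), Some("John"))), None, Some((None, Some("Bobby")))
    for row in col.rows() {
        println!("{:?}", row);
    }
}
```

The point of the model is that there is no per-row struct object to hand out cheaply; a row only exists once you pull one value from each child column.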

    [package]
    name = "pol"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    polars = {version="0.43.0",features=["mode","polars-io","csv","polars-ops","lazy","docs-selection","streaming","regex","temporal","is_unique","is_between","dtype-date","dtype-datetime","dtype-time","dtype-duration","dtype-categorical","rows","is_in","pivot"]}
    polars-io = "0.43.0"
    polars-lazy = "0.43.0"

You can use .downcast_iter(), but then you have to work with the underlying chunks yourself, which is very complicated. The essence of the problem is that Struct is not meant for row iteration over a DataFrame. Struct is used to pack the complex results of custom functions into a single Series in complex aggregations [1]. Row iteration in Polars carries a large efficiency cost because it involves many type conversions: df.get_row(idx) can be used for row-wise work, but it is slow.

In fact, the Series of a DataFrame can be taken out and iterated directly, walking them in lockstep with itertools::multizip. This is much more efficient than Polars' built-in df.get_row. Add itertools to your Cargo.toml:

    [dependencies]
    itertools = "0.13.0"

The row-wise iterator code:

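    use itertools::multizip;
    use polars::prelude::*;

    #[derive(Debug)]
    pub struct Person {
        id: u32,
        name: String,
        age: u32,
    }

    fn main() -> PolarsResult<()> {
        let df = df!(
            "id" => &[1u32, 2, 3],
            "name" => &["John", "Jane", "Bobby"],
            "age" => &[32u32, 28, 45]
        )?;

        // Take ownership of the DataFrame's columns.
        let columns = df.take_columns();

        // Downcast each column to its typed ChunkedArray and get an
        // iterator of Option<T> (None marks a null value).
        let ids = columns[0].u32()?.iter();
        let names = columns[1].str()?.iter();
        let ages = columns[2].u32()?.iter();

        // Walk the three column iterators in lockstep, one row at a time.
        let people: Vec<Person> = multizip((ids, names, ages))
            .map(|(id, name, age)| Person {
                // unwrap() panics if a column contains nulls.
                id: id.unwrap(),
                name: name.unwrap().to_owned(),
                age: age.unwrap(),
            })
            .collect();

        println!("{:?}", people);
        Ok(())
    }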
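If you would rather not add itertools, nested std Iterator::zip does the same lockstep walk for a fixed number of columns. The sketch below also skips rows that contain a null instead of calling unwrap(), which panics. It is written against plain iterators of Option<T>, the item type that ChunkedArray::iter() yields, so it runs without Polars; the helper name rows_to_people and the Vec stand-ins for the columns are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct Person {
    id: u32,
    name: String,
    age: u32,
}

/// Hypothetical helper: zips three column iterators (each yielding Option<T>)
/// into rows, dropping any row in which a field is null.
fn rows_to_people<'a>(
    ids: impl Iterator<Item = Option<u32>>,
    names: impl Iterator<Item = Option<&'a str>>,
    ages: impl Iterator<Item = Option<u32>>,
) -> Vec<Person> {
    ids.zip(names)
        .zip(ages)
        // Nested zip yields ((id, name), age); `?` returns None for a null field,
        // and filter_map then discards that row.
        .filter_map(|((id, name), age)| {
            Some(Person {
                id: id?,
                name: name?.to_owned(),
                age: age?,
            })
        })
        .collect()
}

fn main() {
    // Stand-ins for three columns; the second row has a null name.
    let ids = vec![Some(1u32), Some(2), Some(3)];
    let names = vec![Some("John"), None, Some("Bobby")];
    let ages = vec![Some(32u32), Some(28), Some(45)];

    let people = rows_to_people(ids.into_iter(), names.into_iter(), ages.into_iter());
    println!("{:?}", people);
}
```

With Polars you should be able to pass columns[0].u32()?.iter() and the other column iterators in place of the vectors' into_iter() calls. Whether to skip, default, or reject null rows is a per-application choice; filter_map with `?` is just the shortest way to express "skip".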
Azriel answered 30/9, 2024 at 18:2 Comment(0)