How to get a Vec of Structs out of dataframe?
Suppose I've got

use polars::prelude::*;
pub struct Person {
    id: u32,
    name: String,
    age: u32,
}


let df = df!(
    "id" => &[1,2,3],
    "name" => &["John", "Jane", "Bobby"],
    "age" => &[32, 28, 45]
).unwrap();

How can I get a Vec of structs out in place of this fixed definition of rows?

let rows = vec![
    Person { id: 1, name: "John".to_string(), age: 32 },
    Person { id: 2, name: "Jane".to_string(), age: 28 },
    Person { id: 3, name: "Bobby".to_string(), age: 45 },
];

In case anyone's wondering why I ask, I'm trying to use leptos-struct-table with polars built in wasm.

Kessler answered 20/2 at 20:4 Comment(0)
K
1

I split the df into its constituent Series, zipped them together, and then mapped each tuple into a Person:

let rows: Vec<Person> =
    df.column("id").unwrap().i32().unwrap().into_iter()
    .zip(df.column("name").unwrap().str().unwrap().into_iter())
    .zip(df.column("age").unwrap().i32().unwrap().into_iter())
    .map(|((id, name), age)| {
        Person {
            // df! infers i32 for plain integer literals, so cast to the u32 fields
            id: id.unwrap() as u32,
            name: name.unwrap().to_string(),
            age: age.unwrap() as u32,
        }
    })
    .collect();
Kessler answered 22/2 at 19:10 Comment(2)
Note that you can zip as many iters together as you like using izip!(). For style reasons, you may prefer to separate the columns into their own variables and then izip them together. - Milurd
You don't have to call into_iter() on the arguments to zip. - Goldshlag
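As a rough, standard-library-only sketch of the zip-and-map idea above: the columns are modelled here as plain collections of `Option` values (a stand-in for the optional items a polars column iterator yields), and `rows_from_columns` is a hypothetical helper name, not a polars API. It shows `zip` accepting its arguments without `.into_iter()`, and returns `None` on a null or negative value instead of panicking on `unwrap()`.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub struct Person {
    pub id: u32,
    pub name: String,
    pub age: u32,
}

/// Zip three column-like sources into rows (hypothetical helper).
/// Every cell is an `Option` because a column value may be null; a null
/// or a negative number makes the whole conversion return `None`.
pub fn rows_from_columns<'a>(
    ids: impl IntoIterator<Item = Option<i32>>,
    names: impl IntoIterator<Item = Option<&'a str>>,
    ages: impl IntoIterator<Item = Option<i32>>,
) -> Option<Vec<Person>> {
    ids.into_iter()
        // `zip` accepts any `IntoIterator`, so no `.into_iter()` is needed here.
        .zip(names)
        .zip(ages)
        .map(|((id, name), age)| {
            Some(Person {
                // Checked i32 -> u32 conversion instead of an `as` cast.
                id: u32::try_from(id?).ok()?,
                name: name?.to_string(),
                age: u32::try_from(age?).ok()?,
            })
        })
        // Collecting into `Option<Vec<_>>` stops at the first `None`.
        .collect()
}
```

With a real DataFrame, the three arguments would presumably be the typed column iterators from the answer above; that wiring is left out because it depends on the polars version.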
O
4

You can also use df.into_struct. Note that this method is gated behind the dtype-struct feature flag:

polars = { version = "0.38.1", features = ["dtype-struct"] }

Here is what the Rust code looks like:

use polars::prelude::*;

#[derive(Debug)]
pub struct Person {
    id: u32,
    name: String,
    age: u32,
}

fn main() {
    let df = df!(
        "id" => &[1u32,2,3],
        "name" => &["John", "Jane", "Bobby"],
        "age" => &[32u32, 28, 45]
    )
    .unwrap();
    let foo: Vec<Person> = df
        .into_struct("StructChunked")
        .iter()
        .map(|row| Person {
            id: row[0].try_extract().unwrap(),
            name: row[1].get_str().unwrap().to_string(),
            age: row[2].try_extract().unwrap(),
        })
        .collect();
    println!("{:?}", foo);
}

This produces the following output:

[Person { id: 1, name: "John", age: 32 }, Person { id: 2, name: "Jane", age: 28 }, Person { id: 3, name: "Bobby", age: 45 }]

EDIT:

The code above no longer seems to work with the latest version of rust-polars (possibly because of breaking changes). Here is updated code that works with the latest version:

use polars::prelude::*;

#[derive(Debug)]
pub struct Person {
    id: u32,
    name: String,
    age: u32,
}

fn main() {
    let df = df!(
        "id" => &[1u32,2,3],
        "name" => &["John", "Jane", "Bobby"],
        "age" => &[32u32, 28, 45]
    )
    .unwrap();
    let result: Vec<Person> = df
        .into_struct("struct")
        .into_series()
        .iter()
        .map(|row: AnyValue<'_>| {
            let row_values: Vec<_> = row._iter_struct_av().collect();
            Person {
                id: row_values[0].try_extract().unwrap(),
                name: row_values[1].get_str().unwrap().to_string(),
                age: row_values[2].try_extract().unwrap(),
            }
        })
        .collect();
    println!("{:?}", result);
}
Oleaceous answered 11/3 at 12:53 Comment(1)
Is there a way to reference the columns by name rather than indices when creating the Person struct? - Laxative
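One possible way to approach the by-name question, sketched with the standard library only: build a name-to-position map once, then read each row through it. `Cell`, `column_positions` and `person_from_row` are hypothetical names for illustration; `Cell` is a simplified stand-in for a dynamically typed value such as polars' `AnyValue`, and with a real DataFrame the names would presumably come from something like `get_column_names()` (an assumption, not verified against a specific polars version).

```rust
use std::collections::HashMap;

/// Simplified stand-in for a dynamically typed cell (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    UInt(u32),
    Text(String),
}

#[derive(Debug, PartialEq)]
pub struct Person {
    pub id: u32,
    pub name: String,
    pub age: u32,
}

/// Build a name -> position map once, so rows can be read by field name.
pub fn column_positions(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(i, n)| (n.to_string(), i))
        .collect()
}

/// Read one row by column name; returns `None` if a column is missing,
/// the row is too short, or a cell has the wrong type.
pub fn person_from_row(pos: &HashMap<String, usize>, row: &[Cell]) -> Option<Person> {
    let uint = |name: &str| -> Option<u32> {
        match row.get(*pos.get(name)?)? {
            Cell::UInt(v) => Some(*v),
            _ => None,
        }
    };
    let text = |name: &str| -> Option<String> {
        match row.get(*pos.get(name)?)? {
            Cell::Text(s) => Some(s.clone()),
            _ => None,
        }
    };
    Some(Person {
        id: uint("id")?,
        name: text("name")?,
        age: uint("age")?,
    })
}
```

Because the lookup goes through the map, the struct is filled correctly even when the columns arrive in a different order than the struct fields.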
A
0

If you're OK with going through JSON (perhaps a tad expensive), it's pretty simple.

use polars::prelude::*;
use serde::Deserialize;
use std::error::Error;

#[derive(Deserialize, Debug)]
pub struct Person {
    id: u32,
    name: String,
    age: u32,
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut df = df!(
        "id" => &[1,2,3],
        "name" => &["John", "Jane", "Bobby"],
        "age" => &[32, 28, 45]
    )
    .unwrap();

    let mut json = Vec::<u8>::new();
    JsonWriter::new(&mut json)
        .with_json_format(JsonFormat::Json)
        .finish(&mut df)?;
    let rows = serde_json::from_slice::<Vec<Person>>(&json)?;
    println!("{rows:?}");
    Ok(())
}
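
For reference, a dependency fragment that should be enough to build the snippet above. The polars version is simply the one quoted in another answer on this page and is an assumption here; as far as I know, `JsonWriter` requires the `json` feature of polars, and the `Deserialize` derive requires the `derive` feature of serde.

```toml
[dependencies]
# Version is an assumption; adjust to the release you actually use.
polars = { version = "0.38.1", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
```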
Albarran answered 20/2 at 20:47 Comment(1)
Thanks for the answer, but I don't want to serde it. - Kessler
S
0

The essence of this problem is that Struct is not meant for row iteration over a DataFrame. Struct is used to package the complex results of custom functions into a single Series during complex aggregations.

For usage of Struct, see the section "For Lazyframe" in:

Rust Row iterator in Polars

Shawna answered 1/10 at 11:32 Comment(0)
