If you activate the rows
feature in polars, you can try:
DataFrame::get_row
and DataFrame::get_row_amortized
.
The latter is preferred, as that reduces heap allocations by reusing the row buffer.
Anti-pattern
This will be slow. Asking for rows from a columnar data storage will incur many cache misses and goes trough several layers of indirection.
Slightly better
What would be slightly better is using rust iterators. This will have less indirection than the get_row
methods.
df.as_single_chunk_par();
let mut iters = df.columns(["foo", "bar", "ham"])?
.iter().map(|s| s.iter()).collect::<Vec<_>>();
for row in 0..df.height() {
for iter in &mut iters {
let value = iter.next().expect("should have as many iterations as rows");
// process value
}
}
If your DataFrame
consists of a single data type, you should downcast the Series
to a ChunkedArray
, this will speed up iteration.
In the snippet below, we'll assume the data type is Float64
.
let mut iters = df.columns(["foo", "bar", "ham"])?
.iter().map(|s| Ok(s.f64()?.into_iter())).collect::<Result<Vec<_>>>()?;
for row in 0..df.height() {
for iter in &mut iters {
let value = iter.next().expect("should have as many iterations as rows");
// process value
}
}