Julia: How to create a new column in DataFrames.jl by adding two columns using `transform` or `@transform`?

using DataFrames

df = DataFrame(a=1:3, b=1:3)

How do I create a new column c such that c = a + b element-wise?

I can't figure it out from reading the transform documentation.

I know that

df[!, :c] = df.a .+ df.b

works, but I want to use transform in a chain like this:

@chain df begin
  @transform(c = :a .+ :b)
  @where(...)
  groupby(...)
end

The above syntax doesn't work with DataFramesMeta.jl.

Caliph answered 26/4, 2021 at 6:21 Comment(1)
Great question. In my work I try to organize transformations into compact pipelines like this, and I came here to learn this very thing. - Unwary

This is an answer using DataFrames.jl.

To create a new data frame:

julia> transform(df, [:a,:b] => (+) => :c)
3×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      2      4
   3 │     3      3      6
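
A note on why this works: in `[:a,:b] => (+) => :c` the function receives the two columns as whole vectors, and `+` happens to be defined for vectors of equal length. For a function that only makes sense on scalars, DataFrames.jl provides the `ByRow` wrapper. A minimal sketch, where the extra column name `m` is purely illustrative:

```julia
using DataFrames

df = DataFrame(a=1:3, b=1:3)

# ByRow applies the function to each row's pair of values
# instead of passing the whole column vectors at once.
out = transform(df,
    [:a, :b] => ByRow(+) => :c,
    [:a, :b] => ByRow(max) => :m)
```

For `+` both forms give the same column; `ByRow` matters for functions such as `max` that behave differently on vectors than on scalars.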

and for an in-place operation:

julia> transform!(df, [:a,:b] => (+) => :c)
3×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      2      4
   3 │     3      3      6

or

julia> insertcols!(df, :c => df.a + df.b)
3×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      2      4
   3 │     3      3      6

The difference between transform! and insertcols! is that insertcols! throws an error if a column named :c is already present in the data frame, while transform! overwrites it.
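
The overwrite-versus-error behaviour can be checked directly. This is a small sketch; the variable `insert_failed` is only a helper for the demonstration:

```julia
using DataFrames

df = DataFrame(a=1:3, b=1:3)

transform!(df, [:a, :b] => (+) => :c)   # adds column c
transform!(df, [:a, :b] => (+) => :c)   # c already exists: overwritten without complaint

# insertcols! refuses to add a column whose name is already taken
insert_failed = try
    insertcols!(df, :c => df.a + df.b)
    false
catch
    true
end
```

After running this, `insert_failed` is `true` and `df` still has exactly three columns.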

Penetralia answered 26/4, 2021 at 7:21 Comment(2)
Do you know how I might do this if the operation I need to do is complicated? For example, suppose I want to create a new column where c[1] = a[1], c[2] = a[1] + a[2], c[3] = a[1] + a[2] + a[3], etc. I can calculate the elements easily enough with a for loop, but I'm not sure how to do it with transform and insertcols. - Stulin
Use transform(df, :a => cumsum => :c). - Deservedly
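
The cumulative-sum suggestion works because the function on the right of `=>` receives the whole column, so any vector-to-vector function can be used. A short sketch on the question's data frame:

```julia
using DataFrames

df = DataFrame(a=1:3, b=1:3)

# cumsum receives the entire column a and returns its running total
out = transform(df, :a => cumsum => :c)
```

Here `out.c` holds the running totals of `a`, and `df` itself is left unchanged because `transform` returns a new data frame.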

The syntax should be:

@chain df begin
  @transform!(@byrow :c = :a + :b)
  ...
end;
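
For reference, a fuller sketch of such a chain, assuming a recent DataFramesMeta.jl release in which new columns are written with a leading colon (`:c = ...`), `@where` has been renamed to `@subset`, and `@chain` is re-exported from Chain.jl. The filter condition `:c .> 2` is a made-up placeholder, not something from the question:

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a=1:3, b=1:3)

result = @chain df begin
    @transform(:c = :a .+ :b)   # whole-column expression, so broadcast with .+
    @subset(:c .> 2)            # hypothetical filter: keep rows where c exceeds 2
end
```

Without `@byrow` the expression sees entire columns, which is why the dot in `.+` is needed; with `@byrow` the plain `+` operates on one row at a time. The non-mutating `@transform` leaves `df` untouched, whereas `@transform!` adds the column in place.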
Ideogram answered 23/2, 2023 at 23:56 Comment(0)
