Replace specific values in a Julia DataFrame column with random values

I'm looking for a way to replace specific values in a DataFrame column with random numbers. Each replaced row should get its own independently drawn value.

For example, replacing "X" with random numbers drawn from the range 100:120:

julia> df = DataFrame(:a=>[1,2,"X","X",5,"X"],)
6×1 DataFrame
│ Row │ a   │
│     │ Any │
├─────┼─────┤
│ 1   │ 1   │
│ 2   │ 2   │
│ 3   │ X   │
│ 4   │ X   │
│ 5   │ 5   │
│ 6   │ X   │

* Replacing X with random values in 100:120 *

julia> df
6×1 DataFrame
│ Row │ a   │
│     │ Any │
├─────┼─────┤
│ 1   │ 1   │
│ 2   │ 2   │
│ 3   │ 103 │
│ 4   │ 110 │
│ 5   │ 5   │
│ 6   │ 116 │

I've tried using replace!, but rand is evaluated once, before replace! runs, so every "X" gets the same value:

julia> replace!(df.a,"X"=> rand(100:120))
julia> df
6×1 DataFrame
│ Row │ a   │
│     │ Any │
├─────┼─────┤
│ 1   │ 1   │
│ 2   │ 2   │
│ 3   │ 115 │
│ 4   │ 115 │
│ 5   │ 5   │
│ 6   │ 115 │
Greenheart asked 7/12, 2022 at 13:37
Comment: If you can arrange to have missing appear instead of "X", it would be more Julian. Then Przemyslaw's one-liner would be: replace!(x -> @coalesce(x, rand(100:120)), df.a), which is more readable. - Keyser
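
The missing-based variant from the comment can be sketched as follows. This is a minimal illustration, not the commenter's full code: it uses a plain vector named a as a stand-in for df.a (an assumption, so that it runs without DataFrames), and it assumes Julia 1.7 or later, where the @coalesce macro is available.

```julia
# Stand-in for df.a, with `missing` marking the entries to fill (assumed layout)
a = Union{Missing, Int}[1, 2, missing, missing, 5, missing]

# The anonymous function runs once per element. @coalesce short-circuits,
# so rand(100:120) is only called for elements that are `missing`.
replace!(x -> @coalesce(x, rand(100:120)), a)

println(a)
```

The element type stays Union{Missing, Int} after the call; converting with Vector{Int}(a) is one way to narrow it once no missing values remain.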

A one-liner could be:

replace!(x -> x == "X" ? rand(100:120) : x, df.a)
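
To see why the function form behaves differently from the pair form tried in the question, here is a small sketch. It uses a plain Vector{Any} named a in place of df.a (an assumption, so that it runs without DataFrames) and the non-mutating replace so both variants start from the same data; the fixed seed is only there to make reruns repeatable.

```julia
using Random

Random.seed!(1)  # fixed seed only so that reruns give the same draws

# Stand-in for df.a with the same contents as in the question
a = Any[1, 2, "X", "X", 5, "X"]

# Pair form: rand(100:120) is evaluated once, before replace is called,
# so that single value is substituted for every "X".
eager = replace(a, "X" => rand(100:120))

# Function form: the anonymous function is called for each element,
# so rand(100:120) is drawn separately for each "X".
lazy = replace(x -> x == "X" ? rand(100:120) : x, a)

println(eager)
println(lazy)
```

Independent draws from a 21-element range can still coincide by chance, so two rows in the function-form result may occasionally show the same number.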
Celebrate answered 7/12, 2022 at 14:4

Alternatively, using the operation specification syntax:

transform!(df, :a => ByRow(a -> a == "X" ? rand(100:120) : a) => :a)

or you can do:

rand!(view(df.a, df.a .== "X"), 100:120)

for another in-place approach (rand! requires using Random, and replace! will likely be more efficient).
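
The masked in-place idea can be sketched on a plain vector as below. The names a, b and mask are hypothetical stand-ins (a plays the role of df.a so the sketch runs without DataFrames), and the broadcast assignment on b is an alternative technique added here for comparison, not part of the answer.

```julia
using Random  # rand! is exported by the Random standard library

# Stand-in for df.a with the same contents as in the question
a = Any[1, 2, "X", "X", 5, "X"]
b = copy(a)

# Boolean mask marking the positions to overwrite
mask = a .== "X"

# Fill only the masked positions in place, one independent draw per slot
rand!(view(a, mask), 100:120)

# Alternative: draw count(mask) values up front and assign them through the mask
b[mask] .= rand(100:120, count(mask))

println(a)
println(b)
```

Both variants mutate the vector without reallocating it; the mask is computed before any replacement, so it can be reused to inspect which rows were changed.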

Putdown answered 7/12, 2022 at 14:30
