How to provide reproducible Sample Data in Julia

When I provide sample data for a reproducible example here on stackoverflow.com, how can I do it the Julian way?

In R, for example, dput(df) outputs a string from which you can recreate df. You then just post the result here on stackoverflow and bam! - reproducible example. So, how should one do it in Julia?

Amateurish asked 11/5, 2020 at 6:11
O
10

I think the easiest thing to do generally is to simply construct an MWE DataFrame with random numbers etc. directly in your example, so there's no need to read or write anything out.
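
A minimal sketch of that approach, assuming DataFrames.jl is installed. The column names and values are made up for illustration. Writing the values out literally instead of calling rand means everyone who pastes the snippet gets exactly the same table:

```julia
using DataFrames

# Small, hand-written sample data (illustrative values, not from a real dataset).
df = DataFrame(a = [0.5, 0.25, 0.75], b = [9, 6, 3])

# Quick sanity check on the shape: 3 rows, 2 columns.
size(df)
```

If the example really needs random data, fixing a seed is a separate concern from copy/pasting a given table.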

In situations where that's inconvenient, you might consider writing out to an IO buffer and taking the string representation of that, which people can then read back in the same way in reverse:

julia> using CSV, DataFrames

julia> df = DataFrame(a = rand(5), b = rand(1:10, 5));

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> string_representation = String(take!(CSV.write(io, df)))
"a,b\n0.5613453808585873,9\n0.3308122459718885,6\n0.631520224612919,9\n0.3533712075535982,3\n0.35289980394398723,9\n"

julia> CSV.read(IOBuffer(string_representation), DataFrame)
5×2 DataFrame
│ Row │ a        │ b     │
│     │ Float64  │ Int64 │
├─────┼──────────┼───────┤
│ 1   │ 0.561345 │ 9     │
│ 2   │ 0.330812 │ 6     │
│ 3   │ 0.63152  │ 9     │
│ 4   │ 0.353371 │ 3     │
│ 5   │ 0.3529   │ 9     │
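
The round trip above could be wrapped in two small helpers. This is only a sketch: the names to_csv_string and from_csv_string are hypothetical, and it assumes a CSV.jl version in which CSV.read takes the sink type (here DataFrame) as its second argument:

```julia
using CSV, DataFrames

# Hypothetical helper: serialize a DataFrame to a CSV string that can be pasted into a post.
# CSV.write returns the IO it was given, so take! can be called on the result.
to_csv_string(df) = String(take!(CSV.write(IOBuffer(), df)))

# Hypothetical helper: rebuild a DataFrame from such a string.
from_csv_string(s) = CSV.read(IOBuffer(s), DataFrame)

df = DataFrame(a = [0.5, 0.25, 0.75], b = [9, 6, 3])
s = to_csv_string(df)
df2 = from_csv_string(s)

# The values survive the round trip for these simple numeric columns.
df2 == df
```

Only the string s needs to go into the post; readers call from_csv_string (or the CSV.read line directly) on it.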
Orose answered 11/5, 2020 at 7:14 Comment(5)
The following line is missing: using Random; Random.seed!(0). You need a fixed seed to ensure reproducibility. [Moreover, it is worth noting that the way random numbers are generated will change in Julia 1.5, and one recommended option is StableRNGs.jl.] - Nicolis
No, all you need is to read from the string_representation as shown. - Backwoods
I guess generating reproducible random numbers across computers and Julia versions is a slightly different, although related, question. I read this question as the simple case of "how do I copy/paste a given DataFrame into SO for a question?"; using rand in my answer was maybe a distraction... - Orose
I also read it as "how do I copy/paste a given DataFrame into SO for a question?". The other question would be "How to create reproducible random numbers in Julia". - Amateurish
Another note on this: you can't do CSV.write(io, df) followed by CSV.read(io, DataFrame). You need the IOBuffer to contain a String. You need the intermediate string representation to debug CSV. - Launder
J
0

Here is one way to mimic the behaviour of R's dput in Julia:

julia> using DataFrames

julia> using Random; Random.seed!(0);

julia> df = DataFrame(a = rand(3), b = rand(1:10, 3))
3×2 DataFrame
 Row │ a          b
     │ Float64    Int64
─────┼──────────────────
   1 │ 0.405699       1
   2 │ 0.0685458      7
   3 │ 0.862141       2

julia> julian_dput(x) = invoke(show, Tuple{typeof(stdout), Any}, stdout, x);

julia> julian_dput(df)
DataFrame(AbstractVector[[0.4056994708920292, 0.06854582438651502, 0.8621408571954849], [1, 7, 2]], DataFrames.Index(Dict(:a => 1, :b => 2), [:a, :b]))

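That is, julian_dput() takes a DataFrame as input and prints a constructor expression from which the input can be recreated. Note that the expression refers to DataFrames.Index, which is not exported, so it may be tied to the installed DataFrames version.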

Source: https://discourse.julialang.org/t/given-an-object-return-julia-code-that-defines-the-object/80579/12
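
As an alternative sketch that sticks to the keyword-argument constructor, one could build the code string from the column names and repr of each column. The helper name dput_string is hypothetical, and this assumes the column names are valid Julia identifiers and that repr of each column is valid Julia code (true for plain numeric vectors, not necessarily for every element type):

```julia
using DataFrames

# Hypothetical helper: return Julia code (as a String) that rebuilds `df`
# through the keyword-argument constructor DataFrame(name = values, ...).
function dput_string(df::AbstractDataFrame)
    cols = ["$(name) = $(repr(df[!, name]))" for name in names(df)]
    return "DataFrame(" * join(cols, ", ") * ")"
end

df = DataFrame(a = [0.5, 0.25], b = [1, 7])
code = dput_string(df)
println(code)

# Evaluating the generated code gives back an equal DataFrame.
df2 = eval(Meta.parse(code))
df2 == df
```

Unlike the invoke(show, ...) version, this returns a String rather than printing, so it can be copied or post-processed before pasting.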

Jacquie answered 12/8, 2022 at 16:41
