Transpose of Julia DataFrame
Let's create a Julia DataFrame:

df=convert(DataFrame, rand(10, 4))

I am trying to take the transpose of this DataFrame. The "transpose" function does not appear to work on a Julia DataFrame.

I have used the Python pandas DataFrame package extensively in the past. In Python, it would be as easy as "df.T". Please let me know a way to transpose this DataFrame.

Ameline answered 6/6, 2016 at 23:27 Comment(4)
Usually you would like to transpose a matrix, which is as easy as M' in Julia. If the matrix is embedded in a DataFrame, convert it to a matrix, transpose it, and then (if you must) convert it back to a DataFrame. In the OP this would be DataFrame(Matrix(df)') – Kithara
A great suggestion, but that would mean the row names and column names in the original DataFrame "df" are no longer respected. – Ameline
Not sure there are any row names in a DataFrame. Have a look at NamedArrays. They support transpose and have row, column, and dimension naming. – Kithara
Careful: ' is the conjugate transpose, whereas the regular transpose is .'. In practice, the conjugate transpose is more common, which is why the syntax was chosen this way. – Pontifex
M
9

I had the same question and tried the strategy suggested in the comments to your question. The problem I encountered, however, is that converting to a Matrix won't work if your DataFrame has NA values. You have to change them to something else, then convert to a Matrix. I had a lot of problems converting back to NA when I wanted to get from a Matrix back to a DataFrame type.
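
On current DataFrames.jl releases, where NA has been replaced by missing, the round trip through a Matrix may be less painful than described above. A minimal sketch, assuming a recent DataFrames.jl (1.x) and an illustrative two-column frame that is not from the question; note that the column names are still lost and replaced by auto-generated ones:

```julia
using DataFrames

# Hypothetical data with one missing value (not from the original question).
df = DataFrame(A = [1, missing, 3], B = [4, 5, 6])

# Matrix(df) keeps `missing` by widening the element type.
m = Matrix(df)

# permutedims swaps rows and columns without touching the elements.
mt = permutedims(m)

# Back to a DataFrame; :auto generates the column names x1, x2, x3.
dft = DataFrame(mt, :auto)
```

Here each original column becomes a row, so the missing value from column A ends up in the first row of the result.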

Here's a way to do it using DataFrame's stack and unstack functions.

julia> using DataFrames

julia> df = DataFrame(A = 1:4, B = 5:8)
4×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 2     │ 6     │
│ 3   │ 3     │ 7     │
│ 4   │ 4     │ 8     │

julia> colnames = names(df)
2-element Array{Symbol,1}:
 :A
 :B

julia> df[!, :id] = 1:size(df, 1)
1:4

julia> df
4×3 DataFrame
│ Row │ A     │ B     │ id    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 1     │
│ 2   │ 2     │ 6     │ 2     │
│ 3   │ 3     │ 7     │ 3     │
│ 4   │ 4     │ 8     │ 4     │

Adding the :id column is suggested by the DataFrame documentation as a way to help with unstacking.

Now stack the columns you want to transpose:

julia> dfl = stack(df, colnames)
8×3 DataFrame
│ Row │ variable │ value │ id    │
│     │ Symbol   │ Int64 │ Int64 │
├─────┼──────────┼───────┼───────┤
│ 1   │ A        │ 1     │ 1     │
│ 2   │ A        │ 2     │ 2     │
│ 3   │ A        │ 3     │ 3     │
│ 4   │ A        │ 4     │ 4     │
│ 5   │ B        │ 5     │ 1     │
│ 6   │ B        │ 6     │ 2     │
│ 7   │ B        │ 7     │ 3     │
│ 8   │ B        │ 8     │ 4     │

Then unstack, switching the id and variable names (this is why adding the :id column is necessary).

julia> dfnew = unstack(dfl, :variable, :id, :value)
2×5 DataFrame
│ Row │ variable │ 1      │ 2      │ 3      │ 4      │
│     │ Symbol   │ Int64⍰ │ Int64⍰ │ Int64⍰ │ Int64⍰ │
├─────┼──────────┼────────┼────────┼────────┼────────┤
│ 1   │ A        │ 1      │ 2      │ 3      │ 4      │
│ 2   │ B        │ 5      │ 6      │ 7      │ 8      │
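
For readers on a newer DataFrames.jl, the same steps can be condensed as below. This is a sketch assuming DataFrames.jl 1.x, where names returns strings and the variable column holds strings rather than symbols; the column order of the intermediate long table may differ from the output shown above.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 5:8)

# Remember which columns hold data before adding the row identifier.
colnames = names(df)
df.id = 1:nrow(df)

# Long format: one row per (id, variable) pair.
dfl = stack(df, colnames)

# Wide again, with the roles of :id and :variable swapped.
dfnew = unstack(dfl, :variable, :id, :value)
```

The new column names are the string forms of the id values, so a column is accessed as dfnew[!, "1"].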
Mcginnis answered 16/6, 2016 at 1:39 Comment(0)
C
10

The problem with Stephen's answer is that the order of the columns is not preserved. If you are not convinced, try it with the following DataFrame:

julia> df = DataFrame(A = 1:4, B = 5:8, AA = 15:18)
4×3 DataFrame
│ Row │ A     │ B     │ AA    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 15    │
│ 2   │ 2     │ 6     │ 16    │
│ 3   │ 3     │ 7     │ 17    │
│ 4   │ 4     │ 8     │ 18    │

but this DataFrame can be transposed (keeping order of columns/rows) using:

julia> DataFrame([[names(df)]; collect.(eachrow(df))], [:column; Symbol.(axes(df, 1))])
3×5 DataFrame
│ Row │ column │ 1     │ 2     │ 3     │ 4     │
│     │ Symbol │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼────────┼───────┼───────┼───────┼───────┤
│ 1   │ A      │ 1     │ 2     │ 3     │ 4     │
│ 2   │ B      │ 5     │ 6     │ 7     │ 8     │
│ 3   │ AA     │ 15    │ 16    │ 17    │ 18    │

Reference: https://github.com/JuliaData/DataFrames.jl/issues/2065#issuecomment-568937464

Concurrent answered 26/12, 2019 at 8:36 Comment(0)
P
1

permutedims does this.

Often when you want to transpose a dataframe, you already have a column with names (Strings or Symbols):
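
A minimal sketch of that case, assuming a hypothetical frame whose name column holds the labels that should become the new column names (the data is illustrative, not from the question):

```julia
using DataFrames

# Hypothetical data: `name` labels each row.
df = DataFrame(name = ["x", "y", "z"], a = 1:3, b = 4:6)

# The values of `name` become column names; the old column names (a, b)
# become the entries of the first column of the result.
dft = permutedims(df, :name)
```

By default the first column of the result keeps the name of the source column, here name.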



In the asker's random matrix example..

df = DataFrame(rand(10, 4), :auto)

..there aren't any names for the new columns. So we'll use the row numbers:

df.id = string.(1:nrow(df))  # Add column with names
permutedims(df, "id", "")

We used the optional third argument of permutedims to rename the new id column to the empty string, which is not necessary but can be nice.

Pertinent answered 9/9, 2022 at 19:33 Comment(0)
D
0

This works with dataframes that are not too complicated. One of the dataframe's columns is used to generate column names. The names of the other columns become row names.

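using DataFrames

# True if no value in `v` occurs more than once.
function all_unique(v::AbstractVector)::Bool
    return length(unique(v)) == length(v)
end

# Return a new DataFrame with column `colname` (holding `col_data`) placed before the columns of `df`.
function df_add_first_column(
    df::DataFrame,
    colname::Union{Symbol,String},
    col_data
)
    df1 = DataFrame([colname => col_data])
    hcat(df1, df)
end

# Transpose `df`, using the values of column `col` as the new column names.
# The names of the remaining columns end up in a leading "Row" column.
function df_transpose(df::DataFrame, col::Union{Symbol, String})::DataFrame
    @assert all_unique(df[!, col]) "Column `col` contains non-unique elements"

    # Build one `name => values` pair per row of `df`.
    function foo(i)
        string(df[i, col]) => collect(df[i, Not(col)])
    end

    dft = DataFrame(map(foo, 1:nrow(df)))

    return df_add_first_column(dft, "Row", filter(x -> x != string(col), names(df)))
end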

Example:

df0 = DataFrame(A = [1, 2, 3], B = rand(3), C = rand(3))

3×3 DataFrame
 Row │ A      B         C        
     │ Int64  Float64   Float64
─────┼───────────────────────────
   1 │     1  0.578605  0.590092
   2 │     2  0.350394  0.399114
   3 │     3  0.90852   0.710629

df_transpose(df0, :A)

2×4 DataFrame
 Row │ Row     1         2         3        
     │ String  Float64   Float64   Float64
─────┼──────────────────────────────────────
   1 │ B       0.578605  0.350394  0.90852
   2 │ C       0.590092  0.399114  0.710629
Dollfuss answered 27/2, 2022 at 1:57 Comment(0)