How to sort mixed values (numbers and letters) in R
G

6

5

I have a data frame that I want to sort by one column, then by the next (using tidyverse if possible).

I checked the question linked below, but the solutions there did not seem to work.

Order a "mixed" vector (numbers with letters)

Sample code for an example:

library(tidyverse)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

This does not give me what I want:

df <- df %>% arrange(variable, level)

The resulting order of the level column is:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 1000
comp_ded 500
comp_ded 750

I need them in this order:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 500
comp_ded 750
comp_ded 1000

There are multiple different "variables" in the real data set where half need to be sorted in number order and half in alphabetical. Does anyone know how to do this?

Graziano answered 5/4, 2018 at 20:9 Comment(0)
M
2

It's slightly ugly, but you could just split the data frame in two using filter statements, arrange each section individually, and then bind them back together:

df <- bind_rows(df %>%
              filter(!is.na(as.numeric(level))) %>%
              arrange(variable, as.numeric(level)),
          df %>%
              filter(is.na(as.numeric(level))) %>%
              arrange(variable, level))

Gives you:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 comp_ded 500  
2 comp_ded 750  
3 comp_ded 1000 
4 channel  DIR  
5 channel  EA   
6 channel  IA   
Macaroni answered 5/4, 2018 at 20:28 Comment(0)
A
3

The simplest solution would be to use dplyr::group_by.

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

df %>%
  group_by(variable, level) %>%
  arrange()

# A tibble: 6 x 2
  variable  level
     <chr> <fctr>
1 comp_ded    DIR
2 comp_ded     EA
3 comp_ded     IA
4  channel    500
5  channel    750
6  channel   1000
Archerfish answered 5/4, 2018 at 21:57 Comment(0)
G
2

Using gtools, a slightly shorter solution which uses mixedorder:

library(gtools)
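# mixedorder() returns sorting indices, so convert them to ranks before using them as a tie-breaker
lvl_rank <- order(mixedorder(df$level))
sorteddf <- df[order(df$variable, lvl_rank), ]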

Output:

  variable level
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000
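
A caveat worth checking on real data: mixedorder() returns sorting indices, like order(), rather than ranks, so as a secondary sort key it only lines up by coincidence when the rows are already in order. A minimal sketch, assuming shuffled sample rows (the shuffled data frame and the lvl_rank name are made up for illustration), that converts the indices to ranks first:

```r
library(gtools)

# Shuffled version of the sample data (illustrative only)
df <- data.frame(
  variable = c("comp_ded", "channel", "comp_ded", "channel", "comp_ded", "channel"),
  level    = c("750", "IA", "1000", "DIR", "500", "EA"),
  stringsAsFactors = FALSE
)

# order() applied to a permutation gives its inverse,
# i.e. the rank of each element in the mixed ordering
lvl_rank <- order(mixedorder(df$level))

# Sort by variable first, then by the mixed rank of level
sorteddf <- df[order(df$variable, lvl_rank), ]
print(sorteddf)
```

Within each variable the levels are all text or all numbers here, so the result does not depend on whether mixedorder() places numbers before or after letters.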
Geology answered 5/4, 2018 at 20:53 Comment(0)
R
1

You could create a temporary variable for sorting. Once you've sorted in the desired order, you can also set the order permanently by converting to factor (as in @Vio's answer). Maybe something like this:

df = df %>% 
  mutate(tmp = as.numeric(level)) %>% 
  arrange(variable, tmp, level) %>% 
  select(-tmp) %>% 
  mutate(level = factor(level, levels=unique(level)))

  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000

I think you can also shorten this by not explicitly creating a temporary variable, and instead using an "anonymous" variable inside arrange:

df = df %>% 
  arrange(variable, as.numeric(level), level) %>% 
  mutate(level = factor(level, levels=unique(level)))
Ribband answered 5/4, 2018 at 20:34 Comment(0)
S
1

Convert level to a factor and set the order of its levels. This is even easier with forcats::fct_relevel().

# Convert to factor
df <- as_tibble(cbind(variable, level)) %>%
  mutate(level = as.factor(level))

# Change order of levels (assigning to levels() would relabel the values, so rebuild the factor instead)
df$level <- factor(df$level, levels = c("DIR", "EA", "IA", "500", "750", "1000"))

df %>% arrange(level)

# A tibble: 6 x 2
  variable  level
     <chr> <fctr>
1  channel    DIR
2  channel     EA
3  channel     IA
4 comp_ded    500
5 comp_ded    750
6 comp_ded   1000
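
The forcats::fct_relevel() route mentioned above might look roughly like this. It is only a sketch: the shuffled sample rows and the sorted name are assumptions for illustration, and the level order still has to be typed out by hand.

```r
library(dplyr)
library(forcats)

# Shuffled version of the sample data (illustrative only)
df <- tibble(
  variable = c("comp_ded", "channel", "comp_ded", "channel", "comp_ded", "channel"),
  level    = c("750", "IA", "1000", "DIR", "500", "EA")
)

sorted <- df %>%
  # fct_relevel() moves the named levels to the front in the order given,
  # without changing which value each row holds
  mutate(level = fct_relevel(factor(level), "DIR", "EA", "IA", "500", "750", "1000")) %>%
  # arrange() sorts a factor by its level order, not alphabetically
  arrange(variable, level)

print(sorted)
```

Because the order lives in the factor levels, later calls to arrange() keep it without repeating the list.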
Solmization answered 5/4, 2018 at 20:35 Comment(4)
This makes sense for the example. My real data set has 220 observations. - Graziano
@Graziano It changes the levels of the factor. It will work on any number of observations, since you are sorting a factor (which is sorted by the order of its levels). If you have more levels, define their ordering manually. Or, if you are looking for a simple A-Z, 0-9 sort for more than 6 factor levels, then split sorting is probably the best way. - Solmization
Split sorting? I'll have to look that up. It's 10 different variables and levels within those. Thanks. - Graziano
By split, I meant the way divibisan's answer is sorting. - Solmization
M
0

I think it's much easier to sort by as.numeric(level) first, then by level:

df %>% arrange(variable, as.numeric(level), level)

Gives:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
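
For what it's worth, this works because as.numeric() returns NA for the non-numeric levels and arrange() puts NA last, so the all-text groups fall through to the plain alphabetical tie-break. A rough sketch on shuffled rows, wrapping the conversion in suppressWarnings() to hide the coercion warning (the shuffled data and the sorted name are only for illustration):

```r
library(dplyr)

# Shuffled version of the sample data (illustrative only)
df <- tibble(
  variable = c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded"),
  level    = c("IA", "DIR", "EA", "750", "1000", "500")
)

sorted <- df %>%
  arrange(
    variable,
    # numeric value where level parses as a number, NA otherwise (NA sorts last)
    suppressWarnings(as.numeric(level)),
    # alphabetical tie-break for the rows whose numeric key is NA
    level
  )

print(sorted)
```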
Melda answered 16/3, 2021 at 11:59 Comment(0)
