Sort the digits of a number in increasing order [duplicate]

Asked 6/10, 2023 at 12:26 Answered 6/10, 2023 at 12:37

Apologies if this has already been asked, but I'm trying to reorder the digits of each number in a vector.

Let's take as an example this vector:

vector = c("5213456","17235896","23731074")

I'd like the results to be:

"1234556","12356789","01233477"

I know it's an odd request, but I have combinations of numbers in a column of my data frame, and I've spotted some duplicated cases that can't be filtered out with a simple unique() (or anything similar) because their digits aren't in the same order.
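Since the end goal is to drop rows whose digits are the same up to order, one possible approach is to build a sorted-digit key and keep the first row per key with duplicated(). A minimal sketch, assuming the numbers are stored as character strings; `df` and `combo` are hypothetical names:

```r
# Hypothetical data: `df` and its column `combo` are illustrative names.
df <- data.frame(combo = c("5213456", "1234556", "17235896", "23731074"))

# Key = the digits of each entry sorted ascending, kept as character.
df$key <- vapply(strsplit(df$combo, ""), \(x) paste(sort(x), collapse = ""), "")

# Keep the first row for each key; "1234556" is dropped as a duplicate of "5213456".
deduped <- df[!duplicated(df$key), ]
deduped$combo
```

Keeping the key as character matters: converting it back to numeric would turn "01233477" into 1233477.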

Thanks a lot.

Trilby asked 6/10, 2023 at 12:26 Comment(1)

Some tutorials you can follow here: geeksforgeeks.org/… geeksforgeeks.org/sort-string-characters – Shamblin 6/10, 2023 at 12:37

I'd go with this:

stringr::str_split(vector, '') |> 
  purrr::map_chr(~ sort(.x) |> paste(collapse=''))
# [1] "1234556"  "12356789" "01233477"

Chert answered 6/10, 2023 at 12:29 Comment(0)

In base R, you can split each string into characters, sort them, then paste them back together. strsplit returns a list, so use lapply or sapply to iterate over it:

unlist(lapply(strsplit(vector, ""), \(x) paste(sort(x), collapse = "")))

# or (thanks @Robert Hacken!)
sapply(strsplit(vector, ""), \(x) paste(sort(x), collapse = ""))

# [1] "1234556"  "12356789" "01233477"
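To see what each step does, here is one element traced through the split/sort/paste pipeline (a sketch; `x`, `chars`, `sorted` and `result` are illustrative names). Sorting characters uses the locale's collation, which in common locales orders the digits 0 to 9 as expected:

```r
x <- "5213456"
chars  <- strsplit(x, "")[[1]]          # "5" "2" "1" "3" "4" "5" "6"
sorted <- sort(chars)                   # "1" "2" "3" "4" "5" "5" "6"
result <- paste(sorted, collapse = "")  # "1234556"
```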

Among the answers, lapply and vapply appear to be the fastest on the example data, but with longer vectors things seem to even out:

microbenchmark::microbenchmark(
  lapply = unlist(lapply(strsplit(vector, ""), \(x) paste(sort(x), collapse = ""))),
  sapply = sapply(strsplit(vector, ""), \(x) paste(sort(x), collapse = "")),
  Thomas = unname(sapply(vector, \(x) intToUtf8(sort(utf8ToInt(x))))),
  Mael_vapply = vapply(strsplit(vector, NULL), \(x) paste(sort(x), collapse = ''), ''),
  geotheory = stringr::str_split(vector, '') |> 
    purrr::map_chr(~ sort(.x) |> paste(collapse=''))
)

        expr     min       lq      mean   median       uq      max neval
      lapply  69.701  75.6775  95.79383  79.6590  87.3620 1488.421   100
      sapply  78.522  88.6830 115.97092  93.4425 102.1400 1942.413   100
      Thomas 123.368 143.3830 174.09282 155.7950 165.8135 1900.921   100
 Mael_vapply  68.167  76.1265  97.90242  79.9230  84.5445 1672.561   100
   geotheory 206.529 224.7750 249.77152 240.1000 264.5795  407.786   100

longvec <- rep(vector, 1e4)
# vector of length 30,000; same benchmark as above, with `vector` replaced by `longvec`

        expr      min        lq      mean    median        uq      max neval
      lapply 623.6649  727.6631  940.1815  817.8032 1023.8237 1953.855   100
      sapply 589.1852  770.0218  941.9850  836.4549 1013.4217 1963.823   100
      Thomas 994.9503 1223.2654 1637.0953 1361.5117 1860.2610 3250.105   100
 Mael_vapply 615.7141  759.3814  922.8012  821.4991  998.0893 1807.163   100
   geotheory 664.4530  810.4600  984.8990  879.5722 1031.2233 2103.203   100

Conceptacle answered 6/10, 2023 at 12:30 Comment(5)

Thanks, both! I have updated the answer to reflect both options and also added a benchmark. – Conceptacle 6/10, 2023 at 12:39

@ThomasIsCoding Interesting, I never realized there could be a difference. Looking at the source code, there is one, but it seems to come mainly from the overhead of sapply calling more functions. This may be visible when FUN does little work (as here, where length(vector) = 3), but for longer inputs the difference is negligible. The only step that is not O(1) is unique(lengths(x)) in simplify2array, which is not that expensive either. But if you wanted to make many *apply calls that each do very little, lapply would be better. – Audrieaudris 6/10, 2023 at 13:22
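The overhead described above follows from how sapply is built: for a list input it is essentially lapply followed by simplify2array. A small check on the example data (a sketch; `pieces` and `f` are illustrative names):

```r
vector <- c("5213456", "17235896", "23731074")
pieces <- strsplit(vector, "")
f <- \(x) paste(sort(x), collapse = "")

via_sapply <- sapply(pieces, f)
# lapply returns a list of length-1 strings; simplify2array unlists it.
via_lapply <- simplify2array(lapply(pieces, f))
identical(via_sapply, via_lapply)
```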

I'd like to see these benchmarks with longer vectors before reading too much into these differences. – Brechtel 6/10, 2023 at 13:40

@SamR, it seems you might be right; I added benchmarks with a vector of length 30,000. – Conceptacle 6/10, 2023 at 14:37

Basically, as the length of vector or of the contained strings increases, the difference between sapply and lapply disappears. With short strings, @ThomasIsCoding's UTF approach is somewhat slower than the others. However, as string length increases, it becomes much faster than the other (split-and-paste) approaches. – Audrieaudris 6/10, 2023 at 15:2
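To check the long-string claim on your own machine, a rough sketch: the helper names `split_paste` and `utf8_sort` and the random test data are hypothetical, and no timings are claimed here. It first confirms both approaches agree, then times each on 100 strings of 10,000 random digits:

```r
set.seed(1)
# 100 strings, each 10,000 random digits long (hypothetical test data).
long_strings <- vapply(1:100, \(i) paste(sample(0:9, 1e4, replace = TRUE), collapse = ""), "")

# Split into characters, sort, paste back together.
split_paste <- function(v) vapply(strsplit(v, ""), \(x) paste(sort(x), collapse = ""), "")
# Sort the code points directly, without creating a character vector per string.
utf8_sort <- function(v) vapply(v, \(x) intToUtf8(sort(utf8ToInt(x))), "", USE.NAMES = FALSE)

stopifnot(identical(split_paste(long_strings), utf8_sort(long_strings)))
system.time(split_paste(long_strings))
system.time(utf8_sort(long_strings))
```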

In base R:

vapply(strsplit(vector, NULL), \(x) paste(sort(x), collapse = ''), '')
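If the column actually holds numbers rather than strings, they need converting to text first. A sketch under that assumption (`nums` is hypothetical data): as.character() can switch to scientific notation, e.g. as.character(100000) gives "1e+05", so format(..., scientific = FALSE) is used instead. The FUN.VALUE = "" argument makes vapply check that every call returns a single string.

```r
# Hypothetical numeric input (e.g. a column read in as numbers).
nums <- c(5213456, 17235896, 23731074, 100000)

# Convert to text in fixed notation, without padding spaces.
txt <- format(nums, scientific = FALSE, trim = TRUE)

# Same split/sort/paste as above; FUN.VALUE = "" enforces a length-1 character result.
sorted <- vapply(strsplit(txt, NULL), \(x) paste(sort(x), collapse = ""), "")
sorted
```

Note that a leading zero in the original value is already lost once it is stored as a number, so character storage is preferable if that matters.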

Anthropoid answered 6/10, 2023 at 12:30 Comment(0)

Short, but not the most efficient:

> unname(sapply(vector, \(x) intToUtf8(sort(utf8ToInt(x)))))
[1] "1234556"  "12356789" "01233477"

Or, possibly a bit faster:

> unlist(lapply(vector, \(x) intToUtf8(sort(utf8ToInt(x)))))
[1] "1234556"  "12356789" "01233477"

Phelloderm answered 6/10, 2023 at 12:37 Comment(0)
