Splitting CamelCase in R
P

8

14

Is there a way to split camel case strings in R?

I have attempted:

string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s"    "ome"  "amel" "ase" 
Platyhelminth answered 6/12, 2011 at 21:18 Comment(0)
E
17
string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z]){1}", " \\1", string.to.split)
# [1] "this Is Some Camel Case"
# added a counter to prevent situation mentioned in comment
strsplit(gsub("([A-Z]{1})", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 

# another attempt to meet the commenter's concern
# inserts space between lower-single upper sequence
gsub("([[:lower:]])([[:upper:]]){1}", "\\1 \\2", string.to.split)

Comparing Ramnath's answer with mine supports my initial impression that this was an underspecified question.

And give Tommy and Ramnath upvotes for pointing out [:upper:].

strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 
Excellence answered 6/12, 2011 at 21:26 Comment(2)
So what if the string is something like 'AB DevelopmentsAndy', which I might want to recreate as "AB Developments Andy"? That is, I don't want to split consecutive uppercase characters, just the places where a lowercase character is immediately followed by an uppercase one. - Wretched
You should read ?regex, find the correct pattern for a lowercase character, and use the correct symbol for an arbitrary number of them. - Excellence
T
11

Here is one way to do it

split_camelcase <- function(...){
  strings <- unlist(list(...))
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  return(strsplit(tolower(strings), " ")[[1]])
}

split_camelcase("thisIsSomeGood")
# [1] "this" "is"   "some" "good"
Thuthucydides answered 6/12, 2011 at 21:24 Comment(1)
+1 since this works with international upper case letters (not just A-Z), for example "enÖIHavet". - Eyecatching
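Note that split_camelcase() above returns only the pieces of the first string, because of the trailing [[1]]. Below is a minimal sketch of a variant that keeps one result per input, assuming that is the behaviour wanted. The name split_camelcase_all is hypothetical and not part of the original answer.

```r
# Hypothetical variant of split_camelcase() that returns one character
# vector per input string instead of only the first.
split_camelcase_all <- function(...) {
  strings <- unlist(list(...))
  # trim leading/trailing non-alphanumeric characters, as in the original
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  # insert a space before every upper-case letter except at the very start
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  # strsplit() already returns a list with one element per input string
  strsplit(tolower(strings), " ")
}

res <- split_camelcase_all("thisIsSomeGood", "anotherOne")
res[[1]]
# [1] "this" "is"   "some" "good"
res[[2]]
# [1] "another" "one"
```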
T
7

Here's an approach using a single regex (a Lookahead and Lookbehind):

strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)

## [[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case" 
Twohanded answered 11/4, 2015 at 2:9 Comment(1)
Nice, and to be preferred, as it does not split between successive uppercase letters. Even better: strsplit(string.to.split, "(?<=[[:lower:]])(?=[[:upper:]])", perl = TRUE) - Moldau
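To illustrate why the lookaround pattern leaves consecutive uppercase letters alone, here is a small sketch that applies it to the 'AB DevelopmentsAndy' string from an earlier comment. It uses the ASCII classes [a-z] and [A-Z], so it assumes plain ASCII input. The variable names are only for illustration.

```r
x <- "AB DevelopmentsAndy"

# zero-width position: a lowercase letter behind, an uppercase letter ahead
pattern <- "(?<=[a-z])(?=[A-Z])"

# insert a space at every such position
spaced <- gsub(pattern, " ", x, perl = TRUE)
spaced
# [1] "AB Developments Andy"

# or split at those positions instead
parts <- strsplit(x, pattern, perl = TRUE)[[1]]
parts
# [1] "AB Developments" "Andy"
```

The "AB" prefix stays intact because no lowercase letter precedes the "B".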
S
2

Here is a one-liner using strapply from the gsubfn package. The regular expression matches either the beginning of the string (^) followed by one or more lower case letters ([[:lower:]]+), or (|) an upper case letter ([[:upper:]]) followed by zero or more lower case letters ([[:lower:]]*). The matched strings are passed to c, which concatenates the individual matches into a vector. Like strsplit, strapply returns a list, so we take the first component ([[1]]):

library(gsubfn)
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case" 
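The same "extract the matches" idea can also be sketched in base R with gregexpr and regmatches, in case installing gsubfn is not an option. This is an alternative technique using the same regular expression, not what strapply does internally.

```r
string.to.split <- "thisIsSomeCamelCase"

# same pattern as above: leading lowercase run, or an uppercase letter
# followed by any number of lowercase letters
pattern <- "^[[:lower:]]+|[[:upper:]][[:lower:]]*"

# gregexpr() locates all matches, regmatches() extracts them as a list
pieces <- regmatches(string.to.split, gregexpr(pattern, string.to.split))[[1]]
pieces
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```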
Sherrylsherurd answered 6/12, 2011 at 22:8 Comment(0)
L
1

I think my other answer is better than the following, but if only a one-liner to split is needed, here we go:

library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this"  "Is"    "Some"  "Camel" "Case" 
Lack answered 27/3, 2017 at 9:55 Comment(0)
D
1
library(strex)

str_split_camel_case("thisIsSomeCamelCase")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 
Drin answered 3/4 at 18:58 Comment(0)
C
0

The beginning of an answer is to split the string into individual characters:

sp.x <- strsplit(string.to.split, "")

Then find which string positions are upper case:

ind.x <- lapply(sp.x, function(x) which(tolower(x) != x))

Then use that to split out each run of characters . . .
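One possible way to finish this idea, sketched here under the assumption that every uppercase letter starts a new piece (so consecutive capitals are split apart, too). The cumsum-based grouping is my own completion and not part of the original answer.

```r
string.to.split <- "thisIsSomeCamelCase"

# split into single characters (one input string, so take element 1)
chars <- strsplit(string.to.split, "")[[1]]

# TRUE wherever a character is uppercase
is.upper <- tolower(chars) != chars

# each uppercase letter bumps the counter, giving one group id per piece
group <- cumsum(is.upper)

# glue the characters of each group back together
pieces <- unname(vapply(split(chars, group), paste, character(1), collapse = ""))
pieces
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```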

Cassel answered 6/12, 2011 at 21:26 Comment(0)
L
-1

Here is an easy solution using snakecase plus some tidyverse helpers:

# install.packages("snakecase")  # run once if the package is not yet installed
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)

string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>% 
  str_split(pattern = "_") %>% 
  purrr::flatten_chr()
#> [1] "this"  "Is"    "Some"  "Camel" "Case" 

GitHub link to snakecase: https://github.com/Tazinho/snakecase

Lack answered 25/3, 2017 at 22:16 Comment(2)
I didn't downvote, but would guess the downvotes are due to this answer being much more complicated, and requiring obscure packages, relative to the other 1- to 2-line answers. - Paleography
I still like this answer, since it is very clean and formats the output, so I am leaving it here and have just provided a one-liner above. - Lack
