Splitting CamelCase in R


8

14

Is there a way to split camel case strings in R?

I have attempted:

string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s"    "ome"  "amel" "ase"

Platyhelminth answered 6/12, 2011 at 21:18 Comment(0)


17

string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z]){1}", " \\1", string.to.split)
# [1] "this Is Some Camel Case"
# added a counter to prevent situation mentioned in comment
strsplit(gsub("([A-Z]{1})", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 

# another attempt to meet the commenter's concern
# inserts space between lower-single upper sequence
gsub("([[:lower:]])([[:upper:]]){1}", "\\1 \\2", string.to.split)
# [1] "this Is Some Camel Case"

Comparing Ramnath's answer with mine supports my initial impression that this was an underspecified question.

And give Tommy and Ramnath upvotes for pointing out [:upper:]:

strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case"

Excellence answered 6/12, 2011 at 21:26 Comment(2)

So what if the string is something like 'AB DevelopmentsAndy', which I might want to recreate as "AB Developments Andy". That is, I don't want to split consecutive uppercase characters. Just ones where lowercase then upper case are next to each other. – Wretched 9/4, 2019 at 11:31

You should read ?regex and find the correct pattern for a lowercase character and use the correct symbol for an arbitrary number of them. – Excellence 9/4, 2019 at 15:10
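A minimal base R sketch of the kind of pattern the reply above hints at, assuming the goal is only to separate a lowercase letter from an uppercase letter that immediately follows it. The variable names `x` and `spaced` are illustrative and do not come from the thread:

```r
x <- "AB DevelopmentsAndy"

# Insert a space only where a lowercase letter is directly followed by an
# uppercase letter; runs of uppercase letters such as "AB" are left alone.
spaced <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
spaced
# [1] "AB Developments Andy"
```

Because the pattern requires a lowercase letter on the left, consecutive capitals are never separated.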


11

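Here is one way to do it:

split_camelcase <- function(...){
  # Gather all arguments into a single character vector
  strings <- unlist(list(...))
  # Strip leading and trailing non-alphanumeric characters
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  # Insert a space before every uppercase letter except at the start
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  # Lowercase and split on spaces; [[1]] keeps the pieces of the first string only
  return(strsplit(tolower(strings), " ")[[1]])
}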

split_camelcase("thisIsSomeGood")
# [1] "this" "is"   "some" "good"
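The function above accepts several strings through `...` but returns only the pieces of the first one. If one result per input is wanted, a possible variant is sketched below. The name `split_camelcase_all` is hypothetical and not part of the original answer; the only change is that the trailing `[[1]]` is dropped so the full list is returned:

```r
# Hypothetical variant of split_camelcase(): returns a list with one
# character vector per input string instead of only the first result.
split_camelcase_all <- function(...) {
  strings <- unlist(list(...))
  # Strip leading and trailing non-alphanumeric characters
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  # Insert a space before every uppercase letter except at the start
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  strsplit(tolower(strings), " ")
}

res <- split_camelcase_all("thisIsSomeGood", "anotherOne")
res[[2]]
# [1] "another" "one"
```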

Thuthucydides answered 6/12, 2011 at 21:24 Comment(1)

+1 since this works with international upper case letters (not just A-Z) - for example, "enÖIHavet" – Eyecatching 6/12, 2011 at 21:35


7

Here's an approach using a single regex (a Lookahead and Lookbehind):

strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)

## [[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case"

Twohanded answered 11/4, 2015 at 2:9 Comment(1)

Nice, and, to be preferred as it does not split between successive uppercase. Even better: strsplit(string.to.split, "(?<=[[:lower:]])(?=[[:upper:]])", perl = TRUE) – Moldau 11/2, 2018 at 19:19


2

Here is a one-liner using strapply from the gsubfn package. The regular expression matches either the beginning of the string (^) followed by one or more lowercase letters ([[:lower:]]+), or (|) an uppercase letter ([[:upper:]]) followed by zero or more lowercase letters ([[:lower:]]*). The matched strings are processed with c, which concatenates the individual matches into a vector. As with strsplit, the result is a list, so we take the first component ([[1]]):

library(gsubfn)
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case"
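The same extract-the-matches idea can also be sketched in base R without gsubfn, assuming `regmatches()` with `gregexpr()` is an acceptable stand-in for `strapply()`. The names `pattern` and `pieces` are illustrative:

```r
string.to.split <- "thisIsSomeCamelCase"

# Same regular expression as above: a leading lowercase run, or an
# uppercase letter followed by any number of lowercase letters.
pattern <- "^[[:lower:]]+|[[:upper:]][[:lower:]]*"

# gregexpr() locates every match; regmatches() extracts the matched text.
pieces <- regmatches(string.to.split, gregexpr(pattern, string.to.split))[[1]]
pieces
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```

This extracts the words rather than splitting between them, so any characters that match neither alternative (digits, punctuation) are silently dropped.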

Sherrylsherurd answered 6/12, 2011 at 22:8 Comment(0)


1

I think my other answer is better than the following, but if only a one-liner to split is needed, here we go:

library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this"  "Is"    "Some"  "Camel" "Case"

Lack answered 27/3, 2017 at 9:55 Comment(0)


1

library(strex)

str_split_camel_case("thisIsSomeCamelCase")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case"

Drin answered 3/4 at 18:58 Comment(0)


0

The beginning of an answer is to split the string into its individual characters:

sp.x <- strsplit(string.to.split, "")

Then find which string positions are upper case:

ind.x <- lapply(sp.x, function(x) which(tolower(x) != x))

Then use that to split out each run of characters . . .
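One possible way to finish this idea is sketched below. It is an assumption about where the answer was heading, not the author's own code: each uppercase character starts a new group, and the characters in each group are pasted back together. The names `chars`, `is.upper`, `group`, and `words` are illustrative:

```r
string.to.split <- "thisIsSomeCamelCase"

# Split into single characters
chars <- strsplit(string.to.split, "")[[1]]

# TRUE where a character is uppercase (it changes under tolower())
is.upper <- tolower(chars) != chars

# Every uppercase character starts a new group: 0 0 0 0 1 1 2 2 ...
group <- cumsum(is.upper)

# Paste the characters of each group back into one word
words <- unname(vapply(split(chars, group), paste, character(1), collapse = ""))
words
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```

Note that this splits before every uppercase character, so a run such as "AB" would become "A" and "B".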

Cassel answered 6/12, 2011 at 21:26 Comment(0)


-1

Here is an easy solution via snakecase plus some tidyverse helpers:

install.packages("snakecase")
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)

string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>% 
  str_split(pattern = "_") %>% 
  purrr::flatten_chr()
#> [1] "this"  "Is"    "Some"  "Camel" "Case"

GitHub link to snakecase: https://github.com/Tazinho/snakecase

Lack answered 25/3, 2017 at 22:16 Comment(2)

I didn't downvote, but would guess they're due to this answer being much more complicated, and requiring obscure packages, relative to other 1- to 2-line answers. – Paleography 26/3, 2017 at 3:14

I still like this answer, since it is very clean and formats the output. So I'll leave it here; I have just provided a one-liner above... – Lack 27/3, 2017 at 9:58
