Capitalizing letters. R equivalent of Excel "PROPER" function [duplicate]

4

13

Colleagues,

I'm looking at a data frame resembling the extract below:

Month   Provider Items
January CofCom   25
july    CofCom   331
march   vobix    12
May     vobix    0

I would like to capitalise the first letter of each word and lower-case the remaining letters. This would result in a data frame resembling the one below:

Month   Provider Items
January Cofcom   25
July    Cofcom   331
March   Vobix    12
May     Vobix    0

In short, I'm looking for R's equivalent of the PROPER function available in MS Excel.

Eatables answered 25/7, 2014 at 13:10 Comment(2)
See the capwords function in the ?tolower help page. – Salado
There is a capwords function defined in the Examples section of ?tolower. – Chemiluminescence
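
For readers following the comments above, here is a minimal base-R sketch in the spirit of the capwords example they mention. It is not a copy of the help-page code: the name proper_words is made up for illustration, and it assumes words are separated by single spaces.

```r
# Hypothetical helper (not taken from the ?tolower help page): capitalise
# each space-separated word and lower-case the rest of it.
proper_words <- function(s) {
  s <- as.character(s)
  words <- strsplit(s, " ", fixed = TRUE)
  out <- vapply(words, function(w) {
    paste0(toupper(substring(w, 1, 1)), tolower(substring(w, 2)),
           collapse = " ")
  }, character(1))
  out[is.na(s)] <- NA_character_  # keep missing values missing
  out
}

proper_words(c("hello world", "frIENds", NA))
# [1] "Hello World" "Friends"     NA
```

Unlike the single-word proper() variants in the answers, this capitalises every word, which is closer to what Excel's PROPER does for plain space-separated text.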
13

The question is about an equivalent of Excel PROPER and the (former) accepted answer is based on:

proper=function(x) paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))

It might be worth noting that:

proper("hello world")
## [1] "Hello world"

Excel PROPER would give "Hello World" instead. For a 1:1 mapping with Excel, see the answer by @Matthew Plourde.

If what you actually need is to set only the first character of a string to upper-case, you might also consider the shorter and slightly faster version:

proper=function(s) sub("(.)", "\\U\\1", tolower(s), perl=TRUE)
Ruthenious answered 2/3, 2016 at 16:55 Comment(0)
32

With regular expressions:

x <- c('woRd Word', 'Word', 'word words')
gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(x), perl=TRUE)
# [1] "Word Word"  "Word"       "Word Words"

(?<=\\b)([a-z]) says look for a lowercase letter preceded by a word boundary (e.g., a space or the beginning of a line). (?<=...) is called a "look-behind" assertion. \\U\\1 says replace that character with its uppercase version. \\1 is a back-reference to the first group surrounded by () in the pattern. See ?regex for more details.

If you only want to capitalize the first letter of the first word, use the pattern "^([a-z])" instead.

Vaccination answered 25/7, 2014 at 13:40 Comment(3)
This is the actual answer. I urge the questioner to reconsider their check. – Selfexistent
Is the back reference necessary? Wouldn't this give the same result? gsub("(\\b[a-z])", "\\U\\1", tolower(xx), perl=TRUE) – Malay
This does not work well for all languages, as it also capitalizes letters after special characters (such as umlauts). – Guanajuato
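
To tie the regex back to the data frame in the question, here is a small sketch that applies the same pattern to the character columns only and leaves the numbers untouched. The helper name to_proper and the way the sample data is rebuilt are illustrative assumptions. As the last comment notes, [a-z] is ASCII-only, so treat this as suitable for plain English text.

```r
# Sample data rebuilt from the question (assumed to be character + numeric).
dat <- data.frame(
  Month    = c("January", "july", "march", "May"),
  Provider = c("CofCom", "CofCom", "vobix", "vobix"),
  Items    = c(25, 331, 12, 0),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper around the pattern from the answer above.
to_proper <- function(x) {
  gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

# Only touch character columns so Items stays numeric.
is_text <- vapply(dat, is.character, logical(1))
dat[is_text] <- lapply(dat[is_text], to_proper)
dat
#     Month Provider Items
# 1 January   Cofcom    25
# 2    July   Cofcom   331
# 3   March    Vobix    12
# 4     May    Vobix     0
```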
11

Another method uses the stringi package. The stri_trans_general function appears to lower case all letters other than the initial letter.

library(stringi)
x <- c('woRd Word', 'Word', 'word words')
stri_trans_general(x, id = "Title")
# [1] "Word Word"  "Word"       "Word Words"
Haplosis answered 25/7, 2014 at 16:43 Comment(1)
For future visitors: stringi has a function called stri_trans_totitle which does the same thing. Not sure if that existed at the time of this answer. – Drawplate
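
A short sketch of the function mentioned in the comment above, for comparison with the stri_trans_general call. It assumes the stringi package is installed; the guard simply skips the example otherwise.

```r
# Assumes stringi is installed; skipped quietly if it is not.
if (requireNamespace("stringi", quietly = TRUE)) {
  x <- c("woRd Word", "Word", "word words")
  titled <- stringi::stri_trans_totitle(x)
  print(titled)
  # [1] "Word Word"  "Word"       "Word Words"
}
```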
5

I don't think there is one, but you can easily write it yourself:

(dat <- data.frame(x = c('hello', 'frIENds'),
                   y = c('rawr','rulZ'),
                   z = c(16, 18)))
#         x    y  z
# 1   hello rawr 16
# 2 frIENds rulZ 18

proper <- function(x)
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))


(dat <- data.frame(lapply(dat, function(x)
  if (is.numeric(x)) x else proper(x)),
  stringsAsFactors = FALSE))

#         x    y  z
# 1   Hello Rawr 16
# 2 Friends Rulz 18

str(dat)
# 'data.frame':  2 obs. of  3 variables:
#   $ x: chr  "Hello" "Friends"
#   $ y: chr  "Rawr" "Rulz"
#   $ z: num  16 18
Hillhouse answered 25/7, 2014 at 13:21 Comment(5)
Thank you, this is what I was looking for. It's such a nice thing that it should be part of base R :) – Eatables
Just one word of caution: a numeric column in my data frame was changed to a factor after I applied this function, which messed up the chart a little, so I had to make it numeric again. – Eatables
@Eatables in that case, I would use data.frame(lapply(dat, function(x) if(is.numeric(x)) x else proper(x))) or something similar. – Hillhouse
Thank you very much, it's a very useful solution. I'm wondering whether it would be sensible to move the if(is.numeric(x)) part into the function itself. – Eatables
You could do that, too. You could also expand the function to handle different classes in different ways. – Hillhouse
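
Following up on the last two comments, here is a hedged sketch of what moving the type check into the function could look like. Handling factors through their levels is an assumption about what "handle different classes" might mean; the answer itself does not spell this out.

```r
# Sketch only: a class-aware variant of proper() from the answer above.
proper <- function(x) {
  if (is.factor(x)) {
    # Relabel the levels so the column stays a factor.
    levels(x) <- proper(levels(x))
    return(x)
  }
  if (!is.character(x)) return(x)  # numbers, logicals etc. pass through
  out <- paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
  out[is.na(x)] <- NA_character_   # paste0() would otherwise give "NANA"
  out
}

dat <- data.frame(x = factor(c("hello", "frIENds")),
                  y = c("rawr", "rulZ"),
                  z = c(16, 18),
                  stringsAsFactors = FALSE)

# dat[] keeps the data.frame structure while replacing every column.
dat[] <- lapply(dat, proper)
str(dat)
```

With this version the caller no longer needs the if (is.numeric(x)) wrapper: x stays a factor with capitalised levels, y is capitalised as character, and z stays numeric.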
