How to format a number as percentage in R?
U

11

189

One of the things that used to perplex me as a newbie to R was how to format a number as a percentage for printing.

For example, display 0.12345 as 12.345%. I have a number of workarounds for this, but none of these seem to be "newbie-friendly". For example:

set.seed(1)
m <- runif(5)

paste(round(100*m, 2), "%", sep="")
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"

sprintf("%1.2f%%", 100*m)
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"

Question: Is there a base R function to do this? Alternatively, is there a widely used package that provides a convenient wrapper?


Despite searching for something like this in ?format, ?formatC and ?prettyNum, I have yet to find a suitably convenient wrapper in base R. ??"percent" didn't yield anything useful. library(sos); findFn("format percent") returns 1250 hits - so again not useful. ggplot2 has a function percent but this gives no control over rounding accuracy.

Unquiet answered 22/8, 2011 at 10:5 Comment(8)
sprintf seems to be the favorite solution on the mailing lists, and I've not seen any better solution. Any built-in function won't be much simpler to call anyway, right?Hwu
In my view sprintf is perfectly fine for that subset of R coders that also happen to be programmers. I have coded a lot in my life, including COBOL (shudder) and Fortran (shows my age). But I don't consider the sprintf formatting rules obvious (translation: WTF?). And of course a dedicated wrapper must be easier to call than sprintf, for example: format_percent(x=0.12345, digits=2)Unquiet
@hircus I think it's common enough that it deserves its own short curried function. It's particularly an issue with Sweave, where \Sexpr{sprintf("%1.2f%%",myvar)} is much uglier than \Sexpr{pct(myvar)} or whatever the shorter function would be.Perseid
Isn't learning to use the appropriate tools something we should expect users to strive towards? I mean, learning to use sprintf() is hardly more time consuming than finding out that package foo contains format_percent(). What happens if the user then doesn't want to format as percent but something else that is similar? They need to find another wrapper. In the long run learning the base tools will be beneficial.Eyeglasses
There is a slight problem in that % is the comment character in LaTeX, which is the "default" reporting format for R. So while it may be useful for labelling graphs, care must be taken if the formatted number is to be Sweaved.Teodorateodorico
I might be able to address why it's not a good idea, maybe not in the short space of comments. I'll try. (1) In a lot of consulting, I've had that request & I (as a human) can infer whether the original # is a proportion or needs to be converted as such; arbitrary numeric objects have no such "is-proportion" flag. (2) B/c of (1) it can be assumed that a person can do their own calculation, convert to proportions, and then output appropriately. (3) Satisfying % requests opens the door to issues with percentiles, e.g. when given a list of numbers.Retha
(Continued) Given a vector of numerics, a percentile request comes along... more issues arise. (4) Why stop at percents - basis points are also good. Conclusion: I don't speak for R Core, but it's just so easy to format the #s on one's own that the tiny little hurdle it creates means that the implementer/user will be more likely to correctly implement what they want.Retha
(Continued) I will concede that although I don't like the idea of a "printPercent()" function, I am not opposed to a "multiplyBy100andAppendPercentSign()" function. If nothing else, it is fully descriptive. It can also be paired, in a package, with its twin: "removePercentSignAndDivideBy100()".Retha
A
170

Even later:

As pointed out by @DzimitryM, percent() has been "retired" in favor of label_percent(), which is a synonym for the old percent_format() function.

label_percent() returns a function, so to use it, you need an extra pair of parentheses.

library(scales)
x <- c(-1, 0, 0.1, 0.555555, 1, 100)
label_percent()(x)
## [1] "-100%"   "0%"      "10%"     "56%"     "100%"    "10 000%"

Customize this by adding arguments inside the first set of parentheses.

label_percent(big.mark = ",", suffix = " percent")(x)
## [1] "-100 percent"   "0 percent"      "10 percent"    
## [4] "56 percent"     "100 percent"    "10,000 percent"

An update, several years later:

These days there is a percent function in the scales package, as documented in krlmlr's answer. Use that instead of my hand-rolled solution.


Try something like

percent <- function(x, digits = 2, format = "f", ...) {
  paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}

With usage, e.g.,

x <- c(-1, 0, 0.1, 0.555555, 1, 100)
percent(x)

(If you prefer, change the format from "f" to "g".)
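
To make the effect of that switch concrete, here is a small sketch that repeats the function above and compares the two formats. The expected values in the comments assume the default digits = 2; treat it as an illustration rather than a reference.

```r
percent <- function(x, digits = 2, format = "f", ...) {
  paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}

x <- c(-1, 0, 0.1, 0.555555, 1, 100)

# format = "f": digits is the number of decimal places
percent(x)
# [1] "-100.00%"  "0.00%"     "10.00%"    "55.56%"    "100.00%"   "10000.00%"

# format = "g": digits is the number of significant digits
percent(0.555555, format = "g")
# [1] "56%"
```

With "g", large values may switch to scientific notation, so "f" is usually the safer choice for labels.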

Autocorrelation answered 22/8, 2011 at 10:42 Comment(5)
Yes, this works, and is a slightly more general version of the workaround I supplied in the question. But my real question is whether this exists in base R or not.Unquiet
Works for me in listing percents, but replacing "x" with "percent(x)" in a statistical or graphing command produces an error message.Arthur
@Arthur Both my answer and krlmlr's answer return character vectors as the output, not numbers. They are for formatting axis labels and the like. Perhaps you just want to multiply by 100?Autocorrelation
As of 2020, the scales 1.1.0 manual says that percent() is retired and that label_percent() should be used instead, which is not suitable for formatting numbers. So the hand-rolled solution is still relevant.Quathlamba
@Quathlamba Why is label_percent() not suitable for numbers formatting?Thisbee
P
84

Check out the scales package. It used to be a part of ggplot2, I think.

library('scales')
percent((1:10) / 100)
#  [1] "1%"  "2%"  "3%"  "4%"  "5%"  "6%"  "7%"  "8%"  "9%"  "10%"

The built-in logic for detecting the precision should work well enough for most cases.

percent((1:10) / 1000)
#  [1] "0.1%" "0.2%" "0.3%" "0.4%" "0.5%" "0.6%" "0.7%" "0.8%" "0.9%" "1.0%"
percent((1:10) / 100000)
#  [1] "0.001%" "0.002%" "0.003%" "0.004%" "0.005%" "0.006%" "0.007%" "0.008%"
#  [9] "0.009%" "0.010%"
percent(sqrt(seq(0, 1, by=0.1)))
#  [1] "0%"   "32%"  "45%"  "55%"  "63%"  "71%"  "77%"  "84%"  "89%"  "95%" 
# [11] "100%"
percent(seq(0, 0.1, by=0.01) ** 2)
#  [1] "0.00%" "0.01%" "0.04%" "0.09%" "0.16%" "0.25%" "0.36%" "0.49%" "0.64%"
# [10] "0.81%" "1.00%"
Personate answered 22/7, 2013 at 12:29 Comment(3)
Doesn't work for negative numbers. percent(-0.1) produces NaN%Justinajustine
@akhmed: This has been reported already, a fix is available but pending review: github.com/hadley/scales/issues/50. Note that it seems to work for more than one negative number: scales::percent(c(-0.1, -0.2))Personate
Thanks for the link! I wasn't sure if it is a feature or a bug. For multiple numbers it sometimes works and sometimes doesn't. Say, scales::percent(c(-0.1,-0.1,-0.1)) produces "NaN%" "NaN%" "NaN%" but your example does work. For the reference of others, the bug isn't yet fixed as of scales_0.2.4. Also, as of today, the corresponding pull request fixing it is not yet merged into the main branch.Justinajustine
P
40

Check out the percent function from the formattable package:

library(formattable)
x <- c(0.23, 0.95, 0.3)
percent(x)
[1] 23.00% 95.00% 30.00%
Paella answered 12/7, 2016 at 15:48 Comment(2)
+1, this allows for specifying how many digits to include, which scales::percent in the first two answers does not.Fragile
+1, even though it's pretty easy to roll your own function, allowing choosing the number of digits is really useful.Incorporating
T
30

Base R

I much prefer to use sprintf which is available in base R.

sprintf("%0.1f%%", .7293827 * 100)
[1] "72.9%"
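
If the format string feels opaque, it can be hidden behind a small wrapper. The sketch below is one way to do that; the name format_percent and its digits argument are hypothetical (borrowed from the wish expressed in the comments on the question), not an existing base R function.

```r
# Hypothetical wrapper around sprintf(); not part of base R.
format_percent <- function(x, digits = 1) {
  # Build a format such as "%.1f%%": fixed-point with `digits` decimals,
  # followed by a literal percent sign.
  fmt <- paste0("%.", digits, "f%%")
  sprintf(fmt, 100 * x)
}

format_percent(.7293827)
# [1] "72.9%"
format_percent(c(0.5, 0.12345), digits = 3)
# [1] "50.000%" "12.345%"
```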

I especially like sprintf because you can also insert strings.

sprintf("People who prefer %s over %s: %0.4f%%", 
        "Coke Classic", 
        "New Coke",
        .999999 * 100)
[1] "People who prefer Coke Classic over New Coke: 99.9999%"

It's especially useful to use sprintf with things like database configurations; you just read in a yaml file, then use sprintf to populate a template without a bunch of nasty paste0's.
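
As a rough illustration of that templating pattern, the sketch below fills a connection-string template from a configuration list. The list stands in for whatever a YAML reader would return; the field names, values, and template are made up for the example.

```r
# Stand-in for a parsed config file; all names and values are hypothetical.
cfg <- list(host = "localhost", port = 5432L, dbname = "sales")

# %s takes strings, %d takes whole numbers.
template <- "postgresql://%s:%d/%s"
conn_string <- sprintf(template, cfg$host, cfg$port, cfg$dbname)
conn_string
# [1] "postgresql://localhost:5432/sales"
```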

Longer motivating example

This pattern is especially useful for rmarkdown reports, when you have a lot of text and a lot of values to aggregate.

Setup / aggregation:

library(data.table) ## for aggregation
library(magrittr)   ## for the %>% pipe used below

approval <- data.table(year = trunc(time(presidents)), 
                       pct = as.numeric(presidents) / 100,
                       president = c(rep("Truman", 32),
                                     rep("Eisenhower", 32),
                                     rep("Kennedy", 12),
                                     rep("Johnson", 20),
                                     rep("Nixon", 24)))
approval_agg <- approval[i = TRUE,
                         j = .(ave_approval = mean(pct, na.rm = TRUE)), 
                         by = president]
approval_agg
#     president ave_approval
# 1:     Truman    0.4700000
# 2: Eisenhower    0.6484375
# 3:    Kennedy    0.7075000
# 4:    Johnson    0.5550000
# 5:      Nixon    0.4859091

Using sprintf with vectors of text and numbers, outputting to cat just for newlines.

approval_agg[, sprintf("%s approval rating: %0.1f%%",
                       president,
                       ave_approval * 100)] %>% 
  cat(., sep = "\n")
# 
# Truman approval rating: 47.0%
# Eisenhower approval rating: 64.8%
# Kennedy approval rating: 70.8%
# Johnson approval rating: 55.5%
# Nixon approval rating: 48.6%

Finally, for my own selfish reference, since we're talking about formatting, this is how I do commas with base R:

30298.78 %>% round %>% prettyNum(big.mark = ",")
[1] "30,299"
Torritorricelli answered 30/9, 2020 at 22:10 Comment(0)
J
12

I did some benchmarking for speed on these answers and was surprised to see percent in the scales package so touted, given its sluggishness. I imagine the advantage is its automatic detection of the proper formatting, but if you know what your data looks like, it seems clearly best avoided.

Here are the results from formatting a vector of 100,000 proportions in (0,1) as percentages with 2 decimal digits:

library(microbenchmark)
x = runif(1e5)
microbenchmark(times = 100L, andrie1(), andrie2(), richie(), krlmlr())
# Unit: milliseconds
#   expr       min        lq      mean    median        uq       max
# 1 andrie1()  91.08811  95.51952  99.54368  97.39548 102.75665 126.54918 #paste(round())
# 2 andrie2()  43.75678  45.56284  49.20919  47.42042  51.23483  69.10444 #sprintf()
# 3  richie()  79.35606  82.30379  87.29905  84.47743  90.38425 112.22889 #paste(formatC())
# 4  krlmlr() 243.19699 267.74435 304.16202 280.28878 311.41978 534.55904 #scales::percent()

So sprintf emerges as a clear winner when we want to add a percent sign. On the other hand, if we only want to multiply the number and round (go from proportion to percent without "%"), then round() is fastest:

# Unit: milliseconds
#        expr      min        lq      mean    median        uq       max
# 1 andrie1()  4.43576  4.514349  4.583014  4.547911  4.640199  4.939159 # round()
# 2 andrie2() 42.26545 42.462963 43.229595 42.960719 43.642912 47.344517 # sprintf()
# 3  richie() 64.99420 65.872592 67.480730 66.731730 67.950658 96.722691 # formatC()
Jelly answered 4/6, 2015 at 2:54 Comment(0)
B
10

The tidyverse version is this:

> library(dplyr)
> library(scales)

> set.seed(1)
> m <- runif(5)
> dt <- as.data.frame(m)

> dt %>% mutate(perc=percent(m,accuracy=0.001))
          m    perc
1 0.2655087 26.551%
2 0.3721239 37.212%
3 0.5728534 57.285%
4 0.9082078 90.821%
5 0.2016819 20.168%

Looks tidy as usual.

Bant answered 20/4, 2020 at 16:1 Comment(2)
Tidy, indeed. But given we value tidiness, I assume one could call the library "scales" (as you did with "tidyverse") and leave out the "::" operator which is confusing to newbies like me.Launcelot
Yes I think you are right, I have updated the answer.Bant
S
8

You can use the scales package just for this operation (without loading it with require or library)

scales::percent(m)
Symptomatology answered 11/1, 2017 at 16:43 Comment(1)
How to give the accuracy for the number of digits?Xenos
P
6

Here's my solution for defining a new function (mostly so I can play around with Curry and Compose :-) ):

library(roxygen)
printpct <- Compose(function(x) x*100, Curry(sprintf,fmt="%1.2f%%"))
Perseid answered 22/8, 2011 at 10:28 Comment(0)
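
As far as I know, Curry and Compose are not exported by the current roxygen2, so the snippet above may not run as written on a recent installation. Here is a sketch of the same idea with two tiny helpers defined locally; the helper names curry and compose2 are made up for this example and do not come from any package.

```r
# Hypothetical helper: fix some arguments of f ahead of time
curry <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

# Hypothetical helper: apply f first, then g
compose2 <- function(f, g) function(x) g(f(x))

printpct <- compose2(function(x) x * 100, curry(sprintf, fmt = "%1.2f%%"))

printpct(c(0.5, 0.25))
# [1] "50.00%" "25.00%"
```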
S
0
Try this:

data_format <- function(data, digit = 2, type = '%') {
  # 'd' (integer) is handled as fixed-point with zero decimals
  if (type == 'd') {
    type <- 'f'
    digit <- 0
  }
  # Pick the sprintf format and the multiplier for the requested type
  switch(type,
    '%' = {fmt <- paste0("%.", digit, "f%%"); num <- 100},
    'f' = {fmt <- paste0("%.", digit, "f"); num <- 1},
    stop(type, " is not a recognized type")
  )
  sprintf(fmt, num * data)
}
Snowfall answered 30/10, 2015 at 5:57 Comment(0)
D
0

This function could transform the data to percentages by columns

percent.colmns = function(base, columnas = 1:ncol(base), filas = 1:nrow(base)){
    base2 = base
    for(j in columnas){
        suma.c = sum(base[,j])
        for(i in filas){
            base2[i,j] = base[i,j]*100/suma.c
        }
    }
    return(base2)
}
Dorty answered 28/9, 2016 at 15:57 Comment(1)
Basic arithmetic is vectorized---the inner for loop is inefficient and unnecessary. Can be replaced with base2[, j] = base[ , j] * 100 / suma.c. Also worth noting that this isn't exactly an answer to the question... the question is about formatting something like 0.5 to "50.0%", not about doing a calculation...Breakage
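
The vectorized form suggested in the comment above could look like the sketch below. The function name percent_columns is hypothetical, and the filas argument is dropped on the assumption that all rows are converted.

```r
# Hypothetical vectorized variant of percent.colmns()
percent_columns <- function(base, columnas = seq_len(ncol(base))) {
  base2 <- base
  for (j in columnas) {
    # One vectorized operation per column replaces the inner row loop
    base2[, j] <- base[, j] * 100 / sum(base[, j])
  }
  base2
}

df <- data.frame(a = c(1, 3), b = c(2, 2))
percent_columns(df)
#    a  b
# 1 25 50
# 2 75 50
```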
N
0

Here's a lightweight percent class object and all associated methods.

It differs from scales in that percent(1) will return "1%" whereas scales::percent(1) will return "100%". This can be easily amended by removing the division by 100 in percent() if need be.

Edit: Have bundled the code in a package.

# remotes::install_github("NicChr/percent")
library(percent)

percent(0.12345 * 100)
[1] "12.345%"
percent(0:10)
#>  [1] "0%"  "1%"  "2%"  "3%"  "4%"  "5%"  "6%"  "7%"  "8%"  "9%"  "10%"

With this class we can do basic math which cannot be done with scales::percent

Notice that the output is a percent only when both operands are percents.

10 * percent(50)
#> [1] 5
percent(10) + percent(20)
#> [1] "30%"

We can format as normal using format()

# Format uses significant and not decimal digits
format(percent(12.345), digits = 3)
[1] "12.3%"
format(percent(12.345), digits = 3, symbol = ' (%)')
[1] "12.3 (%)"

Benchmark against scales package

x <- seq(0, 1, 1e-6)
bench::mark(percent(x), 
            scales::percent(x), 
            check = FALSE)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 percent(x)           1.31ms   1.48ms   244.       7.63MB    17.3 
#> 2 scales::percent(x)    3.41s    3.41s     0.293  326.42MB     2.05

To convert proportions to percentages we can just write a simple wrapper:

as_percent <- function(x){
  percent(as.numeric(x) * 100)
}
as_percent(0.5)
[1] "50%"
Ninnette answered 29/2 at 8:38 Comment(0)
