How to add leading zeros?

I have a set of data which looks something like this:

anim <- c(25499,25500,25501,25502,25503,25504)
sex  <- c(1,2,2,1,2,1)
wt   <- c(0.8,1.2,1.0,2.0,1.8,1.4)
data <- data.frame(anim,sex,wt)

data
   anim sex  wt
1 25499   1 0.8
2 25500   2 1.2
3 25501   2 1.0
4 25502   1 2.0
5 25503   2 1.8
6 25504   1 1.4

I would like a zero to be added before each animal id:

data
    anim sex  wt
1 025499   1 0.8
2 025500   2 1.2
3 025501   2 1.0
4 025502   1 2.0
5 025503   2 1.8
6 025504   1 1.4

And for interest's sake, what if I need to add two or three zeros before the animal IDs?

Candicandia answered 28/4, 2011 at 1:10 Comment(2)
Suppose you want to add n zeros before the animal IDs; you just need to do data$anim = paste0(strrep("0", n), data$anim)Venessavenetia
When you say you want to "add zeros", you presumably don't want to convert your integer columns to string/categorical in order to add the zero-padding inside the data itself, you want to keep them integer and only print leading zeros when rendering output.Husted
C
770

The short version: use formatC or sprintf.


The longer version:

There are several functions available for formatting numbers, including adding leading zeroes. Which one is best depends upon what other formatting you want to do.

The example from the question is quite easy since all the values have the same number of digits to begin with, so let's try a harder example of making powers of 10 width 8 too.

anim <- 25499:25504
x <- 10 ^ (0:5)

paste (and its variant paste0) are often the first string manipulation functions that you come across. They aren't really designed for manipulating numbers, but they can be used for that. In the simple case where we always have to prepend a single zero, paste0 is the best solution.

paste0("0", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"

For the case where there are a variable number of digits in the numbers, you have to manually calculate how many zeroes to prepend, which is horrible enough that you should only do it out of morbid curiosity.


str_pad from stringr works similarly to paste, making it more explicit that you want to pad things.

library(stringr)
str_pad(anim, 6, pad = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"

Again, it isn't really designed for use with numbers, so the harder case requires a little thinking about. We ought to just be able to say "pad with zeroes to width 8", but look at this output:

str_pad(x, 8, pad = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "0001e+05"

You need to set the scientific penalty option so that numbers are always formatted using fixed notation (rather than scientific notation).

library(withr)
with_options(
  c(scipen = 999), 
  str_pad(x, 8, pad = "0")
)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

stri_pad in stringi works exactly like str_pad from stringr.


formatC is an interface to the C function printf. Using it requires some knowledge of the arcana of that underlying function (see link). In this case, the important points are the width argument, format being "d" for "integer", and a "0" flag for prepending zeroes.

formatC(anim, width = 6, format = "d", flag = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
formatC(x, width = 8, format = "d", flag = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

This is my favourite solution, since it is easy to tinker with changing the width, and the function is powerful enough to make other formatting changes.
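As a small illustration of those "other formatting changes", here is a sketch using two more formatC arguments, digits and big.mark. The variable names padded_decimal and with_commas are made up for this example.

```r
# Zero-padded fixed-point number: total width 8, two decimal places
padded_decimal <- formatC(3.14159, width = 8, format = "f", digits = 2, flag = "0")
padded_decimal
## [1] "00003.14"

# Thousands separator for a large whole number
with_commas <- formatC(1234567, format = "d", big.mark = ",")
with_commas
## [1] "1,234,567"
```

Note that the width counts every character in the result, including the decimal point.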


sprintf is an interface to the C function of the same name; like formatC but with a different syntax.

sprintf("%06d", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
sprintf("%08d", x)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

The main advantage of sprintf is that you can embed formatted numbers inside longer bits of text.

sprintf(
  "Animal ID %06d was a %s.", 
  anim, 
  sample(c("lion", "tiger"), length(anim), replace = TRUE)
)
## [1] "Animal ID 025499 was a tiger." "Animal ID 025500 was a tiger."
## [3] "Animal ID 025501 was a lion."  "Animal ID 025502 was a tiger."
## [5] "Animal ID 025503 was a tiger." "Animal ID 025504 was a lion." 

See also goodside's answer.


For completeness it is worth mentioning the other formatting functions that are occasionally useful, but have no method of prepending zeroes.

format, a generic function for formatting any kind of object, with a method for numbers. It works a little bit like formatC, but with yet another interface.

prettyNum is yet another formatting function, mostly for creating manual axis tick labels. It works particularly well for wide ranges of numbers.

The scales package has several functions such as percent, date_format and dollar for specialist format types.

Cheriecherilyn answered 28/4, 2011 at 9:56 Comment(10)
Thanks a lot for the great help. I used formatC to add leading zeros to my anim and it worked well.Candicandia
formatC(number or vector, width = 6, format = "d", flag = "0") worked well (R version 3.0.2 (2013-09-25)). Thanks.Fierro
using formatC() in the way described above didn't work for me. It added spaces instead of zeroes. Did I do something wrong? I'm using R version 3.1.1.Shy
@Shy Sounds like you forgot flag = "0".Cheriecherilyn
Nope. Input: formatC('1', width = 3, format = 'd', flag = '0') Output: [1] "  1" Edit: I tried it without the quotes and it worked.Shy
formatC is designed to format numbers. If you pass a string to it, then of course you will get unexpected results. The function should really warn about receiving a silly input.Cheriecherilyn
Thank you for listing all the possibilities even if they seem to be "the same". I wanted to pad my numbers to equal width, no matter how many I have, and I was able to use formatC with width=ceiling(log(nr.vars, base=10)) for it. I wouldn't have been able to do this with the printf syntax (or if there is a way, I don't know it).Infrasonic
@Infrasonic Try sprintf("%8.2f", 10 ^ (1:10)). Also, log10 is easier than log(base = 10).Cheriecherilyn
@RichieCotton I must admit I have almost no knowledge of the sprintf syntax. How does your suggestion work, and most importantly, where does it take the variable which determines how many zeroes to use?Infrasonic
The Details section of the ?sprintf help page describes this. "m.n: Two numbers separated by a period, denoting the field width (m) and the precision (n)."Cheriecherilyn
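On the question raised in the comments above about choosing the width from the data: sprintf accepts an asterisk in place of the field width and then reads the width from an extra argument. A minimal sketch, with the example vector ids and the variable width invented for illustration:

```r
ids <- c(1L, 50L, 100L)

# Width of the longest number, counted from its printed digits
width <- max(nchar(sprintf("%d", ids)))

# "*" takes the field width from the argument that precedes the value
padded <- sprintf("%0*d", width, ids)
padded
## [1] "001" "050" "100"
```

Counting digits with nchar avoids an edge case of ceiling(log10(n)), which returns 2 for n = 100 even though 100 has three digits.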
E
256

For a general solution that works regardless of how many digits are in data$anim, use the sprintf function. It works like this:

sprintf("%04d", 1)
# [1] "0001"
sprintf("%04d", 104)
# [1] "0104"
sprintf("%010d", 104)
# [1] "0000000104"

In your case, you probably want: data$anim <- sprintf("%06d", data$anim)
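A short sketch of that assignment on the data frame from the question, rebuilt here so the snippet runs on its own (anim is created as an integer vector for this example). It also shows that the column becomes character afterwards.

```r
data <- data.frame(
  anim = 25499:25504,
  sex  = c(1, 2, 2, 1, 2, 1),
  wt   = c(0.8, 1.2, 1.0, 2.0, 1.8, 1.4)
)

# Overwrite the numeric IDs with zero-padded strings of width 6
data$anim <- sprintf("%06d", data$anim)

class(data$anim)
## [1] "character"
data$anim[1:2]
## [1] "025499" "025500"
```

Because the column is now text, any later arithmetic or numeric sorting on it needs as.integer(data$anim) first.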

Elephus answered 28/4, 2011 at 1:29 Comment(6)
Note that sprintf converts numeric to string (character).Catheycathi
Thanks for the answer. I want to turn a 13-digit number into a 14-digit one (adding a leading zero). This function doesn't seem to work for this case. It gives me an error: Error in sprintf("%020d", 4000100000104) : invalid format '%020d'; use format %f, %e, %g or %a for numeric objects. Any suggestions?Iroquoian
Try: sprintf("%014.0f", 4000100000104)Dugald
sprintf is not available for R 3.4.1Polycythemia
Yes it is. It's unchanged since version 1.5.0.Pyretic
I had a weird experience where a colleague using Windows would have sprintf() print leading spaces while my Mac was printing leading zeros. We switched to stringr::str_pad().Ordain
B
39

Expanding on @goodside's response:

In some cases you may want to pad a string with zeros (e.g. FIPS codes or other numeric-like factors). In OSX/Linux:

> sprintf("%05s", "104")
[1] "00104"

But because sprintf() calls the OS's C sprintf() command, discussed here, in Windows 7 you get a different result:

> sprintf("%05s", "104")
[1] "  104"

So on Windows machines the workaround is:

> sprintf("%05d", as.numeric("104"))
[1] "00104"
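If the codes must stay as strings, one way to sidestep the platform-dependent "%05s" behaviour is to build the padding in base R. This is only a sketch: pad_zeros is a hypothetical helper name, and strrep requires R 3.3.0 or later.

```r
# Hypothetical helper: left-pad strings with zeros up to `width` characters
pad_zeros <- function(x, width) {
  x <- as.character(x)
  # Number of zeros each element still needs; never negative
  n_missing <- pmax(0, width - nchar(x))
  paste0(strrep("0", n_missing), x)
}

codes <- pad_zeros(c("104", "1001", "36061"), 5)
codes
## [1] "00104" "01001" "36061"
```

Strings that are already at least width characters long are returned unchanged.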
Bria answered 21/8, 2013 at 14:41 Comment(1)
For whatever reason, this solution no longer works for me on Linux. @kdauria's str_pad is now my go-to.Bria
L
36

str_pad from the stringr package is an alternative.

anim = 25499:25504
str_pad(anim, width=6, pad="0")
Luminescent answered 27/8, 2014 at 6:56 Comment(1)
Be very careful with str_pad as it can lead to unexpected results. i.num = 600000; str_pad(i.num, width = 7, pad = "0") will give you "006e+05" and not "0600000"Overthrust
A
2

Here's a generalizable base R function:

pad_left <- function(x, len = 1 + max(nchar(x)), char = '0'){

    unlist(lapply(x, function(x) {
        paste0(
            paste(rep(char, len - nchar(x)), collapse = ''),
            x
        )
    }))
}

pad_left(1:100)

I like sprintf but it comes with caveats like:

however the actual implementation will follow the C99 standard and fine details (especially the behaviour under user error) may depend on the platform

Ashby answered 27/9, 2018 at 2:15 Comment(0)
C
1

Here is another alternative for adding leading 0s to strings such as CUSIPs, which can sometimes look like numbers; many applications such as Excel will corrupt them by removing the leading 0s or converting them to scientific notation.

When I tried the answer provided by @metasequoia the vector returned had leading spaces and not 0s. This was the same problem mentioned by @user1816679 -- and removing the quotes around the 0 or changing from %d to %s did not make a difference either. FYI, I am using RStudio Server running on an Ubuntu Server. This little two-step solution worked for me:

gsub(pattern = " ", replacement = "0", x = sprintf(fmt = "%09s", ids[,CUSIP]))

using the %>% pipe function from the magrittr package it could look like this:

sprintf(fmt = "%09s", ids[,CUSIP]) %>% gsub(pattern = " ", replacement = "0", x = .)

I'd prefer a one-function solution, but it works.

Cowen answered 10/12, 2016 at 19:44 Comment(0)
C
1

For other circumstances in which you want the number string to be consistent, I made a function.

Someone may find this useful:

idnamer <- function(x, y) {  # x: alphabetical prefix; y: number of IDs required
    id <- seq_len(y)
    for (i in seq_along(id)) {
        # Prepend a zero to single-digit numbers so every ID has two digits
        if (nchar(id[i]) < 2) {
            id[i] <- paste("0", id[i], sep = "")
        }
    }
    id <- paste(x, id, sep = "")
    return(id)
}
idnamer("EF", 28)

Sorry about the formatting.
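For comparison, the loop above can probably be replaced by a vectorised call to formatC, as recommended in the top answer. This is a sketch, not the author's code: make_ids and its width argument are names made up here, with width = 2 chosen to match the two-digit behaviour of idnamer.

```r
# Hypothetical vectorised variant of idnamer
make_ids <- function(prefix, n, width = 2) {
  # Zero-pad 1..n to `width` digits, then prepend the prefix
  paste0(prefix, formatC(seq_len(n), width = width, format = "d", flag = "0"))
}

ids <- make_ids("EF", 28)
head(ids, 3)
## [1] "EF01" "EF02" "EF03"
```

Raising width (for example make_ids("EF", 150, width = 3)) keeps the IDs the same length when more than 99 are needed.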

Caddric answered 3/4, 2017 at 1:27 Comment(0)
S
0
data$anim <- paste0("0", data$anim)
Shanda answered 20/4, 2016 at 18:40 Comment(0)
