How can I trim leading and trailing white space?

B

15

421

I am having some trouble with leading and trailing white space in a data.frame.

For example, I look at a specific row in a data.frame based on a certain condition:

> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI    dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)

I was wondering why I didn't get the expected output since the country Austria obviously existed in my data.frame. After looking through my code history and trying to figure out what went wrong I tried:

> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1

The only thing I changed in the command is an additional white space after Austria.

Further annoying problems obviously arise, for example when I want to merge two data frames based on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn't work.

Is there a nice way to 'show' the white space on my screen so that I am aware of the problem?
And can I remove the leading and trailing white space in R?

So far I have used a simple Perl script to remove the white space, but it would be nice if I could do it inside R.

Benedikt answered 14/2, 2010 at 12:44 Comment(2)

I just saw that sub() uses the Perl notation as well. Sorry about that. I am going to try to use the function. But for my first question I don't have a solution yet. – Benedikt 14/2, 2010 at 12:50

As hadley pointed out, the regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so: x <- gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Tanhya 14/2, 2010 at 15:11

Y

489

Probably the best way is to handle the leading and trailing white space when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
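A minimal sketch of the read-time approach, assuming a small made-up CSV passed inline through the text argument rather than a real file (the column names are borrowed from the question, the rows are illustrative). Keep in mind that strip.white only affects unquoted fields:

```r
# Made-up inline CSV; note the trailing space after "Austria".
csv_text <- "codeHelper,country\nAUT,Austria \nBEL,Belgium\n"

# Default (strip.white = FALSE): the trailing space is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE: leading/trailing white space is removed
# from unquoted character fields while reading.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  strip.white = TRUE)

raw$country == "Austria"    # FALSE FALSE
clean$country == "Austria"  # TRUE FALSE
```

stringsAsFactors = FALSE is set explicitly so the columns are character vectors regardless of the R version's default.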

If you want to clean strings afterwards you could use one of these functions:

# Returns string without leading white space
trim.leading <- function (x)  sub("^\\s+", "", x)

# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)

# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

To use one of these functions on myDummy$country:

 myDummy$country <- trim(myDummy$country)
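If more than one column is affected, one possible sketch is to apply trim to every character column at once. The data frame df below is a hypothetical stand-in for myDummy, and trim is repeated only so the snippet runs on its own:

```r
# Same helper as above, repeated so this snippet is self-contained.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical stand-in for myDummy.
df <- data.frame(
  codeHelper = c("AUT ", " BEL"),
  country    = c("Austria ", "Belgium"),
  dummyOECD  = c(1, 1),
  stringsAsFactors = FALSE
)

# Pick out the character columns and trim only those;
# numeric columns are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trim)
```

Note that this only touches character columns. If a column was read in as a factor, gsub would return a character vector, so you may want to convert factors explicitly first.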

To 'show' the white space you could use:

 paste(myDummy$country)

which will show you the strings surrounded by quotation marks (") making white spaces easier to spot.
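If the column is long, scanning the printed strings by eye gets tedious. A rough sketch of flagging only the offending values, using a made-up country vector in place of myDummy$country:

```r
# Hypothetical values standing in for myDummy$country.
country <- c("Austria ", "Belgium", " Chile")

# TRUE where a value starts or ends with a white space character.
has_ws <- grepl("^\\s|\\s$", country)

# Wrap the flagged values in brackets so stray spaces are visible.
flagged <- paste0("[", country[has_ws], "]")
print(flagged)

# Character counts tell the same story: "Austria " has 8, not 7.
nchar(country)
```

which(has_ws) gives the row positions if you want to inspect those rows in the data.frame.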

Yasminyasmine answered 14/2, 2010 at 13:13 Comment(8)

As hadley pointed out, the regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so: x <- gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Tanhya 14/2, 2010 at 15:10

@Jay: Thanks for the hint. I changed the regexps in my answer to use the shorter "\\s" instead of "[ \t]". – Yasminyasmine 14/2, 2010 at 15:46

See also str_trim in the stringr package. – Babita 16/2, 2010 at 15:35

is there a trim param in read.spss? I tried trim_values = TRUE and trim.factor.names = TRUE but to no avail... – Bradfield 19/9, 2014 at 9:7

FYI: I trimmed all trailing spaces of the entire dataframe using apply: df_trimmed <- as.data.frame(apply(df,2,function (x) sub("\\s+$", "", x))) – Bradfield 19/9, 2014 at 9:35

Unfortunately, strip.white=TRUE only works on non-quoted strings. – Sweptback 10/8, 2015 at 15:8

There is a much easier way to trim whitespace in R 3.2.0. See the next answer! – Arundell 29/12, 2015 at 16:6

Also need to include stringsAsFactors = FALSE when using read.csv, as this won't work on factors. trimws() detailed below will work regardless, but by silently converting factor to character. Both useful answers though, thanks! – Dispatcher 14/7, 2018 at 9:21

J

609

As of R 3.2.0 a new function was introduced for removing leading/trailing white spaces:

trimws()

See: Remove Leading/Trailing Whitespace
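A short sketch of how it can be used, including the merge problem from the question. The data frames a and b and their gdp and oecd columns are made up for illustration, and R 3.2.0 or later is assumed:

```r
x <- c("  Austria ", "Belgium  ")

both  <- trimws(x)                   # default: which = "both"
left  <- trimws(x, which = "left")   # leading white space only
right <- trimws(x, which = "right")  # trailing white space only

# Made-up frames mimicking the merge problem from the question.
a <- data.frame(country = c("Austria ", "Belgium"), gdp = c(1, 2),
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), oecd = c(1, 1),
                stringsAsFactors = FALSE)

# Before trimming, "Austria " and "Austria" do not match.
n_before <- nrow(merge(a, b, by = "country"))

# After trimming the key column, both rows match.
a$country <- trimws(a$country)
n_after <- nrow(merge(a, b, by = "country"))
```

Here n_before is 1 (only Belgium matches) and n_after is 2.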

Jaddan answered 13/5, 2015 at 9:26 Comment(6)

It depends on the definition of a best answer. This answer is nice to know of (+1) but in a quick test, it wasnt as fast as some of the alternatives out there. – Archdeaconry 24/5, 2015 at 8:5

doesn't seem to work for multi-line strings, despite \n being in the covered character class. trimws("SELECT\n blah\n FROM foo;") still contains newlines. – Gestalt 31/12, 2015 at 1:10

@Gestalt That is the expected behaviour. In the string you pass to trimws there are no leading or trailing white spaces. If you want to remove leading and trailing white spaces from each of the lines in the string, you will first have to split it up. Like this: trimws(strsplit("SELECT\n blah\n FROM foo;", "\n")[[1]]) – Jaddan 31/12, 2015 at 8:20

Although a built-in function for recent versions of R, it does 'just' do a PERL style regex under the hood. I might have expected some fast custom C code to do this. Maybe the trimws regex is fast enough. stringr::str_trim (based on stringi) is also interesting in that it uses a completely independent internationalized string library. You'd think whitespace would be immune from problems with internationalization, but I wonder. I've never seen a comparison of results of native vs stringr/stringi or any benchmarks. – Marji 30/1, 2016 at 17:31

For some reason I could not figure out, trimws() did not remove my leading white spaces, while Bryan's trim.strings() below (only 1 vote, mine!) did... – Smithsonite 3/3, 2018 at 22:16

@JackWasey I've added a benchmark - the example might be somewhat simple, but it should give an idea about the performance – Twofaced 8/3, 2021 at 12:29
