I have a large data set with a column of text (20K rows). I would like to remove the first x characters (e.g. 3) from the beginning of each row in that specific column. I appreciate your assistance.
How to remove the first three characters from every row in a column in R
You can do it with the gsub function and a simple regex. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
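Since the question asks about the first x characters rather than exactly three, the same regex can be built from a variable. This is only a minimal sketch: the helper name drop_first is hypothetical (it is not part of the answer), and it uses sub() instead of gsub() on the assumption that a single replacement is enough, because the "^" anchor can match only once per string.

```r
# Hypothetical helper (not from the answer): drop the first n characters of each string.
drop_first <- function(x, n) {
  # Build a pattern such as "^.{0,3}" from n.
  pattern <- sprintf("^.{0,%d}", as.integer(n))
  # sub() replaces only the first match, which is all an anchored pattern needs.
  sub(pattern, "", x)
}

df <- data.frame(text_col = c("abcd", "abcde", "abcdef"), stringsAsFactors = FALSE)
df$text_col <- drop_first(df$text_col, 3)
print(df$text_col)
#> [1] "d"   "de"  "def"
```

Strings shorter than n simply become empty strings, because the quantifier {0,n} also accepts fewer than n characters.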
Great answer and it worked like a charm. Thank you. – Lapp
Such a nice answer. Would you know how to adapt it in case we wanted to delete the last three characters instead? – Maintenance
Use ".{0,3}$" instead of "^.{0,3}". – Sybaris
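As a quick check of the suggestion in the comment above, here is a small sketch that removes the last three characters in two ways: with the end-anchored regex, and with base R's substr() and nchar(). The variable names are made up for illustration.

```r
x <- c("abcd", "abcde", "abcdef")

# Regex anchored at the end of the string, as suggested in the comment.
trimmed_regex <- sub(".{0,3}$", "", x)

# Same result without a regex: keep characters 1 through (length - 3).
trimmed_substr <- substr(x, 1, nchar(x) - 3)

print(trimmed_regex)
#> [1] "a"   "ab"  "abc"
print(trimmed_substr)
#> [1] "a"   "ab"  "abc"
```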
With the tidyverse we can use str_sub (and some sample fruit text strings) to do this, by directly specifying start and end points:
library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#> some_fruit
#> <chr>
#> 1 apple
#> 2 apricot
#> 3 avocado
#> 4 banana
#> 5 bell pepper
#> 6 bilberry
#> 7 blackberry
#> 8 blackcurrant
#> 9 blood orange
#> 10 blueberry
#> # … with 70 more rows
tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#> some_fruit chopped_fruit
#> <chr> <chr>
#> 1 apple le
#> 2 apricot icot
#> 3 avocado cado
#> 4 banana ana
#> 5 bell pepper l pepper
#> 6 bilberry berry
#> 7 blackberry ckberry
#> 8 blackcurrant ckcurrant
#> 9 blood orange od orange
#> 10 blueberry eberry
#> # … with 70 more rows
Created on 2019-02-22 by the reprex package (v0.2.1)
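The start and end arguments of str_sub also accept negative positions, which count from the end of the string (-1 is the last character). A minimal sketch on a few of the fruit strings, assuming only the stringr package is loaded; the variable names are illustrative:

```r
library(stringr)

some_fruit <- c("apple", "apricot", "avocado")

# Drop the first three characters; end defaults to -1 (the last character).
without_first3 <- str_sub(some_fruit, 4)

# Drop the last three characters; -4 is the fourth character from the end.
without_last3 <- str_sub(some_fruit, 1, -4)

print(without_first3)
#> [1] "le"   "icot" "cado"
print(without_last3)
#> [1] "ap"   "apri" "avoc"
```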
Thank you very much for your help. – Lapp
As usual, there are so many ways to do things in R!
You can also try substring (see ?substring):
> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+                          column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+                          stringsAsFactors=FALSE)
> head(lotsofdata)
column.1 column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"
Or for column 1, using [,1]:
> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
Then just replace it:
> x <- substring(lotsofdata[,1], 4, nchar(lotsofdata[,1]))
> lotsofdata$column.1 <- x
> head(lotsofdata)
column.1 column2
1 aPoint1 MoreData1
2 aPoint2 MoreData2
3 aPoint3 MoreData3
4 aPoint4 MoreData4
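A shorter variant of the same idea: substring() has a default for its last argument (1000000L), so the nchar() call can be left out when the goal is to keep everything from the fourth character on. A minimal sketch on a cut-down copy of the example data:

```r
lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2"),
                         stringsAsFactors = FALSE)

# Keep everything from the 4th character onward and overwrite the column in place.
lotsofdata$column.1 <- substring(lotsofdata$column.1, 4)

print(lotsofdata$column.1)
#> [1] "aPoint1" "aPoint2"
```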
Thank you very much for your help. – Lapp
See ?substr and also ?nchar. – Katinakatine