I have a file:
ABCD.csv
The part before the .csv is not of fixed length and can vary.
How can I extract the portion before the .csv?
There's a built-in file_path_sans_ext
in the tools package (part of the standard R installation) that returns the file name without its extension.
tools::file_path_sans_ext("ABCD.csv")
## [1] "ABCD"
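As a small sketch of how this behaves on a vector of names, the snippet below also tries the compression argument, which first drops a trailing .gz, .bz2 or .xz. The file names are made up for illustration only.

```r
# Hypothetical file names, chosen only to illustrate the behaviour.
files <- c("ABCD.csv", "report.final.csv", "ABCD.csv.gz")

# Default: only the last extension is removed.
plain <- tools::file_path_sans_ext(files)

# compression = TRUE: a trailing .gz/.bz2/.xz is dropped first,
# then the last remaining extension.
decompressed <- tools::file_path_sans_ext(files, compression = TRUE)

print(plain)
print(decompressed)
```

With the default, "ABCD.csv.gz" keeps its ".csv" part; with compression = TRUE it is reduced to "ABCD".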
Combine with basename(), as in file_path_sans_ext(basename(filepath)). – Verduzco
basename will also remove the path leading to the file. With this regex, any extension is removed:
filepath <- "d:/Some Dir/ABCD.csv"
sub(pattern = "(.*)\\..*$", replacement = "\\1", basename(filepath))
# [1] "ABCD"
Or, using file_path_sans_ext
as Tyler Rinker suggested:
tools::file_path_sans_ext(basename(filepath))
# [1] "ABCD"
sub(pattern = "(.*?)\\..*$", replacement = "\\1", basename(filepath))
– Paleoecology
You can use sub or substr:
str1 <- 'ABCD.csv'
sub('\\.csv$', '', str1)
#[1] "ABCD"
or
substr(str1, 1, nchar(str1)-4)
#[1] "ABCD"
Using the 'filepath' from @JasonV's post:
sub('\\..*$', '', basename(filepath))
#[1] "ABCD"
Or
library(stringr)
str_extract(filepath, '(?<=[/])([^/]+)(?=\\.[^.]+)')
#[1] "ABCD"
... . after the \\. Could that also be a literal ., i.e. foo..? – Placenta
... foo. Not sure what to do with those. – Placenta
... .<word> at the end and there are no other cases, this would work. – Placenta
fs::path_ext_remove()
"removes the last extension and returns the rest of the path".
fs::path_ext_remove(c("ABCD.csv", "foo.bar.baz.txt", "d:/Some Dir/ABCD.csv"))
# Produces: [1] "ABCD" "foo.bar.baz" "D:/Some Dir/ABCD"
If you have filenames with multiple (possible) extensions and you want to strip off only the last one, you can try the following.
Consider the filename foo.bar.baz.txt. This
sub('\\..[^\\.]*$', '', "foo.bar.baz.txt")
will leave you with foo.bar.baz.
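To make the difference between stripping only the last extension and stripping everything after the first dot concrete, here is a minimal sketch comparing a greedy and a lazy capture group. Passing perl = TRUE is my own choice here so that the lazy quantifier is handled by PCRE; it is not part of the answers above.

```r
name <- "foo.bar.baz.txt"

# Greedy (.*) runs up to the last dot, so only ".txt" is dropped.
last_only <- sub("(.*)\\..*$", "\\1", name, perl = TRUE)

# Lazy (.*?) stops at the first dot, so everything after it is dropped.
all_exts <- sub("(.*?)\\..*$", "\\1", name, perl = TRUE)

print(last_only)  # "foo.bar.baz"
print(all_exts)   # "foo"
```

Which one is appropriate depends on whether names like foo.bar.baz.txt carry meaningful dots in the stem.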
You can try this also:
data <- "ABCD.csv"
gsub(pattern = "\\.csv$", "", data)
#[1] "ABCD"
This also works for a vector of file names, e.g.
data <- list.files(pattern="\\.csv$")
Running the same gsub call then removes the extension from every file name in the vector.
Here is an implementation that handles compressed files and vectors of paths:
remove.file_ext <- function(path, basename = FALSE) {
  compressions <- c("gzip", "gz", "bgz", "zip")
  out <- character(0)
  for (p in path) {
    # Work on the current element p, not on the whole vector
    fext <- tools::file_ext(p)
    regex <- paste0("\\.", fext, "$")
    if (fext %in% compressions) {
      # Compressed file: also strip the extension in front of the compression suffix
      ext <- tools::file_ext(tools::file_path_sans_ext(p, compression = FALSE))
      if (nzchar(ext)) regex <- paste0("\\.", ext, regex)
    }
    out <- c(out, sub(pattern = regex, replacement = "", p))
  }
  # ifelse() would return only the first element here, so use if/else
  if (basename) base::basename(out) else out
}
Load the required library:
> library(stringr)
Extracting all the matches from the regex:
> str_match("ABCD.csv", "(.*)\\..*$")
[,1] [,2]
[1,] "ABCD.csv" "ABCD"
Returning only the second part of the result, which corresponds to the group matching the file name:
> str_match("ABCD.csv", "(.*)\\..*$")[,2]
[1] "ABCD"
EDIT for @U-10-Forward:
It is basically the same principle as the other answer; I just found this solution more robust.
Regex-wise it means:
() = group
.* = any character except the newline character, any number of times
\\ is the escape notation, thus \\. means a literal "."
.* = any characters, any number of times, again
$ means the match should be at the end of the input string
The logic is that it returns the group preceding a "." followed by a run of characters at the end of the string (which is the file extension in this case).
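For readers without stringr, the same capture-group idea can be sketched in base R with regexec and regmatches. This is an equivalent I am adding for illustration, not part of the answer above.

```r
x <- "ABCD.csv"

# regexec() records the whole match and each capture group;
# regmatches() extracts them as a character vector.
m <- regmatches(x, regexec("(.*)\\..*$", x))[[1]]

# m[1] is the whole match, m[2] is the first captured group.
stem <- m[2]
print(m)
print(stem)
```

As with str_match, the second element is the group in front of the last dot.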
The above answers are great, but I was interested in which is fastest when dealing with millions of paths at once. Using sub (via this SO question) seems to be the fastest way to get the file name out of the path. Comparing three of the methods above on top of that, tools::file_path_sans_ext is the fastest.
library(fs)
library(stringr)
library(microbenchmark)
files<-paste0("http://some/ppath/to/som/cool/file/",1:1000,".flac")
microbenchmark(
fs::path_ext_remove(sub(".*/", "", files)),
tools::file_path_sans_ext(sub(".*/", "", files)),
str_extract(files, '(?<=[/])([^/]+)(?=\\.[^.]+)')
)
Unit: milliseconds
                                                expr     min       lq      mean   median      uq     max neval
          fs::path_ext_remove(sub(".*/", "", files)) 10.6273 10.98940 11.323063 11.20500 11.4992 14.5834   100
    tools::file_path_sans_ext(sub(".*/", "", files))  1.3717  1.44260  1.532092  1.48560  1.5588  2.4806   100
 str_extract(files, "(?<=[/])([^/]+)(?=\\\\.[^.]+)")  7.4197  7.62875  7.985206  7.88835  8.2311  9.4107   100
You can use substring(), assuming the extension is always three characters long (as in .csv):
filename <- "test_channels.csv"
only.extension <- substring(filename,nchar(filename) - 3 )
without_ext.name <- substring(filename,1, nchar(filename) - 4 )
# verify
only.extension
without_ext.name
I hope it will be helpful.
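If the extension length is not known in advance, the same substring() idea can be combined with the position of the last dot. The following is only a sketch; strip_last_ext is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: cut each name just before its last dot.
strip_last_ext <- function(filename) {
  # Start position of the last ".ext", or -1 when there is no dot.
  pos <- as.integer(regexpr("\\.[^.]*$", filename))
  # pos > 1 leaves names without a dot, and names that only start
  # with a dot (e.g. ".bashrc"), unchanged.
  ifelse(pos > 1, substring(filename, 1, pos - 1), filename)
}

stems <- strip_last_ext(c("test_channels.csv", "archive.tar.gz", "README"))
print(stems)
```

Only the last extension is removed, so "archive.tar.gz" becomes "archive.tar".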
?tools::file_ext – Manzoni