Dealing with readLines() function in R

I've been having a very hard time with R lately.

I'm not an expert user but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to deal with those lines and make some breaks and changes in the text.

Here is the code I'm using:

fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName,open="r")
line <- readLines(con)
close(con)

It reads the text and the line breaks perfectly. But I don't understand how the created object line works.

The object line created with this code has the class: character and the length [57]. If I type line[1] it shows exactly the text of the first line. But if I type

length(line[1])

it returns 1.

I would like to know how I can transform this string of length == 1, which in fact contains 518 characters, into a character vector of length == 518.

Does anyone know what I'm doing wrong?

I don't necessarily need to use the readLines() function. I did some research and also found the function scan(), but I ended up in the same situation: an immutable string of 518 characters with length == 1.

I hope I've been clear enough about my question. Sorry for the bad English.

Osmium answered 11/4, 2014 at 0:38 Comment(5)
readLines returns "A character vector of length the number of lines read." (from ?readLines). That's why each line is length 1. Have you tried read.csv or read.table for this? - Sapotaceous
Please provide some of the data and what you expect as a result. It sounds like you just need strsplit. - Sapotaceous
Try nchar(line[1]); it'll give you the number of characters in the first element of line (i.e., the first line of your file). length(line) tells you the number of lines retrieved from the file; with length(line[1]) you're asking for the number of elements in a slice of line, a slice that happens to have a single element in it (which may be a string of 518 characters or whatever). - Tercentenary
@Tercentenary nchar(line[1]) returns the number of characters in the string, but I want to know how to access those characters individually. The strsplit function does not satisfy my needs. The best way to describe what I want to do is to say that I want to read every element of line (i.e., line[1], line[2], ... , line[n]) character by character (blank or not) and make some rearrangements. - Osmium
Without a better idea of what exactly you want to break a string into, my guidance is merely ?substr and ?regexp. - Tercentenary
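
To make the distinction drawn in the comments above concrete, here is a minimal sketch. The two-element vector is a hypothetical stand-in for the result of readLines(), not the asker's data, and the variable names first_char and chars_7_to_10 are chosen only for illustration.

```r
# Hypothetical stand-in for what readLines() returns: one element per line
line <- c("first line of text", "second")

length(line)      # number of elements, i.e. number of lines read
length(line[1])   # a one-element slice of the vector, so this is 1
nchar(line[1])    # number of characters in the first line

# substr() pulls out characters by position without splitting the string
first_char    <- substr(line[1], 1, 1)
chars_7_to_10 <- substr(line[1], 7, 10)
```

In short, length() counts vector elements while nchar() counts characters inside each element, which is why a 518-character line still has length 1.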

Suppose txt is the text from line 1 of your data that you read in with readLines.
Then if you want to split it into separate strings, each of which is a word, then you can use strsplit, splitting at the space between each word.

> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt
## [1] "aA bB cC dD eE fF gG hH iI jJ"   ## character vector of length 1
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s"))  ## split the string at the spaces
> newTxt
## [1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
## now the text is a character vector of length 10  
> length(newTxt)
[1] 10
Sapotaceous answered 11/4, 2014 at 3:41 Comment(3)
Thanks, but that is not exactly what I need. I don't want to split the vector into words. For me the blank spaces are really important, and I want each blank space to count as a character too. The final product I'm looking for, in your example, would be a vector of 29 characters. - Osmium
Okay, then instead of split = "\\s", use split = "". - Sapotaceous
The solution suggested by @Richard Scriven solved my problem. I'm really grateful. Changing the split argument was what I needed to get it done. - Osmium
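
For reference, the split = "" suggestion from the comments can be sketched as follows, reusing the txt example from the answer. The reversal at the end is only an illustrative rearrangement (the thread does not say which rearrangements were needed), and the names chars and reversed are assumptions.

```r
txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")

# split = "" breaks the string into single characters, blanks included;
# strsplit() returns a list, so [[1]] extracts the character vector
chars <- strsplit(txt, split = "")[[1]]

length(chars)   # one element per character of txt
chars[3]        # the third character, which is a blank here

# Illustrative rearrangement: reverse the characters and glue them back
reversed <- paste(rev(chars), collapse = "")
```

Since strsplit() is vectorised, calling strsplit(line, split = "") on the whole result of readLines() should give a list with one character vector per line.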

First, you can condense that code into a single line; the other three lines just create objects that you don't need.

line <- readLines("C:/MyFolder/TEXT_TO_BE_PROCESSED.txt")

Then, if you want to know how many space-separated words there are per line:

words <- sapply(line,function(x) length(unlist(strsplit(x,split=" "))))

If you leave out the call to length() in the above, you get a list of character vectors holding the words from each line.

Faker answered 11/4, 2014 at 1:35 Comment(2)
I've tried this solution. Leaving out the call to length(), it returns a variable "words" that is a list of 57. If I type words[1], it returns the whole first line split word by word. But I can't access a specific word the way I want, e.g. words[1][2]. - Osmium
Then you need to look up the difference between [ and [[. To get the first word of the first list entry you want words[[1]][1]. - Faker
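
The difference between [ and [[ mentioned in the last comment can be sketched like this. The vector lines and its contents are hypothetical sample data, and strsplit() is called directly on the vector rather than through sapply() to keep the example short.

```r
# Hypothetical sample data standing in for the lines read from the file
lines <- c("the quick brown fox", "jumps over")
words <- strsplit(lines, split = " ")   # a list: one character vector per line

sub_list    <- words[1]        # single bracket: still a list, of length 1
first_line  <- words[[1]]      # double bracket: the character vector itself
second_word <- words[[1]][2]   # second word of the first line

# words[1][2] asks for the second element of a one-element list,
# so it yields a list holding NULL rather than a word
missing <- words[1][2]
```

Single brackets always return a container of the same type as the original, while double brackets extract the element stored inside it.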

How about:

con <- file(fileName, open='r')
text <- readLines(con)[[1]]
close(con)  # release the connection once the lines have been read

to get the text of the first line of the file.

Etana answered 4/6, 2016 at 16:11 Comment(0)
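
As a possible variation on the answer above, readLines() also accepts a file path directly and an n argument that limits how many lines are read. The sketch below assumes a temporary file created only for the example; tmp and first are illustrative names.

```r
# Hypothetical sample file, written only so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# Passing a path lets readLines() open and close the connection itself;
# n = 1 stops after the first line instead of reading the whole file
first <- readLines(tmp, n = 1)

unlink(tmp)   # remove the temporary file
```

This avoids reading all lines just to keep the first one, which may matter for large files.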
