Mapping a flat text file

3

1

In a text file, the end of each line is marked by \n. To find a particular line you therefore have to read the file from the beginning, which is a big problem for large files (say 2 GB). I am looking for a way to read a single line without walking through the entire file (though I know it will probably be a complicated process).

  1. The first way I know of is to use fseek() with an offset; but it is not practical.
  2. Creating a flat key/value file; but I am not sure there is a way to avoid loading the entire file into RAM (it should work something like reading an array in PHP).
  3. Alternatively, can we put a number at the beginning of each line? I mean, is it possible to read only the leading digits of a line and skip the rest of its content (moving straight to the next line)?

    768| line content is here
    769| another line
    770| something
    

If only the leading digits have to be read, the total amount of data read is small even for large files.

Cherianne answered 10/10, 2011 at 7:55 Comment(4)
You can always read the entire file line-by-line in a loop, extracting the starting digits and discarding the remaining line. However you need to be mindful of the performance. For a 2 GB file, this can take quite some time.Radioisotope
All I am looking for is to avoid reading the entire file line by line.Cherianne
Do you need to read specific lines that can be indexed by line number? If so, just do a binary search. Read (say) 200 characters in the middle of the file to find out a line number. Then repeat in either of the halves until you get to the right line.Curd
My guess is that you'll have to read the whole file - or at least until you find the line you're interested in - unless all lines have the same length (which I can see they do not).Radioisotope
1

Do you need to read specific lines that can be indexed by line number? If so, just do a binary search. Read (say) 200 characters in the middle of the file to find out a line number. Then repeat in either of the halves until you get to the right line.

Curd answered 10/10, 2011 at 8:8 Comment(2)
The lines are of variable length. How would you find out the line number by reading 200 characters in the middle of the file?Radioisotope
If you don't find a line number in the 200 characters, just keep reading forward (or backwards) until you do. Then when you do have a line number, continue with the binary search algorithm :)Guthrie
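
The binary search described in this answer and its comments can be sketched roughly as follows. This is only an illustration in Python, not the answerer's code: it assumes every line starts with an ascending number followed by `|` (as in the question's `768| line content is here` example), and the function name `find_numbered_line` is made up for the example. Instead of reading a fixed 200 characters, it seeks to the midpoint and reads forward to the next line start.

```python
import os

def find_numbered_line(path, target):
    """Binary-search a file whose lines look like b'768| text\\n', sorted by number.

    Returns the text after the '|' for the line numbered `target`, or None.
    Assumes every line carries a numeric prefix (hypothetical file layout).
    """
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)  # the target line starts somewhere in [lo, hi)
        while lo < hi:
            mid = (lo + hi) // 2
            if mid == 0:
                f.seek(0)
            else:
                f.seek(mid - 1)
                f.readline()               # skip to the first line start at or after mid
            start = f.tell()
            line = f.readline()
            if not line or start >= hi:
                hi = mid                   # no line starts in [mid, hi): look left
                continue
            prefix, _, content = line.partition(b"|")
            number = int(prefix)
            if number == target:
                return content.strip().decode("utf-8")
            if number < target:
                lo = f.tell()              # continue after the line just read
            else:
                hi = mid                   # target starts before mid
    return None
```

Each probe reads at most two lines, so a lookup touches on the order of log2(file size) short reads rather than the whole file.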
0

I think there is no simple way to do what you want. The records have variable length, and no length can be determined in advance, right?

If the file is always the same (or at least not modified frequently), I'd put it into a database, or at least create an index file (record number: offset) and use that with fseek().

Klinger answered 10/10, 2011 at 8:18 Comment(2)
What would you suggest to quickly create and update the index file?Cherianne
If you write the file line by line, as you mentioned in the comment below, you can create the index in parallel. Just accumulate the offsets (the total length of the preceding data) and, for each line of your file, store a fixed-length record (using pack, for example) in the index file. You can wrap it all in a single class and use it wherever you need. To read a line, calculate its offset in the index (linenum * recordlength), fseek there and read recordlength bytes, unpack them, then fseek in the text file and read the line. It may seem complicated, but it is a common approach to indexing.Klinger
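
A minimal sketch of the scheme in this comment, written in Python rather than PHP. The 8-byte little-endian record format and the names `append_line` and `read_line` are assumptions chosen for the example; `struct.pack` plays the role of PHP's `pack`.

```python
import os
import struct

RECORD = struct.Struct("<Q")  # assumed index record: one 8-byte offset per line

def append_line(text_path, index_path, line):
    """Append one line to the text file and store its start offset in the index."""
    with open(text_path, "ab") as text, open(index_path, "ab") as index:
        offset = text.seek(0, os.SEEK_END)   # current end of file = start of the new line
        text.write(line.encode("utf-8") + b"\n")
        index.write(RECORD.pack(offset))

def read_line(text_path, index_path, line_number):
    """Return line `line_number` (0-based) using two seeks, or None if out of range."""
    if line_number < 0:
        return None
    with open(index_path, "rb") as index:
        index.seek(line_number * RECORD.size)  # linenum * recordlength
        raw = index.read(RECORD.size)
    if len(raw) < RECORD.size:
        return None                            # past the end of the index
    (offset,) = RECORD.unpack(raw)
    with open(text_path, "rb") as text:
        text.seek(offset)
        return text.readline().rstrip(b"\n").decode("utf-8")
```

Because every index record has the same size, looking up line n costs one small read from the index and one line read from the text file, regardless of how large the text file is.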
0

Alternatively, you can index your text file once up front and then carry on with your daily operation of picking out single lines based on the index file. You can find how to index your text file here or here. Indexing a text file is no different from indexing a CSV or variable-record file.
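
For a file that already exists, the one-time indexing pass might look like the sketch below (Python). The index layout, one 8-byte offset per line, and the name `build_index` are assumptions for illustration and are not taken from the linked resources.

```python
import struct

OFFSET = struct.Struct("<Q")  # assumed index record: 8-byte little-endian offset

def build_index(text_path, index_path):
    """Scan the text file once, writing the start offset of every line.

    Returns the number of lines indexed.
    """
    count = 0
    offset = 0
    with open(text_path, "rb") as text, open(index_path, "wb") as index:
        for line in text:                 # streams the file; nothing is held in RAM
            index.write(OFFSET.pack(offset))
            offset += len(line)           # the next line starts right after this one
            count += 1
    return count
```

The full scan is paid once; afterwards line n can be located by reading the record at byte `n * OFFSET.size` of the index and seeking to that offset in the text file.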

Rozanneroze answered 27/12, 2019 at 15:37 Comment(0)
