fseek() by line, not bytes?
H

5

18

I have a script that parses large files line by line. When it encounters an error that it can't handle, it stops, notifying us of the last line parsed.

Is this really the best / only way to seek to a specific line in a file? (fseek() is not usable in my case.)

<?php

for ($i = 0; $i < 100000; $i++)
    fgets($fp); // just discard this

I don't have a problem using this, it is fast enough - it just feels a bit dirty. From what I know about the underlying code, I don't imagine there is a better way to do this.

Homemaking answered 27/8, 2010 at 22:35 Comment(0)
S
37

An easy way to seek to a specific line in a file is to use the SplFileObject class, which supports seeking to a line number (seek()) or byte offset (fseek()).

$file = new SplFileObject('myfile.txt');
$file->seek(9999);     // Seek to line no. 10,000
echo $file->current(); // Print contents of that line

Behind the scenes, seek() just does what your PHP code does: it reads through the preceding lines, only in C rather than in PHP.

Straitlaced answered 30/9, 2011 at 21:32 Comment(7)
Nice! Came across this a while ago and started using it. - Homemaking
In this case, will seek() read line 10,000 directly, without walking through lines 1 to 9,999 to reach it? - Yusem
@Ali: no, how do you think it knows where the lines start? It reads through the file. There are other alternatives if you do want to seek directly to a line, but they involve potentially complex systems to keep track of where lines start in the file. - Straitlaced
Could you please give me some hints? I searched a lot to find a practical way to read a line without reading the entire file (considering big files, GBs in size). - Yusem
@Ali: If I recall correctly, there is a question here on SO with details of one implementation, or I could share the details of my own (though comments don't offer enough space). Sorry, I don't have a link for the question that I (think that I) saw. - Straitlaced
Thanks for your kind attention, salsathe. Please take a look at this question: https://mcmap.net/q/669295/-mapping-a-flat-text-file - Yusem
There seems to be a bug with large files: after a certain line number, seek() just stays on the same line, creating infinite loops if used with a while loop on ->eof(). - Declamatory
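
The comments above mention keeping track of where lines start. A minimal sketch of that idea, assuming the file does not change between building the index and using it, could look like the following. The function names buildLineIndex() and readLineAt() are made up for illustration, and a real index for a multi-GB file would probably be stored on disk rather than held in a PHP array.

```php
<?php
// Hypothetical helper: scan the file once and record the byte offset
// at which each line starts (index 0 = first line).
function buildLineIndex(string $path): array
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $offsets = [];
    while (true) {
        $pos = ftell($fp);          // start of the line about to be read
        if (fgets($fp) === false) { // no more lines
            break;
        }
        $offsets[] = $pos;
    }
    fclose($fp);
    return $offsets;
}

// Hypothetical helper: use a prebuilt index to jump straight to a
// zero-based line number. Returns null if the line is not in the index.
function readLineAt(string $path, array $offsets, int $line): ?string
{
    if (!isset($offsets[$line])) {
        return null;
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fp, $offsets[$line]);
    $text = fgets($fp);
    fclose($fp);
    return $text === false ? null : $text;
}
```

Building the index still reads the whole file once. The gain only appears when the same file is consulted many times.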
W
5

If you only have the line number to go on, there is no other method of finding the line. Files are not line based (or even character based), so there is no way to simply jump to a specific line in a file.

There may be other ways of reading the lines that are slightly faster, such as reading larger chunks of the file into a buffer and reading lines from that, but you could only hope for a gain of a few percent. Any method of finding a specific line in a file still has to read all the data up to that line.
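
As a rough sketch of the chunked approach described above (not a benchmarked implementation), the following counts newlines in fixed-size blocks instead of calling fgets() once per line. The name findLineOffset() and the 64 KB default chunk size are assumptions for illustration, and it assumes "\n" line endings.

```php
<?php
// Hypothetical helper: return the byte offset just after the $line-th
// newline, i.e. where zero-based line $line would start, or null if the
// file contains fewer than $line newlines. For $line = 0 it returns 0
// without checking that the file is non-empty.
function findLineOffset($fp, int $line, int $chunkSize = 65536): ?int
{
    rewind($fp);
    if ($line === 0) {
        return 0;
    }
    $remaining = $line; // newlines still to skip
    $base = 0;          // byte offset of the start of the current chunk
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $count = substr_count($chunk, "\n");
        if ($count < $remaining) {
            // Target is not in this chunk: skip it entirely.
            $remaining -= $count;
            $base += strlen($chunk);
            continue;
        }
        // The target newline is inside this chunk: walk to it.
        $pos = -1;
        for ($i = 0; $i < $remaining; $i++) {
            $pos = strpos($chunk, "\n", $pos + 1);
        }
        return $base + $pos + 1;
    }
    return null;
}
```

The returned offset can be passed to fseek() before resuming normal fgets() calls. Whether this is measurably faster than the plain loop depends on the file and would need to be timed.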

Want answered 27/8, 2010 at 22:44 Comment(1)
Yeah, I figured as much. Somehow I thought that a nice fseekbyline() that was just a wrapper for the C code would make me feel better. heh. - Homemaking
T
4

I know this is a late answer, but it may help some people: I once wrote a function along the lines of fseekbyline.

function GoToLine($handle, $line)
{
  fseek($handle, 0);  // rewind to the start of the file
  $bufcarac = 0;      // number of bytes read so far

  for ($i = 1; $i < $line; $i++)
  {
    $ligne = fgets($handle);
    $bufcarac += strlen($ligne);  // after the loop, $bufcarac holds the byte count of everything before the target line
  }

  fseek($handle, $bufcarac);  // position the pointer at the start of the target line
}

There is no error handling: if you ask for a line number below 1, or for line 203 of an empty file, you will get nothing useful.

The same applies if you try to go past the end of the file.

Therefor answered 30/9, 2011 at 21:22 Comment(1)
By the time PHP has finished the for loop, the pointer is already where you want it. Simply calling fgets($handle) inside the for loop is enough, and it avoids holding data in the $bufcarac and $ligne variables. - Souther
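
A minimal sketch of the simplification suggested in the comment above, with basic error reporting added. The name skipToLine() and the boolean return value are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: leave $handle positioned at the start of
// 1-based line $line. Returns false if $line is below 1 or if the
// file runs out before $line - 1 lines could be skipped.
function skipToLine($handle, int $line): bool
{
    if ($line < 1) {
        return false;
    }
    rewind($handle);
    for ($i = 1; $i < $line; $i++) {
        if (fgets($handle) === false) {
            return false; // ran out of lines before reaching the target
        }
    }
    return true;
}
```

Because fgets() already advances the file pointer, no byte counting or final fseek() is needed.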
T
1
rewind($handle);

for ($i=0; $i < $desired_line; $i++) {
    fgetcsv($handle, 1000, ",");
}

This works for me when I need to rewind to a specific line multiple times in my script.

I am not sure if this eats up memory or speed, but it does the trick.

Toreador answered 22/7, 2014 at 14:7 Comment(1)
This is short and to the point, although fgetcsv is specific to CSV files rather than any text file. It's helpful for me at least. - Souther
L
0

If I understand correctly, you want to seek to the specific line at some point after you have found an error. If that is the case, you probably store or print the line-number of the bad line somewhere, depending on what you mean by "notify".

Unless you really mean that you cannot use fseek()*, you can also store or print the byte position in the file where the bad line starts. Then you can fseek() to it.

* How, in that case, would fseekbyline() be usable if it existed?
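
One way this could look, as a sketch only: record ftell() before each fgets() so the byte offset of the failing line is reported along with its line number. The function parseUntilError() and the $isValid callback are placeholders for whatever the real parser does.

```php
<?php
// Hypothetical parser loop: $isValid stands in for the real per-line check.
// Returns null if every line passes, otherwise the 1-based line number
// and the byte offset at which the failing line starts.
function parseUntilError($fp, callable $isValid): ?array
{
    $lineNo = 0;
    while (true) {
        $offset = ftell($fp);  // where the next line starts
        $line = fgets($fp);
        if ($line === false) {
            return null;       // reached the end of the file without errors
        }
        $lineNo++;
        if (!$isValid($line)) {
            return ['line' => $lineNo, 'offset' => $offset];
        }
    }
}
```

On a later run, fseek($fp, $info['offset']) would return directly to the failing line without re-reading everything before it, provided the file has not been modified in the meantime.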

Lathy answered 28/8, 2010 at 1:21 Comment(0)
