I was wondering if anyone out there knew how this could be done in PHP. I am running a script that opens a file, takes the first 1000 lines, does some stuff with those lines, and then opens another instance of itself to take the next thousand lines, and so on until it reaches the end of the file. I'm using SplFileObject so that I can seek to a certain line, which lets me break this up into 1000-line chunks quite well. The biggest problem I'm having is with performance. I'm dealing with files that have upwards of 10,000,000 lines, and while it does the first 10,000 lines or so quite fast, there is a huge exponential slowdown after that point, which I think comes just from having to seek to that point.
What I would like to do is read the first thousand lines, then just delete them from the file so that my script is always reading the first thousand lines. Is there a way to do this without reading the rest of the file into memory? Other solutions I have seen involve reading each line into an array and then getting rid of the first X entries, but with ten million lines that will eat up too much memory and time.
If anyone has a solution or other suggestions that would speed up the performance, it would be greatly appreciated.
… split-ing your file into several n thousand line files, or is there some reason it must be one big file? – Formless
… SplFileObject's seek() method, the file is still being read all the way up to where you're seeking to (each line is read then thrown away). It is not the same as fseek()-ing to a byte offset. – Formless
… SplFileObject::seek() is the culprit. It should be taking in the order of second(s) at most to read 10,000,000+ lines. – Formless
… SplFileObject's fault (especially on Windows), but without you being able to show that it is the cause I would remain skeptical. – Formless
… tell() or whatever it is in SplFileObject. That's a simple count of bytes to skip over, and will be very fast since PHP doesn't have to scan/count line endings. Once you've seeked to the proper location, THEN you can start counting lines. – Trinitarianism
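The byte-offset idea from the comments could be sketched roughly as follows. This is only a minimal sketch using the plain fopen()/fseek()/fgets()/ftell() functions rather than SplFileObject; the helper name readChunk() and its parameters are made up for illustration and are not part of the original script. Each call jumps straight to a saved byte position, reads up to 1000 lines, and returns the position where the next call should resume.

```php
<?php
/**
 * Hypothetical helper (not from the original script): read up to $maxLines
 * lines starting at byte offset $offset.
 * Returns [array of lines, byte offset at which the next chunk starts].
 */
function readChunk(string $path, int $offset, int $maxLines = 1000): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Jump directly to the saved byte position; no lines are scanned or counted.
    fseek($handle, $offset);

    $lines = [];
    while (count($lines) < $maxLines && ($line = fgets($handle)) !== false) {
        // Strip the trailing line ending so callers get the bare line.
        $lines[] = rtrim($line, "\r\n");
    }

    // Remember where this chunk ended so the next call can resume there.
    $nextOffset = ftell($handle);
    fclose($handle);

    return [$lines, $nextOffset];
}
```

The first call would use offset 0. The returned offset then has to be carried over to the next run, for example as an argument to the next instance of the script or in a small state file (both are assumptions about how the script is launched). An empty lines array means the end of the file has been reached.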