Does fread fail for large files?
C

5

7

I have to analyze a 16 GB file. I am reading through the file sequentially using fread() and fseek(). Is it feasible? Will fread() work for such a large file?

Conundrum answered 29/9, 2010 at 21:12 Comment(3)
Can you clarify the language used? - Laius
I'm gonna go ahead and guess C. - Malemute
Do fread and fseek even exist as standards in another language? - Malemute
6

You don't mention a language, so I'm going to assume C.

I don't see any problems with fread, but fseek and ftell may have issues.

Those functions use long int as the data type to hold the file position, rather than something intelligent like fpos_t or even size_t. Where long int is 32 bits, it cannot represent a position past 2 GB (2^31 - 1 bytes), so these functions can fail on any file over 2 GB and will certainly fail somewhere in a 16 GB file.

You need to see how big long int is on your platform. If it's 64 bits, you're fine. If it's 32, you are likely to have problems when using ftell to measure distance from the start of the file.
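A quick way to check this is to compare against LONG_MAX from <limits.h>. The sketch below is only illustrative; the helper names long_can_hold_offset and report_long_width are made up for this example.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a long can represent the given byte
   offset, 0 otherwise. The offset is passed as unsigned long long so that
   values above 2 GB can be expressed even where long is 32 bits. */
int long_can_hold_offset(unsigned long long offset)
{
    return offset <= (unsigned long long)LONG_MAX;
}

/* Hypothetical helper: prints the width of long on this platform. */
void report_long_width(void)
{
    printf("long is %u bits, LONG_MAX = %ld\n",
           (unsigned)(sizeof(long) * CHAR_BIT), LONG_MAX);
}
```

If long_can_hold_offset(16ULL << 30) returns 0, fseek and ftell cannot address the end of a 16 GB file on that platform.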

Consider using fgetpos and fsetpos instead.
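As a minimal sketch of that suggestion, the function below remembers a position with fgetpos and returns to it with fsetpos. The name peek_byte is hypothetical and only serves the illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reads one byte, then goes back to where it started
   using fgetpos/fsetpos, which keep the position in an opaque fpos_t
   rather than in a long. Returns the byte read, or EOF on failure. */
int peek_byte(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)   /* remember the current position */
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &pos) != 0)   /* jump back to it */
        return EOF;
    return c;
}
```

Note that fpos_t is opaque: you can only return to a position you recorded earlier, not compute a new one by arithmetic. If you need to jump to an arbitrary byte offset, a 64-bit seek function is still required.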

Prognostic answered 29/9, 2010 at 21:35 Comment(2)
The comment "...you can't legally use non-zero fseek offsets without a call to ftell" is only true for files opened in text mode. Files opened in binary mode can use SEEK_SET and SEEK_CUR with arbitrary offsets. - Humpbacked
@caf: Thanks. My answer has been changed as you suggested. - Prognostic
6

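Thanks for the response. I figured out where I was going wrong: fseek() and ftell() do not work for large files when long is 32 bits (the limit is 2 GB rather than 4 GB, since long is signed). I used the Microsoft-specific _fseeki64() and _ftelli64() instead and it is working fine now.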
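For code that has to build on more than one platform, one possible approach is a thin wrapper that picks _fseeki64/_ftelli64 with the Microsoft runtime and fseeko/ftello on POSIX systems. This is a sketch, not a standard API: the names seek64 and tell64 are invented here, and it assumes _WIN32 is a good enough test for the Microsoft runtime.

```c
/* Feature macros must come before any #include. They matter on POSIX
   systems (32-bit builds in particular) and are harmless elsewhere. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdint.h>
#ifndef _WIN32
#include <sys/types.h>   /* off_t */
#endif

/* Hypothetical wrapper: seek using a 64-bit offset on either platform.
   Returns 0 on success, nonzero on failure, like fseek. */
int seek64(FILE *fp, int64_t offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(fp, offset, whence);       /* Microsoft runtime */
#else
    return fseeko(fp, (off_t)offset, whence);   /* POSIX */
#endif
}

/* Hypothetical wrapper: report the current position as a 64-bit value,
   or -1 on failure. */
int64_t tell64(FILE *fp)
{
#ifdef _WIN32
    return _ftelli64(fp);
#else
    return (int64_t)ftello(fp);
#endif
}
```

With this in place, seek64(fp, (int64_t)5 << 30, SEEK_SET) moves to the 5 GB mark without going through a long.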

Conundrum answered 29/9, 2010 at 23:0 Comment(0)
3

If implemented correctly this shouldn't be a problem. I assume by sequentially you mean you're looking at the file in discrete chunks and advancing your file pointer.

Check out http://www.computing.net/answers/programming/using-fread-with-a-large-file-/10254.html

It sounds like he was doing nearly the same thing as you.

Malemute answered 29/9, 2010 at 21:23 Comment(0)
2

It depends on what you want to do. If you want to read the whole 16 GB of data into memory, then chances are that you'll run out of memory or application heap space.

Instead, read the data chunk by chunk and process each chunk as you go (and free resources when done).
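A minimal sketch of chunked reading with fread is shown here. Counting newlines stands in for whatever processing you actually need; the function name count_newlines and the 1 MiB chunk size are arbitrary choices made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (1u << 20)   /* 1 MiB per read; an arbitrary choice */

/* Hypothetical example of "processing": counts newline bytes in a stream
   while holding only one chunk in memory at a time. Returns 0 on success,
   -1 on allocation or read error. */
int count_newlines(FILE *fp, unsigned long long *count)
{
    unsigned char *buf = malloc(CHUNK_SIZE);
    size_t n, i;

    if (buf == NULL)
        return -1;
    *count = 0;
    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(buf, 1, CHUNK_SIZE, fp)) > 0) {
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                (*count)++;
    }
    free(buf);                      /* release the chunk buffer when done */
    return ferror(fp) ? -1 : 0;     /* distinguish a read error from EOF */
}
```

The running total is an unsigned long long so it cannot overflow on a 16 GB input, and no file position is ever stored in a long.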

Besides all this, decide which approach you want to use (fread(), C++ istream, etc.) and run some test cases to see which works better for you.

Circumscribe answered 29/9, 2010 at 21:19 Comment(0)
2

If you're on a POSIX-ish system, you'll need to make sure you've built your program with 64-bit file offset support. POSIX requires (or at least allows, and most systems enforce this) the implementation to refuse I/O operations on files whose size doesn't fit in off_t, even if the only I/O being performed is sequential with no seeking.

On Linux, this means you need to use -D_FILE_OFFSET_BITS=64 on the gcc command line. This matters for 32-bit builds; on 64-bit Linux, off_t is already 64 bits.
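To confirm the setting took effect, one option is a compile-time check on sizeof(off_t). The sketch below assumes a C11 compiler (for _Static_assert) and a POSIX system; the helper name file_size is made up for the example.

```c
/* Defining the macro before any #include has the same effect as passing
   -D_FILE_OFFSET_BITS=64 on the command line. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <sys/types.h>

/* Fail the build if off_t is still narrower than 64 bits. */
_Static_assert(sizeof(off_t) >= 8, "off_t must be at least 64 bits");

/* Hypothetical helper: returns the size of an open file in bytes using
   fseeko/ftello, restoring the original position. Returns -1 on error. */
off_t file_size(FILE *fp)
{
    off_t start, end;

    if ((start = ftello(fp)) < 0)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    end = ftello(fp);
    if (fseeko(fp, start, SEEK_SET) != 0)
        return -1;
    return end;
}
```

If the static assertion fires, the build is still using a 32-bit off_t and large files may be rejected.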

Anhydrite answered 29/9, 2010 at 22:56 Comment(0)
