Objective-C: Reading a file line by line
J

18

141

What is the appropriate way of dealing with large text files in Objective-C? Let's say I need to read each line separately and want to treat each line as an NSString. What is the most efficient way of doing this?

One solution is using the NSString method:

+ (id)stringWithContentsOfFile:(NSString *)path 
      encoding:(NSStringEncoding)enc 
      error:(NSError **)error 

and then split the lines with a newline separator, and then iterate over the elements in the array. However, this seems fairly inefficient. Is there no easy way to treat the file as a stream, enumerating over each line, instead of just reading it all in at once? Kinda like Java's java.io.BufferedReader.

Juback answered 25/6, 2009 at 14:41 Comment(2)
A bit late, but check out [NSScanner scanUpToString:@"\n" intoString:&read], assuming you want to read each line into the string 'read'.Bugaboo
Please have a look at this similar question. I set up a project which deals with reading files line by line.Zincograph
N
64

That's a great question. I think @Diederik has a good answer, although it's unfortunate that Cocoa doesn't have a mechanism for exactly what you want to do.

NSInputStream allows you to read chunks of N bytes (very similar to java.io.BufferedReader), but you have to convert it to an NSString on your own, then scan for newlines (or whatever other delimiter) and save any remaining characters for the next read, or read more characters if a newline hasn't been read yet. (NSFileHandle lets you read an NSData which you can then convert to an NSString, but it's essentially the same process.)
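
As a rough illustration of the loop described above (read a chunk, scan for a newline, keep the remainder for the next read), here is a minimal sketch in plain C, which also compiles as Objective-C. It uses fread from <stdio.h> rather than NSInputStream, and the chunk_reader type and chunk_reader_next function are hypothetical names invented for this example. It assumes '\n' line endings and leaves the conversion to NSString to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state: bytes already read but not yet returned. */
typedef struct {
    FILE *file;
    char *pending;       /* leftover bytes from earlier chunks */
    size_t pending_len;
    size_t chunk_size;   /* must be greater than zero */
} chunk_reader;

/* Copies `length` bytes into a new NUL-terminated string. */
static char *copy_bytes(const char *bytes, size_t length)
{
    char *copy = malloc(length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, bytes, length);
    copy[length] = '\0';
    return copy;
}

/* Returns the next line without its '\n' (caller frees it),
   or NULL at end-of-file or on allocation failure. */
char *chunk_reader_next(chunk_reader *r)
{
    for (;;) {
        /* Is there already a complete line in the leftover bytes? */
        char *newline = r->pending_len
            ? memchr(r->pending, '\n', r->pending_len) : NULL;
        if (newline != NULL) {
            size_t line_len = (size_t)(newline - r->pending);
            char *line = copy_bytes(r->pending, line_len);
            if (line == NULL) return NULL;
            /* Drop the line and its newline, keep the rest. */
            r->pending_len -= line_len + 1;
            memmove(r->pending, newline + 1, r->pending_len);
            return line;
        }

        /* No newline yet: read another chunk and append it. */
        char *grown = realloc(r->pending, r->pending_len + r->chunk_size);
        if (grown == NULL) return NULL;
        r->pending = grown;
        size_t got = fread(r->pending + r->pending_len, 1,
                           r->chunk_size, r->file);
        if (got == 0) {
            /* End of file: hand back a final unterminated line, if any. */
            if (r->pending_len == 0) return NULL;
            char *line = copy_bytes(r->pending, r->pending_len);
            r->pending_len = 0;
            return line;
        }
        r->pending_len += got;
    }
}
```

Each returned line is heap-allocated and must be freed by the caller, and r->pending should be freed once reading is done. With Foundation available, a line could be wrapped with [NSString stringWithUTF8String:line], provided the file really is UTF-8.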

Apple has a Stream Programming Guide that can help fill in the details, and this SO question may help as well if you're going to be dealing with uint8_t* buffers.

If you're going to be reading strings like this frequently (especially in different parts of your program) it would be a good idea to encapsulate this behavior in a class that can handle the details for you, or even subclassing NSInputStream (it's designed to be subclassed) and adding methods that allow you to read exactly what you want.

For the record, I think this would be a nice feature to add, and I'll be filing an enhancement request for something that makes this possible. :-)


Edit: Turns out this request already exists. There's a Radar dating from 2006 for this (rdar://4742914 for Apple-internal people).

Nettle answered 25/6, 2009 at 18:1 Comment(2)
See Dave DeLong's comprehensive approach to this problem here: stackoverflow.com/questions/3707427#3711079Nettle
It's also possible to use plain NSData and memory mapping. I have created an answer with example code which has the same API as Dave DeLong's NSFileHandle implementation: https://mcmap.net/q/89218/-objective-c-reading-a-file-line-by-lineEleen
A
94

This works for general reading of strings from a text file. If you need to read a large file, use one of the buffered approaches other people here have mentioned, which keep only part of the text in memory at a time.

Say you read a Text File.

NSString* filePath = @""; // file path...
NSString* fileRoot = [[NSBundle mainBundle] 
               pathForResource:filePath ofType:@"txt"];

You want to split the text on newlines.

// read everything from text
NSString* fileContents = 
      [NSString stringWithContentsOfFile:fileRoot 
       encoding:NSUTF8StringEncoding error:nil];

// first, separate by new line
NSArray* allLinedStrings = 
      [fileContents componentsSeparatedByCharactersInSet:
      [NSCharacterSet newlineCharacterSet]];

// then break down even further 
NSString* strsInOneLine = 
      [allLinedStrings objectAtIndex:0];

// split on whatever delimiter your data uses; in this case ';'
NSArray* singleStrs = 
      [strsInOneLine componentsSeparatedByCharactersInSet:
      [NSCharacterSet characterSetWithCharactersInString:@";"]];

There you have it.

Angleaangler answered 17/12, 2010 at 1:23 Comment(3)
I have a 70 MB file. Using this code to read the file doesn't help me; memory use increases linearly. Can anyone help me?Padre
This is no response to the question. The question was to read a file line by line to reduce memory usageBargainbasement
This is the most brute-force way to do it, with no care for Memory footprint or performance - What will happen if you run this on a 100MB log file with zillions of lines?Chancelor
D
34

This should do the trick:

#include <stdio.h>

NSString *readLineAsNSString(FILE *file)
{
    char buffer[4096];

    // tune this capacity to your liking -- larger buffer sizes will be faster, but
    // use more memory
    NSMutableString *result = [NSMutableString stringWithCapacity:256];

    // Read up to 4095 non-newline characters at a time
    int charsRead;
    do
    {
        charsRead = 0;
        if(fscanf(file, "%4095[^\n]%n", buffer, &charsRead) == 1)
            [result appendFormat:@"%s", buffer];
        else
            break; // empty line or end-of-file
    } while(charsRead == 4095);

    // Discard the newline that ended the line; at end-of-file this sets the EOF flag.
    // Doing this separately lets empty lines through and never eats a line character.
    (void)fgetc(file);

    return result;
}

Use as follows:

FILE *file = fopen("myfile", "r");
// check for NULL
while(!feof(file))
{
    NSString *line = readLineAsNSString(file);
    // do stuff with line; line is autoreleased, so you should NOT release it (unless you also retain it beforehand)
}
fclose(file);

This code reads non-newline characters from the file, up to 4095 at a time. If you have a line that is longer than 4095 characters, it keeps reading until it hits a newline or end-of-file.

Note: I have not tested this code. Please test it before using it.

Dapsang answered 25/6, 2009 at 18:18 Comment(8)
just change [result appendFormat:"%s", buffer]; to [result appendFormat:@"%s", buffer];Largeminded
how would you modify the format to accept empty lines, or rather lines consisting of a single newline character?Katzman
This is stopping early for me after 812 lines. The 812th line is "... 3 more", and that's making the reader output empty strings.Mucro
One issue though is that you don't always know your EOL is going to be \n. Users can add documents made from windows machines, and the old legacy code uses \r, so the method might need to look for \n, \r, and \r\nMyel
Does not work if you have empty lines in your file. It gets stuck reading the empty line over and over.Thoraco
I added a check to get past the empty lines: int fscanResult = fscanf(file, "%4095[^\n]%n%*c", buffer, &charsRead); if(fscanResult == 1) { [result appendFormat:@"%s", buffer]; } else { if (feof(file)) { break; } else if(ferror(file) != 0) { break; } fscanf(file, "\n", nil, &charsRead); break; }Thoraco
Can you convert this to Swift and add it as an answer to #24582017?Biplane
If I'm reading the fscanf documentation right, "%4095[^\n]%n%*c" will silently consume and throw away one character with each buffer read. It looks like this format assumes that lines will be shorter than the buffer length.Fizgig
M
12

Mac OS X is Unix, and Objective-C is a superset of C, so you can just use old-school fopen and fgets from <stdio.h>. It's guaranteed to work.

[NSString stringWithUTF8String:buf] will convert a C string to an NSString. There are also methods for creating strings in other encodings and for creating them without copying.
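
A minimal sketch of the fgets approach, assuming plain C compiled as part of an Objective-C file. The function name read_line_fgets is hypothetical. It grows its buffer so that long lines are not cut off, strips a trailing "\n" or "\r\n", and assumes the file contains no embedded NUL bytes (fgets cannot report them).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the next line without its line ending (caller frees it),
   or NULL at end-of-file or on allocation failure. */
char *read_line_fgets(FILE *file)
{
    size_t capacity = 128, length = 0;
    char *line = malloc(capacity);
    if (line == NULL) return NULL;

    while (fgets(line + length, (int)(capacity - length), file) != NULL) {
        length += strlen(line + length);

        if (length > 0 && line[length - 1] == '\n') {
            /* Complete line: strip "\n", and "\r" if it was "\r\n". */
            line[--length] = '\0';
            if (length > 0 && line[length - 1] == '\r') line[--length] = '\0';
            return line;
        }

        if (length + 1 == capacity) {
            /* Buffer filled before a newline was seen: grow and keep reading. */
            char *grown = realloc(line, capacity * 2);
            if (grown == NULL) { free(line); return NULL; }
            line = grown;
            capacity *= 2;
        }
    }

    /* fgets returned NULL: end-of-file (or an error). */
    if (length == 0) { free(line); return NULL; }
    return line;   /* last line had no trailing newline */
}
```

The returned C string can then be passed to [NSString stringWithUTF8String:], or to another initializer if the file is not UTF-8.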

Moderate answered 3/8, 2010 at 23:30 Comment(2)
[copying anonymous comment] fgets will include the '\n' character, so you may want to strip that off before converting the string.Moderate
[NSString stringWithUTF8String:buf] will only work if the contents of 'buf' are indeed a UTF-8 string (who guarantees that?). Also, old-school C APIs fall short of most modern needs for text in general, and text files in particular. Line endings vary and may span more than one byte, zeros (nulls) may appear in a text, multi-byte characters may appear in text, and sizes may be very large. I think the "old-school" APIs only fit "old-school" needs.Chancelor
F
10

You can use NSInputStream which has a basic implementation for file streams. You can read bytes into a buffer (read:maxLength: method). You have to scan the buffer for newlines yourself.

Foreland answered 25/6, 2009 at 16:46 Comment(0)
Q
8

The appropriate way to read text files in Cocoa/Objective-C is documented in Apple's String programming guide. The section for reading and writing files should be just what you're after. PS: What's a "line"? Two sections of a string separated by "\n"? Or "\r"? Or "\r\n"? Or maybe you're actually after paragraphs? The previously mentioned guide also includes a section on splitting a string into lines or paragraphs. (This section is called "Paragraphs and Line Breaks", and is linked to in the left-hand-side menu of the page I pointed to above. Unfortunately this site doesn't allow me to post more than one URL as I'm not a trustworthy user yet.)
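
To make the "what is a line?" question concrete, here is a small sketch in plain C that treats "\n", "\r", and "\r\n" each as a single line terminator. The function name read_line_any_eol is hypothetical, and the sketch reads byte by byte with fgetc for clarity rather than speed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the next line without its terminator (caller frees it),
   or NULL at end-of-file or on allocation failure. */
char *read_line_any_eol(FILE *file)
{
    size_t capacity = 64, length = 0;
    char *line = malloc(capacity);
    if (line == NULL) return NULL;

    int c;
    int read_anything = 0;
    while ((c = fgetc(file)) != EOF) {
        read_anything = 1;
        if (c == '\n') break;                 /* Unix ending */
        if (c == '\r') {
            /* Old Mac ending, or the first half of a Windows "\r\n". */
            int next = fgetc(file);
            if (next != '\n' && next != EOF) ungetc(next, file);
            break;
        }
        if (length + 1 == capacity) {
            /* Keep one byte free for the terminating NUL. */
            char *grown = realloc(line, capacity * 2);
            if (grown == NULL) { free(line); return NULL; }
            line = grown;
            capacity *= 2;
        }
        line[length++] = (char)c;
    }

    if (!read_anything) { free(line); return NULL; }
    line[length] = '\0';
    return line;
}
```

Open the file in binary mode on platforms that translate line endings in text mode, so that the "\r" bytes actually reach this function.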

To paraphrase Knuth: premature optimisation is the root of all evil. Don't simply assume that "reading the whole file into memory" is slow. Have you benchmarked it? Do you know that it actually reads the whole file into memory? Maybe it simply returns a proxy object and keeps reading behind the scenes as you consume the string? (Disclaimer: I have no idea if NSString actually does this. It conceivably could.) The point is: first go with the documented way of doing things. Then, if benchmarks show that this doesn't have the performance you desire, optimise.

Quantify answered 26/6, 2009 at 23:4 Comment(3)
Since you mention CRLF (Windows) line endings: That's actually a case that breaks the Objective-C way of doing things. If you use one of the -stringWithContentsOf* methods followed by -componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet], it sees the \r and \n separately and adds a blank line after each line.Gehlbach
That said, the fgets solution fails on CR-only files. But those are (theoretically) rare nowadays, and fgets does work for both LF and CRLF.Gehlbach
reading whole file into memory may not be "slow" per se - but it WILL slow other things, including your own other code, and will finally render your product undesirable. Throughout my 40+ years of commercial programming and software design - I found the OPPOSITE to be true. What you don't optimize at design-time, can hardly ever be improved afterwards. Careless initial software design is the root of all evil. That said - you can use system capabilities (e.g. memory-mapped-files) to play the "high-level" game while keeping performance high.Chancelor
P
6

A lot of these answers are long chunks of code or they read in the entire file. I like to use the C functions for this very task.

FILE* file = fopen("path to my file", "r");
// check for NULL

size_t length;
char *cLine;

// fgetln returns NULL at end-of-file or on error
while ((cLine = fgetln(file, &length)) != NULL) {
    // the returned buffer is not NUL-terminated, so copy it and terminate it
    char str[length+1];
    strncpy(str, cLine, length);
    str[length] = '\0';

    NSString *line = [NSString stringWithFormat:@"%s",str];
    // Do what you want here.
}
fclose(file);

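Note that fgetln keeps the trailing newline character (it is counted in length), so strip it if you don't want it. Also, the buffer fgetln returns is not NUL-terminated, which is why we copy it into str and add 1 to the length to make space for the NUL terminator.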
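
fgetln is a BSD function. As a hedged alternative, the same loop can be sketched with POSIX getline, which allocates and grows the line buffer itself. The helper name longest_line is hypothetical and exists only to give the loop something to compute. The sketch assumes a platform that provides getline (recent macOS and Linux do).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for getline() in strict C modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Returns the length in bytes of the longest line in the file,
   not counting the trailing '\n'. */
size_t longest_line(FILE *file)
{
    char *line = NULL;      /* getline allocates and reuses this buffer */
    size_t capacity = 0;
    ssize_t length;
    size_t longest = 0;

    while ((length = getline(&line, &capacity, file)) != -1) {
        /* getline keeps the newline; strip it before using the line. */
        if (length > 0 && line[length - 1] == '\n') {
            line[--length] = '\0';
        }
        if ((size_t)length > longest) longest = (size_t)length;
        /* An NSString could be created from `line` here. */
    }

    free(line);
    return longest;
}
```

Because getline reuses one buffer across calls, memory use stays proportional to the longest line rather than to the file size.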

Phil answered 7/3, 2016 at 19:50 Comment(0)
K
5

Just like @porneL said, the C API is very handy.

NSString* fileRoot = [[NSBundle mainBundle] pathForResource:@"record" ofType:@"txt"];
FILE *file = fopen([fileRoot UTF8String], "r");
if (file != NULL) {
    char buffer[256];
    // fgets reads at most 255 bytes per call, so longer lines arrive in pieces
    while (fgets(buffer, sizeof(buffer), file) != NULL){
        NSString* result = [NSString stringWithUTF8String:buffer];
        NSLog(@"%@",result);
    }
    fclose(file);
}
Kurr answered 28/11, 2013 at 3:47 Comment(1)
I can't see how this is answering any of the OP's need... you don't read lines of text, but rather arbitrarily cut chunks of 256 bytes - no care for text encoding, white-spaces, different line-endings or actually anything else here. C APIs can't cope with any real world "text" needs. They come from an old world, where everyone speaks ASCII English and store computer data in unix-like byte-streams. That's not the need or question hereChancelor
S
4

Reading a file line by line (even an extremely big file) can be done with the following class:

DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
NSString * line = nil;
while ((line = [reader readLine])) {
  NSLog(@"read line: %@", line);
}
[reader release];

Or:

DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
[reader enumerateLinesUsingBlock:^(NSString * line, BOOL * stop) {
  NSLog(@"read line: %@", line);
}];
[reader release];

The class DDFileReader that enables this is the following:

Interface File (.h):

@interface DDFileReader : NSObject {
    NSString * filePath;

    NSFileHandle * fileHandle;
    unsigned long long currentOffset;
    unsigned long long totalFileLength;

    NSString * lineDelimiter;
    NSUInteger chunkSize;
}

@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;

- (id) initWithFilePath:(NSString *)aPath;

- (NSString *) readLine;
- (NSString *) readTrimmedLine;

#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif

@end

Implementation (.m)

#import "DDFileReader.h"

@interface NSData (DDAdditions)

- (NSRange) rangeOfData_dd:(NSData *)dataToFind;

@end

@implementation NSData (DDAdditions)

- (NSRange) rangeOfData_dd:(NSData *)dataToFind {

    const void * bytes = [self bytes];
    NSUInteger length = [self length];

    const void * searchBytes = [dataToFind bytes];
    NSUInteger searchLength = [dataToFind length];
    NSUInteger searchIndex = 0;

    NSRange foundRange = {NSNotFound, searchLength};
    for (NSUInteger index = 0; index < length; index++) {
        if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
            //the current character matches
            if (foundRange.location == NSNotFound) {
                foundRange.location = index;
            }
            searchIndex++;
            if (searchIndex >= searchLength) { return foundRange; }
        } else {
            searchIndex = 0;
            foundRange.location = NSNotFound;
        }
    }
    return foundRange;
}

@end

@implementation DDFileReader
@synthesize lineDelimiter, chunkSize;

- (id) initWithFilePath:(NSString *)aPath {
    if (self = [super init]) {
        fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
        if (fileHandle == nil) {
            [self release]; return nil;
        }

        lineDelimiter = [[NSString alloc] initWithString:@"\n"];
        [fileHandle retain];
        filePath = [aPath retain];
        currentOffset = 0ULL;
        chunkSize = 10;
        [fileHandle seekToEndOfFile];
        totalFileLength = [fileHandle offsetInFile];
        //we don't need to seek back, since readLine will do that.
    }
    return self;
}

- (void) dealloc {
    [fileHandle closeFile];
    [fileHandle release], fileHandle = nil;
    [filePath release], filePath = nil;
    [lineDelimiter release], lineDelimiter = nil;
    currentOffset = 0ULL;
    [super dealloc];
}

- (NSString *) readLine {
    if (currentOffset >= totalFileLength) { return nil; }

    NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
    [fileHandle seekToFileOffset:currentOffset];
    NSMutableData * currentData = [[NSMutableData alloc] init];
    BOOL shouldReadMore = YES;

    NSAutoreleasePool * readPool = [[NSAutoreleasePool alloc] init];
    while (shouldReadMore) {
        if (currentOffset >= totalFileLength) { break; }
        NSData * chunk = [fileHandle readDataOfLength:chunkSize];
        NSRange newLineRange = [chunk rangeOfData_dd:newLineData];
        if (newLineRange.location != NSNotFound) {

            //include the length so we can include the delimiter in the string
            chunk = [chunk subdataWithRange:NSMakeRange(0, newLineRange.location+[newLineData length])];
            shouldReadMore = NO;
        }
        [currentData appendData:chunk];
        currentOffset += [chunk length];
    }
    [readPool release];

    NSString * line = [[NSString alloc] initWithData:currentData encoding:NSUTF8StringEncoding];
    [currentData release];
    return [line autorelease];
}

- (NSString *) readTrimmedLine {
    return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
  NSString * line = nil;
  BOOL stop = NO;
  while (stop == NO && (line = [self readLine])) {
    block(line, &stop);
  }
}
#endif

@end

The class was written by Dave DeLong.

Sideman answered 17/1, 2013 at 13:53 Comment(0)
C
4

As others have answered, both NSInputStream and NSFileHandle are fine options, but it can also be done in a fairly compact way with NSData and memory mapping:

BRLineReader.h

#import <Foundation/Foundation.h>

@interface BRLineReader : NSObject

@property (readonly, nonatomic) NSData *data;
@property (readonly, nonatomic) NSUInteger linesRead;
@property (strong, nonatomic) NSCharacterSet *lineTrimCharacters;
@property (readonly, nonatomic) NSStringEncoding stringEncoding;

- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding;
- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;
- (NSString *)readLine;
- (NSString *)readTrimmedLine;
- (void)setLineSearchPosition:(NSUInteger)position;

@end

BRLineReader.m

#import "BRLineReader.h"

static unsigned char const BRLineReaderDelimiter = '\n';

@implementation BRLineReader
{
    NSRange _lastRange;
}

- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding
{
    self = [super init];
    if (self) {
        NSError *error = nil;
        _data = [NSData dataWithContentsOfFile:filePath options:NSDataReadingMappedAlways error:&error];
        if (!_data) {
            NSLog(@"%@", [error localizedDescription]);
        }
        _stringEncoding = encoding;
        _lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    }

    return self;
}

- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding
{
    self = [super init];
    if (self) {
        _data = data;
        _stringEncoding = encoding;
        _lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    }

    return self;
}

- (NSString *)readLine
{
    NSUInteger dataLength = [_data length];
    NSUInteger beginPos = _lastRange.location + _lastRange.length;
    NSUInteger endPos = 0;
    if (beginPos == dataLength) {
        // End of file
        return nil;
    }

    unsigned char *buffer = (unsigned char *)[_data bytes];
    for (NSUInteger i = beginPos; i < dataLength; i++) {
        endPos = i;
        if (buffer[i] == BRLineReaderDelimiter) break;
    }

    // End of line found
    _lastRange = NSMakeRange(beginPos, endPos - beginPos + 1);
    NSData *lineData = [_data subdataWithRange:_lastRange];
    NSString *line = [[NSString alloc] initWithData:lineData encoding:_stringEncoding];
    _linesRead++;

    return line;
}

- (NSString *)readTrimmedLine
{
    return [[self readLine] stringByTrimmingCharactersInSet:_lineTrimCharacters];
}

- (void)setLineSearchPosition:(NSUInteger)position
{
    _lastRange = NSMakeRange(position, 0);
    _linesRead = 0;
}

@end
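
For comparison, the memory-mapping idea can be sketched in plain C with POSIX mmap instead of NSData. The function name count_lines_mapped is hypothetical, and the sketch only counts lines to show the scanning loop. It assumes '\n' line endings and a POSIX system.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the number of lines in the file at `path`, or -1 on error.
   A final line without a trailing '\n' still counts as a line. */
long count_lines_mapped(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat info;
    if (fstat(fd, &info) != 0) { close(fd); return -1; }
    if (info.st_size == 0) { close(fd); return 0; }   /* cannot map zero bytes */

    size_t size = (size_t)info.st_size;
    const char *bytes = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (bytes == MAP_FAILED) return -1;

    long lines = 0;
    const char *cursor = bytes;
    const char *end = bytes + size;
    while (cursor < end) {
        const char *newline = memchr(cursor, '\n', (size_t)(end - cursor));
        /* The bytes from cursor up to newline (or end) form one line;
           a real reader would build a string from that range here. */
        lines++;
        cursor = newline ? newline + 1 : end;
    }

    munmap((void *)bytes, size);
    return lines;
}
```

The operating system pages the file in on demand, so the scan does not need the whole file resident in memory at once.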
Carnap answered 21/1, 2014 at 19:36 Comment(0)
R
1

This answer is NOT ObjC but C.

Since ObjC is 'C' based, why not use fgets?

And yes, I'm sure ObjC has its own method - I'm just not proficient enough yet to know what it is :)

Raving answered 25/6, 2009 at 15:19 Comment(9)
If you don't know how to do it in Objective-C, then why say it's not the answer? There are plenty of reasons not to drop down to straight C if you can do it otherwise. For example, C functions handle char* but it takes a lot more work to read something else, such as different encodings. Also, he wants NSString objects. All told, rolling this yourself is not only more code, but also error-prone.Nettle
I agree with you 100%, but I have found that (sometimes) it's better to get an answer that works quickly, implement it and then when a more correct alternative appears, utilize that. This is especially important when prototyping, giving the opportunity to get something to work and then progressing from there.Raving
I just realized that it began "This answer" not "The answer". Doh! I agree, it's definitely better to have a hack that works than elegant code that doesn't. I didn't downvote you, but throwing out a guess w/o knowing what Objective-C may have probably isn't very helpful, either. Even so, making an effort is always better than someone that knows and doesn't help... ;-)Nettle
This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post.Parley
@RoboticCat - 5 years after the fact?Raving
@KevinDTimm:It came up in the review system and so I reviewed it. I didn't notice that it was 5 years old. I can only assume that it was flagged by someone.Parley
@RoboticCat - I'm so disappointed in SO now, having downvotes/critiques of posts I did 5 years ago really pushes me away. I'm not interested in going over posts that I made that long ago and it's a shame that they're being reviewed. The quality of questions (and questioners) has gone down so much in the last year that I question the purpose of coming here. This is not the first time recently that I've had an OLD question reviewed/suggested for edit - I'm glad that a +1 answer is so interesting to so many people </sarcasm>Raving
@KevinDTimm:I agree; I'm only sorry that I didn't spot it was a 5 year-old answer. Maybe this is a meta question; should very old questions from regular users be able to be flagged for review?Parley
I think it is a great meta question - even excluding the 'from regular users' part. An X year old answer can/should be left alone seems reasonable to me.Raving
S
1

I see a lot of these answers rely on reading the whole text file into memory instead of taking it one chunk at a time. Here's my solution in nice modern Swift, using FileHandle to keep memory impact low:

enum MyError: Error {
    case invalidTextFormat
}

extension FileHandle {

    func readLine(maxLength: Int) throws -> String {

        // Read in a string of up to the maximum length
        let offset = offsetInFile
        let data = readData(ofLength: maxLength)
        guard let string = String(data: data, encoding: .utf8) else {
            throw MyError.invalidTextFormat
        }

        // Check for a newline; if none, this is the whole string
        let substring: String
        if let subindex = string.firstIndex(of: "\n") {
            substring = String(string[string.startIndex ... subindex])
        } else {
            substring = string
        }

        // Wind back to the correct offset so that we don't miss any lines
        guard let dataCount = substring.data(using: .utf8, allowLossyConversion: false)?.count else {
            throw MyError.invalidTextFormat
        }
        try seek(toOffset: offset + UInt64(dataCount))
        return substring
    }

}

Note that this preserves the newline at the end of the line, so depending on your needs you may want to adjust the code to remove it.

Usage: simply open a file handle to your target text file and call readLine with a suitable maximum length. 1024 is a reasonable default for plain text, but I left it open in case you know your lines will be shorter (a line longer than the maximum comes back in pieces). Note that the method will not read past the end of the file, so you have to check manually that you haven't reached it if you intend to parse the entire thing. Here's some sample code that shows how to open a file at myFileURL and read it line by line until the end.

do {
    let handle = try FileHandle(forReadingFrom: myFileURL)
    try handle.seekToEndOfFile()
    let eof = handle.offsetInFile
    try handle.seek(toFileOffset: 0)

    while handle.offsetInFile < eof {
        let line = try handle.readLine(maxLength: 1024)
        // Do something with the string here
    }
    try handle.close()
} catch let error {
    print("Error reading file: \(error.localizedDescription)")
}
Skirting answered 23/2, 2020 at 6:38 Comment(2)
This is great, but.... first it isn't exactly Cocoa (which I was expecting as an answer) plus I don't know what "readLine" means in multi-cultural world... There are several ways to end a line, you know. and what about Unicode-16 or Unicode-32 encoded files? How will this work for a binary file?Chancelor
Well, the question was so old, I thought an update to Swift might be more useful than repeating what other people had said. As for the issue with line ends, note that the third line of the readLine function contains String(data: encoding:) - if it's not UTF8, just change the second parameter to the correct value. (N.B. all files are effectively binary if you open them as data.)Skirting
M
0

Building on @Adam Rosenfield's answer, the fscanf format string can be changed as below:

"%4095[^\r\n]%n%*[\n\r]"

It will work with OS X, Linux, and Windows line endings.

Metropolitan answered 27/1, 2014 at 9:16 Comment(0)
P
0

Using a category or extension makes our life a bit easier.

extension String {

    func lines() -> [String] {
        var lines = [String]()
        self.enumerateLines { (line, stop) -> () in
            lines.append(line)
        }
        return lines
    }

}

// then
for line in string.lines() {
    // do the right thing
}
Preliminaries answered 7/6, 2015 at 4:57 Comment(0)
K
0

I found the response by @lukaswelte and the code from Dave DeLong very helpful. I was looking for a solution to this problem but needed to parse large files by \r\n, not just \n.

The code as written contains a bug when the delimiter is more than one character long. I've changed the code as below.

.h file:

#import <Foundation/Foundation.h>

@interface FileChunkReader : NSObject {
    NSString * filePath;

    NSFileHandle * fileHandle;
    unsigned long long currentOffset;
    unsigned long long totalFileLength;

    NSString * lineDelimiter;
    NSUInteger chunkSize;
}

@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;

- (id) initWithFilePath:(NSString *)aPath;

- (NSString *) readLine;
- (NSString *) readTrimmedLine;

#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif

@end

.m file:

#import "FileChunkReader.h"

@interface NSData (DDAdditions)

- (NSRange) rangeOfData_dd:(NSData *)dataToFind;

@end

@implementation NSData (DDAdditions)

- (NSRange) rangeOfData_dd:(NSData *)dataToFind {

    const void * bytes = [self bytes];
    NSUInteger length = [self length];

    const void * searchBytes = [dataToFind bytes];
    NSUInteger searchLength = [dataToFind length];
    NSUInteger searchIndex = 0;

    NSRange foundRange = {NSNotFound, searchLength};
    for (NSUInteger index = 0; index < length; index++) {
        if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
            //the current character matches
            if (foundRange.location == NSNotFound) {
                foundRange.location = index;
            }
            searchIndex++;
            if (searchIndex >= searchLength)
            {
                return foundRange;
            }
        } else {
            searchIndex = 0;
            foundRange.location = NSNotFound;
        }
    }

    if (foundRange.location != NSNotFound
        && length < foundRange.location + foundRange.length )
    {
        // if the dataToFind is partially found at the end of [self bytes],
        // then the loop above would end, and indicate the dataToFind is found
        // when it only partially was.
        foundRange.location = NSNotFound;
    }

    return foundRange;
}

@end

@implementation FileChunkReader

@synthesize lineDelimiter, chunkSize;

- (id) initWithFilePath:(NSString *)aPath {
    if (self = [super init]) {
        fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
        if (fileHandle == nil) {
            return nil;
        }

        lineDelimiter = @"\n";
        currentOffset = 0ULL; // ???
        chunkSize = 128;
        [fileHandle seekToEndOfFile];
        totalFileLength = [fileHandle offsetInFile];
        //we don't need to seek back, since readLine will do that.
    }
    return self;
}

- (void) dealloc {
    [fileHandle closeFile];
    currentOffset = 0ULL;

}

- (NSString *) readLine {
    if (currentOffset >= totalFileLength)
    {
        return nil;
    }

    @autoreleasepool {

        NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
        [fileHandle seekToFileOffset:currentOffset];
        unsigned long long originalOffset = currentOffset;
        NSMutableData *currentData = [[NSMutableData alloc] init];
        NSData *currentLine = [[NSData alloc] init];
        BOOL shouldReadMore = YES;


        while (shouldReadMore) {
            if (currentOffset >= totalFileLength)
            {
                break;
            }

            NSData * chunk = [fileHandle readDataOfLength:chunkSize];
            [currentData appendData:chunk];

            NSRange newLineRange = [currentData rangeOfData_dd:newLineData];

            if (newLineRange.location != NSNotFound) {

                currentOffset = originalOffset + newLineRange.location + newLineData.length;
                currentLine = [currentData subdataWithRange:NSMakeRange(0, newLineRange.location)];

                shouldReadMore = NO;
            }else{
                currentOffset += [chunk length];
            }
        }

        if (currentLine.length == 0 && currentData.length > 0)
        {
            currentLine = currentData;
        }

        return [[NSString alloc] initWithData:currentLine encoding:NSUTF8StringEncoding];
    }
}

- (NSString *) readTrimmedLine {
    return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
    NSString * line = nil;
    BOOL stop = NO;
    while (stop == NO && (line = [self readLine])) {
        block(line, &stop);
    }
}
#endif

@end
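
The multi-character delimiter problem can be shown in isolation with a few lines of plain C. This is only a sketch, and find_delimiter is a hypothetical name. It compares the whole delimiter at every starting position, so a delimiter that is cut off at the end of the buffer is never reported as found, and a failed partial match does not cause the next candidate position to be skipped.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first full occurrence of `needle` in
   `haystack`, or (size_t)-1 if there is none. */
size_t find_delimiter(const unsigned char *haystack, size_t haystack_len,
                      const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || haystack_len < needle_len) return (size_t)-1;

    /* Only positions where the whole needle still fits are candidates. */
    for (size_t i = 0; i + needle_len <= haystack_len; i++) {
        if (memcmp(haystack + i, needle, needle_len) == 0) return i;
    }
    return (size_t)-1;
}
```

A chunked reader that gets "not found" for a buffer ending in "\r" should read the next chunk and search again, since the "\n" may be the first byte of that chunk.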
Kinglet answered 19/2, 2016 at 16:31 Comment(0)
F
0

I am adding this because all the other answers I tried fell short one way or another. The following method can handle large files, arbitrarily long lines, as well as empty lines. It has been tested with actual content and strips the newline characters from the output.

- (NSString*)readLineFromFile:(FILE *)file
{
    char buffer[4096];
    NSMutableString *result = [NSMutableString stringWithCapacity:1000];

    int charsRead;
    do {
        if(fscanf(file, "%4095[^\r\n]%n%*[\n\r]", buffer, &charsRead) == 1) {
            [result appendFormat:@"%s", buffer];
        }
        else {
            break;
        }
    } while(charsRead == 4095);

    return result.length ? result : nil;
}

Credit goes to @Adam Rosenfield and @sooop

Fizgig answered 8/2, 2018 at 9:35 Comment(0)
F
-2

Here's a nice simple solution I use for smaller files:

NSString *path = [[NSBundle mainBundle] pathForResource:@"Terrain1" ofType:@"txt"];
NSString *contents = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:nil];
NSArray *lines = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"\r\n"]];
for (NSString* line in lines) {
    if (line.length) {
        NSLog(@"line: %@", line);
    }
}
Furfuran answered 25/6, 2011 at 8:21 Comment(1)
He was asking about how to read a line at a time so that it doesn't read the entire contents into memory. Your solution creates a string with the entire contents then splits it into lines.Quasijudicial
W
-7

Use this script, it works great:

NSString *path = @"/Users/xxx/Desktop/names.txt";
NSError *error;
NSString *stringFromFileAtPath = [NSString stringWithContentsOfFile: path
                                                           encoding: NSUTF8StringEncoding
                                                              error: &error];
if (stringFromFileAtPath == nil) {
    NSLog(@"Error reading file at %@\n%@", path, [error localizedFailureReason]);
}
NSLog(@"Contents:%@", stringFromFileAtPath);
Wondering answered 20/9, 2010 at 22:42 Comment(1)
What @fisninear is saying is that this does not address the OP's desire to reduce memory usage. The OP wasn't asking how to use the method (which loads the entire file into memory), he was asking for memory-friendly alternatives for large text files. It's quite possible to have multi-gigabyte text files, which obviously creates a memory problem.Goglet

© 2022 - 2024 — McMap. All rights reserved.