I have a requirement to parse data files in "txf" format. The files may contain more than 1000 entries. Since the format is well defined like JSON, I wanted to make a generic parser like JSON, which can serialise and deserialise txf files.
On contrary to JSON, the mark up doesn't have a way to identify an object or an array. If an entry with same tag occurs, we need to consider it as an array.
#
Marks the start of an object.$
Marks the members of an object/
Marks the end of an object
Following is a sample "txf" file
#Employees
$LastUpdated=2015-02-01 14:01:00
#Employee
$Id=1
$Name=Employee 01
#Departments
$LastUpdated=2015-02-01 14:01:00
#Department
$Id=1
$Name=Department Name
/Department
/Departments
/Employee
#Employee
/Employee
/Employees
I was able to create a generic TXF Parser using NSScanner. But with more entries the performance needs more tweaking.
I wrote the foundation object obtained as plist
and compared its performance again the parser I wrote. My parser is around 10 times slower than plist
parser.
While plist
file size is 5 times more than txf
and has more markup characters, I feel that there is a lot of room for optimization.
Any help in that direction is highly appreciated.
EDIT : Including the parsing code
static NSString *const kArray = @"TXFArray";
static NSString *const kBodyText = @"TXFText";
@interface TXFParser ()
/*Temporary variable to hold values of an object*/
@property (nonatomic, strong) NSMutableDictionary *dict;
/*An array to hold the hierarchial data of all nodes encountered while parsing*/
@property (nonatomic, strong) NSMutableArray *stack;
@end
@implementation TXFParser
#pragma mark - Getters
- (NSMutableArray *)stack{
if (!_stack) {
_stack = [NSMutableArray new];
}return _stack;
}
#pragma mark -
- (id)objectFromString:(NSString *)txfString{
[txfString enumerateLinesUsingBlock:^(NSString *string, BOOL *stop) {
if ([string hasPrefix:@"#"]) {
[self didStartParsingTag:[string substringFromIndex:1]];
}else if([string hasPrefix:@"$"]){
[self didFindKeyValuePair:[string substringFromIndex:1]];
}else if([string hasPrefix:@"/"]){
[self didEndParsingTag:[string substringFromIndex:1]];
}else{
//[self didFindBodyValue:string];
}
}]; return self.dict;
}
#pragma mark -
- (void)didStartParsingTag:(NSString *)tag{
[self parserFoundObjectStartForKey:tag];
}
- (void)didFindKeyValuePair:(NSString *)tag{
NSArray *components = [tag componentsSeparatedByString:@"="];
NSString *key = [components firstObject];
NSString *value = [components lastObject];
if (key.length) {
self.dict[key] = value?:@"";
}
}
- (void)didFindBodyValue:(NSString *)bodyString{
if (!bodyString.length) return;
bodyString = [bodyString stringByTrimmingCharactersInSet:[NSCharacterSet illegalCharacterSet]];
if (!bodyString.length) return;
self.dict[kBodyText] = bodyString;
}
- (void)didEndParsingTag:(NSString *)tag{
[self parserFoundObjectEndForKey:tag];
}
#pragma mark -
- (void)parserFoundObjectStartForKey:(NSString *)key{
self.dict = [NSMutableDictionary new];
[self.stack addObject:self.dict];
}
- (void)parserFoundObjectEndForKey:(NSString *)key{
NSDictionary *dict = self.dict;
//Remove the last value of stack
[self.stack removeLastObject];
//Load the previous object as dict
self.dict = [self.stack lastObject];
//The stack has contents, then we need to append objects
if ([self.stack count]) {
[self addObject:dict forKey:key];
}else{
//This is root object,wrap with key and assign output
self.dict = (NSMutableDictionary *)[self wrapObject:dict withKey:key];
}
}
#pragma mark - Add Objects after finding end tag
- (void)addObject:(id)dict forKey:(NSString *)key{
//If there is no value, bailout
if (!dict) return;
//Check if the dict already has a value for key array.
NSMutableArray *array = self.dict[kArray];
//If array key is not found look for another object with same key
if (array) {
//Array found add current object after wrapping with key
NSDictionary *currentDict = [self wrapObject:dict withKey:key];
[array addObject:currentDict];
}else{
id prevObj = self.dict[key];
if (prevObj) {
/*
There is a prev value for the same key. That means we need to wrap that object in a collection.
1. Remove the object from dictionary,
2. Wrap it with its key
3. Add the prev and current value to array
4. Save the array back to dict
*/
[self.dict removeObjectForKey:key];
NSDictionary *prevDict = [self wrapObject:prevObj withKey:key];
NSDictionary *currentDict = [self wrapObject:dict withKey:key];
self.dict[kArray] = [@[prevDict,currentDict] mutableCopy];
}else{
//Simply add object to dict
self.dict[key] = dict;
}
}
}
/*Wraps Object with a key for the serializer to generate txf tag*/
- (NSDictionary *)wrapObject:(id)obj withKey:(NSString *)key{
if (!key ||!obj) {
return @{};
}
return @{key:obj};
}
EDIT 2:
A sample TXF file with more than 1000 entries.
CFDictionary
andCFStringRef
instead ofNSDisctionary
andNSString
in performance sensitive places. – Evangelista