How to effectively test a fixed length flat file parser using MSpec?

Asked 16/8, 2011 at 13:6 Answered 24/8, 2011 at 5:53

Solved c#unit-testing parsing flat-file mspec

I have this method signature: List<ITMData> Parse(string[] lines)

ITMData has 35 properties.

How would you effectively test such a parser?

Questions:

Should I load the whole file (May I use System.IO)?
Should I put a line from the file into a string constant?
Should I test one or more lines
Should I test each property of ITMData or should I test the whole object?
What about the naming of my test?

EDIT

I changed the method signature to ITMData Parse(string line).

Test Code:

[Subject(typeof(ITMFileParser))]
public class When_parsing_from_index_59_to_79
{
    private const string Line = ".........";
    private static ITMFileParser _parser;
    private static ITMData _data;

    private Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    private It should_get_fldName = () => _data.FldName.ShouldBeEqualIgnoringCase("HUMMELDUMM");
}

EDIT 2

I am still not sure if I should test only one property per class. In my opinion this allows me to give more information for the specification namely that when I parse a single line from index 59 to index 79 I get fldName. If I test all properties within one class I loss this information. Am I overspecifying my tests?

My Tests now looks like this:

[Subject(typeof(ITMFileParser))]
public class When_parsing_single_line_from_ITM_file
{
    const string Line = ""

    static ITMFileParser _parser;
    static ITMData _data;

    Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    ...

}

Selfeffacing answered 16/8, 2011 at 13:7 Comment(7)

What do you mean by "effectively" exactly? Do you want to minimise development time of the unit tests? – Checani 16/8, 2011 at 13:29

well i.e. testing with one string or with many strings? testing with a string that contains only the values I want to assert or take the whole string. – Selfeffacing 16/8, 2011 at 14:1

I'm still not 100% sure what exactly the question is, but I would create a set of different unit tests. One test method might pass in a specific single string and assert that it has been correctly parsed into the relevant properties on the resulting IMTData object. Perhaps another test that tests that given 20 lines, it produces a List of 20 items. I might have other methods for specific difficult cases. This is pretty much what @Kenny was describing, basic unit testing stuff. Take the requirements for this Parse method, and reverse engineer it into a few test cases that proves correctness. – Checani 16/8, 2011 at 14:17

I don't know how you want to test it, but the method signature is a bug. It doesn't make sense to accept a string array for parsing; that encourages people to read a whole file into memory and put it into an array, which is almost certainly a silly thing to do. It would probably be more appropriate to either accept a TextReader or an IEnumerable<string>. (Returning a list is fishy, too, if you can get away with doing the parsing lazily line-by-line.) – Ducharme 18/8, 2011 at 21:7

Reading a whole file into memory is not silly. It depends on the file size. I have to work with files which are about 20 MB big. Ofc huge files should not be loaded into memory. – Selfeffacing 20/8, 2011 at 7:54

@Daniel B is it ok to test the specification that when index 59 to 79 is parsed, it should get the fldName property or is this a kind of over-specification? I am not sure if I should test the whole object once or test each property in a separate test class. – Selfeffacing 23/8, 2011 at 18:24

I've posted an answer below, hopefully it should answer your question. – Checani 24/8, 2011 at 5:54

Here's what I would normally do if I'm facing such a problem:

One short disclaimer in advance: I think I would more go down the "integration testing" or "testing the parser as a whole" route rather than testing individual lines. In the past I've more than once faced the situation where lots of implementation details leaked into my tests and forced me to change the tests often when I changed implementation details. Typical case of overspecification I guess ;-/

I wouldn't include file loading in the parser. As @mquander suggested I would rather go with a TextReader or an IEnumerable as the input parameter instead. This will result in way faster tests since you're able to specify the parser input in-memory and don't have to touch the file system.
I'm not a big fan of hand rolling test data, so in most cases I'm using embedded resources and the ResourceManager to load test data directly from the specification assembly via assembly.GetManifestResource(). I typically have a bunch of extension methods in my solution to streamline the reading of resources (something like TextReader TextResource.Load("NAME_OF_SOME_RESOURCE")).
Regarding MSpec: I'm using one class per file to parse. For each property that is tested in the parsed result I've a separate (It)assertion. These are normally one liners, so the additional amount of coding isn't that big. In terms of documentation and diagnostics imho it's a huge plus since when a property isn't parsed correctly you can see directly which assertion failed without having to look into the source or searching for line numbers. It also appears in your MSpec result file. Besides, you don't hide other failed assertions (the situation where you fix one assertion only to see the spec fail on the next line with the next assertion). This of course forces you to think more about the wording you use in your specifications but for me that's also a huge plus since I'm a proponent of the idea that language forms thinking. In other words, if you've no clue how to frackin name your assertion there's probably something fishy either about your specification or your implementation.
Regarding your method signature for the parser: I wouldn't return a concrete type like List<T> or an array and I would also suggest not to return the mutable List<T> type. What you're basically saying here is: "Hey, you can muck around with the parsing result after I've finished" which in most cases is probably what you don't want. I would suggest to return IEnumerable<T> instead (or ICollection<T> if you REALLY need to modify it afterwards)

Pavla answered 16/8, 2011 at 13:7 Comment(1)

thanks so far :) 1. I parse a single line, because every line is parsed in the same way 2. I will concerned about putting my test data into resources. 3. I can not use one class per file type, because there is some special logic which depends on configuration. So some of the properties has to be tested seperately. 4. I choosed to parse only a single line. – Selfeffacing 31/8, 2011 at 20:53

Should I load the whole file (May I use System.IO)?

If you do this, it's no longer a unit test -- it becomes an integration or regression test. You can do this, if you expect it to show possible bugs that a unit test wouldn't. But that's not too likely.

You're probably better off with unit tests, at least to start.

Should I put a line from the file into a string constant?

If you plan to write more than one test that uses the same input line, then sure. But personally, I would probably tend to write a bunch of different tests, with each one passing a different input string. At that point, there's not much reason to make a constant (unless it's a local constant, declared inside the test method).

Should I test one or more lines?

You didn't specify, but I'm going to assume that your output is one-for-one with your input -- that is, if you pass in three strings, you'll get three ITMDatas returned. In that case, the need for multi-line tests would be limited.

It's almost always worth testing the degenerate case, which in this case would be an empty string array (zero lines). And it's probably worth having at least one test that has more than one line, just so you can make sure there are no silly mistakes in your iteration.

However, if your output is one-for-one with your input, then you really have another method wanting to get out -- you should have a ParseSingleLine method. Then your Parse would be nothing more than iterating lines and calling ParseSingleLine. You would still want a handful of tests for Parse, but most of your testing would focus around ParseSingleLine.

Capablanca answered 18/8, 2011 at 21:0 Comment(0)

Here's what I would normally do if I'm facing such a problem:

I wouldn't include file loading in the parser. As @mquander suggested I would rather go with a TextReader or an IEnumerable as the input parameter instead. This will result in way faster tests since you're able to specify the parser input in-memory and don't have to touch the file system.
I'm not a big fan of hand rolling test data, so in most cases I'm using embedded resources and the ResourceManager to load test data directly from the specification assembly via assembly.GetManifestResource(). I typically have a bunch of extension methods in my solution to streamline the reading of resources (something like TextReader TextResource.Load("NAME_OF_SOME_RESOURCE")).
Regarding MSpec: I'm using one class per file to parse. For each property that is tested in the parsed result I've a separate (It)assertion. These are normally one liners, so the additional amount of coding isn't that big. In terms of documentation and diagnostics imho it's a huge plus since when a property isn't parsed correctly you can see directly which assertion failed without having to look into the source or searching for line numbers. It also appears in your MSpec result file. Besides, you don't hide other failed assertions (the situation where you fix one assertion only to see the spec fail on the next line with the next assertion). This of course forces you to think more about the wording you use in your specifications but for me that's also a huge plus since I'm a proponent of the idea that language forms thinking. In other words, if you've no clue how to frackin name your assertion there's probably something fishy either about your specification or your implementation.
Regarding your method signature for the parser: I wouldn't return a concrete type like List<T> or an array and I would also suggest not to return the mutable List<T> type. What you're basically saying here is: "Hey, you can muck around with the parsing result after I've finished" which in most cases is probably what you don't want. I would suggest to return IEnumerable<T> instead (or ICollection<T> if you REALLY need to modify it afterwards)

Pavla answered 16/8, 2011 at 13:7 Comment(1)

I typically try to consider common success and fail scenarios, along with edge cases. Requirements are also helpful for setting up appropriate use cases. Consider Pex for enumerating various scenarios.

Mother answered 16/8, 2011 at 13:11 Comment(0)

Regarding your newer questions:

Should I test each property of ITMData or should I test the whole object?

If you want to be on the safe side, you should probably have at least one test which checks that each property was matched.

What about the naming of my test?

There are quite a few discussions on this topic, such as this one. The general rule is that you would have multiple methods in your unit test class, each aimed at testing something specific. In your case, it might be things like:

public void Check_All_Properties_Parsed_Correctly(){.....}

public void Exception_Thrown_If_Lines_Is_Null(){.....}

public void Exception_Thrown_If_Lines_Is_Wrong_Length(){.....}

So, in other words, testing for the exact behaviour that you consider "correct" for the parser. Once this is done, you will feel much more at ease when making changes to the parser code, because you will have a comprehensive test suite to check that you didn't break anything. Remember to actually test often, and to keep your tests updated when you make changes! There's a fairly good guide about unit testing and Test Driven Development on MSDN.

In general, I think you can find answers to most of your questions by googling a bit. There are also several excellent books on Test Driven Development, which will drive home not only the how of TDD, but the why. If you are relatively programming language agnostic, I would recommend Kent Beck's Test Driven Development By Example, otherwise something like Test-Driven Development in Microsoft .NET. These should get you on the right track very quickly.

EDIT:

Am I overspecifying my tests?

In my opinion, yes. Specifically, I don't agree with your next line:

If I test all properties within one class I loss this information.

In what way do you lose information exactly? Let's say there are 2 ways to do this test, other than having a new class per test:

Have different methods for each property. Your test methods could be called CheckPropertyX, CheckPropertyY, etc. When you run your tests, you will see exactly which fields passed and which fields failed. This clearly satisfies your requirements, although I would say it's still overkill. I would go with option 2:
Have a few different methods, each testing one specific aspect. This is what I originally recommended, and I think what you are referring to. When one of the tests fails, you will only get information about the first thing that failed, per method, but if you coded your Assert nicely, you will know exactly which property is incorrect. Consider the following code:

Assert.AreEqual("test1", myObject.PropertyX, "Property X was incorrectly parsed"); Assert.AreEqual("test2", myObject.PropertyY, "Property Y was incorrectly parsed");

When one of those fails, you will know which line failed. When you have fixed the relevant error, and re-run your tests, you will see if any other properties have failed. This is generally the approach that most people take, because creating a class or even method per property results in too much code, and too much work to keep up to date.

Checani answered 24/8, 2011 at 5:53 Comment(2)

OK, I responded to your edit, my recommendation would be to at most create a new method per property, to test that property. – Checani 30/8, 2011 at 6:39

"In what way do you lose information exactly? "Machine.Specification allows to generate HTML reports. As you can see my tests begin with "When_..." So the class name is a part of the specification. If I would write a class for each property I would have a specific specification which contains which index is used to get a specific property. – Selfeffacing 30/8, 2011 at 19:25

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags