Auditing and validating changes to C# class and structure properties in a high-performance way

I have several C# structs that give shape to structures in a very large data file. These structs interpret bits in the file's data words and expose them as first-class properties. Here is an example of one:

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TimeF1_MsgDayFmt
{
    // Time Data Words
    public UInt16 TDW1;
    public UInt16 TDW2;
    public UInt16 TDW3;

    /// <summary>
    /// Tens of milliseconds
    /// </summary>
    public UInt16 Tmn
    {
        // Bits.Get is just a helper method in a static class
        get { return Bits.Get(TDW1, 0, 4); }
        set 
        {
        if (value > 9)
            throw new ArgumentOutOfRangeException("value");

            TDW1 = Bits.Set(TDW1, value, 0, 4); 
        }
    }

    // ... several other properties follow.
}
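
A minimal sketch of what the Bits helper might look like, assuming the two integer arguments are the start bit and the bit count (the actual implementation isn't shown in this question):

// Hypothetical sketch of the Bits helper referenced above; the real
// implementation is not shown. Assumes (word, startBit, bitCount) semantics.
public static class Bits
{
    // Extracts bitCount bits from word, starting at startBit (LSB = bit 0).
    public static UInt16 Get(UInt16 word, int startBit, int bitCount)
    {
        int mask = (1 << bitCount) - 1;
        return (UInt16)((word >> startBit) & mask);
    }

    // Returns word with the bitCount bits starting at startBit replaced by value.
    public static UInt16 Set(UInt16 word, UInt16 value, int startBit, int bitCount)
    {
        int mask = ((1 << bitCount) - 1) << startBit;
        return (UInt16)((word & ~mask) | ((value << startBit) & mask));
    }
}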

I need to do two things, which I think are related. The first is to have the ability to validate the entire class, using a collection of validation rules. I know there are several ways to do this; the one that most appeals to me is to annotate each property with something like this:

[ValidateRange(0,9)]
public UInt16 Tmn
{
    get { return Bits.Get(TDW1, 0, 4); }
    set 
    {
        // etc. Will probably no longer throw the ArgumentOutOfRangeException here.

... and then use a Validator class to read all of the property attributes, check each property value against the annotated rule(s), and return a collection of error objects. But I am concerned about how long the reflection will take; these structures have to be processed extremely fast.

public List<Error> Validate(TimeF1_MsgDayFmt original)
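
One way to keep the attribute-based design while sidestepping most of the reflection cost is to reflect once per type and cache compiled getter delegates; expression trees for simple member access are available in .NET 3.5. A rough sketch, in which ValidateRangeAttribute, Error, and Validator<T> are hypothetical shapes for illustration, not an existing API:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public sealed class ValidateRangeAttribute : Attribute
{
    public readonly int Min;
    public readonly int Max;
    public ValidateRangeAttribute(int min, int max) { Min = min; Max = max; }
}

public sealed class Error
{
    public string Property;
    public long Actual;
    public int Min;
    public int Max;
}

public static class Validator<T>
{
    // All reflection happens once per closed type; after that, each rule
    // is a compiled delegate plus the range taken from its attribute.
    private static readonly List<Rule> Rules = BuildRules();

    private sealed class Rule
    {
        public string Name;
        public ValidateRangeAttribute Range;
        public Func<T, long> Getter;
    }

    private static List<Rule> BuildRules()
    {
        var rules = new List<Rule>();
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            object[] attrs = prop.GetCustomAttributes(typeof(ValidateRangeAttribute), false);
            if (attrs.Length == 0)
                continue;

            // Builds x => (long)x.Prop and compiles it to IL once.
            Func<T, long> getter = Expression.Lambda<Func<T, long>>(
                Expression.Convert(Expression.Property(x, prop), typeof(long)),
                x).Compile();

            rules.Add(new Rule
            {
                Name = prop.Name,
                Range = (ValidateRangeAttribute)attrs[0],
                Getter = getter
            });
        }
        return rules;
    }

    public static List<Error> Validate(T item)
    {
        var errors = new List<Error>();
        foreach (Rule rule in Rules)
        {
            long value = rule.Getter(item);
            if (value < rule.Range.Min || value > rule.Range.Max)
                errors.Add(new Error { Property = rule.Name, Actual = value,
                                       Min = rule.Range.Min, Max = rule.Range.Max });
        }
        return errors;
    }
}

After the first use of a given type, validation is just a loop over compiled delegates: List<Error> errors = Validator<TimeF1_MsgDayFmt>.Validate(msg);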

The second thing I need to do is to perform auditing on property changes; that is, for each property that has changed from its original value, I need to be able to get a string that says "Property foo changed from bar to baz." To do that, I'd like a way to compare all properties of a "before" and "after" struct, and note the differences.

public List<string> Compare(TimeF1_MsgDayFmt original, TimeF1_MsgDayFmt modified)
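
The diff could use the same reflect-once, compile-once technique: build one getter per readable public property and compare the "before" and "after" values. A sketch under the same assumptions (Auditor<T> is a hypothetical name; boxing through object keeps the sketch short, at the cost of an allocation per property read):

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

public static class Auditor<T>
{
    // One compiled getter per readable public property, built once per type.
    private static readonly KeyValuePair<string, Func<T, object>>[] Getters = BuildGetters();

    private static KeyValuePair<string, Func<T, object>>[] BuildGetters()
    {
        var getters = new List<KeyValuePair<string, Func<T, object>>>();
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            if (!prop.CanRead)
                continue;

            // x => (object)x.Prop; boxing keeps one delegate type for all properties.
            Func<T, object> getter = Expression.Lambda<Func<T, object>>(
                Expression.Convert(Expression.Property(x, prop), typeof(object)),
                x).Compile();
            getters.Add(new KeyValuePair<string, Func<T, object>>(prop.Name, getter));
        }
        return getters.ToArray();
    }

    public static List<string> Compare(T original, T modified)
    {
        var changes = new List<string>();
        foreach (var pair in Getters)
        {
            object before = pair.Value(original);
            object after = pair.Value(modified);
            if (!Equals(before, after))
                changes.Add(string.Format("Property {0} changed from {1} to {2}",
                                          pair.Key, before, after));
        }
        return changes;
    }
}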

In both cases, the code will involve iterating over all of the properties and examining each one individually, in a way that is as fast as possible.

How would I approach this?

Tangram asked 10/10/2011 at 19:59

Comments (19):
Exerciser: Is code generation of these structures taboo?
Tangram: @sixlettervariables Not necessarily. I've seen some examples like this one, where a proxy object is being generated, but I haven't been able to wrap my mind around it yet.
Whaler: Surely the data structures don't change from moment to moment. Are you insisting that all this checking happen at runtime? If the checking happened off-line, you might care a lot less about how fast it runs.
Tangram: @IraBaxter: I'm not sure what you mean. The data files are very large; the validation time of a single structure may not be especially meaningful, but if there are millions of them...
Whaler: You have a program that reads/writes the data files with whatever structures you have. Does that program have to "validate" the structures? Why? Can't you have a separate tool to validate the structures? [You really have millions of distinct structure types? Maybe you mean you have millions of instances of a small number of types. Can you elaborate?]
Woadwaxen: Can I assume that we are on .NET 4?
Tangram: @svick: 3.5. I updated the tags.
Tangram: @IraBaxter: "Does that program have to validate the structures? Why?" Because they could be invalid in the source data file. "Can't you have a separate tool to validate the structures?" That's the tool I am building. "Maybe you mean you have millions of instances of a small number of types." Yes, that's what I mean. I added a bit more clarification to the top of my question.
Woadwaxen: Too bad; I think .NET 4's Expressions would be ideal for this. In 3.5, you could use T4 at compile time, or CodeDOM or Reflection.Emit at runtime.
Whaler: Are you trying to validate the structures against the actual data? If not, why does the validator tool have to be fast? If so, why are you doing that? And if so, where did the [ValidateRange(0,9)] constraint come from?
Tangram: "Are you trying to validate the structures against the actual data?" Yes, to prove the data is correct. The ValidateRange(0,9) attribute is a rule I added that complies with a specification for those particular bits; it doesn't have an implementation yet.
Whaler: Aha. I completely misunderstood; I thought you were trying to check that the structure declarations themselves were somehow self-consistent. So you want to use the structures to define the "shape" of the data, and you want some additional set of constraints associated with each structure, to be used to check that the data matches the constraints. This is a lot like validating that an XML file satisfies an XML schema. Now I understand why it has to be fast.
Whaler: Regarding "changes": do you mean changes to data field values from one data instance to the next? (If the data varies a lot, this would produce vast amounts of data; is there a reason to believe the data changes "slowly" from one record instance to the next?)
Tangram: @IraBaxter: No, it will change for each instance. This particular example is a time record, so it will have a new value each and every time it is encountered. I don't need a log of these changes; what I do need a log of are the changes that occur when a data correction to a record takes place. That will not happen on every record; it might only happen to a handful of records, or to none at all.
Whaler: And how would you recognize a "data correction" as opposed to another value change (say, a time stamp change)? Can you describe the kind of data in this data stream?
Tangram: @IraBaxter: The program will perform the corrections. It's not as straightforward as it sounds; the data actually runs in several pipelines, each running on its own thread, and each pipeline has a filter class that performs the data corrections. The filter will validate each structure and then decide which corrections to implement. These corrections must be written to a log, and that log writing might not occur in the filter; it might occur downstream from it. That's why I need a way to do a diff.
Whaler: The more I hear, the more puzzled I get. If you already have a filter somewhere else that makes the changes, that filter knows it made the change. Why would you want to spend the computational energy ("needs to be high performance") to rediscover such changes? Is the filter you are talking about the sanity check from the first half of your question? If so, when such a filter says the data record is wrong (boolean "false"), how does it know what repair to make? (e.g., a filter might insist that two fields add to 17, but they don't; which field gets changed? Why?)
Tangram: @IraBaxter: The filter is not necessarily under my developmental control; it is designed to be written by a third party, and is dynamically instantiated depending on the structure type being filtered. Each structure type gets its own pipeline/filter. I'm trying to design it so that the writer of the filter has to do as little as possible; it is likely that, in some cases, logging the changes will be the most complex part of the filter's code. (cont)
Tangram: @IraBaxter (cont) The validation is not just a "go/no go" test. The validation method will return a collection, each item of which will contain the file offset where the error occurred, the property in error, the expected value, the actual value, and an error message. That's enough information for the filter (given some direction from an embedded GUI, and perhaps some previous and next data values) to make some educated guesses about what needs to be corrected.

If the issue is whether the data read into a struct matches additional constraints, you will first have to figure out how to write down such constraints. (In your example, you wrote [ValidateRange(0,9)] as one such constraint.)

Once you have a means to write down the constraints, you presumably have to read the data into the corresponding struct and then check the constraints. (In your example, you suggested the idea of using reflection.)

It seems to me the easiest way to write down such constraints that also executes fast is simply to write them as additional C# code. For each constraint you can imagine, add a method to the struct that checks the constraint and returns a boolean. You can then add a standard method, CheckIt, that computes the conjunction of all the individual constraint methods.
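
For the struct in the question, that might look like the following sketch (the method names are mine):

// Hypothetical constraint methods added as members of TimeF1_MsgDayFmt.
public bool TmnIsValid()
{
    // Mirrors [ValidateRange(0,9)]; Tmn is unsigned, so only the upper bound matters.
    return Tmn <= 9;
}

public bool CheckIt()
{
    // Conjunction of all individual constraint methods; short-circuits on the first failure.
    return TmnIsValid() /* && NextConstraintIsValid() && ... */;
}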

It should be easy to write the constraints. I'd expect the JIT compiler to inline those methods, especially since they are small and take no arguments (other than the implicit this). That should make them fast.

If the JIT won't do it, switch to C++, where you can pretty much force inlining.

Whaler answered 11/10/2011 at 0:09

Comment (1):
Tangram: Yes, that's the conclusion I am moving towards as well. I think I've figured out a solution to the reflection performance problem (see fasterflect.codeplex.com), which might prove useful. The problem I'm having is code brevity; with logging, validation-result collection, and a delegate/event that is needed to update the main UI, each validation method is going to be 6+ lines of code. I was hoping for more like one line of code for each constraint. I may add an Assert method to my logging class that takes a Func<bool>, and let my logger handle it all.
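
A sketch of what that Assert helper might look like (the names are hypothetical):

// Hypothetical logging helper: one line per constraint at the call site.
public void Assert(Func<bool> constraint, string message)
{
    if (!constraint())
        LogError(message);   // LogError is assumed to exist on the logging class
}

// Usage: log.Assert(() => msg.Tmn <= 9, "Tmn out of range");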
