I came across this very similar question but that question is tagged QuickFIX (which is not relevant to my question) and most of the answers are QuickFIX-related.
My question is broader. I'm looking for the most efficient way to parse a FIX Protocol message using C#. By way of background, a FIX message consists of a series of tag/value pairs separated by the ASCII <SOH> character (0x01). The number of fields in a message is variable.
An example message might look like this:
8=FIX.4.2<SOH>9=175<SOH>35=D<SOH>49=BUY1<SOH>56=SELL1<SOH>34=2482<SOH>50=frg<SOH>
52=20100702-11:12:42<SOH>11=BS01000354924000<SOH>21=3<SOH>100=J<SOH>55=ILA SJ<SOH>
48=YY77<SOH>22=5<SOH>167=CS<SOH>207=J<SOH>54=1<SOH>60=20100702-11:12:42<SOH>
38=500<SOH>40=1<SOH>15=ZAR<SOH>59=0<SOH>10=230<SOH>
For each field, the tag (an integer) and the value (for our purposes, a string) are separated by the '=' character. (The precise semantics of each tag are defined in the protocol, but that isn't particularly germane to this question.)
When doing basic parsing, you are often interested in only a handful of specific tags from the FIX header and don't need random access to every possible field. Strategies I have considered include:
1. Using String.Split, iterating over every element and putting the tag-to-index mapping in a Hashtable. This provides full random access to all fields if needed at some point.
2. (Slight optimisation) Using String.Split, scanning the array for tags of interest and putting the tag-to-index mapping into another container (not necessarily a Hashtable, as it may be a fairly small number of items, and the number of items is known prior to parsing).
3. Scanning the message field by field using String.IndexOf and storing the offset and length of fields of interest in an appropriate structure.
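To make the third strategy concrete, here is a minimal sketch of what I have in mind. The names FixScanner and Scan are my own placeholders, not from any library. It assumes well-formed numeric tags, no <SOH> embedded in values, and that only the first occurrence of each tag matters; it stops scanning as soon as every tag of interest has been found.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating strategy 3 (names are placeholders).
public static class FixScanner
{
    private const char Soh = '\u0001';

    // Returns the (offset, length) of the value for each wanted tag.
    // Assumes numeric tags and no embedded SOH; first occurrence wins.
    public static Dictionary<int, (int Offset, int Length)> Scan(
        string message, HashSet<int> wantedTags)
    {
        var found = new Dictionary<int, (int Offset, int Length)>(wantedTags.Count);
        int pos = 0;
        while (pos < message.Length && found.Count < wantedTags.Count)
        {
            int eq = message.IndexOf('=', pos);
            if (eq < 0) break;                      // no further tag=value pair
            int end = message.IndexOf(Soh, eq + 1);
            if (end < 0) end = message.Length;      // tolerate a missing final SOH

            // Parse the tag digits in place to avoid allocating a substring.
            int tag = 0;
            for (int i = pos; i < eq; i++)
                tag = tag * 10 + (message[i] - '0');

            if (wantedTags.Contains(tag) && !found.ContainsKey(tag))
                found[tag] = (eq + 1, end - eq - 1);

            pos = end + 1;                          // move to the next field
        }
        return found;
    }
}
```

No strings are allocated during the scan; a value is only materialised when the caller does message.Substring(offset, length) for a field it actually needs.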
Regarding the first two: although my measurements indicate String.Split is pretty fast, the documentation states that the method allocates a new String for each element of the resulting array, which can generate a lot of garbage if you're parsing a lot of messages. Can anyone see a better way to tackle this problem in .NET?
EDIT:
Three vital pieces of information I left out:
1. Tags are not necessarily unique within FIX messages, i.e., duplicate tags can occur under certain circumstances.
2. Certain types of FIX fields can contain 'embedded <SOH>' in the data. These tags are referred to as being of type 'data'; a dictionary lists the tag numbers that are of this type.
3. The eventual requirement is to be able to edit the message (particularly replace values).
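For illustration, here is a rough sketch of how those three points might change the scanning approach. All names (FixField, FixFieldIndexer, Index, ReplaceValue) are hypothetical. It keeps every field in message order so duplicates survive, and it relies on the FIX convention that a 'data' field is immediately preceded by its length field (for example RawDataLength 95 before RawData 96). The two-entry map is only an illustrative subset of the real dictionary, and the sketch assumes one char per byte (e.g. an ASCII-decoded string).

```csharp
using System;
using System.Collections.Generic;

// One entry per field occurrence, so duplicate tags are preserved in order.
public readonly struct FixField
{
    public FixField(int tag, int offset, int length) { Tag = tag; Offset = offset; Length = length; }
    public int Tag { get; }
    public int Offset { get; }   // offset of the value within the message
    public int Length { get; }   // length of the value in chars
}

public static class FixFieldIndexer
{
    private const char Soh = '\u0001';

    // Illustrative subset only: data tag -> length tag that precedes it.
    private static readonly Dictionary<int, int> DataTagToLengthTag = new Dictionary<int, int>
    {
        { 96, 95 },  // RawData preceded by RawDataLength
        { 89, 93 },  // Signature preceded by SignatureLength
    };

    public static List<FixField> Index(string message)
    {
        var fields = new List<FixField>();
        int pos = 0;
        while (pos < message.Length)
        {
            int eq = message.IndexOf('=', pos);
            if (eq < 0) break;
            int tag = 0;
            for (int i = pos; i < eq; i++) tag = tag * 10 + (message[i] - '0');

            int valueStart = eq + 1;
            int end;
            if (DataTagToLengthTag.TryGetValue(tag, out int lengthTag)
                && fields.Count > 0 && fields[fields.Count - 1].Tag == lengthTag)
            {
                // Data field: trust the declared length instead of searching for SOH.
                FixField prev = fields[fields.Count - 1];
                int declared = int.Parse(message.Substring(prev.Offset, prev.Length));
                end = Math.Min(valueStart + declared, message.Length);
            }
            else
            {
                end = message.IndexOf(Soh, valueStart);
                if (end < 0) end = message.Length;
            }
            fields.Add(new FixField(tag, valueStart, end - valueStart));
            pos = end + 1;   // skip the SOH that terminates this field
        }
        return fields;
    }

    // Returns a new message with one field's value replaced.
    public static string ReplaceValue(string message, FixField field, string newValue) =>
        message.Substring(0, field.Offset) + newValue + message.Substring(field.Offset + field.Length);
}
```

After a replacement, any previously recorded offsets are stale, and BodyLength (tag 9) and CheckSum (tag 10) would need to be recomputed before the message is sent; the sketch does neither.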