I have a csv/log file of 35=S (Quote messages; "Tag=Value") and I need to extract the rates into a proper CSV file for data mining. This is not strictly FIX related, it's more of a R related question on how to clean a dataset.
The raw messages look something like this:
190=1.1204 ,191=-0.000029,193=20141008,537=0 ,631=1.12029575,642=0.000145,10=56
190=7.20425,191=0.000141 ,537=0 ,631=7.2034485,10=140 , ,
190=1.26237,191=0 ,537=1 ,10=068 , , ,
I need first to get to an intermediate data set that looks like this, where the same tags are aligned.
190=1.1204 ,191=-0.000029,193=20141008,537=0,631=1.12029575,642=0.000145,10=56
190=7.20425,191=0.000141 , ,537=0,631=7.2034485 , ,10=140
190=1.26237,191=0 , ,537=1, , ,10=068
which in turn will need to be converted to this:
190 ,191 ,193 ,537,631 ,642 ,10
1.1204 ,-0.000029,20141008,0 ,1.12029575,0.000145,56
7.20425,0.000141 , ,0 ,7.2034485 , ,140
1.26237,0 , ,1 , , ,068
I'm in the midst of developing a bash script with awk but I wonder if I can do that in R. At present, my greatest challenge is arriving to the intermediate table. From the intermediate to the final table I thought of using the R with the tidyr package, specifically function 'separate'. If anybody can suggest a better logic, I'll greatly appreciate!