My question is how to skip metadata at the beginning of a file when importing data into R. My data are in .txt files in which the first several lines are metadata describing the data, and these lines need to be filtered out. Below is a minimal example of such a file, which is tab-delimited:
Type=GenePix Export
DateTime=2010/03/04 16:04:16
PixelSize=10
Wavelengths=635
ImageFiles=Not Saved
NormalizationMethod=None
NormalizationFactors=1
JpegImage=
StdDev=Type 1
FeatureType=Circular
Barcode=
BackgroundSubtraction=LocalFeature
ImageOrigin=150, 10
JpegOrigin=150, 2760
Creator=GenePix Pro 7.2.29.002
var1 var2 var3 var4 var5 var6 var7
1 1 1 molecule1 1F3 400 4020
1 2 1 molecule2 1B5 221 4020
1 3 1 molecule3 1H5 122 2110
1 4 1 molecule4 1D1 402 2110
1 5 1 molecule5 1F1 600 4020
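To make the example reproducible, the same file can be written from R. This is a sketch that assumes the fields are separated by tabs (as described above) and writes mydata.txt to the current working directory. The 15 metadata lines are what make skip=15 the right value for this particular file.

```r
# Recreate the example file shown above so it can be read back in.
# ASSUMPTION: fields are tab-separated, as stated in the question.
meta <- c(
  "Type=GenePix Export",
  "DateTime=2010/03/04 16:04:16",
  "PixelSize=10",
  "Wavelengths=635",
  "ImageFiles=Not Saved",
  "NormalizationMethod=None",
  "NormalizationFactors=1",
  "JpegImage=",
  "StdDev=Type 1",
  "FeatureType=Circular",
  "Barcode=",
  "BackgroundSubtraction=LocalFeature",
  "ImageOrigin=150, 10",
  "JpegOrigin=150, 2760",
  "Creator=GenePix Pro 7.2.29.002"
)

# Column-header line: var1 ... var7 joined by tabs.
header <- paste(paste0("var", 1:7), collapse = "\t")

# Data rows, one string per row, fields separated by tabs.
rows <- c(
  "1\t1\t1\tmolecule1\t1F3\t400\t4020",
  "1\t2\t1\tmolecule2\t1B5\t221\t4020",
  "1\t3\t1\tmolecule3\t1H5\t122\t2110",
  "1\t4\t1\tmolecule4\t1D1\t402\t2110",
  "1\t5\t1\tmolecule5\t1F1\t600\t4020"
)

# 15 metadata lines + 1 header line + 5 data rows = 21 lines.
writeLines(c(meta, header, rows), "mydata.txt")
```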
If I know the line on which the actual data start, I can use the basic command below:
mydata <- read.table("mydata.txt", header = TRUE, sep = "\t", skip = 15)
which returns:
mydata
var1 var2 var3 var4 var5 var6 var7
1 1 1 1 molecule1 1F3 400 4020
2 1 2 1 molecule2 1B5 221 4020
3 1 3 1 molecule3 1H5 122 2110
4 1 4 1 molecule4 1D1 402 2110
5 1 5 1 molecule5 1F1 600 4020
The problem is that I need to write a script that can read various datasets in which the line where the actual data start varies from one dataset to another. I could imagine using something like the sqldf package, but I am not very familiar with SQL.
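One approach I am considering, sketched here without sqldf, is to read the raw lines first with readLines(), find the first line that looks like the column header, and pass its position to skip. This rests on an assumption that holds for the example above but may not hold for every export: each metadata line is a key=value pair containing "=", and the header is the first non-empty line without one. The function name read_after_metadata is just a hypothetical helper, and the two small test files have deliberately different metadata lengths.

```r
# Hypothetical helper: read a table that follows a variable-length
# block of key=value metadata lines.
# ASSUMPTION: every metadata line contains "=", and the column-header
# line is the first non-empty line that does not.
read_after_metadata <- function(path, sep = "\t") {
  lines <- readLines(path)
  is_header <- nzchar(lines) & !grepl("=", lines, fixed = TRUE)
  header_line <- which(is_header)[1]
  if (is.na(header_line)) stop("no header line found in ", path)
  # skip counts raw lines, so skip everything before the header line.
  read.table(path, header = TRUE, sep = sep, skip = header_line - 1)
}

# Two small files whose metadata blocks have different lengths.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("Type=GenePix Export", "PixelSize=10",
             "var1\tvar2\tvar3",
             "1\tmolecule1\t400",
             "2\tmolecule2\t221"), f1)
writeLines(c("Type=GenePix Export", "PixelSize=10", "Wavelengths=635",
             "JpegImage=", "Creator=GenePix Pro 7.2.29.002",
             "var1\tvar2\tvar3",
             "1\tmolecule3\t122"), f2)

d1 <- read_after_metadata(f1)  # 2 metadata lines skipped
d2 <- read_after_metadata(f2)  # 5 metadata lines skipped
```

If the name of the first column is known in advance, grep("^var1", lines)[1] would be a stricter way to locate the header line than the "=" heuristic.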
Any assistance would be greatly appreciated.