numpy loadtxt skip first row

I have a small issue when I'm trying to import data from CSV files with numpy's loadtxt function. Here's a sample of the type of data files I have.

Call it 'datafile1.csv':

# Comment 1
# Comment 2
x,y,z 
1,2,3
4,5,6
7,8,9
...
...
# End of File Comment

The script that I thought would work for this situation looks like:

import numpy as np
FH = np.loadtxt('datafile1.csv',comments='#',delimiter=',',skiprows=1)

But, I'm getting an error:

ValueError: could not convert string to float: x

This tells me that the kwarg 'skiprows' is not skipping the header; it is skipping the first comment line. I could simply set skiprows=3, but the complication is that I have a very large number of files, and they don't all have the same number of comment lines at the top. How can I make sure that loadtxt reads only the actual data in a situation like this?

P.S. - I'm open to bash solutions, too.

Indiscernible answered 17/6, 2013 at 15:25 Comment(1)
I should also add that I've tried various solutions in Python to parse each line for either a comment or a character, but quickly realized nothing of this nature could possibly work because loadtxt is failing at the very beginning. (Comment by Indiscernible)

Skip the comment lines manually using a generator expression, so that skiprows=1 then removes the header row:

import numpy as np

with open('datafile1.csv') as f:
    lines = (line for line in f if not line.startswith('#'))
    FH = np.loadtxt(lines, delimiter=',', skiprows=1)
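
As a quick check, the same filtering can be tried on in-memory text instead of a file. This is only a minimal sketch: the sample data mirrors the layout in the question, and `io.StringIO` stands in for the opened file (an assumption made for illustration). It also uses `lstrip()` before the check, which tolerates indented comment lines that a plain `startswith('#')` would let through.

```python
import io

import numpy as np

# Sample text shaped like datafile1.csv from the question (illustrative data).
sample = io.StringIO(
    "# Comment 1\n"
    "# Comment 2\n"
    "x,y,z\n"
    "1,2,3\n"
    "4,5,6\n"
    "7,8,9\n"
    "# End of File Comment\n"
)

# Drop comment lines first, so that skiprows=1 removes the header row
# rather than the first comment line.
lines = (line for line in sample if not line.lstrip().startswith('#'))
data = np.loadtxt(lines, delimiter=',', skiprows=1)

print(data.shape)  # (3, 3)
```
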
Whitt answered 17/6, 2013 at 15:31 Comment(0)

Create your own custom filter function, such as:

import numpy as np

def skipper(fname):
    with open(fname) as fin:
        # Lazily drop comment lines, including indented ones
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        next(no_comments, None)  # skip the header row
        for row in no_comments:
            yield row

a = np.loadtxt(skipper('your_file'), delimiter=',')
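
Since the question mentions many files with differing numbers of comment lines, the same filter could be applied across a whole directory. The following is a sketch under assumptions: `load_all` is a hypothetical helper name, not part of NumPy, and every file is assumed to have exactly one header row after its comments.

```python
import glob

import numpy as np


def skipper(fname):
    """Yield the data lines of fname: no '#' comment lines, no header row."""
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        next(no_comments, None)  # skip the header row
        yield from no_comments


def load_all(pattern):
    """Hypothetical helper: map each matching file name to its data array."""
    return {
        fname: np.loadtxt(skipper(fname), delimiter=',')
        for fname in sorted(glob.glob(pattern))
    }
```

A call such as `load_all('data/*.csv')` (the path is a placeholder) would return a dict keyed by file name, regardless of how many comment lines each file starts with.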
Robtrobust answered 17/6, 2013 at 15:46 Comment(0)
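import numpy as np

def skipper(fname, header=False):
    with open(fname) as fin:
        # Lazily drop comment lines, including indented ones
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        if header:
            next(no_comments, None)  # skip the header row
        for row in no_comments:
            yield row

# Use header=True for files that have a header row (as in the question)
a = np.loadtxt(skipper('your_file', header=True), delimiter=',')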

This is just a small modification of @Jon Clements's answer that adds an optional parameter "header", because in some cases the CSV file has comment lines (starting with #) but no header row.
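
To see what the flag changes, the filter can be run on in-memory lines. This is a rough sketch: `data_lines` is a hypothetical variant that takes any iterable of lines rather than a file name, and the sample rows are made up for illustration.

```python
def data_lines(lines, header=False):
    """Yield non-comment lines; drop the first remaining line if header is True."""
    no_comments = (line for line in lines if not line.lstrip().startswith('#'))
    if header:
        next(no_comments, None)  # skip the header row
    yield from no_comments


with_header = ["# note\n", "x,y,z\n", "1,2,3\n", "4,5,6\n"]
without_header = ["# note\n", "1,2,3\n", "4,5,6\n"]

# Both calls leave only the two numeric rows.
rows_a = list(data_lines(with_header, header=True))
rows_b = list(data_lines(without_header))
print(rows_a == rows_b)  # True
```

Leaving `header=False` on a file that does have a header row would pass the `x,y,z` line through to loadtxt and reproduce the original ValueError.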

Wrong answered 23/1, 2019 at 7:51 Comment(0)
