Changing strings to floats in an imported .csv [duplicate]
Asked Answered
C

5

21

Quick question for an issue I haven't managed to solve quickly:

I'm working with a .csv file and can't seem to find a simple way to convert strings to floats. Here's my code,

import csv

def readLines():
    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        row = list(reader)
        for x in row:
            for y in x:
                print type(float(y)),
readLines()

As you can see, it will currently print the type of every y element in x set of lists in the variable row; this produces a long list of "<type 'float'>". But this doesn't actually change each element to a float, nor does setting the for loop to execute float(y) (a type test returns 'string' for each element) work either.

I also tried literal_eval, but that failed as well. The only way to change the list elements to floats is to create a new list, either with list comprehension or manually, but that loses the original formatting of each list (as lists of a set amount of elements within one larger list).

I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"

Thanks in advance to those courteous/knowledgeable enough to help.

Cartridge answered 18/9, 2013 at 16:26 Comment(3)
type does not change the type of a variable, it just returns the type of the variable.once you convert the variable to float you need to assign it in-placeSeptempartite
maybe you want y = float(y)Culmiferous
First you seemed to ask "How to convert strings to floats, on csv import", then you morphed it to 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"' These are seriously different questions (see my answer for real-world examples why), and the answer if you specifically constrain us to using import csv is different to using pandas. Essentially, the builtin csv module is broken and should not be used on any non-toy dataset containing one or more text or categorical fields. Use pandas.Kuykendall
K
22

You are correct that Python's builtin csv module is very primitive at handling mixed data-types, does all its type conversion at import-time, and even at that has a very restrictive menu of options, which will mangle most real-world datasets (inconsistent quoting and escaping, missing or incomplete values in Booleans and factors, mismatched Unicode encoding resulting in phantom quote or escape characters inside fields, incomplete lines will cause exception). Fixing csv import is one of countless benefits of pandas. So, your ultimate answer is indeed stop using builtin csv import and start using pandas. But let's start with the literal answer to your question.

First you asked "How to convert strings to floats, on csv import". The answer to that is to open the csv.reader(..., quoting=csv.QUOTE_NONNUMERIC) as per the csv doc

csv.QUOTE_NONNUMERIC: Instructs the reader to convert all non-quoted fields to type float.

That works if you're ok with all unquoted fields (integer, float, text, Boolean etc.) being converted to float, which is generally a bad idea for many reasons (missing or NA values in Booleans or factors will get silently squelched). Moreover it will fail (throw exception) on unquoted text fields obviously. So it's brittle and needs to be protected with try..catch.

Then you asked: 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"' to which the crappy csv.reader solution is to open with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC)

But as @geoffspear correctly replied 'The answer to your "overall question" may be "Pandas", although it's a bit vague.'

Kuykendall answered 29/8, 2016 at 2:25 Comment(2)
coming back to this after a LONG time, some of which has been spent using pandas, and this is the answer that I think others asker's should see first.Cartridge
Thanks @userNaN. Yes that's distilled from 10 years of painful experience doing CSV data import in Python, csv builtin, pandas, R, Excel and other languages/packages. It becomes infinitely more painful when you use quoted, escaped and/or Unicode fields, as well as NAs. csv simply breaks and you have to find something better.Kuykendall
E
5

Try something like the following

import csv

def read_lines():
    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        for row in reader:
            yield [ float(i) for i in row ]

for i in read_lines():
    print(i)

# to get a list, instead of a generator, use
xy = list(read_lines())

As for the easiest way, then I suggest you see the xlrd, xlwt modules, personally I always have hard time with all the varying CSV formats.

Eddra answered 18/9, 2013 at 16:30 Comment(0)
B
4

When converting a bunch of strings to floats, you should use a try/except to catch errors:

def conv(s):
    try:
        s=float(s)
    except ValueError:
        pass    
    return s

print [conv(s) for s in ['1.1','bls','1','nan', 'not a float']] 
# [1.1, 'bls', 1.0, nan, 'not a float']

Notice that the strings that cannot be converted are simply passed through unchanged.

A csv file IS a text file, so you should use a similar functionality:

def readLines():
    def conv(s):
        try:
            s=float(s)
        except ValueError:
            pass    
        return s

    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        for row in reader:
            for cell in row:
                y=conv(cell)
              # do what ever with the single float
         # OR
         # yield [conv(cell) for cell in row]  if you want to write a generator...    
Bilbe answered 18/9, 2013 at 16:32 Comment(0)
C
0

this helped me

import csv
def readLines():
    with open('sample.csv', 'rU') as data:
        reader = csv.reader(data)
        row = list(reader)
        for x in row:
            for y in x:
                t = float(y)
                print (type(t))
readLines()
Condescending answered 19/12, 2022 at 14:40 Comment(0)
C
-2
for y in x:
                print type(float(y)),

float(y) takes the value of y and returns a float based on it. It does not modify y- it returns a new object.

y = float(y)

is more like what you are looking for- you have to modify the objects.

Carminacarminative answered 18/9, 2013 at 16:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.