How to use pandas to_csv float_format?

I am reading from a data file whose values have 8 decimal places. After interpolating some values I save them as follows, but the float_format option does not seem to work:

df.to_csv('data.dat',sep=' ', index=False, header=False, float_format="%.8f")

and the result file looks like

0.02506602 0.05754493 0.36854688
0.02461631 0.0599653 0.43078098
0.02502534 0.06209149 0.44955311
0.4267356675182389 0.1718682822340447 0.5391386354945895
0.426701667727433 0.17191008887193007 0.5391897818631616
0.4266676661681287 0.17195189807522643 0.5392409104354972

The first 3 lines were in the data file and the next 3 are the new interpolated values. I want all the values to have the same length. What's going wrong here, and how do I fix it?

Also: it would be nice if I could control the float precision separately for each column.

Peugia answered 23/7, 2018 at 10:31 Comment(0)

Your code looks fine. Most likely, there is an issue with your input data. Use pd.DataFrame.dtypes to check that all your input series are of type float. If they aren't, convert them to float via:

df[col_list] = df[col_list].apply(pd.to_numeric, downcast='float').fillna(0)
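To see why the dtype matters, here is a minimal sketch with made-up data (the frame `raw` and its column names are assumptions, not the asker's file). A column that holds even one string is stored as object dtype, and float_format is then skipped for that column. The sketch uses errors="coerce" rather than the downcast/fillna variant above, so unparseable entries become NaN and the values stay float64:

```python
import pandas as pd

# Hypothetical data: the "" in column "a" makes that column object dtype.
raw = pd.DataFrame({
    "a": [0.02506602, "", 0.4267356675182389],
    "b": [0.05754493, 0.0599653, 0.1718682822340447],
})
print(raw.dtypes)  # "a" is object, "b" is float64

# float_format is only applied to float columns, so "a" keeps its full repr.
raw_text = raw.to_csv(sep=" ", index=False, header=False, float_format="%.8f")
print(raw_text)

# Coerce every column to numeric; anything unparseable (here "") becomes NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")

# na_rep gives missing values a visible token, which is easier to parse
# back from a space-separated file than an empty field.
clean_text = clean.to_csv(sep=" ", index=False, header=False,
                          float_format="%.8f", na_rep="nan")
print(clean_text)
```

After the conversion both columns are float64 and every value is written with 8 decimal places.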

Here's a working example:

from io import StringIO
import pandas as pd

mystr = StringIO("""0.02506602 0.05754493 0.36854688
0.02461631 0.0599653 0.43078098
0.02502534 0.06209149 0.44955311
0.4267356675182389 0.1718682822340447 0.5391386354945895
0.426701667727433 0.17191008887193007 0.5391897818631616
0.4266676661681287 0.17195189807522643 0.5392409104354972""")

# delim_whitespace is deprecated in recent pandas; sep=r'\s+' is the equivalent
df = pd.read_csv(mystr, sep=r'\s+', header=None)

print(df.dtypes)

# 0    float64
# 1    float64
# 2    float64
# dtype: object

file_loc = r'C:\temp\test.dat'
df.to_csv(file_loc, sep=' ', index=False, header=False, float_format="%.8f")

# delim_whitespace is deprecated in recent pandas; sep=r'\s+' is the equivalent
df = pd.read_csv(file_loc, sep=r'\s+', header=None)

print(df[0].iloc[-1])

# 0.42666767
Limoges answered 23/7, 2018 at 11:53 Comment(7)
Well, somewhere along the way I used df.loc[df[col1]==some_value]='' and that messed everything up. - Peugia
@Eular, Yes, that could be it. Not sure why you'd add empty strings to numeric data. Use np.nan instead and you might have better luck. - Limoges
print an empty line in some places - that's non-trivial (and inefficient). I strongly advise against it. I think you probably need to provide a minimal reproducible example, because (as you can see from my example) it's not straightforward to reproduce your problem. - Limoges
OK, float_format is working now. Thanks. Can you set the precision to 2 decimal places for the 1st column and 8 for the other 2? - Peugia
@Eular, I'm not sure this is possible with to_csv. You may wish to start a new question. - Limoges
Well, I had one integer column and I wanted to write it as integers, but when I use np.nan I can't keep the column as integer. That's why I am trying a different precision format for different columns. Also, if I use round(), some values are printed in scientific e notation, which I don't want either. So my best shot would be using round() without introducing e notation. - Peugia
It's a common problem. nan is considered a float, but there isn't a substitute for int type. Best, in my opinion, to leave as float, or use some integer (e.g. -1) which you know is invalid data. Definitely not a good idea to start rounding and playing around with object type. - Limoges
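For readers with the same follow-up question, here is one possible sketch, not part of the original answer. It assumes a pandas version that has the nullable "Int64" dtype (0.24 or later) and uses invented column names `n`, `x` and `y`. The integer column is cast to "Int64" so that missing entries do not force it to float, and the one column that needs a different precision is turned into strings just before writing. Note that this does the string conversion the last comment advises against, so it is best applied to a throwaway copy at output time only:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "n" holds integers with a missing value,
# "x" should be written with 2 decimals and "y" with 8.
df = pd.DataFrame({
    "n": [1, np.nan, 3],
    "x": [0.4267356675182389, 0.426701667727433, 0.4266676661681287],
    "y": [0.1718682822340447, 0.17191008887193007, 0.17195189807522643],
})

out = df.copy()                            # keep the numeric frame untouched
out["n"] = out["n"].astype("Int64")        # nullable integers: 1, <NA>, 3
out["x"] = out["x"].map("{:.2f}".format)   # fixed-point strings, never e notation

# float_format now only affects the remaining float column "y".
text = out.to_csv(sep=" ", index=False, header=False,
                  float_format="%.8f", na_rep="nan")
print(text)
```

Because the "f" format specifier always produces fixed-point output, this also avoids the scientific notation mentioned above.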
