pandas to_csv: suppress scientific notation in csv file when writing pandas to csv

I am writing a pandas DataFrame to a CSV file. When I do, some of the elements in one of the columns are incorrectly converted to numbers in scientific notation. For example, col_1 contains strings such as '104D59'. Most of these are represented as strings in the CSV file, as they should be. However, occasional strings such as '104E59' are converted into scientific notation (e.g., 1.04E+61) and represented as numbers in the resulting CSV file.

I am trying to load the CSV file into a software package (i.e., pandas -> CSV -> software_new), and this change in data type is causing problems with that import.

Is there a way to write the DataFrame to a CSV file so that all elements in df['problem_col'] are represented as strings in the resulting file, rather than converted to scientific notation?

Here is the code I used to write the DataFrame to a CSV file:

df.to_csv('df.csv', encoding='utf-8')

I also checked the dtype of the problem column:

according to df.dtypes, df['problem_col'] is object
Shane asked 10/4, 2014 at 18:09
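For reference, a minimal sketch (the column name col_1 and its values are hypothetical, taken from the question's examples) showing where the conversion can happen: to_csv writes object-dtype strings verbatim, and it is a type-inferring reader, such as pandas.read_csv with its defaults, that turns 104E59 into a float. Passing quoting=csv.QUOTE_NONNUMERIC quotes every non-numeric field; whether the receiving software then treats quoted fields as text depends on that software, which is an assumption here.

```python
import csv
import io

import pandas as pd

# Hypothetical data mirroring the question: every value is a string.
df = pd.DataFrame({"col_1": ["104D59", "104E59"]})

# Default writing: strings are written verbatim and unquoted.
plain = df.to_csv(index=False)
print(plain.splitlines())  # ['col_1', '104D59', '104E59']

# QUOTE_NONNUMERIC wraps every non-numeric field (header included) in quotes.
quoted = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)
print(quoted.splitlines())  # ['"col_1"', '"104D59"', '"104E59"']

# Type inference on *reading* is where '104E59' can become a float.
only_e = pd.DataFrame({"col_1": ["104E59"]}).to_csv(index=False)
inferred = pd.read_csv(io.StringIO(only_e))
print(inferred["col_1"].dtype)  # float64

# Reading with dtype=str keeps the original text.
as_text = pd.read_csv(io.StringIO(only_e), dtype=str)
print(as_text["col_1"].iloc[0])  # 104E59
```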

Tested with Python 3.7.2 and pandas 0.23.4 (pd.__version__ returns '0.23.4'):

Display options (see "Options and Settings" in the pandas docs) are set with pandas.set_option and control how the DataFrame is shown:

import numpy as np  # needed for np.random below
import pandas as pd  # import the pandas package

# display settings for viewing the float data once it is loaded:

pd.set_option('display.html.table_schema', True)  # publish a Table Schema representation for frontends that support it (renders the table as HTML)
pd.set_option('display.precision', 5)  # display floats with a precision of 5, so we can see what the data looks like
df = pd.DataFrame(np.random.randn(20, 4) * 10 ** -12)  # create a random DataFrame of very small floats

Output of the data:

df.dtypes # check datatype for columns

[output]:
0    float64
1    float64
2    float64
3    float64
dtype: object

Dataframe:

df # output of the dataframe

[output]:
               0             1             2             3
 0  -2.01082e-12   1.25911e-12   1.05556e-12  -5.68623e-13
 1  -6.87126e-13   1.91950e-12   5.25925e-13   3.72696e-13
 2  -1.48068e-12   6.34885e-14  -1.72694e-12   1.72906e-12
 3  -5.78192e-14   2.08755e-13   6.80525e-13   1.49018e-12
 4  -9.52408e-13   1.61118e-13   2.09459e-13   2.10940e-13
 5  -2.30242e-13  -1.41352e-13   2.32575e-12  -5.08936e-13
 6   1.16233e-12   6.17744e-13   1.63237e-12   1.59142e-12
 7   1.76679e-13  -1.65943e-12   2.18727e-12  -8.45242e-13
 8   7.66469e-13   1.29017e-13  -1.61229e-13  -3.00188e-13
 9   9.61518e-13   9.71320e-13   8.36845e-14  -6.46556e-13
10  -6.28390e-13  -1.17645e-12  -3.59564e-13   8.68497e-13
11   3.12497e-13   2.00065e-13  -1.10691e-12  -2.94455e-12
12  -1.08365e-14   5.36770e-13   1.60003e-12   9.19737e-13
13  -1.85586e-13   1.27034e-12  -1.04802e-12  -3.08296e-12
14   1.67438e-12   7.40403e-14   3.28035e-13   5.64615e-14
15  -5.31804e-13  -6.68421e-13   2.68096e-13   8.37085e-13
16  -6.25984e-13   1.81094e-13  -2.68336e-13   1.15757e-12
17   7.38247e-13  -1.76528e-12  -4.72171e-13  -3.04658e-13
18  -1.06099e-12  -1.31789e-12  -2.93676e-13  -2.40465e-13
19   1.38537e-12   9.18101e-13   5.96147e-13  -2.41401e-12
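Note that the precision shown above comes from the display options, which only affect how the DataFrame is printed. A minimal sketch with a hypothetical one-column frame: setting display.float_format (inside pd.option_context, so the change is temporary) removes the exponent from the printed output, but to_csv still writes the exponent unless its own float_format argument is set.

```python
import pandas as pd

# A hypothetical one-column frame holding a very small float.
df = pd.DataFrame({"x": [1.5e-12]})

with pd.option_context("display.float_format", "{:.15f}".format):
    # The display option changes how the frame is printed...
    shown = repr(df)
    # ...but to_csv ignores it and writes the float's shortest repr.
    written = df.to_csv(index=False)

print(shown)    # value printed as 0.000000000001500
print(written)  # value written as 1.5e-12

# The float_format argument of to_csv is what controls the CSV text.
fixed = df.to_csv(index=False, float_format="%.15f")
print(fixed)    # value written as 0.000000000001500
```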

Now write the CSV using the float_format='%.15f' parameter:

df.to_csv('estc.csv', sep=',', float_format='%.15f')  # write 15 digits after the decimal point

File contents:

,0,1,2,3
0,-0.000000000002011,0.000000000001259,0.000000000001056,-0.000000000000569
1,-0.000000000000687,0.000000000001919,0.000000000000526,0.000000000000373
2,-0.000000000001481,0.000000000000063,-0.000000000001727,0.000000000001729
3,-0.000000000000058,0.000000000000209,0.000000000000681,0.000000000001490
4,-0.000000000000952,0.000000000000161,0.000000000000209,0.000000000000211
5,-0.000000000000230,-0.000000000000141,0.000000000002326,-0.000000000000509
6,0.000000000001162,0.000000000000618,0.000000000001632,0.000000000001591
7,0.000000000000177,-0.000000000001659,0.000000000002187,-0.000000000000845
8,0.000000000000766,0.000000000000129,-0.000000000000161,-0.000000000000300
9,0.000000000000962,0.000000000000971,0.000000000000084,-0.000000000000647
10,-0.000000000000628,-0.000000000001176,-0.000000000000360,0.000000000000868
11,0.000000000000312,0.000000000000200,-0.000000000001107,-0.000000000002945
12,-0.000000000000011,0.000000000000537,0.000000000001600,0.000000000000920
13,-0.000000000000186,0.000000000001270,-0.000000000001048,-0.000000000003083
14,0.000000000001674,0.000000000000074,0.000000000000328,0.000000000000056
15,-0.000000000000532,-0.000000000000668,0.000000000000268,0.000000000000837
16,-0.000000000000626,0.000000000000181,-0.000000000000268,0.000000000001158
17,0.000000000000738,-0.000000000001765,-0.000000000000472,-0.000000000000305
18,-0.000000000001061,-0.000000000001318,-0.000000000000294,-0.000000000000240
19,0.000000000001385,0.000000000000918,0.000000000000596,-0.000000000002414

Writing with float_format='%f' instead uses the default of six decimal places:

df.to_csv('estc.csv', sep=',', float_format='%f')  # 6 decimal places: values as small as these are all written as 0.000000 or -0.000000
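A short check of that difference, using two values from the table above in a hypothetical column x: '%f' keeps only six decimal places, so values around 1e-12 round to zero, while '%.15f' preserves the leading significant digits.

```python
import pandas as pd

# Two values taken from the example output, in a hypothetical column "x".
df = pd.DataFrame({"x": [-2.01082e-12, 1.25911e-12]})

# '%f' means six digits after the decimal point, so both collapse to zero.
six = df.to_csv(index=False, float_format="%f")
print(six.splitlines())      # ['x', '-0.000000', '0.000000']

# Fifteen fixed digits keep the leading significant figures.
fifteen = df.to_csv(index=False, float_format="%.15f")
print(fifteen.splitlines())  # ['x', '-0.000000000002011', '0.000000000001259']
```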

For more details, see the pandas.DataFrame.to_csv documentation.

Kendre answered 16/8, 2018 at 12:28
Hello, I am trying to do a similar task, but I want to save the values in general format, without decimals or scientific notation. I have numbers like 8034109298000000000 and want to preserve this format. Can you help me out? Thanks - Addiction
Is there a mix of dtypes in the columns? - Kendre
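Regarding the large-number question in the comments above, a minimal sketch assuming the values are stored as float64 in a hypothetical column n: by default such a value is written with an exponent (8.034109298e+18), while float_format='%.0f', or converting the column to int64 when every value is a whole number that fits, writes all the digits. Both options can only reproduce the digits of the value as stored; they cannot restore precision lost earlier.

```python
import pandas as pd

# Hypothetical column holding a large whole number stored as float64.
df = pd.DataFrame({"n": [8034109298000000000.0]})

# Default: the float's shortest repr, which uses an exponent.
default = df.to_csv(index=False)

# Option 1: a format string with no decimal places.
no_decimals = df.to_csv(index=False, float_format="%.0f")

# Option 2: convert to int64 first (only safe if every value is a whole
# number within the int64 range and the column contains no NaN).
as_int = df.astype("int64").to_csv(index=False)

print(default.splitlines())      # ['n', '8.034109298e+18']
print(no_decimals.splitlines())  # ['n', '8034109298000000000']
print(as_int.splitlines())       # ['n', '8034109298000000000']
```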

Use the float_format argument:

In [11]: df = pd.DataFrame(np.random.randn(3, 3) * 10 ** 12)

In [12]: df
Out[12]:
              0             1             2
0  1.757189e+12 -1.083016e+12  5.812695e+11
1  7.889034e+11  5.984651e+11  2.138096e+11
2 -8.291878e+11  1.034696e+12  8.640301e+08

In [13]: print(df.to_string(float_format='{:f}'.format))
                     0                     1                   2
0 1757188536437.788086 -1083016404775.687134 581269533538.170288
1  788903446803.216797   598465111695.240601 213809584103.112457
2 -829187757358.493286  1034695767987.889160    864030095.691202

This works similarly for to_csv:

df.to_csv('df.csv', float_format='{:f}'.format, encoding='utf-8')
Spanner answered 11/4, 2014 at 7:38
Doesn't seem to work as of pandas 0.17.1: TypeError: unsupported operand type(s) for %: 'builtin_function_or_method' and 'float' - Tillietillinger
@user1637894 still works for me with 0.17.1 :s. Tested on Python 2.7 and 3.4 with a few different NumPy versions. - Spanner
@user1637894 I recommend posting your issue on pandas' GitHub! - Spanner
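If the callable raises the TypeError quoted in the comments above (that message shows float_format being applied with the % operator), a %-style format string is the safer choice for to_csv, while to_string accepts a one-argument callable. A minimal sketch with a hypothetical column x:

```python
import pandas as pd

# Hypothetical large floats that would otherwise display with an exponent.
df = pd.DataFrame({"x": [1757000000000.0, -829000000000.0]})

# to_csv: a %-style format string.
csv_text = df.to_csv(index=False, float_format="%f")

# to_string: a callable that formats one float at a time.
table = df.to_string(float_format="{:f}".format)

print(csv_text.splitlines())  # ['x', '1757000000000.000000', '-829000000000.000000']
print(table)
```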

If you would like to use the values as formatted strings in a list, for example when writing rows with csv.writer, the numbers can be formatted before creating the list:

import csv

# Assumes df (with a float column 'label') and header_row_list are already defined.
with open('results_actout_file', 'w', newline='') as csvfile:
    resultwriter = csv.writer(csvfile, delimiter=',')
    resultwriter.writerow(header_row_list)

    # Format each float with 17 fixed decimal places so no exponent is written.
    resultwriter.writerow(df['label'].apply(lambda x: '%.17f' % x).values.tolist())
Manas answered 4/12, 2017 at 17:26