Precision lost while using read_csv in pandas

I have rows in the format below in a text file, which I am trying to read into a pandas dataframe.

895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|

As you can see, there are 10 digits after the decimal point in the input file.

df = pd.read_csv('mockup.txt', header=None, delimiter='|')

When I try to read it into a dataframe, I am not getting the last 4 digits:

df[5].head()

0    0.467798
1    0.258165
2    0.860384
3    0.803388
4    0.249820
Name: 5, dtype: float64

How can I get the complete precision present in the input file? I have some matrix operations that need to be performed, so I cannot cast it as a string.

I figured out that I have to do something with dtype, but I am not sure where I should use it.

Wampler answered 28/4, 2016 at 8:35 Comment(0)

It is only a display problem; see the docs:

# temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)

     0          1   2      3   4             5            6             7   \
0  895  2015-4-23  19  10000  LA  0.4677978806  0.477346934  0.4089938425   

             8             9            10            11  12  
0  0.8224291972  0.8652525793  0.682994286  0.5139162227 NaN    
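
To check that the parsed values themselves keep the extra digits and only the default 6-digit display is rounded, here is a minimal sketch, assuming the same 'mockup.txt' file from the question:

import pandas as pd

df = pd.read_csv('mockup.txt', header=None, delimiter='|')

# repr() shows the full stored float rather than the rounded display
# (the fast default parser may still differ in the very last bits)
print(repr(df[5].iloc[0]))

# or temporarily raise the display precision for the whole frame
with pd.option_context('display.precision', 10):
    print(df[5].head())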

EDIT (thank you Mark Dickinson):

Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
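
As a hedged sketch of how that option is passed, again assuming the 'mockup.txt' file from the question ('round_trip' selects the slower but exact converter, so the stored floats match the text in the file):

import pandas as pd

df = pd.read_csv('mockup.txt',
                 header=None,
                 delimiter='|',
                 float_precision='round_trip')

with pd.option_context('display.precision', 10):
    print(df[5].head())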

Expostulate answered 28/4, 2016 at 8:40 Comment(7)
Thanks. I had one other rookie question: is there any general recommendation for faster loading into a dataframe with read_csv() when the data is mostly floating-point values? (Wampler)
I think you can try setting dtype; see the docs. (Expostulate)
It may be worth noting that this isn't purely a display problem, in the sense that if you use Pandas to write out a dataframe to a CSV file and then read it back in again, you can end up with small floating-point errors in the result: Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more. (Creatural)
@MarkDickinson Thank you very much for the comment; I have added it to the answer. (Expostulate)
@MarkDickinson My notebook kernel dies once I set float_precision='round_trip'. (Hooded)
This solves my issue! For some reason float_precision='high' doesn't work, but float_precision='round_trip' works. (Isomagnetic)
@Isomagnetic I also found that was the only way to fix the output, using Pandas v1.3.4. But saying that is somewhat misleading, because the resulting df.dtypes now shows the columns have a type of object, not float. (Abeyant)
