I'd like to use .ftr files to quickly analyze hundreds of tables. Unfortunately I have some problems with the decimal and thousands separators, similar to that post, except that read_feather does not accept the decimal=',' and thousands='.' options. I've tried the following approaches:
df['numberofx'] = (
    df['numberofx']
    .apply(lambda x: x.str.replace(".", "", regex=True)
                      .str.replace(",", ".", regex=True))
)
resulting in
AttributeError: 'str' object has no attribute 'str'
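A likely reason for this error: Series.apply passes each cell to the lambda one at a time, so x is already a plain Python str and has no .str accessor. The sketch below, which assumes the column holds only strings in the European format, drops apply and uses the .str accessor on the whole column; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: strings with "." as thousands and "," as decimal separator.
s = pd.Series(["1.234,56", "2.236", "10,5"])

# apply() hands each element to the function, so the lambda receives plain str objects.
all_plain_str = s.apply(lambda x: isinstance(x, str)).all()

# Vectorised alternative: call .str on the Series itself.
# regex=False makes "." a literal dot rather than the regex wildcard.
cleaned = (
    s.str.replace(".", "", regex=False)
     .str.replace(",", ".", regex=False)
)
numbers = pd.to_numeric(cleaned)
print(numbers.tolist())
```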
When I change it to
df['numberofx'] = (
    df['numberofx']
    .apply(lambda x: x.replace(".", "").replace(",", "."))
)
I get strange (rounding) errors in the results, such as 22359999999999998 instead of 2236 for some numbers above 1k. All values below 1k come out 10 times too large, probably because removing the "." from a float turns it into an int.
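These symptoms would be consistent with an object column that mixes text and already-parsed numbers, although that is an assumption about the data. A minimal diagnostic sketch follows; the sample values and the helper name parse_european are hypothetical. It counts the Python types found in the column and converts only the string cells.

```python
import pandas as pd

# Hypothetical mixed column: two European-formatted strings and one real float.
df = pd.DataFrame({"numberofx": ["1.234,56", 2.236, "987,5"]})

# Count the Python type of every cell to see whether the column is mixed.
type_counts = df["numberofx"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

def parse_european(value):
    """Convert '1.234,56'-style strings to float; pass real numbers through."""
    if isinstance(value, str):
        return float(value.replace(".", "").replace(",", "."))
    return float(value)

converted = df["numberofx"].map(parse_european)
print(converted.tolist())
```

Leaving non-string cells untouched avoids stripping the decimal point from values that were already floats.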
Trying
df['numberofx'] = df['numberofx'].str.replace('.', '', regex=True)
also gives inconsistent results: some numbers end up around 10^12 while others stay around 10^3, as they should.
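One thing worth checking here is the regex flag: in a regular expression an unescaped "." matches any character, not just a literal dot. How pandas treats a single-character pattern with regex=True has varied between versions, so the sketch below uses Python's own re module to show the difference.

```python
import re

text = "1.234,56"

# As a regex, "." matches every character, so everything is removed.
wildcard_result = re.sub(".", "", text)

# Escaping the dot, or using a plain string replace, removes only literal dots.
escaped_result = re.sub(r"\.", "", text)
literal_result = text.replace(".", "")

print(repr(wildcard_result), escaped_result, literal_result)
```

In pandas, passing regex=False to Series.str.replace gives the literal behaviour.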
Here is how I create my .ftr files from multiple Excel files. I know I could simply create DataFrames from the Excel files, but that would slow down my daily calculations too much.
How can I solve that issue?
EDIT: The issue seems to come from reading an Excel file with non-US decimal and thousands separators into a DataFrame and then saving it as Feather. Passing the options pd.read_excel(f, encoding='utf-8', decimal=',', thousands='.') when reading the Excel file solved my issue. That leads to the next question:
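To illustrate what the decimal and thousands options do at read time, here is a small sketch. It uses pd.read_csv on in-memory text rather than read_excel so that it runs without a workbook; that substitution, the sample rows, and the ";" delimiter are all assumptions made for the example. Recent pandas versions document the same two keywords for read_excel.

```python
import io
import pandas as pd

# Hypothetical data in European notation: "." for thousands, "," for decimals.
raw = "numberofx;label\n1.234,56;a\n2.236;b\n"

# Declaring the separators at read time yields numeric columns directly,
# so no string replacement is needed afterwards.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", thousands=".")
print(df["numberofx"].tolist())
print(df["numberofx"].dtype)
```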
Why does saving floats in a Feather file lead to strange rounding errors, like 2.236 changing to 2.2359999999999998?
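One plausible explanation, offered as an assumption rather than a diagnosis, is that the rounding does not come from Feather at all: the format stores float64 values as binary doubles, and most decimal fractions have no exact binary representation. The sketch below uses only the standard library to show that 2.2359999999999998 is a neighbouring double of 2.236 and that rounding or formatting recovers the expected display.

```python
import math
from decimal import Decimal

# The value reported in the question, parsed as a binary double.
stored = float("2.2359999999999998")

# Decimal(float) prints the exact value that the double actually holds.
print(Decimal(stored))
print(Decimal(2.236))

# The two doubles differ in the last bit, so an exact comparison fails,
exactly_equal = stored == 2.236
# but a tolerance-based comparison, rounding, or formatting treats them as the same.
close_enough = math.isclose(stored, 2.236)
rounded = round(stored, 3)
formatted = f"{stored:.3f}"
print(exactly_equal, close_enough, rounded, formatted)
```

If exact decimal values matter, rounding after loading or comparing with a tolerance is usually sufficient.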
[...] 1.000.000,10 to 1000000.10 on your numberofx column? – Yaakov
[...] float and not string type? Right? To see the type of the column, execute this line: df.dtypes['numberofx'] – Yaakov
[...] object, including the decimal and thousands separator sign. – Morrissey