How can I convert a string with dot and comma into a float in Python

How can I convert a string like 123,456.908 to float 123456.908 in Python?


For ints, see How to convert a string to a number if it has commas in it as thousands separators?, although the techniques are essentially the same.

Widget asked 9/7, 2011 at 7:59 Comment(3)
The proper way to do this is to use the locale module - everything else is just a very nasty hack that will get you into trouble in the future. - Forelimb
The proper locale way is also a very good way to shoot yourself in the foot if you plan to use your program on several operating systems (like Windows and flavors of Linux), which have different locale formats or might even require you to install a locale that supports your chosen format... - Workout
I updated my old locale-based answer to explain these issues comprehensively. - Malloch

Just remove the , with replace():

float("123,456.908".replace(',',''))
Trophoblast answered 9/7, 2011 at 8:2 Comment(7)
This depends on locale: 9 988 776,65 € in France, 9.988.776,65 € in Germany, $9,988,776.65 in the United States. - Afrikah
If you are writing constants in the source code and want to use commas to make them more readable, then this is the way to go, rather than locale, which would make the code fail in another location: cd_size = float("737,280,000".replace(',','')) (I actually used int). - Wessels
If you are writing a string literal constant in the source code and then explicitly converting it to integer or float, that's a sign of something wrong with the design. But even if it can be defended - just temporarily set the locale to the one the code is written in, for that context, and then restore the context appropriate to your users when handling user input. That's why there is setlocale in the first place. - Malloch
Do not do this; it allows for things like "123,,,,345". - Vudimir
This extremely dangerous replacement would fail in half of the world, the other half being China and the US. Europe, Africa, Russia, and South America use , as the decimal separator. - Swirly
@Wessels PEP 515 introduced underscores for numeric literals. In Python 3.6 and later you can just write 737_280_000 instead of doing string manipulations to make things more readable. - Serendipity
Thanks @Serendipity, I think PEP 515 is a better solution if you have control over the text. In my case I didn't, as I was getting someone else's input. - Wessels

Using the localization services

The default locale

The standard library locale module is Python's interface to C-based localization routines.

The basic usage is:

import locale
locale.atof('123,456')

In locales where , is treated as a thousands separator, this would return 123456.0; in locales where it is treated as a decimal point, it would return 123.456.

However, by default, this will not work:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/locale.py", line 326, in atof
    return func(delocalize(string))
ValueError: could not convert string to float: '123,456'

This is because by default, the program is "in a locale" that has nothing to do with the platform the code is running on, but is instead defined by the POSIX standard. As the documentation explains:

Initially, when a program is started, the locale is the C locale, no matter what the user’s preferred locale is. There is one exception: the LC_CTYPE category is changed at startup to set the current locale encoding to the user’s preferred locale encoding. The program must explicitly say that it wants the user’s preferred locale settings for other categories by calling setlocale(LC_ALL, '').

That is: aside from making a note of the system's default setting for the preferred character encoding in text files (nowadays, this will likely be UTF-8), by default, the locale module will interpret data the same way that Python itself does (via a locale named C, after the C programming language). locale.atof will do the same thing as float passed a string, and similarly locale.atoi will mimic int.
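A small check of this behaviour, as a sketch: it explicitly selects the C locale for LC_NUMERIC (what a program starts with anyway), so the result does not depend on any earlier setlocale call:

```python
import locale

# Select the C locale for numbers explicitly, so the comparison below does not
# depend on any earlier setlocale() call in this process.
locale.setlocale(locale.LC_NUMERIC, "C")

# In the C locale, atof/atoi behave like the built-in float/int.
assert locale.atof("123456.908") == float("123456.908")
assert locale.atoi("123456") == int("123456")

# ...including rejecting the comma, exactly as in the traceback above.
try:
    locale.atof("123,456")
except ValueError as exc:
    print("rejected:", exc)
```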

Using a locale from the environment

Making the setlocale call mentioned in the above quote from the documentation will pull in locale settings from the user's environment. Thus:

>>> import locale
>>> # passing an empty string asks for a locale configured on the
>>> # local machine; the return value indicates what that locale is.
>>> locale.setlocale(locale.LC_ALL, '')
'en_CA.UTF-8'
>>> locale.atof('123,456.789')
123456.789
>>> locale.atof('123456.789')
123456.789
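To find out which separators the environment's locale actually uses, locale.localeconv() can be inspected. This sketch adopts the environment's settings for LC_NUMERIC only (rather than LC_ALL); the helper name numeric_separators is hypothetical, and the printed result depends entirely on the machine:

```python
import locale


def numeric_separators():
    """Hypothetical helper: (decimal_point, thousands_sep) for the current LC_NUMERIC."""
    conv = locale.localeconv()
    return conv["decimal_point"], conv["thousands_sep"]


# Adopt the user's environment settings for numbers only (not LC_ALL),
# then report which locale was chosen and which separators it uses.
locale.setlocale(locale.LC_NUMERIC, "")
print(locale.setlocale(locale.LC_NUMERIC), numeric_separators())
```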

The locale will not care if the thousands separators are in the right place - it just recognizes and filters them:

>>> locale.atof('12,34,56.789')
123456.789

In 3.6 and up, it will also not care about underscores, which are separately handled by the built-in float and int conversion:

>>> locale.atof('12_34_56.789')
123456.789
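The underscore handling comes from float and int themselves (PEP 515, Python 3.6+), so it works even without any locale call. Unlike a blanket replace(), the grammar is strict about where underscores may appear:

```python
# PEP 515 (Python 3.6+): float() and int() accept single underscores between digits.
assert float("12_34_56.789") == 123456.789
assert int("737_280_000") == 737280000

# Doubled, leading, or trailing underscores are rejected.
for bad in ("1__000", "_1000", "1000_"):
    try:
        float(bad)
    except ValueError:
        print("rejected:", bad)
```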

Conversely, the str.format method and f-strings are locale-aware when the n format type is used:

>>> f'{123456.789:.9n}' # `.9` specifies 9 significant figures
'123,456.789'

Without the previous setlocale call, the output would not have the comma.
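For comparison, a sketch showing the n type in the C locale next to the , option of the format mini-language, which always inserts commas every three digits regardless of locale:

```python
import locale

value = 123456.789
locale.setlocale(locale.LC_NUMERIC, "C")

# 'n' follows LC_NUMERIC: the C locale has no thousands separator.
print(f"{value:.9n}")   # 123456.789
# ',' is locale-independent: a comma every three digits, '.' as the decimal point.
print(f"{value:,.3f}")  # 123,456.789
```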

Setting a locale explicitly

It is also possible to make temporary locale settings, using the appropriate locale name, and apply those settings only to a specific aspect of localization. To get localized parsing and formatting only for numbers, for example, use LC_NUMERIC rather than LC_ALL in the setlocale call.

Here are some examples:

>>> # in Denmark, periods are thousands separators and commas are decimal points
>>> locale.setlocale(locale.LC_NUMERIC, 'en_DK.UTF-8')
'en_DK.UTF-8'
>>> locale.atof('123,456.789')
123.456789
>>> # Formatting a number according to the Indian lakh/crore system:
>>> locale.setlocale(locale.LC_NUMERIC, 'en_IN.UTF-8')
'en_IN.UTF-8'
>>> f'{123456.789:9.9n}'
'1,23,456.789'

The necessary locale strings may depend on your operating system, and may require additional work to enable.

To get back to how Python behaves by default, use the C locale described previously, thus: locale.setlocale(locale.LC_ALL, 'C').
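Since the locale names are platform-dependent, a script experimenting with them may want to handle a missing locale gracefully. A sketch, suitable for one-off scripts only (it saves and restores process-wide state, so the caveats below apply); atof_in_locale is a hypothetical name:

```python
import locale


def atof_in_locale(text, locale_name):
    """Hypothetical helper: parse text with LC_NUMERIC temporarily set to locale_name.

    Returns None if that locale is not available on this machine.
    Not thread-safe: it changes process-wide state while it runs.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # "unsupported locale setting": not installed here
    try:
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the saved setting


# Result depends on which locales are installed (None if en_DK.UTF-8 is missing).
print(atof_in_locale("123,456.789", "en_DK.UTF-8"))
```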

Caveats

Setting the locale affects program behaviour globally, and is not thread safe. If done at all, it should normally be done just once at the beginning of the program. Again quoting from documentation:

It is generally a bad idea to call setlocale() in some library routine, since as a side effect it affects the entire program. Saving and restoring it is almost as bad: it is expensive and affects other threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine. Even better is convincing yourself that using locale settings is okay. Only as a last resort should you document that your module is not compatible with non-C locale settings.

When the Python code is embedded within a C program, setting the locale can even affect the C code:

Extension modules should never call setlocale(), except to find out what the current locale is. But since the return value can only be used portably to restore it, that is not very useful (except perhaps to find out whether or not the locale is C).

(N.B.: when setlocale is called with a single category argument, or with None - not an empty string - for the locale name, it does not change anything, and simply returns the name of the existing locale.)

So in production code, this is not a tool for experimentally parsing or formatting data that was meant for different locales; the examples above only illustrate how the system works. For that purpose, seek a third-party internationalization library.

However, if the data is all formatted according to a specific locale, specifying that locale ahead of time will make it possible to use locale.atoi and locale.atof as drop-in replacements for int and float calls on string input.

Malloch answered 9/7, 2011 at 9:22 Comment(1)
11.5 years later, my original version of the answer seems completely unacceptable to me (despite that apparently nobody ever saw fit to downvote it even once) - so I've completely rewritten it. I now thoroughly explain what the locale module does, how to figure out a proper locale setting, and the limitations of this approach - giving detailed examples throughout. - Malloch

If you don't know the locale and you want to parse any kind of number, use this parseNumber(text) function (my repo). It is not perfect, but it takes most cases into account:

>>> parseNumber("a 125,00 €")
125
>>> parseNumber("100.000,000")
100000
>>> parseNumber("100 000,000")
100000
>>> parseNumber("100,000,000")
100000000
>>> parseNumber("100 000 000")
100000000
>>> parseNumber("100.001 001")
100.001
>>> parseNumber("$.3")
0.3
>>> parseNumber(".003")
0.003
>>> parseNumber(".003 55")
0.003
>>> parseNumber("3 005")
3005
>>> parseNumber("1.190,00 €")
1190
>>> parseNumber("1190,00 €")
1190
>>> parseNumber("1,190.00 €")
1190
>>> parseNumber("$1190.00")
1190
>>> parseNumber("$1 190.99")
1190.99
>>> parseNumber("1 000 000.3")
1000000.3
>>> parseNumber("1 0002,1.2")
10002.1
>>> parseNumber("")

>>> parseNumber(None)

>>> parseNumber(1)
1
>>> parseNumber(1.1)
1.1
>>> parseNumber("rrr1,.2o")
1
>>> parseNumber("rrr ,.o")

>>> parseNumber("rrr1rrr")
1
Subsume answered 24/4, 2018 at 14:15 Comment(3)
Please include the code as part of your answer. Answers need to be self-contained to prevent link-rot from making the answer useless. - Girand
@PranavHosangadi there are several hundred lines of code. The answer effectively boils down to "use a third-party library", which isn't really answering the question, plus some self-promotion. Aside from that, this implementation is not locale-aware; if we are going to look for third-party libraries, we might as well look for one that is. - Malloch
@KarlKnechtel I agree, but I feel like I've had previous flags rejected for similar answers, so I didn't flag this one. I did downvote it, though. - Girand

If the input uses a comma as a decimal point and period as a thousands separator, use .replace twice to convert the data to the format used by the built-in float. Thus:

s = s.replace('.','').replace(',','.')
number = float(s)
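The order of the two replace calls matters. A sketch wrapping it in a hypothetical helper, assuming input with periods as thousands separators and a comma as the decimal point:

```python
def parse_comma_decimal(s):
    """Hypothetical helper for inputs like "9.988.776,65" (period thousands, comma decimal)."""
    # Strip the thousands periods first, then turn the comma into a decimal point.
    return float(s.replace(".", "").replace(",", "."))


print(parse_comma_decimal("9.988.776,65"))  # 9988776.65

# Reversing the order would also delete the decimal point just created:
print("1.234,5".replace(",", ".").replace(".", ""))  # 12345
```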
Parable answered 25/7, 2018 at 9:45 Comment(0)

What about this?

my_string = "123,456.908"
commas_removed = my_string.replace(',', '')  # remove comma separation
my_float = float(commas_removed)  # turn from string to float

In short:

my_float = float(my_string.replace(',', ''))
Refectory answered 9/7, 2011 at 8:3 Comment(0)

Better solution for different currency formats:

def text_currency_to_float(text):
    t = text
    dot_pos = t.rfind('.')
    comma_pos = t.rfind(',')
    if comma_pos > dot_pos:
        t = t.replace(".", "")
        t = t.replace(",", ".")
    else:
        t = t.replace(",", "")
    return float(t)

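This function decides which character is the decimal separator by checking which of the two appears furthest to the right in the string: if the last comma comes after the last period, the comma is treated as the decimal separator and periods are removed as thousands separators; otherwise, commas are removed as thousands separators. (The premise is that thousands separators are not used in the fractional part of a number.)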
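Inputs like the ones quoted in the comments on the first answer also contain spaces and currency symbols. A hypothetical variant of the same last-separator rule (text_currency_to_float_lenient is an invented name) first drops every character other than digits, '.', ',' and '-'. Note that the rule still misreads a lone thousands separator: "1,190" becomes 1.19.

```python
import re


def text_currency_to_float_lenient(text):
    # Hypothetical variant: keep only digits, '.', ',' and '-', which drops
    # currency symbols and (including non-breaking) spaces.
    t = re.sub(r"[^\d.,-]", "", text)
    # Same rule as above: the right-most of '.' and ',' is the decimal separator.
    if t.rfind(",") > t.rfind("."):
        t = t.replace(".", "").replace(",", ".")
    else:
        t = t.replace(",", "")
    return float(t)


print(text_currency_to_float_lenient("9 988 776,65 €"))  # 9988776.65
print(text_currency_to_float_lenient("$9,988,776.65"))   # 9988776.65
```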

Photoactinic answered 5/3, 2019 at 14:7 Comment(0)
s = "123,456.908"
print(float(s.replace(',', '')))
Notch answered 9/7, 2011 at 8:2 Comment(0)

You may use babel:

from babel.numbers import parse_decimal
f = float(parse_decimal("123,456.908", locale="en_US"))
Primarily answered 25/9, 2022 at 18:14 Comment(0)

Here's a simple way I wrote up for you. :)

>>> number = '123,456,789.908'.replace(',', '') # '123456789.908'
>>> float(number)
123456789.908
Bowlds answered 9/7, 2011 at 8:4 Comment(3)
re is a big hammer for such a task. - Deploy
@John Doe: Looks way better now. I like float(number) because of its descriptive touch. +1 ;-) - Refectory
9 988 776,65 € in France, 9.988.776,65 € in Germany, $9,988,776.65 in the United States: are you sure it works? - Afrikah

Not the shortest solution, but for the sake of completeness, and possibly interesting if you want to rely on an existing function that has been proven a million times: you can leverage pandas by passing your number as a StringIO to its read_csv() function (the parser has a C backend, so as far as I know its conversion functionality cannot be called directly).

>>> import pandas as pd
>>> from io import StringIO
>>> float(pd.read_csv(StringIO("1,000.23"), sep=";", thousands=",", header=None).iloc[0, 0])
1000.23

Specifically for floats: if your number uses dots as the thousands separator and a comma as the decimal separator, use the decimal="," parameter in addition to setting thousands=".".
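A sketch of that combination, assuming pandas is installed and the input is a single German-style value:

```python
from io import StringIO

import pandas as pd

# Period as thousands separator, comma as decimal separator (e.g. German style).
df = pd.read_csv(StringIO("1.000,23"), sep=";", thousands=".", decimal=",", header=None)
value = float(df.iloc[0, 0])  # .iloc avoids calling float() on a whole Series
print(value)  # 1000.23
```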

Shan answered 6/5, 2023 at 6:12 Comment(0)
