How can I consistently convert strings like "3.71B" and "4M" to numbers in Python?

I have some rather mangled code that almost produces the tangible price/book from Yahoo Finance for companies (a nice module called ystockquote gets the intangible price/book value already).

My problem is this:

For one of the variables in the calculation, shares outstanding, I'm getting strings like 10.89B and 4.9M, where B and M stand for billion and million respectively. I'm having trouble converting them to numbers. Here's where I'm at:

shares = [''.join(node.findAll(text=True)).strip().replace('M','000000').replace('B','000000000').replace('.','') for node in soup2.findAll('td')[110:112]]

Which is pretty messy, but I think it would work if instead of

.replace('M','000000').replace('B','000000000').replace('.','') 

I was using a regular expression with variables. I guess the question is simply which regular expression and variables. Other suggestions are also good.

EDIT:

To be specific, I'm hoping for something that works for numbers with zero, one, or two decimals, but these answers all look helpful.
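
For reference, here is one way the regular-expression idea could look. It is only a sketch: the names parse_shares, PATTERN and SUFFIX_EXPONENTS are made up for illustration, the K suffix is an assumption, and the pattern deliberately accepts only zero, one, or two decimals as described above.

```python
import re
from decimal import Decimal

# Hypothetical names. K is an assumed extra suffix alongside M and B.
SUFFIX_EXPONENTS = {'K': 3, 'M': 6, 'B': 9}

# Digits, an optional fraction of one or two digits, an optional suffix.
PATTERN = re.compile(r'^\s*(\d+(?:\.\d{1,2})?)\s*([KMB]?)\s*$')

def parse_shares(text):
    match = PATTERN.match(text)
    if match is None:
        raise ValueError('unrecognised number: %r' % (text,))
    number, suffix = match.groups()
    # No suffix means a power of zero, i.e. the number is used as-is.
    exponent = SUFFIX_EXPONENTS.get(suffix, 0)
    return Decimal(number) * 10 ** exponent
```

Calling int() on the result gives a plain integer, for example int(parse_shares('10.89B')). Anything the pattern does not recognise raises ValueError instead of being silently mangled.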

Cristinecristiona answered 10/8, 2012 at 6:32 Comment(3)
unrelated to your question, beware join.strip.replace.replace.replace constructs, they are hard to read and harder to debug. See python.org/dev/peps/pep-0008 and python.org/dev/peps/pep-0020 and keep to them wholly. Your future self will thank you.Peculium
Unrelated to what was unrelated: I especially like import thisCristinecristiona
See also import antigravityPeculium
B
20
>>> from decimal import Decimal
>>> d = {
...     'K': 3,
...     'M': 6,
...     'B': 9
... }
>>> def text_to_num(text):
...     if text[-1] in d:
...         num, magnitude = text[:-1], text[-1]
...         return Decimal(num) * 10 ** d[magnitude]
...     else:
...         return Decimal(text)
...
>>> text_to_num('3.17B')
Decimal('3170000000.00')
>>> text_to_num('4M')
Decimal('4000000')
>>> text_to_num('4.1234567891234B')
Decimal('4123456789.1234000000000')

You can also call int() on the result if you want an integer.
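
If a whole number is required, int() will silently drop any fractional part. A stricter variant might look like the sketch below; text_to_int and EXPONENTS are hypothetical names, and raising ValueError for a fractional result is an assumption about the desired behaviour, not part of the function above.

```python
from decimal import Decimal

EXPONENTS = {'K': 3, 'M': 6, 'B': 9}

def text_to_int(text):
    """Hypothetical strict variant: raise ValueError if the scaled
    value still has a fractional part, otherwise return an int."""
    text = text.strip()
    if text and text[-1] in EXPONENTS:
        value = Decimal(text[:-1]) * 10 ** EXPONENTS[text[-1]]
    else:
        value = Decimal(text)
    # to_integral_value() rounds to a whole number; any difference
    # means the input resolved below one whole unit.
    if value != value.to_integral_value():
        raise ValueError('%r is not a whole number' % (text,))
    return int(value)
```

With this, '3.17B' converts cleanly, while an input such as '4.1234567891234B' is rejected rather than truncated.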

Budd answered 10/8, 2012 at 6:40 Comment(5)
>>>text_to_num('4.1234567891234B') # 4123456789L. tsk tsk. precision :)Unweighed
I assume he is only having integer inputs anyway so that is not a problem and this solution still stands correctBudd
it stands 'correct' until someone passes a sub-decimal-resolving input. Maybe you should throw an exception to reinforce this assumption.Unweighed
Hahah, code battles! You guys are awesome. They're not only integers, but either of your solutions would suffice because it doesn't need to be that specific. There are only two decimals in each case though, so really @Budd wins the code battle here, as far as I can see. Unfortunately I'm getting inf with both of your solutions, but I think it's my fault because @Budd has demonstrated that everyone's code works fine ;) I'm selecting his answer, partly for his enthusiasm and partly because of my comment above (max two decimals), though the int assumption is not correct. Thanks!Cristinecristiona
The issues referred to by @PreetKukreti relate to an original revision of this code that converted to int, which is why it had problems. It was switched to Decimal shortly afterwards, so it has had no such issue since then.Budd
E
4

Parse the numbers as floats, and use a multiplier mapping:

multipliers = dict(M=10**6, B=10**9)
def sharesNumber(nodeText):
    nodeText = nodeText.strip()
    mult = 1
    if nodeText[-1] in multipliers:
        mult = multipliers[nodeText[-1]]
        nodeText = nodeText[:-1]
    return float(nodeText) * mult
Essentiality answered 10/8, 2012 at 6:36 Comment(3)
I think either of these will work but the bottom one seems like it might be easier so I'm trying it first.Cristinecristiona
>>> sharesNumber('4.123456789B') 4123456788.9999995 >>> sharesNumber('123456789.987654321B') 1.2345678998765434e+17Budd
@jamylak: I doubt the share numbers will ever be that large.Essentiality
U
3
num_replace = {
    'B' : 1000000000,
    'M' : 1000000,
}

a = "4.9M" 
b = "10.89B" 

def pure_number(s):
    mult = 1.0
    while s[-1] in num_replace:
        mult *= num_replace[s[-1]]
        s = s[:-1]
    return float(s) * mult 

pure_number(a) # 4900000.0
pure_number(b) # 10890000000.0

This will work with idiocy like:

pure_number("5.2MB") # 5200000000000000.0

Because of the dictionary approach, you can add as many suffixes as you want in an easy-to-maintain way. You can also make it more lenient by storing the dict keys in a single case and calling .lower() or .upper() on the input so that it matches.
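
A minimal sketch of that lenient variant, assuming upper-case keys and an extra K suffix that is not in the dictionary above (pure_number_lenient and NUM_REPLACE are illustrative names):

```python
# Keys are kept in upper case only; the input is upper-cased before lookup.
# The K entry is an assumed addition, shown to illustrate extending the dict.
NUM_REPLACE = {'K': 10 ** 3, 'M': 10 ** 6, 'B': 10 ** 9}

def pure_number_lenient(s):
    s = s.strip().upper()
    mult = 1.0
    # The "s and" guard stops the loop if the string is only suffixes,
    # so float('') raises ValueError instead of s[-1] raising IndexError.
    while s and s[-1] in NUM_REPLACE:
        mult *= NUM_REPLACE[s[-1]]
        s = s[:-1]
    return float(s) * mult
```

This accepts '4.9m' and '10.89b' as well as the upper-case forms.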

Unweighed answered 10/8, 2012 at 6:43 Comment(2)
>>> pure_number('4.123456789B') 4123456788.9999995 also >>> pure_number('123456789.987654321B') 1.2345678998765434e+17Budd
@Budd this is a representation inaccuracy. Even python int or long have precision limitations given a few more digits. There is nothing wrong with the solutionUnweighed
A
2
num_replace = {
    'B' : 'e9',
    'M' : 'e6',
}

def str_to_num(s):
    if s[-1] in num_replace:
        s = s[:-1]+num_replace[s[-1]]
    return int(float(s))

>>> str_to_num('3.71B')
3710000000L
>>> str_to_num('4M')
4000000

So '3.71B' -> '3.71e9' -> 3710000000L etc.
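
The same substitution trick also works with Decimal, which accepts exponent notation in strings and avoids going through a binary float. This is a sketch under that assumption; str_to_num_exact is a hypothetical name, not part of the code above.

```python
from decimal import Decimal

NUM_REPLACE = {'B': 'e9', 'M': 'e6'}

def str_to_num_exact(s):
    if s[-1] in NUM_REPLACE:
        s = s[:-1] + NUM_REPLACE[s[-1]]
    # Decimal parses strings such as '3.71e9' exactly, so no float
    # rounding happens before the conversion to int.
    return int(Decimal(s))
```

As with int(float(s)), any fractional part left after scaling is truncated.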

Alfons answered 10/8, 2012 at 6:54 Comment(0)
C
1

This could be an opportunity to safely use eval!! :-)

Consider the following fragment:

>>> d = { "B" :' * 1e9', "M" : '* 1e6'}
>>> s = "1.493B"
>>> ll = [d.get(c, c) for c in s]
>>> eval(''.join(ll), {}, {})
1493000000.0

Now put it all together into a neat one liner:

d = { "B" :' * 1e9', "M" : '* 1e6'}

def human_to_int(s):
    return eval(''.join([d.get(c, c) for c in s]), {}, {})

print(human_to_int('1.439B'))
print(human_to_int('1.23456789M'))

Gives back:

1439000000.0
1234567.89
Cuneiform answered 10/8, 2012 at 8:37 Comment(5)
ast.literal_eval is safer and doesn't require ugly dictionaries passed as eval namespaceBudd
@Budd ast.literal_eval only works for literals and not expressions - hence cannot be substituted in the above code. :)Cuneiform
Oh sorry I missed the '* ' in the strings. I can't see properly today. On that note, If you remove the ' *' then ast.literal_eval will work, although in that case you just use int and be more explicit and that solution has already been posted.Budd
True .. but my solution is fully functional, while the int solution requires if statement :-)Cuneiform
Ah ok then +1 then, although you should remove the '* 1' and use ast.literal_eval.Budd
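
For reference, the ast.literal_eval variant discussed in these comments could look roughly like the sketch below. The names human_to_float and SUFFIXES are made up for illustration; the suffixes are mapped to exponent notation so the joined string is a plain numeric literal rather than an expression.

```python
import ast

# Each suffix becomes an exponent, so '1.493B' turns into '1.493e9'.
SUFFIXES = {'B': 'e9', 'M': 'e6'}

def human_to_float(s):
    literal = ''.join(SUFFIXES.get(c, c) for c in s)
    # literal_eval only accepts Python literals, so arbitrary
    # expressions in the input are rejected instead of executed.
    return ast.literal_eval(literal)
```

Note that an input without a suffix or decimal point, such as '4', comes back as an int rather than a float.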
