Python equivalent to C strtod

Asked 27/9, 2011 at 6:11 Answered 21/5, 2018 at 19:57

I am working on converting parts of a C++ program to Python, but I have some trouble replacing the C function strtod. The strings I'm working on consist of simple mathematical-ish equations, such as "KM/1000.0". The problem is that constants and numbers are mixed within one string, so I can't simply pass it to float().

How can a Python function be written to simulate strtod which returns both the converted number and the position of the next character?

Stour answered 27/9, 2011 at 6:11 Comment(4)

Can't you just split up the string beforehand? – Camillacamille 27/9, 2011 at 6:16

Do you need to parse exponential notation, too? – Mcreynolds 27/9, 2011 at 6:16

#386058 – Asphyxia 27/9, 2011 at 6:26

Here is the C code from Python's source for this, if you want to re-implement it -- svn.python.org/projects/python/trunk/Python/strtod.c – Duprey 27/9, 2011 at 6:55

I'm not aware of any existing functions that would do that.

However, it's pretty easy to write one using regular expressions:

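import re

# optional sign, digits with an optional decimal point, optional exponent
_FLOAT_RE = re.compile(r'[+-]?\d*[.]?\d*(?:[eE][+-]?\d+)?')

# returns (float,endpos)
def strtod(s, pos):
  m = _FLOAT_RE.match(s, pos)  # match at pos without copying the string
  if m.group(0) == '': raise ValueError('bad float: %s' % s[pos:])
  return float(m.group(0)), m.end()

print(strtod('(a+2.0)/1e-1', 3))  # (2.0, 6)
print(strtod('(a+2.0)/1e-1', 8))  # (0.1, 12)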

A better overall approach might be to build a lexical scanner that would tokenize the expression first, and then work with a sequence of tokens rather than directly with the string (or indeed go the whole hog and build a yacc-style parser).
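To make the tokenizer idea concrete, here is a minimal sketch of such a scanner. The token names (NUMBER, NAME, OP) and the set of operators are my own assumptions for illustration, not anything taken from the question:

```python
import re

# Hypothetical token categories; the names and operator set are illustrative.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/()])
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(expr):
    """Yield (kind, value, position) for each token in expr."""
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError('unexpected character %r at position %d'
                             % (expr[pos], pos))
        kind = m.lastgroup  # name of the alternative that matched
        if kind == 'NUMBER':
            yield kind, float(m.group()), pos
        elif kind != 'SKIP':  # whitespace is consumed but not reported
            yield kind, m.group(), pos
        pos = m.end()

print(list(tokenize('KM/1000.0')))
# [('NAME', 'KM', 0), ('OP', '/', 2), ('NUMBER', 1000.0, 3)]
```

Signs are left to the OP tokens here, so "a-2" comes out as three tokens rather than a name followed by a negative number.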

Elonore answered 27/9, 2011 at 6:22 Comment(0)

You can create a simple C strtod wrapper:

#include <stdlib.h>

double strtod_wrap(const char *nptr, char **endptr)
{
   return strtod(nptr, endptr);
}

compile with:

gcc -fPIC -shared -o libstrtod.dll strtod.c

(if you're using Python 64 bit, the compiler must be 64-bit as well)

and call it from Python using ctypes (this was tested on Windows; on Linux, change .dll to .so in the build target and in the code below):

import ctypes

_strtod = ctypes.CDLL('libstrtod.dll')
_strtod.strtod_wrap.argtypes = (ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p))
_strtod.strtod_wrap.restype = ctypes.c_double

def strtod(s):
    p = ctypes.c_char_p(0)
    s = ctypes.create_string_buffer(s.encode('utf-8'))
    result = _strtod.strtod_wrap(s, ctypes.byref(p))
    return result,ctypes.string_at(p)

print(strtod("12.5hello"))

prints:

(12.5, b'hello')

(It's not as hard as it seems, since I learned how to do that just 10 minutes ago)

Useful Q&As about ctypes

Corfam answered 21/5, 2018 at 19:57 Comment(4)

Creating the wrapper seems unnecessary; you should be able to do this with strtod directly. – Airboat 21/5, 2018 at 20:10

that would be even better. I have to test that first :) – Woodsum 21/5, 2018 at 20:19

You should be able to load strtod from a platform-specific existing shared library file. ctypes.cdll.msvcrt should work on Windows. I believe it's commonly ctypes.CDLL('libc.so.6') on Linux, but I don't know how universal that is. It's probably also possible to compile your own file to access strtod from, though I'm not sure what the details of that would look like. (#include <stdlib.h> on its own seems like it might work.) – Airboat 21/5, 2018 at 20:34

I have tried that single stdlib.h include alone in the C file and it seems that the strtod symbol isn't linked, so it doesn't work (Python cannot find it). Sticking to the empty wrapper for now. It's working, and it's portable at source level apart from the .dll/.so part. As stated in the answer, I'm not a ctypes specialist. Just made it work (and was impressed by the simplicity of the Python code). – Woodsum 21/5, 2018 at 20:49
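For what it's worth, the direct approach suggested in the comments can be sketched roughly as below. The library lookup is an assumption: ctypes.cdll.msvcrt on Windows, and ctypes.CDLL(None) elsewhere, which on typical Linux and macOS builds exposes the C library already loaded into the Python process. The function name strtod_libc is made up for this example. It returns the position of the next character instead of the remaining bytes:

```python
import ctypes
import os

# Assumption: msvcrt on Windows; elsewhere CDLL(None) gives access to the
# symbols of the running process, which normally include libc's strtod.
_libc = ctypes.cdll.msvcrt if os.name == 'nt' else ctypes.CDLL(None)
_libc.strtod.argtypes = (ctypes.c_void_p, ctypes.POINTER(ctypes.c_void_p))
_libc.strtod.restype = ctypes.c_double

def strtod_libc(s, pos=0):
    """Return (value, next_pos); positions are byte offsets into UTF-8."""
    data = s.encode('utf-8')
    if not 0 <= pos <= len(data):
        raise ValueError('position out of range: %d' % pos)
    buf = ctypes.create_string_buffer(data)  # NUL-terminated copy
    base = ctypes.addressof(buf)
    end = ctypes.c_void_p()
    value = _libc.strtod(base + pos, ctypes.byref(end))
    if end.value == base + pos:  # strtod consumed nothing
        raise ValueError('bad float: %r' % data[pos:])
    return value, end.value - base

print(strtod_libc("12.5hello"))
```

Keep in mind that C's strtod is more permissive than float-style regexes: it skips leading whitespace, accepts inf, nan and hexadecimal floats, and honors the current LC_NUMERIC locale. For non-ASCII input the byte offset is not the same as the character index.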

I'd use a regular expression for this:

import re
mystring = "1.3 times 456.789 equals 593.8257 (or 5.93E2)"
def findfloats(s):
    regex = re.compile(r"[+-]?\b\d+(?:\.\d+)?(?:e[+-]?\d+)?\b", re.I)
    for match in regex.finditer(s):  # search the argument, not the global
        yield (match.group(), match.start(), match.end())

This finds all floating point numbers in the string and yields each one together with its start and end positions.

>>> for item in findfloats(mystring):
...     print(item)
...
('1.3', 0, 3)
('456.789', 10, 17)
('593.8257', 25, 33)
('5.93E2', 38, 44)

Mcreynolds answered 27/9, 2011 at 6:23 Comment(2)

I can think of a bunch of valid floats that wouldn't get picked up. – Elonore 27/9, 2011 at 6:27

The regex assumes an integer part. Everything else is optional. If there is a decimal point, a fractional part is required. So .1 and 1. won't be picked up. Of course it's trivial to modify the regex if necessary. – Mcreynolds 27/9, 2011 at 6:33
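One possible modification along those lines, as a sketch: the pattern below also accepts a bare fraction like .1 and a trailing point like 1. by dropping the \b anchors. The names LOOSE_FLOAT and findfloats_loose are made up for this example.

```python
import re

# Hypothetical variant: digits with an optional fraction ("1", "1.", "1.5")
# or a bare fraction (".5"), followed by an optional exponent.
LOOSE_FLOAT = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:e[+-]?\d+)?", re.I)

def findfloats_loose(s):
    for match in LOOSE_FLOAT.finditer(s):
        yield (match.group(), match.start(), match.end())

print(list(findfloats_loose("a .1 b 1. c 2.5e-3")))
# [('.1', 2, 4), ('1.', 7, 9), ('2.5e-3', 12, 18)]
```

The trade-off is that without the word boundaries it will also pick up digits inside identifiers (the 2 in x2) and swallow a sentence-ending period after a number.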

Parse the number yourself.

A recursive-descent parser is very easy to write for this kind of input. First write a grammar:

float ::= ipart ('.' fpart)? ('e' exp)?
ipart ::= digit+
fpart ::= digit+
exp   ::= ('+'|'-')? digit+
digit ::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'

Now converting this grammar to a function should be straightforward...
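A minimal sketch of such a function follows. It assumes the fractional part and the exponent may each appear at most once, that the exponent sign is optional, and that an uppercase E is accepted too. It also backs off the way strtod does when a '.' or 'e' is not followed by digits. The names parse_float and _digits are mine.

```python
def _digits(s, i):
    """Return the index just past the run of ASCII digits starting at s[i]."""
    while i < len(s) and s[i] in '0123456789':
        i += 1
    return i

def parse_float(s, pos=0):
    """Parse ipart ['.' fpart] ['e' exp] at s[pos]; return (value, next_pos)."""
    end = _digits(s, pos)                 # ipart ::= digit+
    if end == pos:
        raise ValueError('digit expected at position %d' % pos)
    if s[end:end + 1] == '.':             # optional '.' fpart
        stop = _digits(s, end + 1)
        if stop > end + 1:                # keep the '.' only if digits follow
            end = stop
    if s[end:end + 1] in ('e', 'E'):      # optional 'e' exp
        i = end + 1
        if s[i:i + 1] in ('+', '-'):
            i += 1
        stop = _digits(s, i)
        if stop > i:                      # keep the exponent only if complete
            end = stop
    return float(s[pos:end]), end

print(parse_float('KM/1000.0', 3))     # (1000.0, 9)
print(parse_float('(a+2.0)/1e-1', 8))  # (0.1, 12)
```

As in the grammar, a leading sign is not consumed here; the caller is expected to treat it as a separate operator.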

Amenity answered 27/9, 2011 at 6:27 Comment(4)

There should be a ('+'|'-') before ipart in the definition of float – Telemechanics 21/5, 2018 at 20:20

@madphysicist It depends on the context. When parsing single standalone numbers, you do indeed need to parse the leading sign. When parsing a numerical expression, you avoid including the sign because it would allow strange expressions like "42-+37.2" (I seem to remember that I copied this grammar from the grammar of a well-known language). – Amenity 23/5, 2018 at 6:8

42-+37.2 seems like a reasonable expression to me. – Telemechanics 23/5, 2018 at 6:32

Although it is reasonable to any math-inclined human being, such a grammar implies that you can also write 42--37.2, which confuses a C or C++ parser (but strangely C++ accepts 42-+37.2). As such, many (most) programming languages treat a leading sign as a unary operator, that is, an entity clearly separated from the following number, and some languages do not allow a unary operator anywhere other than the start of an expression. Anyway, for simple parsing of standalone numbers, the grammar above is indeed missing those unary operators. – Amenity 23/5, 2018 at 13:48
