Interpreting Strings as Other Data Types in Python

Asked 31/1, 2012 at 0:30 Answered 4/6, 2024 at 7:2

I'm reading a file into Python 2.4 that's structured like this:

field1: 7
field2: "Hello, world!"
field3: 6.2

The idea is to parse it into a dictionary that takes fieldfoo as the key and whatever comes after the colon as the value.

I want to convert whatever is after the colon to its "actual" data type, that is, '7' should be converted to an int, "Hello, world!" to a string, etc. The only data types that need to be parsed are ints, floats and strings. Is there a function in the Python standard library that would allow one to make this conversion easily?

The only things this should be used to parse were written by me, so (at least in this case) safety is not an issue.

Coagulase answered 31/1, 2012 at 0:30 Comment(0)

For older Python versions, like the one in the question, the eval function can be used. To reduce the evilness, pass a dict as the second argument to serve as the global namespace, with __builtins__ set to None so that built-in functions cannot be called.

>>> [eval(i, {"__builtins__":None}) for i in ['6.2', '"Hello, world!"', '7']]
[6.2, 'Hello, world!', 7]

Firearm answered 31/1, 2012 at 0:50 Comment(1)

It raises "SyntaxError: unexpected EOF while parsing" when given "alphanumeric" values, instead of interpreting them as strings. – Dowsabel 8/4, 2022 at 8:54
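
One way around the failure described in the comment above is to fall back to the raw string whenever evaluation fails. This is only a minimal sketch, written for Python 3; the helper name eval_or_string is hypothetical, and setting __builtins__ to None is not a real sandbox, so it is suitable only for trusted input such as the asker's own files.

```python
def eval_or_string(text):
    """Evaluate text as a Python expression; return it unchanged on failure."""
    try:
        # No builtins are exposed, so bare names such as open or abc fail.
        return eval(text, {"__builtins__": None})
    except Exception:
        # Unquoted words, stray punctuation and similar input end up here.
        return text

values = [eval_or_string(s) for s in ['6.2', '"Hello, world!"', '7', 'Hello, world!']]
print(values)
```

Catching Exception is deliberately broad here, because the exact error depends on the input (a syntax error for stray punctuation, a lookup error for an unknown name).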

First parse your input into a list of pairs like fieldN: some_string. You can do this easily with the re module, or probably even more simply by slicing to the left and right of the index line.strip().find(': '). Then use ast.literal_eval on the value some_string:

>>> import ast
>>> ast.literal_eval('6.2')
6.2
>>> type(_)
<type 'float'>
>>> ast.literal_eval('"Hello, world!"')
'Hello, world!'
>>> type(_)
<type 'str'>
>>> ast.literal_eval('7')
7
>>> type(_)
<type 'int'>
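
Putting the two steps together might look like the sketch below. It assumes a Python version that ships the ast module (so not 2.4), uses str.partition instead of find for the split, and the function name parse_fields is made up for illustration.

```python
import ast

def parse_fields(lines):
    """Build a dict from 'name: literal' lines using ast.literal_eval."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first ': ' only, so colons inside the value survive.
        key, _, raw = line.partition(': ')
        result[key] = ast.literal_eval(raw)
    return result

sample = ['field1: 7', 'field2: "Hello, world!"', 'field3: 6.2']
print(parse_fields(sample))
```

A line without ': ' leaves an empty value, which literal_eval rejects with an exception, so malformed lines fail loudly rather than being silently skipped.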

Sartorius answered 31/1, 2012 at 0:35 Comment(7)

The version of python I'm using doesn't have the ast module. – Coagulase 31/1, 2012 at 0:40

@MikeSamuel obviously the input must be preprocessed into fieldn: string pairs first, but that part is trivial. @julio.alegria _ is a handy shortcut for the last returned value in the interactive interpreter. @Coagulase ..erm.. now you tell me ;) upgrade python? is there a reason why you need to use such an old version? – Sartorius 31/1, 2012 at 0:46

@Mike Samuel: Safety isn't an issue for me. I don't need to parse anything that I haven't written myself with another program. +1 on your comment for pointing it out, though. – Coagulase 31/1, 2012 at 0:47

mail.python.org/pipermail/python-list/2009-September/… here someone backported literal_eval to 2.4, but it all sounds a bit hacky to me. i would prefer to upgrade python than use that, personally. – Sartorius 31/1, 2012 at 0:57

@wim: I figured out I could just use eval(). See answer below, and thanks for pointing me in the right direction. – Coagulase 31/1, 2012 at 0:59

I know it's hacky and evil, but I don't see any other easy way of doing it. – Coagulase 31/1, 2012 at 2:57

@juliomalegria are you seriously that lost? – Jute 26/12, 2017 at 0:29

You can use yaml to parse the literals. It has one advantage over ast: it does not raise an error when strings are not wrapped in an extra pair of apostrophes or quotation marks.

>>> import yaml
>>> yaml.safe_load('7')
7
>>> yaml.safe_load('Hello')
'Hello'
>>> yaml.safe_load('7.5')
7.5

Bunker answered 18/2, 2022 at 3:11 Comment(0)

You can attempt to convert it to an int first using the built-in function int(). If the string cannot be interpreted as an int, a ValueError exception is raised. You can then attempt to convert it to a float using float(). If this also fails, just return the initial string:

def interpret(val):
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val
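
One thing to watch with this approach: the quoted value from the question's file keeps its quotation marks, because nothing strips them. A possible extension, shown here as a sketch with a hypothetical helper name interpret_unquoted, removes one matching pair of surrounding quotes before falling back to interpret().

```python
def interpret(val):
    # Same int -> float -> str fallback as the function above.
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val

def interpret_unquoted(val):
    """Like interpret(), but drop one pair of matching surrounding quotes."""
    val = val.strip()
    if len(val) >= 2 and val[0] == val[-1] and val[0] in '"\'':
        return val[1:-1]
    return interpret(val)

print(repr(interpret('"Hello, world!"')))           # quotes are kept
print(repr(interpret_unquoted('"Hello, world!"')))  # quotes are removed
```

With this variant a quoted number such as "7" stays a string, which mirrors how a quoted literal would be treated in Python source.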

Lorica answered 31/1, 2012 at 0:56 Comment(0)

Since the "only data types that need to be parsed are int, float and str", maybe something like this will work for you:

entries = {'field1': '7', 'field2': "Hello, world!", 'field3': '6.2'}

for k,v in entries.items():
    # isdecimal() is False for signed values such as '-7', so those become floats
    if v.isdecimal():
        conv = int(v)
    else:
        try:
            conv = float(v)
        except ValueError:
            conv = v
    entries[k] = conv

print(entries)
# {'field2': 'Hello, world!', 'field3': 6.2, 'field1': 7}

Absent answered 31/1, 2012 at 0:57 Comment(0)

There is strconv lib.

In [22]: import strconv
/home/tworec/.local/lib/python2.7/site-packages/strconv.py:200: UserWarning: python-dateutil is not installed. As of version 0.5, this will be a hard dependency of strconv fordatetime parsing. Without it, only a limited set of datetime formats are supported without timezones.
  warnings.warn('python-dateutil is not installed. As of version 0.5, '

In [23]: strconv.convert('1.2')
Out[23]: 1.2

In [24]: type(strconv.convert('1.2'))
Out[24]: float

In [25]: type(strconv.convert('12'))
Out[25]: int

In [26]: type(strconv.convert('true'))
Out[26]: bool

In [27]: type(strconv.convert('tRue'))
Out[27]: bool

In [28]: type(strconv.convert('12 Jan'))
Out[28]: str

In [29]: type(strconv.convert('12 Jan 2018'))
Out[29]: str

In [30]: type(strconv.convert('2018-01-01'))
Out[30]: datetime.date

Sennet answered 9/5, 2018 at 17:56 Comment(1)

Actually, it does not handle unicode strings, see github.com/bruth/strconv/issues/2 – Unbeaten 10/1, 2019 at 20:30

Hope this helps to do what you are trying to do:

#!/usr/bin/python

a = {'field1': 7}
b = {'field2': "Hello, world!"}
c = {'field3': 6.2}

temp1 = type(a['field1'])
temp2 = type(b['field2'])
temp3 = type(c['field3'])

print temp1
print temp2
print temp3

Brack answered 31/1, 2012 at 0:40 Comment(2)

I don't want to get the types of objects in a dictionary, I want to convert strings in a dictionary that are annotated as python types to the types they represent. – Coagulase 31/1, 2012 at 0:42

Can you post example input and output? That will make it easier to understand. – Brack 31/1, 2012 at 0:44

Thanks to wim for helping me figure out what I needed to search for to figure this out.

One can just use eval():

>>> a=eval("7")
>>> b=eval("3")
>>> a+b
10
>>> b=eval("7.2")
>>> a=eval("3.5")
>>> a+b
10.699999999999999
>>> a=eval('"Hello, "')
>>> b=eval('"world!"')
>>> a+b
'Hello, world!'

Coagulase answered 31/1, 2012 at 0:57 Comment(3)

Great! Now make sure you don't import os in your source, to avoid evaluating values like os.system("rm *"). And that's not the only way. So this method works, but it's not recommended. – Wald 14/2, 2012 at 20:45

It's evil and insecure, but this entire script is a quick and dirty fix that should (ideally) be thrown away in a few months. – Coagulase 14/2, 2012 at 20:57

I had a Q&D awk script that I wrote in 1989 implementing a very crude commercial order processor “until the app we're waiting for is ready” that was still being used up to 1996 that I know of, and a Q&D 1995 QBasic army service chores assigner (whatever you might understand of it :) that was still used in 2007 (albeit modified by others to no end, I presume), so I'm certain “quick&dirty” programs are just as quick but a lot dirtier than people usually think they are. – Wald 15/2, 2012 at 0:22

I put together this function to help with the type inference of lists.

from typing import List

import numpy as np

def infer_dtypes(values:List, sample_size:int=300, stop_after:int=300):
    """
    Infers the data type by randomly sampling from a list. Values are explicitly converted to string before checking.

    Args:
        values (list): A list to infer data types from.
        sample_size (int, optional): The number of values to sample from the list. Entire list will be sampled if set to None. Defaults to 300.
        stop_after (int, optional): The maximum number of non-empty values needed for the test. Equal to sample_size if set to None. Defaults to 300.

    Returns:
        str: The inferred data type ('int', 'float', 'bool', 'str', 'mixed', 'empty').
    """
    found = 0
    non_empty_count = 0

    sample_size = sample_size if sample_size is not None else len(values)
    stop_after = stop_after if stop_after is not None else sample_size

    for v in np.random.choice(values, sample_size):
        v = str(v)
        if v != '':
            non_empty_count += 1
            if non_empty_count > stop_after:
                break
            try:
                int(v)
                found |= 1
            except ValueError:
                try:
                    float(v)
                    found |= 2
                except ValueError:
                    if v.lower() in ['true', 'false']:
                        found |= 4
                    else:
                        found |= 8


    # Check if the data is mixed
    if bin(found).count('1') > 1:
        return 'mixed'

    if found & 8:
        return 'str'
    elif found & 4:
        return 'bool'
    elif found & 2:
        return 'float'
    elif found & 1:
        return 'int'
    else:
        return 'empty'

Produces:

infer_dtypes(['', '', '1', '2', '3', '4', '5'])  # int
infer_dtypes(['', '', '1.0', '2.0', '', '3.0', '4.4', '5.0'])  # float
infer_dtypes(['', '', 'True', 'False', '', '', 'False', 'True'])  # bool
infer_dtypes(['', '', 'never', 'gonna', '', '', 'give', ''])  # str
infer_dtypes(['', '', 'never', '', '5', 'True', '5.2', ''])  # mixed
infer_dtypes(['', '', '', '', '', '', '', ''])  # empty
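
For readers who only want to see how the bit flags combine, here is a standard-library sketch that checks every value instead of sampling. The names INT, FLOAT, BOOL, STR, NAMES, classify and infer_all are made up for this illustration and are not part of the function above.

```python
INT, FLOAT, BOOL, STR = 1, 2, 4, 8
NAMES = {0: 'empty', INT: 'int', FLOAT: 'float', BOOL: 'bool', STR: 'str'}

def classify(value):
    """Return the flag for a single non-empty string."""
    try:
        int(value)
        return INT
    except ValueError:
        pass
    try:
        float(value)
        return FLOAT
    except ValueError:
        pass
    return BOOL if value.lower() in ('true', 'false') else STR

def infer_all(values):
    """OR the flags of all non-empty values; more than one flag means 'mixed'."""
    found = 0
    for v in values:
        v = str(v)
        if v != '':
            found |= classify(v)
    # Any combination of two or more flags is missing from NAMES.
    return NAMES.get(found, 'mixed')

print(infer_all(['', '', 'never', '', '5', 'True', '5.2', '']))
```

As in the sampled version, a list that contains both ints and floats sets two flags and is therefore reported as mixed, not as float.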

Rationale, feel free to skip this:

I wrote this function as currently Pandas' df.convert_dtypes, df.infer_objects and pd.to_numeric don't work nicely if you have columns with empty strings. This could be solved (source 1, source 2) if a DataFrame has columns of uniform datatypes, for example if we know that it only has floats we could replace '' with np.nan and then infer. However for a DataFrame with mixed column types (strings, floats, ints), replacing '' with np.nan wouldn't work. This function helps solve this issue by running:

values = np.where(pd.isnull(df.T.values), '', df.T.values)
for l in values:
    infer_dtypes(l)

See this GitHub Gist for a full example. Hope it helps!

Cherry answered 4/6, 2024 at 7:2 Comment(0)
