Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?
S

5

21

I am having issues accessing data inside a dictionary.

Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.

I am working with a dask.dataframe created from a csv.

Edit Question

How I got to this point

Assume I start out with a Pandas Series:

df.Coordinates
130      {u'type': u'Point', u'coordinates': [-43.30175...
278      {u'type': u'Point', u'coordinates': [-51.17913...
425      {u'type': u'Point', u'coordinates': [-43.17986...
440      {u'type': u'Point', u'coordinates': [-51.16376...
877      {u'type': u'Point', u'coordinates': [-43.17986...
1313     {u'type': u'Point', u'coordinates': [-49.72688...
1734     {u'type': u'Point', u'coordinates': [-43.57405...
1817     {u'type': u'Point', u'coordinates': [-43.77649...
1835     {u'type': u'Point', u'coordinates': [-43.17132...
2739     {u'type': u'Point', u'coordinates': [-43.19583...
2915     {u'type': u'Point', u'coordinates': [-43.17986...
3035     {u'type': u'Point', u'coordinates': [-51.01583...
3097     {u'type': u'Point', u'coordinates': [-43.17891...
3974     {u'type': u'Point', u'coordinates': [-8.633880...
3983     {u'type': u'Point', u'coordinates': [-46.64960...
4424     {u'type': u'Point', u'coordinates': [-43.17986...

The problem is, this is not a true dataframe of dictionaries. Instead, it's a column full of strings that LOOK like dictionaries. Running this shows it:

df.Coordinates.apply(type)
130      <class 'str'>
278      <class 'str'>
425      <class 'str'>
440      <class 'str'>
877      <class 'str'>
1313     <class 'str'>
1734     <class 'str'>
1817     <class 'str'>
1835     <class 'str'>
2739     <class 'str'>
2915     <class 'str'>
3035     <class 'str'>
3097     <class 'str'>
3974     <class 'str'>
3983     <class 'str'>
4424     <class 'str'>

My Goal: Access the coordinates key and value in the dictionary. That's it. But each value is a str.

I converted the strings to dictionaries using eval.

new = df.Coordinates.apply(eval)
130      {'coordinates': [-43.301755, -22.990065], 'typ...
278      {'coordinates': [-51.17913026, -30.01201896], ...
425      {'coordinates': [-43.17986794, -22.91000096], ...
440      {'coordinates': [-51.16376782, -29.95488677], ...
877      {'coordinates': [-43.17986794, -22.91000096], ...
1313     {'coordinates': [-49.72688407, -29.33757253], ...
1734     {'coordinates': [-43.574057, -22.928059], 'typ...
1817     {'coordinates': [-43.77649254, -22.86940539], ...
1835     {'coordinates': [-43.17132318, -22.90895217], ...
2739     {'coordinates': [-43.1958313, -22.98755333], '...
2915     {'coordinates': [-43.17986794, -22.91000096], ...
3035     {'coordinates': [-51.01583481, -29.63593292], ...
3097     {'coordinates': [-43.17891379, -22.96476163], ...
3974     {'coordinates': [-8.63388008, 41.14594453], 't...
3983     {'coordinates': [-46.64960938, -23.55902666], ...
4424     {'coordinates': [-43.17986794, -22.91000096], ...

Next I test the type of each object and get:

130      <class 'dict'>
278      <class 'dict'>
425      <class 'dict'>
440      <class 'dict'>
877      <class 'dict'>
1313     <class 'dict'>
1734     <class 'dict'>
1817     <class 'dict'>
1835     <class 'dict'>
2739     <class 'dict'>
2915     <class 'dict'>
3035     <class 'dict'>
3097     <class 'dict'>
3974     <class 'dict'>
3983     <class 'dict'>
4424     <class 'dict'>

If I try to access my dictionaries with new.apply(lambda x: x['coordinates']), I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)

/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2218         else:
   2219             values = self.asobject
-> 2220             mapped = lib.map_infer(values, f, convert=convert_dtype)
   2221 
   2222         if len(mapped) and isinstance(mapped[0], Series):

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()

<ipython-input-68-748ce2d8529e> in coord_getter(row)
      1 import ast
      2 def coord_getter(row):
----> 3     return (ast.literal_eval(row))['coordinates']

TypeError: 'bool' object is not subscriptable

It's some type of class, because when I run dir I get this for one object:

new.apply(lambda x: dir(x))[130]
130           __class__
130        __contains__
130         __delattr__
130         __delitem__
130             __dir__
130             __doc__
130              __eq__
130          __format__
130              __ge__
130    __getattribute__
130         __getitem__
130              __gt__
130            __hash__
130            __init__
130            __iter__
130              __le__
130             __len__
130              __lt__
130              __ne__
130             __new__
130          __reduce__
130       __reduce_ex__
130            __repr__
130         __setattr__
130         __setitem__
130          __sizeof__
130             __str__
130    __subclasshook__
130               clear
130                copy
130            fromkeys
130                 get
130               items
130                keys
130                 pop
130             popitem
130          setdefault
130              update
130              values
Name: Coordinates, dtype: object

My Problem: I just want to access the dictionary. But the object is <class 'dict'>. How do I convert this to a regular dict, or just access the key:value pairs?

Any ideas??

Sophist answered 26/8, 2016 at 15:25 Comment(2)
The exception you've shown doesn't match the code you said was causing it. It shows a coord_getter function, which is not quite the same as the lambda you showed before. (Selfsatisfaction)
Are you reading the csv yourself into a dataframe? It seems likely that this problem could be solved by improving how the data is read from the csv in the first place. (Hildick)
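
One way to act on the suggestion about reading the csv differently is to parse the column while the file is being read. The sketch below is only an illustration: it assumes plain pandas rather than dask, builds a one-row csv in memory instead of using the asker's file, and relies on the converters argument of pd.read_csv to run ast.literal_eval on each cell.

```python
import ast
import io

import pandas as pd

# Stand-in for the real file: write a one-row frame of dicts to an in-memory csv.
original = pd.DataFrame(
    {"Coordinates": [{"type": "Point", "coordinates": [-43.301755, -22.990065]}]}
)
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# A plain read gives back strings that only look like dicts.
buffer.seek(0)
plain = pd.read_csv(buffer)

# With a converter, each cell is parsed into a real dict while reading.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"Coordinates": ast.literal_eval})

print(type(plain["Coordinates"][0]))
print(parsed["Coordinates"][0]["coordinates"])
```

The converter is called once per cell, so a cell that is not a valid Python literal will still raise an error here.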
W
12

Just ran into this problem. My solution:

import ast
import pandas as pd

df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}","{u'type': u'Point', u'coordinates': [-34,34]}","{u'type': u'Point', u'coordinates': [-102,344]}"],columns=["Coordinates"])

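# Make sure every cell is a string, then parse each string into a real dict
df = df["Coordinates"].astype('str')
df = df.apply(lambda x: ast.literal_eval(x))
# Expand each dict into its own columns ('type' and 'coordinates')
df = df.apply(pd.Series)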
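
The traceback in the question ("'bool' object is not subscriptable") suggests that at least one cell parsed to something other than a dict, for example the string "False". A minimal sketch of a more defensive parser follows; parse_point is a hypothetical helper name, and the sample rows are made up to show the failure cases.

```python
import ast

import pandas as pd


def parse_point(value):
    """Return the parsed dict, or None when the cell is not a dict literal."""
    if isinstance(value, dict):
        return value
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None


# Made-up rows: one good cell, one that evaluates to a bool, one missing value.
raw = pd.Series([
    "{u'type': u'Point', u'coordinates': [-43.30175, -22.990065]}",
    "False",
    None,
])

parsed = raw.apply(parse_point)
# Only index into cells that really are dicts.
coords = parsed.apply(lambda d: d["coordinates"] if isinstance(d, dict) else None)
print(coords)
```

Rows that fail to parse come back empty instead of raising, which makes it easier to inspect the offending cells afterwards.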
Whiney answered 29/9, 2019 at 13:58 Comment(0)
M
8

My first instinct is to use the json.loads to cast the strings into dicts. But the example you've posted does not follow the json standard since it uses single instead of double quotes. So you have to convert the strings first.

A second option is to just use regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust since lat/long coords are fairly standard.

import json
import re
import pandas as pd

df = pd.DataFrame(data={'Coordinates':["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
    "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
    'idx': [130, 278]})


##
# Solution 1- use json.loads
##

def string_to_dict(dict_string):
    # Convert to proper json format
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)

# Use bracket assignment so that a real column is created
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df.CoordDicts[0]['coordinates']
#>>> [-43.30175, 123.45]


##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords

# Use bracket assignment so that a real column is created
df['Coords'] = df.Coordinates.apply(get_lat_lon)
df.Coords[0]
#>>> [-43.30175, 123.45]
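
The quoting issue described at the top of this answer can be checked directly. The short sketch below is an illustration, not part of the original solution: it shows json.loads rejecting the single-quoted string, and a made-up value containing an apostrophe (the 'label' key is hypothetical) that breaks the quote-replacement trick, while ast.literal_eval reads both strings.

```python
import ast
import json

raw = "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}"

# json.loads expects double-quoted keys and strings, so the raw text is rejected.
try:
    json.loads(raw)
    json_ok = True
except ValueError:
    json_ok = False

# Made-up example: an apostrophe inside a value defeats replace("'", '"').
tricky = "{'type': 'Point', 'label': \"O'Hare\"}"
try:
    json.loads(tricky.replace("'", '"'))
    replace_ok = True
except ValueError:
    replace_ok = False

# ast.literal_eval parses Python literals, so both strings work unchanged.
point = ast.literal_eval(raw)
labelled = ast.literal_eval(tricky)
print(json_ok, replace_ok, point["coordinates"], labelled["label"])
```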
Maiamaiah answered 27/8, 2016 at 3:5 Comment(4)
@Linwoodc3, FYI, on my system, your method of using eval works with my example DataFrame. I am using Python 2.7. Despite the version differences, I expect the regex solution to still work. (Maiamaiah)
Sorry, just came back. Will check! (Sophist)
Got an error again. "TypeError: expected string or bytes-like object" (Sophist)
So the string.replace for the quotes, followed by json.loads, works in my case. However, I think this shouldn't happen - in my case the original data was formatted correctly as dictionaries, and only got coerced to strings after I wrote it out to CSV and read it back in. (Welcy)
S
2

Assuming you start with a Series of dicts, you can use the .tolist() method to create a list of dicts and use this as input for a DataFrame. This approach will map each distinct key to a column.

You can filter by keys on creation by setting the columns argument in pd.DataFrame(), giving you the neat one-liner below. Hope that helps.

# Starting assumption:
data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
        "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
s = pd.Series(data).apply(eval)

# Create a DataFrame with a list of dicts with a selection of columns
pd.DataFrame(s.tolist(), columns=['coordinates'])
Out[1]: 
                    coordinates
0      [-43.301755, -22.990065]
1  [-51.17913026, -30.01201896]
Stadia answered 11/11, 2019 at 17:38 Comment(1)
Note - The dicts in your list do not need to be of the same length for this to work. Dicts may miss multiple keys that are present in other dicts and vice versa. For example, when you run pd.DataFrame(s.tolist()) you will notice that elevation is set to NaN in the second row. (Stadia)
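
The missing-key behaviour in the note above can be seen with the same two records. This sketch starts from real dicts rather than strings to keep it short.

```python
import pandas as pd

records = [
    {"coordinates": [-43.301755, -22.990065], "type": "Point", "elevation": 1000},
    {"coordinates": [-51.17913026, -30.01201896], "type": "Point"},
]
s = pd.Series(records)

# Every distinct key becomes a column; keys missing from a dict become NaN.
full = pd.DataFrame(s.tolist())
print(full[["type", "elevation"]])
```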
B
1

I just found a solution that works for a column of JSON-formatted dicts, whether the values are already dicts or are still JSON strings.

pd.json_normalize(df['column'].apply(json.loads))

The apply parses each JSON string into a dict. Use it only when the values are still strings; json.loads expects valid JSON, so keys and strings must be double-quoted.

If the values are already dicts, pass them to pd.json_normalize directly.
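
A small end-to-end sketch of this approach, under a few assumptions: the strings are valid JSON (double quotes), the nested "properties" key is made up to show how nested dicts are flattened, and the parsed Series is converted with tolist() so that json_normalize receives a plain list of dicts.

```python
import json

import pandas as pd

# Made-up JSON strings; "properties" is hypothetical and only shows flattening.
raw = pd.Series([
    '{"type": "Point", "coordinates": [1, 2], "properties": {"name": "a"}}',
    '{"type": "Point", "coordinates": [3, 4], "properties": {"name": "b"}}',
])

parsed = raw.apply(json.loads)                 # JSON strings -> dicts
flat = pd.json_normalize(parsed.tolist())      # dicts -> one column per key

print(flat.columns.tolist())
print(flat["coordinates"].tolist())
```

Nested dicts end up in dotted column names such as properties.name, while lists like coordinates are kept as list values.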

Bellebelleek answered 9/2 at 7:30 Comment(0)
S
0

It looks like you end up with something like this

s = pd.Series([
        dict(type='Point', coordinates=[1, 1]),
        dict(type='Point', coordinates=[1, 2]),
        dict(type='Point', coordinates=[1, 3]),
        dict(type='Point', coordinates=[1, 4]),
        dict(type='Point', coordinates=[1, 5]),
        dict(type='Point', coordinates=[2, 1]),
        dict(type='Point', coordinates=[2, 2]),
        dict(type='Point', coordinates=[2, 3]),        
    ])

s

0    {u'type': u'Point', u'coordinates': [1, 1]}
1    {u'type': u'Point', u'coordinates': [1, 2]}
2    {u'type': u'Point', u'coordinates': [1, 3]}
3    {u'type': u'Point', u'coordinates': [1, 4]}
4    {u'type': u'Point', u'coordinates': [1, 5]}
5    {u'type': u'Point', u'coordinates': [2, 1]}
6    {u'type': u'Point', u'coordinates': [2, 2]}
7    {u'type': u'Point', u'coordinates': [2, 3]}
dtype: object

Solution

df = s.apply(pd.Series)
df


then access coordinates

df.coordinates

0    [1, 1]
1    [1, 2]
2    [1, 3]
3    [1, 4]
4    [1, 5]
5    [2, 1]
6    [2, 2]
7    [2, 3]
Name: coordinates, dtype: object

Or even

df.coordinates.apply(pd.Series)

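
As a variation on the last step, the coordinate pairs can be split into two named columns by building a DataFrame from the list of pairs. This is only a sketch: the column names lon and lat are hypothetical and assume the pairs follow the GeoJSON [longitude, latitude] order.

```python
import pandas as pd

s = pd.Series([
    dict(type='Point', coordinates=[1, 1]),
    dict(type='Point', coordinates=[1, 2]),
    dict(type='Point', coordinates=[2, 3]),
])

# Pull out each coordinate pair, then let pandas split the pairs into columns.
pairs = [d['coordinates'] for d in s]
coords = pd.DataFrame(pairs, index=s.index, columns=['lon', 'lat'])
print(coords)
```

On large Series this tends to be faster than apply(pd.Series), which builds a new Series object for every row.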

Somato answered 26/8, 2016 at 15:33 Comment(1)
Thanks for the help @piRSquared, but that gave me the same error. I added more information above. When I run dir on the objects, it's some type of class. Any suggestions? (Sophist)
