parent-child relationship query in simple_salesforce python, extracting from ordered dicts
Asked Answered
S

3

6

I'm trying to query information from salesforce using the simple_salesforce package in python.

The problem is that it's nesting fields that are a part of a parent-child relationship into an ordered dict within an ordered dict

I want.. from the Opportunity object, to find the id, and the accountid associated with that record.

The SOQL query may look like..

query = "select id, account.id from opportunity where closedate = last_n_days:5"

in SOQL (salesforce object query language), a dot denotes a parent child relationship in the database. So I'm trying to get the id from the opportunity object, and then the related id from the account object on that record.

for some reason the Id comes in fine, but the account.id is nested in an ordered dict within an ordered dict:

q = sf.query_all(query)

this pulls back an ordered dictionary..

OrderedDict([('totalSize', 455),
             ('done', True),
             ('records',
              [OrderedDict([('attributes',
                             OrderedDict([('type', 'Opportunity'),
                                          ('url',

I would pull the records piece of the ordereddict to create a df

df = pd.DataFrame(q['records'])

This gives me 3 columns, an ordered dict called 'attributes', Id and another ordered dict called 'Account'. I'm looking for a way to extract the ('BillingCountry', 'United States') piece out of the nested ordered dict 'Account'

[OrderedDict([('attributes',
               OrderedDict([('type', 'Opportunity'),
                            ('url',
                             '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
              ('Id', '0061B003451RhZgiHHF'),
              ('Account',
               OrderedDict([('attributes',
                             OrderedDict([('type', 'Account'),
                                          ('url',
                                           '/services/data/v34.0/sobjects/Account/001304300MviPPF3Z')])),
                            ('BillingCountry', 'United States')]))])

Edit: clarifying what I'm looking for.

I want to end with a dataframe with a column for each of the queried fields.

When I put the 'records' piece into a DataFrame using df = pd.DataFrame(sf.query_all(query)['records']) it gives me:

attributes  Id  Account
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])    0061B003451RhZgiHHF OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')]) 0061B00001Pa52QQAR  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')]) 0061B00001TRu5mQAD  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])), ('BillingCountry', 'United States')])

after I remove the 'attributes' column I want the output to be

Id BillingCountry
0061B003451RhZgiHHF 'United States'
0061B00001Pa52QQAR 'United States'
0061B00001TRu5mQAD 'United States'
Seamstress answered 17/12, 2017 at 22:34 Comment(7)
@StephenRauch I updated my Q to provide some claritySeamstress
Why are you putting this into a Dataframe? What is this providing you? What form are the records from sf.query_all(query)?Furlana
its effectively a sql query from a database. But for some reason the query function within the package returns an ordered dict. I can turn the records from the ordered dict into a dataframe, but in this case, one of the lists which become Series when I convert it, is actually another ordered dict.Seamstress
Yes but. Why are you going into the dataframe? These seems an extra step, that does not provide any value, only confusion.Furlana
what form would you keep it in?Seamstress
I would think a simple iteration on the query result would be easier.Furlana
the way the system accepts updates is in the form of a traditional dataframe or table. Its easiest to work with a dataframe to manipulate the data when its in a tidy form, so i can push the output to different places.Seamstress
F
11

Pandas is an amazing tool for tabular data. But while it can contain Python objects, that is not its sweet spot. I suggest you extract your data from the query prior to inserting them into a pandas.Dataframe:

Extract records:

To extract the desired fields as a list of dictionaries is as easy as:

records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
           for rec in data['records']]

Insert records into a dataframe:

With a list of dicts, a dataframe is as easy as:

df = pd.DataFrame(records)

Test Code:

import pandas as pd
from collections import OrderedDict

data = OrderedDict([
    ('totalSize', 455),
    ('done', True),
    ('records', [
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
            ('Id', '0061B003451RhZgiHHF'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])),
                                     ('BillingCountry', 'United States')])),
        ]),
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')])),
            ('Id', '0061B00001Pa52QQAR'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])),
                                     ('BillingCountry', 'United States')])),
        ]),
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')])),
            ('Id', '0061B00001TRu5mQAD'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])),
                                     ('BillingCountry', 'United States')])),
        ]),
    ])
])

records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
           for rec in data['records']]
for r in records:
    print(r)

print(pd.DataFrame(records))

Test Results:

{'country': 'United States', 'id': '0061B003451RhZgiHHF'}
{'country': 'United States', 'id': '0061B00001Pa52QQAR'}
{'country': 'United States', 'id': '0061B00001TRu5mQAD'}

         country                   id
0  United States  0061B003451RhZgiHHF
1  United States   0061B00001Pa52QQAR
2  United States   0061B00001TRu5mQAD
Furlana answered 18/12, 2017 at 0:48 Comment(3)
perfect answer. Well explained, I learned a lot, and it solved my issue. Thanks very much @Stephen RauchSeamstress
If the desired field could have NoneType value from Salesforce, could you please share how to make dictionaries script that you provide working? Sorry I am the python newbie so please let me know. Thank you.Igbo
@IversonWang. It is best to ask a new question, instead of leaving a comment here. In your question be sure to explain exactly what your data looks like, what you are trying to achieve, and what problems you have encountered.Furlana
S
2

Pandas can read ordered dicts.

import pandas as pd
from simple_salesforce import Salesforce

sf = Salesforce(username='your_username',   
                password='your_password',
                security_token='your_token')

query = "select id, account.id from opportunity where closedate = last_n_days:5"
df = pd.DataFrame(sf.query_all(query)['records']).drop(columns='attributes')
Shannashannah answered 19/7, 2018 at 8:38 Comment(1)
it can, but this doesn't solve the issue where account.id comes in as an ordered dict as well, not as a field. the account table is a sub table that is related to the opportunity table. The dictionary in Stephen's answer is how to solve for that. But thanks!Seamstress
L
1

For anyone still interested to this topic I recommend the burner_process package. It does recursive un-nesting so you can make multiple-level relationship queries (not just two levels) and get a plain dataframe as ouptut.

pip install simpler-sf

import simpler_sf
simpler_sf.simple_salesforce()
import simple_salesforce
sf = simple_salesforce.Salesforce(...)
query = 'SELECT Contact.Id, Account.Name, Campaign FROM CampaignMember'
df = sf.smart_query(query)
print(df)

Output:

    Contact.FirstName   Account.Name           Campaign
0   Emily                     Amazon   CampaignA_2023Q2
1   Jasmine                   Amazon   CampaignA_2023Q2
2   Míng                   Microsoft   CampaignB_2022Q4
3   Magdalena                 Google   CampaignC_2023Q1

GitHub repo

Ledda answered 4/5, 2023 at 19:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.