How to get Salesforce data into Python pandas DataFrames
M

3

6

Currently we export Salesforce data to a CSV file and read that file into pandas using the read_csv and to_csv methods. Is there another way to get data from Salesforce into a pandas DataFrame?

Mention answered 31/8, 2018 at 14:30 Comment(0)
F
9

In Python, you can install a package called Simple Salesforce and write SOQL queries to return the data:

https://github.com/simple-salesforce/simple-salesforce

Here's an example of how to do this:

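import pandas as pd
from simple_salesforce import Salesforce

# Log in with username, password and security token
sf = Salesforce(username='<enter username>', password='<enter password>',
     security_token='<enter your security token from your profile>')

# Run a SOQL query and load the returned records into a DataFrame
a_query = pd.DataFrame(sf.query(
     "SELECT Name, CreatedDate FROM User")['records'])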
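
Note that `sf.query` returns only the first batch of a large result set; simple-salesforce also offers `query_all`, which keeps fetching until all batches are retrieved. The following is a minimal sketch, assuming `sf` is a connected `Salesforce` instance. The helper name `soql_to_dataframe` is made up for illustration and is not part of the library.

```python
import pandas as pd


def soql_to_dataframe(sf, soql):
    """Run a SOQL query and return all rows as a DataFrame.

    `sf` is assumed to be a connected simple_salesforce.Salesforce instance
    (hypothetical helper, not part of simple-salesforce itself).
    """
    # query_all keeps requesting further batches until the result is complete
    result = sf.query_all(soql)
    df = pd.DataFrame(result['records'])
    # every record carries an 'attributes' dict (type and URL); drop it if present
    return df.drop(columns='attributes', errors='ignore')
```

Usage would look like `soql_to_dataframe(sf, "SELECT Name, CreatedDate FROM User")`.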
Franciscofranciska answered 4/9, 2018 at 10:56 Comment(1)
How can I find out the names of all the columns present? Thanks. – Hanshansard
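
One way to list the available columns is the object's describe call, which simple-salesforce exposes as `sf.<ObjectName>.describe()`; the result contains a `fields` list with one entry per field. A small sketch, assuming a connected `sf` instance (the helper name `field_names` is hypothetical):

```python
def field_names(describe_result):
    """Return the API names of all fields in an sObject describe result."""
    # each entry in 'fields' is a dict describing one field; 'name' is its API name
    return [field['name'] for field in describe_result['fields']]


# Usage, assuming `sf` is a connected simple_salesforce.Salesforce instance:
# columns = field_names(sf.User.describe())
# soql = "SELECT {} FROM User".format(", ".join(columns))
```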
S
3

In my case, to display the information as a dataframe I had to use the following code:

# Import libraries
import simple_salesforce as ssf, pandas

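# Create the connection (domain='login' targets production, domain='test' a sandbox;
# recent simple-salesforce versions no longer accept the old sandbox= argument)
session_id, instance = ssf.SalesforceLogin(username='<username>', password='<password>', security_token='<token>', domain='login')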
sf_ = ssf.Salesforce(instance=instance, session_id=session_id)

# Query to execute
sql_code = "SELECT id, name FROM main_table"

# Store query result as dataframe
information = sf_.query(query= sql_code)
table = pandas.DataFrame(information['records']).drop(columns='attributes')
Shellacking answered 3/3, 2020 at 10:15 Comment(2)
How do you manage parent fields in your conversion to data frame? – Hydrography
I haven't come across that before, so I'm afraid I can't help. – Pendent
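
Regarding parent fields: a query such as `SELECT Id, Name, Account.Name FROM Contact` returns the parent as a nested dict inside each record. One possible approach, sketched here under the assumption that the records have that nested shape, is `pandas.json_normalize`, which expands nested dicts into dotted column names. The helper name `flatten_records` is made up for illustration.

```python
import pandas as pd


def flatten_records(records):
    """Flatten query records, turning parent fields into dotted columns."""
    # json_normalize expands nested dicts: {'Account': {'Name': 'X'}} -> 'Account.Name'
    df = pd.json_normalize(records)
    # drop the 'attributes' metadata at every nesting level
    meta_columns = [c for c in df.columns if 'attributes' in c.split('.')]
    return df.drop(columns=meta_columns)
```

It would be called as `flatten_records(sf_.query(query=sql_code)['records'])`.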
L
2

Adding to the original answer, the function below also handles simple joins (fields pulled from a parent object).

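import pandas as pd  # must be imported before the definition (used in the return annotation)

def sf_results_to_dataframe(results, drop_index=True) -> pd.DataFrame:
    """Convert a query result to a DataFrame indexed by Id; the query must select Id."""
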
    df = pd.DataFrame(results['records'])
    df.drop('attributes', axis=1, inplace=True)  # clean up from technical info
    df.set_index('Id', drop=drop_index, inplace=True)

    for table in ['Account', 'Contact', 'Lead', 'Opportunity']:
        if table in results['records'][0].keys(): # detect JOIN
            # keys from the joined table, taken from the first record whose parent is not null
            parent = next((record[table] for record in results['records'] if record[table]), {})
            local_keys = [key for key in parent.keys() if key != 'attributes']

            global_keys  = [table + key for key in local_keys] # name for the fields in the output table, e.g. 'AccountName'

            # fields of the joined table and the record index (a null parent yields None values)
            table_records = [{'Id': record['Id'],
                              **{global_key: (record[table] or {}).get(local_key) for global_key, local_key in zip(global_keys, local_keys)}}
                              for record in results['records']]
            df_extra = pd.DataFrame(table_records)
            df_extra.set_index('Id', drop=True, inplace=True) # match index
            df.drop(table, axis=1, inplace=True) # drop duplicated info
            df = df.merge(df_extra, left_index=True, right_index=True) # merge on index

    return df

Example:

import pandas as pd
from simple_salesforce import Salesforce

SALESFORCE_EMAIL = '...'
SALESFORCE_TOKEN = '...'
SALESFORCE_PASSWORD = '...'

sf = Salesforce(username=SALESFORCE_EMAIL, password=SALESFORCE_PASSWORD, security_token=SALESFORCE_TOKEN)

query = """SELECT Id, Name, Account.Name
FROM Contact
LIMIT 1
"""

results = sf.query(query)
df = sf_results_to_dataframe(results)
Lithotomy answered 13/9, 2022 at 15:14 Comment(0)
