Importing salesforce report data using python

Asked 4/4, 2014 at 3:56 Answered 20/1, 2021 at 18:21

I am new to sfdc . I have report already created by user . I would like to use python to dump the data of the report into csv/excel file. I see there are couple of python packages for that. But my code gives an error

from simple_salesforce import Salesforce
sf = Salesforce(instance_url='https://cs1.salesforce.com', session_id='')
sf = Salesforce(password='xxxxxx', username='xxxxx', organizationId='xxxxx')

Can i have the basic steps for setting up the API and some example code

Cid answered 4/4, 2014 at 3:56 Comment(0)

This worked for me:

import requests
import csv
from simple_salesforce import Salesforce
import pandas as pd


sf = Salesforce(username=your_username, password=your_password, security_token = your_token)


login_data = {'username': your_username, 'password': your_password_plus_your_token}


with requests.session() as s:
    d = s.get("https://your_instance.salesforce.com/{}?export=1&enc=UTF-8&xf=csv".format(reportid), headers=sf.headers, cookies={'sid': sf.session_id})

d.content will contain a string of comma separated values which you can read with the csv module.

I take the data into pandas from there, hence the function name and import pandas. I removed the rest of the function where it puts the data into a DataFrame, but if you're interested in how that's done let me know.

Hypermeter answered 16/2, 2016 at 15:25 Comment(4)

I know this is pretty old, but I would be very interested in how you moved d.content into a dataframe/CSV file – Adversaria 22/9, 2017 at 15:33

@Rukgo: I now use pypi.python.org/pypi/salesforce-reporting/0.1.3, a dedicated package. SFDC returns 'table-like' data from the csv query, which pandas (and Excel) is smart enough to accommodate, so I just loaded it into pandas straight. – Hypermeter 25/9, 2017 at 11:25

@Odol I looked into that though went with the s.get because it will include headers in the output. – Adversaria 25/9, 2017 at 15:1

I think this way doesn´t work anymore, try this https://mcmap.net/q/1923945/-how-to-download-a-report-as-a-csv-directly-from-salesforce-lightning – Fulllength 26/8, 2019 at 18:37

In case it is helpful, I wanted to write out the steps I used to answer this question now (Aug-2018), based on Obol's comment. For reference, I followed the README instructions at https://github.com/cghall/force-retrieve/blob/master/README.md for the salesforce_reporting package.

To connect to Salesforce:

from salesforce_reporting import Connection, ReportParser

sf = Connection(username='your_username',password='your_password',security_token='your_token')

Then, to get the report I wanted into a Pandas DataFrame:

report = sf.get_report(your_reports_id)
parser = salesforce_reporting.ReportParser(report)
report = parser.records_dict()
report = pd.DataFrame(report)

If you were so inclined, you could also simplify the four lines above into one, like so:

report = pd.DataFrame(salesforce_reporting.ReportParser(sf.get_report(your_reports_id)).records_dict())

One difference I ran into from the README is that sf.get_report('report_id', includeDetails=True) threw an error stating get_report() got an unexpected keyword argument 'includeDetails'. Simply removing it out seemed result in the code working fine.

report can now be exported via report.to_csv('report.csv',index=False), or manipulated directly.

EDIT: parser.records() changed to parser.records_dict(), as this allows the DataFrame to have the columns already listed, rather than indexing them numerically.

Meatus answered 16/8, 2018 at 21:1 Comment(4)

Does this report/response tell you if there are more than 2000 lines ? Or you dont need to worry about this limit since you are getting the complete CSV? Thx – Janessajanet 20/1, 2019 at 16:39

I am hitting the 2k limit, I guess it applies to this solution. – Villada 1/5, 2019 at 9:30

Do you know how to get more than 2k rows? – Fulllength 26/8, 2019 at 16:13

Does anyone know of a solution that gets around the 2k limit? – Cassaundra 8/4, 2022 at 23:32

The code below is rather long and might be just for our use case but the basic idea is the following:

Find out date interval length and additional needed filtering to never run into the "more the 2'000" limit. In my case I could have weekly date range filter but would need to apply some additional filters

Then run it like this:

report_id = '00O4…'
sf = SalesforceReport(user, pass, token, report_id)
it = sf.iterate_over_dates_and_filters(datetime.date(2020,2,1),
     'Invoice__c.InvoiceDate__c', 'Opportunity.CustomField__c', 
     [('a', 'startswith'), ('b', 'startswith'), …])
for row in it:
  # do something with the dict

The iterator goes through every week (if you need daily iterators or monthly then you'd need to change the code, but the change should be minimal) since 2020-02-01 and applies the filter CustomField__c.startswith('a'), then CustomField__c.startswith('b'), … and acts as a generator so you don't need to mess with the filter cycling yourself.

The iterator throws an Exception if there's a query which returns more than 2000 rows, just to be sure that the data is not incomplete.

One warning here: SF has a limit of max 500 queries per hour. Say if you have one year with 52 weeks and 10 additional filters you'd already run into that limit.

Here's the class (relies on simple_salesforce)

import simple_salesforce
import json
import datetime

"""
helper class to iterate over salesforce report data
and manouvering around the 2000 max limit
"""

class SalesforceReport(simple_salesforce.Salesforce):
  def __init__(self, username, password, security_token, report_id):
    super(SalesforceReport, self).__init__(username=username, password=password, security_token=security_token)
    self.report_id = report_id
    self._fetch_describe()

  def _fetch_describe(self):
    url = f'{self.base_url}analytics/reports/{self.report_id}/describe'
    result = self._call_salesforce('GET', url)
    self.filters = dict(result.json()['reportMetadata'])

  def apply_report_filter(self, column, operator, value, replace=True):
    """
    adds/replaces filter, example:
    apply_report_filter('Opportunity.InsertionId__c', 'startsWith', 'hbob').
    For date filters use apply_standard_date_filter.

    column:   needs to correspond to a column in your report, AND the report
              needs to have this filter configured (so in the UI the filter
              can be applied)
    operator: equals, notEqual, lessThan, greaterThan, lessOrEqual,
              greaterOrEqual, contains, notContain, startsWith, includes
              see https://sforce.co/2Tb5SrS for up to date list
    value:    value as a string
    replace:  if set to True, then if there's already a restriction on column
              this restriction will be replaced, otherwise it's added additionally
    """
    filters = self.filters['reportFilters']
    if replace:
      filters = [f for f in filters if not f['column'] == column]
    filters.append(dict(
      column=column, 
      isRunPageEditable=True, 
      operator=operator, 
      value=value))
    self.filters['reportFilters'] = filters

  def apply_standard_date_filter(self, column, startDate, endDate):
    """
    replace date filter. The date filter needs to be available as a filter in the
    UI already

    Example: apply_standard_date_filter('Invoice__c.InvoiceDate__c', d_from, d_to)

    column: needs to correspond to a column in your report
    startDate, endDate: instance of datetime.date
    """
    self.filters['standardDateFilter'] = dict(
      column=column,
      durationValue='CUSTOM',
      startDate=startDate.strftime('%Y-%m-%d'),
      endDate=endDate.strftime('%Y-%m-%d')
    )

  def query_report(self):
    """
    return generator which yields one report row as dict at a time
    """
    url = self.base_url + f"analytics/reports/query"
    result = self._call_salesforce('POST', url, data=json.dumps(dict(reportMetadata=self.filters)))
    r = result.json()
    columns = r['reportMetadata']['detailColumns']
    if not r['allData']:
      raise Exception('got more than 2000 rows! Quitting as data would be incomplete')
    for row in r['factMap']['T!T']['rows']:
      values = []
      for c in row['dataCells']:
        t = type(c['value'])
        if t == str or t == type(None) or t == int:
          values.append(c['value'])
        elif t == dict and 'amount' in c['value']:
          values.append(c['value']['amount'])
        else:
          print(f"don't know how to handle {c}")
          values.append(c['value'])
      yield dict(zip(columns, values))

  def iterate_over_dates_and_filters(self, startDate, date_column, filter_column, filter_tuples):
    """
    return generator which iterates over every week and applies the filters 
    each for column
    """
    date_runner = startDate
    while True:
      print(date_runner)
      self.apply_standard_date_filter(date_column, date_runner, date_runner + datetime.timedelta(days=6))
      for val, op in filter_tuples:
        print(val)
        self.apply_report_filter(filter_column, op, val)
        for row in self.query_report():
          yield row
      date_runner += datetime.timedelta(days=7)
      if date_runner > datetime.date.today():
        break

Weedy answered 26/2, 2020 at 8:15 Comment(1)

Thanks for posting this! Do you know of a way to get more than the 2k limit? – Cassaundra 8/4, 2022 at 1:39

For anyone just trying to download a report into a DataFrame this is how you do it (I added some notes and links for clarifications):

import pandas as pd
import csv
import requests
from io import StringIO
from simple_salesforce import Salesforce

# Input Salesforce credentials:
sf = Salesforce(
    username='[email protected]', 
    password='<password>', 
    security_token='<security_token>') # See below for help with finding token 

# Basic report URL structure:
orgParams = 'https://<INSERT_YOUR_COMPANY_NAME_HERE>.my.salesforce.com/' # you can see this in your Salesforce URL
exportParams = '?isdtp=p1&export=1&enc=UTF-8&xf=csv'

# Downloading the report:
reportId = 'reportId' # You find this in the URL of the report in question between "Report/" and "/view"
reportUrl = orgParams + reportId + exportParams
reportReq = requests.get(reportUrl, headers=sf.headers, cookies={'sid': sf.session_id})
reportData = reportReq.content.decode('utf-8')
reportDf = pd.read_csv(StringIO(reportData))

You can get your token by following the instructions at the bottom of this page

Hundley answered 20/1, 2021 at 18:21 Comment(0)

Recommended topics

Hot tags