Get a list of files in a SharePoint directory using Python
D

4

19

I have a URL for a SharePoint directory (intranet) and need an API that returns the list of files in that directory, given the URL. How can I do that using Python?

Deglutition answered 25/5, 2018 at 14:25 Comment(3)
I'm using the 'requests' module to send a GET request to the server. Please suggest a better module for getting the list of documents in a subfolder, given the folder and subfolder names on the server.Deglutition
I tried the URL server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title, but it did not work.Deglutition
You can do that with the simple http.server module from the Python standard library; it automatically lists all the content of the current directory.Piccaninny
J
23

Posting in case anyone else runs into the problem of getting files from a SharePoint folder given just the folder path. This link really helped me do this: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found plenty of information about doing this over raw HTTP but very little for Python, so hopefully this serves as a Python reference for someone else. I am assuming you are already set up with a client_id and client_secret for the SharePoint API. If not, you can use this for reference: https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs

I basically wanted to grab the names and relative URLs of the files within a folder, then pick the most recent file in the folder and load it into a DataFrame. I'm sure this isn't the most "Pythonic" way to do it, but it works, which is good enough for me.

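# install first, e.g. from a shell: pip install Office365-REST-Python-Client
# (in a Jupyter notebook: !pip install Office365-REST-Python-Client)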
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd


sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site>/Shared Documents/<folder>/<sub_folder>"
#credentials is assumed to be a dict you have already loaded with your app's client_id and client_secret
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()

#if you want to get the folders within <sub_folder> 
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))

#if you want to get the files in the folder        
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()

#collect the important file properties for each file in the folder
rows = []
for myfile in files:
    #use mod_time to get in better date format
    mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    #create a dict of all of the info for this file and add it to the list of rows
    rows.append({'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time})

    #print statements if needed
    # print("File name: {0}".format(myfile.properties["Name"]))
    # print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
    # print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))
#build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df_files = pd.DataFrame(rows, columns = ['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])
#get index of the most recently modified file and the ServerRelativeUrl associated with that index
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.loc[newest_index, 'ServerRelativeUrl']

# Get Excel File by newest_file_url identified above
response = File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # set file object to start
# load Excel file from BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header=0)
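
If only the newest file is needed, the same selection can probably be made without pandas. The following is a minimal sketch, assuming each file's properties look like the ones used above (Name, ServerRelativeUrl, and a TimeLastModified string such as 2021-09-22T23:14:00Z). The helpers parse_modified and newest_file and the sample_files data are hypothetical names made up for illustration; they are not part of the Office365 library.

```python
from datetime import datetime

def parse_modified(value):
    # Assumed timestamp format, matching the strptime pattern used above.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def newest_file(file_properties):
    # file_properties: iterable of dicts shaped like myfile.properties.
    # Returns the dict with the latest TimeLastModified, or None if empty.
    return max(
        file_properties,
        key=lambda props: parse_modified(props["TimeLastModified"]),
        default=None,
    )

# Hypothetical sample data standing in for [f.properties for f in files].
sample_files = [
    {"Name": "a.xlsx", "ServerRelativeUrl": "/sites/s/Shared Documents/a.xlsx",
     "TimeLastModified": "2021-09-01T08:00:00Z"},
    {"Name": "b.xlsx", "ServerRelativeUrl": "/sites/s/Shared Documents/b.xlsx",
     "TimeLastModified": "2021-09-22T23:14:00Z"},
]

newest = newest_file(sample_files)
print(newest["ServerRelativeUrl"])
```

With real data you would pass [f.properties for f in files] instead of sample_files.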

Here is another helpful link that lists the file properties you can view: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to the file properties section.

Hopefully this is helpful to someone. Again, I am not a pro and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.

Juneberry answered 22/9, 2021 at 23:14 Comment(3)
I think this line: client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret']) should be replaced with: client_credentials = ClientCredential(['client_id'], ['client_secret'])Contradiction
Wow that was super helpful!! Thanks!Crosson
sites/<my_site> - what does my_site here refer to ?Liam
I
0

I have a url for sharepoint directory

Assuming you are asking about a library, you can use SharePoint's REST API and make a web service call to:

https://yourServer/sites/yourSite/_api/web/lists/getbytitle('Documents')/items?$select=Title

This will return a list of documents at: https://yourServer/sites/yourSite/Documents

See: https://msdn.microsoft.com/en-us/library/office/dn531433.aspx

You will of course need the appropriate permissions / credentials to access that library.
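
For the files inside one specific folder rather than a whole library, SharePoint's REST API also exposes GetFolderByServerRelativeUrl('...')/Files. Below is a minimal sketch of building such a URL with the standard library only. The helper folder_files_url is a hypothetical name, and yourServer, yourSite, and the folder path are placeholders, so treat it as an illustration rather than a definitive implementation.

```python
from urllib.parse import quote

def folder_files_url(site_url, folder_server_relative_url):
    # Single quotes inside the quoted path are escaped by doubling them
    # (assumed here to follow the usual OData string-literal rule).
    escaped = folder_server_relative_url.replace("'", "''")
    # Percent-encode spaces and similar characters; keep slashes and quotes.
    encoded = quote(escaped, safe="/'")
    return "{0}/_api/web/GetFolderByServerRelativeUrl('{1}')/Files".format(
        site_url.rstrip("/"), encoded
    )

url = folder_files_url(
    "https://yourServer/sites/yourSite",
    "/sites/yourSite/Documents/Folder name/Subfolder name",
)
print(url)
```

Note that the /_api/ part comes directly after the site URL, and the folder path goes inside the function call. The resulting URL can then be requested with whatever HTTP client and authentication your server requires; sending an Accept: application/json;odata=verbose header should return JSON instead of the default XML.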

Index answered 27/5, 2018 at 2:43 Comment(1)
Hi Mike, thanks for the response. I have many folders on the server, each folder has subfolders, and each subfolder has files in it. I want to list the file names and their metadata, given the folder name and subfolder name. I tried the URL below, but it resulted in the error 'Max retries exceeded with URL': server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=TitleDeglutition
L
0

You need to do 2 things here.

  1. Get a list of files (which can be directories or simple files) in the directory of your interest.
  2. Loop over each item in this list and check whether it is a file or a directory. For each directory, repeat steps 1 and 2.

You can find more documentation at https://learn.microsoft.com/en-us/sharepoint/dev/sp-add-ins/working-with-folders-and-files-with-rest#working-with-files-attached-to-list-items-by-using-rest

def getFilesList(directoryName):
    ...
    return filesList

# This will tell you if the item is a file or a directory.
def isDirectory(item):
    ...
    return True  # or False if the item is a plain file
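
The two steps above can be sketched as a small recursive function. This is only an illustration under stated assumptions: walk_files, list_items, and is_directory are hypothetical names, and the in-memory tree dict stands in for real SharePoint calls, which you would supply yourself.

```python
def walk_files(directory, list_items, is_directory):
    # Step 1: list the items (files and subdirectories) in this directory.
    # Step 2: recurse into each subdirectory, collect each plain file.
    found = []
    for item in list_items(directory):
        if is_directory(item):
            found.extend(walk_files(item, list_items, is_directory))
        else:
            found.append(item)
    return found

# Hypothetical in-memory folder tree standing in for SharePoint:
# keys are directories, values are the items directly inside them.
tree = {
    "Folder": ["Folder/Sub1", "Folder/readme.txt"],
    "Folder/Sub1": ["Folder/Sub1/a.xlsx", "Folder/Sub1/b.xlsx"],
}

all_files = walk_files(
    "Folder",
    list_items=lambda d: tree.get(d, []),
    is_directory=lambda item: item in tree,
)
print(all_files)
```

Passing the two callbacks in keeps the traversal independent of how the listing is actually fetched, so the same function works whether the items come from a REST call or a client library.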

Hope this helps.

Latinism answered 5/6, 2018 at 11:6 Comment(0)
U
0

You cannot use "server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title" as the URL in the SharePoint REST API, because /_api/ must come directly after the site URL, not after a folder path.

The URL should be structured as shown below, where WebSiteURL is the URL of the site or subsite containing the document library you want to get files from, and Documents is the display name of that document library:

WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title

If you also want to list metadata field values, add the field names to $select, separated by commas.
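
As a rough sketch of that URL structure, the helper below assembles the items URL with several fields in $select. The name list_items_url is hypothetical, the site URL is a placeholder, and the field names Title, Modified, and FileLeafRef are only examples; substitute the internal names of the fields in your own library.

```python
def list_items_url(web_site_url, library_title, fields=("Title",)):
    # Single quotes in the library title are doubled (assumed escaping rule).
    escaped_title = library_title.replace("'", "''")
    # $select takes the field names as a comma-separated list.
    select = ",".join(fields)
    return "{0}/_api/web/lists/getbytitle('{1}')/items?$select={2}".format(
        web_site_url.rstrip("/"), escaped_title, select
    )

url = list_items_url(
    "https://yourServer/sites/yourSite",
    "Documents",
    ["Title", "Modified", "FileLeafRef"],
)
print(url)
```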

Quick tip: If you are not sure how to form the REST API URL, try pasting it into the Chrome browser (you must be logged in to the SharePoint site with appropriate permissions) and check whether you get a proper result as XML. Once that works, put the same URL in your Python code and run it. This saves you from repeatedly running the code just to debug the URL.

Urian answered 7/6, 2018 at 11:50 Comment(0)
