I have a url for sharepoint directory(intranet) and need an api to return list of files in that directory given the url. how can I do that using python?
Posting in case anyone else comes across this issue of getting files from a SharePoint folder from just the folder path. This link really helped me do this: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found so much info about doing this for HTTP but not in Python so hopefully someone else needs more Python reference. I am assuming you are all setup with client_id and client_secret with the Sharepoint API. If not you can use this for reference: https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
I basically wanted to grab the names/relative urls of the files within a folder and then get the most recent file in the folder and put into a dataframe. I'm sure this isn't the "Pythonic" way to do this but it works which is good enough for me.
!pip install Office365-REST-Python-Client
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd
sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site/Shared Documents/<folder>/<sub_folder>"
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()
#if you want to get the folders within <sub_folder>
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
#if you want to get the files in the folder
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()
#create a dataframe of the important file properties for me for each file in the folder
df_files = pd.DataFrame(columns = ['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])
for myfile in files:
#use mod_time to get in better date format
mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
#create a dict of all of the info to add into dataframe and then append to dataframe
my_dict = {'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time}
df_files = df_files.append(my_dict, ignore_index= True )
#print statements if needed
# print("File name: {0}".format(myfile.properties["Name"]))
# print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
# print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))
#get index of the most recently modified file and the ServerRelativeUrl associated with that index
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.iloc[newest_index]['ServerRelativeUrl']
# Get Excel File by newest_file_url identified above
response= File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) # set file object to start
# load Excel file from BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header= 0)
Here is another helpful link of the file properties you can view: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to file properties section.
Hopefully this is helpful to someone. Again, I am not a pro and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.
sites/<my_site>
- what does my_site here refer to ? –
Liam I have a url for sharepoint directory
Assuming you asking about a library, you can use SharePoint's REST API and make a web service call to:
https://yourServer/sites/yourSite/_api/web/lists/getbytitle('Documents')/items?$select=Title
This will return a list of documents at: https://yourServer/sites/yourSite/Documents
See: https://msdn.microsoft.com/en-us/library/office/dn531433.aspx
You will of course need the appropriate permissions / credentials to access that library.
You need to do 2 things here.
- Get a list of files (which can be directories or simple files) in the directory of your interest.
- Loop over each item in this list of files and check if the item is a file or a directory. For each directory do the same as step 1 and 2.
You can find more documentation at https://learn.microsoft.com/en-us/sharepoint/dev/sp-add-ins/working-with-folders-and-files-with-rest#working-with-files-attached-to-list-items-by-using-rest
def getFilesList(directoryName):
...
return filesList
# This will tell you if the item is a file or a directory.
def isDirectory(item):
...
return true/false
Hope this helps.
You can not use "server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title" as URL in SharePoint REST API.
The URL structure should be like below considering WebSiteURL is the URL of site/subsite containing document library from which you are trying to get files and Documents is the Display name of document library:
WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title
And if you want to list metadata field values you should add Field names separated by comma in $select.
Quick tip: If you are not sure about the REST API URL formation. Try pasting the URL in Chrome browser (you must be logged in to SharePoint site with appropriate permissions) and see if you get proper result as XML if you are successful then update the REST URL and run the code. This way you will save time of running your python code.
© 2022 - 2024 — McMap. All rights reserved.