Authenticating on ADFS with Python script
S

2

7

I need to parse a site that sits behind an ADFS service, and I am struggling to authenticate to it.

Are there any options to get in?

From what I can see, most solutions are for backend applications or for "system users" (with app_id and app_secret). In my case I can't use those; I only have a login and password.

Example of the problem: in Chrome I open www.example.com and it redirects me to https://login.microsoftonline.com/ and then to https://federation-sts.example.com/adfs/ls/?blabla, which shows a login and password form.

How can I get access to it with python3?

Scarecrow answered 27/2, 2019 at 9:59 Comment(0)
D
9

ADFS uses complicated redirection and CSRF protection techniques. Thus, it is better to use a browser automation tool to perform the authentication and parse the webpage afterwards. I recommend the selenium toolkit with python bindings. Here is a working example:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

def MS_login(usrname, passwd):  # call this with username and password
    driver = webdriver.Edge()   # change to your browser (supporting Firefox, Chrome, ...)
    driver.delete_all_cookies() # clean up the prior login sessions
    driver.get('https://login.microsoftonline.com/') # change the url to your website
    time.sleep(5) # wait for redirection and rendering

    # find_element(By.XPATH, ...) works in both Selenium 3 and Selenium 4
    driver.find_element(By.XPATH, "//input[@name='loginfmt']").send_keys(usrname)
    driver.find_element(By.XPATH, "//input[@type='submit']").click()
    time.sleep(5)

    driver.find_element(By.XPATH, "//input[@name='passwd']").send_keys(passwd)
    driver.find_element(By.XPATH, "//input[@name='KMSI' and @type='checkbox']").click()
    driver.find_element(By.XPATH, "//input[@type='submit']").click()
    time.sleep(5)

    driver.find_element(By.XPATH, "//input[@type='submit']").click()

    # Successfully logged in

    # parse the site ... (here we simply keep the HTML of the current page)
    page_source = driver.page_source

    driver.quit() # close the browser and end the session
    return page_source # a closed driver is unusable, so return the HTML instead

This script opens the website in Microsoft Edge. It types the username and password into the correct DOM elements and then lets the browser handle the rest. It has been tested on the webpage "https://login.microsoftonline.com". You may need to modify it to suit your website.
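The fixed five-second sleeps are the fragile part of a script like this: too short on a slow network, wasted time on a fast one. Below is a minimal sketch of a polling helper that could stand in for them. The name wait_until and its parameters are made up for illustration and are not part of Selenium, which ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With a driver like the one in the script, a call such as wait_until(lambda: driver.find_elements("xpath", "//input[@name='passwd']")) would return as soon as the password field exists, since find_elements returns an empty list while nothing matches.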

Dobrinsky answered 7/3, 2019 at 8:28 Comment(5)
As far as I know, selenium needs more dependencies on the host, so I can't run it on AWS Lambda, for example. - Scarecrow
To my surprise, you actually can run selenium and headless Chrome in AWS Lambda. After googling I found several tutorial articles [1, 2, 3]. Is this your target environment? - Dobrinsky
That makes sense, and it looks better that way. But I'm surprised that there is no solution with only sessions... - Scarecrow
@Scarecrow There IS a solution with only requests, but it requires you to learn all the endpoints and reproduce whatever request preparation the UI is doing (which probably isn't trivial, like adding CSRF tokens to the headers and whatnot). Using browser automation is WAY simpler. No one is going to code a CSRF workaround for you on StackOverflow, because that would require opening a browser and parsing their JS and request headers to figure out how to set the information the browser will JUST DO for you. - Radioscope
@julian, it's true, I agree... I thought there was already a solution for Microsoft stuff. - Scarecrow
Q
2

To answer your question "How to get in with Python": I am assuming you want to perform some web scraping on pages that are secured by Azure AD authentication.

In this kind of scenario, you have to do the following steps.

Step 1:

Log in

For this script we only need to import the following:

import requests
from lxml import html

First, we would like to create our session object. This object will allow us to persist the login session across all our requests.

session_requests = requests.session()

Second, we extract the CSRF token from the web page; this token is used during login. For this example we use lxml and XPath, but a regular expression or any other method that extracts this value would work too.

login_url = "https://bitbucket.org/account/signin/?next=/"
result = session_requests.get(login_url)

tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]

Next, we perform the login phase. The payload is a dictionary holding the user name, the password and the CSRF token extracted in the previous step:

payload = {
    "username": "<USER NAME>", 
    "password": "<PASSWORD>", 
    "csrfmiddlewaretoken": authenticity_token
}

We then send a POST request to the login URL with the payload as the form data. We also add a referer header to the request that points at the same URL.

result = session_requests.post(
    login_url, 
    data = payload, 
    headers = dict(referer=login_url)
)

Note: This is just an example (it targets Bitbucket's login form); the URLs and field names for your site will differ.

Step 2:

Scrape content

Now that we have logged in successfully, we can perform the actual scraping:

url = 'https://bitbucket.org/dashboard/overview'
result = session_requests.get(
    url, 
    headers = dict(referer = url)
)

In other words, you need to work out the request payload that Azure AD expects, create a session object and log in with it as shown above, and then finally do the scraping.
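If the login page carries several token-like hidden fields rather than a single CSRF input, one option is to collect every named input of the form and post them all back with the credentials added. The following is a minimal sketch using only the standard library, assuming the login page is a plain HTML form (a page that builds its form in JavaScript will not work this way). FormFieldCollector, build_login_request, the sample HTML and the field names UserName and Password are all hypothetical; read the real names from your own login page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldCollector(HTMLParser):
    """Record the action of the first <form> and every named <input> in it."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def build_login_request(page_url, page_html, credentials):
    """Return (post_url, payload) for the first form found in page_html."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    # A relative or empty action is resolved against the URL of the page.
    post_url = urljoin(page_url, collector.action or "")
    payload = dict(collector.fields)
    payload.update(credentials)  # credentials overwrite the empty inputs
    return post_url, payload


# Hypothetical page content, for illustration only.
sample_html = """
<form method="post" action="/adfs/ls/?client-request-id=abc">
  <input type="hidden" name="token_a" value="111">
  <input type="hidden" name="token_b" value="222">
  <input type="text" name="UserName">
  <input type="password" name="Password">
</form>
"""

post_url, payload = build_login_request(
    "https://federation-sts.example.com/adfs/ls/?blabla",
    sample_html,
    {"UserName": "<USER NAME>", "Password": "<PASSWORD>"},
)
```

With the requests session from the steps above, page_url would be result.url (the final URL after requests has followed the redirects) and page_html would be result.text; the returned pair can then be passed to session_requests.post(post_url, data=payload).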

Here is a very good example of Web scraping of a secured website.

Hope it helps.

Quandary answered 5/3, 2019 at 11:1 Comment(4)
Thanks, but... in the case of ADFS, there is a redirect to https://login.microsoftonline.com/e0793d39-0939-496d-b129-198edd916feb/wsfed?blabla, where e0793d39-0939-496d-b129-198edd916feb is generated on every request. Next it redirects again to https://federation-sts.example.com/adfs/ls/blabla... Is it possible to request the page I need and get the final page, where the login and password are asked for? Or do I need to know that URL? - Scarecrow
And, as I see, there are a lot of fields that look like tokens... but I can't say for sure which are needed. Is there a way to grab them all and pass them all on to the requested resource? - Scarecrow
You can always trigger the first sign-in page and get the final login window URL from it, so you would be able to get the URL even if it changes every time due to some change of settings. - Quandary
Usually for any Azure AD authentication you need to send some parameters in your request, namely client_id, redirect_uri, response_type and scope, and finally the URL, which looks like login.microsoftonline.com/common/oauth2/authorize. - Quandary
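The four parameters named in the last comment are normally sent as a query string on the authorize URL. Here is a rough sketch of assembling it with the standard library. The function name build_authorize_url, the defaults "code" and "openid", and the placeholder client ID and redirect URI are assumptions for illustration, and this route only applies if you have a registered application to take the client ID from.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the comment above.
AUTHORIZE_ENDPOINT = "https://login.microsoftonline.com/common/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, response_type="code", scope="openid"):
    """Assemble the authorize URL from the four parameters named above."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": response_type,
        "scope": scope,
    })
    return AUTHORIZE_ENDPOINT + "?" + query


url = build_authorize_url(
    client_id="<CLIENT ID>",                          # placeholder
    redirect_uri="https://www.example.com/callback",  # placeholder
)
```

urlencode percent-encodes the values, so the redirect URI can be passed in its normal readable form.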
