How to get the number of result pages from the GitHub API for a request?
I am not using cURL; I am only using jQuery, AJAX, and plain JS to fetch information from the GitHub API. I am using a URL like this to get information about issues: https://api.github.com/repos/jquery/jquery/issues

But the result comes in multiple pages, since the GitHub API paginates its responses. With cURL I could inspect the response headers, which include the number of result pages, but since I am requesting data directly from the API URL above with jQuery and AJAX, I am unable to see that header information. I want to count the open and closed issues and open and closed PRs for the jquery/jquery repository (and some other repositories as well), but since some repositories have a lot of issues, the results span multiple pages.

I know about the "page" and "per_page" GET parameters that can be passed through the URL to select a result page and to set the number of results per page (e.g. 100), like this: https://api.github.com/repos/jquery/jquery/issues?page=5&per_page=100
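(As an aside, a URL like that can be assembled with the standard URL API rather than string concatenation; a minimal sketch using the values above:)

```javascript
// Build the paginated issues URL with the standard URL API
// (repository path and parameter values taken from the question).
const url = new URL('https://api.github.com/repos/jquery/jquery/issues');
url.searchParams.set('page', '5');
url.searchParams.set('per_page', '100');

console.log(url.href);
// → https://api.github.com/repos/jquery/jquery/issues?page=5&per_page=100
```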

I don't want to check the number of result pages manually. I want my script to determine the number of result pages automatically, so that I can loop over all the pages and collect information about all the issues.

For example, if I know there are 8 result pages, I can loop like this to collect the issues from every page:

var number_of_result_pages = 8;
var issues_information = [];
for (var nof = 1; nof <= number_of_result_pages; nof++) {
    var URL = 'https://api.github.com/repos/jquery/jquery/issues?page=' + nof + '&per_page=100';
    $.getJSON(URL, function(json) {
        issues_information = issues_information.concat(json);
    });
}

Here "issues_information" collects the JSON data fetched from the GitHub API. But I am unable to get the count of result pages for a particular API call.

Can anybody tell me how to get the number of result pages from the GitHub API for a request? Please give example code, the URL format, etc.

Waitabit answered 8/4, 2015 at 14:57
From the docs:

Information about pagination is provided in the Link header of an API call. For example, let's make a curl request to the search API, to find out how many times Mozilla projects use the phrase addClass:

curl -I "https://api.github.com/search/code?q=addClass+user:mozilla"

The -I parameter indicates that we only care about the headers, not the actual content. Examining the result, you'll notice some information in the Link header that looks like this:

Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next",
      <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"

Let's break that down. rel="next" says that the next page is page=2. This makes sense, since by default, all paginated queries start at page 1. rel="last" provides some more information, stating that the last page of results is on page 34. Thus, we have 33 more pages of information about addClass that we can consume. Nice!
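That rel="last" entry is exactly what the question is after. As a minimal sketch (plain JavaScript, with the header value hard-coded for illustration), the last page number can be pulled out of a Link header like so:

```javascript
// Extract the page number from the rel="last" entry of a Link header.
// Returns 1 when the header has no rel="last" (a single page of results).
function lastPageFromLinkHeader(link) {
    const match = link.match(/<([^>]+)>;\s*rel="last"/);
    if (!match) return 1;
    return Number(new URL(match[1]).searchParams.get('page'));
}

// Hard-coded copy of the header shown above, for illustration only
const header =
    '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", ' +
    '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"';

console.log(lastPageFromLinkHeader(header)); // → 34
```

In the asker's jQuery setup, the same header value is available in the success callback via jqXHR.getResponseHeader('Link').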

So, to iterate over all the pages, just keep requesting pages until there is no "next" relation in the Link header.

Here is some Python code showing the logic:

import json
import requests

# GH_API_URL, org, username and password are assumed to be defined elsewhere
results = []
params = {'page': 1, 'per_page': 100}
another_page = True
api = GH_API_URL + 'orgs/' + org['login'] + '/teams'
while another_page:  # the list of teams is paginated
    r = requests.get(api, params=params, auth=(username, password))
    json_response = json.loads(r.text)
    results.append(json_response)
    if 'next' in r.links:  # check if there is another page of results
        api = r.links['next']['url']
    else:
        another_page = False
Conscientious answered 1/3, 2017 at 11:11
In case anyone else comes across this: the answer above has a bug and will loop infinitely. Because 'page' is hard-coded to 1 in params, and params is sent again on every request, it overrides the page number in the "next" URL, so every iteration fetches page 1. Dropping 'page' from params fixes it:

import json
import requests

# GH_API_URL, org, username and password are assumed to be defined elsewhere
results = []
params = {'per_page': 100}
another_page = True
api = GH_API_URL + 'orgs/' + org['login'] + '/teams'
while another_page:  # the list of teams is paginated
    r = requests.get(api, params=params, auth=(username, password))
    json_response = json.loads(r.text)
    results.append(json_response)
    if 'next' in r.links:  # check if there is another page of results
        api = r.links['next']['url']
    else:
        another_page = False
Denning answered 29/7, 2022 at 13:51
Here's a JS-based solution that issues a request, reads the Link header, and parses it into an object of the shape { self: string, first?: string, prev?: string, next?: string, last?: string }.

async function rels(href, currentPage = 1) {
    const url = new URL(href)
    url.searchParams.set('page', currentPage)

    // A HEAD request is enough: we only need the response headers
    const res = await fetch(url, { method: 'HEAD' })
    // The Link header is absent when there is only a single page
    const link = res.headers.get('Link') ?? ''

    return {
        self: url.href,
        // Each entry looks like <url>; rel="next"; capture the URL and rel name
        ...Object.fromEntries(
            [...link.matchAll(
                /<(?<href>[^>]+)>[^,]*\brel=["']?(?<rel>\w+)[^,]*/g,
            )]
                .map(({ groups: { href, rel } }) => [rel, href]),
        ),
    }
}

async function main() {
    const denoRels = await rels('https://api.github.com/repos/denoland/deno/issues')
    const tsRels = await rels('https://api.github.com/repos/microsoft/TypeScript/issues', 5)
    
    console.log({ denoRels, tsRels })
}

main()
Confrere answered 25/7, 2023 at 12:14