How can I retrieve all the child pages in Confluence by a parent page title?
Asked Answered
J

3

5

I know how to retrieve a page in Confluence by its title

res = requests.get(BASE_URL + "/confluence/rest/api/content", params={"title": "parent page title"} , auth=("username", "pass"))

https://developer.atlassian.com/confdev/confluence-rest-api/confluence-rest-api-examples

How can I retrieve all the child pages given a parent page title?

Jeffreyjeffreys answered 29/3, 2016 at 10:20 Comment(0)
B
8

As I see it, you can't search child pages by their parent's title. You need to search with the parent's Id.

Try the following to get the parent id:

/rest/api/content/search?cql=title=<parentTitle>

If you only have the parent's title, you need to send a second call first to get the id from the title

/rest/api/content/search?cql=parent=<parentId>

Id and children cannot be found with /confluence/rest/api/content, so this is not going to work:

res = requests.get(BASE_URL + "/confluence/rest/api/content", params={"parent": "<parentId>"} , auth=("username", "pass"))

res = requests.get(BASE_URL + "/confluence/rest/api/content", params={"title": "parents title"} , auth=("username", "pass"))
Bruno answered 29/3, 2016 at 11:58 Comment(8)
for some reason it still returns all the pages, the parent pages of the parent page too.Jeffreyjeffreys
You are right, my suggestion only works for "/search", which leads to /rest/api/content/search?cql=parent=<parentId>. please try this.Bruno
would you mind taking a look at this #36298269 ?Jeffreyjeffreys
I saw this question. I'm not sure what you want to achieve - only searching or manipulationg content? For searching just use the /rest/api/content/search?cql=parent=<parentId> url, which is powerful and documented.Bruno
are you absolutely sure that /rest/api/content/search?cql=parent=<parentId> returns only the child pages of the parent page? What I want to achieve is written in my question.Jeffreyjeffreys
just test it with a page and only 1 child page. you can paste the search string to your browser and see the results (you have to be logged in first).Bruno
the question was "are you absolutely sure..."?Jeffreyjeffreys
whaaayy, you're awesome!Jeffreyjeffreys
D
1

I wrote a complete solution that uses recursion and does it off space key. Even though the original question was asking about how to do it off of only the title, I thought I'd show how it would all work in a single, compact script.

This script will iterate through an entire space, given its space key, and then print out the title of every page and child page.

import json
import requests
import builtins


class list(list):
    def __init__(self, *args):
        super().__init__(args)

    def print(self):
        for i in self:
            print(f"{i}")

    def append_unique(self, item):
        if item not in self:
            self.append(item)


class Requests:
    def __init__(self, requests_username, requests_secret_file_name, requests_url_root):
        self.session = requests.Session()
        self.session.auth = (requests_username, self.load_password(requests_secret_file_name))
        self.url_root = requests_url_root

    @staticmethod
    def load_password(file_name):
        with open(f"{file_name}") as f: contents = f.read()
        return contents

    def get_top_level_space_content(self, space_key):
        url = f"{self.url_root}/rest/api/content?spaceKey={space_key}"
        response = self.session.get(url)
        return str(response.text)


class Parser:
    def __init__(self, parser_requests):
        self.page_names = list()
        self.page_ids = list()
        self.parser_requests = parser_requests

    def extract_list_of_page_ids(self, content):
        as_json = json.loads(content)
        content_list = dict(as_json).get('results')
        if content_list is None:
            return
        for c_l in content_list:
            if c_l.get('type') == 'page':
                self.page_titles.append_unique(c_l.get("title"))
                self.page_ids.append_unique(c_l.get("id"))
                self.extract_list_of_page_ids(
                    self.parser_requests.session.get(f'{requests.url_root}/rest/api/content/search?cql=parent='
                                                     f'{c_l.get("id")}').text)
        return


if __name__ == "__main__":
    # I wrote this with a .gitignore that ignores *.secret files
    # I recommend using an access token and not a password
    username, secret_file_name, url_root = "username", \
                                           "file_containing_password.secret", \
                                           "https://confluence-wiki-root.com"

    requests = Requests(username, secret_file_name, url_root)

    parser = Parser(requests)

    space_content = parser.parser_requests.get_top_level_space_content("SPACE-KEY")

    parser.extract_list_of_page_ids(space_content)

    parser.page_names.print()
  • NOTE 1: Works with on-prem Confluence 7.7.4
  • NOTE 2: I will be the first to admit, it's not the fastest thing I've ever written, but it was a first pass. Maybe I can optimize later.
Danley answered 24/11, 2021 at 0:29 Comment(0)
S
0

If you are looking for a 1-line code to answer your question

confluence.get_space_content(space_key, depth="all", start=0, limit=100, content_type='page', expand="body.storage")

Note that there is limit of 100 pages to cull per time, so you will have to construct a loop.

See ths link https://atlassian-python-api.readthedocs.io/confluence.html#:~:text=confluence.get_space_content(space_key%2C%20depth%3D%22all%22%2C%20start%3D0%2C%20limit%3D500%2C%20content_type%3DNone%2C%20expand%3D%22body.storage%22) for refeence

Scherzo answered 21/12, 2023 at 0:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.