How to export to PDF a confluence page within a script

Asked 4/8, 2016 at 15:00 Answered 24/9, 2019 at 12:41

I'd like to automatically export some Confluence pages to PDF.

A page can be downloaded with this URL:

http://<confluence server>/confluence/spaces/flyingpdf/pdfpageexport.action?pageId=<pageID>

When I type this URL into a browser, it works perfectly.

But when I try to download it with wget, an HTML page (asking for a login and password) is downloaded instead. I tried to provide the login and password with the --user and --password wget options, but it does not work.

Is there a way to pass Confluence credentials to the wget command? Or is there another way to download the PDF?

Shluh asked 4/8, 2016 at 15:00

If you are using a Confluence Server version earlier than 5.5, you are in luck: Confluence has an API to handle this (see their documentation).

Update: If you are using Confluence Server 5.5 or later, the API for this is not enabled by default. To enable the XML-RPC and SOAP APIs, go to Confluence Administration > Further Configuration. (Thanks @fatpanther for pointing this out.)

The newer REST API does not support PDF export; see the REST API documentation.

You may be able to use the Confluence Command Line Interface to export to PDF.

Ferriage answered 5/8, 2016 at 13:34

Actually, it seems that the XML-RPC and SOAP APIs are deprecated in versions after 5.5, but still available for use. You have to enable them first (Confluence Administration > Further Configuration). – Immoralist 20/1, 2017 at 15:49

Thanks @fatpanther, I have updated the answer to include the bit you provided about enabling the XML-RPC/SOAP APIs – Ferriage 26/1, 2017 at 22:48

First, request the resource (-D- dumps the response headers to standard output):

    curl -D- -u user:pwd -X GET -H "Content-Type: application/json" "https://your-url/confluence/spaces/flyingpdf/pdfpageexport.action?pageId=12345678"

Extract the value of the "Location" response header (e.g. with grep | cut), then repeat the request with the adjusted URL and MIME type:

    curl -D- -u user:pwd -X GET -H "Content-Type: text/html;charset=UTF-8" "https://your-url/$LOCATION_JUST_EXTRACTED" --output file.pdf
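For anyone scripting this from Java rather than the shell, the two curl calls above can be sketched with the JDK's built-in java.net.http.HttpClient (Java 11+). This is a minimal sketch, not code from either answer: the class name ConfluencePdfExport, the base URL, page ID and credentials are hypothetical placeholders, and it assumes the server accepts Basic authentication on this action and answers the first request with a 302 whose Location header points at the generated PDF.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Base64;

public class ConfluencePdfExport {

    // Value for the Authorization header: "Basic " + base64("user:password").
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // Resolves the (usually relative) Location value against the server base URL.
    static URI resolveLocation(String baseUrl, String location) {
        return URI.create(baseUrl).resolve(location);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical placeholders: replace with your server, page ID and credentials.
        String baseUrl = "https://your-url";
        String exportUrl = baseUrl
                + "/confluence/spaces/flyingpdf/pdfpageexport.action?pageId=12345678";
        String auth = basicAuth("user", "pwd");

        // Step 1 (first curl call): do NOT follow redirects, so the 302 stays visible.
        HttpClient noRedirects = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpRequest first = HttpRequest.newBuilder(URI.create(exportUrl))
                .header("Authorization", auth)
                .GET()
                .build();
        HttpResponse<Void> redirect =
                noRedirects.send(first, HttpResponse.BodyHandlers.discarding());
        String location = redirect.headers().firstValue("Location")
                .orElseThrow(() -> new IOException(
                        "No Location header, HTTP " + redirect.statusCode()));

        // Step 2 (second curl call): fetch the generated PDF and write it to file.pdf.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest second = HttpRequest.newBuilder(resolveLocation(baseUrl, location))
                .header("Authorization", auth)
                .GET()
                .build();
        HttpResponse<Path> pdf =
                client.send(second, HttpResponse.BodyHandlers.ofFile(Path.of("file.pdf")));
        System.out.println("HTTP " + pdf.statusCode() + ", saved to " + pdf.body());
    }
}
```

Resolving the Location against the base URL (rather than concatenating strings) avoids a doubled slash when the header value already starts with "/".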

Mudra answered 14/5, 2018 at 10:54 Comment(0)

Narcolessico's answer worked for me, but it took me some time to fully understand the approach, so I will add some details to it.

NOTE: I am using Java (Apache HttpClient) to perform the HTTP GET requests to the Confluence server.

I used Chrome to navigate to the Confluence page I wanted to export to PDF. I expanded the tools menu, right-clicked on 'Export to PDF', and then clicked on 'Inspect'. This reveals the underlying HTML element for this menu option, which contains the link used to launch the PDF export.

(Screenshot: inspecting the 'Export to PDF' menu item to find the URL.)

The element inspection revealed the relative link to the PDF export action as follows.

(Screenshot: HTML source of the menu item, showing the relative link.)

From Java, if you perform an HTTP GET to https://your-confluence-server-hostname/the-relative-link-from-step-2, you will need to disable redirect handling. This is where Narcolessico's answer confused me, because I was getting different responses from cURL and from Java. Once I realized that cURL was returning a 302 response while the Apache HttpClient was following that redirect automatically, I disabled the automatic redirect handling so that I could capture the Location header.

The code to disable the auto redirect handling is as follows.

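    // Imports used by this snippet (Apache HttpClient 4.5):
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpHeaders;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.HttpClientBuilder;

    // username, password and sslContext are assumed to be defined elsewhere.
    final HttpClient client = HttpClientBuilder
        .create()
        .setSSLContext(sslContext)
        .disableRedirectHandling() // disable the auto handling here
        .build();

    final String urlToGetLocation = "https://<your-confluence-server-hostname><the-relative-link-from-step-2>";

    final HttpGet request = new HttpGet(urlToGetLocation);
    // You'll need to provide Basic Auth credentials: "Basic " followed by the
    // base64-encoded username:password string, else the Location header
    // returned will be a redirect to the login page.
    final String authorizationHeaderValue = "Basic " + Base64.getEncoder()
        .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
    request.setHeader(HttpHeaders.AUTHORIZATION, authorizationHeaderValue);
    request.setHeader(HttpHeaders.CONTENT_TYPE, "application/json");

    final HttpResponse response = client.execute(request);

    final HttpEntity payload = response.getEntity();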

NOTE: I am also overriding the SSL context (the sslContext passed to the builder above) to do nothing. That is another issue you may need to deal with if Confluence is served over HTTPS.

On a side note, if you perform a cURL GET on the URL above, you get a response like the following.

(Screenshot: redacted cURL output.)

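The GET request above returns a 302 response that reveals the location of the PDF document, which you can then download. You can read it from the Location response header:
```
final Header[] headers = response.getHeaders(HttpHeaders.LOCATION); // org.apache.http.Header
if (headers.length == 0) {
    // No redirect: check the status code and your credentials.
    throw new IllegalStateException("No Location header, HTTP status "
        + response.getStatusLine().getStatusCode());
}
final String location = headers[0].getValue();
```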

The value is a URL of the following form:

/download/temp/pdfexport-20190924-240919-0526-189/a-filename-for-pdf.pdf?contentType=application/pdf

The Location header above contains the URL of the exported PDF. You can then make a subsequent HTTP GET to that URL to download the generated PDF document. If you're using the Apache HttpClient, leave automatic redirect handling enabled for this subsequent GET request.
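The code comment earlier in this answer warns that, without valid credentials, the Location header is a redirect to the login page rather than to the PDF. Before issuing the second GET, it can help to tell the two cases apart and turn the relative Location into an absolute URL. The sketch below is hypothetical: PdfLocationCheck and pdfUrl are illustrative names, not part of any Confluence or Apache API, and the "/download/temp/" test is a heuristic based only on the example Location shown above, so adjust it for your server.

```java
import java.net.URI;
import java.util.Optional;

public class PdfLocationCheck {

    // Returns the absolute PDF URL if the response looks like a redirect to a
    // generated PDF, otherwise empty. ASSUMPTION: generated PDFs live under
    // /download/temp/ (as in the example Location) and a login redirect does not.
    static Optional<URI> pdfUrl(String baseUrl, int status, String location) {
        boolean isRedirect = status >= 300 && status < 400;
        if (!isRedirect || location == null || !location.contains("/download/temp/")) {
            return Optional.empty();
        }
        return Optional.of(URI.create(baseUrl).resolve(location));
    }

    public static void main(String[] args) {
        String base = "https://your-confluence-server-hostname"; // hypothetical host
        String pdfLocation = "/download/temp/pdfexport-20190924-240919-0526-189/"
                + "a-filename-for-pdf.pdf?contentType=application/pdf";
        System.out.println(pdfUrl(base, 302, pdfLocation));
        // Hypothetical login redirect: no PDF URL is returned.
        System.out.println(pdfUrl(base, 302, "/login.action"));
    }
}
```

If pdfUrl returns empty, the first request most likely failed authentication, so check the Authorization header before retrying.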

All credit to Narcolessico for this answer. I simply wanted to add the details I had to sort out to get it to work from Java.

Algae answered 24/9, 2019 at 12:41

Do you have a working Python script that downloads a page as HTML or PDF? – Rheotaxis 28/6, 2022 at 17:31
