Generate PDF Behind Authentication Wall

I'm trying to use WKHTMLTOPDF to generate a PDF of a page that requires me to log in first. There's some material on this on the internet already, but I can't seem to get mine working. I'm in Terminal, nothing fancy.

I've tried (among a whole lot of other stuff):

/usr/bin/wkhtmltopdf --post username=myusername --post password=mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --username myusername --password mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --cookie-jar my.jar --post username=myusername --post password=mypassword "URL to Generate Cookie For"

username and password are both the id and the name of the input fields on the form. I am getting the my.jar file to show up, but nothing is written to it.

Specific questions:

  1. Should I be specifying the login page and/or form action anywhere?
  2. The --cookie-jar parameter has been mentioned in various places (both as being needed and otherwise). If it is necessary, how does it work? I've created the my.jar file, but how do I use it again? Referencing:

http://code.google.com/p/wkhtmltopdf/issues/detail?id=356


EDIT:

Surely someone has done this successfully? A good way to showcase an example might be for someone to get it working on some popular website that requires login credentials, to eliminate a potential variable.

Devisor answered 23/4, 2012 at 19:59 Comment(0)
D
9

I think the form I'm trying to log in to is too complex. It's secure, sets three cookies, redirects twice, and posts a number of other variables outside of the username and password, one of which requires a cookie value (I even tried concatenating the value into the post variable, but no luck). This is probably a pretty rare issue - by no means the fault of WKHTMLTOPDF.

I wound up using CURL to log in and write the page to a local file, then ran WKHTMLTOPDF against that file. Definitely a solid workaround for anyone else having a similar issue.


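Edit: the CURL code (PHP), if interested:

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 1); # 1 includes the response headers in the output (handy for debugging); use 0 for clean HTML
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); # Skips certificate verification; insecure outside of testing
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); # Follow the redirects after login
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields); # Site-specific post string
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); # Cookies are saved here
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); # ...and read back from here
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); # Return the response instead of printing it
$html = curl_exec($ch); # Response, to be written to a local file for WKHTMLTOPDF
curl_close($ch);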
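
For readers not using PHP, the same workaround can be sketched with the curl command-line tool instead of PHP's cURL functions. This is only an illustration under assumptions: the function name fetch_and_render, the CURL and WKHTMLTOPDF override variables, and the page.html file name are made up here, and the post string is whatever your own login form needs. It logs in, fetches the protected page with the saved cookies, and renders the local copy.

```shell
# Sketch only: fetch_and_render, page.html and the CURL/WKHTMLTOPDF
# override variables are hypothetical names chosen for this example.
fetch_and_render() {
  [ "$#" -eq 4 ] || { echo "usage: fetch_and_render LOGIN_URL POST_FIELDS PAGE_URL OUT_PDF" >&2; return 2; }
  _login_url=$1; _post_fields=$2; _page_url=$3; _out_pdf=$4
  _jar=cookie.txt
  _html=page.html

  # 1. Log in: send the POST body, follow redirects, save cookies to the jar.
  "${CURL:-curl}" --silent --location --cookie-jar "$_jar" --cookie "$_jar" \
    --data "$_post_fields" --output /dev/null "$_login_url" || return 1

  # 2. Fetch the protected page with the saved cookies into a local file.
  "${CURL:-curl}" --silent --location --cookie "$_jar" \
    --output "$_html" "$_page_url" || return 1

  # 3. Render the local copy to PDF.
  "${WKHTMLTOPDF:-wkhtmltopdf}" "$_html" "$_out_pdf"
}

# Usage (placeholder URLs and post string):
#   fetch_and_render "https://example.com/login" "username=me&password=secret" \
#     "https://example.com/report" report.pdf
```

One thing to watch with this approach: images and stylesheets referenced by relative URLs in the saved HTML will not resolve from a local file, so the PDF may look different from the live page.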
Devisor answered 2/5, 2012 at 19:15 Comment(2)
Would have been much more helpful to outline this cURL syntax which did the trick for you. - Alkalify
Hi Ifedi, not sure my specific implementation actually will be helpful for your use case (it's the post string that's specific to my needs, and implemented via PHP) but I added it, so hopefully it helps. - Devisor
L
11

Every site's login form is different. You'll need to work out exactly what to pass to the login form's target by reading the HTML on the page (which you're probably aware of). It may require an additional hidden field, on top of the username/password fields, that is there to prevent cross-site request forgery.

The cookie jar parameter names a file in which wkhtmltopdf stores the cookies it gets back from the web server. You need to specify it in the first request, to the login form, and again in subsequent requests so that they keep using the cookie/session information the web server gave you after logging in.

So to sum it up:

  1. Check whether the page requires any additional parameters.
  2. Make sure the URL you are submitting to is the same as the ACTION attribute of the form element on that page.
  3. Use the --cookie-jar parameter in both the login request and the second content request.
  4. The syntax for the --post parameter is --post username user_name_value --post password password_value
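
The steps above can be sketched as two small shell functions. This is a minimal sketch rather than a recipe tested against any particular site: the function names wk_login and wk_render, the WKHTMLTOPDF override variable, and the URLs and file names in the usage comment are all assumptions for illustration. Only the --cookie-jar flag and the --post name value form come from the answer itself.

```shell
# Sketch only: wk_login, wk_render and the WKHTMLTOPDF override variable
# are hypothetical names chosen for this example.

# wk_login JAR LOGIN_URL OUT_FILE NAME VALUE [NAME VALUE ...]
# Posts the given form fields to the login URL and stores cookies in JAR.
wk_login() {
  [ "$#" -ge 3 ] || { echo "usage: wk_login JAR URL OUT [NAME VALUE]..." >&2; return 2; }
  _jar=$1; _url=$2; _out=$3
  shift 3
  [ $(( $# % 2 )) -eq 0 ] || { echo "form fields must be NAME VALUE pairs" >&2; return 2; }

  # Rewrite each NAME VALUE pair as "--post NAME VALUE".
  _n=$#
  while [ "$_n" -ge 2 ]; do
    set -- "$@" --post "$1" "$2"
    shift 2
    _n=$((_n - 2))
  done

  "${WKHTMLTOPDF:-wkhtmltopdf}" --cookie-jar "$_jar" "$@" "$_url" "$_out"
}

# wk_render JAR CONTENT_URL OUT_FILE
# Renders a page while reusing the cookies saved by wk_login.
wk_render() {
  [ "$#" -eq 3 ] || { echo "usage: wk_render JAR URL OUT" >&2; return 2; }
  "${WKHTMLTOPDF:-wkhtmltopdf}" --cookie-jar "$1" "$2" "$3"
}

# Usage (placeholder URLs and credentials):
#   wk_login  my.jar "https://example.com/login" login.pdf \
#     username myusername password mypassword
#   wk_render my.jar "https://example.com/report" test.pdf
```

The PDF written by the login call is just the page returned after logging in; it is mainly useful for checking whether the login worked.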
Legwork answered 1/5, 2012 at 15:49 Comment(2)
Thanks, hsanders. Even though I wound up taking another route your answer looks solid. Thanks for taking the time to reply! - Devisor
@Devisor No problem. I've used wkhtmltopdf a couple of times before. I think for a more complicated case, like the one you described, it's a bit of a pain to use... I'm not sure how it would deal with the redirects you mentioned in your followup, never had to deal with that. - Legwork
P
4

You might be interested in trying to render to PDF with phantomjs.

phantomjs rasterize.js http://blah.com/ webgl.pdf

You can find rasterize.js here. Basically, you write some JavaScript to log in on the login page, and then you create the PDF.

However, the output is not the same as wkhtmltopdf. You could just save the HTML to a file, and then render with wkhtmltopdf if the phantomjs PDF output is too awful.

Portemonnaie answered 3/5, 2012 at 3:36 Comment(0)
F
0

I just got it working in the Terminal: logging in to a WordPress website and, once logged in, rendering a PDF from a page. You need to find ALL the input fields on the login page, including the hidden ones. You can find them in Firefox by right-clicking a field and choosing Inspect.

Assume our login page is https://www.mywebsite.com/login/. It had 2 visible input fields:

<input type="text" id="user_login" name="log" value="">
<input type="password" id="user_pass" name="pwd" value="">

Then look for the submit button:

<input type="submit" id="wp-submit" name="wp-submit" class="button-primary mepr-share-button" value="Log In">

Underneath there were 3 more HIDDEN fields

<input type="hidden" name="mepr_process_login_form" value="true">
<input type="hidden" name="mepr_is_login_page" value="true">
<input type="hidden" name="redirect_to" value="https://www.mywebsite.com/members/">

So now we can POST these values; there is no need for redirect_to:

wkhtmltoimage --cookie-jar my.jar --post log insertLoginHere --post pwd insertPasswordHere --post mepr_process_login_form true --post mepr_is_login_page true --disable-javascript https://www.mywebsite.com/login/ dummy.jpg

The dummy image shows me that I am logged in, and my session cookies are written to my.jar. So now I can happily render the page(s) to PDFs while logged in:

wkhtmltopdf --cookie-jar my.jar --disable-javascript --print-media-type https://www.mywebsite.com/mymembercontenturl/ members.pdf

Fantastic!
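
The manual step of hunting for input fields can also be approximated from the terminal. The sketch below is an assumption-laden helper, not part of the original workflow: list_inputs is a made-up function name, it relies on grep supporting -o (GNU and BSD grep do), and it only catches input tags that sit on a single line of the HTML.

```shell
# Sketch only: list_inputs is a hypothetical helper name.
# Reads HTML on stdin and prints every <input ...> tag on its own line,
# so hidden fields are easy to spot.
list_inputs() {
  grep -io '<input[^>]*>'
}

# Usage (same login URL as in the example above):
#   curl -s https://www.mywebsite.com/login/ | list_inputs
```

Fields added by JavaScript after the page loads will not show up this way, so the browser inspector remains the more reliable check.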

Fosdick answered 15/12, 2022 at 15:25 Comment(0)
