how to get the cookies from a php curl into a variable
Asked Answered
B

8

145

So some guy at some other company thought it would be awesome if instead of using soap or xml-rpc or rest or any other reasonable communication protocol he just embedded all of his response as cookies in the header.

I need to pull these cookies out as hopefully an array from this curl response. If I have to waste a bunch of my life writing a parser for this I will be very unhappy.

Does anyone know how this can simply be done, preferably without writing anything to a file?

I will be very grateful if anyone can help me out with this.

Brooking answered 21/5, 2009 at 23:31 Comment(0)
B
200
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// get headers too with this line
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
// get cookie
// multi-cookie variant contributed by @Combuster in comments
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches);
$cookies = array();
foreach($matches[1] as $item) {
    parse_str($item, $cookie);
    $cookies = array_merge($cookies, $cookie);
}
var_dump($cookies);
Bant answered 21/5, 2009 at 23:55 Comment(12)
Unfortunately I have a feeling that this is the right answer. I think its ridiculous that curl can't just hand me a mapped array though.Brooking
I'll give it to you but the preg_match was wrong. I didn't just want the session, tho I understand why you would think that. But the genius who made their system is loading the cookie with an entire response map like with a get or post. Shit like this: Set-Cookie: price=1 Set-Cookie: status=accept I needed a preg_match_all with '/^Set-Cookie: (.*?)=(.*?)$/sm'Brooking
No it is, I just haven't been on a computer in a bit :)Brooking
@Brooking Curl cant hand you a mapped array. But shows you a way to save it curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'callback_SaveHeaders');Tanatanach
Depending on the cookie structure returned, the last line might need to be modified to something like parse_str($m[1], $cookies), which will stuff the cookies into an associative array in the $cookies variable....Scholastic
This answer was close to what I needed but I ended up using: preg_match_all('/(?<=Set-Cookie: ).[a-zA-Z0-9_=]+/', $curl_header, $matches)Edaedacious
For the combined fixes that grabs more than one cookie: preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches); $cookies = array(); foreach($matches[1] as $item) { parse_str($item, $cookie); $cookies = array_merge($cookies, $cookie); }Pappy
cookie values are urlencoded. this code does not urldecode() the values. sure it works for a-zA-Z0-9, but you'll get the url encoded values of stuff like @ as "%40", space as "%20", etc :pHawse
If you do not have a semicolon at the end of your cookie, the variable will continue to the end of the headers. You can prevent that by using this: preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $header, $matches); Thanks to https://mcmap.net/q/161050/-regular-expression-http-header-for-cookies-while-not-going-over-an-end-of-lineDelarosa
As I've seen this only gets values for Set-Cookies. What happens to cookies that are already sent in the previous request?Mote
@TML: I am unable to get all the cookies using this script/code. I used 'medium.com' to get the details of the cookies, but there are only 2 records, however, if I check in the console, there all so many other cookies set by this website. How to get all the cookies? No luck so far.Showman
WARNINGs: This won't work if CURLOPT_HEADER is set to 0. Regex reads html content too.Misconstruction
G
50

Although this question is quite old, and the accepted response is valid, I find it a bit unconfortable because the content of the HTTP response (HTML, XML, JSON, binary or whatever) becomes mixed with the headers.

I've found a different alternative. CURL provides an option (CURLOPT_HEADERFUNCTION) to set a callback that will be called for each response header line. The function will receive the curl object and a string with the header line.

You can use a code like this (adapted from TML response):

$cookies = Array();
$ch = curl_init('http://www.google.com/');
// Ask for the callback.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, "curlResponseHeaderCallback");
$result = curl_exec($ch);
var_dump($cookies);

function curlResponseHeaderCallback($ch, $headerLine) {
    global $cookies;
    if (preg_match('/^Set-Cookie:\s*([^;]*)/mi', $headerLine, $cookie) == 1)
        $cookies[] = $cookie;
    return strlen($headerLine); // Needed by curl
}

This solution has the drawback of using a global variable, but I guess this is not an issue for short scripts. And you can always use static methods and attributes if curl is being wrapped into a class.

Green answered 2/8, 2014 at 19:50 Comment(4)
Instead of a global, you can use a closure holding a reference to $cookies. $curlResponseHeaderCallback = function ($ch, $headerLine) use (&$cookies) { then curl_setopt($ch, CURLOPT_HEADERFUNCTION, $curlResponseHeaderCallback);.Gismo
What happens if you have this all in a class? How do you reference the class function $class->curlResponseHeaderCallback()? Or do you just have curlResponseHeaderCallback outside of the class?Multitudinous
As said in another comment, better use ([^;\r\n]*) instead of ([^;]*) to avoid newline characters.Frugal
@RobertJohnstone The usual way to handle callbacks in classes works here: curl_setopt($this->ch, CURLOPT_HEADERFUNCTION, [$this, "curlResponseHeaderCallback"]); with $this being the current $class.Avila
F
13

This does it without regexps, but requires the PECL HTTP extension.

curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
curl_close($ch);

$headers = http_parse_headers($result);
$cookobjs = Array();
foreach($headers AS $k => $v){
    if (strtolower($k)=="set-cookie"){
        foreach($v AS $k2 => $v2){
            $cookobjs[] = http_parse_cookie($v2);
        }
    }
}

$cookies = Array();
foreach($cookobjs AS $row){
    $cookies[] = $row->cookies;
}

$tmp = Array();
// sort k=>v format
foreach($cookies AS $v){
    foreach ($v  AS $k1 => $v1){
        $tmp[$k1]=$v1;
    }
}

$cookies = $tmp;
print_r($cookies);
Fugleman answered 24/8, 2011 at 16:49 Comment(2)
Thanks for this. A Clear, semantic solution is worth the trouble of installing an extension.Vance
This would be the best solution, if only pecl install actually worked. Grrr.Junitajunius
A
11

If you use CURLOPT_COOKIE_FILE and CURLOPT_COOKIE_JAR curl will read/write the cookies from/to a file. You can, after curl is done with it, read and/or modify it however you want.

Any answered 21/5, 2009 at 23:36 Comment(1)
I think the goal is not to use this fileXenocryst
E
4

libcurl also provides CURLOPT_COOKIELIST which extracts all known cookies. All you need is to make sure the PHP/CURL binding can use it.

Engels answered 6/6, 2009 at 21:26 Comment(1)
It's not usable via PHP API.Rainy
H
2

The accepted answer seems like it will search through the entire response message. This could give you false matches for cookie headers if the word "Set-Cookie" is at the beginning of a line. While it should be fine in most cases. The safer way might be to read through the message from the beginning until the first empty line which indicates the end of the message headers. This is just an alternate solution that should look for the first blank line and then use preg_grep on those lines only to find "Set-Cookie".

    curl_setopt($ch, CURLOPT_HEADER, 1);
    //Return everything
    $res = curl_exec($ch);
    //Split into lines
    $lines = explode("\n", $res);
    $headers = array();
    $body = "";
    foreach($lines as $num => $line){
        $l = str_replace("\r", "", $line);
        //Empty line indicates the start of the message body and end of headers
        if(trim($l) == ""){
            $headers = array_slice($lines, 0, $num);
            $body = $lines[$num + 1];
            //Pull only cookies out of the headers
            $cookies = preg_grep('/^Set-Cookie:/', $headers);
            break;
        }
    }
Hightension answered 28/4, 2016 at 1:33 Comment(1)
The accepted answer seems like it will search through the entire response message. This could give you false matches for cookie headers if the word "Set-Cookie" is at the beginning of a line. While it should be fine in most cases. The safer way might be to read through the message from the beginning until the first empty line which indicates the end of the message headers. This is just an alternate solution that should look for the first blank line and then use preg_grep on those lines only to find "Set-Cookie".Hightension
H
1

someone here may find it useful. hhb_curl_exec2 works pretty much like curl_exec, but arg3 is an array which will be populated with the returned http headers (numeric index), and arg4 is an array which will be populated with the returned cookies ($cookies["expires"]=>"Fri, 06-May-2016 05:58:51 GMT"), and arg5 will be populated with... info about the raw request made by curl.

the downside is that it requires CURLOPT_RETURNTRANSFER to be on, else it error out, and that it will overwrite CURLOPT_STDERR and CURLOPT_VERBOSE, if you were already using them for something else.. (i might fix this later)

example of how to use it:

<?php
header("content-type: text/plain;charset=utf8");
$ch=curl_init();
$headers=array();
$cookies=array();
$debuginfo="";
$body="";
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$body=hhb_curl_exec2($ch,'https://www.youtube.com/',$headers,$cookies,$debuginfo);
var_dump('$cookies:',$cookies,'$headers:',$headers,'$debuginfo:',$debuginfo,'$body:',$body);

and the function itself..

function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "")
{
    $returnHeaders    = array();
    $returnCookies    = array();
    $verboseDebugInfo = "";
    if (!is_resource($ch) || get_resource_type($ch) !== 'curl') {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }
    $verbosefileh = tmpfile();
    $verbosefile  = stream_get_meta_data($verbosefileh);
    $verbosefile  = $verbosefile['uri'];
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $verbosefileh);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $html             = hhb_curl_exec($ch, $url);
    $verboseDebugInfo = file_get_contents($verbosefile);
    curl_setopt($ch, CURLOPT_STDERR, NULL);
    fclose($verbosefileh);
    unset($verbosefile, $verbosefileh);
    $headers       = array();
    $crlf          = "\x0d\x0a";
    $thepos        = strpos($html, $crlf . $crlf, 0);
    $headersString = substr($html, 0, $thepos);
    $headerArr     = explode($crlf, $headersString);
    $returnHeaders = $headerArr;
    unset($headersString, $headerArr);
    $htmlBody = substr($html, $thepos + 4); //should work on utf8/ascii headers... utf32? not so sure..
    unset($html);
    //I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly..
    //at least it's tested and seems to work perfectly...
    $grabCookieName = function($str)
    {
        $ret = "";
        $i   = 0;
        for ($i = 0; $i < strlen($str); ++$i) {
            if ($str[$i] === ' ') {
                continue;
            }
            if ($str[$i] === '=') {
                break;
            }
            $ret .= $str[$i];
        }
        return urldecode($ret);
    };
    foreach ($returnHeaders as $header) {
        //Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else
        /*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/
        Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;
        //Cookie names cannot contain any of the following '=,; \t\r\n\013\014'
        //
        */
        if (stripos($header, "Set-Cookie:") !== 0) {
            continue;
            /**/
        }
        $header = trim(substr($header, strlen("Set-Cookie:")));
        while (strlen($header) > 0) {
            $cookiename                 = $grabCookieName($header);
            $returnCookies[$cookiename] = '';
            $header                     = substr($header, strlen($cookiename) + 1); //also remove the = 
            if (strlen($header) < 1) {
                break;
            }
            ;
            $thepos = strpos($header, ';');
            if ($thepos === false) { //last cookie in this Set-Cookie.
                $returnCookies[$cookiename] = urldecode($header);
                break;
            }
            $returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos));
            $header                     = trim(substr($header, $thepos + 1)); //also remove the ;
        }
    }
    unset($header, $cookiename, $thepos);
    return $htmlBody;
}

function hhb_curl_exec($ch, $url)
{
    static $hhb_curl_domainCache = "";
    //$hhb_curl_domainCache=&$this->hhb_curl_domainCache;
    //$ch=&$this->curlh;
    if (!is_resource($ch) || get_resource_type($ch) !== 'curl') {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }

    $tmpvar = "";
    if (parse_url($url, PHP_URL_HOST) === null) {
        if (substr($url, 0, 1) !== '/') {
            $url = $hhb_curl_domainCache . '/' . $url;
        } else {
            $url = $hhb_curl_domainCache . $url;
        }
    }
    ;

    curl_setopt($ch, CURLOPT_URL, $url);
    $html = curl_exec($ch);
    if (curl_errno($ch)) {
        throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch));
        // echo 'Curl error: ' . curl_error($ch);
    }
    if ($html === '' && 203 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*203 is "success, but no output"..*/ ) {
        throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true));
    }
    ;
    //remember that curl (usually) auto-follows the "Location: " http redirects..
    $hhb_curl_domainCache = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_URL_HOST);
    return $html;
}
Hawse answered 5/9, 2015 at 17:53 Comment(0)
Y
-1

My understanding is that cookies from curl must be written out to a file (curl -c cookie_file). If you're running curl through PHP's exec or system functions (or anything in that family), you should be able to save the cookies to a file, then open the file and read them in.

Yodel answered 21/5, 2009 at 23:37 Comment(1)
He's almost certainly referring to php.net/curl :)Bant

© 2022 - 2024 — McMap. All rights reserved.