I've been running two socket clients side-by-side, collecting http streaming data (not Twitter, but similar kind of thing). The data comes through in chunked encoding.
One of the clients is curl (on the commandline, not php-curl), where both http and https work fine. The other is my own PHP script, using fsockopen
and fgets
. Works fine for https, but I have a specific problem with http. How specific? It only happens if the stream goes quiet for 60 seconds. If there is only 50 seconds of quiet it works fine. I've been comparing curl's http headers sent and received to my script, and have removed all differences. I thought I knew everything there was to know about PHP sockets, and especially chunked encoding, but it is time to eat humble pie as this one has me stumped.
So, running curl with "--trace - --trace-time", I see this come through for the first packet after the 60 second quiet period:
05:56:57.025023 <= Recv data, 136 bytes (0x88)
0000: 38 32 0d 0a 7b 22 64 61 74 61 66 65 65 64 22 3a 82..{"datafeed":
0010: 22 64 65 6d 6f 2e 31 64 36 2e 31 6d 2e 72 61 6e "demo.1d6.1m.ran
...
0080: 34 22 7d 5d 7d 0a 0d 0a 4"}]}...
The 82 is hex for size of the chunk. The \r\n marks the end of the chunk size line. The chunk starts at the "{".
Over on the PHP side, my loop starts like this:
while(true){
if(feof($fp)){fclose($fp);return "Remote server has closed\n";}
$chunk_info=trim(fgets($fp)); //First line is hex digits giving us the length
$len=hexdec($chunk_info); //$len includes the \r\n at the end of the chunk (despite what wikipedia says)
With https, or with a less than 60 second gap, this works fine, $len is 100 or whatever the chunk size is. But, after that 60 second gap, what I get in $chunk_info is:
datafeed":"demo.1d6.1m.ran...
So, I seem to have lost the first six bytes: 38 32 0d 0a 7b 22
All subsequent chunks are fine, and exactly the same as what curl is receiving.
Version Details
curl 7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15 Protocols: tftp ftp telnet dict ldap ldaps http file https ftps Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz
PHP 5.3.2-1ubuntu4.18 with Suhosin-Patch (cli) (built: Sep 12 2012 19:12:47)
Server: Apache/2.2.14 (Ubuntu)
(I've only tested with localhost connections so far.)
The rest of the loop looks like this:
$s='';
$len+=2; //For the \r\n at the end of the chunk
while(!feof($fp)){
$s.=fread($fp,$len-strlen($s));
if(strlen($s)>=$len)break; //TODO: Can never be >$len, only ==$len??
}
$s=substr($s,0,-2);
if(!$s)continue;
$d=json_decode($s);
//Do something with $d here
}
(Aside: in the way I've tested so far, the code has passed through this loop exactly once, before the 60 second quiet period.)
NOTE: I have numerous workarounds to get things working: e.g. enforce use of https, or use curl-php. This question is because I want to know what is going on, know what is changing after 60 seconds, and learn how to stop it happening. And maybe learn a new troubleshooting idea. Think of it as bloody-minded intellectual curiosity :-)