java.util.zip.ZipException: Not in GZIP format

I am far from an experienced Java user, and I am quite desperate about this problem. Every time I execute the code below, I receive the following error:

 java.util.zip.ZipException: Not in GZIP format
 at java.util.zip.GZIPInputStream.readHeader(Unknown Source)
 at java.util.zip.GZIPInputStream.<init>(Unknown Source)
 at java.util.zip.GZIPInputStream.<init>(Unknown Source)
 at DidYouMean.executeGet(DidYouMean.java:56)
 at DidYouMean.didYouMean(DidYouMean.java:11)
 at DidYouMean.main(DidYouMean.java:39)
 Exception in thread "main" java.lang.IllegalArgumentException: String input must not be null....

A friend of mine, who uses a Mac (I am on 64-bit Windows 7), is able to run the program. So the problem does not appear to be in the code itself (which was developed by someone on GitHub anyway). I would really appreciate any help! My search for a solution has not been very successful, even though the error is not that rare.

import java.io.*;
import java.net.*;
import org.jsoup.*;
import java.util.zip.*;
import org.jsoup.nodes.*;
import org.jsoup.examples.HtmlToPlainText;
public class DidYouMean {
    public static String didYouMean(String s){
        String word="";
        String url="http://www.google.co.in/search?hl=en&q="+URLEncoder.encode(s);
        String html=executeGet(url,"www.google.co.in",'i');
        Document content=Jsoup.parse(html);
        Element submitted=null;
        try{
            submitted=content.getElementById("topstuff").clone();
            HtmlToPlainText h=new HtmlToPlainText();
            word=h.getPlainText(submitted);
            int q,p=word.indexOf("Did you mean:");
            if(p>=0){
                word=word.substring(p+"Did you mean:".length());
                p=word.indexOf("<>");
                if(p>0) word=word.substring(0,p);
                word=word.trim();
            }
            else{
                p=word.indexOf("Showing results for");
                if(p>=0){
                    word=word.substring(p+"Showing results for".length());
                    p=word.indexOf("<>");
                    if(p>0) word=word.substring(0,p);
                    word=word.trim();
                }
                else return "No results";
            }
        }catch(Exception e){e.printStackTrace();}   
        return word;
    }
    public static void main(String args[]){
        System.out.println(didYouMean(args[0]));
    }
    public static String executeGet(String targetURL,String host,char ch){
        URL url;
        HttpURLConnection connection=null;  
        try{
          url=new URL(targetURL);
          connection=(HttpURLConnection)url.openConnection();
          connection.setRequestMethod("GET");
          connection.setRequestProperty("Host",host);
          connection.setRequestProperty("Accept-Encoding", "gzip,deflate,sdch");
          connection.setRequestProperty("Accept-Language","en-US,en;q=0.8");
          if(ch=='c') connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5");
          if(ch=='i') connection.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; ShopperReports 3.1.22.0; SRS_IT_E879047EB0765B5336AF90)");
          connection.setUseCaches (false);
          connection.setDoInput(true);
          connection.setDoOutput(true);
          GZIPInputStream gzis=new GZIPInputStream(connection.getInputStream());
          InputStreamReader reader=new InputStreamReader(gzis);
          BufferedReader in=new BufferedReader(reader);
          String line;
          StringBuffer response=new StringBuffer(); 
          while((line=in.readLine())!=null) {
              response.append(line);
              response.append('\r');
          }
          in.close();
          return response.toString();
        } catch (Exception e) {e.printStackTrace();return null;}
    }
}
Reiners answered 22/10, 2012 at 19:50 Comment(12)
Zip is a compression format. What do you want to zip or unzip? - Quash
@RomanC OP is trying to unzip connection.getInputStream() - Cybil
Check the response data yourself, e.g. save it to a file. Is it actually gzipped? Are you receiving a different response from your friend? See if it has the appropriate GZIP header. - Energid
It is not always that simple. I know it is Wikipedia, but see this for a little more info. - Fibrin
String url="http://www.google.co.in/search?hl=en&q="+.. Just a tip that Google generally frowns on programmatic use of their resource to deliver advertising. - Tia
@MiserableVariable: How do I download the file and then look at the headers? - Reiners
You need to look at the response headers before downloading the response message, I guess using getHeaderFieldKey and getHeaderField. - Cybil
@MiserableVariable Let OP post a binhex of it here. Base64 accepted. - Quash
Interestingly, I just tried the program in Windows safe mode, and it worked. So some application (virus scanner, firewall, ...) must be interfering. Is there any way to circumvent this, or to force a gzip response? - Reiners
I think the basic issue is that you don't know whether the server will encode with gzip, deflate, sdch, or not at all. I doubt it has anything to do with the client, beyond specifying the acceptable encodings. - Cybil
@MiserableVariable: I receive it completely unencoded. Do I have to adjust my code accordingly? How? Thank you so far, btw! - Reiners
In that case you don't need to create a GZIPInputStream at all; create InputStreamReader reader=new InputStreamReader(connection.getInputStream()). But note that you may get an encoded stream later. - Cybil
   connection.setRequestProperty("Accept-Encoding", "gzip,deflate,sdch");

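Your request says it is willing to accept any of the following encodings: gzip, deflate and sdch. Your code, however, always wraps the response in a GZIPInputStream, which fails with "Not in GZIP format" whenever the server answers with a different encoding or with no encoding at all. One approach is to look at the response headers to see which encoding the server actually used and decode accordingly.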
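A minimal sketch of the header-based approach, assuming the server reports its choice in the Content-Encoding response header. The class and method names (ContentDecoding, decode, readAll, gzip) are made up for illustration; sdch has no decoder in the JDK, so this sketch simply rejects it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of the asker's program.
public class ContentDecoding {

    // Wraps the raw stream according to the Content-Encoding header value.
    // The value may be null when the server sent the body unencoded.
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw;
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        if (enc.isEmpty() || enc.equals("identity")) {
            return raw;
        }
        if (enc.equals("gzip") || enc.equals("x-gzip")) {
            return new GZIPInputStream(raw);
        }
        if (enc.equals("deflate")) {
            // Assumes zlib-wrapped deflate; some servers send raw deflate instead.
            return new InflaterInputStream(raw);
        }
        throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
    }

    // Reads the whole stream as UTF-8 text (the charset is an assumption).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper: gzip-compresses a string so the demo needs no network.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream encoded = new ByteArrayInputStream(gzip("hello"));
        System.out.println(readAll(decode(encoded, "gzip")));
        InputStream plain = new ByteArrayInputStream("plain".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(decode(plain, null)));
    }
}
```

In executeGet, the stream would then be obtained with something like decode(connection.getInputStream(), connection.getContentEncoding()) instead of constructing a GZIPInputStream unconditionally.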

Another approach is to accept only gzip, so the server has no reason to choose deflate or sdch.
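A rough sketch of the gzip-only request follows. Since the asker reports receiving an unencoded body even after asking for compression, the sketch also peeks at the first two bytes for the GZIP magic number (0x1f 0x8b) before wrapping the stream. GzipOnly, open and unwrapIfGzip are hypothetical names, not part of the original program.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the asker's program.
public class GzipOnly {

    // Prepares a GET request that advertises gzip as the only accepted encoding.
    // Nothing is sent until the caller reads from the connection.
    public static HttpURLConnection open(String targetUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(targetUrl).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept-Encoding", "gzip");
        return connection;
    }

    // Wraps the stream in a GZIPInputStream only if it starts with the GZIP
    // magic bytes; otherwise returns it unchanged (buffered).
    public static InputStream unwrapIfGzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();
        boolean looksGzipped = first == 0x1f && second == 0x8b;
        return looksGzipped ? new GZIPInputStream(in) : in;
    }
}
```

With these helpers, the body of executeGet could read from unwrapIfGzip(connection.getInputStream()), which tolerates a proxy or virus scanner that strips the compression on the way.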

Cybil answered 22/10, 2012 at 20:0 Comment(0)