403 error while getting the google result using jsoup [duplicate]
A

6

8

I'm trying to get Google results using the following code:

Document doc = con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

But I get this exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403,URL=http://www.google.com/search?q=lakshman

A 403 error means the server is forbidding access, but I can load this URL in a web browser just fine. Why does Jsoup get a 403 error?

Apocynaceous answered 22/1, 2013 at 20:31 Comment(9)
It's probably the absence of a User-Agent header that triggers the 403. I think this is against Google's TOS in any case. - Maser
Oh, thanks for the warning. Is there a way to get the Google results programmatically, then? - Apocynaceous
I think they used to have a search API, but I'm not sure what its status is. - Maser
You can set the user agent using jsoup: #6582155 - Pincenez
#10121349 - Publicize
@Apocynaceous Can you demonstrate your solution? I have this problem too. - Quechuan
@Vito: The solution is to add the user agent property as mentioned by Liang, or to use the search API. - Apocynaceous
@Apocynaceous I added userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36") and also tried my own browser's version string. The 403 error still occurs. - Quechuan
@Quechuan userAgent("Mozilla") worked, but these two values did not: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" and "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36". - Deutschland
S
38

You just need to add a User-Agent header to the HTTP request, as follows:

Document doc = Jsoup.connect(itemUrl)
     .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
     .get();
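
For anyone who wants to see the user-agent setting in a complete program, here is a minimal sketch. The class name `UserAgentFetch`, the helper methods, and the constants are made up for illustration, and the short "Mozilla" string is just one value reported as working in the comments on this page; whether any particular string gets past the 403 seems to vary over time.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class; the name and constants are illustrative only.
class UserAgentFetch {
    // Placeholder value: comments on this page report that different strings
    // work at different times, so adjust as needed.
    static final String USER_AGENT = "Mozilla";
    static final int TIMEOUT_MS = 5000;

    // Builds the request without sending it, so the settings can be inspected.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .timeout(TIMEOUT_MS);
    }

    // Sends the GET request and parses the response body as HTML.
    static Document fetch(String url) throws IOException {
        return prepare(url).get();
    }

    public static void main(String[] args) {
        try {
            Document doc = fetch("http://www.google.com/search?q=lakshman");
            System.out.println(doc.title());
        } catch (IOException e) {
            // HttpStatusException (e.g. Status=403) is a subclass of IOException.
            System.out.println("Request failed: " + e.getMessage());
        }
    }
}
```

Splitting `prepare` from `fetch` is only a convenience: it lets you check which headers will be sent before any network traffic happens.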
Synthesize answered 18/3, 2014 at 2:44 Comment(3)
userAgent("Mozilla") worked, but these two values did not: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" and "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36". - Deutschland
Interestingly, in my case the user agent string in the answer didn't resolve the problem, but my browser's actual user agent string, Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/116.0, did. - For
Not working in 2023, please check #77227673 - Deutschland
C
6

Google doesn't allow robots, so you shouldn't rely on jsoup to fetch its search pages. You can use the Google Web Search API (deprecated) instead, but the number of requests you may make per day is limited.
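
As a rough illustration of the API route, the sketch below only builds a request URL and does not send it. It assumes Google's Custom Search JSON API endpoint, which is not the deprecated Web Search API named above, and it assumes you already have an API key and a search engine ID; the class name `SearchApiUrl` and its method are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it builds the URL only and performs no network call.
class SearchApiUrl {
    // Assumption: the Custom Search JSON API endpoint, not the deprecated
    // Web Search API mentioned in the answer.
    static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    // apiKey and engineId are credentials you would obtain from Google.
    static String build(String apiKey, String engineId, String query) {
        return ENDPOINT
                + "?key=" + encode(apiKey)
                + "&cx=" + encode(engineId)
                + "&q=" + encode(query);
    }

    // Form-encodes one parameter value (spaces become '+').
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The response from such an endpoint is JSON rather than HTML, so it would be read with a JSON parser instead of jsoup's HTML parser.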

Carter answered 16/12, 2013 at 22:1 Comment(0)
I
3

Actually, you can avoid the 403 error just by adding a user agent:

doc = Jsoup.connect(url).timeout(timeout)
                    .userAgent("Mozilla")
                    .get();

But I think that is against Google's policy.

EDIT: Google catches robots quicker than you think. You can, however, use this as a temporary solution.

Ics answered 10/2, 2014 at 22:10 Comment(1)
Not working in 2023, please check #77227673 - Deutschland
A
1

Replace the statement

Document doc = con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

with the statement

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").userAgent("Chrome").get();
Augury answered 29/4, 2014 at 20:0 Comment(0)
I
1

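Try this:

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").ignoreHttpErrors(true).timeout(5000).get();

in case userAgent did not work, as it didn't for me. Note that ignoreHttpErrors(true) only stops jsoup from throwing on the 403; the server still returns the same response, so the parsed document may be an error page.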
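
If you go this way, it may help to look at the status code yourself, since jsoup will no longer raise an exception for a 403. The following is a sketch under that assumption; the class name `StatusAwareFetch` and its methods are invented for illustration.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class; names are illustrative only.
class StatusAwareFetch {
    // Builds a request that will not throw HttpStatusException on 4xx/5xx.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .ignoreHttpErrors(true)
                .timeout(5000);
    }

    // Returns the parsed page for a 200 response, or null for any other status.
    static Document fetchIfOk(String url) throws IOException {
        Connection.Response response = prepare(url).execute();
        if (response.statusCode() != 200) {
            System.out.println("Server answered " + response.statusCode()
                    + " " + response.statusMessage());
            return null;
        }
        return response.parse();
    }
}
```

A caller would check the result for null before selecting elements, for example `Document doc = StatusAwareFetch.fetchIfOk(url); if (doc != null) { ... }`.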

Issuant answered 23/8, 2016 at 23:11 Comment(1)
Not working in 2023, please check #77227673 - Deutschland
L
1

In some cases you need to set a referrer. It helped in my case.

Here is the full source:

    try{

        String strText = 
                Jsoup
                .connect("http://www.whatismyreferer.com")
                .referrer("http://www.google.com")
                .get()
                .text();

        System.out.println(strText);

    }catch(IOException ioe){
        System.out.println("Exception: " + ioe);
    }
Loraine answered 29/1, 2018 at 21:13 Comment(0)
