java.io.FileNotFoundException for valid URL

Asked 8/5, 2010 at 12:16 Answered 8/5, 2010 at 13:29

I use the ROME library (rome.dev.java.net) to fetch RSS feeds.

The code is:

URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

You can check that http://planet.rubyonrails.ru/xml/rss is a valid URL and that the page loads in a browser.

But my application throws this exception:

java.io.FileNotFoundException: http://planet.rubyonrails.ru/xml/rss
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:237)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:213)
        at rssdaemonapp.ValidatorThread.run(ValidatorThread.java:32)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I don't use a proxy. I get this exception both on my PC and on the production server, and only for this URL; other URLs work.

Doth answered 8/5, 2010 at 12:16 Comment(0)

The code that is throwing that exception looks like this ... assuming I've got the right version:

if (respCode >= 400) {
    if (respCode == 404 || respCode == 410) {
        throw new FileNotFoundException(url.toString());
    } else {
        throw new java.io.IOException(
            "Server returned HTTP"
            + " response code: " + respCode
            + " for URL: " + url.toString());
    }
}

In other words, when you are doing the GET from Java, you are getting a 404 or 410 response. Now when I do the request using the wget utility, I get a 200 response. So my guess is that the problem is one of the following:

You happened to make the request when they were suffering from some configuration problem.
They have implemented their server to return 404 / 410 for certain User-Agent strings.

Other possibilities are that they are doing some kind of server-side filtering on IP addresses or that there is some DNS problem that is causing your requests to go to a different IP address. But both of these seem to be contradicted by the fact that you can access the feed in your browser.

If this is the User-Agent, take a look at their terms of service to see if they have banned certain kinds of use of their site / RSS feed.
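One way to test the User-Agent theory is to compare the status code the server returns for Java's default User-Agent with the one it returns for a browser-like value. The following is only a rough sketch: the class name UserAgentProbe, the helper statusFor, and the "Mozilla/5.0" string are made up for illustration, and it uses getResponseCode() so that a 404 or 410 comes back as a number rather than as a FileNotFoundException.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class UserAgentProbe {

    /**
     * Hypothetical helper: returns the HTTP status code for a GET on url.
     * If userAgent is null, the JVM's default User-Agent header is sent.
     */
    static int statusFor(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            // getResponseCode() reports 404/410 as a plain number, whereas
            // getInputStream() turns them into FileNotFoundException.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
        System.out.println("default User-Agent:      " + statusFor(feedUrl, null));
        // "Mozilla/5.0" is an arbitrary browser-like value chosen for illustration.
        System.out.println("browser-like User-Agent: " + statusFor(feedUrl, "Mozilla/5.0"));
    }
}
```

If the two numbers differ, the server is most likely filtering on the User-Agent header; if both are 404, the cause is probably something else.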

Hussar answered 8/5, 2010 at 13:17 Comment(1)

I tried to get the page using Apache HttpClient and it works! See my answer. – Doth 8/5, 2010 at 13:30

I suspect it doesn't like Java. You need to fake your "User-Agent" header, though I'm not sure whether that's doable with your RSS library.

Another suggestion is that you fetch the data yourself and feed the data to the feed reader.
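Both suggestions can be combined with plain java.net classes: open the connection yourself, set the header, and then hand the connection to the feed reader. This is a minimal sketch, not a tested fix for this particular site. The class FeedConnection, the helper openWithUserAgent, and the "MyFeedReader/1.0" value are hypothetical names; the ROME calls are left in comments so the snippet compiles without the library, and they assume the XmlReader(URLConnection) constructor.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class FeedConnection {

    /**
     * Hypothetical helper: prepares a connection that will send the given
     * User-Agent. openConnection() does not contact the server yet, so the
     * header can still be changed here.
     */
    static URLConnection openWithUserAgent(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
        // "MyFeedReader/1.0" is a placeholder; pick a value that honestly
        // identifies your application and that the site accepts.
        URLConnection conn = openWithUserAgent(feedUrl, "MyFeedReader/1.0");
        System.out.println("Will send User-Agent: " + conn.getRequestProperty("User-Agent"));

        // With ROME on the classpath, the prepared connection could then be
        // passed to the reader (assumption: XmlReader(URLConnection) is used):
        //   SyndFeedInput input = new SyndFeedInput();
        //   SyndFeed feed = input.build(new XmlReader(conn));
    }
}
```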

Knudsen answered 8/5, 2010 at 13:3 Comment(0)

I tried this code

HttpClient httpClient = new DefaultHttpClient();
HttpGet pageGet = new HttpGet(feedUrl.toURI());
HttpResponse response = httpClient.execute(pageGet);
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(response.getEntity().getContent()));

It works! Thanks for your suggestions. It looks like this is about the User-Agent.

Doth answered 8/5, 2010 at 13:29 Comment(0)
