UrlFetchApp.fetch() doesn't seem to change user agent

I'm trying to grab data from a website with Google Apps Script (GAS) and put it directly into a spreadsheet. The fetch doesn't seem to work, whereas the equivalent Python requests call works just fine.

Python code:

page = requests.get("someurl?as_data_structure", headers={'user-agent':'testagent'})

GAS code:

var page = UrlFetchApp.fetch("someurl?as_data_structure", headers={'user-agent':'testagent'});

The only required header is the user-agent, and the error I get from the GAS code is the same one I get from the Python code when I leave the header out. I'm new to JS, but as far as I know this is the proper way to do it?

EDIT: I've now put the headers in the right place, but the issue persists with exactly the same error as before.

var options = {"headers": {"User-Agent": "testagent"}};
var page = UrlFetchApp.fetch("someurl?as_data_structure", options);
Sinkhole asked 12/5, 2019 at 12:13 Comment(2)
Can you quote the error? - Aaronaaronic
Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>403 Forbidden</title> </head><body> <h1>Forbidden</h1> <p>You don't have permission to access / on this server.<br /> </p> (This is exactly what I get in Python if I don't use the header.) - Sinkhole

Star ★ (top left) the issue here so that Google's developers prioritize it.


Google doesn't always document its restrictions (annoying?). One such restriction is that you can't change the user agent. It's fixed to

"User-Agent": "Mozilla/5.0 (compatible; Google-Apps-Script)"

You can't change it.

Sample Test:

function testUrlFetchAppHeaders() {
  var options = {
    headers: {
      'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    },
  };
  // getRequest() only builds the request object; nothing is sent
  var fakeRequest = UrlFetchApp.getRequest(
    'https://www.httpbin.org/headers',
    options
  ); // providing fake assurance
  // fetch() really sends it; httpbin echoes back the headers it received
  var realResponse = UrlFetchApp.fetch(
    'https://www.httpbin.org/headers',
    options
  ); // like a wrecking ball
  Logger.log({
    fake: fakeRequest,
    real: JSON.parse(realResponse.getContentText()),
  });
}

Sample Response:

{
  "fake": {
    "headers": {
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    },
    "method": "get",
    "payload": "",
    "followRedirects": true,
    "validateHttpsCertificates": true,
    "useIntranet": false,
    "contentType": null,
    "url": "https://www.httpbin.org/headers"
  },
  "real": {
    "headers": {
      "Accept-Encoding": "gzip,deflate,br",
      "Host": "www.httpbin.org",
      "User-Agent": "Mozilla/5.0 (compatible; Google-Apps-Script)"
    }
  }
}

getRequest(url)

Returns the request that would be made if the operation was invoked.

This method does not actually issue the request.

Neither does it accurately return the request that would be made.

Aaronaaronic answered 12/5, 2019 at 14:15 Comment(8)
Ah, thanks. I wanted to update the spreadsheet with data from this website using a button on the spreadsheet itself. Since GAS has this limitation, is it still possible to achieve what I'm trying to do? - Sinkhole
@STUD It depends. Can the website provide the data without this header? - Aaronaaronic
Perhaps, but that would take some convincing. Would it be possible to have the website accept the GAS User-Agent alongside the existing one? If so, that would probably be accepted. - Sinkhole
@STUD If you have admin access to the website, you probably can. If not, and if this header is required, you'd need to host your own server: make requests to your server, and let your server make the same request as a new request with a new user agent. - Aaronaaronic
I don't think I'd go so far as to host my own server for it; it's a small project. I know the admin, though, so I might be able to convince him. Thanks for all the help :) - Sinkhole
By convention, the User-Agent header is supposed to be set by the device dispatching the request, not the end user making the request. Google seems to adhere strictly to that rule. Also keep in mind that these requests originate from Google's servers; I don't think they'll allow an end user to 'mask' details about a request's point of origin. That would give someone too much opportunity to do something nefarious with their platform. - Concerning
@Dimu That may be true for a browser, but this is a server-to-server interaction. Regardless, if it's not maskable, it should have been explicitly listed in the documentation as a forbidden header instead of sending developers on a wild goose chase, or the call should at least throw an error instead of silently masking it. - Aaronaaronic
@Aaronaaronic I see no reason why they should. HTTP is not restricted to the browser (you can make HTTP requests with curl, for example). The onus is on the developer to know the limitations of the protocols a platform interacts with. All the documentation is responsible for is describing the APIs that allow you to interface with said protocol. - Concerning
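
The relay idea from the comments can be sketched with Python's standard library. This is a minimal, hypothetical sketch, not a hardened server: the host allow-list (`example.com`), the `url` query parameter, port 8080, and the names `build_upstream_request`, `RelayHandler` and `run` are all assumptions for illustration, and `testagent` is the header value from the question. The relay has to be publicly reachable, because UrlFetchApp requests come from Google's servers rather than your machine.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import Request, urlopen

# Assumption: the User-Agent the target site expects (value taken from the question).
UPSTREAM_USER_AGENT = "testagent"
# Hypothetical allow-list so the relay can't be abused as an open proxy.
ALLOWED_HOSTS = {"example.com"}


def build_upstream_request(target_url: str, user_agent: str = UPSTREAM_USER_AGENT) -> Request:
    """Build the outgoing request with our own User-Agent instead of the caller's."""
    host = urlparse(target_url).hostname
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {host!r}")
    return Request(target_url, headers={"User-Agent": user_agent})


class RelayHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # The real target arrives URL-encoded in ?url=..., so its own query string survives.
        params = parse_qs(urlparse(self.path).query)
        target = params.get("url", [""])[0]
        try:
            upstream = build_upstream_request(target)
        except ValueError as exc:
            self.send_error(400, "Bad target URL", str(exc))
            return
        try:
            with urlopen(upstream, timeout=30) as resp:
                body = resp.read()
                content_type = resp.headers.get("Content-Type", "application/octet-stream")
        except Exception as exc:  # sketch only: report any upstream failure as 502
            self.send_error(502, "Upstream request failed", str(exc))
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8080) -> None:
    """Serve the relay on all interfaces until interrupted."""
    HTTPServer(("", port), RelayHandler).serve_forever()
```

On the Apps Script side the call would then look something like `UrlFetchApp.fetch(relayUrl + '?url=' + encodeURIComponent(targetUrl))`, where `relayUrl` and `targetUrl` are placeholders. Encoding the target with `encodeURIComponent` keeps characters such as `?` and `&` in the original URL from being read as the relay's own query parameters.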

The headers belong in the options object:

var options = {"headers": {"User-Agent": "testagent"}};
var page = UrlFetchApp.fetch("someurl?as_data_structure", options);
Cosmetician answered 12/5, 2019 at 12:19 Comment(2)
@Sinkhole Then you might need more headers (that's not part of the question). - Cosmetician
As I said, the website doesn't require any other headers. Does GAS specifically need more headers? - Sinkhole