How to stop the NodeJS "request" module from changing the request when using a proxy

Sorry if this comes off as confusing.

I have written a script using the NodeJS request module that performs a function on a website and then returns the data. The script works perfectly fine when I do not use a proxy, i.e. when I set proxy to false. Using Selenium/Puppeteer is not an option for this task.

proxy: false

However, when I set a (working) proxy, the script fails to perform the same task and is detected by the website's firewall/antibot software.

proxy: http://xx.xxx.xx.xx:3128

Some things to note:

  • I have tried many (20+) different proxy providers (residential and datacenter), and they all have this issue
  • The issue does not occur if the same proxy is set globally on my system
  • The issue does not occur if the same proxy is set in a Chrome extension
  • The SSL cipher suites do not match Chrome's, but they don't match when not using a proxy either, so I assume that isn't the issue
  • It is very important to keep the header order consistent
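On the cipher-suite point: Node does let you choose which cipher suites are offered, and in what order, through the ciphers option of https.Agent (the same option exists on tls.connect). Below is a minimal sketch; the cipher list is an illustrative assumption, not Chrome 72's real list, and as far as I know this cannot reproduce the rest of Chrome's ClientHello (extension order, GREASE values), so at best it narrows the gap.

```javascript
const https = require('https');
const tls = require('tls');

// Illustrative order only (an assumption, NOT Chrome's actual list).
// OpenSSL names; TLS 1.3 suites first, then TLS 1.2 ECDHE suites.
const CIPHER_ORDER = [
  'TLS_AES_128_GCM_SHA256',
  'TLS_AES_256_GCM_SHA384',
  'TLS_CHACHA20_POLY1305_SHA256',
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'ECDHE-RSA-AES256-GCM-SHA384',
];

// Keep only the suites the local OpenSSL build knows, preserving the given order.
// tls.getCiphers() returns lower-case names.
function supportedInOrder(names) {
  const available = new Set(tls.getCiphers());
  return names.filter((n) => available.has(n.toLowerCase()));
}

// Build an agent that offers the ciphers in exactly this order.
function makeAgent(cipherNames) {
  return new https.Agent({
    keepAlive: true,            // reuse connections, similar in spirit to `forever: true`
    ciphers: supportedInOrder(cipherNames).join(':'),
    minVersion: 'TLSv1.2',
  });
}

module.exports = { CIPHER_ORDER, supportedInOrder, makeAgent };
```

Without a proxy the agent can be used directly, e.g. https.get(url, { agent: makeAgent(CIPHER_ORDER) }). With a tunneling proxy, request builds its own tunnel agent, so I would not assume a custom agent carries over in that case.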

The question, basically, is: does the request module change anything when using a proxy, such as the header order?

Here is an image of what happens when it passes/fails: [image not available]

The only difference between the passing and failing runs is the proxy setting: one request is made with the proxy, the other without.

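const rp = require('request-promise');    // the options use request-promise's interface
const globalJar = rp.jar();                // cookie jar shared across requests
const url = 'https://www.sitename.com/';   // placeholder; the real URL is not given

const options = {
    url    : url,
    simple : false,
    forever: true,
    resolveWithFullResponse: true,
    gzip: true,   // request decodes gzip/deflate responses; 'br' is not decoded
    proxy: false, // the failing run sets 'http://xx.xxx.xx.xx:3128' here instead
    headers: {
        'Host'             : 'www.sitename.com',
        'Connection'       : 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent'       : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
        'Accept'           : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-encoding'  : 'gzip, deflate, br',
        'Accept-Language'  : 'en-GB,en-US;q=0.9,en;q=0.8',
    },
    method : 'GET',
    jar: globalJar,
    followRedirect: false,
    followAllRedirects: false,
};

rp(options).then((res) => console.log(res.statusCode));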
Balthazar answered 19/3, 2019 at 14:54 Comment(17)
You need to show how you are using the proxy. - Optimistic
@MarcosCasagrande proxy: http://xx.xxx.xx.xx:3128, the way it's documented in the request library. - Balthazar
I don't think the order of HTTP headers is important. If you want to check your headers, you can use httpbin.org/anything. - Precedency
Hi @Precedency, I understand that in normal circumstances header order isn't important. In this circumstance, however, the header order is important and will prevent execution. I will make a diagram to help further. - Balthazar
That's strange; the order of headers should not matter (see RFC 2616). What kind of server is this? - Precedency
@Precedency Changing the header order in the slightest causes a failure (even without a proxy); changing the header order back makes it work again. The whole point of this software is to stop me from gaining access: browsers use a certain header order on every request, so when the order isn't correct it knows it's not a browser and causes a failure. - Balthazar
Ah, that makes sense. Then maybe the proxy is adding headers that are detected by the firewall, X-Forwarded-For for example. You can use httpbin.org to see which headers the server receives. - Precedency
It doesn't. Also, the proxies work fine when I use them for my computer or as a Chrome extension (on the same website). - Balthazar
Can you provide two dumps of the sent headers? - Corrincorrina
Couldn't question #23585871 help you? - Jessamyn
Curious case. Can you confirm that the header order DOES change with the proxy? - Aho
@Aho Hi. The header order does not change with proxies. - Balthazar
I assume that when you use the system-wide or Chrome extension proxy, you are accessing the website in question from a browser, not from your script, because your script would ignore these settings anyway. Is that a correct assumption? - Rhoda
@SergeyNudnov Actually, no. If I use a system-wide proxy and run the script without a proxy (so it uses the system-wide one), it still works. - Balthazar
Why do you think it uses the system-wide proxy? - Rhoda
@ConorReid Node actually won't use your system's proxy, if I'm not mistaken. That has to be implemented in the application that requires it, so you're probably making a direct request instead. Also, which proxies are you using? Do you host them, or are you trying public/paid ones? - Atterbury
The issue isn't solved yet, is it? So some log files from the proxy would still be useful. - Shope

After deactivating my old account, I wanted to come back and give an actual answer to this question now that I fully understand it. What I was asking one year ago was not possible: the antibot was fingerprinting me through the TLS ClientHello (and, to a lesser degree, at the TCP/frame level).
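To illustrate why header order alone cannot explain this kind of detection: one widely used way to fingerprint a TLS client is JA3, which joins fields of the ClientHello in the order the client sent them and hashes the result. The sketch below computes a JA3-style string and MD5 from already-parsed ClientHello values. The sample numbers are made up, and it is an assumption that the antibot in question used JA3 rather than some other ClientHello fingerprint.

```javascript
const crypto = require('crypto');

// GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ... 0xfafa.
// JA3 ignores them because clients randomise them.
function isGrease(v) {
  return (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);
}

// All fields are numeric values from a parsed ClientHello, in wire order.
function ja3String({ version, ciphers, extensions, curves, pointFormats }) {
  const list = (xs) => xs.filter((v) => !isGrease(v)).join('-');
  return [version, list(ciphers), list(extensions), list(curves), list(pointFormats)].join(',');
}

// MD5 of the JA3 string, as a 32-character hex digest.
function ja3Hash(fields) {
  return crypto.createHash('md5').update(ja3String(fields)).digest('hex');
}

module.exports = { isGrease, ja3String, ja3Hash };
```

For example, a hello with version 771, ciphers [0x1a1a, 4865, 4866], extensions [0, 23, 65281], curves [29, 23] and point formats [0] yields the string 771,4865-4866,0-23-65281,29-23,0. Swapping just the two cipher suites produces a different hash, which is why the offered order matters as much as the offered set.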

To start, I wrote a wrapper called request-curl, which wrapped the libcurl/curl binaries into a single library with the same interface as request-promise. This gave me much more control over the request (preventing encoding, HTTP/2 and proxy support, and further session/TLS control), but it still only let me reach a mediocre rank: the 687th most popular ClientHello (https://client.tlsfingerprint.io:8443/). It wasn't good enough.

I had to switch languages. NodeJS is too high-level to allow really deep control (I had to modify the packets being sent at Layer 3). So, as the answer to my question:

This is not yet possible to do in NodeJS, let alone with the now-unmaintained request.js library.

For anyone reading this: if you want to forge perfect requests to bypass antibot security, you must move to a different language. I recommend uTLS in Go or BouncyCastle in C#. Godspeed to you, as it took me a year to really learn how to do this. Even then, these languages have more internal issues and features they do not yet support (Go doesn't support "basic" header ordering, so you need to monkey-patch/modify internals; uTLS doesn't easily support proxies). The list goes on and on.

If you're not already too deep into it: it's one hell of a rabbit hole, and I recommend you do not enter it.

Bruner answered 19/5, 2020 at 15:23 Comment(0)

According to the proxy documentation of the request module:

By default, when proxying http traffic, request will simply make a standard proxied http request. This is done by making the url section of the initial line of the request a fully qualified url to the endpoint.

Instead, you can use an HTTP tunnel by setting:

tunnel : true

in the request module proxy settings.
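In practice that means adding tunnel next to proxy in the request options. A minimal sketch; proxy and tunnel are documented request options, while buildProxiedOptions is a hypothetical helper name and the proxy URL is the placeholder from the question.

```javascript
// Hypothetical helper: copies the base options and forces an HTTP CONNECT
// tunnel through the proxy instead of a plain proxied request.
function buildProxiedOptions(baseOptions, proxyUrl) {
  return {
    ...baseOptions,   // url, headers, jar, etc. are kept unchanged
    proxy: proxyUrl,  // e.g. 'http://xx.xxx.xx.xx:3128'
    tunnel: true,     // always use CONNECT, even for http:// targets
  };
}

module.exports = { buildProxiedOptions };
```

Usage with the question's options would be rp(buildProxiedOptions(options, 'http://xx.xxx.xx.xx:3128')). The base object is not mutated, so the direct and proxied runs can share it.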

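It could be that, in your case, you are making a standard proxied HTTP request, whereas a proxy set globally on your system or in a Chrome extension creates an HTTP tunnel. (Note that request already tunnels by default when the destination URL is https, so this mostly matters for http URLs.)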
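The difference is visible in the first line the proxy receives. A plain proxied request puts the absolute URL in the request line (GET http://host/path HTTP/1.1), while a tunnel first sends CONNECT host:port HTTP/1.1 and then talks to the origin through it with the normal GET /path line. The sketch below uses only Node's built-in http and tls modules to open such a tunnel by hand and write the headers in exactly the order given, bypassing any library reordering. The function names are hypothetical, it is untested against the site in question, and it does not change the TLS ClientHello, which per the first answer was what the site actually fingerprinted.

```javascript
const http = require('http');
const tls = require('tls');

// Serialise a request from [name, value] pairs, keeping order and casing.
function serialiseRequest(method, path, headerPairs) {
  const lines = [`${method} ${path} HTTP/1.1`];
  for (const [name, value] of headerPairs) lines.push(`${name}: ${value}`);
  return lines.join('\r\n') + '\r\n\r\n';
}

// Open a CONNECT tunnel through an HTTP proxy, then speak TLS to the origin.
// Resolves with the raw (unparsed, possibly compressed) response bytes once
// the server closes the connection, so send 'Connection: close'; with
// keep-alive you would have to parse the response framing (not done here).
function requestThroughTunnel({ proxyHost, proxyPort, host, port = 443, path = '/', headerPairs }) {
  return new Promise((resolve, reject) => {
    const connectReq = http.request({
      host: proxyHost,
      port: proxyPort,
      method: 'CONNECT',
      path: `${host}:${port}`, // authority-form target
    });
    connectReq.on('connect', (res, socket) => {
      if (res.statusCode !== 200) {
        socket.destroy();
        reject(new Error(`proxy refused CONNECT: ${res.statusCode}`));
        return;
      }
      const tlsSocket = tls.connect({ socket, servername: host }, () => {
        tlsSocket.write(serialiseRequest('GET', path, headerPairs));
      });
      const chunks = [];
      tlsSocket.on('data', (c) => chunks.push(c));
      tlsSocket.on('end', () => resolve(Buffer.concat(chunks)));
      tlsSocket.on('error', reject);
    });
    connectReq.on('error', reject);
    connectReq.end();
  });
}

module.exports = { serialiseRequest, requestThroughTunnel };
```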

From the documentation:

Note that, when using a tunneling proxy, the proxy-authorization header and any headers from custom proxyHeaderExclusiveList are never sent to the endpoint server, but only to the proxy server.

Blairblaire answered 1/4, 2019 at 20:54 Comment(2)
Unfortunately, the same error happens. Setting a tunnel, creating an agent through a tunnel (all methods): all of them break :( - Balthazar
@ConorReid Can you access other URLs besides this one? I see you are also using 'Upgrade-Insecure-Requests'; is the website URL "http" or "https"? If it supports https, try setting tunnel: false. You can see the tunnel setting here: github.com/request/request#requestoptions-callback - Blairblaire

There are some scenarios I can think of:

  • The proxy is actually adding some headers to the final request (in order to identify you to the server)
  • The website you're trying to reach has your proxy IPs blacklisted (public/paid ones?)

It really depends on why you need to use that proxy:

  • Is it because of network restrictions?
  • Is it because you want to hide the original request address?

Also, if you have control over the proxy server, can you log the requests being made to the final server?
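A direct way to answer "does anything change?" is to point both the direct and the proxied request at a plain-HTTP server you control and log the headers exactly as received. Node exposes them, in wire order and with original casing, as req.rawHeaders. A minimal sketch; the function names and port are hypothetical. Note that for an https target the proxy only relays an encrypted CONNECT tunnel, so it cannot alter the headers in that case.

```javascript
const http = require('http');

// rawHeaders is a flat array: [name1, value1, name2, value2, ...] in wire order.
function formatRawHeaders(rawHeaders) {
  const lines = [];
  for (let i = 0; i < rawHeaders.length; i += 2) {
    lines.push(`${rawHeaders[i]}: ${rawHeaders[i + 1]}`);
  }
  return lines;
}

// Logs the request line and headers of every request and echoes them back.
function startHeaderEcho(port) {
  const server = http.createServer((req, res) => {
    const lines = [`${req.method} ${req.url} HTTP/${req.httpVersion}`, ...formatRawHeaders(req.rawHeaders)];
    console.log(lines.join('\n') + '\n');
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(lines.join('\n'));
  });
  server.listen(port);
  return server;
}

module.exports = { formatRawHeaders, startHeaderEcho };
```

Run startHeaderEcho(8080) on a reachable host, then send the same request options to http://your-host:8080/ once with proxy: false and once with the proxy, and diff the two logs.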

My suggestion

Try writing your own proxy (a reverse one) and host it somewhere. Instead of requesting https://target.com, make a request to your http[s]://proxy.com/ and let the reverse proxy do the work. Also, remember to disable X headers in the implementation, as they will change the request headers.

Reference for node.js implementation:

https://github.com/nodejitsu/node-http-proxy
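In node-http-proxy the X-Forwarded-* headers are only added when its xfwd option is enabled. If you would rather avoid the dependency, here is a sketch using only Node's built-ins that replays each request to the target while keeping the incoming header order (it rebuilds the header object from req.rawHeaders, and as far as I know Node writes headers in object insertion order). Function names are hypothetical. Keep in mind the TLS connection to the target is then made by Node itself, so the ClientHello the target sees is Node's.

```javascript
const http = require('http');
const https = require('https');

// Headers that would reveal the proxy hop.
const DROP = new Set(['proxy-connection', 'x-forwarded-for', 'x-forwarded-host', 'x-forwarded-proto', 'via']);

// Rebuild headers from rawHeaders, preserving order and casing, rewriting
// Host to the target and dropping proxy-identifying headers. Repeated
// headers (e.g. two Cookie lines) keep only the last value in this sketch.
function forwardHeaders(rawHeaders, targetHost) {
  const out = {};
  for (let i = 0; i < rawHeaders.length; i += 2) {
    const name = rawHeaders[i];
    const lower = name.toLowerCase();
    if (DROP.has(lower)) continue;
    out[name] = lower === 'host' ? targetHost : rawHeaders[i + 1];
  }
  return out;
}

// Minimal reverse proxy: every incoming request is replayed to https://targetHost.
function startReverseProxy(port, targetHost) {
  const server = http.createServer((req, res) => {
    const upstream = https.request({
      host: targetHost,
      path: req.url,
      method: req.method,
      headers: forwardHeaders(req.rawHeaders, targetHost),
    }, (upRes) => {
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res);
    });
    upstream.on('error', (err) => {
      if (!res.headersSent) res.writeHead(502);
      res.end(String(err));
    });
    req.pipe(upstream);
  });
  server.listen(port);
  return server;
}

module.exports = { forwardHeaders, startReverseProxy };
```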

Note: please answer the questions I asked in the comments.

Atterbury answered 6/4, 2019 at 3:11 Comment(1)
The proxy is not adding headers; I have checked on my own web server. I have also used 20+ free/paid proxies. A lot of people seem not to understand that this works when the proxy is set globally and as a Chrome extension. If headers were the issue, it wouldn't work then either. - Balthazar

You're using the http scheme for your request, but if the web server redirects http to https and the proxy server is not configured to accept redirects (to https), then the problem might only be about the scheme, i.e. the URL you enter.

So either the proxy has to be configured to accept redirects, or the URL has to be checked manually when a request fails and adjusted if there was a redirect.
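With followRedirect: false (as in the question's options), the manual check could look like this sketch. It uses the WHATWG URL class to resolve a relative Location header and reports whether the scheme changed (e.g. http to https); nextHop is a hypothetical helper name.

```javascript
// Returns null when the response is not a redirect; otherwise the absolute
// target URL and whether the redirect switches scheme (e.g. http -> https).
// `headers` uses lower-case names, as Node delivers response headers.
function nextHop(statusCode, headers, currentUrl) {
  const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
  if (!isRedirect || !headers.location) return null;
  const from = new URL(currentUrl);
  const to = new URL(headers.location, from); // resolves relative Location values
  return { url: to.href, schemeChanged: from.protocol !== to.protocol };
}

module.exports = { nextHop };
```

Calling nextHop(res.statusCode, res.headers, url) after each response tells you whether to retry against the new URL, and whether the failure might just be the http/https issue described above.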

Here you can read about redirects on one proxy server (Apache Traffic Server); the scenario there includes more redirects than I described above:
https://docs.trafficserver.apache.org/en/4.2.x/admin/reverse-proxy-http-redirects.en.html#handling-origin-server-redirect-responses

If you still encounter problems, the server logs of the proxy server would be helpful.

EDIT:
According to the page @Jannes Botis linked (the request module's README), there are still more settings that might support or disrupt the desired functionality, so the whole issue is perhaps about configuring the proxy setup correctly. Here are a few settings that are directly related to redirects:

  • followRedirect - follow HTTP 3xx responses as redirects (default: true). This property can also be implemented as function which gets response object as a single argument and should return true if redirects should continue or false otherwise.
  • followAllRedirects - follow non-GET HTTP 3xx responses as redirects (default: false)
  • followOriginalHttpMethod - by default we redirect to HTTP method GET. you can enable this property to redirect to the original HTTP method (default: false)
  • maxRedirects - the maximum number of redirects to follow (default: 10)
  • removeRefererHeader - removes the referer header when a redirect happens (default: false). Note: if true, referer header set in the initial request is preserved during redirect chain.
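Because followRedirect may be a function, the redirect chain can be logged (or restricted) from inside the request options. A sketch, assuming request's documented behaviour of calling the function with the response object; the same-host policy, the maxRedirects value and the helper name makeRedirectOptions are illustrative choices, not something the request docs prescribe.

```javascript
// Hypothetical helper: follow redirects only while they stay on allowedHost,
// recording each hop in `hops` so the chain can be inspected afterwards.
function makeRedirectOptions(allowedHost, hops) {
  return {
    followRedirect: (response) => {
      const location = response.headers.location;
      if (!location) return false;
      // Resolve relative Locations against the current request URL if available.
      const base = response.request && response.request.uri
        ? response.request.uri.href
        : `https://${allowedHost}/`;
      const target = new URL(location, base);
      hops.push({ status: response.statusCode, location: target.href });
      return target.hostname === allowedHost; // stop when leaving the site
    },
    maxRedirects: 5,
  };
}

module.exports = { makeRedirectOptions };
```

These options can be spread into the existing request options, e.g. { ...options, ...makeRedirectOptions('www.sitename.com', hops) }, and hops then shows exactly where an http-to-https or off-site redirect happened.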

It's quite possible that other settings of the proxy server also affect whether your scenario fails or succeeds.

Shope answered 6/4, 2019 at 4:46 Comment(0)
