Sorry if this comes off as confusing.
I have written a script using the NodeJS request module that runs and performs a function on a website then returns with the data. This script works perfectly fine when I do not use a proxy by setting it to false. This is not a task that is NOT allowed to be done with Selenium/puppeteer
proxy: false
However, when I set a (working) proxy. It fails to perform the same task and is detected by the website firewall/antibot software.
proxy: http://xx.xxx.xx.xx:3128
Some things to note:
- I have tried many (20+) different proxy providers (Residential and Datacenter) and they all have this issue
- The issue does not occur if that proxy is set globally on my system
- The issue does not occur if that proxy is set in a chrome extension
- The SSL cipher suites do not match Chrome but they still don't match when not using a proxy so I assume that isn't the issue
- It is very important to keep consistency in the header order
The question basically is. Does the request module change anything when using a proxy such as the header order?
Here is an image of what happens when it passes/fails.
The only difference is changing the proxy that causes this to fail. One request being made with, one request being made without.
url : url,
simple : false,
forever: true,
resolveWithFullResponse: true,
gzip: true,
headers: {
'Host' : 'www.sitename.com',
'Connection' : 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-encoding' : 'gzip, deflate, br',
'Accept-Language' : 'en-GB,en-US;q=0.9,en;q=0.8',
},
method : 'GET',
jar: globalJar,
simple: false,
followRedirect: false,
followAllRedirects: false,
proxy
- – Optimisticproxy: http://xx.xxx.xx.xx:3128
@MarcosCasagrande The way it's documented into the request library – BalthazarX-Forwarded-For
for example. You can use httpbin.org to see what headers the server receives. – Precedency