Node JS HTTP Proxy hanging up

I have an http-proxy that proxies any website and injects a custom JS file before serving the HTML back to the client. Whenever I try to access the proxied website, it hangs, or the browser seems to load indefinitely. But when I check the HTML source, I can see that my custom JavaScript file was injected successfully. Here is the code:

const cheerio = require('cheerio');
const http = require('http');
const httpProxy = require('http-proxy');
const { ungzip } = require('node-gzip');

function _initProxy(host) { // a TypeScript class method in the original: _initProxy(host: string)
    let proxy = httpProxy.createProxyServer({});
    let option = {
        target: host,
        selfHandleResponse: true
    };

    proxy.on('proxyRes', function (proxyRes, req, res) {
        let body = [];
        proxyRes.on('data', function (chunk) {
            body.push(chunk);
        });
        proxyRes.on('end', async function () {
            let buffer = Buffer.concat(body);
            if (proxyRes.headers['content-encoding'] === 'gzip') {
                try {
                    let $ = null;
                    const decompressed = await ungzip(buffer);
                    const scriptTag = '<script src="my-customjs.js"></script>';
                    $ = await cheerio.load(decompressed.toString());
                    await $('body').append(scriptTag);
                    res.end($.html());
                } catch (e) {
                    console.log(e);
                }
            }
        });
    });

    let server = http.createServer(function (req, res) {
        proxy.web(req, res, option, function (e) {
            console.log(e);
        });
    });

    console.log("listening on port 5051");
    server.listen(5051);
}

Can someone please tell me if I am doing anything wrong? It looks like node-http-proxy dies a lot and I can't rely on it: the proxy works sometimes and then dies on the next run, depending on how many times I have run the server.

Ski asked 17/3, 2020 at 13:07. Comments (14):
Strange that you're having issues; it looks good to me, although you don't handle the case where the response is not gzipped. - Fraise
I do; I was just trying to simplify the code for this post. - Ski
Not really an answer to your problem, but did you know that vanilla nginx can do this as well with a few lines of config, probably more efficiently than a Node proxy since it doesn't need to parse the HTML? gist.github.com/Tofandel/1ce9635335c6969b5bc0289140b925cd - Ypsilanti
Is your HTML up to spec, by the way? If it's not, maybe that's what's causing the crashes. - Ypsilanti
I would not use cheerio for a proxy; rather, just replace </body> with <script src="my-customjs.js"></script></body>. That would save quite a lot in performance. - Ypsilanti
I may just have had a light-bulb moment about your problem. Is your script being proxied as well? Check that you can access it. If not, it may be that the server is trying to proxy JS files too and, not knowing how to parse them, gives them an empty body and hangs the browser. - Ypsilanti
Try using plain curl to fetch the proxied site. If it gives you the complete HTML, then the problem might be with cross-origin requests. - Tullus
@Ypsilanti Why do you think the Cheerio library is not suitable for a proxy? - Ski
@Ypsilanti How would you just replace the body? Cheerio helps with doing that. I'm not sure why you think this could be a performance issue. - Ski
@Tullus Thanks for your suggestion, but I can see the error logs and I already handle CORS issues. This isn't a simple cross-origin request issue, otherwise I wouldn't have posted on Stack Overflow. - Ski
@Ypsilanti I was aware of nginx as a reverse proxy. Unfortunately, my requirements are limited to the Node.js stack. - Ski
@Ypsilanti Sorry, I'm not sure what you mean by "if my HTML is up to spec". - Ski
@Ypsilanti Good suggestion; however, my scripts are served statically outside of the proxy server, so there is no risk of them being unreadable. Also, my post mentions that the injection is working fine. - Ski
@NizarB. I meant that Cheerio parses all of your HTML, which is quite memory- and CPU-intensive for a proxy. To add a script you don't need to do that: you can just find the string "</body>" and replace it with your script + "</body>", like res.end(decompressed.toString().replace('</body>', '<script src="my-customjs.js"></script></body>')); See if that solves your issue. - Ypsilanti
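A minimal sketch of the string-replacement approach from the last comment, in the same JavaScript as the question. `injectScript` is a hypothetical helper name. It uses a case-insensitive regular expression to target the last `</body>` (a plain `.replace('</body>', ...)` only replaces the first, case-sensitive match) and falls back to appending the tag when there is no closing body tag. It assumes the body has already been decompressed and converted to a string.

```javascript
// Matches the *last* "</body>" in the document, case-insensitively:
// the negative lookahead rejects any "</body>" that is followed by another one.
const LAST_BODY_CLOSE = /<\/body>(?![\s\S]*<\/body>)/i;

// Hypothetical helper: insert scriptTag right before the closing body tag.
function injectScript(html, scriptTag) {
  if (LAST_BODY_CLOSE.test(html)) {
    // A replacer function avoids "$&"-style patterns inside scriptTag being expanded.
    return html.replace(LAST_BODY_CLOSE, (match) => scriptTag + match);
  }
  // No closing body tag (fragment or malformed page): append at the end instead.
  return html + scriptTag;
}

console.log(injectScript('<html><body><p>Hi</p></body></html>',
  '<script src="my-customjs.js"></script>'));
```

Unlike the Cheerio version, this should leave the rest of the markup untouched, since nothing is parsed and re-serialized.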

I ended up writing a small Python server using CherryPy and proxying the web app through mitmproxy. Everything is now working smoothly. Maybe I was doing something wrong with node-http-proxy, but I have also become skeptical about using it in a production environment.

Ski answered 27/3, 2020 at 14:37. Comments (1):
This cannot be considered an answer to the question you asked. - Filter

Your code looked fine so I was curious and tried it.

Although you do log a few errors, you don't handle several cases:

  • The server returns a response with no body (cheerio will generate an empty HTML body when this happens)
  • The server returns a response that is not gzipped (your code will silently discard the response)
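One way to cover both cases is to decide up front what the `end` handler should do with each upstream response, so that only HTML is rewritten and every other response is still delivered. A minimal sketch, assuming the upstream `Content-Type` header can be trusted; `classifyResponse` is a hypothetical helper, not part of http-proxy or Cheerio:

```javascript
// Hypothetical helper: decide how the proxy should treat an upstream response.
// Returns 'inject' for HTML bodies, 'empty' when there is nothing to rewrite,
// and 'passthrough' for everything else (CSS, JS, images, JSON, ...).
function classifyResponse(statusCode, headers, body) {
  // 204 and 304 responses never carry a body, and an empty buffer has nothing to parse.
  if (statusCode === 204 || statusCode === 304 || body.length === 0) {
    return 'empty';
  }
  // Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
  const contentType = String(headers['content-type'] || '').toLowerCase();
  if (contentType.startsWith('text/html')) {
    return 'inject';
  }
  return 'passthrough';
}

console.log(classifyResponse(200, { 'content-type': 'text/html; charset=utf-8' }, Buffer.from('<p>x</p>')));
```

Each branch of the `end` handler should then finish the response: write the injected HTML for `'inject'`, send the original `buffer` for `'passthrough'` (still compressed, so its `content-encoding` header has to be forwarded with it), and call `res.end()` for `'empty'`, so the browser is never left waiting.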

I made a few modifications to your code.

Change initial options

let proxy = httpProxy.createProxyServer({
    secure: false,
    changeOrigin: true
});
  • Don't verify TLS certificates (secure: false)
  • Send the correct Host header for the target (changeOrigin: true)

Remove the if statement and replace it with a ternary

const isCompressed = proxyRes.headers['content-encoding'] === 'gzip';
const decompressed = isCompressed ? await ungzip(buffer) : buffer;
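The ternary above only handles gzip. Depending on the browser's `Accept-Encoding` header, servers may also respond with `deflate` or `br` (Brotli). A sketch using Node's built-in `zlib` module instead of `node-gzip`, assuming Node 11.7 or later (for `brotliDecompressSync`) and a single encoding token in the header; `decodeBody` is a hypothetical helper name:

```javascript
const zlib = require('zlib');

// Decode a buffered upstream body according to its Content-Encoding header.
// Assumes a single encoding token; unknown or absent encodings are returned as-is.
function decodeBody(buffer, contentEncoding) {
  switch (String(contentEncoding || '').trim().toLowerCase()) {
    case 'gzip':
    case 'x-gzip':
      return zlib.gunzipSync(buffer);
    case 'deflate':
      // HTTP "deflate" is zlib-wrapped deflate data, which inflateSync expects.
      return zlib.inflateSync(buffer);
    case 'br':
      return zlib.brotliDecompressSync(buffer);
    default:
      return buffer;
  }
}

const page = zlib.gzipSync(Buffer.from('<html><body>hello</body></html>'));
console.log(decodeBody(page, 'gzip').toString());
```

The synchronous calls keep the sketch short but block the event loop while decompressing; for large pages the callback-based `zlib.gunzip`, `zlib.inflate` and `zlib.brotliDecompress` would be a better fit.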

You can also remove the two awaits on the cheerio calls: Cheerio is synchronous and doesn't return a promise, so awaiting its results does nothing useful.

Final code

Here's the final code, which works. You mentioned that "it looks like node-http-proxy is dying a lot [...] depending on how many times I ran the server." I experienced no such stability issues, so if that is happening, your problems may lie elsewhere (bad RAM?).

const cheerio = require('cheerio');
const http = require('http');
const httpProxy = require('http-proxy');
const { ungzip } = require('node-gzip');

const host = 'https://github.com';

let proxy = httpProxy.createProxyServer({
    secure: false,
    changeOrigin: true
});
let option = {
    target: host,
    selfHandleResponse: true
};

proxy.on('proxyRes', function (proxyRes, req, res) {

    console.log(`Proxy response with status code: ${proxyRes.statusCode} to url ${req.url}`);
    if (proxyRes.statusCode == 301) {
        throw new Error('You should probably do something here, I think there may be an httpProxy option to handle redirects');
    }
    let body = [];
    proxyRes.on('data', function (chunk) {
        body.push(chunk);
    });
    proxyRes.on('end', async function () {
        let buffer = Buffer.concat(body);
        try {
            let $ = null;
            const isCompressed = proxyRes.headers['content-encoding'] === 'gzip';
            const decompressed = isCompressed ? await ungzip(buffer) : buffer;
            const scriptTag = '<script src="my-customjs.js"></script>';
            $ = cheerio.load(decompressed.toString());
            $('body').append(scriptTag);
            res.end($.html());
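        } catch (e) {
            console.log(e);
            // End the response even on failure, otherwise the browser keeps waiting
            res.statusCode = 502;
            res.end();
        }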
    });
});

let server = http.createServer(function (req, res) {
    proxy.web(req, res, option, function (e) {
        console.log(e);
    });
});

console.log("listening on port 5051");
server.listen(5051);
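One thing the final code does not do is forward the upstream status and headers. With `selfHandleResponse: true`, http-proxy leaves the response entirely to you (its webOutgoing passes, which also include the `autoRewrite`/`hostRewrite` redirect handling, should be skipped), so the browser receives a default 200 with no `Content-Type`. A sketch of preparing headers for a rewritten body, assuming the proxy is reached at `http://localhost:5051` with no trailing slash; `buildClientHeaders` is a hypothetical helper:

```javascript
// Hypothetical helper: build the headers to send to the browser after the
// body has been decoded and rewritten. Assumes proxyOrigin has no trailing slash.
function buildClientHeaders(upstreamHeaders, body, targetOrigin, proxyOrigin) {
  const headers = { ...upstreamHeaders };
  // The body is no longer compressed, and it is sent in one piece, not chunked.
  delete headers['content-encoding'];
  delete headers['transfer-encoding'];
  // The injected script changed the size, so recompute it in bytes, not characters.
  headers['content-length'] = String(Buffer.byteLength(body));

  // Point same-origin redirects back at the proxy instead of the target.
  if (headers.location) {
    const location = new URL(headers.location, targetOrigin);
    if (location.origin === new URL(targetOrigin).origin) {
      headers.location = proxyOrigin + location.pathname + location.search + location.hash;
    }
  }
  return headers;
}

console.log(buildClientHeaders(
  { 'content-type': 'text/html', 'content-encoding': 'gzip', location: '/login' },
  '<html></html>', 'https://github.com', 'http://localhost:5051'));
```

In the `end` handler this would replace `res.end($.html())` with something like `const html = $.html(); res.writeHead(proxyRes.statusCode, buildClientHeaders(proxyRes.headers, html, host, 'http://localhost:5051')); res.end(html);`. Same-origin redirects would then point back at the proxy, which could replace the 301 `throw` placeholder.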
Fraise answered 21/3, 2020 at 7:04. Comments (6):
Thanks for your answer. I have it working as well with the original version of my code, and I appreciate the improvements in your version. However, I did mention in my post that injecting the custom JS file into the proxied website works fine. The main issue I am trying to solve is that it sometimes causes indefinite loading. I may start monitoring a bit more, but I really doubt it is a RAM issue. I will try to post a StackBlitz that replicates my issue. - Ski
Another note: this works well over HTTP, but setting only the "secure" flag to false won't load HTTPS websites properly. - Ski
@NizarB. It was unclear whether it was working or not, because you described several outcomes ("hang up", "load indefinitely", "successful inject"), and I wondered whether the latter might be inside an empty body, which is something I saw several times. I figured there was more code; I appreciate that the code you posted was greatly simplified. - Fraise
@NizarB. With regard to HTTPS, this was working perfectly for me against https://github.com with the exact code posted above (on Node 12.13.0, http-proxy 1.18.0). Can you confirm that the code I posted works the way I described? If it does, the issue may lie elsewhere in your code. If it does not, then something funky is going on. - Fraise
Something odd is definitely going on, and I am trying to hunt it down. Did you try a different HTTPS website? To me, it looks broken and still loads forever. I'll keep you updated. - Ski
@NizarB. No, but this part of the code is working fine, so I don't expect it to make a difference. If you have a specific site you'd like me to try, I can. - Fraise