How to replace hyperlinks in StreamingResponse?
Asked Answered
E

1

2

Is that possible to replace hyperlinks in StreamingResponse?

I'm using below code to stream HTML content.

from starlette.requests import Request
from starlette.responses import StreamingResponse
from starlette.background import BackgroundTask

import httpx

client = httpx.AsyncClient(base_url="http://containername:7800/")


async def _reverse_proxy(request: Request):
    url = httpx.URL(path=request.url.path, query=request.url.query.encode("utf-8"))
    rp_req = client.build_request(
        request.method, url, headers=request.headers.raw, content=await request.body()
    )
    rp_resp = await client.send(rp_req, stream=True)
    return StreamingResponse(
        rp_resp.aiter_raw(),
        status_code=rp_resp.status_code,
        headers=rp_resp.headers,
        background=BackgroundTask(rp_resp.aclose),
    )


app.add_route("/titles/{path:path}", _reverse_proxy, ["GET", "POST"])

It's working fine, but I would like to replace a href links.
Is that possible?

I've tired to wrap the generator like below:

async def adjust_response(iterable):
    # Adjust hyperlinks in response.
    async for element in iterable.aiter_raw():
        yield element.decode("utf-8").replace("/admin", "/gateway/admins/SERVICE_A").encode("utf-8")

but this caused that error:

h11._util.LocalProtocolError: Too much data for declared Content-Length
Elmaelmajian answered 18/9, 2022 at 18:29 Comment(13)
Looking at the docs for StreamingResponse, it appears that the first parameter to its constructor is a generator that returns chunks of the response. If you wanted to filter the response, you could just define your own generator that wraps rp_resp.aiter_raw(). You'd pass this value to your generator's constructor when you create it. Then you'd supply your generator when creating the StreamingResponse object instead of passing rp_resp.aiter_raw(). The code in your generator would read from the original generator, would do the replacement magic, and would then yield the modified content.Intellectuality
...what would be nice is if you could find a library that would do the link replacement for you. I don't know of one off hand.Intellectuality
That was my idea originally, but I look for a cleaner solution:)Elmaelmajian
Fee free to laugh this off...I teach programming, and I like to encourage chats about various qualities of a codebase. Performance and efficiency are the two most obvious ones. Readability is my big #3. But then there are words like "elegant" and "cleaner". I have a particular fascination for what my students are thinking about when they throw around these terms. What is "cleaner" in your mind? I assume that what you actually mean is more like "coded already". Assuming that you'd have to do the same amount of coding as what I described, what would doing so more "cleanly" look like to you?Intellectuality
...or did you REALLY mean "coded already", in which case there's not much to talk about here.Intellectuality
The way I see it, if I have to write the code, it's pretty darn clean to change one line of existing code to say "here, use my filtering generator to get the content instead". If the name of the generator is chosen well ("HtmlLinkReplacementFilter"?), the code is even pretty well self documenting. If I'm just bothering you, I apologize. I was going to go on a hike, but it's raining, so I'm a bit bored.Intellectuality
Is the reason for replacing the href links to add the /titles/ prefix, as a follow up to the previous question?Conal
Also be aware that if you decide to wrap the incoming generator, you'll have to keep a bit of state, since you could get the response broken into smaller bits across the content you're meaning to replace or detect (i.e. you could get h, t, t, p, s:// as separate items). It's not as straight forward as just reading each item from the original generator and performing a replacement on that content.Hindi
@Conal Indeed, it's related to previous question. I can't modify hyperlinks on other services.Elmaelmajian
@Intellectuality thank you for sharing an opinion :) I just want to collect some knowledge and other people experience. Last time I got a hand from Chris, that completely changed my way of thinking about a problem. I do not look for already coded solution, i just couldn't find a way to read from chunk without decoding and encoding the bytes. Unfortunately, it takes much more time (1300ms) more. That is why I look for another approach to this problem.Elmaelmajian
@Hindi What i tried, is to decode and encode a chunk and i didn't run into a problem, which you mentioned. Am i missing some case or contex of this case?Elmaelmajian
@Conal It's for replacing suffixes in HTML content received as response. HTML has prefix /admin and I would like to replace it with /admins/SERVICE_A/. This hyperlink /admins/SERVICE_A/ can be handled by gateway accordingly.Elmaelmajian
@Conal No, i do not have such an access. No changes on those services are allowed.Elmaelmajian
C
2

One solution would clearly be to read from the original response generator (as mentioned in the comments section above), modify each href link, and then yield the modified content.

Another solution would be to use JavaScript to find all links in the HTML document and modify them accordingly. If you had access to the external service's HTML files, you could just add a script to modify all the href links, only if the Window.location is not pointing to the service's host (e.g., if (window.location.host != "containername:7800" ) {...}). Even though you don't have access to the external HTML files, you could still do that on server side. You can create a StaticFiles instance to serve a replace.js script file, and simply inject that script using a <script> tag in the <head> section of the HTML page (Note: if no <head> tag is provided, then find the <html> tag and create the <head></head> with the <script> in it). You can have the script run when the whole page has loaded, using window.onload event, or, preferably, when the initial HTML document has been completely loaded and parsed (without waiting for stylesheets, images, etc., to finish loading) using DOMContentLoaded event. Using this approach, you don't have to go through each chunk to modify each href link on server side, but rather inject the script and then have the replacement taking place on client side.

On a side note, if the incoming request has a rather large body that couldn't fit into RAM (for instance, if large files are included in the request) and would cause your application to slow down or even crash, then instead of reading the entire body into RAM using await request.body(), read it in chunks using Starlette's stream() method (see this answer and this answer), which returns an async bytes generator (see httpx's Streaming requests documentation as well); hence, you could use: client.build_request(..., content=request.stream()).

Working Example:

# ...
from fastapi.staticfiles import StaticFiles

app = FastAPI()
app.mount("/static-js", StaticFiles(directory="static-js"), name="static-js")

client = httpx.AsyncClient(base_url="http://containername:7800/")

async def iter_content(r):
    found = False
    async for chunk in r.aiter_raw():
        if not found:
            idx = chunk.find(bytes('<head>', 'utf-8'))
            if idx != -1:
                found = True
                b_arr = bytearray(chunk)
                b_arr[idx+6:] = bytes('<script src="/static-js/replace.js"></script>', 'utf-8') + b_arr[idx+6:]
                chunk = bytes(b_arr)
        yield chunk

async def _reverse_proxy(request: Request):
    url = httpx.URL(path=request.url.path, query=request.url.query.encode("utf-8"))
    rp_req = client.build_request(
        request.method, url, headers=request.headers.raw, content=await request.body()
    )
    rp_resp = await client.send(rp_req, stream=True)
    return StreamingResponse(
        iter_content(rp_resp),
        status_code=rp_resp.status_code,
        headers=rp_resp.headers,
        background=BackgroundTask(rp_resp.aclose),
    )

app.add_route("/titles/{path:path}", _reverse_proxy, ["GET", "POST"])

The JS script (replace.js):

document.addEventListener('DOMContentLoaded', (event) => {
   var anchors = document.getElementsByTagName("a");

   for (var i = 0; i < anchors.length; i++) {
      let path = anchors[i].pathname.replace('/admin', '/admins/SERVICE_A');
      anchors[i].href = path + anchors[i].search + anchors[i].hash;
   }
});
Conal answered 19/9, 2022 at 7:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.