I have a FastAPI application that uses Server-Sent Events (SSE) to stream the responses of a generative AI model, similar to OpenAI's API. The application is deployed on the following architecture:
- FastAPI application served by Gunicorn with Uvicorn workers
- An EKS cluster that runs the Dockerized FastAPI application
- An ALB managed by the ingress controller installed on the EKS cluster
- API Gateway, which adds an authentication layer in front of all services hosted on the EKS cluster
When I run the FastAPI application with the SSE endpoint locally, everything works as expected. However, when the application is deployed on the stack above, the SSE response is not streamed: the client receives all the chunks at once, only after the stream has completed.
After investigating, I found that the problem appears only once the API Gateway layer, which I need for authentication, is added. Through API Gateway the response is no longer streamed, and a Content-Length header is added to it. This suggests that API Gateway buffers the entire response, waiting for it to complete before setting the header and forwarding it to the client.
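To see exactly what reaches the client, and when, a small standard-library script can timestamp each `data:` line and expose the response headers. This is a diagnostic sketch: `sse_arrival_times` is a hypothetical helper, and the URL and any auth header you pass to it are placeholders for your own endpoints.

```python
import http.client
import time
from typing import Optional
from urllib.parse import urlsplit


def sse_arrival_times(
    url: str,
    headers: Optional[dict[str, str]] = None,
    timeout: float = 120.0,
) -> tuple[http.client.HTTPMessage, list[tuple[float, str]]]:
    """GET an SSE URL and record when each 'data:' line reaches the client.

    Returns the response headers and a list of
    (seconds since the request was sent, event data) pairs.
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    request_headers = {"Accept": "text/event-stream", **(headers or {})}

    events: list[tuple[float, str]] = []
    start = time.monotonic()
    try:
        conn.request("GET", path, headers=request_headers)
        resp = conn.getresponse()
        # readline() returns as soon as one full line is available, so each
        # timestamp reflects when that event arrived, not when the body ended.
        while line := resp.readline():
            text = line.decode("utf-8").rstrip("\r\n")
            if text.startswith("data:"):
                value = text[len("data:"):]
                if value.startswith(" "):  # SSE strips one optional leading space
                    value = value[1:]
                events.append((time.monotonic() - start, value))
        return resp.headers, events
    finally:
        conn.close()
```

Calling it once against the service directly (or through the ALB) and once through API Gateway, then comparing the timestamps together with `headers.get("Content-Length")` and `headers.get("Transfer-Encoding")`, should show whether the events are spread out over time or all arrive together at the end.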
Another problem is that requests time out after 30 seconds because of the API Gateway timeout, whereas an SSE response can take longer than that.
I am looking for a solution to support SSE while keeping the authentication layer outside of the application code. Any suggestions or guidance on how to achieve this would be greatly appreciated!