Streaming ChatGPT's results with Flask and LangChain
Basically I want to achieve this with Flask and LangChain: https://www.youtube.com/watch?v=x8uwwLNxqis.

I'm building a Q&A Flask app that uses LangChain in the backend, but I'm having trouble streaming the response from ChatGPT. My chain looks like this:

chain = VectorDBQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", streaming=True),
    chain_type="stuff",
    vectorstore=docsearch,
)
...
result = chain({"query": query})
output = result['result']

Jinja simply prints {{ output }}, and that works fine, but the result doesn't appear on the page until the entire response is finished. I want to stream the result as it's being generated by ChatGPT.

I've tried using Flask's stream_template, but it doesn't work: it doesn't stream the result, it just prints the full response at once (although I could be doing something wrong).
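
For reference, a minimal stream_template setup (Flask 2.2+) looks like the sketch below; the route, template name, and generator contents are placeholders, not this app's real code:

from flask import Flask, stream_template

app = Flask(__name__)

@app.route('/ask')
def ask():
    def generate():
        # Placeholder: in a real app this should yield tokens as the
        # model produces them, not pieces of an already-finished string.
        yield "streamed "
        yield "response"

    # index.html must iterate the generator for streaming to be visible:
    # {% for chunk in output %}{{ chunk }}{% endfor %}
    return stream_template('index.html', output=generate())

Note that the response only streams if the generator itself yields incrementally; passing a fully materialized string (like result['result'] above) renders all at once.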


I finally solved it:

https://github.com/DanteNoguez/FlaskGPT

Carven answered 24/3, 2023 at 20:13 Comment(3)
What was it that made it work? I can get it to stream in the console, but not through the response. — Daredeviltry
Same for me, @SunwooYang — Hiles
@Hiles I ended up switching to FastAPI with websockets for this one. — Daredeviltry
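
For anyone who wants to follow that route, a minimal FastAPI websocket sketch might look like this (the endpoint name and the token generator are assumptions, not code from this thread):

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    query = await websocket.receive_text()
    # stream() is assumed to be a token generator like the one in the answer below.
    for token in stream(query):
        await websocket.send_text(token)
    await websocket.close()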

To make it clearer that this question has an answer, and to avoid a broken link in the future (a common Stack Overflow problem when someone leaves only a link), here is how Dante (the OP) solved it:

https://github.com/DanteNoguez/FlaskGPT

import openai
from flask import Flask, Response, request

app = Flask(__name__)
# docsearch is a vector store built elsewhere in the repo.

def gen_prompt(docs, query) -> str:
    return f"""To answer the question, please only use the Context given, nothing else. Do not make up an answer; simply say 'I don't know' if you are not sure.
Question: {query}
Context: {[doc.page_content for doc in docs]}
Answer:
"""


def prompt(query):
    # Retrieve the four most similar document chunks and build the prompt.
    docs = docsearch.similarity_search(query, k=4)
    return gen_prompt(docs, query)


def stream(input_text):
    # Request a streamed completion and yield each token as it arrives.
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You're an assistant."},
            {"role": "user", "content": prompt(input_text)},
        ],
        stream=True,
        max_tokens=500,
        temperature=0,
    )
    for line in completion:
        if 'content' in line['choices'][0]['delta']:
            yield line['choices'][0]['delta']['content']


@app.route('/completion', methods=['GET', 'POST'])
def completion_api():
    if request.method == "POST":
        input_text = request.form['input_text']
        return Response(stream(input_text), mimetype='text/event-stream')
    return Response(None, mimetype='text/event-stream')
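
To verify that the endpoint really streams, a small Python client can read the response as it arrives (the URL and form field here are assumed to match the route above):

import requests

resp = requests.post(
    "http://localhost:5000/completion",
    data={"input_text": "your question here"},
    stream=True,  # don't buffer the whole body
)
# chunk_size=None yields data as soon as it arrives from the server
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)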
Dmz answered 23/7, 2023 at 15:4 Comment(0)

As of now, chains in LangChain do not stream. streaming=True is misleading: the kwarg makes the OpenAI servers stream the response to your LLM wrapper, but the chain does not pass that response on as a stream. Even though there is a chain.stream method, it does not stream either. I can demonstrate it:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain

# streaming=True only controls how the OpenAI API responds to langchain
llm = ChatOpenAI(openai_api_key="your api key", streaming=True)
prompt = ChatPromptTemplate.from_messages([("human", "{content}")])

chain = LLMChain(llm=llm, prompt=prompt)

response = chain.stream(input={"content": "tell me a joke"})

# this returns a generator object
print(response)

# if chain.stream really streamed, this loop would print token by token
for res in response:
    print(res)

These are the results of the two print statements. The second one should print a stream of tokens, but it actually prints the full, complete message:

[Screenshot: the first print shows a generator object; the loop then prints one complete message in a single chunk]

A truly streamed response should look like this:

[Screenshot: the response printed token by token as it is generated]
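
What did stream at the time was attaching a streaming callback handler to the model itself and calling the LLM directly, bypassing the chain. A minimal sketch with the same-era API (stdout only; wiring the tokens into a web response still needs a queue or generator):

from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# the callback receives each token as it arrives and prints it immediately
llm = ChatOpenAI(
    openai_api_key="your api key",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# tokens appear on stdout one by one while the model generates
llm.predict("tell me a joke")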

Aspersion answered 11/11, 2023 at 18:12 Comment(0)
