I am using LangChain to interact with a long piece of text. The text is split into chunks, which are passed in as the docs variable below.
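For context, docs is produced along these lines (a minimal sketch; the splitter choice, chunk sizes, and the longText variable are illustrative, not my exact code):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split the long text into overlapping chunks for the map-reduce chain.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 2000,
  chunkOverlap: 200,
});
const docs = await splitter.createDocuments([longText]);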
I have set up a chain that streams the response; however, I am finding that handleLLMNewToken is called whilst OpenAI is returning tokens from all Generations, including the Map step for each document. This results in a jumbled response that interleaves every Generation, rather than just the final output Generation. I cannot find a method that lets me stream only the final result as it is being generated.
I can access the completed response via the chainResult variable, but unlike the callback it is not streamable: it only arrives once the chain has finished.
const SUMMARISE_MODEL = {
  temperature: 0.5,
  modelName: "gpt-3.5-turbo-16k",
  maxTokens: 5000,
  verbose: false,
  maxConcurrency: 2,
  streaming: true,
};

//...OTHER CODE (model is created here from SUMMARISE_MODEL)

const chain = loadQAMapReduceChain(model, { returnIntermediateSteps: true });

const chainResult = await chain.call(
  {
    input_documents: docs,
    question: templateString,
  },
  [
    {
      // Fires for tokens from every Generation, including each Map step,
      // not just the final combine step.
      handleLLMNewToken(token: string) {
        updateResponse(token);
      },
    },
  ]
);
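One workaround I have considered is building the map-reduce chain manually with two model instances, so the streaming callback is bound only to the model used for the final combine step. Below is a rough sketch of that idea; the prompts are simplified placeholders I wrote myself (not the library defaults), and I have not verified that this wiring reproduces loadQAMapReduceChain exactly:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { PromptTemplate } from "langchain/prompts";
import { LLMChain, MapReduceDocumentsChain, StuffDocumentsChain } from "langchain/chains";

// Map step: no streaming and no callback, so its tokens are never surfaced.
const mapModel = new ChatOpenAI({ ...SUMMARISE_MODEL, streaming: false });

// Combine step: streaming, with the token callback bound to this model only.
const combineModel = new ChatOpenAI({
  ...SUMMARISE_MODEL,
  callbacks: [{ handleLLMNewToken: (token: string) => updateResponse(token) }],
});

// Simplified placeholder prompts.
const mapPrompt = PromptTemplate.fromTemplate(
  "Use this portion of a long document to answer the question.\n{context}\nQuestion: {question}\nRelevant text, if any:"
);
const combinePrompt = PromptTemplate.fromTemplate(
  "Given these extracted parts of the document, answer the question.\n{summaries}\nQuestion: {question}\nFinal answer:"
);

const manualChain = new MapReduceDocumentsChain({
  llmChain: new LLMChain({ llm: mapModel, prompt: mapPrompt }),
  combineDocumentChain: new StuffDocumentsChain({
    llmChain: new LLMChain({ llm: combineModel, prompt: combinePrompt }),
    documentVariableName: "summaries",
  }),
});

const result = await manualChain.call({
  input_documents: docs,
  question: templateString,
});

Is there a built-in way to restrict handleLLMNewToken to the final Generation, rather than rebuilding the chain like this?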