Unclear how Azure OpenAI endpoint /extensions/chat/completions does the retrieval behind the scenes

I am playing around with this example chat application, which retrieves the most relevant documents from Cognitive Search to help the chatbot answer users' questions. The document retrieval itself is not part of the app's code but is abstracted away when the app calls the "/deployments/{deployment-id}/extensions/chat/completions" endpoint.

I have failed to work out from the documentation how the /deployments/{deployment-id}/extensions/chat/completions endpoint interacts with Cognitive Search behind the scenes. The background is that I'm trying to understand what flexibility it offers, and what it would take to implement the retrieval and the integration of documents into the LLM's prompt manually if we want to change something.
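For concreteness, here is a minimal sketch of what I imagine the manual version of the second half would look like: take the chunks returned by a search request like the one further below, paste their text into a system message, and call the plain (non-extensions) chat completions endpoint. The endpoint shape is the standard Azure OpenAI chat completions API; the prompt wording and the [placeholders] are my own assumptions, not what the extension actually does.

# Assumed manual flow: retrieve the top chunks from Cognitive Search yourself,
# then put them into a system message on the plain chat completions endpoint.
curl --location 'https://[resource].openai.azure.com/openai/deployments/[deployment-id]/chat/completions?api-version=2023-07-01-preview' \
--header 'Content-Type: application/json' \
--header 'api-key: [key]' \
--data '{
     "messages": [
       {"role": "system", "content": "Answer using only the sources below.\n[doc1]: [chunk 1 text]\n[doc2]: [chunk 2 text]"},
       {"role": "user", "content": "[question text]"}
     ]
   }'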

What Cognitive Search endpoint does the extension call, and with what parameters? Here's an example of an API request I sent myself to try to reproduce the top 5 results that appear in the tool citations:

curl --location 'https://[deployment].search.windows.net/indexes/[index]/docs/search?api-version=2023-07-01-Preview' \
--header 'Content-Type: application/json' \
--header 'api-key: [key]' \
--data '{
     "queryType": "semantic",
     "search": "[question text]",
     "top": 5
   }'

(For queryType I tried full, simple and semantic, and for semantic I tried different settings for the other required parameters.)

I am getting the same documents back as in the Search Explorer for Cognitive Search in the Azure portal, but they are different from what comes back from the extensions/chat/completions request. The relevance scores are sometimes the same for the same chunks, but sometimes different. Can anyone explain why that happens?

Is it correct that no embeddings are used in the document retrieval as implemented in the Azure OpenAI playground and this sample app?

Is there more system text hidden away somewhere instructing the model to look at the sources and provide references in this [doc1] format? How would we go about modifying that if we are not happy with the citation accuracy?
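For illustration, my guess at the kind of hidden system text would be something like the hypothetical message below. The wording is entirely made up; the point is that if we re-implemented the retrieval ourselves, we would control this text fully:

{
  "role": "system",
  "content": "Answer ONLY from the numbered sources below. After each statement taken from a source, append a reference in the form [doc1]. If the sources do not contain the answer, say you do not know.\n\n[doc1]: [chunk 1 text]\n[doc2]: [chunk 2 text]"
}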

Nebuchadnezzar answered 19/7/2023 at 13:09

The API is in preview and is generally quite a "black box".

It does allow for a number of different ways to search against an index (e.g. you can choose between semantic, vector, vectorSimpleHybrid and vectorSemanticHybrid), but it is unclear from the documentation (and I think this is by design) whether, say, a vector search just vectorises the last user message and runs a search, or whether there is some LangChain-esque background service that asks the LLM to generate search queries, which are then submitted to Cognitive Search.
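For reference, those options go into the dataSources block of the request body. Here is a sketch of such a call against the preview API, with everything in [brackets] a placeholder; the field names for the vector variants (e.g. embeddingDeploymentName) should be double-checked against the REST reference for your api-version:

curl --location 'https://[resource].openai.azure.com/openai/deployments/[deployment-id]/extensions/chat/completions?api-version=2023-08-01-preview' \
--header 'Content-Type: application/json' \
--header 'api-key: [key]' \
--data '{
     "dataSources": [{
       "type": "AzureCognitiveSearch",
       "parameters": {
         "endpoint": "https://[service].search.windows.net",
         "key": "[search-key]",
         "indexName": "[index]",
         "queryType": "vectorSemanticHybrid",
         "embeddingDeploymentName": "[embedding-deployment]"
       }
     }],
     "messages": [{"role": "user", "content": "[question text]"}]
   }'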

You can read the documentation and cross-reference the options in dataSources with the REST API documentation for Cognitive Search, but it will do little to explain the underlying procedures.

Taka answered 5/10/2023 at 11:07

The system role also seems to behave differently here. Passed to the API without the extension, the role works as intended; given a data source and using the extensions endpoint, it seems not to work at all...

An example would be "You are an English to French translator": it translates on the chat completions endpoint but not on the extensions endpoint.
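One thing that may be worth trying (I have not verified that it fixes this): the dataSources parameters include a roleInformation field, which appears to be how the playground passes its system message when the extension is in use. A sketch of the relevant fragment:

"dataSources": [{
  "type": "AzureCognitiveSearch",
  "parameters": {
    "endpoint": "https://[service].search.windows.net",
    "key": "[search-key]",
    "indexName": "[index]",
    "roleInformation": "You are an English to French translator."
  }
}]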

Lyallpur answered 24/10/2023 at 14:23

Assuming the example chat repo you link provides a similar interface to the Azure "use your own data" service available from the Azure OpenAI playground, it is indeed very likely doing quite a few things behind the scenes, with limited documentation.

If you check the network requests in the developer tools, the request body contains various inputs that can affect the retrieval and the generation (see the sketch after the lists below):

  1. strictness: maps to Advanced settings > Strictness. I'm not sure how it is used to update the Azure AI Search query.
  2. topNDocuments: maps to Advanced settings > Retrieved documents.
  3. queryType: maps to the same field in Azure AI Search; here it is set to semantic. Based on this announcement, it seems to use "advanced language models", but you can still use vector search with your own embeddings or a hybrid approach, as they recommend.
  4. inScope: maps to Advanced settings > Limit responses to your data content. It probably updates the prompt to instruct the LLM not to use information that is not in the search query output.

and outputs:

  1. completion.choices[0].message.model_extra['context']['messages'][0]['content']['intent']: a list of typically 2-5 intents, which I suspect are used to retrieve better/more results from Azure AI Search.
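As promised above, here is a sketch of how those inputs sit in the request body; the values are illustrative and everything in [brackets] is a placeholder:

{
  "dataSources": [{
    "type": "AzureCognitiveSearch",
    "parameters": {
      "endpoint": "https://[service].search.windows.net",
      "key": "[search-key]",
      "indexName": "[index]",
      "queryType": "semantic",
      "topNDocuments": 5,
      "strictness": 3,
      "inScope": true
    }
  }],
  "messages": [{"role": "user", "content": "[question text]"}]
}

In my own experiments with the preview API, the retrieval context came back in the raw JSON response as an extra tool message whose content is a JSON string containing "citations" (the retrieved chunks) and "intent" (the generated search queries); the SDK path above is just the parsed view of that.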
Calutron answered 18/12/2023 at 14:13
