I am playing around with this example chat application, which retrieves the most relevant documents from Cognitive Search to help the chatbot answer users' questions. The document retrieval itself is not part of the app's code; it is abstracted away behind the app's call to the "/deployments/{deployment-id}/extensions/chat/completions" endpoint.
I have failed to understand from the documentation how the /deployments/{deployment-id}/extensions/chat/completions endpoint interacts with Cognitive Search behind the scenes.
The background is that I'm trying to understand what flexibility the endpoint offers, and what it would take to implement the retrieval and the injection of documents into the LLM's prompt manually if we ever want to change something.
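For reference, the call the app makes looks roughly like this; this is my own sketch going by the "Azure OpenAI on your data" preview REST reference (api-version and parameter names from the 2023-06-01-preview docs; everything in brackets is a placeholder):

curl --location 'https://[aoai-resource].openai.azure.com/openai/deployments/[deployment-id]/extensions/chat/completions?api-version=2023-06-01-preview' \
--header 'Content-Type: application/json' \
--header 'api-key: [aoai-key]' \
--data '{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "https://[search-service].search.windows.net",
                "key": "[search-key]",
                "indexName": "[index]"
            }
        }
    ],
    "messages": [
        { "role": "user", "content": "[question text]" }
    ]
}'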
What Cognitive Search endpoint does the extension call, and with what parameters? Here's an example of an API request I sent myself to try to reproduce the top 5 results from the tool citations (for queryType I tried full, simple, and semantic, the latter with different settings for its other required parameters):
curl --location 'https://[search-service].search.windows.net/indexes/[index]/docs/search?api-version=2023-07-01-Preview' \
--header 'Content-Type: application/json' \
--header 'api-key: [key]' \
--data '{
    "queryType": "[full | simple | semantic]",
    "search": "[question text]",
    "top": 5
}'
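For the semantic case specifically, this is the payload shape I mean; semanticConfiguration has to name a semantic configuration defined on the index, and [semantic-config] is a placeholder:

{
    "queryType": "semantic",
    "search": "[question text]",
    "semanticConfiguration": "[semantic-config]",
    "top": 5
}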
I am getting the same documents back as in the Search Explorer for Cognitive Search in the Azure portal, but they are different from what comes back from the extensions/chat/completions request. The relevance scores are sometimes the same for the same chunks, but sometimes also different. Could you explain why that happens?
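To be precise about which scores I'm comparing: with queryType semantic, the search response carries two relevance fields per document, along these lines (values invented for illustration):

{
    "value": [
        {
            "@search.score": 12.34,
            "@search.rerankerScore": 2.87,
            "id": "[chunk-id]",
            "content": "..."
        }
    ]
}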
Is it correct that no embeddings are used in the document retrieval as implemented in the Azure OpenAI playground and this sample app?
Is there more system text hidden away somewhere instructing the model to look at the sources and provide references in this [doc1] format? How would we go about modifying that if we are not happy with the citation accuracy?
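If it helps frame that last question: the preview REST reference lists a roleInformation parameter on the dataSource, which looks like it might be the place to adjust such instructions. A sketch of the dataSources block from the request above with it added (the messages array is omitted for brevity, and the instruction wording is my own invention):

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "https://[search-service].search.windows.net",
                "key": "[search-key]",
                "indexName": "[index]",
                "roleInformation": "You are a helpful assistant. Cite every statement with the id of its source document in square brackets, e.g. [doc1]."
            }
        }
    ]
}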