I need to build a 'search engine' experience: given a short query (a few words), I need to find the relevant documents in a corpus of thousands of documents.
After evaluating a few approaches, I got very good results with the Universal Sentence Encoder from Google. The problem is that my documents can be very long, and for these very long texts performance seems to degrade, so my idea was to split each text into sentences/paragraphs.
So I ended up with a list of vectors for each document (one vector per part of the document).
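For context, here is a minimal sketch of how I build these per-chunk vectors, assuming the TF Hub version of the model (the chunking here is deliberately simplified, my real splitting is sentence-aware):

```python
import tensorflow_hub as hub

# Load the Universal Sentence Encoder (v4) from TF Hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def embed_document(text):
    # Naive chunking on blank lines; the actual splitting is more careful
    chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
    return embed(chunks).numpy()  # shape (n_chunks, 512)
```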
My question is: is there a state-of-the-art algorithm/methodology to compute a score from a list of vectors? I don't really want to merge them into a single vector, as that would recreate the original problem (the relevant part would get diluted in the rest of the document). Is there a scoring algorithm to aggregate the multiple cosine similarities between the query and the different parts of the text?
Important detail: I have both short and long texts, so a document can have anywhere from 1 to 10 vectors.
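To make the question concrete, this is the naive baseline I have so far (names like `score_document` are mine, just for illustration): score each document by the max cosine similarity over its chunks, so the best-matching part decides. I'm asking whether there is something more principled than a plain max or mean over the per-chunk similarities:

```python
import numpy as np

def chunk_similarities(query_vec, chunk_vecs):
    # Cosine similarity between the query and every chunk vector
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return m @ q  # shape (n_chunks,)

def score_document(query_vec, chunk_vecs):
    sims = chunk_similarities(query_vec, np.asarray(chunk_vecs))
    # Naive baseline: the single best-matching chunk sets the document score
    return float(sims.max())
```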