Elasticsearch: Learning from clicks (Search result ranking)
A

3

6

I read the chapter "Learning from clicks" in the book Programming Collective Intelligence and liked the idea: the search engine learns which results users click on and uses this information to improve the ranking of results.

I think it would improve the quality of the search ranking a lot in my Java/Elasticsearch application if I could learn from the user clicks.

In the book, they build a multilayer perceptron (MLP) network so that the learned information can be used even for new search phrases. They use Python with a SQL database to calculate the search ranking.

Has anybody already implemented something like this with Elasticsearch, or does anyone know of an example project? It would be great if I could manage the click information directly in Elasticsearch without needing an extra SQL database.

Absolution answered 3/11, 2014 at 12:28 Comment(4)
What is your question? - Precocity
Has anybody already implemented something like this with Elasticsearch, or does anyone know of an example project? - Absolution
I have implemented a project like that. - Precocity
Ok ;-), I see, I should improve the phrasing of my question. Can you share the source, the architecture, or the Elasticsearch schema of your project as an answer? Have you implemented it without using another database? How do you store the clicks? Is there already a public Java/Elasticsearch solution that implements this algorithm? (I think learning from clicks is a fairly commonly requested feature.) - Absolution
E
9

In the field of Information Retrieval (the general academic field of search and recommendations) this is more generally known as Learning to Rank. Whether it's clicks, conversions, or other ways of sussing out what's a "good" or "bad" result for a keyword search, learning to rank uses either a classification or a regression process to learn which features of the query and document correlate with relevance.
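
To make the regression flavor concrete, here is a minimal pointwise sketch in plain Python. It is only an illustration under made-up assumptions: the two features, the grades, and the linear model are hypothetical, and real systems usually use richer models (for example gradient-boosted trees) trained on far more data.

```python
# Pointwise learning to rank: fit a linear scoring function to graded
# (query, document) examples. All feature values and grades are invented.

def train_linear_ranker(examples, epochs=2000, lr=0.05):
    """examples: list of (feature_vector, grade). Returns (weights, bias)."""
    n = len(examples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for features, grade in examples:
            pred = bias + sum(w * x for w, x in zip(weights, features))
            err = pred - grade
            # Stochastic gradient step on the squared error.
            bias -= lr * err
            weights = [w - lr * err * x for w, x in zip(weights, features)]
    return weights, bias


def score(model, features):
    """Apply the learned linear function to one feature vector."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical features per (query, document) pair, scaled to 0..1:
# [title_match_score, body_match_score], with a relevance grade from 0 to 4.
training = [
    ([0.9, 0.8], 4.0),
    ([0.8, 0.2], 3.0),
    ([0.1, 0.9], 1.0),
    ([0.0, 0.1], 0.0),
]
model = train_linear_ranker(training)

# Rank unseen candidates for a query by their predicted grade.
candidates = {"doc_a": [0.85, 0.3], "doc_b": [0.05, 0.7]}
ranked = sorted(candidates, key=lambda d: score(model, candidates[d]), reverse=True)
```

With this toy data the model learns that a title match matters more than a body match, so `doc_a` ranks ahead of `doc_b`.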

Clicks?

For clicks specifically, there are reasons to be skeptical that optimizing for clicks is ideal. There's a paper from Microsoft Research I'm trying to dig up that claims that, in their case, clicks are only 45% correlated with relevance. Click+dwell is often a more useful general-purpose indicator of relevance.
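
As a rough illustration of click+dwell, the sketch below counts a click only when the user stayed on the result for a while. The event fields (`query`, `doc_id`, `clicked`, `dwell_seconds`) and the 30-second threshold are assumptions made for the example, not something taken from a particular logging system.

```python
from collections import defaultdict

MIN_DWELL_SECONDS = 30  # assumed threshold; tune it against your own data


def satisfied_click_rate(events, min_dwell=MIN_DWELL_SECONDS):
    """Return {(query, doc_id): satisfied_clicks / impressions}.

    A click is "satisfied" only if the dwell time reaches min_dwell.
    """
    impressions = defaultdict(int)
    satisfied = defaultdict(int)
    for e in events:
        key = (e["query"], e["doc_id"])
        impressions[key] += 1
        if e["clicked"] and e.get("dwell_seconds", 0) >= min_dwell:
            satisfied[key] += 1
    return {k: satisfied[k] / impressions[k] for k in impressions}


# Hypothetical search log: one entry per result shown to a user.
log = [
    {"query": "rambo", "doc_id": "1", "clicked": True, "dwell_seconds": 120},
    {"query": "rambo", "doc_id": "1", "clicked": True, "dwell_seconds": 3},
    {"query": "rambo", "doc_id": "1", "clicked": False},
    {"query": "rambo", "doc_id": "2", "clicked": True, "dwell_seconds": 45},
]
rates = satisfied_click_rate(log)
```

Here document "1" was clicked twice out of three impressions, but only one of those clicks had a long enough dwell, so its rate is one third rather than two thirds.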

There's also the risk of self-reinforcing bias in search, as I talk about in this blog article. There's a chance that if you're already showing a user mediocre results, and they keep clicking on those mediocre results, you'll end up reinforcing search to keep showing users mediocre results.

Beyond clicks, there are often domain-specific considerations for what you should measure. For example, classically in e-commerce, conversions matter. Perhaps a search result click that led to a purchase should count more. Netflix famously tries to suss out what it means when you watch a movie for 5 minutes and go back to the menu vs 30 minutes and exit. Some search use cases are informational: clicking may mean something different when you're researching and clicking many search results vs when you're shopping for a single item.

So, sorry to say, it's not a silver bullet. I've heard of many successful and unsuccessful attempts at doing Learning to Rank, and it mostly boils down to how successful you are at measuring what your users consider relevant. The difficulty of this problem surprises a lot of people.

For Elasticsearch...

For Elasticsearch specifically, there's this plugin (disclaimer: I'm the author), which is documented here. Once you've figured out how to "grade" a document for a specific query (whether it's clicks or something more), you can train a model that can then be fed into Elasticsearch via this plugin for your ranking.
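
As a sketch of what "grading" could look like in practice, the snippet below turns a per-(query, document) relevance signal into training lines in the RankLib-style text format (`grade qid:N feature:value ... # comment`). The 0 to 4 grade scale, the input dictionaries, and the feature values are all assumptions for illustration; check the plugin's documentation for the exact format it expects.

```python
def to_grade(rate):
    """Map a relevance signal in [0, 1] to an integer grade 0..4 (assumed scale)."""
    return min(4, int(rate * 5))


def judgment_lines(rates, features):
    """Build RankLib-style training lines.

    rates:    {(query, doc_id): signal in [0, 1]}, e.g. a satisfied-click rate
    features: {(query, doc_id): [feature_1, feature_2, ...]}
    """
    qids = {}  # query text -> numeric query id, assigned in sorted order
    lines = []
    for (query, doc_id), rate in sorted(rates.items()):
        qid = qids.setdefault(query, len(qids) + 1)
        feats = " ".join(
            f"{i}:{v}" for i, v in enumerate(features[(query, doc_id)], start=1)
        )
        lines.append(f"{to_grade(rate)} qid:{qid} {feats} # {doc_id} {query}")
    return lines


# Hypothetical inputs: signals per (query, doc) and two feature values each.
rates = {("rambo", "7555"): 1.0, ("rambo", "1370"): 0.5, ("rocky", "1366"): 0.0}
features = {
    ("rambo", "7555"): [12.3, 24.9],
    ("rambo", "1370"): [9.1, 3.4],
    ("rocky", "1366"): [0.0, 1.2],
}
lines = judgment_lines(rates, features)
```

Each line pairs a grade with the feature values for one document under one query, which is the kind of input a learning-to-rank trainer consumes.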

Ethe answered 14/2, 2017 at 22:22 Comment(3)
Wow, thanks, very interesting. Three years later, I still don't have a Learning-to-rank search in my applications ;-). The plugin looks very promising. - Absolution
Doug, does this do re-ranking? Or does it score all documents in the index? It seems that for a huge index, scoring all records with an xgboost model could be quite computationally intensive. Re-ranking, however, would seem more feasible. Could you speak to that? - Gilreath
@doug, I am trying to use the plugin but the documentation seems to be missing. Can you help with this question: #77256604 - Stroy
C
2

What you would need to do is store information about the clicks in a field inside the Elasticsearch index. Every click would result in an update of a document. Since an update is actually a delete followed by an insert (see the Update API), you need to make sure your document text is stored, not only indexed. You can then use a Function Score Query to build a score function reflecting the value stored in the index.
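
A minimal sketch of the two request bodies this approach needs, written as Python dictionaries so they can be serialized to JSON. The field names `click_count` and `text` are hypothetical, the script syntax assumes a recent Elasticsearch version with Painless scripting, and the update script assumes every document already has the counter field initialized to 0.

```python
import json

CLICK_FIELD = "click_count"  # hypothetical field name for the click counter


def click_update_body(increment=1):
    """Body for the Update API: bump the counter on an existing document.

    Assumes the field already exists on the document; the upsert part only
    covers the case where the document itself is missing.
    """
    return {
        "script": {
            "lang": "painless",
            "source": f"ctx._source.{CLICK_FIELD} += params.n",
            "params": {"n": increment},
        },
        "upsert": {CLICK_FIELD: increment},
    }


def click_boosted_query(user_query, text_field="text"):
    """Body for a search: text relevance plus a dampened click boost."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {text_field: user_query}},
                "field_value_factor": {
                    "field": CLICK_FIELD,
                    "modifier": "log1p",  # dampens very popular documents
                    "factor": 1.0,
                    "missing": 0,  # documents without the field get no boost
                },
                # "sum" keeps the text score intact when the boost is zero.
                "boost_mode": "sum",
            }
        }
    }


update_body = click_update_body()
search_body = click_boosted_query("learning from clicks")
search_json = json.dumps(search_body)
```

Note that this boost is the same for every query: it rewards documents that are clicked often overall, not documents that are clicked for a particular search phrase.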

Alternatively, you could store the information in a separate database and use a script function inside the score function to access the database. I wouldn't suggest this solution due to performance issues.

Clupeoid answered 1/12, 2014 at 9:16 Comment(1)
Thanks for your hints, but I don't think that a Function Score Query is powerful enough, because you can only access the fields of one document. An example of a score query for my use case would be super helpful for me. - Absolution
S
-1

I get the point of your question. You want to build a learning to rank model within the Elasticsearch framework. The relevance of each doc to the query is computed online. You want to combine the query and the doc to compute the score, so a custom function to compute _score is needed. I am new to Elasticsearch and am still looking for a way to solve this problem.

Lucene is a more general search engine that lets you define your own scorer to compute relevance, and I have developed several applications on it before.

This article gives a brief overview of customizing a scorer. However, I haven't found related articles for Elasticsearch. Feel free to discuss your progress on Elasticsearch with me.

Syndicalism answered 23/1, 2016 at 18:39 Comment(0)
