MySQL match() against() - order by relevance and column?

Okay, so I'm trying to make a full text search in multiple columns, something simple like this:

SELECT * FROM pages WHERE MATCH(head, body) AGAINST('some words' IN BOOLEAN MODE)

Now I want to order by relevance (how many of the words are found?), which I have been able to do with something like this:

SELECT * , MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE) AS relevance 
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY relevance DESC

Now here comes the part where I get lost: I want to prioritize the relevance in the head column.

I guess I could make two relevance columns, one for head and one for body, but at that point I'd be running more or less the same search on the table three times. Performance matters for this function, since the query will also be joined and matched against other tables.

So my main question is: is there a faster way to search by relevance and prioritize certain columns? (And as a bonus, could relevance even count the number of times the words occur in the columns?)

Any suggestions or advice would be great.

Note: I will be running this on a LAMP-server. (WAMP in local testing)

Lur answered 7/6, 2011 at 0:56 Comment(2)
Do you really have to put MATCH...AGAINST in both the SELECT clause and the WHERE clause? Can you not alias it in the SELECT clause and refer to the alias in the WHERE clause? I'm trying to use prepared statements and this seems redundant/strange to me. - Vasectomy
No. As stated in the MySQL documentation since 5.5, MATCH ... AGAINST is computed only once when the same expression appears in both SELECT and WHERE, so there is no extra overhead. - Isobelisocheim

This might give the head part the increased relevance you want. It won't double it, but it might be good enough for your purposes:

SELECT pages.*,
       MATCH (head, body) AGAINST ('some words') AS relevance,
       MATCH (head) AGAINST ('some words') AS title_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words')
ORDER BY title_relevance DESC, relevance DESC

-- alternatively:
ORDER BY title_relevance + relevance DESC

An alternative you may also want to investigate, if you have the flexibility to switch DB engine, is Postgres. It lets you assign weights to parts of a document and play around with the ranking.
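
For reference, a minimal sketch of what the Postgres weighting could look like. It assumes the same pages table with head and body text columns and the built-in 'english' text search configuration (both assumptions, adapt them to your schema). setweight labels the head lexemes 'A' and the body lexemes 'B', and ts_rank by default scores 'A' matches higher than 'B' matches:

```sql
-- PostgreSQL, not MySQL. Table and column names follow the question;
-- the 'english' configuration is an assumption.
SELECT d.*,
       ts_rank(d.doc, query) AS relevance
FROM (
    SELECT p.*,
           -- Label head lexemes 'A' and body lexemes 'B' so they rank differently.
           setweight(to_tsvector('english', coalesce(p.head, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(p.body, '')), 'B') AS doc
    FROM pages AS p
) AS d,
plainto_tsquery('english', 'some words') AS query
WHERE d.doc @@ query          -- keep only rows that match the query
ORDER BY relevance DESC;
```

Note that plainto_tsquery requires all (non-stop-word) terms to be present. ts_rank also accepts an optional weights array as its first argument if the default weighting is not what you want.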

Coessential answered 10/6, 2011 at 10:29 Comment(6)
As an aside, MySQL 5.6 supports full text searches on InnoDB tables! - Wirewove
Can you provide a SQL fiddle for this? - Delectate
How much of a negative impact do multiple searches have? I would need 4 matches in my SELECT as I have 4 different weight factors. Would that make performance much lower? - Relinquish
@Relinquish On other similar questions I have seen more than one person say that there is no extra overhead from using multiple MATCH expressions, due to the way MySQL works internally. - Bud
Make sure you run these two: ALTER TABLE talk_webpages ADD FULLTEXT(head) and ALTER TABLE talk_webpages ADD FULLTEXT(head, body) - Brimful
@Denis Thank you for this answer. In my scenario I also want the title_relevance matches to show first and the rest after them, but within the title_relevance group I need the rows ordered by posted_date (latest first), and the next group should follow the same logic. How can I achieve that? - Kristykristyn

Just adding this for anyone who might need it: don't forget to alter the table!

ALTER TABLE table_name ADD FULLTEXT(column_name);
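
Applied to the question's pages table, a sketch might look like this. The index names ft_head and ft_head_body are made up for illustration. A MATCH (...) column list generally has to match the column list of some FULLTEXT index exactly, so ranking head on its own and head, body together needs two indexes:

```sql
-- One FULLTEXT index per column list used in MATCH().
-- Index names are hypothetical.
CREATE FULLTEXT INDEX ft_head ON pages (head);
CREATE FULLTEXT INDEX ft_head_body ON pages (head, body);

-- List the FULLTEXT indexes that now exist on the table.
SHOW INDEX FROM pages WHERE Index_type = 'FULLTEXT';
```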
Lots answered 18/1, 2013 at 16:54 Comment(2)
If you execute the above command more than once, it will create multiple indexes for the same column(s), so run it only once. - Floruit
Better yet, use CREATE FULLTEXT INDEX indexname ON tablename(column_name(s)). You should also check whether the index exists before you try to create it. You can check using: SELECT INDEX_NAME FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_CATALOG = 'def' AND TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tablename' AND INDEX_NAME = 'indexname'; - Wriggle

I have never done so, but it seems like

MATCH (head, head, body) AGAINST ('some words' IN BOOLEAN MODE)

should give double weight to matches found in the head.

I just read this comment on the docs page and thought it might be of value to you:

Posted by Patrick O'Lone on December 9 2002 6:51am

It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:

SELECT MATCH('Content') AGAINST ('keyword1 keyword2') as Relevance 
FROM table 
WHERE MATCH ('Content') AGAINST('+keyword1+keyword2' IN BOOLEAN MODE) 
HAVING Relevance > 0.2 
ORDER BY Relevance DESC 

Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.

This may or may not be a bug in the way that IN BOOLEAN MODE operates, although the comments I've read on the mailing list suggest that IN BOOLEAN MODE's relevance ranking is not very complicated, thus lending itself poorly for actually providing relevant documents. BTW - I didn't notice a performance loss for doing this, since it appears MySQL only performs the FULLTEXT search once, even though the two MATCH clauses are different. Use EXPLAIN to prove this.

So it would seem you may not need to worry about calling the fulltext search twice, though you should still "use EXPLAIN to prove this".
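
Putting the quoted advice together with the question's table, a rough sketch could be the following. It assumes FULLTEXT indexes exist on both (head) and (head, body), and the factor 2 for head is an arbitrary weight chosen for illustration:

```sql
-- Natural-language MATCH() in the SELECT list yields a graded score;
-- the boolean-mode MATCH() in WHERE only decides which rows qualify.
SELECT pages.*,
       MATCH (head) AGAINST ('keyword1 keyword2') * 2        -- head counts extra (assumed weight)
         + MATCH (head, body) AGAINST ('keyword1 keyword2')  -- overall score
         AS relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE)
ORDER BY relevance DESC;
```

Prefixing the statement with EXPLAIN shows whether the FULLTEXT index is actually used for the WHERE clause.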

Washtub answered 7/6, 2011 at 1:26 Comment(7)
Adding head twice to the match() function does not work, sadly. Maybe because the query doesn't count the number of times the words occur? I've been using the page you refer to as well, but for some reason I cannot make it work... I have not indexed my columns yet, and therefore cannot search without the "IN BOOLEAN MODE" modifier... - Lur
I think a non-boolean search would return the number of occurrences, but boolean does not? - Washtub
I will look more into it tomorrow, but I'm going to hold off for now. Thanks for the answer, we'll see if it helps once I get a handle on this. - Lur
I was having a problem using IN BOOLEAN MODE and then ordering by relevance, and this solved my problem with relevance always being returned as 1. Thanks. - Wivina
Generating a score field solved my issue: I was getting results, but a lot of them were complete noise. Thanks, +1 - Chew
I have a similar problem: when I search for "some words", results containing only "some" sometimes come out on top, while results containing "some words" are listed below with lower relevance. Not sure why? - Anglophile
The code "keyword1 keyword2" will only output results where keyword1 is immediately followed by keyword2, so keywords that are not next to each other will automatically not be part of the results. - Strongminded

I was just playing around with this, too. One way you can add extra weight is in the ORDER BY area of the code.

For example, if you were matching 3 different columns and wanted to more heavily weight certain columns:

SELECT search.*,
       MATCH (name) AGAINST ('black' IN BOOLEAN MODE) AS name_match,
       MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE) AS keyword_match,
       MATCH (description) AGAINST ('black' IN BOOLEAN MODE) AS description_match
FROM search
WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE)
ORDER BY (name_match * 3 + keyword_match * 2 + description_match) DESC
LIMIT 0, 100;
Seel answered 19/2, 2013 at 3:33 Comment(2)
Isn't this a really heavy query? - Bifocal
Move the math into the SELECT list and it lightens the load a lot. SELECT search.*, (MATCH (name) AGAINST ('black' IN BOOLEAN MODE) * 3) + (MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE) * 2 + MATCH (description) AGAINST ('black' IN BOOLEAN MODE)) AS totalScore FROM search WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE) ORDER BY totalScore DESC LIMIT 0,100; - Decoct

Just to add that if you're using custom ranking, remember to use HAVING instead of WHERE to reduce the load.

SELECT MATCH(x,y) AGAINST (? IN BOOLEAN MODE) AS r1,
MATCH(z) AGAINST (? IN BOOLEAN MODE) AS r2,
...
FROM table 
HAVING (r1 + r2) > 0
ORDER BY (r1 * 3 + r2) DESC
LIMIT 10
Susurrate answered 18/8, 2021 at 21:25 Comment(0)
