How to check against all joins when generating a score using MYSQL
Asked Answered
B

4

11

I'm using MYSQL to generate a score for each result returned by a query. The results are then ordered by the score.

The part that doesn't seem to be working properly is when I'm trying to add a score for each tag that has been searched and the result is assigned to. So lets say I do a search for the tags "example", "test and "tag" and one of my results is assigned to the tags "example", "test", "someothertag" it should come up with a score of 10 since there are 2 matches.

What is actually happening is I'm getting a score of 5 if there is a match, regardless of how many tags are matched. and 0 if no tags are matched.

Here is an example of one of the queries that is generated from a search.

        SELECT DISTINCT results.*,
                    ( 
                        5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE)) +

                        1*usefulness + 
                        10*shares 
                    ) AS score 
        FROM results
        INNER JOIN categories c on results.ID = c.RESULT_ID
        INNER JOIN tags ON results.id = tags.result_id
        WHERE c.name in ('purchase', 'condo', 'va')
        AND ( tags.name = 'self employed' OR tags.name = 'rental income' OR tags.name = 'commission income' OR tags.name = 'bankruptcy' OR tags.name = 'condo approval'  )
        AND ( results.scope = 'all' OR results.scope = 'hi' )
        AND published = 1

        GROUP BY results.ID
        having count(distinct c.c_id) = 3
        ORDER BY score DESC 
        LIMIT 8 OFFSET 0
Brutality answered 30/4, 2013 at 16:21 Comment(3)
You probably don't need to use a fulltext index to check the tag names in this query. It'd be faster and more concise to just do a direct equality comparison.Coetaneous
It will probably be more clear to people if you can write down the schema and some sample data for your tables, and a listing of what you would like the query to generate for that sample data.Vive
Groupin might result in loosing matches. Have you tried to add COUNT to your five 5*(MATCH(tags.name)... items ?Luvenialuwana
E
1

As advised by Sam Dufel, you probably do not need full text search, especially since you are using exact string comparison in your WHERE clauses.

Moreover, because of the many-to-many relationship between results and categories (assumed from the HAVING COUNT(c_id) = 3 clause), I think in no way can you join both categories and tags in the same query.

Without the GROUP BY clause, for one given result, you would get one row for each matching category. For each matching pair (result, category), you would then get one row for each matching tag.name. I don't think there is a way to deal with such a result.

What I would suggest is:

Step 1: getting results present in all three categories

SELECT results.ID
FROM results
JOIN categories ON results.id = categories.result_id
WHERE categories.name IN ('purchase', 'condo', 'va')
GROUP BY results.ID
HAVING COUNT(DISTINCT c.c_id) = 3

Step 2: computing score of any results matching at least one search string

SELECT
    DISTINCT results.*, -- DISTINCT is redundant because of the GROUP BY clause
    ( 
        5*(COUNT(tags.result_id)) + -- you actually want to count the number of matches!
        1*usefulness +  -- warning, see below 
        10*shares       -- warning, see below
    ) AS score 
FROM results
INNER JOIN tags ON results.id = tags.result_id
WHERE
    tags.name = 'self employed'
    OR tags.name = 'rental income'
    OR tags.name = 'commission income'
    OR tags.name = 'bankruptcy'
    OR tags.name = 'condo approval'
GROUP BY results.ID

Step 3: putting it all together

SELECT
    results.*,
    ( 
        5*(COUNT(tags.result_id)) +
        1*usefulness +  -- warning, see below 
        10*shares       -- warning, see below
    ) AS score 
FROM (
        SELECT results.id
        FROM results
        JOIN categories ON results.id = categories.result_id
        WHERE
            categories.name IN ('purchase', 'condo', 'va')
            AND ( results.scope = 'all' OR results.scope = 'hi' )
            AND published = 1
        GROUP BY results.id
        HAVING COUNT(DISTINCT categories.c_id) = 3
) AS results_subset
JOIN results ON results_subset.id = results.id
JOIN tags ON results.id = tags.result_id
WHERE
    tags.name = 'self employed'
    OR tags.name = 'rental income'
    OR tags.name = 'commission income'
    OR tags.name = 'bankruptcy'
    OR tags.name = 'condo approval'
GROUP BY results.ID

Notice where I chose to include the conditions WHERE on scope and published. This choice is based on the principle that filters should be stated as early as possible. You may get better performance if you place them in the outer query but it really depends on cardinalities.

A word of warning: fields usefulness and shares are neither part of the GROUP BY function not included in an aggregation function. This is allowed by MySQL but highly dangerous. If usefulness and shares belong to a table other than result (the table being GROUP'ed BY), the values returned in your query are undefined.

Elburt answered 3/5, 2013 at 15:49 Comment(1)
Thanks so much for the detailed answer and I'm sorry it's taken me so long to respond. I've been away for the last couple of days. I'll give this a go later tonight and let you know how I get on!Brutality
S
1

write it as follows:

   "sum((5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE))), 
        (5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE))) ,
        (5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE))),
        (5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE))),
        (5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE))),
      (1*usefulness), (10*shares)) as score"
Subvention answered 4/5, 2013 at 17:10 Comment(0)
J
0

You need to SUM() the Score because ONE line only matches ONE Tag.

In your Query selected multiple Rows and grouped them by ID, so you are getting the Result for ONE Row only and that would always be 5 in your case.

Journalism answered 3/5, 2013 at 11:23 Comment(0)
S
0

I think your query is much too complicated. Try this:

SELECT
    results.*,
    5 * count(distinct tags.name) + 1*usefulness + 10*shares AS score 
FROM results
JOIN categories c on results.ID = c.RESULT_ID
    AND c.name in ('purchase', 'condo', 'va')
JOIN tags ON results.id = tags.result_id
    AND tags.name in ('self employed', 'rental income', 'commission income', 'bankruptcy', 'condo approval')
WHERE results.scope in ('all', 'hi')
AND published = 1
GROUP BY 1, 2, 3, 4, 5 -- list as many numbers here as there are columns in "results" 
HAVING count(distinct c.c_id) = 3
ORDER BY score DESC 
LIMIT 8 OFFSET 0

One key problem you had was the grouping - to make it work properly, you need to either name, or reference by selected position, all columns of the results table. You haven't given the table schema, so I couldn't know what to write. I guessed 5 columns, hence the GROUP BY 1, 2, 3, 4, 5, but you need to make sure this is correct.

I tidied up your ORs by changing them to INs - doing this will allow indexes to be used on these columns, if such indexes exist ("OR" won't use an index).

I have moved some of the WHERE clause conditions up into the JOIN conditions where it made sense - this should improve performance.

Skysail answered 9/5, 2013 at 14:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.