select random value based on probability chance
Asked Answered
P

2

6

How do I select a random row from the database based on the probability chance assigned to each row.
Example:

Make        Chance  Value
ALFA ROMEO  0.0024  20000
AUDI        0.0338  35000
BMW         0.0376  40000
CHEVROLET   0.0087  15000
CITROEN     0.016   15000
........

How do I select random make name and its value based on the probability it has to be chosen.

Would a combination of rand() and ORDER BY work? If so what is the best way to do this?

Postilion answered 7/11, 2015 at 22:15 Comment(5)
Does the chance sum to 1? SELECT SUM(CHANCE) FROM table?Eduino
Maybe you find this useful: jan.kneschke.de/projects/mysql/order-by-rand and this: #4329896Raynard
@lad2025 yes, It sums to 1.0001Postilion
Check Random Weighted ChoiceEduino
How many records are there in the table that is having the record selected from it?Locris
M
8

You can do this by using rand() and then using a cumulative sum. Assuming they add up to 100%:

select t.*
from (select t.*, (@cumep := @cumep + chance) as cumep
      from t cross join
           (select @cumep := 0, @r := rand()) params
     ) t
where @r between cumep - chance and cumep
limit 1;

Notes:

  • rand() is called once in a subquery to initialize a variable. Multiple calls to rand() are not desirable.
  • There is a remote chance that the random number will be exactly on the boundary between two values. The limit 1 arbitrarily chooses 1.
  • This could be made more efficient by stopping the subquery when cumep > @r.
  • The values do not have to be in any particular order.
  • This can be modified to handle chances where the sum is not equal to 1, but that would be another question.
Mashhad answered 7/11, 2015 at 22:32 Comment(0)
H
0

I had the same requirements and was trying to write a query for SQL Sever. My answer is base on @gordon-linoff's query of using cumulative sum for probabilities.

with rand AS (SELECT id, random=RAND() FROM Tiers)
   , cumsum AS (SELECT id, cum_sum=SUM(probability) Over (Order By id) from Tiers)
Select TOP 1 t.id,
             t.name,
             t.probability
FROM Tiers t
         inner join rand r on t.id = r.id
         inner join cumsum c on t.id = c.id
WHERE c.cum_sum - r.random >= 0
ORDER BY c.cum_sum - r.random ASC

I have wrote a full summary to test this at https://github.com/AlahmadiQ8/cumulative-probability-sql

Here is a visual explanation of the cumulative sum probability

item probability cumulative
A 0.2 0.2
B 0.3 0.5
C 0.5 1

If random number x = 0.45, then most we should return item B because x is 0.2 < x <= 0.5.

                           x
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
           0.2               0.5                            1
└──────────┘ └───────────────┘ └───────────────────────────┘
      A              B                        C
Hemistich answered 23/5, 2023 at 6:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.