Frequent Itemsets & Association Rules - Apriori Algorithm

I'm trying to understand the fundamentals of the Apriori (Basket) Algorithm for use in data mining.

It's best if I explain the complication I'm having with an example:

Here is a transactional dataset:

t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes

The minsup for the above is 0.5 or 50%.

My number of transactions is 7, so for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up to 4). As such, this was my list of frequent 1-itemsets:

F1:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
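
To double-check the F1 counts above, here is a minimal Python sketch. The variable names (`transactions`, `min_count`, `f1`) are illustrative assumptions, not part of any particular library, and the threshold is taken to be the smallest whole count satisfying minsup.

```python
from collections import Counter
from math import ceil

# The seven transactions from the example, one set of items each.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
minsup = 0.5

# Smallest whole transaction count that satisfies minsup: ceil(0.5 * 7) = 4.
min_count = ceil(minsup * len(transactions))

# Count how many transactions contain each single item.
item_counts = Counter(item for t in transactions for item in t)

# F1: the single items whose count reaches the threshold.
f1 = {item: n for item, n in item_counts.items() if n >= min_count}
print(min_count, sorted(f1.items()))
```

Clothes (3) and Boots (1) fall below the threshold, which is why they do not appear in F1.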

I then created my candidates for the second refinement (C2) and narrowed them down to:

F2:

{Milk, Beer} = 4
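
The C2-to-F2 step can be reproduced with a short sketch, assuming the candidates are simply all pairs of frequent single items. The helper `support_count` is a hypothetical name introduced here for illustration.

```python
from itertools import combinations

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # ceil(0.5 * 7), as in the question
f1_items = ["Beer", "Cheese", "Chicken", "Milk"]  # frequent single items

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# C2: every pair built from the frequent single items.
c2 = [frozenset(pair) for pair in combinations(f1_items, 2)]

# Count each candidate, then keep only those that reach the threshold.
c2_counts = {c: support_count(c) for c in c2}
f2 = {c: n for c, n in c2_counts.items() if n >= min_count}

for candidate, n in c2_counts.items():
    print(sorted(candidate), n)
```

Of the six candidate pairs, only {Milk, Beer} reaches a count of 4; the next closest, {Chicken, Beer} and {Chicken, Cheese}, each appear in 3 transactions.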

This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the entries in F1 aren't "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and to calculate their "confidence" figures. I get this:

Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
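
These figures follow from the usual definition confidence(X -> Y) = support(X and Y together) / support(X). A minimal sketch, where `support_count` and `confidence` are illustrative helper names rather than functions from any library:

```python
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)

milk_to_beer = confidence({"Milk"}, {"Beer"})  # 4 / 4
beer_to_milk = confidence({"Beer"}, {"Milk"})  # 4 / 5
print(f"Milk -> Beer: {milk_to_beer:.0%}")
print(f"Beer -> Milk: {beer_to_milk:.0%}")
```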

It seems superfluous to put F1's itemsets in here, as they will all have a confidence of 100% regardless and don't actually "associate" anything. That is why I am now questioning whether the itemsets in F1 are indeed "frequent".

Cowled answered 6/1, 2013 at 15:11 Comment(1)
The empty set is also a set, and there are sets that have one element. These can be frequent itemsets without giving a useful association rule. - Nasal

Itemsets of size 1 are considered frequent if their support is high enough. But here you also have to consider the minimum threshold: if the minimum threshold in your example is 2, then F1 will not be considered, but if the minimum threshold is 1, then you have to include it.

You can take a look here and here for more ideas and examples.

Hope this helps.

Searchlight answered 6/1, 2013 at 16:47 Comment(2)
In this case the min threshold is not specified, so is it taken that the F1 items are frequent? And should they be represented in the "association rules" even though they associate to nothing other than themselves? - Cowled
Unfortunately, yes. But there's no use for Apriori without the min threshold, because it will lead to wrong rules. The min threshold is always determined by the data analyst. - Searchlight

If the minimum support threshold (minsup) is 4 / 7, then you should include single items in the set of frequent itemsets if they appear in no less than 4 transactions out of 7. So in your example, you should include them:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4

Association rules have the form X ==> Y, where X and Y are disjoint itemsets, and it is generally assumed that neither X nor Y is empty (this is what Apriori assumes). You therefore need at least two items to generate an association rule.
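
To illustrate why single items are frequent itemsets yet produce no rules, here is a small sketch that splits each frequent itemset into a non-empty X and a non-empty Y. The function name `rules_from` is a hypothetical helper chosen for this example, not part of Apriori itself or of any library.

```python
from itertools import combinations

def rules_from(itemset):
    """Yield every (X, Y) with X and Y non-empty, disjoint, and X | Y == itemset."""
    items = sorted(itemset)
    # r is the size of X; it stops below len(items) so that Y is never empty.
    for r in range(1, len(items)):
        for x in combinations(items, r):
            yield frozenset(x), frozenset(items) - frozenset(x)

# All frequent itemsets from the example: F1 plus F2.
frequent = [{"Milk"}, {"Chicken"}, {"Beer"}, {"Cheese"}, {"Milk", "Beer"}]

for itemset in frequent:
    rules = [(sorted(x), sorted(y)) for x, y in rules_from(itemset)]
    print(sorted(itemset), "->", rules)
```

The four single-item sets yield an empty list of rules, while {Milk, Beer} yields the two rules Milk ==> Beer and Beer ==> Milk.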

Interrogation answered 4/5, 2013 at 22:33 Comment(0)
