Difference between Closed and open Sequential Pattern Mining Algorithms
Asked Answered
D

3

27

I want to use some algorithms to mine my log data.

I found a pattern mining framework on: http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php

I have tried several algorithms, the BIDE+ algorithm performs the best.

The BIDE+ algorithm is for mining frequent closed sequential patterns from a sequence database.

Can someone explain the definition about "closed" sequential patterns and open ones?

Degrading answered 22/4, 2013 at 10:57 Comment(0)
H
53

Glad that you are using my SPMF software.

The support of a sequential pattern is the number of sequences that contains the sequential pattern.

A frequent sequential pattern is a pattern that appears in at least "minsup" sequences of a sequence database, where minsup is a parameter set by the user.

A frequent closed sequential pattern is a frequent sequential pattern such that it is not included in another sequential pattern having exactly the same support.

Algorithms such as PrefixSpan finds frequent sequential patterns. Algorithms such as BIDE+ finds frequent closed sequential patterns. BIDE+ is usually much faster than PrefixSpan because it uses pruning techniques to avoid generating all sequential patterns. Moreover, the set of closed patterns is usually much smaller than the set of sequential patterns so BIDE+ is also more memory efficient.

Another important thing to know is that closed sequential patterns are a compact and lossless representation of all sequential patterns. This means that the set of closed sequential patterns is usually much smaller but it is lossless, which means that it allows recovering the full set of sequential patterns (no information is loss), which is very convenient.

I can give you a simple example.

Let's consider 4 sequences:

a  b  c  d  e
a  b  d
b  e  a  
b  c  d  e

Let's say that minsup = 2.

b c is a frequent sequential patterns because it appears in two sequences (it has a support of 2). b c is not a closed sequential patterns because it is contained in a larger sequential pattern b c d having the same support.

b c d has a support of 2. It is also not a closed sequential pattern because it is contained in a larger sequential pattern b c d e having the same support. b c d e is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.

By the way, you can also check my survey about sequential pattern mining. It gives a good introduction about this topic and the different algorithms.

Horacehoracio answered 26/4, 2013 at 14:54 Comment(5)
This is going to help w my dissertation. Seriously. Thank you!Debonair
First of all,thanks for your survey and SPMF,and here your explanation is quite clear,but the example may be not quite appropriate since the pattern b c appears in three sequences(1,3,4), so a minor change may be better.Manakin
@Manakin Thanks for the comment. Yes, you are right. Fixed that error. Glad you like the survey and SPMF :-)Horacehoracio
Great explanation of the closed-open concept.Subglacial
Hi, what about sequential generator patterns? what's generator? I can't find any details about it by google.Gangway
L
2

Check out this chapter on Frequent Itemsets and Frequent item sets Mining & Association Rules

Londoner answered 25/4, 2013 at 18:42 Comment(0)
W
1

Google for "closed frequent itemsets". There will be plenty of pages explaining this, as will any data mining book (look for the APRIORI algorithm).

"Closed" says that there is no larger itemset with the same support. There can be larger itemsets, but they must have lower support.

For most use cases it is either sufficient to look at maximal or at closed itemsets only.

Washboard answered 23/4, 2013 at 7:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.