Multicore-enabling FP-tree Algorithm for Frequent Pattern Mining

Posted by Yuxiong He, originally on www.cilk.com, Thu, Jun 25, 2009

In data mining, association rule mining is a popular and well-researched method for discovering interesting relations between variables in large databases.  For example, the rule {onions, potatoes} => {beef} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy beef.  Such information can be used as the basis for decisions about marketing activities.  Frequent pattern mining is the basis of association rule mining.  Given a list of transactions, frequent pattern mining returns the complete set of item sets that occur in at least a threshold number of transactions.  One of the fastest and most popular algorithms for frequent pattern mining is the FP-tree [1] algorithm.  In this blog post, I describe how to parallelize the FP-tree algorithm using Cilk++.

The FP-tree algorithm builds a prefix-tree representation of a given database of transactions, and generates the complete set of frequent item sets through recursive tree projections.  Several algorithms [2-4] have been proposed in the literature to parallelize FP-tree.  The algorithm in [2] constructs multiple local parallel trees, interlinks the local counters, and conducts mining jointly on these trees.  The work in [3] classifies hot and cold items, forms a hot-node hash table, and divides the work manually for tree construction.  These algorithms use different approaches to achieve thread-level parallelism.  That said, they all have one thing in common – they require substantial changes to the sequential algorithm in order to exploit parallelism, avoid data races, and obtain good load balancing.  Below I describe a multicore-enabled parallel FP-tree implementation using Cilk++, which maintains the sequential semantics of the FP-tree algorithm and requires relatively minor changes to the sequential code.  On a 4-core machine, this implementation achieves a 2.6x speedup on tree building and a 3.3x speedup on frequent set generation in our tests on a 1M-transaction database.

The organization of this blog post is as follows.  First, I present a definition of the frequent pattern mining problem.  Then, I illustrate the FP-tree algorithm in three parts – preprocessing of transactions, tree construction, and pattern generation.  Sections 2 to 4 describe the algorithm as well as the parallelization of each of the three steps.  Section 5 covers performance.  The sequential implementation we used is from Christian Borgelt's web pages (http://www.borgelt.net//fpgrowth.html).  (The paper [5] describing this implementation is available on that page, and this post uses some of its examples to illustrate the sequential algorithm.)

Section 1. Frequent Pattern Mining Problem

Given a list of transactions, frequent pattern mining looks for sets of items that occur at least a predefined threshold number of times in these transactions.  More formally, it is defined [1] as follows:

Let I be a set of items, and let DB = {T1, T2, …, Tn} be a transaction database, where each transaction Ti contains a set of items in I.  The support (or occurrence frequency) of a pattern A, which is a set of items, is the number of transactions in DB that contain A.  A is a frequent pattern if A's support is no less than a predefined minimum support threshold S.  Given a transaction database DB and a minimum support threshold S, the problem of finding the complete set of frequent patterns is called the frequent pattern mining problem.
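The same definition in symbols (the notation follows [1]):

```latex
% Support of a pattern A over the transaction database DB = {T_1, ..., T_n}
\operatorname{support}(A) = \bigl|\{\, T_i \in DB : A \subseteq T_i \,\}\bigr|

% A is a frequent pattern iff its support meets the minimum support threshold S
A \text{ is frequent} \iff \operatorname{support}(A) \ge S
```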

 

Section 2. Preprocessing of Transactions

FP-tree preprocesses the transaction database as follows: in an initial scan, the frequencies of the individual items (the support of single-element item sets) are determined.  All infrequent individual items, i.e., all items that appear in fewer transactions than the user-defined threshold S, are discarded from the transactions, since they cannot be part of a frequent pattern.  In addition, the items in each transaction are sorted, so that they are in descending order with respect to their frequency in the database.  Each transaction is then recoded as an integer array, representing each item by its index.  Figure 1 shows an example of a transaction database before and after preprocessing.

Figure 1.  (a) A transaction database before preprocessing.  (b) Item frequency and index table.  The threshold value is 3, so items f and g are discarded because their frequency is less than the threshold.  (c) Reduced transaction database with the items in each transaction sorted in descending order of frequency.  (d) Reduced transaction database after recoding the transactions using the item indices.

Parallelizing the scan of the transactions in the database is dominated by I/O, and is beyond the scope of this post.  Parallelizing the sorting and recoding of the items in each transaction is straightforward, because the transactions are independent of one another.  As a result, all we need to do is change the "for" loop in the sequential code to a "cilk_for" loop, as sketched below.
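A minimal sketch of that change, assuming a simple vector-of-vectors transaction representation (these are not the actual data structures in Borgelt's implementation):

```cpp
#include <cilk/cilk.h>
#include <algorithm>
#include <vector>

// Hypothetical preprocessed representation: each transaction is a vector
// of item indices, and freq[i] is the global frequency of item i.
typedef std::vector<int> Transaction;

void sort_transactions(std::vector<Transaction>& db, const std::vector<int>& freq)
{
    // The transactions are independent, so the loop parallelizes directly:
    // the only change from the sequential code is "for" -> "cilk_for".
    cilk_for (int t = 0; t < (int)db.size(); ++t) {
        // Sort the items of transaction t by descending global frequency.
        std::sort(db[t].begin(), db[t].end(),
                  [&freq](int a, int b) { return freq[a] > freq[b]; });
    }
}
```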

 

Section 3. FP-Tree Construction

An FP-Tree is a prefix tree for transactions.  Each path of the tree represents a set of transactions that share the same prefix, and each node corresponds to one item.  In addition, all nodes referring to the same item are linked together in a list, so that all transactions containing a specific item can easily be found and counted by traversing this list.  This list is accessible through a head element, which also records the item’s total number of occurrences in the database.  Figure 2 shows the FP-Tree for the (reduced) database shown in Figure 1.
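To make the terminology concrete, a plausible (hypothetical) layout of the node and header structures might look like the following; the field names are illustrative and the actual implementation's types differ:

```cpp
// Hypothetical FP-tree node and header-list structures, shown only to
// make the description above concrete.
struct FPNode {
    int     item;    // item index this node is labeled with
    int     count;   // number of transactions sharing the prefix ending here
    FPNode* parent;  // link toward the root, used for upward traversal
    FPNode* succ;    // next node carrying the same item (header-list link)
    // child links omitted
};

struct FPHead {
    int     item;       // item this header entry refers to
    int     frequency;  // total occurrences of the item in the database
    FPNode* list;       // head of the linked list of nodes for this item
};
```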


Figure 2. FP-Tree for the reduced database shown in Figure 1.


In the sequential implementation, the initial FP-Tree is built from the preprocessed transaction database as a simple list of integer arrays (as shown in Figure 1(d)).  The FP-Tree construction includes two phases – sorting and building.    

In the sorting phase, the transactions are sorted lexicographically.  The sequential sort uses quicksort for large arrays and switches to insertion sort once the data has been sorted on a coarse scale.  Our parallel sort simply cilkifies quicksort by spawning the sorting of the sub-arrays, and uses insertion sort as the base case when the array is small.  Figure 3(a) and Figure 3(b) show the sequential and parallel sorting codes, respectively.


Figure 3(a) Sequential quicksort with insertion sort for small arrays


Figure 3(b) Parallel quicksort with insertion sort as base case.  (Major changes over the sequential version are highlighted in dark red.)
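In the spirit of Figure 3(b), here is a minimal self-contained sketch of the cilkified quicksort.  It sorts plain ints for brevity; the real code sorts recoded transactions and compares them lexicographically, and the cutoff value here is illustrative:

```cpp
#include <cilk/cilk.h>
#include <cstddef>
#include <utility>   // std::swap

static const std::size_t CUTOFF = 32;   // illustrative base-case threshold

static void insertion_sort(int* a, std::size_t n)
{
    for (std::size_t i = 1; i < n; ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
        a[j] = key;
    }
}

void parallel_qsort(int* a, std::size_t n)
{
    if (n <= CUTOFF) {               // small array: sequential base case
        insertion_sort(a, n);
        return;
    }
    // Simple last-element pivot partition (Lomuto scheme).
    int pivot = a[n - 1];
    std::size_t store = 0;
    for (std::size_t i = 0; i + 1 < n; ++i)
        if (a[i] < pivot) std::swap(a[i], a[store++]);
    std::swap(a[store], a[n - 1]);

    cilk_spawn parallel_qsort(a, store);            // left part, in parallel
    parallel_qsort(a + store + 1, n - store - 1);   // right part
    cilk_sync;                                      // join before returning
}
```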

In the building phase, the sorted list is turned into an FP-Tree using the recursive procedure _build (Figure 4(a)).  At recursion depth k, the k-th item in each transaction is used to split the database into sections, and a node of the FP-tree is created and labeled with that item.  Each section can be processed largely independently; the only races occur when the sections update the header list.  Therefore, in the parallel version (Figure 4(b)), we spawn off the work of each section to expose parallelism, and use p_FPTLIST_reducer (similar to a linked-list reducer supporting node insertion at the head) to represent the header list and prevent data races.  During the reduce operation, the linked-list reducer accumulates the weight and appends the item lists from the sections.  (To understand the basics of reducers and how to write one, please refer to the reducer section of the Cilk++ User Manual.)


Figure 4(a) Sequential FP-Tree construction



Figure 4(b) Parallel FP-Tree construction.  Major changes from the sequential version are highlighted in dark red.
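To give a feel for what p_FPTLIST_reducer does, here is a rough sketch of a header-list reducer built on the Cilk reducer library's monoid_base/reducer pattern.  The view type, field names, and exact headers are assumptions for illustration; the actual reducer in the implementation differs:

```cpp
#include <cilk/reducer.h>
#include <list>

// Illustrative header-list entry (not the real node type).
struct HeadEntry {
    int  item;
    long weight;
};

// View: each strand accumulates its own partial header list and weight.
struct HeadListView {
    std::list<HeadEntry> entries;
    long total_weight;
    HeadListView() : total_weight(0) {}
};

// Monoid: how two partial views are merged when parallel strands join.
// This mirrors the text above: accumulate the weight and append the item
// list of the right-hand section onto the left-hand one.
struct HeadListMonoid : cilk::monoid_base<HeadListView> {
    static void reduce(HeadListView* left, HeadListView* right) {
        left->total_weight += right->total_weight;
        left->entries.splice(left->entries.end(), right->entries);
    }
};

// Each spawned section of _build inserts into header_list.view() without
// racing against its siblings; the runtime calls reduce() where strands join.
cilk::reducer<HeadListMonoid> header_list;
```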

Section 4. Pattern Generation

The core operation of FP-tree pattern generation is to compute the conditional projection tree of the database, that is, the database of the transactions containing a specific item, with that item removed.  An example of computing a projection from the FP-tree of a database is shown in Figure 5.  In this example, we are computing the conditional projection tree based on item e.  From e's header list, we can see that there are three transactions containing item e.  Going upward from the nodes in e's header list to the root, we can draw the edges of the e-based conditional projection subtree as red links.  Including all the nodes connected by these red links and removing the nodes for item e, we get the conditional projection tree based on item e (Figure 5(b)).  This projection tree indicates that there are three conditional patterns {db, dca, bc} based on e in DB, i.e., three transactions in DB that include e: {dbe, dcae, bce}.

[Figure 5a]

[Figure 5b]
Figure 5(a) Computing item-e-based conditional projection tree from the database shown in Figure 2. (b) The resulting conditional projection tree with base e.


For pattern generation, the FP-Tree of the database is projected recursively.  The pseudocode for pattern generation is shown in Figure 6.  To parallelize it, we can unwrap the outermost loop and run the tree projections for different items in parallel.  If tree projection were thread-safe, our work here would be done.  Intuitively, the tasks of projecting the same FP-tree over different items are independent, because the FP-Tree is the only shared object and projection should only involve reading the tree nodes.  However, the sequential implementation (Figure 7(a)) modifies member variables of the tree nodes, which results in data races under concurrent tree projection.  Our parallel implementation introduces thread-local storage for tree projection to make it thread-safe (see Figure 7(b)).

Figure 6. Pseudocode for FP-Tree pattern generation
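A minimal sketch of the parallelized outermost loop; the tree type and the helper functions here are hypothetical stand-ins for the real projection and recursion code:

```cpp
#include <cilk/cilk.h>
#include <vector>

struct FPTree;   // opaque here; the real tree type lives in the implementation

// Hypothetical helpers standing in for the real code:
//   project(tree, item)    builds the conditional projection tree for item
//   report(prefix, item)   records a frequent pattern
//   mine(tree, prefix)     recursively mines a (projected) tree
FPTree* project(const FPTree* tree, int item);
void    report(const std::vector<int>& prefix, int item);
void    mine(const FPTree* tree, const std::vector<int>& prefix);

void generate_patterns(const FPTree* tree, int num_frequent_items)
{
    // Outermost level: projections of the same tree over different items
    // are independent (once projection is thread-safe), so run them in
    // parallel by turning the outer "for" into "cilk_for".
    cilk_for (int item = 0; item < num_frequent_items; ++item) {
        std::vector<int> prefix;
        report(prefix, item);                 // the single-item pattern
        FPTree* cond = project(tree, item);   // conditional tree w.r.t. item
        prefix.push_back(item);
        mine(cond, prefix);                   // recurse on the projection
    }
}
```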

 


Figure 7(a) Tree projection in sequential code


Figure 7(b) Thread-safe tree projection in parallel code.  Major changes over the sequential version are highlighted in dark red.
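One way to realize the thread-local scratch storage, in the spirit of Figure 7(b): instead of stashing per-projection values in the shared tree nodes, keep them in per-worker arrays indexed by a node id.  The worker-query calls (__cilkrts_get_nworkers, __cilkrts_get_worker_number) are from the Cilk runtime API; the scratch layout and the dense node-id assumption are mine, not the actual fix in the code:

```cpp
#include <cilk/cilk_api.h>   // __cilkrts_get_nworkers, __cilkrts_get_worker_number
#include <cstddef>
#include <vector>

// Per-worker scratch replacing the "scratch field inside the tree node"
// that causes races in the sequential projection code.  Assumes each tree
// node carries a dense integer id, and that a single item's projection
// does not itself spawn (so the worker number is stable while it runs).
struct ProjectionScratch {
    std::vector< std::vector<int> > counts;   // counts[worker][node_id]

    explicit ProjectionScratch(std::size_t num_nodes)
        : counts(__cilkrts_get_nworkers(), std::vector<int>(num_nodes, 0)) {}

    // Scratch slot for this node on the calling worker: no other worker
    // touches this slot, so concurrent projections no longer race.
    int& slot(std::size_t node_id) {
        return counts[__cilkrts_get_worker_number()][node_id];
    }
};
```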

 

The changes indicated above are all we need to parallelize the FP-Tree algorithm using Cilk++ – no reinventing a parallel algorithm, no rewriting a parallel implementation from scratch, just one reducer and a few Cilk++ keywords.  With this modest effort, a considerably complex algorithm is now multicore-enabled.

Section 5. Results

We tested the performance of this multicore-enabled FP-Tree implementation using a 1M-transaction database on a 16-core AMD Opteron machine.  As shown in Figure 8, on 4 cores, the implementation gets a 2.6x speedup on tree building and a 3.3x speedup on frequent set generation; on 8 cores, it gets 3.0x and 4.8x; on 16 cores, it gets 3.6x and 6.6x.  The speedup on a small number of cores is not bad, though it does not scale linearly to 16 cores – primarily due to a memory bandwidth limitation.  As I mentioned in a previous blog post, a simple test to explore whether a multithreaded program has a memory bandwidth problem is to run the sequential version of the program in two configurations: (1) run one copy of the sequential program on each core, and (2) run a single copy of the sequential program on the whole machine.  If the program's execution time in case (1) is significantly larger than in case (2), the memory bandwidth demand of the program may be high enough to become the primary constraint on scalability.  When we run the sequential version of the FP-Tree test case, the execution time in case (1) is approximately 2.4 times the execution time in case (2).  This indicates that the application has a high memory bandwidth requirement, and explains its sub-linear speedup on larger numbers of cores.  Moreover, there is literature [3] discussing the high cache miss rate of the FP-tree algorithm and proposing variants of the original algorithm with better cache behavior.  It would be interesting to parallelize some cache-aware versions of the FP-tree algorithm using Cilk++ and observe their scaling.


Figure 8.  Speedup of FP-tree implementation on 16-core machine

References

[1] J. Han, J. Pei, and Y. Yin. "Mining Frequent Patterns without Candidate Generation." In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 2000.
[2] O. Zaiane, M. El-Hajj, and P. Lu. "Fast Parallel Association Rule Mining without Candidacy Generation." In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM '01), 2001.
[3] L. Liu, E. Li, Y. Zhang, and Z. Tang. "Optimization of Frequent Itemset Mining on Multiple-Core Processor." In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), 2007.
[4] H. D. K. Moonesinghe, M.-J. Chung, and P.-N. Tan. "Fast Parallel Mining of Frequent Itemsets." Michigan State University.
[5] C. Borgelt. "An Implementation of the FP-growth Algorithm." In Workshop on Open Source Data Mining Software, 2005.

