Association Rules
Association rules mining is the method for uncovering the most
important relationships between variables. Its main application is a
store basket analysis, which aims at discovery of a relationship
between groups of products with some level of confidence.
Details
The library provides Apriori algorithm for association rule mining
[Agrawal94].
Let
be a set of items
(products) and subset
is a transaction associated with item set
I. The association rule has the form:
, where
,
, and
intersection of
X
and Y
is empty:
. The lefthandside set of
items (itemset
) X
is called antecedent, while the righthandside
itemset Y is called consequent of the rule.Let
be a set of
transactions, each associated with item set I. Item subset
has
support
s
in the transaction set D
if s
percent of transactions in D
contains X
.The association rule
in the transaction set
D
holds with
confidence c
if c
percent of transactions in D
that contain X
also
contains Y
. Confidence of the rule can be represented as conditional
probability:For a given set of transactions
, the minimum support s and minimum confidence c discover
all item sets
X
with support greater than s
and generate all
association rules
with confidence greater than c
.Therefore, the association rule discovery is decomposed into two
stages: mining (training) and discovery (prediction). The mining
stage involves generation of large item sets, that is, the sets that
have support greater than the given parameters. At the discovery
stage, the algorithm generates association rules using the large item
sets identified at the mining stage.
Batch Processing
Algorithm Input
The association rules algorithm accepts the input described below.
Pass the
Input ID
as a parameter to the methods that provide input
for your algorithm.Input ID  Input 

data  Pointer to the
numeric table t with the mining data. Each row consists of two integers:
The input can be an object of any class derived from NumericTable except PackedTriangularMatrix and PackedSymmetricMatrix. 
Algorithm Parameters
The association rules algorithm has the following parameters:
Parameter  Default Value  Description 

algorithmFPType  float  The floatingpoint type that the algorithm uses for intermediate computations. Can be float or double . 
method  defaultDense  The computation method used by the algorithm. The only method supported so far is Apriori. 
minSupport  0.01  Minimal support, a number in the [0,1) interval. 
minConfidence  0.6  Minimal confidence, a number in the [0,1) interval. 
nUniqueItems  0  The total number of unique items. If set to zero, the library automatically determines the number of unique items from the input data. 
nTransactions  0  The total number of transactions. If set to zero, the library automatically determines the number transactions from the input data. 
discoverRules  true  A flag that enables generation of the rules from large item sets. 
itemsetsOrder  itemsetsUnsorted  The sort order of returned item sets:

rulesOrder  rulesUnsorted  The sort order of returned rules:

minItemsetSize  0  A parameter that defines the minimal size of item sets to be included into the array of results. The value of zero imposes no limitations on the minimal size of item sets. 
maxItemsetSize  0  A parameter that defines the maximal size of item sets to be included into the array of results. The value of zero imposes no limitations on the maximal size of item sets. 
Algorithm Output
The association rules algorithm calculates the result described
below. Pass the
Result ID
as a parameter to the methods that access
the results of your algorithm.Result ID  Result 

largeItemsets  Pointer to the numeric table with large item sets. The number of rows in
the table equals the number of items in the large item sets. Each row
contains two integers:

largeItemsetsSupport  Pointer to the
numeric table of support values. Each row contains two integers:

antecedentItemsets  Pointer to the
numeric table that contains the
lefthandside (X) part of the association rules. Each row contains two integers:

conseqentItemsets  Pointer to the
numeric table that contains the
righthandside (Y) part of the association rules. Each row contains two integers:

confidence  Pointer to the
numeric table that contains confidence values
of rules, floatingpoint numbers between 0 and 1. Confidence value in
the ith position corresponds to the rule with the index i. 
By default, the result is an object of the HomogenNumericTable class,
but you can define the result as an object of any class derived from
NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and
СSRNumericTable.
 The library requires transactions and items for each transaction to be passed in the ascending order.
 Numbering of rules starts at 0.
 The library calculates the sizes of numeric tables intended for results in a call to the algorithm. Avoid allocating the memory in numeric tables intended for results because, in general, it is impossible to accurately estimate the required memory size. If the memory interfaced by the numeric tables is allocated and its amount is insufficient to store the results, the algorithm returns an error.
Examples
C++ (CPU)
Batch Processing:
Java*
Python*
Batch Processing:
Performance Considerations
To get the best overall performance of the association rules
algorithm, whenever possible use the following numeric tables and
data types:
 A SOA numeric table of type int to store features.
 A homogenous numeric table of type int to store large item sets, support values, and lefthandside and righthandside parts of association rules.
 A numeric table with the confidence values of the same data type as specified in the algorithmFPType template parameter of the class.
Optimization Notice 

Intel’s compilers may or may not optimize to the same degree for
nonIntel microprocessors for optimizations that are not unique to
Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel.
Microprocessordependent optimizations in this product are intended
for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and
Reference Guides for more information regarding the specific
instruction sets covered by this notice. Notice revision #20110804 