Association Rules
PMML 2.0 -- Association Rules

The Association Rule model represents rules in which one set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.

The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: a value must be unique within the set of elements of the same type contained in the same XML document.


     <!ENTITY % ELEMENT-ID "CDATA">
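
Because ELEMENT-ID expands to plain CDATA, a DTD validator does not check this uniqueness constraint. A consumer that wants to verify it has to do so separately. The following Python sketch shows one possible check using only the standard library. The function name duplicate_ids and the sample fragment are illustrative assumptions and are not part of PMML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, tag):
    """Return the id values that occur more than once among <tag> elements."""
    ids = (el.get("id") for el in root.iter(tag))
    counts = Counter(i for i in ids if i is not None)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical fragment: two Item elements share id "1".
# The Itemset with id "1" is not a conflict, because uniqueness
# is only required among elements of the same type.
SAMPLE = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="1" value="Coke"/>
  <Itemset id="1"><ItemRef itemRef="1"/></Itemset>
</AssociationModel>
"""

root = ET.fromstring(SAMPLE)
print(duplicate_ids(root, "Item"))     # ['1']
print(duplicate_ids(root, "Itemset"))  # []
```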

An Association Rule model consists of four major parts:


     <!ELEMENT AssociationModel (Extension*, MiningSchema,
          Item+, Itemset+, AssociationRule+, Extension*)>

     <!ATTLIST AssociationModel
         modelName               CDATA                    #IMPLIED
         functionName            %MINING-FUNCTION;        #REQUIRED
         algorithmName           CDATA                    #IMPLIED
         numberOfTransactions    %INT-NUMBER;             #REQUIRED
         maxNumberOfItemsPerTA   %INT-NUMBER;             #IMPLIED
         avgNumberOfItemsPerTA   %REAL-NUMBER;            #IMPLIED
         minimumSupport          %PROB-NUMBER;            #REQUIRED
         minimumConfidence       %PROB-NUMBER;            #REQUIRED
         lengthLimit             %INT-NUMBER;             #IMPLIED
         numberOfItems           %INT-NUMBER;             #REQUIRED
         numberOfItemsets        %INT-NUMBER;             #REQUIRED
         numberOfRules           %INT-NUMBER;             #REQUIRED
     >

Attribute description:

    numberOfTransactions: The number of transactions (baskets of items) contained in the input data.

    maxNumberOfItemsPerTA: The number of items contained in the largest transaction.

    avgNumberOfItemsPerTA: The average number of items contained in a transaction.

    minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.

    minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).

    lengthLimit: The maximum number of items allowed in a rule. This limit was used to restrict the number of rules.

    numberOfItems: The number of different items contained in the input data.

    numberOfItemsets: The number of itemsets contained in the model.

    numberOfRules: The number of rules contained in the model.
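
The support and confidence formulas above, together with the per-transaction statistics, can be sketched in Python as follows. The baskets and all function and variable names are hypothetical and chosen only for illustration. Confidence is computed from transaction counts, which gives the same ratio as support(rule) / support(antecedent) because the total number of transactions cancels.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))

def relative_support(itemset, transactions):
    """#supporting transactions / #total transactions."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent), computed from counts."""
    rule_items = set(antecedent) | set(consequent)
    # Raises ZeroDivisionError if the antecedent never occurs.
    return count(rule_items, transactions) / count(antecedent, transactions)

# Hypothetical input data: five baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

number_of_transactions = len(transactions)                 # numberOfTransactions
max_items_per_ta = max(len(t) for t in transactions)       # maxNumberOfItemsPerTA
avg_items_per_ta = sum(len(t) for t in transactions) / number_of_transactions
number_of_items = len(set().union(*transactions))          # numberOfItems

print(number_of_transactions, max_items_per_ta, avg_items_per_ta, number_of_items)
# 5 3 2.0 3
print(relative_support({"bread", "milk"}, transactions))   # 0.6
print(confidence({"bread"}, {"milk"}, transactions))       # 0.75
```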

Items contained in itemsets:


     <!ELEMENT Item EMPTY>

     <!ATTLIST Item
          id                     %ELEMENT-ID;            #REQUIRED
          value                  CDATA                   #REQUIRED
          mappedValue            CDATA                   #IMPLIED
          weight                 %REAL-NUMBER;           #IMPLIED
     >

Attribute description:

    id: An identifier that uniquely identifies an item.

    value: The value of the item as in the input data.

    mappedValue: Optional. A value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN or SKU code.

    weight: The weight of the item, for example the price or value of the item.
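
A consumer that displays items might prefer mappedValue when it is present and fall back to value otherwise. The Python sketch below illustrates that choice and the handling of the optional weight attribute. The helper names, the sample values and the wrapping Items element are assumptions made only for this illustration. PMML defines no Items element.

```python
import xml.etree.ElementTree as ET

def display_label(item):
    """Prefer mappedValue when present; otherwise fall back to value."""
    return item.get("mappedValue", item.get("value"))

def item_weight(item, default=None):
    """Return weight as a float, or `default` when the attribute is absent."""
    raw = item.get("weight")
    return float(raw) if raw is not None else default

# Hypothetical items. The <Items> wrapper exists only so that the
# fragment has a single root element; it is not a PMML element.
items = ET.fromstring("""
<Items>
  <Item id="1" value="4006381333931" mappedValue="Cracker" weight="1.99"/>
  <Item id="2" value="Coke"/>
</Items>
""")

print([display_label(i) for i in items])  # ['Cracker', 'Coke']
print([item_weight(i) for i in items])    # [1.99, None]
```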


Itemsets contained in rules:


     <!ELEMENT Itemset (Extension*, ItemRef+)>
     <!ATTLIST Itemset
          id                     %ELEMENT-ID;           #REQUIRED
          support                %PROB-NUMBER;          #IMPLIED
          numberOfItems          %INT-NUMBER;           #IMPLIED
     >

Attribute description:

    id: An identifier that uniquely identifies an itemset.

    support: The relative support of the itemset.

    numberOfItems: The number of items contained in this itemset.

    Subelements: Item references that point to elements of type Item.


     <!ELEMENT ItemRef EMPTY>

     <!ATTLIST ItemRef
          itemRef       %ELEMENT-ID;   #REQUIRED
     >

Attribute description:

    itemRef: The id value of an Item element.
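
Resolving an itemset therefore means looking up each itemRef among the Item ids. The Python sketch below shows one way to do this and to cross-check the optional numberOfItems attribute. The function name resolve_itemsets and the fragment are illustrative assumptions. The fragment omits the MiningSchema and other required parts of a complete model.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete model fragment used only for this sketch.
FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="2" value="Coke"/>
  <Item id="3" value="Water"/>
  <Itemset id="3" numberOfItems="2">
    <ItemRef itemRef="1"/>
    <ItemRef itemRef="3"/>
  </Itemset>
</AssociationModel>
"""

def resolve_itemsets(model):
    """Map each Itemset id to the list of item values it references."""
    values_by_id = {item.get("id"): item.get("value") for item in model.iter("Item")}
    resolved = {}
    for itemset in model.iter("Itemset"):
        refs = [ref.get("itemRef") for ref in itemset.iter("ItemRef")]
        # A KeyError here means an ItemRef points at an Item that does not exist.
        values = [values_by_id[r] for r in refs]
        declared = itemset.get("numberOfItems")
        if declared is not None and int(declared) != len(values):
            raise ValueError(f"Itemset {itemset.get('id')}: numberOfItems mismatch")
        resolved[itemset.get("id")] = values
    return resolved

model = ET.fromstring(FRAGMENT)
print(resolve_itemsets(model))  # {'3': ['Cracker', 'Water']}
```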

Rules: Elements of the form
<antecedent itemset> => <consequent itemset>


     <!ELEMENT AssociationRule ( Extension* )>

     <!ATTLIST AssociationRule
          support          %PROB-NUMBER;    #REQUIRED
          confidence       %PROB-NUMBER;    #REQUIRED
          antecedent       %ELEMENT-ID;     #REQUIRED
          consequent       %ELEMENT-ID;     #REQUIRED
     >

Attribute description:

    support: The relative support of the rule.

    confidence: The confidence of the rule.

    antecedent: The id value of the itemset which is the antecedent of the rule.

    consequent: The id value of the itemset which is the consequent of the rule.
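
Since a rule only carries itemset ids, a reader has to follow two levels of references (rule to itemset, itemset to item) to see which items are involved. The Python sketch below renders each rule in the form antecedent => consequent. The function name describe_rules and the fragment are assumptions for illustration, and the fragment is not a complete model.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete model fragment used only for this sketch.
FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="3" value="Water"/>
  <Itemset id="1" support="1.0"><ItemRef itemRef="1"/></Itemset>
  <Itemset id="2" support="1.0"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
</AssociationModel>
"""

def describe_rules(model):
    """Return one readable line per AssociationRule."""
    items = {i.get("id"): i.get("value") for i in model.iter("Item")}
    itemsets = {
        s.get("id"): [items[r.get("itemRef")] for r in s.iter("ItemRef")]
        for s in model.iter("Itemset")
    }
    lines = []
    for rule in model.iter("AssociationRule"):
        lhs = ", ".join(itemsets[rule.get("antecedent")])
        rhs = ", ".join(itemsets[rule.get("consequent")])
        lines.append(
            f"{{{lhs}}} => {{{rhs}}} "
            f"(support={rule.get('support')}, confidence={rule.get('confidence')})"
        )
    return lines

model = ET.fromstring(FRAGMENT)
for line in describe_rules(model):
    print(line)  # {Cracker} => {Water} (support=1.0, confidence=1.0)
```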

Example:

Let's assume we have four transactions with the following data:

    t1: Cracker, Coke, Water
    t2: Cracker, Water
    t3: Cracker, Water
    t4: Cracker, Coke, Water


     <?xml version="1.0" ?>
     <PMML version="2.0" >
       <Header copyright="www.dmg.org"
               description="example model for association rules"/>
       <DataDictionary numberOfFields="2" >
         <DataField name="transaction" optype="categorical" />
         <DataField name="item" optype="categorical" />
       </DataDictionary>
       <AssociationModel
           functionName="associationRules"
           numberOfTransactions="4" numberOfItems="3"
           minimumSupport="0.6"     minimumConfidence="0.5"
           numberOfItemsets="3"     numberOfRules="2">
         <MiningSchema>
           <MiningField name="transaction"/>
           <MiningField name="item"/>
         </MiningSchema>

         <!-- We have three items in our input data -->
         <Item id="1" value="Cracker" />
         <Item id="2" value="Coke" />
         <Item id="3" value="Water" />

         <!-- and two frequent itemsets with a single item -->

         <Itemset id="1" support="1.0" numberOfItems="1">
           <ItemRef itemRef="1" />
         </Itemset>

         <Itemset id="2" support="1.0" numberOfItems="1">
           <ItemRef itemRef="3" />
         </Itemset>

         <!-- and one frequent itemset with two items. -->

         <Itemset id="3" support="1.0" numberOfItems="2">
           <ItemRef itemRef="1" />
           <ItemRef itemRef="3" />
         </Itemset>

         <!-- Two rules satisfy the requirements -->

         <AssociationRule support="1.0" confidence="1.0"
                          antecedent="1" consequent="2" />

         <AssociationRule support="1.0" confidence="1.0"
                          antecedent="2" consequent="1" />

       </AssociationModel>
     </PMML>
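
The counts in this example can be reproduced from the four transactions with a brute-force enumeration. The Python sketch below is only an illustration and does not represent the algorithm of any particular PMML producer. All names in it are assumptions. It enumerates every itemset, keeps those with support of at least 0.6, and then splits each frequent itemset into every possible antecedent and consequent, keeping the rules with confidence of at least 0.5.

```python
from itertools import combinations

transactions = [
    {"Cracker", "Coke", "Water"},   # t1
    {"Cracker", "Water"},           # t2
    {"Cracker", "Water"},           # t3
    {"Cracker", "Coke", "Water"},   # t4
]
MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.5

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)

items = sorted(set().union(*transactions))
n = len(transactions)

# Keep every non-empty itemset whose relative support meets the minimum.
frequent = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
    if count(c) / n >= MIN_SUPPORT
]

# Split each frequent itemset into antecedent => consequent.
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            rhs = itemset - set(lhs)
            conf = count(itemset) / count(lhs)
            if conf >= MIN_CONFIDENCE:
                rules.append((set(lhs), set(rhs), count(itemset) / n, conf))

# numberOfItems, numberOfItemsets, numberOfRules
print(len(items), len(frequent), len(rules))  # 3 3 2
```

Coke appears in only two of the four transactions, so its support of 0.5 falls below the minimum and no itemset containing it is kept.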