PMML 2.0 -- Association Rules

The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.

The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: a value must be unique among the elements of the same type contained in the same XML document.

     <!ENTITY % ELEMENT-ID "CDATA">
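
Because ELEMENT-ID is plain CDATA, a DTD validator cannot enforce the uniqueness constraint, so a consumer has to check it separately. The following Python is a minimal sketch of such a check using the standard library's ElementTree; the helper name `duplicate_ids` and the sample fragment are illustrative assumptions, not part of PMML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, tag):
    """Return the id values that occur more than once among elements named `tag`."""
    counts = Counter(el.get("id") for el in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical fragment: two Item elements share id "2".
# Item id "1" and Itemset id "1" do not clash, because uniqueness
# is only required among elements of the same type.
fragment = ET.fromstring(
    '<AssociationModel>'
    '<Item id="1" value="Cracker"/>'
    '<Item id="2" value="Coke"/>'
    '<Item id="2" value="Water"/>'
    '<Itemset id="1"><ItemRef itemRef="1"/></Itemset>'
    '</AssociationModel>'
)

item_duplicates = duplicate_ids(fragment, "Item")
itemset_duplicates = duplicate_ids(fragment, "Itemset")
print(item_duplicates)     # ['2']
print(itemset_duplicates)  # []
```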

An Association Rule model consists of four major parts, namely the mining schema, the items, the itemsets, and the association rules:

     <!ELEMENT AssociationModel (Extension*, MiningSchema,
          Item+, Itemset+, AssociationRule+, Extension*)>

     <!ATTLIST AssociationModel
         modelName               CDATA                    #IMPLIED
         functionName            %MINING-FUNCTION;        #REQUIRED
         algorithmName           CDATA                    #IMPLIED
         numberOfTransactions    %INT-NUMBER;             #REQUIRED
         maxNumberOfItemsPerTA   %INT-NUMBER;             #IMPLIED
         avgNumberOfItemsPerTA   %REAL-NUMBER;            #IMPLIED
         minimumSupport          %PROB-NUMBER;            #REQUIRED
         minimumConfidence       %PROB-NUMBER;            #REQUIRED
         lengthLimit             %INT-NUMBER;             #IMPLIED
         numberOfItems           %INT-NUMBER;             #REQUIRED
         numberOfItemsets        %INT-NUMBER;             #REQUIRED
         numberOfRules           %INT-NUMBER;             #REQUIRED
     >

Attribute description:

    numberOfTransactions: The number of transactions (baskets of items) contained in the input data.

    maxNumberOfItemsPerTA: The number of items contained in the largest transaction.

    avgNumberOfItemsPerTA: The average number of items contained in a transaction.

    minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.

    minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).

    lengthLimit: The maximum number of items a rule may contain. This limit was used to restrict the number of rules.

    numberOfItems: The number of different items contained in the input data.

    numberOfItemsets: The number of itemsets contained in the model.

    numberOfRules: The number of rules contained in the model.
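
As a minimal sketch of the support and confidence formulas given for minimumSupport and minimumConfidence, the following Python computes both from a list of transactions. The function names and the sample baskets are illustrative assumptions, not part of PMML.

```python
def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent: support(rule) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    rule_support = support(set(antecedent) | set(consequent), transactions)
    return rule_support / support(antecedent, transactions)

# Illustrative baskets (assumed sample data).
transactions = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]

coke_support = support({"Coke"}, transactions)                # 2 / 4 = 0.5
pair_support = support({"Cracker", "Water"}, transactions)    # 4 / 4 = 1.0
pair_conf = confidence({"Cracker"}, {"Water"}, transactions)  # 1.0 / 1.0 = 1.0
coke_conf = confidence({"Cracker"}, {"Coke"}, transactions)   # 0.5 / 1.0 = 0.5
```

With minimumSupport="0.6", the itemset {Coke} (support 0.5) would not qualify as frequent, while {Cracker, Water} (support 1.0) would.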

Items contained in itemsets:

     <!ELEMENT Item EMPTY>

     <!ATTLIST Item
          id                     %ELEMENT-ID;            #REQUIRED
          value                  CDATA                   #REQUIRED
          mappedValue            CDATA                   #IMPLIED
          weight                 %REAL-NUMBER;           #IMPLIED
     >

Attribute description:

    id: An identifier that uniquely identifies an item.

    value: The value of the item as in the input data.

    mappedValue: Optional. A value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN or SKU code.

    weight: The weight of the item, for example the price or value of an item.

Itemsets which are contained in rules:

     <!ELEMENT Itemset (Extension*, ItemRef+)>
     <!ATTLIST Itemset
          id                     %ELEMENT-ID;           #REQUIRED
          support                %PROB-NUMBER;          #IMPLIED
          numberOfItems          %INT-NUMBER;           #IMPLIED
     >

Attribute description:

    id: An identifier that uniquely identifies an itemset.

    support: The relative support of the itemset.

    numberOfItems: The number of items contained in this itemset.

    Subelements: Item references that point to elements of type Item.

     <!ELEMENT ItemRef EMPTY>

     <!ATTLIST ItemRef
          itemRef       %ELEMENT-ID;   #REQUIRED
     >

Attribute description:

    itemRef: The id value of an Item element.

Rules: Elements of the form
<antecedent itemset> => <consequent itemset>

     <!ELEMENT AssociationRule ( Extension* )>

     <!ATTLIST AssociationRule
          support          %PROB-NUMBER;    #REQUIRED
          confidence       %PROB-NUMBER;    #REQUIRED
          antecedent       %ELEMENT-ID;     #REQUIRED
          consequent       %ELEMENT-ID;     #REQUIRED
     >

Attribute description:

    support: The relative support of the rule.

    confidence: The confidence of the rule.

    antecedent: The id value of the itemset which is the antecedent of the rule.

    consequent: The id value of the itemset which is the consequent of the rule.
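
Since a rule only stores itemset ids, and an itemset only stores item ids, reading a rule means following two levels of references. The following Python is a minimal sketch of that lookup; the dictionary layout and the names `itemset_values` and `render` are hypothetical and only mirror the Item, Itemset, ItemRef and AssociationRule attributes described above.

```python
# Hypothetical in-memory form of the elements described above.
items = {"1": "Cracker", "2": "Coke", "3": "Water"}    # Item id -> value
itemsets = {"1": ["1"], "2": ["3"], "3": ["1", "3"]}   # Itemset id -> itemRef values
rules = [
    {"antecedent": "1", "consequent": "2", "support": 1.0, "confidence": 1.0},
    {"antecedent": "2", "consequent": "1", "support": 1.0, "confidence": 1.0},
]

def itemset_values(itemset_id):
    """Follow the itemRef values of an itemset to the item values."""
    return [items[ref] for ref in itemsets[itemset_id]]

def render(rule):
    """Format a rule as {antecedent items} => {consequent items}."""
    lhs = ", ".join(itemset_values(rule["antecedent"]))
    rhs = ", ".join(itemset_values(rule["consequent"]))
    return f"{{{lhs}}} => {{{rhs}}} (support={rule['support']}, confidence={rule['confidence']})"

rendered = [render(rule) for rule in rules]
for line in rendered:
    print(line)
# {Cracker} => {Water} (support=1.0, confidence=1.0)
# {Water} => {Cracker} (support=1.0, confidence=1.0)
```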


Let's assume we have four transactions with the following data:

    t1: Cracker, Coke, Water
    t2: Cracker, Water
    t3: Cracker, Water
    t4: Cracker, Coke, Water

     <?xml version="1.0" ?>
     <PMML version="2.0" >
     <Header copyright=""
          description="example model for association rules"/>
     <DataDictionary numberOfFields="2" >
     <DataField name="transaction" optype="categorical" />
     <DataField name="item" optype="categorical" />
     </DataDictionary>
     <AssociationModel
         functionName="associationRules"
         numberOfTransactions="4" numberOfItems="3"
         minimumSupport="0.6"     minimumConfidence="0.5"
         numberOfItemsets="3"     numberOfRules="2">
     <MiningSchema>
                <MiningField name="transaction"/>
                <MiningField name="item"/>
     </MiningSchema>

     <!-- We have three items in our input data -->
     <Item id="1" value="Cracker" />
     <Item id="2" value="Coke" />
     <Item id="3" value="Water" />

     <!-- and two frequent itemsets with a single item -->

     <Itemset id="1" support="1.0" numberOfItems="1">
        <ItemRef itemRef="1" />
     </Itemset>

     <Itemset id="2" support="1.0" numberOfItems="1">
        <ItemRef itemRef="3" />
     </Itemset>

     <!-- and one frequent itemset with two items. -->

     <Itemset id="3" support="1.0" numberOfItems="2">
        <ItemRef itemRef="1" />
        <ItemRef itemRef="3" />
     </Itemset>

     <!-- Two rules satisfy the requirements -->

     <AssociationRule support="1.0" confidence="1.0"
                      antecedent="1" consequent="2" />

     <AssociationRule support="1.0" confidence="1.0"
                      antecedent="2" consequent="1" />

     </AssociationModel>
     </PMML>
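
A consumer may want to confirm that a model like this one is internally consistent before using it. The following Python is a minimal sketch of such a check with the standard library's ElementTree. The embedded document is a compact, well-formed rendering of the example; the function `check_model`, its messages, and the `functionName` value shown are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PMML_TEXT = """
<PMML version="2.0">
  <Header copyright="" description="example model for association rules"/>
  <DataDictionary numberOfFields="2">
    <DataField name="transaction" optype="categorical"/>
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel functionName="associationRules"
      numberOfTransactions="4" numberOfItems="3"
      minimumSupport="0.6" minimumConfidence="0.5"
      numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction"/>
      <MiningField name="item"/>
    </MiningSchema>
    <Item id="1" value="Cracker"/>
    <Item id="2" value="Coke"/>
    <Item id="3" value="Water"/>
    <Itemset id="1" support="1.0" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1"><ItemRef itemRef="3"/></Itemset>
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1"/><ItemRef itemRef="3"/>
    </Itemset>
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>
"""

def check_model(model):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    item_ids = {el.get("id") for el in model.findall("Item")}
    itemsets = model.findall("Itemset")
    itemset_ids = {el.get("id") for el in itemsets}
    rules = model.findall("AssociationRule")

    # Declared counts must match the elements actually present in the model.
    if int(model.get("numberOfItemsets")) != len(itemsets):
        problems.append("numberOfItemsets does not match the Itemset elements")
    if int(model.get("numberOfRules")) != len(rules):
        problems.append("numberOfRules does not match the AssociationRule elements")

    # Every reference must point at an existing element.
    for ref in model.iter("ItemRef"):
        if ref.get("itemRef") not in item_ids:
            problems.append(f"ItemRef points at unknown Item {ref.get('itemRef')}")

    min_support = float(model.get("minimumSupport"))
    min_confidence = float(model.get("minimumConfidence"))
    for rule in rules:
        for side in ("antecedent", "consequent"):
            if rule.get(side) not in itemset_ids:
                problems.append(f"{side} points at unknown Itemset {rule.get(side)}")
        # All rules must satisfy the declared thresholds.
        if float(rule.get("support")) < min_support:
            problems.append("rule support below minimumSupport")
        if float(rule.get("confidence")) < min_confidence:
            problems.append("rule confidence below minimumConfidence")
    return problems

model = ET.fromstring(PMML_TEXT.strip()).find("AssociationModel")
problems = check_model(model)
print(problems)  # []
```

The sketch deliberately does not compare numberOfItems with the number of Item elements, since that attribute is defined in terms of the input data rather than the model contents.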
