Association Rules

PMML 3.0 - Association Rules

The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.

The attribute definitions of the association rule model use the entity ELEMENT-ID in order to express a semantic constraint that a value must be unique in a set of elements (contained in the same XML document) of the same type.

An Association Rule model consists of four major parts:

  • Model attributes
  • Items
  • ItemSets
  • AssociationRules

  <xs:element name="AssociationModel">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="MiningSchema"/>
        <xs:element ref="ModelStats" minOccurs="0"/>
        <xs:element ref="LocalTransformations" minOccurs="0" />
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="Item" />
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="Itemset" />
          <xs:element ref="AssociationRule" />
        </xs:choice>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="modelName" type="xs:string" />
      <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
      <xs:attribute name="algorithmName" type="xs:string" />
      <xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required" />
      <xs:attribute name="maxNumberOfItemsPerTA" type="INT-NUMBER" />
      <xs:attribute name="avgNumberOfItemsPerTA" type="REAL-NUMBER" />
      <xs:attribute name="minimumSupport" type="PROB-NUMBER" use="required" />
      <xs:attribute name="minimumConfidence" type="PROB-NUMBER" use="required" />
      <xs:attribute name="lengthLimit" type="INT-NUMBER" />
      <xs:attribute name="numberOfItems" type="INT-NUMBER" use="required" />
      <xs:attribute name="numberOfItemsets" type="INT-NUMBER" use="required" />
      <xs:attribute name="numberOfRules" type="INT-NUMBER" use="required" />
    </xs:complexType>
  </xs:element>

There is at most one DerivedField in the TransformationDictionary in the model. This can be a transformed item field.

An AssociationModel can contain any number of Itemsets and AssociationRules. These elements can be mixed but all Itemsets that are used in an AssociationRule element must appear before the rule element.
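
The ordering constraint above can be checked mechanically. The following is a minimal sketch, not part of PMML itself. It assumes the model is available as an XML string without a namespace declaration, and the function name rules_reference_earlier_itemsets is hypothetical. The two sample fragments are stripped down for illustration and omit the required model attributes.

```python
import xml.etree.ElementTree as ET


def rules_reference_earlier_itemsets(model_xml: str) -> bool:
    """Return True if every AssociationRule refers only to Itemsets declared before it."""
    model = ET.fromstring(model_xml)
    seen_itemset_ids = set()
    for child in model:  # children are visited in document order
        if child.tag == "Itemset":
            seen_itemset_ids.add(child.get("id"))
        elif child.tag == "AssociationRule":
            if child.get("antecedent") not in seen_itemset_ids:
                return False
            if child.get("consequent") not in seen_itemset_ids:
                return False
    return True


# Illustrative fragments only (required model attributes are left out).
GOOD_ORDER = """<AssociationModel>
  <Itemset id="1"/>
  <Itemset id="2"/>
  <AssociationRule antecedent="1" consequent="2" support="0.5" confidence="0.5"/>
</AssociationModel>"""

BAD_ORDER = """<AssociationModel>
  <Itemset id="1"/>
  <AssociationRule antecedent="1" consequent="2" support="0.5" confidence="0.5"/>
  <Itemset id="2"/>
</AssociationModel>"""

print(rules_reference_earlier_itemsets(GOOD_ORDER))
print(rules_reference_earlier_itemsets(BAD_ORDER))
```

A document that declares a PMML namespace would need namespace-qualified tag names in the comparisons.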

Here is a description of the attributes:

numberOfTransactions: The number of transactions (baskets of items) contained in the input data.

maxNumberOfItemsPerTA: The number of items contained in the largest transaction.

avgNumberOfItemsPerTA: The average number of items contained in a transaction.

minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.

minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).

lengthLimit: The maximum number of items allowed in a rule. This limit was used to restrict the number of rules.

numberOfItems: The number of different items contained in the input data.

numberOfItemsets: The number of itemsets contained in the model.

numberOfRules: The number of rules contained in the model.
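
As an illustration of how the transaction-related attributes relate to the input data, here is a small sketch. The function name model_statistics and the sample baskets are assumptions made for this illustration. Transactions are represented as Python sets of item values.

```python
from typing import Dict, List, Set


def model_statistics(transactions: List[Set[str]]) -> Dict[str, float]:
    """Compute the transaction-related model attributes from raw baskets."""
    n = len(transactions)
    sizes = [len(t) for t in transactions]
    distinct_items = set().union(*transactions)
    return {
        "numberOfTransactions": n,
        "maxNumberOfItemsPerTA": max(sizes, default=0),
        "avgNumberOfItemsPerTA": sum(sizes) / n if n else 0.0,
        "numberOfItems": len(distinct_items),
    }


# Hypothetical input data: four baskets over three distinct items.
baskets = [
    {"Cracker", "Water"},
    {"Cracker", "Water", "Coke"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
]
stats = model_statistics(baskets)
print(stats)
```

The attributes numberOfItemsets and numberOfRules are not derived here because they depend on the mining result, not only on the input data.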

We consider items next:

  <xs:element name="Item">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required" />
      <xs:attribute name="value" type="xs:string" use="required" />
      <xs:attribute name="mappedValue" type="xs:string" />
      <xs:attribute name="weight" type="REAL-NUMBER" />
    </xs:complexType>
  </xs:element>

Here is a description of the attributes in an Item:

id: An identification to uniquely identify an item.

value: The value of the item as in the input data.

mappedValue: Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.

weight: The weight of the item. For example, the price or value of an item.

The id of an Item must be unique. Furthermore, the Item values must be unique too. That is, an AssociationModel must not have different instances of Item where the values of the attribute named 'value' are duplicates. The entries in mappedValue may be the same, though.
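
The uniqueness rules for Item can be verified with a short script. This is a sketch under stated assumptions: the model is a namespace-free XML string, the function name duplicate_item_keys is hypothetical, and the item values in the sample are made up.

```python
from collections import Counter
from typing import Dict, List
import xml.etree.ElementTree as ET


def duplicate_item_keys(model_xml: str) -> Dict[str, List[str]]:
    """List the 'id' and 'value' entries that occur on more than one Item."""
    model = ET.fromstring(model_xml)
    items = model.findall("Item")
    duplicates = {}
    for attribute in ("id", "value"):
        counts = Counter(item.get(attribute) for item in items)
        duplicates[attribute] = sorted(v for v, c in counts.items() if c > 1)
    return duplicates


# Made-up sample: ids are unique, one 'value' is repeated, and
# mappedValue repeats, which is allowed.
SAMPLE_ITEMS = """<AssociationModel>
  <Item id="1" value="EAN-001" mappedValue="Pen"/>
  <Item id="2" value="EAN-002" mappedValue="Pen"/>
  <Item id="3" value="EAN-001" mappedValue="Pencil"/>
</AssociationModel>"""

print(duplicate_item_keys(SAMPLE_ITEMS))
```

Two empty lists in the result indicate that the model satisfies both constraints.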

We consider itemsets next:

  <xs:element name="Itemset">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="ItemRef" />
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required" />
      <xs:attribute name="support" type="PROB-NUMBER" />
      <xs:attribute name="numberOfItems" type="xs:nonNegativeInteger" />
    </xs:complexType>
  </xs:element>

Here is a description of the attributes in an Itemset:

id: An identification to uniquely identify an itemset.

support: The relative support of the itemset.

support(set) = (number of transactions containing the set) / (total number of transactions)

numberOfItems: The number of items contained in this itemset.

ItemRef: Item references to point to elements of type Item.

  <xs:element name="ItemRef">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="itemRef" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>

The attribute itemRef contains the id value of the Item being referenced, as described above.
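
The relative support defined for an Itemset can be computed directly from the transactions. The following sketch assumes that transactions are given as sets of item values. The function name itemset_support and the sample baskets are hypothetical.

```python
from typing import Iterable, List, Set


def itemset_support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """support(set) = (transactions containing the set) / (total transactions)."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    wanted = frozenset(itemset)
    # A transaction supports the itemset if it contains every item of the set.
    containing = sum(1 for t in transactions if wanted <= t)
    return containing / len(transactions)


# Hypothetical input data.
sample_transactions = [
    {"Cracker", "Water"},
    {"Cracker", "Water", "Coke"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
]
print(itemset_support({"Cracker", "Water"}, sample_transactions))
print(itemset_support({"Coke"}, sample_transactions))
```

With a minimumSupport of 0.6, the single-item set containing Coke would not qualify as a frequent itemset in this sample.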

We consider association rules of the form "<antecedent itemset> => <consequent itemset>" next:

  <xs:element name="AssociationRule">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="antecedent" type="xs:string" use="required" />
      <xs:attribute name="consequent" type="xs:string" use="required" />
      <xs:attribute name="support" type="PROB-NUMBER" use="required" />
      <xs:attribute name="confidence" type="PROB-NUMBER" use="required" />
      <xs:attribute name="lift" type="xs:float" use="optional" />
      <xs:attribute name="id" type="xs:string" use="optional" />
    </xs:complexType>
  </xs:element>

Here is a description of the attributes in an AssociationRule:

antecedent: The id value of the itemset which is the antecedent of the rule. We represent the itemset by the letter A.

consequent: The id value of the itemset which is the consequent of the rule. We represent the itemset by the letter C.

support: The support of the rule, that is, the relative frequency of transactions that contain A and C.

support(A->C) = support(A+C)

confidence: The confidence of the rule.

confidence(A->C) = support(A+C) / support(A)

lift: The lift value of the rule. If the XML attribute is specified explicitly in the rule, the following equation must hold true.

lift(A->C) = confidence(A->C) / support(C)

id: An identification to uniquely identify an association rule.
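
The three formulas above translate directly into code. This is a minimal sketch. The function names rule_measures and lift_matches, the sample support values, and the relative tolerance used for comparing a stated lift are all assumptions of this illustration, not part of PMML.

```python
import math
from typing import Dict


def rule_measures(support_a: float, support_c: float, support_ac: float) -> Dict[str, float]:
    """Derive support, confidence and lift of A->C from three support values."""
    if support_a <= 0 or support_c <= 0:
        raise ValueError("support(A) and support(C) must be positive")
    confidence = support_ac / support_a   # confidence(A->C) = support(A+C) / support(A)
    lift = confidence / support_c         # lift(A->C) = confidence(A->C) / support(C)
    return {"support": support_ac, "confidence": confidence, "lift": lift}


def lift_matches(confidence: float, support_c: float, stated_lift: float,
                 rel_tol: float = 1e-6) -> bool:
    """Check a stated lift against confidence / support(C), within an assumed tolerance."""
    return math.isclose(stated_lift, confidence / support_c, rel_tol=rel_tol)


# Hypothetical values: support(A)=0.5, support(C)=0.25, support(A+C)=0.25.
measures = rule_measures(support_a=0.5, support_c=0.25, support_ac=0.25)
print(measures)
print(lift_matches(measures["confidence"], 0.25, stated_lift=2.0))
```

A tolerance is used because lift is stored as a floating-point value and may have been rounded when the model was written.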

A very popular measure of interestingness of a rule is lift. Lift values greater than 1.0 indicate that transactions containing A tend to contain C more often than transactions that do not contain A.

Another measure of interestingness is leverage. An association with higher frequency and lower lift may be more interesting than an alternative rule with lower frequency and higher lift. The former can be more important in practice because it applies to more cases. Leverage can be computed by

leverage(A->C) = support(A->C) - support(A)*support(C)

This is the difference between the observed frequency of A+C and the frequency that would be expected if A and C were independent.
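
To illustrate how lift and leverage can rank rules differently, here is a small sketch comparing two rules. The function names and all support values are made up for this example.

```python
def lift(support_a: float, support_c: float, support_ac: float) -> float:
    """lift(A->C) = (support(A+C) / support(A)) / support(C)."""
    return support_ac / support_a / support_c


def leverage(support_a: float, support_c: float, support_ac: float) -> float:
    """leverage(A->C) = support(A->C) - support(A) * support(C)."""
    return support_ac - support_a * support_c


# Hypothetical frequent rule: applies to many transactions, moderate lift.
frequent = {"support_a": 0.5, "support_c": 0.5, "support_ac": 0.375}
# Hypothetical rare rule: applies to few transactions, high lift.
rare = {"support_a": 0.0625, "support_c": 0.0625, "support_ac": 0.03125}

for name, rule in (("frequent", frequent), ("rare", rare)):
    print(name, "lift =", lift(**rule), "leverage =", leverage(**rule))
```

In this made-up data the rare rule has the higher lift, while the frequent rule has the higher leverage, which matches the observation that a rule covering more cases can be more important in practice.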

A number of other measures can be defined. PMML does not provide specific attributes in a rule for leverage and measures other than support, confidence, and lift. Further measures can usually be derived from the information in the PMML model. Note that confidence and lift can be derived from support values. The attributes confidence and lift have been included in PMML because they are very common.

Here is an example of an association model:

  <?xml version="1.0" ?>
  <PMML version="3.0" >
    <Header copyright=""
          description="example model for association rules"/>
    <DataDictionary numberOfFields="2" >
      <DataField name="transaction" optype="categorical" />
      <DataField name="item" optype="categorical" />
    </DataDictionary>
    <AssociationModel
         functionName="associationRules"
         numberOfTransactions="4" numberOfItems="3"
         minimumSupport="0.6"     minimumConfidence="0.5"
         numberOfItemsets="3"     numberOfRules="2">
      <MiningSchema>
        <MiningField name="transaction" usageType="group" />
        <MiningField name="item" usageType="predicted"/>
      </MiningSchema>

      <!-- We have three items in our input data -->
      <Item id="1" value="Cracker" />
      <Item id="2" value="Coke" />
      <Item id="3" value="Water" />

      <!-- and two frequent itemsets with a single item -->

      <Itemset id="1" support="1.0" numberOfItems="1">
        <ItemRef itemRef="1" />
      </Itemset>

      <Itemset id="2" support="1.0" numberOfItems="1">
        <ItemRef itemRef="3" />
      </Itemset>

      <!-- and one frequent itemset with two items. -->

      <Itemset id="3" support="1.0" numberOfItems="2">
        <ItemRef itemRef="1" />
        <ItemRef itemRef="3" />
      </Itemset>

      <!-- Two rules satisfy the requirements -->

      <AssociationRule support="1.0" confidence="1.0"
          antecedent="1" consequent="2" />

      <AssociationRule support="1.0" confidence="1.0"
          antecedent="2" consequent="1" />

    </AssociationModel>
  </PMML>
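
A consumer of such a model typically resolves the id references to item values and may recompute the derived measures. The following sketch does this for a well-formed fragment modeled on the example. It assumes a namespace-free AssociationModel element in which every Itemset carries a positive support attribute. The function name read_rules is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Fragment modeled on the example; only the AssociationModel element is used.
MODEL_XML = """<AssociationModel functionName="associationRules"
     numberOfTransactions="4" numberOfItems="3"
     minimumSupport="0.6" minimumConfidence="0.5"
     numberOfItemsets="3" numberOfRules="2">
  <Item id="1" value="Cracker"/>
  <Item id="2" value="Coke"/>
  <Item id="3" value="Water"/>
  <Itemset id="1" support="1.0" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
  <Itemset id="2" support="1.0" numberOfItems="1"><ItemRef itemRef="3"/></Itemset>
  <Itemset id="3" support="1.0" numberOfItems="2">
    <ItemRef itemRef="1"/><ItemRef itemRef="3"/>
  </Itemset>
  <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
  <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
</AssociationModel>"""


def read_rules(model_xml: str) -> List[Dict[str, object]]:
    """Resolve itemset ids to item values and recompute confidence and lift."""
    model = ET.fromstring(model_xml)
    item_values = {item.get("id"): item.get("value") for item in model.findall("Item")}
    itemsets = {}
    for itemset in model.findall("Itemset"):
        members = [item_values[ref.get("itemRef")] for ref in itemset.findall("ItemRef")]
        itemsets[itemset.get("id")] = (members, float(itemset.get("support")))
    rules = []
    for rule in model.findall("AssociationRule"):
        a_items, a_support = itemsets[rule.get("antecedent")]
        c_items, c_support = itemsets[rule.get("consequent")]
        support = float(rule.get("support"))
        confidence = support / a_support
        rules.append({
            "antecedent": a_items,
            "consequent": c_items,
            "support": support,
            "confidence": confidence,
            "lift": confidence / c_support,
        })
    return rules


for parsed in read_rules(MODEL_XML):
    print(parsed["antecedent"], "=>", parsed["consequent"], parsed)
```

Because the antecedent and consequent attributes hold Itemset ids, the first rule resolves to Cracker => Water, not to the Items with ids 1 and 2.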

