PMML 2.1 - Association Rules
The Association Rule model represents rules where some set of items is associated to another set of items. For example a rule can express that a certain product is often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: a value must be unique among all elements of the same type contained in the same XML document.
An Association Rule model consists of four major parts:
- Model attributes
- Items
- Itemsets
- AssociationRules
<xs:element name="AssociationModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      <xs:element ref="MiningSchema" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Item" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Itemset" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="AssociationRule" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" />
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
    <xs:attribute name="algorithmName" type="xs:string" />
    <xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required" />
    <xs:attribute name="maxNumberOfItemsPerTA" type="INT-NUMBER" />
    <xs:attribute name="avgNumberOfItemsPerTA" type="REAL-NUMBER" />
    <xs:attribute name="minimumSupport" type="PROB-NUMBER" use="required" />
    <xs:attribute name="minimumConfidence" type="PROB-NUMBER" use="required" />
    <xs:attribute name="lengthLimit" type="INT-NUMBER" />
    <xs:attribute name="numberOfItems" type="INT-NUMBER" use="required" />
    <xs:attribute name="numberOfItemsets" type="INT-NUMBER" use="required" />
    <xs:attribute name="numberOfRules" type="INT-NUMBER" use="required" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes:
numberOfTransactions: The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA: The number of items contained in the largest transaction.
avgNumberOfItemsPerTA: The average number of items contained in a transaction.
minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).
lengthLimit: The maximum number of items allowed in a rule. This limit was applied to restrict the number of rules.
numberOfItems: The number of different items contained in the input data.
numberOfItemsets: The number of itemsets contained in the model.
numberOfRules: The number of rules contained in the model.
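As a worked illustration of the support and confidence definitions above, consider a rule with antecedent itemset A and consequent itemset C. The symbols N (total number of transactions) and n(X) (number of transactions containing every item of itemset X) are notation assumed here, not part of PMML, and the counts are invented. The sketch reads "supporting transactions" as those containing all items of both antecedent and consequent, which is the usual convention.

```latex
\[
\mathrm{support}(A \Rightarrow C) = \frac{n(A \cup C)}{N},
\qquad
\mathrm{confidence}(A \Rightarrow C) = \frac{\mathrm{support}(A \Rightarrow C)}{\mathrm{support}(A)}
\]
% Hypothetical counts: N = 4, n(A) = 4, n(A \cup C) = 3
\[
\mathrm{support}(A) = \frac{4}{4} = 1.0,
\qquad
\mathrm{support}(A \Rightarrow C) = \frac{3}{4} = 0.75,
\qquad
\mathrm{confidence}(A \Rightarrow C) = \frac{0.75}{1.0} = 0.75
\]
```

A rule with these values would be kept by a model with minimumSupport="0.6" and minimumConfidence="0.5", since 0.75 meets both thresholds.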
We consider items next:
<xs:element name="Item">
  <xs:complexType>
    <xs:attribute name="id" type="xs:string" use="required" />
    <xs:attribute name="value" type="xs:string" use="required" />
    <xs:attribute name="mappedValue" type="xs:string" />
    <xs:attribute name="weight" type="REAL-NUMBER" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes in an item:
id: An identification to uniquely identify an item.
value: The value of the item as in the input data.
mappedValue: Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
weight: The weight of the item, for example its price or value.
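To illustrate the two optional attributes, the following sketch shows an item whose input value is an EAN code that is mapped to a product name, with the price used as weight. The code, name, and price are invented for illustration.

```xml
<!-- Hypothetical values: the EAN code, product name, and price are made up -->
<Item id="1" value="4001234567890" mappedValue="Cracker" weight="1.99" />
```

Only id and value are required, so the same item could also be written without mappedValue and weight.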
We consider itemsets next:
<xs:element name="Itemset">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="ItemRef" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required" />
    <xs:attribute name="support" type="PROB-NUMBER" />
    <xs:attribute name="numberOfItems" type="INT-NUMBER" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes and child elements of an itemset:
id: An identification to uniquely identify an itemset.
support: The relative support of the itemset.
numberOfItems: The number of items contained in this itemset.
ItemRef: A reference to an element of type Item. An itemset contains one ItemRef for each of its items.
<xs:element name="ItemRef">
  <xs:complexType>
    <xs:attribute name="itemRef" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
The attribute itemRef contains the id value of the Item that is referenced.
We consider association rules of the form "<antecedent itemset> => <consequent itemset>" next:
<xs:element name="AssociationRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="support" type="PROB-NUMBER" use="required" />
    <xs:attribute name="confidence" type="PROB-NUMBER" use="required" />
    <xs:attribute name="antecedent" type="xs:string" use="required" />
    <xs:attribute name="consequent" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes in an AssociationRule:
support: The relative support of the rule.
confidence: The confidence of the rule.
antecedent: The id value of the itemset which is the antecedent of the rule.
consequent: The id value of the itemset which is the consequent of the rule.
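Because antecedent and consequent hold itemset ids rather than item ids, reading a rule takes two lookups: from the rule to an itemset, then from the itemset to its items. The fragment below is a minimal sketch with made-up ids and numbers, chosen so that itemset ids (10, 20) cannot be confused with item ids (1, 2).

```xml
<!-- Hypothetical fragment: all ids, values, and numbers are invented -->
<Item id="1" value="Cracker" />
<Item id="2" value="Water" />
<Itemset id="10" support="1.0" numberOfItems="1">
  <ItemRef itemRef="1" />
</Itemset>
<Itemset id="20" support="0.75" numberOfItems="1">
  <ItemRef itemRef="2" />
</Itemset>
<!-- Reads as {Cracker} => {Water}; confidence = 0.75 / 1.0 = 0.75 -->
<AssociationRule support="0.75" confidence="0.75"
                 antecedent="10" consequent="20" />
```

Here antecedent="10" resolves to the itemset containing item 1 (Cracker), and consequent="20" resolves to the itemset containing item 2 (Water).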
Here is an example of an association model:
<?xml version="1.0" ?>
<PMML version="2.1" >
  <Header copyright="www.dmg.org"
          description="example model for association rules"/>
  <DataDictionary numberOfFields="2" >
    <DataField name="transaction" optype="categorical" />
    <DataField name="item" optype="categorical" />
  </DataDictionary>
  <AssociationModel
      functionName="associationRules"
      numberOfTransactions="4" numberOfItems="3"
      minimumSupport="0.6" minimumConfidence="0.5"
      numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction"/>
      <MiningField name="item"/>
    </MiningSchema>
    <!-- We have three items in our input data -->
    <Item id="1" value="Cracker" />
    <Item id="2" value="Coke" />
    <Item id="3" value="Water" />
    <!-- and two frequent itemsets with a single item -->
    <Itemset id="1" support="1.0" numberOfItems="1">
      <ItemRef itemRef="1" />
    </Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1">
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- and one frequent itemset with two items. -->
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1" />
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- Two rules satisfy the requirements -->
    <AssociationRule support="1.0" confidence="1.0"
                     antecedent="1" consequent="2" />
    <AssociationRule support="1.0" confidence="1.0"
                     antecedent="2" consequent="1" />
  </AssociationModel>
</PMML>