PMML 2.1 - Association Rules
The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express the semantic constraint that a value must be unique among the elements of the same type contained in the same XML document.
An Association Rule model consists of four major parts:
- Model attributes
- Items
- Itemsets
- AssociationRules
<xs:element name="AssociationModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      <xs:element ref="MiningSchema" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Item" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Itemset" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="AssociationRule" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" />
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
    <xs:attribute name="algorithmName" type="xs:string" />
    <xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required" />
    <xs:attribute name="maxNumberOfItemsPerTA" type="INT-NUMBER" />
    <xs:attribute name="avgNumberOfItemsPerTA" type="REAL-NUMBER" />
    <xs:attribute name="minimumSupport" type="PROB-NUMBER" use="required" />
    <xs:attribute name="minimumConfidence" type="PROB-NUMBER" use="required" />
    <xs:attribute name="lengthLimit" type="INT-NUMBER" />
    <xs:attribute name="numberOfItems" type="INT-NUMBER" use="required" />
    <xs:attribute name="numberOfItemsets" type="INT-NUMBER" use="required" />
    <xs:attribute name="numberOfRules" type="INT-NUMBER" use="required" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes:
numberOfTransactions: The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA: The number of items contained in the largest transaction.
avgNumberOfItemsPerTA: The average number of items contained in a transaction.
minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as (support(rule) / support(antecedent)).
lengthLimit: The maximum number of items contained in a rule. This limit was used to restrict the number of rules.
numberOfItems: The number of different items contained in the input data.
numberOfItemsets: The number of itemsets contained in the model.
numberOfRules: The number of rules contained in the model.
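The support and confidence formulas above, together with the transaction statistics, can be sketched in a few lines of Python. This is only an illustration: the baskets are hypothetical input data (the specification does not list any), and the function names `relative_support` and `confidence` are made up for this sketch.

```python
def relative_support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs in the data.
    """
    rule_items = frozenset(antecedent) | frozenset(consequent)
    return (relative_support(rule_items, transactions)
            / relative_support(antecedent, transactions))


# Hypothetical input data: four baskets over three distinct items.
transactions = [
    frozenset({"Cracker", "Water"}),
    frozenset({"Cracker", "Water", "Coke"}),
    frozenset({"Cracker", "Water"}),
    frozenset({"Cracker", "Water"}),
]

# Values that would populate the corresponding model attributes.
number_of_transactions = len(transactions)
max_items_per_ta = max(len(basket) for basket in transactions)
avg_items_per_ta = sum(len(basket) for basket in transactions) / len(transactions)
number_of_items = len(frozenset().union(*transactions))

print(number_of_transactions, max_items_per_ta, avg_items_per_ta, number_of_items)
print(relative_support({"Coke"}, transactions))
print(confidence({"Cracker"}, {"Water"}, transactions))
```

With a minimumSupport of 0.6, the itemset {Coke} in this made-up data would fall below the threshold, while {Cracker}, {Water}, and {Cracker, Water} would not.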
We consider items next:
<xs:element name="Item">
  <xs:complexType>
    <xs:attribute name="id" type="xs:string" use="required" />
    <xs:attribute name="value" type="xs:string" use="required" />
    <xs:attribute name="mappedValue" type="xs:string" />
    <xs:attribute name="weight" type="REAL-NUMBER" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes in an Item:
id: An identifier that uniquely identifies the item.
value: The value of the item as in the input data.
mappedValue: Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
weight: The weight of the item. For example, the price or value of an item.
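As a rough sketch of how a consumer might use the optional attributes, the following Python reads an Item with the standard library's `xml.etree.ElementTree`. The helper names `item_label` and `item_weight` are hypothetical, and the EAN-like code, product name, and weight are made-up sample values.

```python
import xml.etree.ElementTree as ET


def item_label(item):
    """Prefer mappedValue when present, otherwise fall back to value."""
    return item.get("mappedValue", item.get("value"))


def item_weight(item):
    """Return the weight as a float, or None when the attribute is absent."""
    raw = item.get("weight")
    return float(raw) if raw is not None else None


# Hypothetical items; the code, name, and weight are illustrative only.
with_mapping = ET.fromstring(
    '<Item id="1" value="4006381333931" mappedValue="Pencil" weight="0.99" />'
)
without_mapping = ET.fromstring('<Item id="2" value="Coke" />')

print(item_label(with_mapping), item_weight(with_mapping))
print(item_label(without_mapping), item_weight(without_mapping))
```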
We consider itemsets next:
<xs:element name="Itemset">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="ItemRef" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required" />
    <xs:attribute name="support" type="PROB-NUMBER" />
    <xs:attribute name="numberOfItems" type="INT-NUMBER" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes and child elements in an Itemset:
id: An identifier that uniquely identifies the itemset.
support: The relative support of the itemset.
numberOfItems: The number of items contained in this itemset.
ItemRef: A child element (not an attribute) that references an element of type Item.
<xs:element name="ItemRef">
  <xs:complexType>
    <xs:attribute name="itemRef" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
The attribute itemRef contains the id value of the Item element being referenced.
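Resolving the references amounts to a lookup from Item id to Item value. The sketch below uses Python's `xml.etree.ElementTree` on a shortened fragment; the fragment omits the MiningSchema and the required model attributes for brevity, so it is not a valid model on its own, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, non-validating fragment used only to show reference resolution.
FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Cracker" />
  <Item id="2" value="Coke" />
  <Item id="3" value="Water" />
  <Itemset id="3" support="1.0" numberOfItems="2">
    <ItemRef itemRef="1" />
    <ItemRef itemRef="3" />
  </Itemset>
</AssociationModel>
"""

model = ET.fromstring(FRAGMENT)

# Index the items by id so each ItemRef can be resolved with one lookup.
items_by_id = {item.get("id"): item.get("value") for item in model.findall("Item")}

resolved = {}
for itemset in model.findall("Itemset"):
    values = [items_by_id[ref.get("itemRef")] for ref in itemset.findall("ItemRef")]
    resolved[itemset.get("id")] = values
    # When numberOfItems is given, it should match the number of references.
    declared = itemset.get("numberOfItems")
    if declared is not None and int(declared) != len(values):
        print("itemset", itemset.get("id"), "declares", declared, "items")

print(resolved)
```

A reference to an id that has no matching Item raises `KeyError` here, which is one simple way to surface a dangling reference.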
We consider association rules of the form "<antecedent itemset> => <consequent itemset>" next:
<xs:element name="AssociationRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="support" type="PROB-NUMBER" use="required" />
    <xs:attribute name="confidence" type="PROB-NUMBER" use="required" />
    <xs:attribute name="antecedent" type="xs:string" use="required" />
    <xs:attribute name="consequent" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
Here is a description of the attributes in an AssociationRule:
support: The relative support of the rule.
confidence: The confidence of the rule.
antecedent: The id value of the itemset which is the antecedent of the rule.
consequent: The id value of the itemset which is the consequent of the rule.
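Because antecedent and consequent hold itemset ids rather than item values, printing a rule in the "antecedent => consequent" form takes two lookups: rule to itemset, then itemset to items. A minimal Python sketch follows, assuming a shortened fragment without the required model attributes; `render_rules` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, non-validating fragment used only to show rule resolution.
FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Cracker" />
  <Item id="3" value="Water" />
  <Itemset id="1"><ItemRef itemRef="1" /></Itemset>
  <Itemset id="2"><ItemRef itemRef="3" /></Itemset>
  <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2" />
  <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1" />
</AssociationModel>
"""


def render_rules(model):
    """Return each rule as 'items => items' using item values."""
    items = {i.get("id"): i.get("value") for i in model.findall("Item")}
    itemsets = {
        s.get("id"): [items[r.get("itemRef")] for r in s.findall("ItemRef")]
        for s in model.findall("Itemset")
    }
    lines = []
    for rule in model.findall("AssociationRule"):
        lhs = ", ".join(itemsets[rule.get("antecedent")])
        rhs = ", ".join(itemsets[rule.get("consequent")])
        lines.append(f"{lhs} => {rhs}")
    return lines


rendered = render_rules(ET.fromstring(FRAGMENT))
for line in rendered:
    print(line)
```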
Here is an example of an association model:
<?xml version="1.0" ?>
<PMML version="2.0" >
  <Header copyright="www.dmg.org" description="example model for association rules"/>
  <DataDictionary numberOfFields="2" >
    <DataField name="transaction" optype="categorical" />
    <DataField name="item" optype="categorical" />
  </DataDictionary>
  <AssociationModel functionName="associationRules" numberOfTransactions="4" numberOfItems="3" minimumSupport="0.6" minimumConfidence="0.5" numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction"/>
      <MiningField name="item"/>
    </MiningSchema>
    <!-- We have three items in our input data -->
    <Item id="1" value="Cracker" />
    <Item id="2" value="Coke" />
    <Item id="3" value="Water" />
    <!-- and two frequent itemsets with a single item -->
    <Itemset id="1" support="1.0" numberOfItems="1">
      <ItemRef itemRef="1" />
    </Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1">
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- and one frequent itemset with two items. -->
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1" />
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- Two rules satisfy the requirements -->
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2" />
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1" />
  </AssociationModel>
</PMML>
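The declared counts and thresholds of such a model can be cross-checked against its contents. The following Python sketch does this for the AssociationModel element of the example; `check_model` is a hypothetical helper, not part of PMML. It compares only numberOfItemsets and numberOfRules with element counts, since numberOfItems describes the input data rather than the model contents.

```python
import xml.etree.ElementTree as ET

MODEL_XML = """
<AssociationModel functionName="associationRules" numberOfTransactions="4"
    numberOfItems="3" minimumSupport="0.6" minimumConfidence="0.5"
    numberOfItemsets="3" numberOfRules="2">
  <Item id="1" value="Cracker" />
  <Item id="2" value="Coke" />
  <Item id="3" value="Water" />
  <Itemset id="1" support="1.0" numberOfItems="1"><ItemRef itemRef="1" /></Itemset>
  <Itemset id="2" support="1.0" numberOfItems="1"><ItemRef itemRef="3" /></Itemset>
  <Itemset id="3" support="1.0" numberOfItems="2">
    <ItemRef itemRef="1" /><ItemRef itemRef="3" />
  </Itemset>
  <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2" />
  <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1" />
</AssociationModel>
"""


def check_model(model):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    # Declared counts versus the number of elements actually present.
    for attr, tag in (("numberOfItemsets", "Itemset"),
                      ("numberOfRules", "AssociationRule")):
        declared = int(model.get(attr))
        actual = len(model.findall(tag))
        if declared != actual:
            problems.append(f"{attr}={declared} but found {actual} {tag} elements")
    # Every rule should satisfy the declared minimum support and confidence.
    min_support = float(model.get("minimumSupport"))
    min_confidence = float(model.get("minimumConfidence"))
    for rule in model.findall("AssociationRule"):
        name = f"{rule.get('antecedent')} => {rule.get('consequent')}"
        if float(rule.get("support")) < min_support:
            problems.append(f"rule {name} is below minimumSupport")
        if float(rule.get("confidence")) < min_confidence:
            problems.append(f"rule {name} is below minimumConfidence")
    return problems


model = ET.fromstring(MODEL_XML)
print(check_model(model))
```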