PMML 2.0 -- Association Rules
The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express the semantic constraint that a value must be unique within the set of elements of the same type contained in the same XML document.
<!ENTITY % ELEMENT-ID "CDATA">
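Because ELEMENT-ID resolves to plain CDATA, a DTD validator cannot enforce the uniqueness constraint; a consumer has to check it separately. The following is a minimal sketch of such a check, assuming Python's standard xml.etree.ElementTree module. The helper name duplicate_ids and the sample fragment are illustrative and not part of PMML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, tag):
    """Return the id values that occur more than once among elements named `tag`."""
    counts = Counter(el.get("id") for el in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)

# Illustrative fragment: two Item elements share id "2".
# Item id "1" and Itemset id "1" do not clash, because uniqueness
# is only required among elements of the same type.
fragment = ET.fromstring(
    '<AssociationModel>'
    '<Item id="1" value="Cracker"/>'
    '<Item id="2" value="Coke"/>'
    '<Item id="2" value="Water"/>'
    '<Itemset id="1"><ItemRef itemRef="1"/></Itemset>'
    '</AssociationModel>'
)

item_dups = duplicate_ids(fragment, "Item")
itemset_dups = duplicate_ids(fragment, "Itemset")
print(item_dups, itemset_dups)
```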
An Association Rule model consists of four major parts: model attributes, items, itemsets, and association rules.
<!ELEMENT AssociationModel (Extension*, MiningSchema,
Item+, Itemset+, AssociationRule+, Extension*)>
<!ATTLIST AssociationModel
modelName CDATA #IMPLIED
functionName %MINING-FUNCTION; #REQUIRED
algorithmName CDATA #IMPLIED
numberOfTransactions %INT-NUMBER; #REQUIRED
maxNumberOfItemsPerTA %INT-NUMBER; #IMPLIED
avgNumberOfItemsPerTA %REAL-NUMBER; #IMPLIED
minimumSupport %PROB-NUMBER; #REQUIRED
minimumConfidence %PROB-NUMBER; #REQUIRED
lengthLimit %INT-NUMBER; #IMPLIED
numberOfItems %INT-NUMBER; #REQUIRED
numberOfItemsets %INT-NUMBER; #REQUIRED
numberOfRules %INT-NUMBER; #REQUIRED
>
Attribute description:
numberOfTransactions: The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA: The number of items contained in the largest transaction.
avgNumberOfItemsPerTA: The average number of items contained in a transaction.
minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).
lengthLimit: The maximum number of items contained in a rule; this limit was used to restrict the number of rules.
numberOfItems: The number of different items contained in the input data.
numberOfItemsets: The number of itemsets contained in the model.
numberOfRules: The number of rules contained in the model.
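The support and confidence formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names support and confidence and the four sample baskets are chosen for the sketch and are not prescribed by PMML.

```python
# Illustrative input data: four transactions (baskets of items).
transactions = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]

def support(itemset, transactions):
    """Relative support: #supporting transactions / #total transactions."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence: support(rule) / support(antecedent)."""
    rule_support = support(set(antecedent) | set(consequent), transactions)
    return rule_support / support(antecedent, transactions)

# Values corresponding to the model attributes described above.
number_of_transactions = len(transactions)
max_items_per_ta = max(len(t) for t in transactions)
avg_items_per_ta = sum(len(t) for t in transactions) / len(transactions)

print(support({"Coke"}, transactions))                  # 0.5
print(confidence({"Cracker"}, {"Coke"}, transactions))  # 0.5
```

With minimumSupport="0.6", the itemset {Coke} (support 0.5) would not be frequent, so no rule containing Coke would appear in such a model.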
Items contained in itemsets:
<!ELEMENT Item EMPTY>
<!ATTLIST Item
id %ELEMENT-ID; #REQUIRED
value CDATA #REQUIRED
mappedValue CDATA #IMPLIED
weight %REAL-NUMBER; #IMPLIED
>
Attribute description:
id: An identification to uniquely identify an item.
value: The value of the item as in the input data.
mappedValue: Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN or SKU code.
weight: The weight of the item. For example, the price or value of an item.
Itemsets which are contained in rules:
<!ELEMENT Itemset (Extension*, ItemRef+)>
<!ATTLIST Itemset
id %ELEMENT-ID; #REQUIRED
support %PROB-NUMBER; #IMPLIED
numberOfItems %INT-NUMBER; #IMPLIED
>
Attribute description:
id: An identification to uniquely identify an itemset.
support: The relative support of the itemset.
numberOfItems: The number of items contained in this itemset.
Subelements: Item references that point to elements of type Item.
<!ELEMENT ItemRef EMPTY>
<!ATTLIST ItemRef
itemRef %ELEMENT-ID; #REQUIRED
>
Attribute description:
itemRef: The id value of an Item element.
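Resolving an itemset therefore means looking up each itemRef among the Item ids. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name resolve_itemset and the fragment are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment with two items and one itemset that references both.
model = ET.fromstring(
    '<AssociationModel>'
    '<Item id="1" value="Cracker"/>'
    '<Item id="3" value="Water"/>'
    '<Itemset id="3" support="1.0" numberOfItems="2">'
    '<ItemRef itemRef="1"/><ItemRef itemRef="3"/>'
    '</Itemset>'
    '</AssociationModel>'
)

# Index the Item elements by their id attribute.
items_by_id = {item.get("id"): item.get("value") for item in model.findall("Item")}

def resolve_itemset(itemset, items_by_id):
    """Return the item values an Itemset points to, in document order."""
    values = []
    for ref in itemset.findall("ItemRef"):
        item_id = ref.get("itemRef")
        if item_id not in items_by_id:
            raise KeyError(f"ItemRef points to unknown Item id {item_id!r}")
        values.append(items_by_id[item_id])
    return values

itemset = model.find("Itemset")
values = resolve_itemset(itemset, items_by_id)

# numberOfItems is optional; when present it should equal the number of ItemRef children.
declared = itemset.get("numberOfItems")
count_matches = declared is None or int(declared) == len(values)
print(values, count_matches)
```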
Rules: Elements of the form <antecedent itemset> => <consequent itemset>
<!ELEMENT AssociationRule ( Extension* )>
<!ATTLIST AssociationRule
support %PROB-NUMBER; #REQUIRED
confidence %PROB-NUMBER; #REQUIRED
antecedent %ELEMENT-ID; #REQUIRED
consequent %ELEMENT-ID; #REQUIRED
>
Attribute description:
support: The relative support of the rule.
confidence: The confidence of the rule.
antecedent: The id value of the itemset which is the antecedent of the rule.
consequent: The id value of the itemset which is the consequent of the rule.
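Note that antecedent and consequent hold itemset ids, not item ids. The sketch below renders rules in the "antecedent => consequent" form by looking up both ids in a table of already resolved itemsets. The function name describe_rule and the dictionaries are hypothetical stand-ins for parsed model content.

```python
def describe_rule(rule, itemsets):
    """Render a rule as '<antecedent itemset> => <consequent itemset>' with its measures."""
    antecedent = ", ".join(itemsets[rule["antecedent"]])
    consequent = ", ".join(itemsets[rule["consequent"]])
    return (f"{antecedent} => {consequent} "
            f"(support={rule['support']}, confidence={rule['confidence']})")

# Illustrative data: itemset id -> item values, and rule attributes as strings.
itemsets = {"1": ["Cracker"], "2": ["Water"], "3": ["Cracker", "Water"]}
rules = [
    {"support": "1.0", "confidence": "1.0", "antecedent": "1", "consequent": "2"},
    {"support": "1.0", "confidence": "1.0", "antecedent": "2", "consequent": "1"},
]

descriptions = [describe_rule(rule, itemsets) for rule in rules]
for line in descriptions:
    print(line)
```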
Example:
Let's assume we have four transactions with the following data:
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
<?xml version="1.0" ?>
<PMML version="2.0" >
  <Header copyright="www.dmg.org"
    description="example model for association rules"/>
  <DataDictionary numberOfFields="2" >
    <DataField name="transaction" optype="categorical" />
    <DataField name="item" optype="categorical" />
  </DataDictionary>
  <AssociationModel
    functionName="associationRules"
    numberOfTransactions="4" numberOfItems="3"
    minimumSupport="0.6" minimumConfidence="0.5"
    numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction"/>
      <MiningField name="item"/>
    </MiningSchema>
    <!-- We have three items in our input data -->
    <Item id="1" value="Cracker" />
    <Item id="2" value="Coke" />
    <Item id="3" value="Water" />
    <!-- and two frequent itemsets with a single item -->
    <Itemset id="1" support="1.0" numberOfItems="1">
      <ItemRef itemRef="1" />
    </Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1">
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- and one frequent itemset with two items. -->
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1" />
      <ItemRef itemRef="3" />
    </Itemset>
    <!-- Two rules satisfy the requirements -->
    <AssociationRule support="1.0" confidence="1.0"
      antecedent="1" consequent="2" />
    <AssociationRule support="1.0" confidence="1.0"
      antecedent="2" consequent="1" />
  </AssociationModel>
</PMML>
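A consumer may want to check that a model like the one above is internally consistent. The sketch below, assuming Python's standard xml.etree.ElementTree module, compares the declared numberOfItemsets and numberOfRules with the elements actually present, checks every rule against minimumSupport and minimumConfidence, and verifies that antecedent and consequent refer to existing itemsets. The function name check_model is hypothetical, and the embedded document is a condensed copy of the example without Header, DataDictionary, and MiningSchema. numberOfItems is deliberately not checked, since it counts items in the input data rather than Item elements in the model.

```python
import xml.etree.ElementTree as ET

pmml_text = """<PMML version="2.0">
  <AssociationModel functionName="associationRules"
      numberOfTransactions="4" numberOfItems="3"
      minimumSupport="0.6" minimumConfidence="0.5"
      numberOfItemsets="3" numberOfRules="2">
    <Item id="1" value="Cracker"/>
    <Item id="2" value="Coke"/>
    <Item id="3" value="Water"/>
    <Itemset id="1" support="1.0" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1"><ItemRef itemRef="3"/></Itemset>
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1"/><ItemRef itemRef="3"/>
    </Itemset>
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>"""

def check_model(model):
    """Return a list of human-readable consistency problems (empty if none)."""
    problems = []
    itemsets = model.findall("Itemset")
    rules = model.findall("AssociationRule")
    itemset_ids = {s.get("id") for s in itemsets}

    # Declared counts must match the elements contained in the model.
    if int(model.get("numberOfItemsets")) != len(itemsets):
        problems.append("numberOfItemsets does not match the Itemset count")
    if int(model.get("numberOfRules")) != len(rules):
        problems.append("numberOfRules does not match the AssociationRule count")

    min_support = float(model.get("minimumSupport"))
    min_confidence = float(model.get("minimumConfidence"))
    for rule in rules:
        label = f"rule {rule.get('antecedent')} => {rule.get('consequent')}"
        if float(rule.get("support")) < min_support:
            problems.append(f"{label}: support below minimumSupport")
        if float(rule.get("confidence")) < min_confidence:
            problems.append(f"{label}: confidence below minimumConfidence")
        for attr in ("antecedent", "consequent"):
            if rule.get(attr) not in itemset_ids:
                problems.append(f"{label}: {attr} is not an Itemset id")
    return problems

model = ET.fromstring(pmml_text).find("AssociationModel")
problems = check_model(model)
print(problems)
```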