PMML 1.1 -- DTD of Association Rules Model
The Association Rule model represents rules in which one set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express the semantic constraint that a value must be unique among the elements of the same type contained in the same XML document.
<!ENTITY % ELEMENT-ID "CDATA">
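Because ELEMENT-ID expands to plain CDATA, the DTD itself cannot enforce the uniqueness constraint, so a consumer has to check it separately. The following is a minimal sketch of such a check in Python with the standard xml.etree.ElementTree module. The helper name duplicate_ids and the sample fragment are illustrative assumptions and not part of PMML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, tag):
    """Return the sorted id values used more than once among elements named `tag`."""
    counts = Counter(elem.get("id") for elem in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)

# Illustrative fragment (hypothetical data): id "2" is deliberately reused
# for two AssocItem elements. It is abbreviated and not a complete model.
fragment = ET.fromstring(
    '<AssociationModel>'
    '<AssocItem id="1" value="Cracker"/>'
    '<AssocItem id="2" value="Coke"/>'
    '<AssocItem id="2" value="Water"/>'
    '<AssocItemset id="1" support="1.0" numberOfItems="1"/>'
    '</AssociationModel>'
)

print(duplicate_ids(fragment, "AssocItem"))     # ['2']
print(duplicate_ids(fragment, "AssocItemset"))  # []
```

The check is run per element type, so an AssocItem and an AssocItemset may both carry the id "1" without conflict.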
An Association Rule model consists of four major parts:
<!ELEMENT AssociationModel (Extension*, AssocInputStats, AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
    modelName    CDATA    #IMPLIED
>
Basic information of the input data:
<!ELEMENT AssocInputStats EMPTY>
<!ATTLIST AssocInputStats
    numberOfTransactions     %INT-NUMBER;     #REQUIRED
    maxNumberOfItemsPerTA    %INT-NUMBER;     #IMPLIED
    avgNumberOfItemsPerTA    %REAL-NUMBER;    #IMPLIED
    minimumSupport           %PROB-NUMBER;    #REQUIRED
    minimumConfidence        %PROB-NUMBER;    #REQUIRED
    lengthLimit              %INT-NUMBER;     #IMPLIED
    numberOfItems            %INT-NUMBER;     #REQUIRED
    numberOfItemsets         %INT-NUMBER;     #REQUIRED
    numberOfRules            %INT-NUMBER;     #REQUIRED
>
Attribute description:
numberOfTransactions : The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA : The number of items contained in the largest transaction.
avgNumberOfItemsPerTA : The average number of items contained in a transaction.
minimumSupport : The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence : The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).
lengthLimit : The maximum number of items allowed in a rule; this limit was used to restrict the number of rules.
numberOfItems : The number of different items contained in the input data.
numberOfItemsets : The number of itemsets contained in the model.
numberOfRules : The number of rules contained in the model.
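The statistics and the two measures described above can be sketched in Python as follows. The sample transactions, the variable names, and the function names are illustrative assumptions. The sketch also assumes the usual convention that the support of a rule is the support of the union of its antecedent and consequent.

```python
# Illustrative input data: each transaction is a set of items.
transactions = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]

def support(itemset, transactions):
    """Relative support: #supporting transactions / #total transactions."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent), where the rule covers both itemsets."""
    rule_items = set(antecedent) | set(consequent)
    return support(rule_items, transactions) / support(antecedent, transactions)

# Values corresponding to the AssocInputStats attributes.
number_of_transactions = len(transactions)                                  # 4
max_items_per_ta = max(len(t) for t in transactions)                        # 3
avg_items_per_ta = sum(len(t) for t in transactions) / len(transactions)    # 2.5
number_of_items = len(set().union(*transactions))                           # 3

print(support({"Coke"}, transactions))                 # 0.5
print(confidence({"Coke"}, {"Water"}, transactions))   # 1.0
print(confidence({"Water"}, {"Coke"}, transactions))   # 0.5
```

Confidence is not symmetric: every transaction with Coke also contains Water, but only half of the transactions with Water contain Coke.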
Items contained in itemsets:
<!ELEMENT AssocItem EMPTY>
<!ATTLIST AssocItem
    id             %ELEMENT-ID;     #REQUIRED
    value          CDATA            #REQUIRED
    mappedValue    CDATA            #IMPLIED
    weight         %REAL-NUMBER;    #IMPLIED
>
Attribute description:
id : An identifier that uniquely identifies an item.
value : The value of the item as in the input data.
mappedValue : Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
weight : The weight of the item, for example its price or value.
Itemsets contained in rules:
<!ELEMENT AssocItemset (Extension*, AssocItemRef+)>
<!ATTLIST AssocItemset
    id               %ELEMENT-ID;     #REQUIRED
    support          %PROB-NUMBER;    #REQUIRED
    numberOfItems    %INT-NUMBER;     #REQUIRED
>
Attribute description:
id : An identifier that uniquely identifies an itemset.
support : The relative support of the itemset.
numberOfItems : The number of items contained in this itemset.
Subelements : Item references (AssocItemRef) that point to AssocItem elements.
<!ELEMENT AssocItemRef EMPTY>
<!ATTLIST AssocItemRef
    itemRef    %ELEMENT-ID;    #REQUIRED
>
Attribute description:
itemRef : The id value of an item element.
Rules: Elements of the form <antecedent itemset> => <consequent itemset>
<!ELEMENT AssocRule (Extension*)>
<!ATTLIST AssocRule
    support       %PROB-NUMBER;    #REQUIRED
    confidence    %PROB-NUMBER;    #REQUIRED
    antecedent    %ELEMENT-ID;     #REQUIRED
    consequent    %ELEMENT-ID;     #REQUIRED
>
Attribute description:
support : The relative support of the rule.
confidence : The confidence of the rule.
antecedent : The id value of the itemset which is the antecedent of the rule.
consequent : The id value of the itemset which is the consequent of the rule.
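Since a rule stores only itemset ids and an itemset stores only item ids, reading a rule takes two lookups. The following Python sketch shows one way to resolve them with xml.etree.ElementTree. The function name readable_rules is a hypothetical helper, and the embedded fragment is abbreviated for illustration (it omits the required AssocInputStats element, so it is not a complete model).

```python
import xml.etree.ElementTree as ET

def readable_rules(model):
    """Resolve each AssocRule into (antecedent items, consequent items, confidence)."""
    # First lookup table: item id -> item value.
    items = {i.get("id"): i.get("value") for i in model.iter("AssocItem")}
    # Second lookup table: itemset id -> list of item values.
    itemsets = {
        s.get("id"): [items[r.get("itemRef")] for r in s.iter("AssocItemRef")]
        for s in model.iter("AssocItemset")
    }
    return [
        (itemsets[r.get("antecedent")], itemsets[r.get("consequent")],
         float(r.get("confidence")))
        for r in model.iter("AssocRule")
    ]

# Abbreviated, illustrative fragment.
model = ET.fromstring("""
<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="3" value="Water"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="1"/>
  </AssocItemset>
  <AssocItemset id="2" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="3"/>
  </AssocItemset>
  <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
</AssociationModel>
""")

for antecedent, consequent, conf in readable_rules(model):
    print(antecedent, "=>", consequent, "confidence", conf)
# ['Cracker'] => ['Water'] confidence 1.0
```

A dangling reference raises a KeyError in this sketch, which also makes it a crude check that every itemRef, antecedent, and consequent value points at an existing id.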
Example:
Let's assume we have four transactions with the following data:
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
<?xml version="1.0" ?>
<PMML version="1.1">
  <Header copyright="www.dmg.org"
          description="example model for association rules"/>
  <DataDictionary numberOfFields="1">
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel>
    <AssocInputStats numberOfTransactions="4"
                     numberOfItems="3" minimumSupport="0.6"
                     minimumConfidence="0.5" numberOfItemsets="3"
                     numberOfRules="2"/>
    <!-- We have three items in our input data -->
    <AssocItem id="1" value="Cracker"/>
    <AssocItem id="2" value="Coke"/>
    <AssocItem id="3" value="Water"/>
    <!-- and two frequent itemsets with a single item -->
    <AssocItemset id="1" support="1.0"
                  numberOfItems="1">
      <AssocItemRef itemRef="1"/>
    </AssocItemset>
    <AssocItemset id="2" support="1.0"
                  numberOfItems="1">
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- and one frequent itemset with two items. -->
    <AssocItemset id="3" support="1.0"
                  numberOfItems="2">
      <AssocItemRef itemRef="1"/>
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- Two rules satisfy the requirements -->
    <AssocRule support="1.0" confidence="1.0"
               antecedent="1" consequent="2"/>
    <AssocRule support="1.0" confidence="1.0"
               antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>
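The counts in the example (three frequent itemsets, two rules) can be cross-checked with a brute-force enumeration over the four transactions. This Python sketch is only an illustration under stated assumptions: it is not the algorithm any particular PMML producer uses, the names are hypothetical, and it assumes that a rule pairs two disjoint frequent itemsets whose union is also frequent.

```python
from itertools import combinations

transactions = [
    {"Cracker", "Coke", "Water"},   # t1
    {"Cracker", "Water"},           # t2
    {"Cracker", "Water"},           # t3
    {"Cracker", "Coke", "Water"},   # t4
]
MIN_SUPPORT = 0.6      # minimumSupport in the example
MIN_CONFIDENCE = 0.5   # minimumConfidence in the example

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

all_items = sorted(set().union(*transactions))

# Enumerate every non-empty itemset and keep the frequent ones.
frequent = {}
for size in range(1, len(all_items) + 1):
    for combo in combinations(all_items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Pair frequent itemsets as antecedent => consequent.
rules = []
for a in frequent:
    for c in frequent:
        union = a | c
        if a.isdisjoint(c) and union in frequent:
            conf = frequent[union] / frequent[a]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(a), sorted(c), frequent[union], conf))

print(len(frequent), len(rules))  # 3 2
```

Coke occurs in only two of the four transactions (support 0.5), which is below the minimum support of 0.6, so no itemset containing Coke appears in the model.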