PMML 1.1 -- DTD of Association Rules Model
The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: the value must be unique among all elements of the same type contained in the same XML document.
<!ENTITY % ELEMENT-ID "CDATA">
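Because ELEMENT-ID is declared as plain CDATA rather than the XML ID type (which would demand document-wide uniqueness and name-like values, ruling out ids such as "1"), a DTD validator does not check this constraint. A consumer that wants to enforce it has to do so itself. The following is a minimal sketch using Python's standard-library ElementTree; the helper `duplicate_ids` and the trimmed fragment are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A trimmed, hypothetical fragment: AssocItem id "2" is used twice (a violation),
# while an AssocItem and an AssocItemset may both use id "1" (different element types).
FRAGMENT = """<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="2" value="Coke"/>
  <AssocItem id="2" value="Water"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="1"/>
  </AssocItemset>
</AssociationModel>"""


def duplicate_ids(root, tag):
    """Return the id values used by more than one element named `tag`."""
    counts = Counter(elem.get("id") for elem in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)


root = ET.fromstring(FRAGMENT)
for tag in ("AssocItem", "AssocItemset"):
    print(tag, duplicate_ids(root, tag))
# AssocItem ['2']
# AssocItemset []
```

Uniqueness is checked per element type, which is why the example model in this section can reuse id="1" for both an item and an itemset.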
An Association Rule model consists of four major parts (input statistics, items, itemsets, and rules), optionally preceded by Extension elements:
<!ELEMENT AssociationModel (Extension*, AssocInputStats, AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
    modelName  CDATA  #IMPLIED
>
Basic information about the input data:
<!ELEMENT AssocInputStats EMPTY>
<!ATTLIST AssocInputStats
    numberOfTransactions   %INT-NUMBER;   #REQUIRED
    maxNumberOfItemsPerTA  %INT-NUMBER;   #IMPLIED
    avgNumberOfItemsPerTA  %REAL-NUMBER;  #IMPLIED
    minimumSupport         %PROB-NUMBER;  #REQUIRED
    minimumConfidence      %PROB-NUMBER;  #REQUIRED
    lengthLimit            %INT-NUMBER;   #IMPLIED
    numberOfItems          %INT-NUMBER;   #REQUIRED
    numberOfItemsets       %INT-NUMBER;   #REQUIRED
    numberOfRules          %INT-NUMBER;   #REQUIRED
>
Attribute description:
numberOfTransactions : The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA : The number of items contained in the largest transaction.
avgNumberOfItemsPerTA : The average number of items contained in a transaction.
minimumSupport : The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence : The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).
lengthLimit : The maximum number of items contained in a rule; this limit was used to restrict the number of rules generated.
numberOfItems : The number of different items contained in the input data.
numberOfItemsets : The number of itemsets contained in the model.
numberOfRules : The number of rules contained in the model.
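The input statistics and the support and confidence definitions above can be computed directly from a list of transactions. A minimal Python sketch, assuming each transaction is a set of item values (here the four transactions t1 to t4 from the example in this section) and taking the support of a rule to be the support of the union of its antecedent and consequent, which agrees with the example model. The helpers `support` and `confidence` are hypothetical names.

```python
from itertools import chain

# The four example transactions t1..t4, each as a set of item values.
transactions = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]


def support(itemset, transactions):
    """Relative support: #transactions containing all of `itemset` / #transactions."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent), with the rule's itemset = antecedent | consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


stats = {
    "numberOfTransactions": len(transactions),
    "maxNumberOfItemsPerTA": max(len(t) for t in transactions),
    "avgNumberOfItemsPerTA": sum(len(t) for t in transactions) / len(transactions),
    "numberOfItems": len(set(chain.from_iterable(transactions))),
}
print(stats)
print(support({"Coke"}, transactions))                # 0.5
print(confidence({"Coke"}, {"Water"}, transactions))  # 1.0
```

Note that Coke's support of 0.5 falls below the example's minimumSupport of 0.6, which is why no itemset containing Coke appears in that model.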
Items contained in itemsets:
<!ELEMENT AssocItem EMPTY>
<!ATTLIST AssocItem
    id           %ELEMENT-ID;   #REQUIRED
    value        CDATA          #REQUIRED
    mappedValue  CDATA          #IMPLIED
    weight       %REAL-NUMBER;  #IMPLIED
>
Attribute description:
id : An identifier that uniquely identifies an item.
value : The value of the item as it appears in the input data.
mappedValue : Optional. A value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
weight : The weight of the item, for example its price or value.
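A consumer will often want to display an item by its mappedValue when one is present and fall back to value otherwise. The sketch below assumes that convention; the helper `load_items`, the EAN-style codes, the product name, and the weights are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical items: made-up EAN-style codes as values, an optional product
# name as mappedValue, and a price as weight.
FRAGMENT = """<AssociationModel>
  <AssocItem id="1" value="4000000000011" mappedValue="Cracker" weight="1.49"/>
  <AssocItem id="2" value="4000000000028" weight="0.99"/>
</AssociationModel>"""


def load_items(root):
    """Map each AssocItem id to (display label, weight as float or None)."""
    items = {}
    for elem in root.iter("AssocItem"):
        # Prefer the optional mappedValue for display; fall back to the raw value.
        label = elem.get("mappedValue", elem.get("value"))
        weight = elem.get("weight")
        items[elem.get("id")] = (label, float(weight) if weight is not None else None)
    return items


print(load_items(ET.fromstring(FRAGMENT)))
# {'1': ('Cracker', 1.49), '2': ('4000000000028', 0.99)}
```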
Itemsets contained in rules:
<!ELEMENT AssocItemset (Extension*, AssocItemRef+)>
<!ATTLIST AssocItemset
    id             %ELEMENT-ID;   #REQUIRED
    support        %PROB-NUMBER;  #REQUIRED
    numberOfItems  %INT-NUMBER;   #REQUIRED
>
Attribute description:
id : An identifier that uniquely identifies an itemset.
support : The relative support of the itemset.
numberOfItems : The number of items contained in this itemset.
Subelements : AssocItemRef elements, each pointing to an AssocItem element.
<!ELEMENT AssocItemRef EMPTY>
<!ATTLIST AssocItemRef
    itemRef  %ELEMENT-ID;  #REQUIRED
>
Attribute description:
itemRef : The id value of an AssocItem element.
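Resolving an itemset means following each itemRef to the AssocItem with the same id. A minimal sketch (the helper `resolve_itemsets` is hypothetical) that also checks numberOfItems against the number of AssocItemRef subelements, a consistency rule the DTD itself cannot express:

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="3" value="Water"/>
  <AssocItemset id="3" support="1.0" numberOfItems="2">
    <AssocItemRef itemRef="1"/>
    <AssocItemRef itemRef="3"/>
  </AssocItemset>
</AssociationModel>"""


def resolve_itemsets(root):
    """Map each AssocItemset id to the item values its AssocItemRefs point to."""
    values = {item.get("id"): item.get("value") for item in root.iter("AssocItem")}
    itemsets = {}
    for itemset in root.iter("AssocItemset"):
        refs = [ref.get("itemRef") for ref in itemset.findall("AssocItemRef")]
        # numberOfItems duplicates the count of references; check that they agree.
        if int(itemset.get("numberOfItems")) != len(refs):
            raise ValueError(f"itemset {itemset.get('id')}: numberOfItems mismatch")
        # A KeyError here means an itemRef has no matching AssocItem id.
        itemsets[itemset.get("id")] = [values[r] for r in refs]
    return itemsets


print(resolve_itemsets(ET.fromstring(FRAGMENT)))  # {'3': ['Cracker', 'Water']}
```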
Rules: Elements of the form <antecedent itemset> => <consequent itemset>
<!ELEMENT AssocRule (Extension*)>
<!ATTLIST AssocRule
    support     %PROB-NUMBER;  #REQUIRED
    confidence  %PROB-NUMBER;  #REQUIRED
    antecedent  %ELEMENT-ID;   #REQUIRED
    consequent  %ELEMENT-ID;   #REQUIRED
>
Attribute description:
support : The relative support of the rule.
confidence : The confidence of the rule.
antecedent : The id value of the itemset which is the antecedent of the rule.
consequent : The id value of the itemset which is the consequent of the rule.
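Following the antecedent and consequent references, a rule can be rendered in the form {antecedent} => {consequent}. The sketch below (hypothetical helper `describe_rules`) also checks the stored confidence against support(rule) / support(antecedent), assuming the rule's support is the support of antecedent and consequent combined and that the antecedent's support is taken from its AssocItemset element.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="3" value="Water"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1"><AssocItemRef itemRef="1"/></AssocItemset>
  <AssocItemset id="2" support="1.0" numberOfItems="1"><AssocItemRef itemRef="3"/></AssocItemset>
  <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
  <AssocRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
</AssociationModel>"""


def describe_rules(root, tol=1e-9):
    """Render each AssocRule as '{antecedent} => {consequent}' and sanity-check its confidence."""
    values = {item.get("id"): item.get("value") for item in root.iter("AssocItem")}
    # itemset id -> (support, list of item values)
    itemsets = {
        s.get("id"): (
            float(s.get("support")),
            [values[ref.get("itemRef")] for ref in s.findall("AssocItemRef")],
        )
        for s in root.iter("AssocItemset")
    }
    lines = []
    for rule in root.iter("AssocRule"):
        ante_support, ante = itemsets[rule.get("antecedent")]
        _, cons = itemsets[rule.get("consequent")]
        support = float(rule.get("support"))
        conf = float(rule.get("confidence"))
        # Assumed relationship: confidence = support(rule) / support(antecedent).
        if abs(conf - support / ante_support) > tol:
            raise ValueError(f"rule {ante} => {cons}: inconsistent confidence")
        lines.append("{%s} => {%s} (support=%s, confidence=%s)"
                     % (", ".join(ante), ", ".join(cons), support, conf))
    return lines


for line in describe_rules(ET.fromstring(FRAGMENT)):
    print(line)
# {Cracker} => {Water} (support=1.0, confidence=1.0)
# {Water} => {Cracker} (support=1.0, confidence=1.0)
```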
Example:
Let's assume we have four transactions with the following data:
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
<?xml version="1.0" ?>
<PMML version="1.1">
  <Header copyright="www.dmg.org" description="example model for association rules"/>
  <DataDictionary numberOfFields="1">
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel>
    <AssocInputStats numberOfTransactions="4" numberOfItems="3"
                     minimumSupport="0.6" minimumConfidence="0.5"
                     numberOfItemsets="3" numberOfRules="2"/>
    <!-- We have three items in our input data -->
    <AssocItem id="1" value="Cracker"/>
    <AssocItem id="2" value="Coke"/>
    <AssocItem id="3" value="Water"/>
    <!-- and two frequent itemsets with a single item -->
    <AssocItemset id="1" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="1"/>
    </AssocItemset>
    <AssocItemset id="2" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- and one frequent itemset with two items. -->
    <AssocItemset id="3" support="1.0" numberOfItems="2">
      <AssocItemRef itemRef="1"/>
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- Two rules satisfy the requirements -->
    <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssocRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>
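The three itemsets and two rules in this model can be reproduced with a brute-force search: enumerate every candidate itemset, keep those whose support is at least 0.6, then split each frequent itemset into an antecedent and a consequent and keep the splits whose confidence is at least 0.5. This exhaustive enumeration is only a sketch suited to tiny data; practical miners use algorithms such as Apriori or FP-growth, and nothing here is claimed to be how the original model was produced.

```python
from itertools import combinations

# The example transactions t1..t4 and the thresholds from AssocInputStats.
transactions = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.6, 0.5


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


items = sorted(set().union(*transactions))
# Try every non-empty combination of items (exhaustive; no pruning).
frequent = {
    frozenset(c): support(frozenset(c))
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= MIN_SUPPORT
}

rules = []
for itemset, sup in frequent.items():
    # Every non-empty proper subset is a candidate antecedent; subsets of a
    # frequent itemset are themselves frequent, so the lookup below succeeds.
    for k in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), k):
            ante = frozenset(combo)
            conf = sup / frequent[ante]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(ante), sorted(itemset - ante), sup, conf))

print(sorted(sorted(s) for s in frequent))
# [['Cracker'], ['Cracker', 'Water'], ['Water']]
print(rules)
# [(['Cracker'], ['Water'], 1.0, 1.0), (['Water'], ['Cracker'], 1.0, 1.0)]
```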