PMML 2.0 -- Association Rules
The Association Rule model represents rules in which one set of items is associated with another set of items. For example, a rule can express that a certain product is often bought together with a certain set of other products.
The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: a value must be unique among the elements of the same type contained in the same XML document.
<!ENTITY % ELEMENT-ID "CDATA">
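Because ELEMENT-ID is declared as plain CDATA, a DTD validator does not enforce this uniqueness; a consuming application has to check it. The following is a minimal sketch of such a check using Python's standard xml.etree.ElementTree. The XML fragment and the helper name duplicate_ids are illustrative assumptions, not part of PMML. In the fragment, an Item and an Itemset share id "1", which is allowed because they are different element types. Two Itemsets also share id "1", which violates the constraint.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: Item id "1" and Itemset id "1" may coexist because
# the ids belong to different element types, but the second Itemset with
# id "1" deliberately violates the uniqueness constraint.
SNIPPET = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="2" value="Coke"/>
  <Itemset id="1"><ItemRef itemRef="1"/></Itemset>
  <Itemset id="1"><ItemRef itemRef="2"/></Itemset>
</AssociationModel>
"""


def duplicate_ids(root, tag):
    """Return the id values that occur more than once among elements named `tag`."""
    counts = Counter(elem.get("id") for elem in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)


root = ET.fromstring(SNIPPET.strip())
for tag in ("Item", "Itemset"):
    print(tag, duplicate_ids(root, tag))
# Item []
# Itemset ['1']
```

Counting per element type, rather than across the whole document, mirrors the "same type" wording of the constraint.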
An Association Rule model consists of four major parts: the model attributes, items, itemsets, and association rules.
<!ELEMENT AssociationModel (Extension*, MiningSchema, Item+, Itemset+, AssociationRule+, Extension*)>
<!ATTLIST AssociationModel
    modelName              CDATA              #IMPLIED
    functionName           %MINING-FUNCTION;  #REQUIRED
    algorithmName          CDATA              #IMPLIED
    numberOfTransactions   %INT-NUMBER;       #REQUIRED
    maxNumberOfItemsPerTA  %INT-NUMBER;       #IMPLIED
    avgNumberOfItemsPerTA  %REAL-NUMBER;      #IMPLIED
    minimumSupport         %PROB-NUMBER;      #REQUIRED
    minimumConfidence      %PROB-NUMBER;      #REQUIRED
    lengthLimit            %INT-NUMBER;       #IMPLIED
    numberOfItems          %INT-NUMBER;       #REQUIRED
    numberOfItemsets       %INT-NUMBER;       #REQUIRED
    numberOfRules          %INT-NUMBER;       #REQUIRED
>
Attribute description:
numberOfTransactions: The number of transactions (baskets of items) contained in the input data.
maxNumberOfItemsPerTA: The number of items contained in the largest transaction.
avgNumberOfItemsPerTA: The average number of items contained in a transaction.
minimumSupport: The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules.
minimumConfidence: The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent).
lengthLimit: The maximum number of items a rule may contain; this limit was used to restrict the number of rules generated.
numberOfItems: The number of different items contained in the input data.
numberOfItemsets: The number of itemsets contained in the model.
numberOfRules: The number of rules contained in the model.
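The following Python sketch makes these definitions concrete. It computes the transaction statistics, plus support and confidence, for the four transactions used in the example at the end of this section. It assumes the usual convention that the support of a rule is the support of its antecedent and consequent taken together. The helper names support and confidence are hypothetical.

```python
from itertools import chain

# The four transactions (baskets) from the example at the end of this section.
TRANSACTIONS = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]


def support(itemset, transactions):
    """Relative support: #supporting transactions / #total transactions."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent), taking support(rule) as the
    support of antecedent and consequent together (assumed convention)."""
    rule_support = support(antecedent | consequent, transactions)
    return rule_support / support(antecedent, transactions)


stats = {
    "numberOfTransactions": len(TRANSACTIONS),
    "maxNumberOfItemsPerTA": max(len(t) for t in TRANSACTIONS),
    "avgNumberOfItemsPerTA": sum(len(t) for t in TRANSACTIONS) / len(TRANSACTIONS),
    "numberOfItems": len(set(chain.from_iterable(TRANSACTIONS))),
}
print(stats)
print(support({"Cracker", "Water"}, TRANSACTIONS))    # 1.0
print(confidence({"Water"}, {"Coke"}, TRANSACTIONS))  # 0.5
```

The rule Water => Coke has confidence 0.5 because Coke appears in only half of the transactions that contain Water.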
Items contained in itemsets:
<!ELEMENT Item EMPTY>
<!ATTLIST Item
    id           %ELEMENT-ID;   #REQUIRED
    value        CDATA          #REQUIRED
    mappedValue  CDATA          #IMPLIED
    weight       %REAL-NUMBER;  #IMPLIED
>
Attribute description:
id: An identifier that uniquely identifies an item.
value: The value of the item as it appears in the input data.
mappedValue: Optional; a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN or SKU code.
weight: The weight of the item, for example its price or value.
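A consumer that displays items would plausibly show mappedValue when it is present and fall back to value otherwise. The following is a minimal Python sketch of that lookup. The SKU codes, prices, and the function name load_items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical items: SKU codes mapped to product names, with prices as weights.
SNIPPET = """
<AssociationModel>
  <Item id="1" value="SKU-0001" mappedValue="Cracker" weight="1.99"/>
  <Item id="2" value="SKU-0002" mappedValue="Coke" weight="0.89"/>
  <Item id="3" value="Water"/>
</AssociationModel>
"""


def load_items(root):
    """Map each Item id to (display label, weight or None)."""
    items = {}
    for elem in root.iter("Item"):
        # Prefer the optional mappedValue; fall back to the raw input value.
        label = elem.get("mappedValue", elem.get("value"))
        weight = elem.get("weight")
        items[elem.get("id")] = (label, float(weight) if weight is not None else None)
    return items


items = load_items(ET.fromstring(SNIPPET.strip()))
print(items)
```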
Itemsets contained in rules:
<!ELEMENT Itemset (Extension*, ItemRef+)>
<!ATTLIST Itemset
    id             %ELEMENT-ID;   #REQUIRED
    support        %PROB-NUMBER;  #IMPLIED
    numberOfItems  %INT-NUMBER;   #IMPLIED
>
Attribute description:
id: An identifier that uniquely identifies an itemset.
support: The relative support of the itemset.
numberOfItems: The number of items contained in this itemset.
Subelements: ItemRef elements, each pointing to an Item element.
<!ELEMENT ItemRef EMPTY>
<!ATTLIST ItemRef
    itemRef  %ELEMENT-ID;  #REQUIRED
>
Attribute description:
itemRef: The id value of an Item element.
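Resolving an itemset means following each ItemRef to the Item with the matching id. The following Python sketch does this and also cross-checks the optional numberOfItems attribute. Raising ValueError on a dangling reference or a count mismatch is an illustrative choice, and the function name resolve_itemsets is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment of the example model: itemset 3 references items 1 and 3.
SNIPPET = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="3" value="Water"/>
  <Itemset id="3" support="1.0" numberOfItems="2">
    <ItemRef itemRef="1"/>
    <ItemRef itemRef="3"/>
  </Itemset>
</AssociationModel>
"""


def resolve_itemsets(root):
    """Map each Itemset id to the frozenset of item values it references."""
    values = {item.get("id"): item.get("value") for item in root.iter("Item")}
    itemsets = {}
    for itemset in root.iter("Itemset"):
        set_id = itemset.get("id")
        refs = [ref.get("itemRef") for ref in itemset.findall("ItemRef")]
        missing = [r for r in refs if r not in values]
        if missing:
            raise ValueError(f"Itemset {set_id} references unknown items {missing}")
        declared = itemset.get("numberOfItems")
        if declared is not None and int(declared) != len(refs):
            raise ValueError(f"Itemset {set_id} declares {declared} items but has {len(refs)}")
        itemsets[set_id] = frozenset(values[r] for r in refs)
    return itemsets


root = ET.fromstring(SNIPPET.strip())
print(resolve_itemsets(root))
```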
Rules: Elements of the form <antecedent itemset> => <consequent itemset>
<!ELEMENT AssociationRule (Extension*)>
<!ATTLIST AssociationRule
    support     %PROB-NUMBER;  #REQUIRED
    confidence  %PROB-NUMBER;  #REQUIRED
    antecedent  %ELEMENT-ID;   #REQUIRED
    consequent  %ELEMENT-ID;   #REQUIRED
>
Attribute definitions:
support: The relative support of the rule.
confidence: The confidence of the rule.
antecedent: The id value of the itemset that is the antecedent of the rule.
consequent: The id value of the itemset that is the consequent of the rule.
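A rule stores only itemset ids, so printing it in the antecedent => consequent form requires going through the Itemset and Item elements. The following is a minimal Python sketch, with a hypothetical function name describe_rules, applied to a trimmed copy of the example model's items, itemsets, and rules.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example model (item 2 is omitted because no itemset uses it).
SNIPPET = """
<AssociationModel>
  <Item id="1" value="Cracker"/>
  <Item id="3" value="Water"/>
  <Itemset id="1"><ItemRef itemRef="1"/></Itemset>
  <Itemset id="2"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
  <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
</AssociationModel>
"""


def describe_rules(root):
    """Render each AssociationRule as 'antecedent => consequent' with its measures."""
    values = {item.get("id"): item.get("value") for item in root.iter("Item")}
    # Itemset id -> comma-separated, sorted item values.
    itemsets = {
        s.get("id"): ", ".join(sorted(values[r.get("itemRef")] for r in s.findall("ItemRef")))
        for s in root.iter("Itemset")
    }
    return [
        f"{itemsets[rule.get('antecedent')]} => {itemsets[rule.get('consequent')]}"
        f" (support={rule.get('support')}, confidence={rule.get('confidence')})"
        for rule in root.iter("AssociationRule")
    ]


root = ET.fromstring(SNIPPET.strip())
for line in describe_rules(root):
    print(line)
# Cracker => Water (support=1.0, confidence=1.0)
# Water => Cracker (support=1.0, confidence=1.0)
```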
Example:
Let's assume we have four transactions with the following data:
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
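The itemsets and rules in the PMML document for this example can be derived with a deliberately naive Python sketch. It enumerates every candidate itemset, keeps those with support of at least 0.6, and splits each frequent itemset into rules with confidence of at least 0.5 (the minimumSupport and minimumConfidence of the example model). This brute-force search is not Apriori or any algorithm prescribed by PMML. It is only practical for tiny item sets like this one.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Cracker", "Coke", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Water"},
    {"Cracker", "Coke", "Water"},
]
MIN_SUPPORT = 0.6     # minimumSupport of the example model
MIN_CONFIDENCE = 0.5  # minimumConfidence of the example model


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def frequent_itemsets():
    """Enumerate all non-empty itemsets and keep those meeting MIN_SUPPORT."""
    items = sorted(set().union(*TRANSACTIONS))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo))
            if s >= MIN_SUPPORT:
                frequent[frozenset(combo)] = s
    return frequent


def association_rules(frequent):
    """Split each frequent itemset into antecedent => consequent rules meeting MIN_CONFIDENCE."""
    found = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent's support is already in `frequent`.
                conf = s / frequent[antecedent]
                if conf >= MIN_CONFIDENCE:
                    found.append((sorted(antecedent), sorted(itemset - antecedent), s, conf))
    return found


frequent = frequent_itemsets()
rules = association_rules(frequent)
print(len(frequent), len(rules))  # 3 2
for rule in rules:
    print(rule)
```

Coke has support 0.5, below the 0.6 threshold, so it and every itemset containing it are dropped. That leaves {Cracker}, {Water}, and {Cracker, Water}, and the two rules between Cracker and Water.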
<?xml version="1.0" ?>
<PMML version="2.0">
  <Header copyright="www.dmg.org" description="example model for association rules"/>
  <DataDictionary numberOfFields="2">
    <DataField name="transaction" optype="categorical"/>
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel functionName="associationRules"
                    numberOfTransactions="4" numberOfItems="3"
                    minimumSupport="0.6" minimumConfidence="0.5"
                    numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction"/>
      <MiningField name="item"/>
    </MiningSchema>
    <!-- We have three items in our input data -->
    <Item id="1" value="Cracker"/>
    <Item id="2" value="Coke"/>
    <Item id="3" value="Water"/>
    <!-- and two frequent itemsets with a single item -->
    <Itemset id="1" support="1.0" numberOfItems="1">
      <ItemRef itemRef="1"/>
    </Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1">
      <ItemRef itemRef="3"/>
    </Itemset>
    <!-- and one frequent itemset with two items. -->
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1"/>
      <ItemRef itemRef="3"/>
    </Itemset>
    <!-- Two rules satisfy the requirements -->
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>
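The required count attributes and thresholds of AssociationModel allow a simple consistency check. The following Python sketch parses a trimmed copy of the example model (Header, DataDictionary, and MiningSchema are omitted because the check does not use them). It reports any mismatch between the declared counts and the actual child elements, and any rule below minimumSupport or minimumConfidence. The function name check_model is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example model.
MODEL = """
<PMML version="2.0">
  <AssociationModel functionName="associationRules" numberOfTransactions="4"
      numberOfItems="3" minimumSupport="0.6" minimumConfidence="0.5"
      numberOfItemsets="3" numberOfRules="2">
    <Item id="1" value="Cracker"/>
    <Item id="2" value="Coke"/>
    <Item id="3" value="Water"/>
    <Itemset id="1" support="1.0" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1"><ItemRef itemRef="3"/></Itemset>
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1"/><ItemRef itemRef="3"/>
    </Itemset>
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>
"""

# Required count attributes and the child elements they describe.
COUNTS = (
    ("numberOfItems", "Item"),
    ("numberOfItemsets", "Itemset"),
    ("numberOfRules", "AssociationRule"),
)


def check_model(model):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for attr, tag in COUNTS:
        declared = int(model.get(attr))
        actual = len(model.findall(tag))
        if declared != actual:
            problems.append(f"{attr}={declared} but found {actual} <{tag}> elements")
    min_support = float(model.get("minimumSupport"))
    min_confidence = float(model.get("minimumConfidence"))
    for i, rule in enumerate(model.findall("AssociationRule"), start=1):
        if float(rule.get("support")) < min_support:
            problems.append(f"rule {i} is below minimumSupport")
        if float(rule.get("confidence")) < min_confidence:
            problems.append(f"rule {i} is below minimumConfidence")
    return problems


root = ET.fromstring(MODEL.strip())
print(check_model(root.find("AssociationModel")))  # []
```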