PMML 4.2 - Association Rules

The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product or set of products is often bought in combination with a certain set of other products, an analysis also known as Market Basket Analysis. An Association Rule model typically has two variables: one for grouping records together into transactions (usageType="group") and another that uniquely identifies each record (usageType="active").

The attribute definitions of the association rule model use the entity ELEMENT-ID to express the semantic constraint that a value must be unique within a set of elements of the same type (contained in the same XML document).

An Association Rule model consists of four major parts:

<xs:element name="AssociationModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="AssociationRule" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required"/>
    <xs:attribute name="maxNumberOfItemsPerTA" type="INT-NUMBER"/>
    <xs:attribute name="avgNumberOfItemsPerTA" type="REAL-NUMBER"/>
    <xs:attribute name="minimumSupport" type="PROB-NUMBER" use="required"/>
    <xs:attribute name="minimumConfidence" type="PROB-NUMBER" use="required"/>
    <xs:attribute name="lengthLimit" type="INT-NUMBER"/>
    <xs:attribute name="numberOfItems" type="INT-NUMBER" use="required"/>
    <xs:attribute name="numberOfItemsets" type="INT-NUMBER" use="required"/>
    <xs:attribute name="numberOfRules" type="INT-NUMBER" use="required"/>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

An AssociationModel can contain any number of Itemsets and AssociationRules. Note, however, that all Itemsets must be listed before any of the rules.

Here is a description of the attributes:

We consider items next:

<xs:element name="Item">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="value" type="xs:string" use="required"/>
    <xs:attribute name="mappedValue" type="xs:string"/>
    <xs:attribute name="weight" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

Here is a description of the attributes in an Item:

The id of an Item must be unique. Furthermore, the Item values must be unique too. That is, an AssociationModel must not contain different instances of Item whose value attributes are duplicates. The entries in mappedValue may be the same, though.

We consider itemsets next:

<xs:element name="Itemset">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="ItemRef"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="support" type="PROB-NUMBER"/>
    <xs:attribute name="numberOfItems" type="xs:nonNegativeInteger"/>
  </xs:complexType>
</xs:element>

Here is a description of the attributes in an Itemset:

<xs:element name="ItemRef">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="itemRef" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Here is a description of the attributes in an ItemRef:

itemRef: Contains the identification of an item.

We consider association rules of the form "<antecedent itemset> => <consequent itemset>" next:

<xs:element name="AssociationRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="antecedent" type="xs:string" use="required"/>
    <xs:attribute name="consequent" type="xs:string" use="required"/>
    <xs:attribute name="support" type="PROB-NUMBER" use="required"/>
    <xs:attribute name="confidence" type="PROB-NUMBER" use="required"/>
    <xs:attribute name="lift" type="xs:float" use="optional"/>
    <xs:attribute name="leverage" type="xs:float" use="optional"/>
    <xs:attribute name="affinity" type="PROB-NUMBER" use="optional"/>
    <xs:attribute name="id" type="xs:string" use="optional"/>
  </xs:complexType>
</xs:element>

Here is a description of the attributes in an AssociationRule (note that the formulae listed below must hold true for the attributes included in the rule):

These statistics and their calculation are described visually in a chart in the original specification (not reproduced here).
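
The rule statistics can also be computed directly from transaction data. The following is a minimal Python sketch, assuming the commonly used definitions: support(A -> C) is the fraction of transactions containing both A and C, confidence is support(A -> C) / support(A), lift is confidence / support(C), leverage is support(A -> C) - support(A) * support(C), and affinity is support(A -> C) / (support(A) + support(C) - support(A -> C)). The function names and the sample transactions are hypothetical and are not part of PMML.

```python
from typing import Dict, FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for transaction in transactions if wanted <= transaction)
    return hits / len(transactions)


def rule_statistics(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Statistics for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one
    transaction; otherwise the divisions below would fail.
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    s_ac = support(a | c, transactions)  # transactions containing A and C
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        "affinity": s_ac / (s_a + s_c - s_ac),
    }


# Hypothetical input data: four transactions over three items.
transactions = [
    frozenset({"Cracker", "Water"}),
    frozenset({"Cracker", "Water", "Coke"}),
    frozenset({"Cracker", "Water"}),
    frozenset({"Cracker", "Water", "Coke"}),
]

cracker_water = rule_statistics({"Cracker"}, {"Water"}, transactions)
cracker_coke = rule_statistics({"Cracker"}, {"Coke"}, transactions)
```

With this data, the rule Cracker -> Water has support 1.0 and confidence 1.0, while Cracker -> Coke has support 0.5 and confidence 0.5, so only the former would pass a minimumSupport of 0.6.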

Here is an example of an association model:

<PMML xmlns="https://www.dmg.org/PMML-4_2" version="4.2">
  <Header copyright="www.dmg.org" description="example model for association rules"/>
  <DataDictionary numberOfFields="2">
    <DataField name="transaction" optype="categorical" dataType="string"/>
    <DataField name="item" optype="categorical" dataType="string"/>
  </DataDictionary>
  <AssociationModel functionName="associationRules" numberOfTransactions="4" numberOfItems="3"
                    minimumSupport="0.6" minimumConfidence="0.5" numberOfItemsets="3" numberOfRules="2">
    <MiningSchema>
      <MiningField name="transaction" usageType="group"/>
      <MiningField name="item" usageType="active"/>
    </MiningSchema>
    <Output>
      <!-- There are nine outputs defined for this model -->
      <!-- that return the top three highest confidence -->
      <!-- "exclusiveRecommendation" results (selecting -->
      <!-- rules where the items in the input itemset -->
      <!-- appear in the antecedent but do not appear in -->
      <!-- the consequent). For each of these three -->
      <!-- rules, there are three available outputs: -->
      <!-- rule: for example, "Cracker -> Water" -->
      <!-- consequent: for example, "Water" -->
      <!-- entityId: for example, 1 -->
      <OutputField name="Rule (Highest Confidence)"
                   rankBasis="confidence" rank="1" algorithm="exclusiveRecommendation"
                   feature="rule" dataType="string" optype="categorical"/>
      <OutputField name="Recommendation (Highest Confidence)"
                   rankBasis="confidence" rank="1" algorithm="exclusiveRecommendation"
                   feature="consequent" dataType="string" optype="categorical"/>
      <OutputField name="Rule Id (Highest Confidence)"
                   rankBasis="confidence" rank="1" algorithm="exclusiveRecommendation"
                   feature="entityId" dataType="double" optype="continuous"/>
      <OutputField name="Rule (2nd Highest Confidence)"
                   rankBasis="confidence" rank="2" algorithm="exclusiveRecommendation"
                   feature="rule" dataType="string" optype="categorical"/>
      <OutputField name="Recommendation (2nd Highest Confidence)"
                   rankBasis="confidence" rank="2" algorithm="exclusiveRecommendation"
                   feature="consequent" dataType="string" optype="categorical"/>
      <OutputField name="Rule Id (2nd Highest Confidence)"
                   rankBasis="confidence" rank="2" algorithm="exclusiveRecommendation"
                   feature="entityId" dataType="double" optype="continuous"/>
      <OutputField name="Rule (3rd Highest Confidence)"
                   rankBasis="confidence" rank="3" algorithm="exclusiveRecommendation"
                   feature="rule" dataType="string" optype="categorical"/>
      <OutputField name="Recommendation (3rd Highest Confidence)"
                   rankBasis="confidence" rank="3" algorithm="exclusiveRecommendation"
                   feature="consequent" dataType="string" optype="categorical"/>
      <OutputField name="Rule Id (3rd Highest Confidence)"
                   rankBasis="confidence" rank="3" algorithm="exclusiveRecommendation"
                   feature="entityId" dataType="double" optype="continuous"/>
    </Output>
    <!-- We have three items in our input data -->
    <Item id="1" value="Cracker"/>
    <Item id="2" value="Coke"/>
    <Item id="3" value="Water"/>
    <!-- and two frequent itemsets with a single item -->
    <Itemset id="1" support="1.0" numberOfItems="1">
      <ItemRef itemRef="1"/>
    </Itemset>
    <Itemset id="2" support="1.0" numberOfItems="1">
      <ItemRef itemRef="3"/>
    </Itemset>
    <!-- and one frequent itemset with two items. -->
    <Itemset id="3" support="1.0" numberOfItems="2">
      <ItemRef itemRef="1"/>
      <ItemRef itemRef="3"/>
    </Itemset>
    <!-- Two rules satisfy the requirements -->
    <AssociationRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssociationRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>

Scoring Procedure

The scoring procedure has as input an association model and an itemset. It determines all rules defined within the input model which are associated with the input itemset, based on the algorithm specified within a specific OutputField:

Let us consider a sample model with the following rules:

rule 1: Cracker -> Water

Now, let's apply the model to the following input itemsets, using each of the possible algorithms, to see which rules satisfy the various input itemsets:

For instance, if we apply "exclusiveRecommendation" to input itemset #3, "Water, Coke", rule 2 is found to match because its antecedent "Water" is a subset of the input itemset, while its consequent "Cracker" is not. Rule 1 does not match because its consequent "Water" is included in the input itemset; also, rule 4 does not match because its antecedent "Cracker AND Water" is not a subset of the input itemset.

If we apply "exclusiveRecommendation" to input itemset #5, "Cracker, Water, Banana, Apple", rule 4 is found to match because its antecedent "Cracker AND Water" is a subset of the input itemset, while its consequent "Nachos" is not. Rule 5 also matches because its consequent "Pear AND Banana" is not a subset of the input itemset.

Using input itemset #5 again, if we apply "ruleAssociation", rule 2 is found to match since both its antecedent and its consequent are subsets of the input itemset. Rule 5 is not found to match, as its consequent "Pear AND Banana" is not a subset of the input itemset.

See the chapter on Outputs for details on the various types of outputs that can be returned by Association Rules models.
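
The matching logic in the examples above can be sketched in Python. This is a minimal illustration, not a reference implementation: it covers only the two algorithms discussed in the examples ("exclusiveRecommendation" and "ruleAssociation"), and it includes only rules 1, 2 and 4, whose antecedents and consequents are fully stated in the text. The Rule type and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Rule(NamedTuple):
    rule_id: int
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def rule_matches(rule: Rule, items: Iterable[str], algorithm: str) -> bool:
    """Decide whether `rule` is selected for the input itemset `items`."""
    itemset = frozenset(items)
    antecedent_present = rule.antecedent <= itemset
    consequent_present = rule.consequent <= itemset
    if algorithm == "exclusiveRecommendation":
        # Antecedent is a subset of the input, consequent is not.
        return antecedent_present and not consequent_present
    if algorithm == "ruleAssociation":
        # Both antecedent and consequent are subsets of the input.
        return antecedent_present and consequent_present
    raise ValueError(f"algorithm not covered by this sketch: {algorithm}")


def matching_rule_ids(rules: List[Rule], items: Iterable[str], algorithm: str) -> List[int]:
    """Ids of all rules selected for `items`, in model order."""
    return [r.rule_id for r in rules if rule_matches(r, items, algorithm)]


# Rules 1, 2 and 4 as described in the examples above.
sample_rules = [
    Rule(1, frozenset({"Cracker"}), frozenset({"Water"})),
    Rule(2, frozenset({"Water"}), frozenset({"Cracker"})),
    Rule(4, frozenset({"Cracker", "Water"}), frozenset({"Nachos"})),
]

itemset_3 = {"Water", "Coke"}
itemset_5 = {"Cracker", "Water", "Banana", "Apple"}

exclusive_3 = matching_rule_ids(sample_rules, itemset_3, "exclusiveRecommendation")
exclusive_5 = matching_rule_ids(sample_rules, itemset_5, "exclusiveRecommendation")
association_5 = matching_rule_ids(sample_rules, itemset_5, "ruleAssociation")
```

Restricted to these three rules, "exclusiveRecommendation" selects rule 2 for input itemset #3 and rule 4 for input itemset #5, while "ruleAssociation" selects rules 1 and 2 for input itemset #5. Ranking the selected rules (for example by confidence, as the OutputField elements in the example model request) would be a separate step.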