## PMML 3.0 - Naive Bayes

Naive Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output) field from evidence given by one or more predictor (input) fields.

Given a categorical target field T with possible values
T_{1},...T_{m}, and predictor fields
I_{1},...I_{n}, with values (in the current record) of
I_{1*},...I_{n*}, the probability that the target T has value
T_{i}, given the values of the predictors, is derived as follows:

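$$
\begin{aligned}
P(T_{i} \mid I_{1*},\ldots,I_{n*})
&= \frac{P(T_{i})\, P(I_{1*},\ldots,I_{n*} \mid T_{i})}{P(I_{1*},\ldots,I_{n*})} && \text{by Bayes' theorem} \\
&\approx \frac{P(T_{i}) \prod_{j} P(I_{j*} \mid T_{i})}{P(I_{1*},\ldots,I_{n*})} && \text{by the conditional independence assumption} \\
&= \frac{P(T_{i}) \prod_{j} P(I_{j*} \mid T_{i})}{\sum_{k} \left( P(T_{k}) \prod_{j} P(I_{j*} \mid T_{k}) \right)} \\
&= \frac{L_{i}}{\sum_{k} L_{k}}, && \text{defining likelihood } L_{k} = P(T_{k}) \prod_{j} P(I_{j*} \mid T_{k})
\end{aligned}
$$

$$
\begin{aligned}
L_{i} &= P(T_{i}) \prod_{j} P(I_{j*} \mid T_{i}) \\
&= \frac{\mathrm{count}[T_{i}]}{\sum_{k} \mathrm{count}[T_{k}]} \prod_{j} \frac{\mathrm{count}[I_{j*}T_{i}] \,/\, \sum_{k} \mathrm{count}[T_{k}]}{\mathrm{count}[T_{i}] \,/\, \sum_{k} \mathrm{count}[T_{k}]} \\
&\propto \mathrm{count}[T_{i}] \prod_{j} \frac{\mathrm{count}[I_{j*}T_{i}]}{\mathrm{count}[T_{i}]} && \text{removing factors of } \textstyle\sum_{k} \mathrm{count}[T_{k}] \text{ common to all } L
\end{aligned}
$$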

A count of zero requires special attention. Without adjustment, a count of zero
would exercise an absolute veto over a likelihood in which that count appears
as a factor. Therefore, the Bayes model incorporates a threshold parameter that
specifies a default (usually very small) probability to use in lieu of
P(I_{j*} | T_{k}) when count[I_{j*}T_{k}] is
zero.
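The zero-count substitution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `conditional_probability` and `likelihood` are hypothetical and not part of PMML, and the counts in the usage note are made up.

```python
from typing import Sequence


def conditional_probability(pair_count: float, target_count: float, threshold: float) -> float:
    """Estimate P(I_j* | T_k) as count[I_j* T_k] / count[T_k].

    A zero pair count is replaced by the model's threshold so that it
    cannot veto the whole likelihood.
    """
    if pair_count == 0:
        return threshold
    return pair_count / target_count


def likelihood(target_count: float, pair_counts: Sequence[float], threshold: float) -> float:
    """L_k proportional to count[T_k] * Product_j P(I_j* | T_k)."""
    result = float(target_count)
    for pair_count in pair_counts:
        result *= conditional_probability(pair_count, target_count, threshold)
    return result
```

With a hypothetical target count of 100 and pair counts of 40 and 0, `likelihood(100, [40, 0], 0.001)` stays small but non-zero, whereas a threshold of 0 would force the likelihood to exactly zero.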

A second adaptation, to missing values in the training data, involves the
denominator count[T_{i}] in the conditional-probability terms. Accuracy
improves if the denominator for P(I_{j*} | T_{i}) is replaced
by the sum Sum_{k} count[I_{jk}T_{i}], that is, the
sum of the counts of co-occurrences of target value T_{i} with any
(non-missing) value of item I_{j}.
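As a rough sketch of this adjusted denominator, the Python function below sums the pair counts of one predictor field over all of its non-missing values. The nested-dictionary layout, the function name, and the sample counts are assumptions made for illustration; returning 0.0 for an empty denominator is likewise a choice of this sketch, not something the text prescribes.

```python
from typing import Mapping


def adjusted_conditional_probability(
    field_pair_counts: Mapping[str, Mapping[str, float]],
    value: str,
    target: str,
) -> float:
    """P(I_j* | T_i) with denominator Sum_k count[I_jk T_i].

    field_pair_counts maps each non-missing value of one predictor field
    to its per-target counts: {predictor value: {target value: count}}.
    """
    denominator = sum(counts.get(target, 0) for counts in field_pair_counts.values())
    if denominator == 0:
        return 0.0  # assumption of this sketch: no co-occurrences at all
    numerator = field_pair_counts.get(value, {}).get(target, 0)
    return numerator / denominator


# Hypothetical counts: suppose count[t1] = 100, but 20 of those records
# have no gender value, so only 80 co-occurrences are recorded for t1.
gender_counts = {
    "male": {"t1": 30, "t2": 10},
    "female": {"t1": 50, "t2": 10},
}
p_male_given_t1 = adjusted_conditional_probability(gender_counts, "male", "t1")
```

Here the estimate is 30/80 rather than 30/100, so the records with a missing gender no longer dilute the conditional probability.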

Naive Bayes models require that each field (whether target or predictor) be
discretized so that for each field, only a small, finite number of values are
considered by the model.
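To make the discretization requirement concrete, the following Python sketch maps a continuous value to one of a small number of bins. The bin edges, the labels, and the half-open interval convention are hypothetical; in a PMML document the binning would be declared in the model itself rather than in application code.

```python
from bisect import bisect_right
from typing import Sequence


def discretize(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the bin containing value.

    edges must be sorted, and labels must have len(edges) + 1 entries.
    Bins are half-open: a value equal to an edge falls into the upper bin.
    """
    return labels[bisect_right(edges, value)]


# Hypothetical bins for an age field: below 25, 25 up to 45, 45 and above.
age_edges = [25, 45]
age_labels = ["young", "middle", "senior"]
age_bin = discretize(30, age_edges, age_labels)
```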

In sum, a Naive Bayes model requires the following parameter and counts:

- An attribute threshold specifies the probability to use in lieu of P(I_{j*} | T_{k}) when count[I_{j*}T_{k}] is zero.
- An element TargetValueCounts lists, for each value T_{i} of the target field, the number of occurrences of that target value in the training data, i.e. count[T_{i}].
- For each predictor field I_{i}, for each discrete value I_{ij} of that field, an element PairCounts lists, for each value T_{k} of the target field, the number of occurrences of that predictor value jointly with that target value, i.e. count[I_{ij}T_{k}].

| | Target value | t1 | t2 | t3 | ... |
|---|---|---|---|---|---|
| | | count[t1] | count[t2] | count[t3] | ... |
| Input1 | i11 | count[i11,t1] | count[i11,t2] | count[i11,t3] | ... |
| | i12 | count[i12,t1] | count[i12,t2] | count[i12,t3] | ... |
| | ... | ... | ... | ... | ... |
| Input2 | i21 | count[i21,t1] | count[i21,t2] | count[i21,t3] | ... |
| | i22 | count[i22,t1] | count[i22,t2] | count[i22,t3] | ... |
| | i23 | count[i23,t1] | count[i23,t2] | count[i23,t3] | ... |
| | ... | ... | ... | ... | ... |
| Input3 | ... | ... | ... | ... | ... |

### Scoring procedure

Given an input vector like (i12,i23,i31) the probability for class t1 is computed as

*P(t1|i12,i23,i31) = L1 / (L1 + L2 + L3)*

*L1 = count[t1] * count[i12,t1]/count[t1] * count[i23,t1]/count[t1] * count[i31,t1]/count[t1]*

*L2 = count[t2] * count[i12,t2]/count[t2] * count[i23,t2]/count[t2] * count[i31,t2]/count[t2]*

*L3 = count[t3] * count[i12,t3]/count[t3] * count[i23,t3]/count[t3] * count[i31,t3]/count[t3]*

When scoring, missing values are simply ignored. That is, the
conditional-probability factor associated with a missing predictor field is
omitted. For example, given an input vector with missing values (-,i23,-) the
probability for class t1 is computed as

*P(t1|-,i23,-) = L1 / (L1 + L2 + L3)*

*L1 = count[t1] * count[i23,t1]/count[t1]*

*L2 = count[t2] * count[i23,t2]/count[t2]*

*L3 = count[t3] * count[i23,t3]/count[t3]*
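The scoring procedure, including the rule that missing predictors are skipped, might be sketched in Python as follows. The `score` function, the dictionary layout, and all counts are assumptions for illustration only; the sketch uses count[T_k] as the denominator, as in the formulas above, and assumes every target count is non-zero.

```python
from typing import Dict, Mapping, Optional

Counts = Mapping[str, float]


def score(
    target_counts: Counts,
    pair_counts: Mapping[str, Mapping[str, Counts]],
    record: Mapping[str, Optional[str]],
    threshold: float,
) -> Dict[str, float]:
    """Return P(T_k | record) for every target value T_k.

    pair_counts[field][value][target] holds count[I_ij T_k];
    a record value of None marks a missing predictor.
    """
    likelihoods: Dict[str, float] = {}
    for target, target_count in target_counts.items():
        lk = float(target_count)  # leading count[T_k] factor
        for field, value in record.items():
            if value is None:
                continue  # missing value: omit this field's factor
            pair_count = pair_counts[field].get(value, {}).get(target, 0)
            if pair_count > 0:
                lk *= pair_count / target_count
            else:
                lk *= threshold  # zero count: use the threshold instead
        likelihoods[target] = lk
    total = sum(likelihoods.values())
    return {target: lk / total for target, lk in likelihoods.items()}


# Hypothetical model with two target values and two predictor fields.
target_counts = {"t1": 60, "t2": 40}
pair_counts = {
    "Input1": {"i11": {"t1": 30, "t2": 10}, "i12": {"t1": 30, "t2": 30}},
    "Input2": {"i21": {"t1": 45, "t2": 0}, "i22": {"t1": 15, "t2": 40}},
}
partial = score(target_counts, pair_counts, {"Input1": None, "Input2": "i22"}, 0.001)
```

For the record with Input1 missing, only the Input2 factor contributes, so the likelihoods are 15 for t1 and 40 for t2. If every predictor is missing, the result reduces to the relative target counts.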

### XSD

    <xs:element name="NaiveBayesModel">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element ref="MiningSchema"/>
          <xs:element ref="Output" minOccurs="0" />
          <xs:element ref="ModelStats" minOccurs="0"/>
          <xs:element ref="Targets" minOccurs="0" />
          <xs:element ref="LocalTransformations" minOccurs="0" />
          <xs:element ref="BayesInputs" />
          <xs:element ref="BayesOutput" />
          <xs:element ref="ModelVerification" minOccurs="0"/>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="modelName" type="xs:string" />
        <xs:attribute name="threshold" type="REAL-NUMBER" use="required"/>
        <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
        <xs:attribute name="algorithmName" type="xs:string" />
      </xs:complexType>
    </xs:element>

#### Bayes Inputs

The BayesInputs element contains one or more BayesInput elements.

    <xs:element name="BayesInputs">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element maxOccurs="unbounded" ref="BayesInput" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

#### Bayes Input

Each BayesInput contains the counts pairing the discrete values of its field with those of the target field. A BayesInput for a continuous field also defines how the continuous values are encoded as discrete bins. (Discretization is achieved using DerivedField; only the Discretize mapping for DerivedField may be invoked here.)

    <xs:element name="BayesInput">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element minOccurs="0" ref="DerivedField" />
          <xs:element maxOccurs="unbounded" ref="PairCounts" />
        </xs:sequence>
        <xs:attribute name="fieldName" type="xs:string" use="required" />
      </xs:complexType>
    </xs:element>

#### Bayes Output

BayesOutput contains the counts associated with the values of the target field.

    <xs:element name="BayesOutput">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element ref="TargetValueCounts" />
        </xs:sequence>
        <xs:attribute name="fieldName" type="xs:string" use="required" />
      </xs:complexType>
    </xs:element>

#### Pair Counts

PairCounts lists, for a field I_{i}'s discrete value I_{ij},
the TargetValueCounts that pair the value I_{ij} with each value of the target
field.

    <xs:element name="PairCounts">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element ref="TargetValueCounts" />
        </xs:sequence>
        <xs:attribute name="value" type="xs:string" use="required" />
      </xs:complexType>
    </xs:element>

#### Target Value Counts

TargetValueCounts lists the counts associated with each value of the target field. However, a TargetValueCount whose count is zero may be omitted.

Within BayesOutput, TargetValueCounts lists the total count of occurrences of each target value.

Within PairCounts, TargetValueCounts lists, for each target value, the count of the joint occurrences of that target value with a particular discrete input value.

    <xs:element name="TargetValueCounts">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
          <xs:element maxOccurs="unbounded" ref="TargetValueCount" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="TargetValueCount">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:attribute name="value" type="xs:string" use="required" />
        <xs:attribute name="count" type="REAL-NUMBER" use="required" />
      </xs:complexType>
    </xs:element>
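Because a TargetValueCount with a zero count may be omitted, a consumer has to fill in the missing entries itself. The Python sketch below shows one way to do that with the standard `xml.etree.ElementTree` module. The XML fragment, its values, and the helper name `read_target_value_counts` are hypothetical, and the fragment deliberately carries no XML namespace; a real PMML document declares one, so element lookups there would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Hypothetical fragment: the count for target value "t3" is zero and omitted.
FRAGMENT = """
<TargetValueCounts>
  <TargetValueCount value="t1" count="60"/>
  <TargetValueCount value="t2" count="40"/>
</TargetValueCounts>
"""


def read_target_value_counts(element: ET.Element, target_values: Iterable[str]) -> Dict[str, float]:
    """Collect value -> count, defaulting omitted target values to zero."""
    counts = {value: 0.0 for value in target_values}
    for child in element.findall("TargetValueCount"):
        counts[child.get("value")] = float(child.get("count"))
    return counts


root = ET.fromstring(FRAGMENT.strip())
counts = read_target_value_counts(root, ["t1", "t2", "t3"])
```

The resulting dictionary holds an explicit 0.0 for "t3", which a scorer can then replace with the threshold when the count is used in a conditional probability.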

#### Scoring procedure, example

Given an input vector (gender="male", no of claims = "2", domicile= (missing), age of car = "1") the probability for class "1000" is computed as

*P("1000"| "male", "2", -,"1" ) = L2 / (L0 + L1 + L2 + L3 + L4)*

*L0 = 8723 * 4273/8723 * 225/8723 * 830/8723*

*L1 = 2557 * 1321/2557 * 10/2557 * 182/2557*

*L2 = 1530 * 780/1530 * 9/1530 * 51/1530*

*L3 = 709 * 405/709 * .001 * 26/709*

*L4 = 100 * 42/100 * 10/100 * 6/100*
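The arithmetic of this example can be checked with a short Python script. The numbers are copied from the likelihood formulas above; the variable names are illustrative, and the factor .001 in L3 is entered exactly as it appears in the example.

```python
from math import prod

# Each likelihood is the target count times one factor per non-missing predictor.
likelihoods = {
    "L0": 8723 * prod([4273 / 8723, 225 / 8723, 830 / 8723]),
    "L1": 2557 * prod([1321 / 2557, 10 / 2557, 182 / 2557]),
    "L2": 1530 * prod([780 / 1530, 9 / 1530, 51 / 1530]),
    "L3": 709 * prod([405 / 709, 0.001, 26 / 709]),
    "L4": 100 * prod([42 / 100, 10 / 100, 6 / 100]),
}

# P("1000" | "male", "2", -, "1") = L2 / (L0 + L1 + L2 + L3 + L4)
probability = likelihoods["L2"] / sum(likelihoods.values())
print(f"{probability:.4f}")
```

Under these counts L0 dominates the sum, so the probability for class "1000" comes out at roughly 0.0136.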