
## PMML 3.2 - Naïve Bayes

Naïve Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output) field from evidence given by one or more predictor (input) fields.

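Given a categorical target field $T$ with possible values $T_1, \ldots, T_m$, and predictor fields $I_1, \ldots, I_n$ with values (in the current record) of $I_1^*, \ldots, I_n^*$, the probability that the target $T$ has value $T_i$, given the values of the predictors, is derived as follows:

$$
\begin{aligned}
P(T_i \mid I_1^*, \ldots, I_n^*)
&= \frac{P(T_i)\, P(I_1^*, \ldots, I_n^* \mid T_i)}{P(I_1^*, \ldots, I_n^*)} && \text{by Bayes' theorem} \\
&\approx \frac{P(T_i) \prod_j P(I_j^* \mid T_i)}{P(I_1^*, \ldots, I_n^*)} && \text{by the conditional independence assumption} \\
&= \frac{P(T_i) \prod_j P(I_j^* \mid T_i)}{\sum_k \left( P(T_k) \prod_j P(I_j^* \mid T_k) \right)} \\
&= \frac{L_i}{\sum_k L_k} && \text{defining likelihood } L_k = P(T_k) \prod_j P(I_j^* \mid T_k)
\end{aligned}
$$

$$
\begin{aligned}
L_i &= P(T_i) \prod_j P(I_j^* \mid T_i) \\
&= \frac{\mathrm{count}[T_i]}{\sum_k \mathrm{count}[T_k]} \prod_j \frac{\mathrm{count}[I_j^* T_i] \,/\, \sum_k \mathrm{count}[T_k]}{\mathrm{count}[T_i] \,/\, \sum_k \mathrm{count}[T_k]} \\
&\propto \mathrm{count}[T_i] \prod_j \frac{\mathrm{count}[I_j^* T_i]}{\mathrm{count}[T_i]} && \text{removing factors of } \textstyle\sum_k \mathrm{count}[T_k] \text{ common to all } L_k
\end{aligned}
$$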

A count of zero requires special attention. Without adjustment, a count of zero would exercise an absolute veto over any likelihood in which that count appears as a factor. Therefore, the Bayes model incorporates a threshold parameter that specifies a default (usually very small) probability to use in lieu of $P(I_j^* \mid T_i)$ when $\mathrm{count}[I_j^* T_i]$ is zero.

A second adaptation, which handles missing values in the training data, involves the denominator $\mathrm{count}[T_i]$ in the conditional-probability terms. Accuracy improves if the denominator for $P(I_j^* \mid T_i)$ is replaced by the sum $\sum_k \mathrm{count}[I_{jk} T_i]$, that is, the sum of the counts of co-occurrences of target value $T_i$ with any (non-missing) value of field $I_j$.
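The two adjustments above can be sketched in Python as follows. This is a minimal illustration, not part of the PMML specification: the function name `conditional_probability`, its parameters, the nested-dictionary layout of the counts, and the sample counts themselves are all assumptions made up for this example.

```python
def conditional_probability(pair_counts, input_value, target_value, threshold):
    """Estimate P(input_value | target_value) for one predictor field.

    pair_counts maps each discrete input value to a dict of
    {target value: co-occurrence count} (hypothetical layout).
    """
    joint = pair_counts.get(input_value, {}).get(target_value, 0)
    if joint == 0:
        # A zero count would veto the whole likelihood; use the threshold instead.
        return threshold
    # Denominator: co-occurrences of the target value with any non-missing
    # value of this field, rather than the overall count of the target value.
    denominator = sum(
        by_target.get(target_value, 0) for by_target in pair_counts.values()
    )
    return joint / denominator


# Made-up counts for a single predictor field, for illustration only.
gender_counts = {
    "male": {"yes": 30, "no": 0},
    "female": {"yes": 10, "no": 20},
}
p_male_given_yes = conditional_probability(gender_counts, "male", "yes", 0.001)
p_male_given_no = conditional_probability(gender_counts, "male", "no", 0.001)
```

Here `p_male_given_yes` is 30 / (30 + 10), while `p_male_given_no` falls back to the threshold because the joint count is zero.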

Naïve Bayes models require that each field (whether target or predictor) be discretized so that for each field, only a small, finite number of values are considered by the model.

In sum, a Naïve Bayes model requires the following parameters and counts:

• An attribute threshold specifies the probability to use in lieu of $P(I_j^* \mid T_i)$ when $\mathrm{count}[I_j^* T_i]$ is zero.
• An element TargetValueCounts lists, for each value $T_i$ of the target field, the number of occurrences of that target value in the training data, i.e. $\mathrm{count}[T_i]$.
• For each predictor field $I_i$, for each discrete value $I_{ij}$ of that field, an element PairCounts lists, for each value $T_k$ of the target field, the number of occurrences of that predictor value jointly with that target value, i.e. $\mathrm{count}[I_{ij} T_k]$.

A NaiveBayesModel essentially defines a set of matrices. For each input field there is a matrix which contains the frequency counts of an input value with respect to a target value.

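| Input field | Input value | Target value t1 | Target value t2 | Target value t3 | ... |
|---|---|---|---|---|---|
| | | count[t1] | count[t2] | count[t3] | ... |
| Input1 | i11 | count[i11,t1] | count[i11,t2] | count[i11,t3] | ... |
| | i12 | count[i12,t1] | count[i12,t2] | count[i12,t3] | ... |
| | ... | ... | ... | ... | ... |
| Input2 | i21 | count[i21,t1] | count[i21,t2] | count[i21,t3] | ... |
| | i22 | count[i22,t1] | count[i22,t2] | count[i22,t3] | ... |
| | i23 | count[i23,t1] | count[i23,t2] | count[i23,t3] | ... |
| | ... | ... | ... | ... | ... |
| Input3 | ... | ... | ... | ... | ... |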
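As a rough sketch of how such count matrices might be collected from training records, the following Python builds the target totals and one count matrix per input field. The function `build_counts`, the dictionary-based record format, and the sample records are assumptions for illustration; PMML itself only stores the resulting counts.

```python
from collections import Counter, defaultdict


def build_counts(records, target_field):
    """Return (target_counts, pair_counts) from an iterable of dict records.

    target_counts[t] is count[t];
    pair_counts[field][value][t] is count[value, t].
    Missing predictor values (None) are not counted.
    """
    target_counts = Counter()
    pair_counts = defaultdict(lambda: defaultdict(Counter))
    for record in records:
        target = record[target_field]
        target_counts[target] += 1
        for field, value in record.items():
            if field == target_field or value is None:
                continue
            pair_counts[field][value][target] += 1
    return target_counts, pair_counts


# Made-up training records, for illustration only.
records = [
    {"input1": "i11", "input2": "i21", "target": "t1"},
    {"input1": "i11", "input2": None, "target": "t2"},
    {"input1": "i12", "input2": "i22", "target": "t1"},
]
target_counts, pair_counts = build_counts(records, "target")
```

Each `pair_counts[field]` then plays the role of one matrix in the table above, with rows keyed by input value and columns keyed by target value.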

### XSD

 ``` ```

#### Bayes Inputs

The BayesInputs element contains several BayesInput elements.

 ``` ```

#### Bayes Input

Each BayesInput contains the counts pairing the discrete values of that field with those of the target field. Each BayesInput for a continuous field also defines how the continuous values are encoded as discrete bins. (Discretization is achieved using DerivedField; only the Discretize mapping for DerivedField may be invoked here.)

 ``` ```

#### Bayes Output

BayesOutput contains the counts associated with the values of the target field.

 ``` ```

#### Pair Counts

PairCounts lists, for the discrete value $I_{ij}$ of a field $I_i$, the TargetValueCounts that pair the value $I_{ij}$ with each value of the target field.

 ``` ```

#### Target Value Counts

TargetValueCounts lists the counts associated with each value of the target field. However, a TargetValueCount whose count is zero may be omitted.

Within BayesOutput, TargetValueCounts lists the total count of occurrences of each target value.

Within PairCounts, TargetValueCounts lists, for each target value, the count of the joint occurrences of that target value with a particular discrete input value.

 ``` ```

### Scoring procedure

Given an input vector like (i12,i23,i31) the probability for class t1 is computed as

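$$
P(t_1 \mid i_{12}, i_{23}, i_{31}) = \frac{L_1}{L_1 + L_2 + L_3}
$$

with

$$
\begin{aligned}
L_1 &= \mathrm{count}[t_1] \cdot \frac{\mathrm{count}[i_{12},t_1]}{\mathrm{count}[t_1]} \cdot \frac{\mathrm{count}[i_{23},t_1]}{\mathrm{count}[t_1]} \cdot \frac{\mathrm{count}[i_{31},t_1]}{\mathrm{count}[t_1]} \\
L_2 &= \mathrm{count}[t_2] \cdot \frac{\mathrm{count}[i_{12},t_2]}{\mathrm{count}[t_2]} \cdot \frac{\mathrm{count}[i_{23},t_2]}{\mathrm{count}[t_2]} \cdot \frac{\mathrm{count}[i_{31},t_2]}{\mathrm{count}[t_2]} \\
L_3 &= \mathrm{count}[t_3] \cdot \frac{\mathrm{count}[i_{12},t_3]}{\mathrm{count}[t_3]} \cdot \frac{\mathrm{count}[i_{23},t_3]}{\mathrm{count}[t_3]} \cdot \frac{\mathrm{count}[i_{31},t_3]}{\mathrm{count}[t_3]}
\end{aligned}
$$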
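The scoring formula for a complete input vector could be sketched in Python as below. The function `score`, the nested-dictionary layout, and the sample counts are assumptions for this illustration. The sketch also assumes every joint count is non-zero, so it does not apply the threshold described earlier.

```python
def score(target_counts, pair_counts, inputs):
    """Return {target value: probability} for a complete input vector.

    target_counts: {t: count[t]}
    pair_counts:   {field: {input value: {t: count[value, t]}}}
    inputs:        {field: observed value}
    """
    likelihoods = {}
    for target, total in target_counts.items():
        likelihood = total
        for field, value in inputs.items():
            # One conditional-probability factor per predictor field.
            likelihood *= pair_counts[field][value][target] / total
        likelihoods[target] = likelihood
    norm = sum(likelihoods.values())
    return {target: value / norm for target, value in likelihoods.items()}


# Made-up counts, for illustration only.
target_counts = {"t1": 10, "t2": 10}
pair_counts = {
    "input1": {"i12": {"t1": 5, "t2": 2}},
    "input2": {"i23": {"t1": 4, "t2": 5}},
}
probabilities = score(target_counts, pair_counts, {"input1": "i12", "input2": "i23"})
```

With these counts the likelihoods are 10 · 5/10 · 4/10 = 2 for `t1` and 10 · 2/10 · 5/10 = 1 for `t2`, so `t1` receives a probability of about 2/3.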

When scoring, missing values are simply ignored. That is, the conditional-probability factor associated with a missing predictor field is omitted. For example, given an input vector with missing values (-,i23,-) the probability for class t1 is computed as

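$$
P(t_1 \mid -, i_{23}, -) = \frac{L_1}{L_1 + L_2 + L_3}
$$

with

$$
\begin{aligned}
L_1 &= \mathrm{count}[t_1] \cdot \frac{\mathrm{count}[i_{23},t_1]}{\mathrm{count}[t_1]} \\
L_2 &= \mathrm{count}[t_2] \cdot \frac{\mathrm{count}[i_{23},t_2]}{\mathrm{count}[t_2]} \\
L_3 &= \mathrm{count}[t_3] \cdot \frac{\mathrm{count}[i_{23},t_3]}{\mathrm{count}[t_3]}
\end{aligned}
$$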
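A minimal Python sketch of this rule follows, representing a missing predictor value as `None` and skipping its factor. The function name `likelihood`, the data layout, and the sample counts are assumptions for illustration only.

```python
def likelihood(target, target_counts, pair_counts, inputs):
    """Compute the likelihood of one target value, ignoring missing inputs."""
    total = target_counts[target]
    result = total
    for field, value in inputs.items():
        if value is None:
            continue  # missing predictor: omit its conditional-probability factor
        result *= pair_counts[field][value][target] / total
    return result


# Made-up counts, for illustration only.
target_counts = {"t1": 10, "t2": 10}
pair_counts = {"input2": {"i23": {"t1": 4, "t2": 5}}}
inputs = {"input1": None, "input2": "i23", "input3": None}

likelihoods = {
    t: likelihood(t, target_counts, pair_counts, inputs) for t in target_counts
}
p_t1 = likelihoods["t1"] / sum(likelihoods.values())
```

Only the factor for `input2` contributes here, so the likelihoods reduce to the joint counts 4 and 5, and `p_t1` is about 4/9.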

#### Scoring procedure, example

 ```
```

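Given an input vector (gender = "male", no of claims = "2", domicile = (missing), age of car = "1") the probability for class "1000" is computed as

$$
P(\text{"1000"} \mid \text{"male"}, \text{"2"}, -, \text{"1"}) = \frac{L_2}{L_0 + L_1 + L_2 + L_3 + L_4}
$$

with

$$
\begin{aligned}
L_0 &= 8723 \cdot \frac{4273}{8723} \cdot \frac{225}{8723} \cdot \frac{830}{8723} \\
L_1 &= 2557 \cdot \frac{1321}{2557} \cdot \frac{10}{2557} \cdot \frac{182}{2557} \\
L_2 &= 1530 \cdot \frac{780}{1530} \cdot \frac{9}{1530} \cdot \frac{51}{1530} \\
L_3 &= 709 \cdot \frac{405}{709} \cdot 0.001 \cdot \frac{26}{709} \\
L_4 &= 100 \cdot \frac{42}{100} \cdot \frac{10}{100} \cdot \frac{6}{100}
\end{aligned}
$$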
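The arithmetic above can be checked with a few lines of Python. The numbers are copied from the formulas as given. The bare factor 0.001 in L3 is presumably the model's threshold standing in for a zero count, but that is an inference, since the model listing is not reproduced here.

```python
# Likelihoods exactly as written in the worked example.
likelihoods = {
    "L0": 8723 * (4273 / 8723) * (225 / 8723) * (830 / 8723),
    "L1": 2557 * (1321 / 2557) * (10 / 2557) * (182 / 2557),
    "L2": 1530 * (780 / 1530) * (9 / 1530) * (51 / 1530),
    "L3": 709 * (405 / 709) * 0.001 * (26 / 709),  # 0.001 used as-is
    "L4": 100 * (42 / 100) * (10 / 100) * (6 / 100),
}

# Normalize L2 by the sum of all likelihoods to get P("1000" | inputs).
p_1000 = likelihoods["L2"] / sum(likelihoods.values())
print(f"P('1000' | inputs) = {p_1000:.4f}")
```

Under these numbers the result comes out at roughly 0.0136, so class "1000" is unlikely for this record, and L0 dominates the normalizing sum.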
