PMML 2.1 XMLSchema -- Naive Bayes Models
Naive Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output) field from evidence given by one or more predictor (input) fields.
Given a categorical target field $T$ with possible values $T_1,\dots,T_m$, and predictor fields $I_1,\dots,I_n$ whose values in the current record are $I_1^*,\dots,I_n^*$, the probability that the target $T$ has value $T_i$, given the values of the predictors, is derived as follows:
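$$
\begin{aligned}
P(T_i \mid I_1^*,\dots,I_n^*)
  &= \frac{P(T_i)\,P(I_1^*,\dots,I_n^* \mid T_i)}{P(I_1^*,\dots,I_n^*)}
     && \text{by Bayes' theorem} \\
  &\approx \frac{P(T_i)\prod_j P(I_j^* \mid T_i)}{P(I_1^*,\dots,I_n^*)}
     && \text{by the conditional independence assumption} \\
  &= \frac{P(T_i)\prod_j P(I_j^* \mid T_i)}{\sum_k \bigl(P(T_k)\prod_j P(I_j^* \mid T_k)\bigr)} \\
  &= \frac{L_i}{\sum_k L_k},
     && \text{defining the likelihood } L_k = P(T_k)\prod_j P(I_j^* \mid T_k)
\end{aligned}
$$

$$
\begin{aligned}
L_i &= P(T_i)\prod_j P(I_j^* \mid T_i) \\
    &= \frac{\mathrm{count}[T_i]}{\sum_k \mathrm{count}[T_k]}
       \prod_j \frac{\mathrm{count}[I_j^* T_i] \,/\, \sum_k \mathrm{count}[T_k]}
                    {\mathrm{count}[T_i] \,/\, \sum_k \mathrm{count}[T_k]} \\
    &\propto \mathrm{count}[T_i] \prod_j \frac{\mathrm{count}[I_j^* T_i]}{\mathrm{count}[T_i]}
       && \text{removing the factors of } \textstyle\sum_k \mathrm{count}[T_k] \text{ common to all } L_k
\end{aligned}
$$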
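The last line of the derivation translates directly into code. The Python sketch below computes the posterior from raw counts; the nested-dictionary layout of the counts, the function name `naive_bayes_posterior`, and the sample data are hypothetical. It assumes every count[T_i] is positive and does not yet apply the zero-count threshold.

```python
from math import prod


def naive_bayes_posterior(target_counts, pair_counts, evidence):
    """Return {T_i: P(T_i | evidence)} computed from training counts.

    target_counts: {target value: count[T_i]}  (assumed > 0)
    pair_counts:   {field: {value: {target value: count[I_j T_i]}}}
    evidence:      {field: observed value I_j*}
    """
    likelihoods = {}
    for t, n_t in target_counts.items():
        # L_i = count[T_i] * prod_j count[I_j* T_i] / count[T_i]
        likelihoods[t] = n_t * prod(
            pair_counts[field][value].get(t, 0) / n_t
            for field, value in evidence.items()
        )
    total = sum(likelihoods.values())
    # P(T_i | I*) = L_i / sum_k L_k; the dropped 1 / sum_k count[T_k] cancels here.
    return {t: lik / total for t, lik in likelihoods.items()}


# Hypothetical training counts for a two-valued target.
target_counts = {"yes": 6, "no": 4}
pair_counts = {
    "outlook": {"sunny": {"yes": 2, "no": 3}, "rain": {"yes": 4, "no": 1}},
    "windy": {"true": {"yes": 2, "no": 2}, "false": {"yes": 4, "no": 2}},
}
posterior = naive_bayes_posterior(target_counts, pair_counts,
                                  {"outlook": "sunny", "windy": "false"})
```

Here L_yes = 6 · 2/6 · 4/6 = 4/3 and L_no = 4 · 3/4 · 2/4 = 3/2, so P(yes) = 8/17 ≈ 0.47.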
A count of zero requires special attention. Without adjustment, a count of zero would exercise an absolute veto over any likelihood in which it appears as a factor: the product becomes zero no matter how strongly the other predictors favor that target value. Therefore, the Bayes model incorporates a threshold parameter that specifies a default (usually very small) probability to use in lieu of $P(I_j^* \mid T_k)$ when $\mathrm{count}[I_j^* T_k]$ is zero.
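A minimal sketch of the threshold substitution, with hypothetical counts and an assumed threshold of 0.001 (the helper name `conditional_prob` is also made up): without the substitution, a single zero count drives the whole likelihood to zero.

```python
from math import prod


def conditional_prob(pair_count, target_count, threshold):
    """Estimate P(I_j* | T_k) = count[I_j* T_k] / count[T_k],
    falling back to `threshold` when the pair count is zero."""
    if pair_count == 0:
        return threshold
    return pair_count / target_count


# Hypothetical counts for one target value T_k and three predictors;
# the second predictor value never co-occurred with T_k in training.
target_count = 50
pair_counts = [30, 0, 45]

# Plain ratios: the zero factor vetoes the likelihood entirely.
raw = target_count * prod(c / target_count for c in pair_counts)
# Threshold substituted for the zero count: 50 * 0.6 * 0.001 * 0.9.
adjusted = target_count * prod(
    conditional_prob(c, target_count, threshold=0.001) for c in pair_counts
)
```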
A second adaptation, for missing values in the training data, involves the denominator $\mathrm{count}[T_i]$ in the conditional-probability terms. Accuracy improves if the denominator for $P(I_j^* \mid T_i)$ is replaced by the sum $\sum_k \mathrm{count}[I_{jk} T_i]$, that is, the sum of the counts of co-occurrences of target value $T_i$ with any (non-missing) value $I_{jk}$ of field $I_j$.
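The reason is that records in which I_j is missing still contribute to count[T_i], so dividing by count[T_i] understates every P(I_j* | T_i). A small Python sketch of the adapted estimate, with hypothetical counts and a made-up helper name:

```python
def conditional_prob_observed(field_counts, value, target):
    """P(I_j = value | T_i) with denominator sum_k count[I_jk T_i], i.e. the
    number of training records with target T_i and a non-missing I_j.

    field_counts: {value of I_j: {target value: pair count}}
    """
    denominator = sum(row.get(target, 0) for row in field_counts.values())
    return field_counts[value].get(target, 0) / denominator


# Hypothetical: count[T_i] = 100, but I_j was missing in 20 of those records,
# so the pair counts of I_j with T_i only add up to 80.
target_count = 100
field_counts = {"a": {"t": 50}, "b": {"t": 30}}

naive = field_counts["a"]["t"] / target_count                # 0.5
adapted = conditional_prob_observed(field_counts, "a", "t")  # 50 / 80 = 0.625
```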
Naive Bayes models require that each field (whether target or predictor) be discretized, so that the model considers only a small, finite number of values for each field.
In sum, a Naive Bayes model
requires the following parameter and counts:
- An attribute threshold specifies the probability to use in lieu of $P(I_j^* \mid T_k)$ when $\mathrm{count}[I_j^* T_k]$ is zero.
- An element TargetValueCounts lists, for each value $T_i$ of the target field, the number of occurrences of that target value in the training data, i.e. $\mathrm{count}[T_i]$.
- For each predictor field $I_i$ and each discrete value $I_{ij}$ of that field, an element PairCounts lists, for each value $T_k$ of the target field, the number of occurrences of that predictor value jointly with that target value, i.e. $\mathrm{count}[I_{ij} T_k]$.
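The counts in this list can be accumulated in a single pass over already discretized training records. The Python sketch below is one way to do it; the record format, the use of None for a missing value, the function name `accumulate_counts`, and the choice to skip records whose target is missing are assumptions, not part of the specification.

```python
from collections import Counter, defaultdict


def accumulate_counts(records, target_field):
    """Build count[T_i] and count[I_ij T_k] from discretized training records.

    records: iterable of dicts {field name: discrete value}, None = missing.
    Returns (target_counts, pair_counts) with
    pair_counts[field][value][target] = number of records with that pair.
    """
    target_counts = Counter()
    pair_counts = defaultdict(lambda: defaultdict(Counter))
    for record in records:
        t = record.get(target_field)
        if t is None:
            continue  # assumption: records without a target value are skipped
        target_counts[t] += 1
        for field, value in record.items():
            if field != target_field and value is not None:
                pair_counts[field][value][t] += 1
    return target_counts, pair_counts


# Hypothetical discretized training data; "class" is the target field.
records = [
    {"gender": "male", "claims": "2", "class": "low"},
    {"gender": "female", "claims": None, "class": "low"},
    {"gender": "male", "claims": "0", "class": "high"},
]
target_counts, pair_counts = accumulate_counts(records, "class")
```

Because the second record has no claims value, the claims pair counts for "low" sum to 1 while count["low"] is 2, which is exactly the gap the denominator adaptation for missing values addresses.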
A NaiveBayesModel essentially defines a set of matrices. For each input field there is a matrix containing the frequency count of each input value jointly with each target value; the top row of the layout holds the total count of each target value.
|        | Target value | t1            | t2            | t3            | ... |
|--------|--------------|---------------|---------------|---------------|-----|
|        |              | count[t1]     | count[t2]     | count[t3]     | ... |
| Input1 | i11          | count[i11,t1] | count[i11,t2] | count[i11,t3] | ... |
|        | i12          | count[i12,t1] | count[i12,t2] | count[i12,t3] | ... |
|        | ...          | ...           | ...           | ...           | ... |
| Input2 | i21          | count[i21,t1] | count[i21,t2] | count[i21,t3] | ... |
|        | i22          | count[i22,t1] | count[i22,t2] | count[i22,t3] | ... |
|        | i23          | count[i23,t1] | count[i23,t2] | count[i23,t3] | ... |
|        | ...          | ...           | ...           | ...           | ... |
| Input3 | ...          | ...           | ...           | ...           | ... |
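Dividing each column of such a matrix by its total count[t] turns the counts into the estimates of P(i | t) that appear as factors in the likelihoods. A minimal Python sketch for one input field, using hypothetical counts, a made-up helper name, and treating an omitted pair count as zero:

```python
def conditional_matrix(target_counts, field_counts):
    """Divide each column of one input field's count matrix by the column's
    target total, giving estimates of P(input value | target value).
    Omitted pair counts are treated as zero."""
    return {
        value: {t: row.get(t, 0) / n_t for t, n_t in target_counts.items()}
        for value, row in field_counts.items()
    }


# Hypothetical counts for Input1 with two input values and two target values.
target_counts = {"t1": 10, "t2": 5}
input1 = {"i11": {"t1": 4, "t2": 5}, "i12": {"t1": 6}}
probs = conditional_matrix(target_counts, input1)
```

In this example each column sums to 1 because no value of Input1 is missing in training; with missing values a column would sum to less than 1.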
Scoring procedure
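Given an input vector such as (i12, i23, i31), and a target with the three values t1, t2, t3, the probability for class t1 is computed as

$$
P(t_1 \mid i_{12}, i_{23}, i_{31}) = \frac{L_1}{L_1 + L_2 + L_3}
$$

with

$$
\begin{aligned}
L_1 &= \mathrm{count}[t_1] \cdot \frac{\mathrm{count}[i_{12},t_1]}{\mathrm{count}[t_1]} \cdot \frac{\mathrm{count}[i_{23},t_1]}{\mathrm{count}[t_1]} \cdot \frac{\mathrm{count}[i_{31},t_1]}{\mathrm{count}[t_1]} \\
L_2 &= \mathrm{count}[t_2] \cdot \frac{\mathrm{count}[i_{12},t_2]}{\mathrm{count}[t_2]} \cdot \frac{\mathrm{count}[i_{23},t_2]}{\mathrm{count}[t_2]} \cdot \frac{\mathrm{count}[i_{31},t_2]}{\mathrm{count}[t_2]} \\
L_3 &= \mathrm{count}[t_3] \cdot \frac{\mathrm{count}[i_{12},t_3]}{\mathrm{count}[t_3]} \cdot \frac{\mathrm{count}[i_{23},t_3]}{\mathrm{count}[t_3]} \cdot \frac{\mathrm{count}[i_{31},t_3]}{\mathrm{count}[t_3]}
\end{aligned}
$$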
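With many predictors, the product of small ratios can underflow floating-point range. One common remedy, not prescribed by this text, is to keep each L_k as a logarithm and normalize with the log-sum-exp trick. A Python sketch for the (i12, i23, i31) example; the counts and the function name `score_log_space` are hypothetical, and every pair count is assumed positive (no threshold handling):

```python
import math


def score_log_space(target_counts, pair_counts, inputs):
    """P(t_k | inputs) with every likelihood L_k kept as a logarithm.

    log L_k = log count[t_k] + sum_j (log count[i_j, t_k] - log count[t_k]).
    Assumes all counts involved are positive (no threshold handling).
    """
    log_l = {}
    for t, n_t in target_counts.items():
        log_l[t] = math.log(n_t) + sum(
            math.log(pair_counts[field][value][t]) - math.log(n_t)
            for field, value in inputs.items()
        )
    # Log-sum-exp: subtract the largest log-likelihood before exponentiating.
    top = max(log_l.values())
    weights = {t: math.exp(v - top) for t, v in log_l.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}


# Hypothetical counts for the (i12, i23, i31) example with three classes.
target_counts = {"t1": 40, "t2": 35, "t3": 25}
pair_counts = {
    "Input1": {"i12": {"t1": 10, "t2": 20, "t3": 5}},
    "Input2": {"i23": {"t1": 8, "t2": 7, "t3": 10}},
    "Input3": {"i31": {"t1": 20, "t2": 5, "t3": 5}},
}
p = score_log_space(target_counts, pair_counts,
                    {"Input1": "i12", "Input2": "i23", "Input3": "i31"})
```

Working the formulas directly gives L1 = 1, L2 = 4/7 and L3 = 2/5, so P(t1 | i12, i23, i31) = 35/69 ≈ 0.507; the log-space version reproduces this up to rounding.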
When scoring, missing values are simply ignored: the conditional-probability factor associated with a missing predictor field is omitted. For example, given an input vector with missing values (-, i23, -), the probability for class t1 is computed as

$$
P(t_1 \mid -, i_{23}, -) = \frac{L_1}{L_1 + L_2 + L_3}
$$

with

$$
\begin{aligned}
L_1 &= \mathrm{count}[t_1] \cdot \frac{\mathrm{count}[i_{23},t_1]}{\mathrm{count}[t_1]} \\
L_2 &= \mathrm{count}[t_2] \cdot \frac{\mathrm{count}[i_{23},t_2]}{\mathrm{count}[t_2]} \\
L_3 &= \mathrm{count}[t_3] \cdot \frac{\mathrm{count}[i_{23},t_3]}{\mathrm{count}[t_3]}
\end{aligned}
$$
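A Python sketch of this rule, in which None marks a missing predictor and its factor is skipped rather than replaced by the threshold. The function name and the counts are hypothetical.

```python
from math import prod


def likelihoods_ignoring_missing(target_counts, pair_counts, inputs):
    """L_k = count[t_k] * prod over present inputs of count[i, t_k] / count[t_k].

    A value of None marks a missing predictor; its factor is omitted entirely.
    """
    return {
        t: n_t * prod(
            pair_counts[field][value].get(t, 0) / n_t
            for field, value in inputs.items()
            if value is not None  # missing field: no factor at all
        )
        for t, n_t in target_counts.items()
    }


# Hypothetical counts; only Input2 is observed in the record (-, i23, -).
target_counts = {"t1": 40, "t2": 35, "t3": 25}
pair_counts = {"Input2": {"i23": {"t1": 8, "t2": 7, "t3": 10}}}
L = likelihoods_ignoring_missing(
    target_counts, pair_counts, {"Input1": None, "Input2": "i23", "Input3": None}
)
p_t1 = L["t1"] / sum(L.values())  # 8 / (8 + 7 + 10)
```

If every predictor is missing, each L_k reduces to count[t_k], so the result is just the target's distribution in the training data.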
XSD
<xs:element name="NaiveBayesModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      <xs:element ref="MiningSchema" />
      <xs:element minOccurs="0" ref="ModelStats" />
      <xs:element ref="BayesInputs" />
      <xs:element ref="BayesOutput" />
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" />
    <xs:attribute name="threshold" type="REAL-NUMBER" use="required" />
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
    <xs:attribute name="algorithmName" type="xs:string" />
  </xs:complexType>
</xs:element>
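As a rough illustration of what this declaration requires, the Python sketch below checks a parsed NaiveBayesModel element for the two required attributes (threshold, functionName) and for the schema order of the mandatory children MiningSchema, BayesInputs and BayesOutput. It is not a substitute for validating against the XSD: the sample instance is hypothetical, omits the PMML namespace, and leaves its children empty, which the full schema would reject. The function name `check_naive_bayes_model` is made up, and using `float()` as the test for REAL-NUMBER is only an approximation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed instance without the PMML namespace.
DOC = """<NaiveBayesModel modelName="demo" threshold="0.001" functionName="classification">
  <MiningSchema/>
  <BayesInputs/>
  <BayesOutput fieldName="target"/>
</NaiveBayesModel>"""

REQUIRED_ATTRS = ("threshold", "functionName")
# Mandatory children in xs:sequence order (optional Extension/ModelStats skipped).
REQUIRED_CHILDREN = ["MiningSchema", "BayesInputs", "BayesOutput"]


def check_naive_bayes_model(elem):
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing attribute {a}" for a in REQUIRED_ATTRS
                if a not in elem.attrib]
    if "threshold" in elem.attrib:
        try:
            float(elem.get("threshold"))  # rough stand-in for REAL-NUMBER
        except ValueError:
            problems.append("threshold is not a number")
    order = [child.tag for child in elem if child.tag in REQUIRED_CHILDREN]
    if order != REQUIRED_CHILDREN:
        problems.append(f"mandatory children missing or out of order: {order}")
    return problems


print(check_naive_bayes_model(ET.fromstring(DOC)))  # []
```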
Bayes Inputs
The BayesInputs element contains one or more BayesInput elements, one for each predictor field.
<xs:element name="BayesInputs">
  <xs:complexType>
    <xs:sequence>
      <xs:element maxOccurs="unbounded" ref="BayesInput" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Bayes Input
Each BayesInput describes one predictor field, named by its fieldName attribute, and contains the counts pairing the discrete values of that field with the values of the target field. For a continuous field, the BayesInput also defines how the continuous values are encoded as discrete bins. (Discretization is achieved using DerivedField; only the Discretize mapping of DerivedField may be used here.)
<xs:element name="BayesInput">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      <xs:element minOccurs="0" ref="DerivedField" />
      <xs:element maxOccurs="unbounded" ref="PairCounts" />
    </xs:sequence>
    <xs:attribute name="fieldName" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
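The Discretize mapping itself is defined elsewhere in PMML. Purely to illustrate the idea of turning a continuous value into one of a few discrete labels before counting or scoring, here is a Python sketch using the standard `bisect` module; the bin edges, the left-closed interval convention, the labels, and the function name are hypothetical and do not reproduce the Discretize semantics.

```python
import bisect

# Hypothetical bins for a continuous predictor such as "age of car":
# (-inf, 1), [1, 3), [3, 10), [10, +inf). Edges and labels are made up.
EDGES = [1.0, 3.0, 10.0]
LABELS = ["<1", "[1,3)", "[3,10)", ">=10"]


def discretize(x, edges=EDGES, labels=LABELS):
    """Map a continuous value to its bin label; None (missing) stays None."""
    if x is None:
        return None
    # bisect_right returns how many edges are <= x, which is the bin index.
    return labels[bisect.bisect_right(edges, x)]


binned = [discretize(v) for v in (0.5, 1.0, 2.9, 3.0, 12.0, None)]
# ['<1', '[1,3)', '[1,3)', '[3,10)', '>=10', None]
```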
Bayes Output
BayesOutput contains the
counts associated with the values of the target field.
<xs:element name="BayesOutput">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      <xs:element ref="TargetValueCounts" />
    </xs:sequence>
    <xs:attribute name="fieldName" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
Pair Counts
PairCounts lists, for one discrete value $I_{ij}$ of a predictor field $I_i$ (given by the value attribute), the TargetValueCounts that pair $I_{ij}$ with each value of the target field.
<xs:element name="PairCounts">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="TargetValueCounts" />
    </xs:sequence>
    <xs:attribute name="value" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>
Target Value Counts
TargetValueCounts lists the counts associated with each value of the target field; a TargetValueCount whose count is zero may be omitted.
Within BayesOutput,
TargetValueCounts lists the total count of occurrences of each target value.
Within PairCounts,
TargetValueCounts lists, for each target value, the count of the joint
occurrences of that target value with a particular discrete input value.
<xs:element name="TargetValueCounts">
  <xs:complexType>
    <xs:sequence>
      <xs:element maxOccurs="unbounded" ref="TargetValueCount" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="TargetValueCount">
  <xs:complexType>
    <xs:attribute name="value" type="xs:string" use="required" />
    <xs:attribute name="count" type="REAL-NUMBER" use="required" />
  </xs:complexType>
</xs:element>
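Reading these elements back into the count tables used by the scoring formulas is mostly a matter of walking the tree. A Python sketch with `xml.etree.ElementTree` follows; the trimmed document, its field names and counts, and the helper names are hypothetical, and a real PMML document declares a namespace that would have to be added to the element paths. An omitted TargetValueCount is read as a count of zero.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed NaiveBayesModel (no namespace, MiningSchema omitted).
# The pair count for gender="male" with target "high" is zero, so its
# TargetValueCount is simply left out.
DOC = """<NaiveBayesModel threshold="0.001" functionName="classification">
  <BayesInputs>
    <BayesInput fieldName="gender">
      <PairCounts value="male">
        <TargetValueCounts>
          <TargetValueCount value="low" count="7"/>
        </TargetValueCounts>
      </PairCounts>
      <PairCounts value="female">
        <TargetValueCounts>
          <TargetValueCount value="low" count="3"/>
          <TargetValueCount value="high" count="2"/>
        </TargetValueCounts>
      </PairCounts>
    </BayesInput>
  </BayesInputs>
  <BayesOutput fieldName="risk">
    <TargetValueCounts>
      <TargetValueCount value="low" count="10"/>
      <TargetValueCount value="high" count="2"/>
    </TargetValueCounts>
  </BayesOutput>
</NaiveBayesModel>"""


def read_counts(tvc):
    """TargetValueCounts element -> {target value: count} (counts are reals)."""
    return {c.get("value"): float(c.get("count"))
            for c in tvc.findall("TargetValueCount")}


model = ET.fromstring(DOC)
# Within BayesOutput: total occurrences of each target value, count[T_i].
target_counts = read_counts(model.find("BayesOutput/TargetValueCounts"))
# Within PairCounts: joint occurrences, pair_counts[field][value][target].
pair_counts = {
    bi.get("fieldName"): {
        pc.get("value"): read_counts(pc.find("TargetValueCounts"))
        for pc in bi.findall("PairCounts")
    }
    for bi in model.findall("BayesInputs/BayesInput")
}


def pair_count(field, value, target):
    """Look up count[value, target]; an omitted TargetValueCount means zero."""
    return pair_counts[field][value].get(target, 0.0)
```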
Scoring procedure, example
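Given an input vector (gender = "male", no of claims = "2", domicile = (missing), age of car = "1"), the probability for class "1000" is computed as

$$
P(\text{"1000"} \mid \text{"male"}, \text{"2"}, -, \text{"1"}) = \frac{L_2}{L_0 + L_1 + L_2 + L_3 + L_4}
$$

with

$$
\begin{aligned}
L_0 &= 8723 \cdot \frac{4273}{8723} \cdot \frac{225}{8723} \cdot \frac{830}{8723} \\
L_1 &= 2557 \cdot \frac{1321}{2557} \cdot \frac{10}{2557} \cdot \frac{182}{2557} \\
L_2 &= 1530 \cdot \frac{780}{1530} \cdot \frac{9}{1530} \cdot \frac{51}{1530} \\
L_3 &= 709 \cdot \frac{405}{709} \cdot 0.001 \cdot \frac{26}{709} \\
L_4 &= 100 \cdot \frac{42}{100} \cdot \frac{10}{100} \cdot \frac{6}{100}
\end{aligned}
$$

The missing domicile contributes no factor, and the factor 0.001 in L3 is the threshold, used in place of a zero pair count.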
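The arithmetic of this example can be checked with a few lines of Python. The sketch below takes the numbers from the formulas above and assumes that 0.001 is the model's threshold; the list layout and variable names are only for illustration.

```python
from math import prod

# Assumed to be the model's threshold; it replaces the factor in L3 whose
# pair count is zero.
THRESHOLD = 0.001

# (count[target], numerators of the factors for the three observed inputs,
# in input-vector order) for the five classes behind L0..L4.
# None marks the zero pair count. Domicile is missing, so it has no factor.
rows = [
    (8723, [4273, 225, 830]),
    (2557, [1321, 10, 182]),
    (1530, [780, 9, 51]),
    (709, [405, None, 26]),
    (100, [42, 10, 6]),
]

L = [
    n * prod(THRESHOLD if c is None else c / n for c in numerators)
    for n, numerators in rows
]
p_1000 = L[2] / sum(L)  # class "1000" corresponds to L2
```

Evaluated this way, L0 ≈ 10.49 dominates the sum, and the posterior for class "1000" comes out at roughly 0.0136.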