PMML 4.0 - Support Vector Machine
Support Vector Machine Models
The description of Support Vector Machine (SVM) models assumes some familiarity with SVM theory. This specification covers Support Vector Machine models for classification and regression. A Support Vector Machine is a function f defined in the space spanned by the kernel basis functions K(x,xi) of the support vectors xi:
f(x) = Sum_{i=1}^{n} αi*K(x,xi) + b.
Here n is the number of support vectors, αi are the basis coefficients and b is the absolute coefficient. Equivalently, n can be taken as the total number of training vectors xi; the support vectors are then the subset of those vectors whose coefficients αi are nonzero. The term Support Vector (SV) also has a geometrical interpretation: in a mechanical analogy, these vectors literally support the discrimination function f(x) = 0.
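The two readings of n can be checked numerically. The following is a minimal sketch, not part of the specification: the function name decision_value, the toy vectors and coefficients, and the use of a plain inner product as kernel are all assumptions made for illustration.

```python
def inner(x, y):
    """Plain inner product, used here as a stand-in for the kernel K(x, y)."""
    return sum(a * b for a, b in zip(x, y))

def decision_value(x, vectors, alphas, b, kernel=inner):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return sum(a * kernel(x, v) for a, v in zip(alphas, vectors)) + b

# Hypothetical training set: the third vector has a zero coefficient.
training = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
alphas = [0.5, 0.25, 0.0]
b = 0.1

# Keep only the vectors that actually contribute to the sum.
sv_vectors = [v for v, a in zip(training, alphas) if a != 0.0]
sv_alphas = [a for a in alphas if a != 0.0]

x = (3.0, 4.0)
full = decision_value(x, training, alphas, b)         # n = all training vectors
reduced = decision_value(x, sv_vectors, sv_alphas, b)  # n = support vectors only
```

Both calls give the same value, because a vector with a zero coefficient contributes nothing to the sum.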
A PMML document may contain several SVM models, for instance for multiclass problems or for trees with SVM nodes, and these models often share support vectors. It is therefore useful to store the SVs in only one place in the PMML document. The specification supports this by introducing a common VectorDictionary.
<xs:element name="SupportVectorMachineModel">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="MiningSchema"/>
<xs:element ref="Output" minOccurs="0" />
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="ModelExplanation" minOccurs="0"/>
<xs:element ref="Targets" minOccurs="0" />
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:sequence>
<xs:choice>
<xs:element ref="LinearKernelType"/>
<xs:element ref="PolynomialKernelType"/>
<xs:element ref="RadialBasisKernelType"/>
<xs:element ref="SigmoidKernelType"/>
</xs:choice>
</xs:sequence>
<xs:element ref="VectorDictionary"/>
<xs:element ref="SupportVectorMachine" maxOccurs="unbounded"/>
<xs:element ref="ModelVerification" minOccurs="0"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" use="optional"/>
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
<xs:attribute name="algorithmName" type="xs:string" use="optional"/>
<xs:attribute name="threshold" type="REAL-NUMBER" use="optional" default="0"/>
<xs:attribute name="svmRepresentation" type="SVM-REPRESENTATION" use="optional" default="SupportVectors"/>
<xs:attribute name="classificationMethod" type="SVM-CLASSIFICATION-METHOD" use="optional" default="OneAgainstAll"/>
</xs:complexType>
</xs:element>
The attribute modelName specifies the name of the SVM model.
The attribute functionName is either classification or regression, depending on the SVM type.
The attribute svmRepresentation defines whether the SVM function is defined via support vectors or, for linear kernel functions, via the coefficients of the hyperplane.
The attribute classificationMethod defines which method is used for multi-class classification tasks. It can be either OneAgainstAll or OneAgainstOne. This attribute is not required for binary classification.
The attribute threshold defines a discrimination boundary used for binary classification, or for multi-class classification whenever classificationMethod is OneAgainstOne.
Since SVMs require numeric attributes, which may also be normalized, transformations are often needed; these can be performed in the LocalTransformations element.
For each active MiningField, an element of type UnivariateStats (see ModelStats) holds information about the overall (background) population. This includes (required) DiscrStats or ContStats, which include possible field values and interval boundaries. Optionally, statistical information is included for the background data.
The kernel type (one of LinearKernelType, PolynomialKernelType, RadialBasisKernelType or SigmoidKernelType) defines the function space of the SVM solution through the choice of the basis functions.
The VectorDictionary element holds all support vectors from all support vector machines.
SVM Multi-Class Classification Methods
The two most popular methods for multi-class classification are one-against-all (also known as one-against-rest) and one-against-one. Depending on the method used, the number of SVMs built will differ. The SVM classification method specifies which of the two methods is used:
<xs:simpleType name="SVM-CLASSIFICATION-METHOD">
<xs:restriction base="xs:string">
<xs:enumeration value="OneAgainstAll"/>
<xs:enumeration value="OneAgainstOne"/>
</xs:restriction>
</xs:simpleType>
SVM Representation
Usually the SVM model uses support vectors to define the model function. However, for a linear function (linear kernel type) the decision boundary is a hyperplane, which can be expressed more efficiently using one coefficient per mining field. In this case no support vectors are required at all; SupportVectors is absent and only the Coefficients element is necessary.
The SVM representation specifies which of the two representations is used:
<xs:simpleType name="SVM-REPRESENTATION">
<xs:restriction base="xs:string">
<xs:enumeration value="SupportVectors"/>
<xs:enumeration value="Coefficients"/>
</xs:restriction>
</xs:simpleType>
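Why a linear kernel needs no support vectors can be seen from the identity Sum_i αi*<x,xi> = <x,w> with w = Sum_i αi*xi. The sketch below checks this on made-up numbers; the helper name collapse_to_weights and all values are assumptions for illustration, and the specification does not prescribe how a producer computes the stored coefficients.

```python
def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def collapse_to_weights(support_vectors, alphas):
    """Fold sum_i alpha_i * x_i into a single weight vector w (one entry per field)."""
    dim = len(support_vectors[0])
    return [sum(a * sv[j] for a, sv in zip(alphas, support_vectors))
            for j in range(dim)]

# Hypothetical support vectors, coefficients and absolute coefficient.
support_vectors = [(1.0, 2.0), (3.0, -1.0)]
alphas = [0.5, -2.0]
b = 0.25

w = collapse_to_weights(support_vectors, alphas)

x = (2.0, 4.0)
# Evaluation through the support vectors ...
via_vectors = sum(a * linear_kernel(x, sv)
                  for a, sv in zip(alphas, support_vectors)) + b
# ... and through the collapsed weight vector.
via_weights = linear_kernel(x, w) + b
```

The weight vector has as many entries as there are fields, independent of the number of support vectors, which is where the efficiency gain comes from.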
Kernel Types
The kernel defines the type of the basis functions of the SVM model. A large number of kernel types exist. The most popular ones are:
LinearKernelType: linear basis functions, which lead to a hyperplane as classifier
- K(x,y) = <x,y>
PolynomialKernelType: polynomial basis functions, which lead to a polynomial classifier
- K(x,y) = (gamma*<x,y>+coef0)^degree
RadialBasisKernelType: radial basis functions, the most common kernel type
- K(x,y) = exp(-gamma*||x - y||^2)
SigmoidKernelType: sigmoid kernel functions, as used in some models of Neural Network type
- K(x,y) = tanh(gamma*<x,y>+coef0)
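The four kernel formulas can be written down directly. This is an illustrative sketch only: the Python function names are assumptions, vectors are taken to be dense sequences of numbers, and the default parameter values mirror the attribute defaults given in the schema for these elements.

```python
import math

def inner(x, y):
    """Inner product <x, y> of two dense vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """LinearKernelType: K(x, y) = <x, y>."""
    return inner(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=1.0):
    """PolynomialKernelType: K(x, y) = (gamma*<x, y> + coef0)^degree."""
    return (gamma * inner(x, y) + coef0) ** degree

def radial_basis_kernel(x, y, gamma=1.0):
    """RadialBasisKernelType: K(x, y) = exp(-gamma*||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=1.0, coef0=1.0):
    """SigmoidKernelType: K(x, y) = tanh(gamma*<x, y> + coef0)."""
    return math.tanh(gamma * inner(x, y) + coef0)
```

For example, radial_basis_kernel((0, 0), (1, 1)) evaluates exp(-2), since the squared distance between the two points is 2.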
<xs:element name="LinearKernelType">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="description" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="PolynomialKernelType">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="description" type="xs:string" use="optional"/>
<xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
<xs:attribute name="coef0" type="REAL-NUMBER" use="optional" default="1"/>
<xs:attribute name="degree" type="REAL-NUMBER" use="optional" default="1"/>
</xs:complexType>
</xs:element>
<xs:element name="RadialBasisKernelType">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="description" type="xs:string" use="optional"/>
<xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
</xs:complexType>
</xs:element>
<xs:element name="SigmoidKernelType">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="description" type="xs:string" use="optional"/>
<xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
<xs:attribute name="coef0" type="REAL-NUMBER" use="optional" default="1"/>
</xs:complexType>
</xs:element>
Additional information about the kernel can be entered in the free-text attribute description.
As already mentioned, a vector dictionary was introduced to store all support vectors. The VectorDictionary is a general container of vectors and could, in principle, also be used for models other than Support Vector Machine.
<xs:simpleType name="VECTOR-ID">
<xs:restriction base="xs:string"/>
</xs:simpleType>
<xs:element name="VectorDictionary">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="VectorFields"/>
<xs:element ref="VectorInstance" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="numberOfVectors" type="INT-NUMBER" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="VectorFields">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="FieldRef" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="numberOfFields" type="INT-NUMBER" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="VectorInstance">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:choice>
<xs:element ref="REAL-SparseArray"/>
<xs:group ref="REAL-ARRAY"/>
</xs:choice>
</xs:sequence>
<xs:attribute name="id" type="VECTOR-ID" use="required"/>
</xs:complexType>
</xs:element>
The VectorDictionary contains the set of support vectors, each of type VectorInstance. If present, the attribute numberOfVectors must be equal to the number of vectors contained in the dictionary.
VectorFields defines which entries in the vectors correspond to which fields. The sequence of the fields as given in VectorFields corresponds to the entries in the vectors. Fields referenced can be from the MiningSchema, TransformationDictionary or LocalTransformations. numberOfFields gives the number of entries in VectorFields, which corresponds to the dimension of the vectors in the VectorDictionary.
The VectorInstance elements represent support vectors and are referenced by their id attribute. They do not contain the value of the predicted mining field.
A VectorInstance is a data vector given either in sparse array format (REAL-SparseArray) or as a dense REAL-ARRAY.
The order of the values corresponds to that of the VectorFields. The size of each array must match the number of fields included in the VectorFields element.
Note that the sparse representation is important because SVMs are usually able to handle very high-dimensional data, whereas the number of support vectors tends to be small.
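To make the sparse format concrete, the sketch below expands a REAL-SparseArray fragment into a dense vector. It is an illustration under assumptions: the function name densify is made up, positions not listed in Indices are taken to be 0, and Indices are read as 1-based, which is how the example model at the end of this document uses them.

```python
import xml.etree.ElementTree as ET

def densify(sparse_xml):
    """Expand a REAL-SparseArray fragment into a dense list of floats."""
    node = ET.fromstring(sparse_xml)
    n = int(node.get("n"))
    indices_node = node.find("Indices")
    entries_node = node.find("REAL-Entries")
    # Both child elements may be missing when every entry is zero.
    indices = [int(t) for t in indices_node.text.split()] if indices_node is not None else []
    entries = [float(t) for t in entries_node.text.split()] if entries_node is not None else []
    dense = [0.0] * n
    for position, value in zip(indices, entries):
        dense[position - 1] = value  # assumption: Indices count from 1
    return dense

empty = densify('<REAL-SparseArray n="2" />')
second_only = densify(
    '<REAL-SparseArray n="2"><Indices>2</Indices>'
    '<REAL-Entries>1.0</REAL-Entries></REAL-SparseArray>'
)
```

The position of each dense entry then lines up with the order of the FieldRef elements in VectorFields.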
<xs:element name="SupportVectorMachine">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="SupportVectors" minOccurs="0"/>
<xs:element ref="Coefficients"/>
</xs:sequence>
<xs:attribute name="targetCategory" type="xs:string" use="optional"/>
<xs:attribute name="alternateTargetCategory" type="xs:string" use="optional"/>
<xs:attribute name="threshold" type="REAL-NUMBER" use="optional"/>
</xs:complexType>
</xs:element>
SupportVectors holds the support vectors used by the respective SVM instance, as references into the VectorDictionary.
The element Coefficients is used for storing the SVM coefficients.
Both are combined in the element SupportVectorMachine, which holds a single instance of an SVM.
The attribute targetCategory is required for classification models and gives the corresponding class label. This attribute is to be
used for classification models implementing the one-against-all method. In this method, for n classes,
there are exactly n SupportVectorMachine elements. The SVM with the smallest value determines the
predicted class label.
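The one-against-all rule stated above can be sketched in a few lines. The function name, the class labels and the numeric values are assumptions for illustration; how ties between equal values are resolved is not stated here, so the sketch simply keeps the first category encountered.

```python
def predict_one_against_all(values_by_category):
    """Return the targetCategory whose SupportVectorMachine gave the smallest value.

    values_by_category maps each targetCategory to the numeric output f(x)
    of its SupportVectorMachine element (one element per class).
    """
    # min() keeps the first key on ties; that tie rule is an assumption.
    return min(values_by_category, key=values_by_category.get)

# Hypothetical outputs of three SVMs for one input record.
scores = {"setosa": 0.8, "versicolor": -1.3, "virginica": 0.2}
predicted = predict_one_against_all(scores)
```

Here the predicted label is versicolor, because its SVM produced the smallest of the three values.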
The attribute alternateTargetCategory is required in case of binary classification models with only
one SupportVectorMachine element. It is also required in case of multi-class classification models implementing
the one-against-one method. In this method, for n classes, there are exactly n(n-1)/2 SupportVectorMachine elements
where each SVM is trained on data from two classes. The first class is represented by the targetCategory attribute and the
second class by the alternateTargetCategory attribute. The predicted class label is determined based on a voting scheme in which the
category with the maximum number of votes wins. In case of a tie, the predicted class label is the first category with maximal number of votes.
For both cases (binary classification and multi-class classification with one-against-one), the class label is determined by comparing the numeric prediction with the threshold. If the prediction is smaller than the threshold, the label is the targetCategory attribute; if it is greater than or equal to the threshold, the label is the alternateTargetCategory attribute.
Note that each SupportVectorMachine element may have its own threshold that overrides the default.
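The threshold comparison and the voting scheme for one-against-one can be sketched as follows. This is a hedged illustration: the function name, the tuple layout and the sample values are assumptions, and "first category" in the tie rule is interpreted here as the first entry of a caller-supplied category list.

```python
def predict_one_against_one(machines, categories, default_threshold=0.0):
    """Vote over pairwise SVMs.

    machines: list of (targetCategory, alternateTargetCategory, value, threshold),
              where threshold is None if the element does not override the default.
    categories: class labels in the order used to break ties (an assumption).
    """
    votes = {category: 0 for category in categories}
    for target, alternate, value, threshold in machines:
        limit = default_threshold if threshold is None else threshold
        # Smaller than the threshold: targetCategory; otherwise the alternate.
        winner = target if value < limit else alternate
        votes[winner] += 1
    best = max(votes.values())
    label = next(c for c in categories if votes[c] == best)
    return label, votes

# Three classes need 3*(3-1)/2 = 3 pairwise machines; values are made up.
machines = [
    ("a", "b", -0.5, None),  # below default threshold 0: vote for a
    ("a", "c", -0.1, None),  # below default threshold 0: vote for a
    ("b", "c", 0.1, 0.2),    # below its own threshold 0.2: vote for b
]
label, votes = predict_one_against_one(machines, ["a", "b", "c"])
```

With these values class a collects two votes and wins; the third machine shows a per-element threshold changing the outcome of a single pairwise decision.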
The element SupportVectors
contains all support vectors required for the respective SVM instance.
<xs:element name="SupportVectors">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="SupportVector" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="numberOfSupportVectors" type="INT-NUMBER" use="optional"/>
<xs:attribute name="numberOfAttributes" type="INT-NUMBER" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="SupportVector">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="vectorId" type="VECTOR-ID" use="required"/>
</xs:complexType>
</xs:element>
The support vectors are represented by the element SupportVector
which only has the attribute vectorId - the reference to the support
vector in VectorDictionary.
If numberOfSupportVectors is specified, then it must
match the number of SupportVector elements. If numberOfAttributes is
specified, then it must match the number of attributes in the support vectors
(which all must have the same length). If one of these requirements is not fulfilled,
then the PMML is not valid.
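A consumer might check these two counts before scoring. The sketch below is one possible way to do so, not a prescribed procedure: the function name, the message strings and the vectors_by_id mapping (dense vectors already read from the VectorDictionary) are assumptions, and treating an unknown vectorId as a problem is likewise an assumption.

```python
import xml.etree.ElementTree as ET

def check_support_vectors(fragment, vectors_by_id):
    """Return a list of problems found in a SupportVectors fragment."""
    node = ET.fromstring(fragment)
    ids = [sv.get("vectorId") for sv in node.findall("SupportVector")]
    problems = []

    # Optional count of SupportVector children.
    declared_count = node.get("numberOfSupportVectors")
    if declared_count is not None and int(declared_count) != len(ids):
        problems.append("numberOfSupportVectors does not match")

    # Optional common length of the referenced vectors.
    declared_length = node.get("numberOfAttributes")
    for vector_id in ids:
        if vector_id not in vectors_by_id:
            problems.append("unknown vectorId " + vector_id)
        elif (declared_length is not None
              and int(declared_length) != len(vectors_by_id[vector_id])):
            problems.append("numberOfAttributes does not match for " + vector_id)
    return problems

dictionary = {"mv0": [0.0, 0.0], "mv1": [0.0, 1.0]}
good = ('<SupportVectors numberOfSupportVectors="2" numberOfAttributes="2">'
        '<SupportVector vectorId="mv0"/><SupportVector vectorId="mv1"/>'
        '</SupportVectors>')
bad = good.replace('numberOfSupportVectors="2"', 'numberOfSupportVectors="3"')
```

Calling check_support_vectors(good, dictionary) yields an empty list, while the fragment with the wrong count yields one problem.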
The element Coefficients
is used to store the support vector coefficients αi and b.
<xs:element name="Coefficients">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="Coefficient" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="numberOfCoefficients" type="INT-NUMBER" use="optional"/>
<xs:attribute name="absoluteValue" type="REAL-NUMBER" use="optional" default="0"/>
</xs:complexType>
</xs:element>
<xs:element name="Coefficient">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="value" type="REAL-NUMBER" use="optional" default="0"/>
</xs:complexType>
</xs:element>
Each coefficient αi is described by the element Coefficient, and the number of coefficients corresponds to that of the support vectors. Hence the attribute numberOfCoefficients is equal to the number of support vectors. The attribute absoluteValue contains the value of the absolute coefficient b.
Example Model
This example
shows a classification SVM for the simple XOR data set. All vectors are
support vectors.
<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="DMG.org" />
<DataDictionary numberOfFields="3">
<DataField name="x1" optype="continuous" dataType="double" />
<DataField name="x2" optype="continuous" dataType="double" />
<DataField name="class" optype="categorical" dataType="string">
<Value value="no" />
<Value value="yes" />
</DataField>
</DataDictionary>
<SupportVectorMachineModel modelName="SVM XOR Model" algorithmName="supportVectorMachine"
functionName="classification" svmRepresentation="SupportVectors">
<MiningSchema>
<MiningField name="x1" />
<MiningField name="x2" />
<MiningField name="class" usageType="predicted" />
</MiningSchema>
<RadialBasisKernelType gamma="1.0" description="Radial basis kernel type" />
<VectorDictionary numberOfVectors="4">
<VectorFields numberOfFields="2">
<FieldRef field="x1"/>
<FieldRef field="x2"/>
</VectorFields>
<VectorInstance id="mv0">
<!-- vector x1=0, x2=0 -->
<REAL-SparseArray n="2" />
</VectorInstance>
<VectorInstance id="mv1">
<!-- vector x1=0, x2=1 -->
<REAL-SparseArray n="2">
<Indices>2</Indices>
<REAL-Entries>1.0</REAL-Entries>
</REAL-SparseArray>
</VectorInstance>
<VectorInstance id="mv2">
<!-- vector x1=1, x2=0 -->
<REAL-SparseArray n="2">
<Indices>1</Indices>
<REAL-Entries>1.0</REAL-Entries>
</REAL-SparseArray>
</VectorInstance>
<VectorInstance id="mv3">
<!-- vector x1=1, x2=1 -->
<REAL-SparseArray n="2">
<Indices>1 2</Indices>
<REAL-Entries>1.0 1.0</REAL-Entries>
</REAL-SparseArray>
</VectorInstance>
</VectorDictionary>
<SupportVectorMachine targetCategory="no" alternateTargetCategory="yes">
<SupportVectors numberOfAttributes="2" numberOfSupportVectors="4">
<SupportVector vectorId="mv0" />
<SupportVector vectorId="mv1" />
<SupportVector vectorId="mv2" />
<SupportVector vectorId="mv3" />
</SupportVectors>
<Coefficients absoluteValue="0" numberOfCoefficients="4">
<Coefficient value="-1.0" />
<Coefficient value="1.0" />
<Coefficient value="1.0" />
<Coefficient value="-1.0" />
</Coefficients>
</SupportVectorMachine>
</SupportVectorMachineModel>
</PMML>
Scoring procedure, example
Consider the same example as above in order to illustrate the scoring
procedure of the Support Vector Machine. Given the first support vector
as input vector
x = mv0 = (x1=0.0, x2=0.0)
we calculate as follows:
f(x) = Sum_{i=1}^{n} αi*K(x,xi) + b
= -1.0*K(x,mv0) + 1.0*K(x,mv1) + 1.0*K(x,mv2) - 1.0*K(x,mv3) + 0
= -1.0*exp(-1.0*||x - mv0||^2) + 1.0*exp(-1.0*||x - mv1||^2)
  + 1.0*exp(-1.0*||x - mv2||^2) - 1.0*exp(-1.0*||x - mv3||^2) + 0
= -1.0*exp(-1.0*||(0,0)^T - (0,0)^T||^2) + 1.0*exp(-1.0*||(0,0)^T - (0,1)^T||^2)
  + 1.0*exp(-1.0*||(0,0)^T - (1,0)^T||^2) - 1.0*exp(-1.0*||(0,0)^T - (1,1)^T||^2) + 0
= -1.0*exp(-1.0*||(0,0)^T||^2) + 1.0*exp(-1.0*||(0,-1)^T||^2)
  + 1.0*exp(-1.0*||(-1,0)^T||^2) - 1.0*exp(-1.0*||(-1,-1)^T||^2) + 0
= -1.0*exp(0.0) + 1.0*exp(-1.0) + 1.0*exp(-1.0) - 1.0*exp(-2.0) + 0
f(x) = -0.399576.
In the same way, the scoring of the other support vectors delivers
f(x = mv1) = 0.399576
f(x = mv2) = 0.399576
f(x = mv3) = -0.399576
thus reasonably approximating the training data.
A classification with a threshold of 0 would assign the vectors mv0 and mv3 to
class no and the vectors mv1 and mv2 to class yes delivering an exact
classification of the training data.
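The hand calculation can be reproduced with a short script. This is a sketch for checking the numbers, not a reference scorer: the vectors, coefficients, gamma, absolute coefficient and class labels are copied from the example model, while the function and variable names are assumptions.

```python
import math

# Support vectors and coefficients as listed in the example model.
vectors = {"mv0": (0.0, 0.0), "mv1": (0.0, 1.0), "mv2": (1.0, 0.0), "mv3": (1.0, 1.0)}
coefficients = {"mv0": -1.0, "mv1": 1.0, "mv2": 1.0, "mv3": -1.0}
GAMMA = 1.0       # RadialBasisKernelType gamma
B = 0.0           # Coefficients absoluteValue
THRESHOLD = 0.0   # default threshold of the model

def rbf(x, y, gamma=GAMMA):
    """K(x, y) = exp(-gamma*||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def score(x):
    """f(x) = sum_i alpha_i * K(x, x_i) + b over the four support vectors."""
    return sum(coefficients[name] * rbf(x, v) for name, v in vectors.items()) + B

def classify(x):
    """Below the threshold: targetCategory 'no'; otherwise alternateTargetCategory 'yes'."""
    return "no" if score(x) < THRESHOLD else "yes"

results = {name: (round(score(v), 6), classify(v)) for name, v in vectors.items()}
```

The results dictionary then holds the rounded score and the assigned class for each of the four vectors, matching the values listed above.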