Data Mining Group - PMML 4.0 - Support Vector Machine

The description of Support Vector Machine (SVM) models assumes some familiarity with SVM theory. This specification covers Support Vector Machine models for classification and regression. A Support Vector Machine is a function f defined in the space spanned by the kernel basis functions K(x,x_i) of the support vectors x_i:

f(x) = Sum_(i=1)ⁿ α_i*K(x,x_i) + b.

Here n is the number of support vectors, α_i are the basis coefficients, and b is the absolute coefficient. In an equivalent interpretation, n can be taken as the total number of training vectors x_i; the support vectors are then the subset of vectors x_i whose coefficients α_i are nonzero. The term Support Vector (SV) also has a geometrical interpretation: in a mechanical picture, these vectors literally support the discrimination function f(x) = 0.

Since a PMML document may contain several SVM models, for instance for multiclass problems or for trees with SVM nodes, which often share common support vectors, it is useful to store the SVs in only one place in the PMML document. The specification supports this by introducing a common VectorDictionary.


  <xs:element name="SupportVectorMachineModel">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="MiningSchema"/>
        <xs:element ref="Output" minOccurs="0" />
        <xs:element ref="ModelStats" minOccurs="0"/>
        <xs:element ref="ModelExplanation" minOccurs="0"/>
        <xs:element ref="Targets" minOccurs="0" />
        <xs:element ref="LocalTransformations" minOccurs="0" />
        <xs:sequence>
          <xs:choice>
            <xs:element ref="LinearKernelType"/>
            <xs:element ref="PolynomialKernelType"/>
            <xs:element ref="RadialBasisKernelType"/>
            <xs:element ref="SigmoidKernelType"/>
          </xs:choice>
        </xs:sequence>
        <xs:element ref="VectorDictionary"/>
        <xs:element ref="SupportVectorMachine" maxOccurs="unbounded"/>
        <xs:element ref="ModelVerification" minOccurs="0"/>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="modelName" type="xs:string" use="optional"/>
      <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
      <xs:attribute name="algorithmName" type="xs:string" use="optional"/>
      <xs:attribute name="threshold" type="REAL-NUMBER" use="optional" default="0"/>
      <xs:attribute name="svmRepresentation" type="SVM-REPRESENTATION" use="optional" default="SupportVectors"/>
      <xs:attribute name="classificationMethod" type="SVM-CLASSIFICATION-METHOD" use="optional" default="OneAgainstAll"/>
    </xs:complexType>
  </xs:element>

The attribute modelName specifies the name of the SVM model.

The attribute functionName can be either classification or regression, depending on the SVM type.

The attribute svmRepresentation defines whether the SVM function is defined via support vectors or via the coefficients of the hyperplane for the case of linear kernel functions.

The attribute classificationMethod defines which method is to be used in case of multi-class classification tasks. It can be either OneAgainstAll or OneAgainstOne. This attribute is not required for binary classification.

The attribute threshold defines a discrimination boundary to be used in case of binary classification or whenever attribute classificationMethod is defined as OneAgainstOne for multi-class classification tasks.

Since SVMs require numeric inputs, which may also need to be normalized, the necessary transformations are often applied in the LocalTransformations element.

For each active MiningField, an element of type UnivariateStats (see ModelStats) holds information about the overall (background) population. This includes (required) DiscrStats or ContStats, which include possible field values and interval boundaries. Optionally, statistical information is included for the background data.

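The kernel type element (one of LinearKernelType, PolynomialKernelType, RadialBasisKernelType or SigmoidKernelType) defines the function space of the SVM solution through the choice of the basis functions.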

The VectorDictionary element holds all support vectors from all support vector machines.

SVM Multi-Class Classification Methods

The two most popular methods for multi-class classification are one-against-all (also known as one-against-rest) and one-against-one. Depending on the method used, the number of SVMs built will differ.

The SVM classification method specifies which of the two methods is used:


  <xs:simpleType name="SVM-CLASSIFICATION-METHOD">
    <xs:restriction base="xs:string">
      <xs:enumeration value="OneAgainstAll"/>
      <xs:enumeration value="OneAgainstOne"/>
    </xs:restriction>
  </xs:simpleType>

SVM Representation

Usually the SVM model uses support vectors to define the model function. However, for a linear function (linear kernel type) the function is a hyperplane that can be expressed more efficiently using the coefficients of all mining fields. In this case no support vectors are required, so SupportVectors is absent and only the Coefficients element is necessary.

The SVM representation specifies which of the two representations is used:


  <xs:simpleType name="SVM-REPRESENTATION">
    <xs:restriction base="xs:string">
      <xs:enumeration value="SupportVectors"/>
      <xs:enumeration value="Coefficients"/>
    </xs:restriction>
  </xs:simpleType>
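
The equivalence of the two representations for a linear kernel follows from f(x) = Sum_i α_i*<x,x_i> + b = <w,x> + b with w = Sum_i α_i*x_i. The following is a minimal sketch of that identity, not part of the specification; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def dot(x, y):
    """Inner product <x, y> of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def collapse_to_weights(support_vectors, alphas):
    """Fold support vectors and coefficients into one weight per field:
    w = sum_i alpha_i * x_i (valid only for the linear kernel)."""
    weights = [0.0] * len(support_vectors[0])
    for alpha, vector in zip(alphas, support_vectors):
        for j, value in enumerate(vector):
            weights[j] += alpha * value
    return weights


# Hypothetical model data, for illustration only.
support_vectors = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
alphas = [0.5, -1.0, 2.0]
b = 0.25
x = (3.0, 4.0)

# SupportVectors-style evaluation: one kernel call per support vector.
via_support_vectors = sum(a * dot(x, sv) for a, sv in zip(alphas, support_vectors)) + b

# Coefficients-style evaluation: one weight per field, no vectors needed.
weights = collapse_to_weights(support_vectors, alphas)
via_coefficients = dot(weights, x) + b
```

Both evaluations give the same value, which is why the vector dictionary lookups can be skipped entirely when the kernel is linear.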

Kernel Types

The kernel defines the type of the basis functions of the SVM model. Many kernel types exist. The most popular ones are:

LinearKernelType: linear basis functions which lead to a hyperplane as classifier
K(x,y) = <x,y>

PolynomialKernelType: polynomial basis functions which lead to a polynomial classifier
K(x,y) = (gamma*<x,y>+coef0)^degree

RadialBasisKernelType: radial basis functions, the most common kernel type
K(x,y) = exp(-gamma*||x - y||²)

SigmoidKernelType: sigmoid kernel functions for some models of Neural Network type
K(x,y) = tanh(gamma*<x,y>+coef0)
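
The four kernel formulas above can be written down directly. The following is a minimal sketch; the function names are hypothetical, and the default parameter values simply mirror the attribute defaults declared in the kernel element schemas (gamma, coef0 and degree all default to 1).

```python
import math


def dot(x, y):
    """Inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x,y) = <x,y>
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=1.0):
    # K(x,y) = (gamma*<x,y>+coef0)^degree
    # Note: a negative base with a non-integer degree is not handled here.
    return (gamma * dot(x, y) + coef0) ** degree


def radial_basis_kernel(x, y, gamma=1.0):
    # K(x,y) = exp(-gamma*||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=1.0):
    # K(x,y) = tanh(gamma*<x,y>+coef0)
    return math.tanh(gamma * dot(x, y) + coef0)
```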


  <xs:element name="LinearKernelType">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="description" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="PolynomialKernelType">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="description" type="xs:string" use="optional"/>
      <xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
      <xs:attribute name="coef0" type="REAL-NUMBER" use="optional" default="1"/>
      <xs:attribute name="degree" type="REAL-NUMBER" use="optional" default="1"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="RadialBasisKernelType">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="description" type="xs:string" use="optional"/>
      <xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="SigmoidKernelType">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="description" type="xs:string" use="optional"/>
      <xs:attribute name="gamma" type="REAL-NUMBER" use="optional" default="1"/>
      <xs:attribute name="coef0" type="REAL-NUMBER" use="optional" default="1"/>
    </xs:complexType>
  </xs:element>

Additional information about the kernel can be entered in the free-text attribute description.

Support Vectors

As already mentioned, a vector dictionary was introduced to store all support vectors. The VectorDictionary is a general container of vectors and could, in principle, also be used for models other than Support Vector Machine.


  <xs:simpleType name="VECTOR-ID">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:element name="VectorDictionary">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="VectorFields"/>
        <xs:element ref="VectorInstance" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="numberOfVectors" type="INT-NUMBER" use="optional"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="VectorFields">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="FieldRef" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="numberOfFields" type="INT-NUMBER" use="optional"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="VectorInstance">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:choice>
          <xs:element ref="REAL-SparseArray"/>
          <xs:group ref="REAL-ARRAY"/>
        </xs:choice>
      </xs:sequence>
      <xs:attribute name="id" type="VECTOR-ID" use="required"/>
    </xs:complexType>
  </xs:element>

The VectorDictionary contains the set of support vectors which are of the type VectorInstance. If present, the attribute numberOfVectors must be equal to the number of vectors contained in the dictionary.

VectorFields defines which entries in the vectors correspond to which fields. The sequence of the fields as given in VectorFields corresponds to the entries in the vectors. Fields referenced can be from the MiningSchema, TransformationDictionary or LocalTransformations. numberOfFields gives the number of entries in VectorFields, which corresponds to the dimension of the vectors in the VectorDictionary.

The VectorInstance elements represent support vectors and are referenced by their id attribute. They do not contain the value of the predicted mining field.

The VectorInstance is a data vector given in dense or sparse array format. The order of the values corresponds to that of the VectorFields. The size of each array must match the number of fields included in the VectorFields element.

Note that the sparse representation matters because SVMs are usually able to handle very high-dimensional data, whereas the number of support vectors tends to be small.
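
To make the sparse format concrete, the following sketch expands the content of a REAL-SparseArray into a dense vector. It is an illustration under stated assumptions, not a PMML parser: the function name and argument layout are hypothetical, positions are taken to be 1-based (as the indices in the example model later in this document suggest), and entries that are not listed are assumed to be 0.

```python
def sparse_to_dense(n, indices=(), entries=(), default=0.0):
    """Expand a sparse vector of length n into a dense list.

    indices: 1-based positions of the stored entries (assumption, see lead-in).
    entries: the stored values, in the same order as indices.
    """
    if len(indices) != len(entries):
        raise ValueError("Indices and REAL-Entries must have the same length")
    dense = [default] * n
    for index, value in zip(indices, entries):
        if not 1 <= index <= n:
            raise ValueError(f"index {index} outside 1..{n}")
        dense[index - 1] = value
    return dense


# The four vectors of the XOR example, written as (n, Indices, REAL-Entries).
mv0 = sparse_to_dense(2)
mv1 = sparse_to_dense(2, [2], [1.0])
mv2 = sparse_to_dense(2, [1], [1.0])
mv3 = sparse_to_dense(2, [1, 2], [1.0, 1.0])
```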


  <xs:element name="SupportVectorMachine">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="SupportVectors" minOccurs="0"/>
        <xs:element ref="Coefficients"/>
      </xs:sequence>
      <xs:attribute name="targetCategory" type="xs:string" use="optional"/>
      <xs:attribute name="alternateTargetCategory" type="xs:string" use="optional"/>
      <xs:attribute name="threshold" type="REAL-NUMBER" use="optional"/>
    </xs:complexType>
  </xs:element>

SupportVectors holds the support vectors used by the respective SVM instance as references into the VectorDictionary. For storing the SVM coefficients, the element Coefficients is used. Both are combined in the element SupportVectorMachine, which holds a single instance of an SVM.

The attribute targetCategory is required for classification models and gives the corresponding class label. This attribute is to be used for classification models implementing the one-against-all method. In this method, for n classes, there are exactly n SupportVectorMachine elements. The SVM with the smallest value determines the predicted class label.
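
The one-against-all selection rule stated above can be sketched as follows. The function name and the input layout (a mapping from each targetCategory to the value of its SVM) are hypothetical; how exact ties between values are resolved is not stated above, so the sketch simply keeps the first minimum it encounters.

```python
def predict_one_against_all(decision_values):
    """Return the targetCategory whose SVM produced the smallest value.

    decision_values maps each targetCategory to f(x) of its SVM.
    Ties are resolved by insertion order (an assumption of this sketch).
    """
    return min(decision_values, key=decision_values.get)


# Hypothetical values for a three-class model, for illustration only.
predicted = predict_one_against_all({"a": 0.7, "b": -1.2, "c": 0.1})
```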

The attribute alternateTargetCategory is required for binary classification models with only one SupportVectorMachine element. It is also required for multi-class classification models implementing the one-against-one method. In this method, for n classes, there are exactly n(n-1)/2 SupportVectorMachine elements, where each SVM is trained on data from two classes. The first class is represented by the targetCategory attribute and the second class by the alternateTargetCategory attribute. The predicted class label is determined by a voting scheme in which the category with the maximum number of votes wins. In case of a tie, the predicted class label is the first category with the maximal number of votes. In both cases (binary classification and multi-class classification with one-against-one), the class label is determined by comparing the numeric prediction with the threshold: if the prediction is smaller than the threshold, the label is that of the targetCategory attribute; if it is greater than or equal to the threshold, the label is that of the alternateTargetCategory attribute.

Note that each SupportVectorMachine element may have its own threshold that overrides the default.
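
The threshold comparison and the voting scheme can be sketched together. This is an illustration only: the function name and the tuple layout are hypothetical, and "first category" in the tie rule is interpreted here as the first entry of the categories list passed in, which is an assumption of this sketch.

```python
def predict_one_against_one(machines, categories, default_threshold=0.0):
    """Vote over pairwise SVM results.

    machines: iterable of (targetCategory, alternateTargetCategory,
              value, threshold) tuples; threshold may be None, in which
              case default_threshold (the model-level value) is used.
    categories: all class labels, in the order used to break ties.
    """
    votes = {category: 0 for category in categories}
    for target, alternate, value, threshold in machines:
        limit = default_threshold if threshold is None else threshold
        # Smaller than the threshold: targetCategory; otherwise the alternate.
        winner = target if value < limit else alternate
        votes[winner] += 1
    most_votes = max(votes.values())
    # Tie rule: the first category with the maximal number of votes.
    return next(c for c in categories if votes[c] == most_votes)


# Hypothetical pairwise results for three classes (3*(3-1)/2 = 3 machines).
machines = [
    ("a", "b", 0.2, None),   # vote for b
    ("a", "c", 0.3, None),   # vote for c
    ("b", "c", -0.1, None),  # vote for b
]
predicted = predict_one_against_one(machines, ["a", "b", "c"])
```

A binary model with a single SupportVectorMachine element is the special case of one machine and two categories.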

The element SupportVectors contains all support vectors required for the respective SVM instance.


  <xs:element name="SupportVectors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="SupportVector" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="numberOfSupportVectors" type="INT-NUMBER" use="optional"/>
      <xs:attribute name="numberOfAttributes" type="INT-NUMBER" use="optional"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="SupportVector">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="vectorId" type="VECTOR-ID" use="required"/>
    </xs:complexType>
  </xs:element>

The support vectors are represented by the element SupportVector which only has the attribute vectorId - the reference to the support vector in VectorDictionary. If numberOfSupportVectors is specified, then it must match the number of SupportVector elements. If numberOfAttributes is specified, then it must match the number of attributes in the support vectors (which all must have the same length). If one of these requirements is not fulfilled, then the PMML is not valid.
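
The consistency requirements above lend themselves to a simple check. The following sketch assumes the vectors have already been read into a plain dictionary keyed by id; the function name and the wording of the messages are hypothetical, and the check for a vectorId that is missing from the dictionary is an addition of this sketch rather than a rule stated above.

```python
def check_support_vectors(vector_ids, dictionary,
                          number_of_support_vectors=None,
                          number_of_attributes=None):
    """Return a list of problems found; an empty list means consistent.

    vector_ids: the vectorId values of the SupportVector elements.
    dictionary: maps VectorInstance id to its dense vector.
    """
    problems = []
    for vector_id in vector_ids:
        if vector_id not in dictionary:
            problems.append(f"vectorId {vector_id!r} not found in VectorDictionary")
    if (number_of_support_vectors is not None
            and number_of_support_vectors != len(vector_ids)):
        problems.append("numberOfSupportVectors does not match "
                        "the number of SupportVector elements")
    lengths = {len(dictionary[v]) for v in vector_ids if v in dictionary}
    if len(lengths) > 1:
        problems.append("support vectors differ in length")
    if (number_of_attributes is not None and lengths
            and lengths != {number_of_attributes}):
        problems.append("numberOfAttributes does not match "
                        "the support vector length")
    return problems
```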

Support Vector Coefficients

The element Coefficients is used to store the support vector coefficients α_i and b.


  <xs:element name="Coefficients">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="Coefficient" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="numberOfCoefficients" type="INT-NUMBER" use="optional"/>
      <xs:attribute name="absoluteValue" type="REAL-NUMBER" use="optional" default="0"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="Coefficient">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="value" type="REAL-NUMBER" use="optional" default="0"/>
    </xs:complexType>
  </xs:element>

Each coefficient α_i is described by the element Coefficient and the number of coefficients corresponds to that of the support vectors. Hence the attribute numberOfCoefficients is equal to the number of support vectors. The attribute absoluteValue contains the value of the absolute coefficient b.

Example Model

This example shows a classification SVM for the simple XOR data set. All vectors are support vectors.


  <?xml version="1.0" encoding="UTF-8"?>
  <PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Header copyright="DMG.org" />
    <DataDictionary numberOfFields="3">
      <DataField name="x1" optype="continuous" dataType="double" />
      <DataField name="x2" optype="continuous" dataType="double" />
      <DataField name="class" optype="categorical" dataType="string">
        <Value value="no" />
        <Value value="yes" />
      </DataField>
    </DataDictionary>
    <SupportVectorMachineModel modelName="SVM XOR Model" algorithmName="supportVectorMachine"
             functionName="classification" svmRepresentation="SupportVectors">
      <MiningSchema>
        <MiningField name="x1" />
        <MiningField name="x2" />
        <MiningField name="class" usageType="predicted" />
      </MiningSchema>
      <RadialBasisKernelType gamma="1.0" description="Radial basis kernel type" />
      <VectorDictionary numberOfVectors="4">
        <VectorFields numberOfFields="2">
          <FieldRef field="x1"/>
          <FieldRef field="x2"/>
        </VectorFields>
        <VectorInstance id="mv0">
          <!-- vector x1=0, x2=0 -->
          <REAL-SparseArray n="2" />
        </VectorInstance>
        <VectorInstance id="mv1">
          <!-- vector x1=0, x2=1 -->
          <REAL-SparseArray n="2">
            <Indices>2</Indices>
            <REAL-Entries>1.0</REAL-Entries>
          </REAL-SparseArray>
        </VectorInstance>
        <VectorInstance id="mv2">
          <!-- vector x1=1, x2=0 -->
          <REAL-SparseArray n="2">
            <Indices>1</Indices>
            <REAL-Entries>1.0</REAL-Entries>
          </REAL-SparseArray>
        </VectorInstance>
        <VectorInstance id="mv3">
          <!-- vector x1=1, x2=1 -->
          <REAL-SparseArray n="2">
            <Indices>1 2</Indices>
            <REAL-Entries>1.0 1.0</REAL-Entries>
          </REAL-SparseArray>
        </VectorInstance>
      </VectorDictionary>
      <SupportVectorMachine targetCategory="no" alternateTargetCategory="yes">
        <SupportVectors numberOfAttributes="2" numberOfSupportVectors="4">
          <SupportVector vectorId="mv0" />
          <SupportVector vectorId="mv1" />
          <SupportVector vectorId="mv2" />
          <SupportVector vectorId="mv3" />
        </SupportVectors>
        <Coefficients absoluteValue="0" numberOfCoefficients="4">
          <Coefficient value="-1.0" />
          <Coefficient value="1.0" />
          <Coefficient value="1.0" />
          <Coefficient value="-1.0" />
        </Coefficients>
      </SupportVectorMachine>
    </SupportVectorMachineModel>
  </PMML>

Scoring procedure, example

Consider the same example as above in order to illustrate the scoring procedure of the Support Vector Machine. Given the first support vector as input vector

x = mv₀ = (x₁=0.0, x₂=0.0)

we calculate as follows:

f(x) = Sum_(i=1)ⁿ α_i*K(x,x_i) + b

= -1.0*K(x,mv₀) + 1.0*K(x,mv₁) + 1.0*K(x,mv₂) -1.0*K(x,mv₃) + 0

= -1.0*exp(-1.0*||x - mv₀||²) + 1.0*exp(-1.0*||x - mv₁||²) + 1.0*exp(-1.0*||x - mv₂||²) -1.0*exp(-1.0*||x - mv₃||²) + 0

= -1.0*exp(-1.0*|| (0,0)^T - (0,0)^T ||²) + 1.0*exp(-1.0*|| (0,0)^T - (0,1)^T ||²) + 1.0*exp(-1.0*|| (0,0)^T - (1,0)^T ||²) -1.0*exp(-1.0*|| (0,0)^T - (1,1)^T ||²) + 0

= -1.0*exp(-1.0*|| (0,0)^T ||²) + 1.0*exp(-1.0*|| (0,-1)^T ||²) + 1.0*exp(-1.0*|| (-1,0)^T ||²) -1.0*exp(-1.0*|| (-1,-1)^T ||²) + 0

= -1.0*exp(0.0) + 1.0*exp(-1.0) + 1.0*exp(-1.0) -1.0*exp(-2.0) + 0

f(x) = -0.399576.

In the same way, the scoring of the other support vectors delivers

f(x = mv₁) = 0.399576
f(x = mv₂) = 0.399576
f(x = mv₃) = -0.399576

thus reasonably approximating the training data.

A classification with a threshold of 0 would assign the vectors mv₀ and mv₃ to class no and the vectors mv₁ and mv₂ to class yes, delivering an exact classification of the training data.
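
The calculation above can be reproduced with a short script. This is a minimal sketch rather than a PMML consumer: the vectors, coefficients, gamma, absoluteValue and threshold are copied by hand from the example model instead of being parsed from the XML, and all Python names are hypothetical.

```python
import math

# Values copied from the example model (VectorDictionary and Coefficients).
VECTORS = {
    "mv0": (0.0, 0.0),
    "mv1": (0.0, 1.0),
    "mv2": (1.0, 0.0),
    "mv3": (1.0, 1.0),
}
COEFFICIENTS = {"mv0": -1.0, "mv1": 1.0, "mv2": 1.0, "mv3": -1.0}
GAMMA = 1.0            # RadialBasisKernelType gamma
ABSOLUTE_VALUE = 0.0   # Coefficients absoluteValue (b)
THRESHOLD = 0.0        # model default threshold


def radial_basis_kernel(x, y, gamma):
    """K(x,y) = exp(-gamma*||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision(x):
    """f(x) = sum_i alpha_i*K(x,x_i) + b over all support vectors."""
    total = sum(COEFFICIENTS[name] * radial_basis_kernel(x, vector, GAMMA)
                for name, vector in VECTORS.items())
    return total + ABSOLUTE_VALUE


def classify(x):
    """Smaller than the threshold: targetCategory 'no'; otherwise 'yes'."""
    return "no" if decision(x) < THRESHOLD else "yes"


scores = {name: decision(vector) for name, vector in VECTORS.items()}
labels = {name: classify(vector) for name, vector in VECTORS.items()}
```

Here scores holds f(x) for each of the four vectors and labels holds the resulting class assignments, which can be compared with the values listed above.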
