Data Mining Group - Neural Network Models

PMML 2.1 -- Neural Network Models

Neural Network Models for Backpropagation

The description of neural network models assumes that the reader has a general knowledge of artificial neural network technology. A neural network has one or more input nodes and one or more neurons. The outputs of some neurons are the outputs of the network. The network is defined by the neurons and their connections, also known as weights. All neurons are organized into layers; the sequence of layers defines the order in which the activations are computed. All output activations for neurons in some layer L are evaluated before computation proceeds to the next layer L+1. Note that this allows for recurrent networks, where outputs of neurons in a later layer L+i (i > 0) can be used as input in layer L. The model does not define a specific evaluation order for neurons within a layer.

Each neuron receives one or more input values, each coming via a network connection, and sends only one output value. Most layers consist of elements Neuron, but a layer can also consist of elements RBFNeuron that are used to represent radial basis function units. All incoming connections for a certain neuron are contained in the corresponding Neuron element. Each connection Con of the element Neuron stores the ID of the node it comes from and the weight. A bias weight coefficient or the width of a radial basis function unit may be stored as an attribute of the Neuron element.

All neurons in the network are assumed to have the same (default) activation function, although each individual layer and even each neuron may have its own activation function and threshold that override the default. Given a fixed neuron j, and W_i representing the weight on the connection from neuron i, the activation for neuron j is computed in up to three steps as follows:

1. A linear combination is computed (unless the activation function is radialBasis): Z = Sum( W_i * output(i) ) + bias

2. The activation function is applied to the result of step 1: output(j) = activation( Z )

3. A normalization method softmax ( p_j = exp(y_j) / Sum_i( exp(y_i) ) ) or simplemax ( p_j = y_j / Sum_i( y_i ) ) can be applied to the computed activation values. The attribute 'normalizationMethod' is defined for each layer, with default value none ( p_j = y_j ). Softmax normalization is most often applied to the output layer of a classification network to get the probabilities of all answers. Simplemax normalization is often applied to the hidden layer consisting of elements with radial basis activation function to get a "normalized RBF" activation.
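The three steps above can be sketched in Python as follows. This is a minimal illustration and not part of the PMML definition: the names `logistic`, `normalize` and `layer_outputs`, and the representation of a neuron as a `(bias, connections)` pair, are assumptions made for this example, and the logistic function is used as the default activation.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize(values, method="none"):
    """Step 3: apply a layer's normalizationMethod to its activations."""
    if method == "softmax":
        exps = [math.exp(y) for y in values]
        total = sum(exps)
        return [e / total for e in exps]
    if method == "simplemax":
        total = sum(values)
        return [y / total for y in values]
    return list(values)  # "none": p_j = y_j

def layer_outputs(neurons, outputs, activation=logistic, method="none"):
    """Evaluate one layer.

    neurons: list of (bias, connections), where connections is a list of
             (from_id, weight) pairs (hypothetical representation).
    outputs: dict mapping node id -> output value computed so far.
    """
    activations = []
    for bias, connections in neurons:
        z = sum(w * outputs[src] for src, w in connections) + bias  # step 1
        activations.append(activation(z))                           # step 2
    return normalize(activations, method)                           # step 3
```

Because normalization needs the activations of the whole layer, the sketch computes all activations of the layer first and normalizes afterwards.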

Activation functions are:

threshold:

activation(Z) = if Z > threshold then 1 else 0

logistic:

activation(Z) = 1 / (1 + exp(-Z))

tanh:

activation(Z) = (1-exp(-2Z))/(1+exp(-2Z))

identity:

activation(Z) = Z

exponential:

activation(Z) = exp(Z)

reciprocal:

activation(Z) = 1/Z

square:

activation(Z) = Z*Z

Gauss:

activation(Z) = exp(-(Z*Z))

sine:

activation(Z) = sin(Z)

Elliott:

activation(Z) = Z/(1+|Z|)

arctan:

activation(Z) = 2 * arctan(Z)/Pi

radialBasis activation is computed as follows:

activation = exp( -(Sum_i (output(i)-W_i)² )/(2*Width²)) where the sum is taken over all input units, W_i are the coordinates of the center, stored in Con elements in place of the weights, and Width is the width of the radial basis function unit, stored in the Neuron element.
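As an illustration, the activation functions listed above can be written down directly in Python. This is a sketch under assumptions: the table name `ACTIVATIONS` and the helper names `threshold` and `radial_basis` are invented for this example, and no attempt is made to guard against floating-point overflow for large |Z| or against division by zero in reciprocal.

```python
import math

# Scalar activation functions keyed by their ACTIVATION-FUNCTION value.
ACTIVATIONS = {
    "logistic":    lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh":        lambda z: (1.0 - math.exp(-2.0 * z)) / (1.0 + math.exp(-2.0 * z)),
    "identity":    lambda z: z,
    "exponential": lambda z: math.exp(z),
    "reciprocal":  lambda z: 1.0 / z,
    "square":      lambda z: z * z,
    "Gauss":       lambda z: math.exp(-(z * z)),
    "sine":        lambda z: math.sin(z),
    "Elliott":     lambda z: z / (1.0 + abs(z)),
    "arctan":      lambda z: 2.0 * math.atan(z) / math.pi,
}

def threshold(z, threshold_value=0.0):
    """threshold activation: 1 if Z exceeds the threshold, else 0."""
    return 1.0 if z > threshold_value else 0.0

def radial_basis(inputs, centers, width):
    """radialBasis activation.

    inputs:  outputs of the connected units, in connection order.
    centers: the W_i values stored in the Con elements.
    width:   the width attribute of the neuron.
    """
    sq_dist = sum((x - c) ** 2 for x, c in zip(inputs, centers))
    return math.exp(-sq_dist / (2.0 * width ** 2))
```

Note that radialBasis takes the individual inputs rather than the linear combination Z, which is why it is kept separate from the scalar functions.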

XSD

<xs:element name="NeuralNetwork">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element ref="MiningSchema" />
	<xs:element minOccurs="0" ref="ModelStats" />
	<xs:element ref="NeuralInputs" />
	<xs:element maxOccurs="unbounded" ref="NeuralLayer" />
	<xs:element minOccurs="0" ref="NeuralOutputs" />
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      </xs:sequence>
      <xs:attribute name="modelName" type="xs:string" />
      <xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
      <xs:attribute name="algorithmName" type="xs:string" />
      <xs:attribute name="activationFunction" type="ACTIVATION-FUNCTION" use="required" />
      <xs:attribute name="threshold" type="REAL-NUMBER" />
      <xs:attribute name="numberOfLayers" type="xs:nonNegativeInteger" />
    </xs:complexType>
  </xs:element>

  <xs:element name="NeuralInputs">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element maxOccurs="unbounded" ref="NeuralInput" />
      </xs:sequence>
      <xs:attribute name="numberOfInputs" type="xs:nonNegativeInteger" />
    </xs:complexType>
  </xs:element>

<xs:element name="NeuralLayer">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element maxOccurs="unbounded" ref="Neuron" />
      </xs:sequence>
      <xs:attribute name="numberOfNeurons" type="xs:nonNegativeInteger" />
      <xs:attribute name="activationFunction" type="ACTIVATION-FUNCTION" />
      <xs:attribute name="normalizationMethod" default="none">
	<xs:simpleType>
	  <xs:restriction base="xs:string">
	    <xs:enumeration value="none" />
	    <xs:enumeration value="simplemax" />
	    <xs:enumeration value="softmax" />
	  </xs:restriction>
	</xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

<xs:element name="NeuralOutputs">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element maxOccurs="unbounded" ref="NeuralOutput" />
      </xs:sequence>
      <xs:attribute name="numberOfOutputs" type="xs:nonNegativeInteger" />
    </xs:complexType>
  </xs:element>

NeuralInput defines how input fields are normalized so that the values can be processed in the neural network. For example, string values must be encoded as numeric values.

NeuralOutput defines how the output of the neural network must be interpreted.

<xs:simpleType name="ACTIVATION-FUNCTION">
    <xs:restriction base="xs:string">
      <xs:enumeration value="threshold" />
      <xs:enumeration value="logistic" />
      <xs:enumeration value="tanh" />
      <xs:enumeration value="identity" />
      <xs:enumeration value="exponential" />
      <xs:enumeration value="reciprocal" />
      <xs:enumeration value="square" />
      <xs:enumeration value="Gauss" />
      <xs:enumeration value="sine" />
      <xs:enumeration value="Elliott" />
      <xs:enumeration value="arctan" />
      <xs:enumeration value="radialBasis" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="NN-NEURON-ID">
    <xs:restriction base="xs:string" />
  </xs:simpleType>
  <xs:simpleType name="NN-NEURON-IDREF">
    <xs:restriction base="xs:string" />
  </xs:simpleType>

NN-NEURON-ID is just a string which identifies a neuron. The string is not necessarily an XML ID because a PMML document may contain multiple network models where neurons in different models can have the same identifier. Within a model, though, all neurons (elements of NeuralInput and Neuron) must have a unique identifier.

Neural Network Input Neurons

An input neuron represents the normalized value for an input field. A numeric input field is usually mapped to a single input neuron, while a categorical input field is usually mapped to a set of input neurons using some fan-out function. The elements NormContinuous and NormDiscrete are defined in the Transformation Dictionary. The element DerivedField is the general container for these transformations.

<xs:element name="NeuralInput">
    <xs:complexType>
      <xs:sequence>
	 <xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	 <xs:element ref="DerivedField" />
      </xs:sequence>
      <xs:attribute name="id" type="NN-NEURON-ID" use="required" />
    </xs:complexType>
  </xs:element>

Restrictions: A numeric input field must not appear more than once in the input layer. Similarly, a pair consisting of a categorical input field and an input value must not appear more than once in the input layer.
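The mapping from input fields to input neurons might be sketched as follows. NormContinuous and NormDiscrete are defined in the Transformation Dictionary, not here, so this example rests on assumptions: NormContinuous is taken to be piecewise linear interpolation between its LinearNorm points (with the outermost segments extended for values outside the range), and NormDiscrete is taken to be an indicator that is 1 when the field equals the given value. The function names are hypothetical.

```python
def norm_continuous(x, points):
    """Piecewise linear normalization (assumed behaviour).

    points: list of (orig, norm) pairs sorted by orig, at least two entries,
            mirroring a sequence of LinearNorm elements.
    """
    segments = list(zip(points, points[1:]))
    for (o1, n1), (o2, n2) in segments[:-1]:
        if x <= o2:
            break  # x falls in (or before) this segment
    else:
        # No earlier segment matched: use the last one, also for x beyond it.
        (o1, n1), (o2, n2) = segments[-1]
    return n1 + (x - o1) * (n2 - n1) / (o2 - o1)

def norm_discrete(field_value, value):
    """Indicator for one (categorical field, value) pair (assumed behaviour)."""
    return 1.0 if field_value == value else 0.0
```

Under these assumptions, a categorical field with k values fans out to up to k input neurons, one `norm_discrete` call per value, which is consistent with the restriction that each (field, value) pair appears at most once.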

Neural Network Neurons

<xs:element name="Neuron">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element maxOccurs="unbounded" ref="Con" />
      </xs:sequence>
      <xs:attribute name="id" type="NN-NEURON-ID" use="required" />
      <xs:attribute name="bias" type="REAL-NUMBER" />
      <xs:attribute name="activationFunction" type="ACTIVATION-FUNCTION" />
      <xs:attribute name="threshold" type="REAL-NUMBER" />
      <xs:attribute name="width" type="REAL-NUMBER" />
    </xs:complexType>
  </xs:element>

Neuron contains an identifier which must be unique across all layers. Its attribute threshold has default value 0. If no activationFunction is given, then the default activation function of the NeuralLayer element applies; if that is not specified either, the default activation function of the NeuralNetwork element applies. The attribute 'bias' implicitly defines a connection to a bias unit where the unit's value is 1.0 and the weight is the value of 'bias'. The attribute 'width' must be specified if and only if the activation function for the element is 'radialBasis'.
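The override rules for Neuron attributes can be illustrated with a short Python sketch. It is only an illustration: each element is represented by a plain dict of the attributes present on it, the function names are hypothetical, and treating a missing 'bias' as 0 is an assumption, since the text above does not state a default.

```python
def effective_activation(neuron, layer, network):
    """Resolve activationFunction: Neuron first, then NeuralLayer, then NeuralNetwork."""
    for attrs in (neuron, layer, network):
        if "activationFunction" in attrs:
            return attrs["activationFunction"]
    raise ValueError("activationFunction is required on NeuralNetwork")

def net_input(neuron, weighted_inputs):
    """Add the implicit bias connection: a unit with value 1.0 weighted by 'bias'."""
    bias = float(neuron.get("bias", 0.0))  # assumption: absent bias contributes 0
    return sum(weighted_inputs) + bias * 1.0

def width_is_consistent(neuron, layer, network):
    """'width' must be present if and only if the activation is radialBasis."""
    is_rbf = effective_activation(neuron, layer, network) == "radialBasis"
    return is_rbf == ("width" in neuron)
```

For example, a neuron without its own activationFunction inside a layer that declares `tanh` resolves to `tanh`, even when the network default is `logistic`.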

Weighted connections between neural net nodes are represented by Con elements.

<xs:element name="Con">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
      </xs:sequence>
      <xs:attribute name="from" type="NN-NEURON-IDREF" use="required" />
      <xs:attribute name="weight" type="REAL-NUMBER" use="required" />
    </xs:complexType>
  </xs:element>

Con elements are always part of a Neuron. They define the connections coming into that parent element. The neuron identified by 'from' may be part of any layer.

NN-NEURON-IDs of all nodes must be unique across the combined set of NeuralInput and Neuron nodes. The 'from' attributes of connections and the 'outputNeuron' attributes of NeuralOutput elements refer to these identifiers.
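A consumer might check these identifier rules before scoring. The following Python sketch uses the standard `xml.etree.ElementTree` module and assumes a NeuralNetwork element without XML namespaces; the function name `check_ids` and the wording of the messages are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_ids(network):
    """Return a list of problems found in a parsed NeuralNetwork element."""
    # Identifiers live on NeuralInput and Neuron elements.
    ids = [e.get("id") for e in network.iter() if e.tag in ("NeuralInput", "Neuron")]
    problems = [f"duplicate id {i!r}" for i, n in Counter(ids).items() if n > 1]
    known = set(ids)
    # Every connection must start at a known node.
    for con in network.iter("Con"):
        if con.get("from") not in known:
            problems.append(f"Con refers to unknown id {con.get('from')!r}")
    # Every network output must point at a known node.
    for out in network.iter("NeuralOutput"):
        if out.get("outputNeuron") not in known:
            problems.append(f"NeuralOutput refers to unknown id {out.get('outputNeuron')!r}")
    return problems
```

The check is scoped to one NeuralNetwork element, which matches the rule that identifiers need only be unique within a model.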

Neural Network Output Neurons

In parallel to input neurons, there are output neurons which are connected to target fields via some normalization. While the activation of an input neuron is defined by the value of the corresponding input field, the activation of an output neuron is computed by the activation function. Therefore, an output neuron is defined by a 'Neuron'. In networks with supervised learning, the computed activation of the output neurons is compared with the normalized values of the corresponding target fields; these values are often called 'teach values'. The difference between the neuron's activation and the normalized target field determines the prediction error. For scoring, the normalization for the target field is used to denormalize the predicted value in the output neuron. Therefore, each instance of 'Neuron' which represents an output neuron is additionally connected to a normalized field. Note that the scoring procedure must apply the inverse of the normalization in order to map the neuron activation to a value in the original domain.

The element NeuralOutput connects a neuron's output to the output of the network.

<xs:element name="NeuralOutput">
    <xs:complexType>
      <xs:sequence>
	<xs:element minOccurs="0" maxOccurs="unbounded" ref="Extension" />
	<xs:element ref="DerivedField" />
      </xs:sequence>
      <xs:attribute name="outputNeuron" type="NN-NEURON-IDREF" use="required" />
    </xs:complexType>
  </xs:element>

For neural value prediction with backpropagation, the output layer contains a single neuron; its activation is denormalized to give the predicted value.

For neural classification with backpropagation, the output layer contains one or more neurons. The neuron with maximal activation determines the predicted class label. If there is no unique neuron with maximal activation then the predicted value is undefined.
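Both interpretation rules can be sketched in Python. The sketch makes assumptions that are labeled here and in the comments: the target normalization is taken to be piecewise linear with strictly increasing norm values, so that it can be inverted segment by segment, and the function names `denormalize` and `predicted_class` are hypothetical.

```python
def denormalize(y, points):
    """Invert a piecewise linear normalization (assumed form of the target mapping).

    points: list of (orig, norm) pairs with strictly increasing norm values,
            at least two entries.
    """
    segments = list(zip(points, points[1:]))
    for (o1, n1), (o2, n2) in segments[:-1]:
        if y <= n2:
            break  # y falls in (or before) this segment
    else:
        # No earlier segment matched: use the last one, also for y beyond it.
        (o1, n1), (o2, n2) = segments[-1]
    return o1 + (y - n1) * (o2 - o1) / (n2 - n1)

def predicted_class(activations):
    """activations: dict mapping class label -> activation of its output neuron.

    Returns the label with maximal activation, or None when the maximum
    is not unique (the predicted value is then undefined).
    """
    best = max(activations.values())
    winners = [label for label, a in activations.items() if a == best]
    return winners[0] if len(winners) == 1 else None
```

Returning `None` for a tie is one possible way to represent an undefined prediction; an implementation could equally raise an error or report a missing value.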

Conformance

Backward connections from level N to level M with M <= N, and connections between non-adjacent layers, are not in core.
Values of activationFunction that vary per Neuron are not in core; values that vary per layer are in core.
Only the following activation functions are in core: threshold, logistic, tanh, identity, radialBasis.

Example model

<?xml version="1.0" ?>
<PMML version="2.1">
     <Header copyright="DMG.org"/>
     <DataDictionary numberOfFields="5">
          <DataField name="gender" optype="categorical">
               <Value value="  female"/>
               <Value value="    male"/>
          </DataField>
          <DataField name="no of claims" optype="categorical">
               <Value value="       0"/>
               <Value value="       1"/>
               <Value value="       3"/>
               <Value value="     &gt; 3"/>
               <Value value="       2"/>
          </DataField>
          <DataField name="domicile" optype="categorical">
               <Value value="suburban"/>
               <Value value="   urban"/>
               <Value value="   rural"/>
          </DataField>
          <DataField name="age of car" optype="continuous"/>
          <DataField name="amount of claims" optype="continuous"/>
     </DataDictionary>
     <NeuralNetwork modelName="Neural Insurance"
                functionName="regression"
                activationFunction="logistic"
                numberOfLayers="2">
        <MiningSchema>
          <MiningField name="gender"/>
          <MiningField name="no of claims"/>
          <MiningField name="domicile"/>
          <MiningField name="age of car"/>
          <MiningField name="amount of claims" usageType="predicted"/>
        </MiningSchema>
        <NeuralInputs numberOfInputs="10">
          <NeuralInput id="0">
               <DerivedField>
                    <NormContinuous field="age of car">
                         <LinearNorm orig="0.01" norm="0"/>
                         <LinearNorm orig="3.07897" norm="0.5"/>
                         <LinearNorm orig="11.44" norm="1"/>
                    </NormContinuous>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="1">
               <DerivedField>
                    <NormDiscrete field="gender" value="    male"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="2">
               <DerivedField>
                    <NormDiscrete field="no of claims" value="       0"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="3">
               <DerivedField>
                    <NormDiscrete field="no of claims" value="       1"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="4">
               <DerivedField>
                    <NormDiscrete field="no of claims" value="       3"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="5">
               <DerivedField>
                    <NormDiscrete field="no of claims" value="     &gt; 3"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="6">
               <DerivedField>
                    <NormDiscrete field="no of claims" value="       2"/>
               </DerivedField>
          </NeuralInput>
          <NeuralInput id="7">
               <DerivedField>
                    <NormDiscrete field="domicile" value="suburban"/>
               </DerivedField>
           </NeuralInput>
           <NeuralInput id="8">
               <DerivedField>
                    <NormDiscrete field="domicile" value="   urban"/>
               </DerivedField>
           </NeuralInput>
           <NeuralInput id="9">
               <DerivedField>
                    <NormDiscrete field="domicile" value="   rural"/>
               </DerivedField>
           </NeuralInput>
        </NeuralInputs>
        <NeuralLayer numberOfNeurons="3">
           <Neuron id="10">
               <Con from="0" weight="-2.08148"/>
               <Con from="1" weight="3.69657"/>
               <Con from="2" weight="-1.89986"/>
               <Con from="3" weight="5.61779"/>
               <Con from="4" weight="0.427558"/>
               <Con from="5" weight="-1.25971"/>
               <Con from="6" weight="-6.55549"/>
               <Con from="7" weight="-4.62773"/>
               <Con from="8" weight="1.97525"/>
               <Con from="9" weight="-1.0962"/>
           </Neuron>
           <Neuron id="11">
               <Con from="0" weight="-0.698997"/>
               <Con from="1" weight="-3.54943"/>
               <Con from="2" weight="-3.29632"/>
               <Con from="3" weight="-1.20931"/>
               <Con from="4" weight="1.00497"/>
               <Con from="5" weight="0.033502"/>
               <Con from="6" weight="1.12016"/>
               <Con from="7" weight="0.523197"/>
               <Con from="8" weight="-2.96135"/>
               <Con from="9" weight="-0.398626"/>
           </Neuron>
           <Neuron id="12">
               <Con from="0" weight="0.904057"/>
               <Con from="1" weight="1.75084"/>
               <Con from="2" weight="2.51658"/>
               <Con from="3" weight="-0.151895"/>
               <Con from="4" weight="-2.88008"/>
               <Con from="5" weight="0.920063"/>
               <Con from="6" weight="-3.30742"/>
               <Con from="7" weight="-1.72251"/>
               <Con from="8" weight="-1.13156"/>
               <Con from="9" weight="-0.758563"/>
           </Neuron>
        </NeuralLayer>
        <NeuralLayer numberOfNeurons="1">
           <Neuron id="13">
               <Con from="10" weight="0.76617"/>
               <Con from="11" weight="-1.5065"/>
               <Con from="12" weight="0.999797"/>
           </Neuron>
        </NeuralLayer>
        <NeuralOutputs numberOfOutputs="1">
           <NeuralOutput outputNeuron="13">
               <DerivedField>
                    <NormContinuous field="amount of claims">
                         <LinearNorm orig="0" norm="0.1"/>
                         <LinearNorm orig="1291.68" norm="0.5"/>
                         <LinearNorm orig="5327.26" norm="0.9"/>
                    </NormContinuous>
               </DerivedField>
           </NeuralOutput>
        </NeuralOutputs>
     </NeuralNetwork>
</PMML>
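To show how a model such as the one above could be evaluated, here is a minimal Python sketch of the forward pass over a parsed NeuralNetwork element. It is deliberately restricted and rests on stated assumptions: only the logistic activation is supported, normalizationMethod and per-layer or per-neuron overrides are ignored, connections are assumed to point only to inputs or earlier layers, a missing bias is treated as 0, and the document is assumed to use no XML namespaces. The function name `forward` is hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def forward(network, input_values):
    """Compute the output of every node in a feed-forward logistic network.

    network:      parsed NeuralNetwork element.
    input_values: dict mapping NeuralInput id -> normalized input value.
    Returns a dict mapping node id -> output value.
    """
    outputs = dict(input_values)
    # NeuralLayer elements are visited in document order, which is the
    # order in which the layers are evaluated.
    for layer in network.iter("NeuralLayer"):
        layer_out = {}
        for neuron in layer.iter("Neuron"):
            z = float(neuron.get("bias", "0"))  # assumption: no bias means 0
            for con in neuron.iter("Con"):
                z += float(con.get("weight")) * outputs[con.get("from")]
            layer_out[neuron.get("id")] = 1.0 / (1.0 + math.exp(-z))  # logistic only
        # Publish the layer's outputs only after the whole layer is computed.
        outputs.update(layer_out)
    return outputs
```

For the example model, `network` would be the NeuralNetwork child of the parsed PMML document, `input_values` would hold the ten normalized inputs with ids "0" to "9", and the activation of neuron "13" would then be denormalized through the NormContinuous mapping in NeuralOutput to obtain the predicted 'amount of claims'.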