PMML 4.0 - Neural Network Models
Neural Network Models for Backpropagation
The description of neural network models assumes that the reader has a
general knowledge of artificial neural network technology. A neural network has
one or more input nodes and one or more neurons. Some neurons' outputs are the
output of the network. The network is defined by the neurons and their
connections, aka weights. All neurons are organized into layers; the sequence of
layers defines the order in which the activations are computed. All output
activations for neurons in some layer L are evaluated before computation
proceeds to the next layer L+1. Note that this allows for recurrent networks,
where outputs of neurons in a later layer L+i (with i > 0) can be used as input
in an earlier layer L. The model does not define a specific evaluation order for
neurons within a layer.
Each neuron receives one or more input values, each coming via a network
connection, and sends only one output value. All incoming connections for a
given neuron are contained in the corresponding Neuron element. Each Con
element of a Neuron stores the ID of the node it comes from and the connection
weight. A bias weight coefficient or the width of a radial basis function unit
may be stored as an attribute of the Neuron element.
All neurons in the network are assumed to have the same (default) activation
function, although each individual layer may have its own activation function
and threshold that override the default. Given a fixed neuron j, with W_i
representing the weight on the connection from neuron i, the activation for
neuron j is computed in up to three steps, as follows (a Python sketch of these
steps appears after this list):
- Compute a linear combination or Euclidean distance Z using the input activations and weights W_i;
the two forms of Z are defined with the two groups of activation functions below.
The input activations to the current neuron are the outputs of the connected neurons.
- Apply the activation function to the result of step 1:
output(j) = activation( Z )
- Optionally, apply a normalization method, softmax
( p_j = exp(y_j) / Sum_i( exp(y_i) ) )
or simplemax ( p_j = y_j / Sum_i( y_i ) ),
to the computed activation values. The attribute normalizationMethod
is defined for the network with
default value none ( p_j = y_j ),
but can be specified for each layer as well. Softmax normalization is most
often applied to the output layer of a classification network to get the probabilities of all
answers. Simplemax normalization is often applied to a hidden layer consisting of elements
with the radial basis activation function to get a "normalized RBF" activation.
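As a minimal, illustrative sketch (not part of the specification), the three steps for a layer in Group 1 might look as follows; the function and parameter names are hypothetical, and connections are assumed to be already parsed from the Con elements:

import math

def layer_activations(neurons, outputs, activation, method="none"):
    """Evaluate one layer using the three steps above (illustrative sketch).

    neurons:    list of (connections, bias), where connections is a list of
                (from_id, weight) pairs taken from the Con elements
    outputs:    dict mapping neuron id -> already-computed output activation
    activation: the layer's activation function, e.g. the logistic function
    """
    # Step 1: linear combination of weights and input activations
    zs = [sum(w * outputs[i] for i, w in cons) + bias
          for cons, bias in neurons]
    # Step 2: apply the activation function
    ys = [activation(z) for z in zs]
    # Step 3: optional normalization over the whole layer
    if method == "softmax":       # p_j = exp(y_j) / Sum_i( exp(y_i) )
        exps = [math.exp(y) for y in ys]
        return [e / sum(exps) for e in exps]
    if method == "simplemax":     # p_j = y_j / Sum_i( y_i )
        return [y / sum(ys) for y in ys]
    return ys                     # method == "none": p_j = y_j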
There are two groups of activation functions; a Python sketch of both groups
appears after this list.
- Group 1 uses a linear combination of weights and input activations:
Z = Sum_i( W_i * output(i) ) + bias
The activation functions in this group are:
- threshold:
- activation(Z) = 1 if Z > threshold else 0
- logistic:
- activation(Z) = 1 / (1 + exp(-Z))
- tanh:
- activation(Z) = (1 - exp(-2Z)) / (1 + exp(-2Z))
- identity:
- activation(Z) = Z
- exponential:
- activation(Z) = exp(Z)
- reciprocal:
- activation(Z) = 1/Z
- square:
- activation(Z) = Z*Z
- Gauss:
- activation(Z) = exp(-(Z*Z))
- sine:
- activation(Z) = sin(Z)
- cosine:
- activation(Z) = cos(Z)
- Elliott:
- activation(Z) = Z/(1+|Z|)
- arctan:
- activation(Z) = 2 * arctan(Z)/Pi
- Group 2 computes a Euclidean distance between the weights and the input activations (i.e., the outputs of other neurons):
Z = Sum_i( (output(i) - W_i)^2 ) / (2 * width^2)
where the sum is taken over all input units, the W_i are the coordinates of the center
stored in the Con elements in place of the weights, and width is a positive number
describing the width of the radial basis function unit, stored
either in the Neuron element, in NeuralLayer, or even in NeuralNetwork.
The only activation function in this group is 'radialBasis'.
- radialBasis:
- activation(Z) = exp( f * log(altitude) - Z )
where f is the fan-in of each unit in the layer, that is, the
number of other units feeding into that unit, excluding the bias, and
altitude is a positive number stored in Neuron,
NeuralLayer, or NeuralNetwork.
The default is altitude="1.0"; for that value the activation function reduces
to the simple exp(-Z).
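As an illustrative Python sketch of both groups (the names ACTIVATIONS and radial_basis are hypothetical, not part of PMML):

import math

# Group 1: applied to Z = Sum_i( W_i * output(i) ) + bias
ACTIVATIONS = {
    "threshold":   lambda z, t=0.0: 1.0 if z > t else 0.0,
    "logistic":    lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh":        lambda z: (1.0 - math.exp(-2 * z)) / (1.0 + math.exp(-2 * z)),
    "identity":    lambda z: z,
    "exponential": math.exp,
    "reciprocal":  lambda z: 1.0 / z,
    "square":      lambda z: z * z,
    "Gauss":       lambda z: math.exp(-(z * z)),
    "sine":        math.sin,
    "cosine":      math.cos,
    "Elliott":     lambda z: z / (1.0 + abs(z)),
    "arctan":      lambda z: 2.0 * math.atan(z) / math.pi,
}

# Group 2: radialBasis, applied to the Euclidean-distance Z defined above
def radial_basis(outputs, centers, width, altitude=1.0):
    """outputs: activations output(i) of the connected neurons;
    centers: the W_i from the Con elements; width and altitude as in the text."""
    z = sum((o - w) ** 2 for o, w in zip(outputs, centers)) / (2.0 * width ** 2)
    fan_in = len(outputs)  # fan-in f, excluding the bias
    return math.exp(fan_in * math.log(altitude) - z)  # altitude=1 -> exp(-z)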
XSD
<xs:element name="NeuralNetwork">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="MiningSchema"/>
<xs:element ref="Output" minOccurs="0" />
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="ModelExplanation" minOccurs="0"/>
<xs:element ref="Targets" minOccurs="0" />
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:element ref="NeuralInputs" />
<xs:element maxOccurs="unbounded" ref="NeuralLayer" />
<xs:element minOccurs="0" ref="NeuralOutputs" />
<xs:element ref="ModelVerification" minOccurs="0"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" />
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
<xs:attribute name="algorithmName" type="xs:string" />
<xs:attribute name="activationFunction" type="ACTIVATION-FUNCTION" use="required" />
<xs:attribute name="normalizationMethod" type="NN-NORMALIZATION-METHOD" default="none"/>
<xs:attribute name="threshold" type="REAL-NUMBER" default="0" />
<xs:attribute name="width" type="REAL-NUMBER" />
<xs:attribute name="altitude" type="REAL-NUMBER" default="1.0" />
<xs:attribute name="numberOfLayers" type="xs:nonNegativeInteger" />
</xs:complexType>
</xs:element>
<xs:element name="NeuralInputs">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element maxOccurs="unbounded" ref="NeuralInput" />
</xs:sequence>
<xs:attribute name="numberOfInputs" type="xs:nonNegativeInteger" />
</xs:complexType>
</xs:element>
<xs:element name="NeuralLayer">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element maxOccurs="unbounded" ref="Neuron" />
</xs:sequence>
<xs:attribute name="numberOfNeurons" type="xs:nonNegativeInteger" />
<xs:attribute name="activationFunction" type="ACTIVATION-FUNCTION" />
<xs:attribute name="threshold" type="REAL-NUMBER" />
<xs:attribute name="width" type="REAL-NUMBER" />
<xs:attribute name="altitude" type="REAL-NUMBER" />
<xs:attribute name="normalizationMethod" type="NN-NORMALIZATION-METHOD" />
</xs:complexType>
</xs:element>
<xs:element name="NeuralOutputs">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element maxOccurs="unbounded" ref="NeuralOutput" />
</xs:sequence>
<xs:attribute name="numberOfOutputs" type="xs:nonNegativeInteger" />
</xs:complexType>
</xs:element>
NeuralInput defines how input fields are normalized so that the values
can be processed in the neural network. For example, string values must be
encoded as numeric values.
NeuralOutput defines how the output of the neural network must be
interpreted.
<xs:simpleType name="ACTIVATION-FUNCTION">
<xs:restriction base="xs:string">
<xs:enumeration value="threshold" />
<xs:enumeration value="logistic" />
<xs:enumeration value="tanh" />
<xs:enumeration value="identity" />
<xs:enumeration value="exponential" />
<xs:enumeration value="reciprocal" />
<xs:enumeration value="square" />
<xs:enumeration value="Gauss" />
<xs:enumeration value="sine" />
<xs:enumeration value="cosine" />
<xs:enumeration value="Elliott" />
<xs:enumeration value="arctan" />
<xs:enumeration value="radialBasis" />
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="NN-NORMALIZATION-METHOD">
<xs:restriction base="xs:string">
<xs:enumeration value="none" />
<xs:enumeration value="simplemax" />
<xs:enumeration value="softmax" />
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="NN-NEURON-ID">
<xs:restriction base="xs:string" />
</xs:simpleType>
<xs:simpleType name="NN-NEURON-IDREF">
<xs:restriction base="xs:string" />
</xs:simpleType>
NN-NEURON-ID is just a string which identifies a neuron. The string is not
necessarily an XML ID because a PMML document may contain multiple network
models where neurons in different models can have the same identifier. Within a
model, though, all neurons (elements of NeuralInput and Neuron)
must have a unique identifier.
Neural Network Input Neurons
An input neuron represents the normalized value for an input field. A numeric
input field is usually mapped to a single input neuron while a categorical input
field is usually mapped to a set of input neurons using some fan-out function.
The normalization is defined using the elements NormContinuous and NormDiscrete
defined in the Transformation
Dictionary. The element DerivedField is the general container
for these transformations.
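As a rough Python sketch of this input mapping, assuming hypothetical helper names and ignoring PMML's missing-value and outlier-treatment options: a continuous field feeds one input neuron through piecewise linear interpolation (NormContinuous), while a categorical field fans out to one neuron per (field, value) pair (NormDiscrete).

def norm_discrete(field_value, reference_value):
    """1.0 if the categorical field takes the reference value, else 0.0."""
    return 1.0 if field_value == reference_value else 0.0

def norm_continuous(x, knots):
    """knots: list of (orig, norm) pairs from the LinearNorm elements."""
    for (o1, n1), (o2, n2) in zip(knots, knots[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    # outside the mapped range: clamp to the nearest endpoint (an assumption;
    # PMML configures out-of-range behavior elsewhere)
    return knots[0][1] if x < knots[0][0] else knots[-1][1]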
<xs:element name="NeuralInput">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="DerivedField" />
</xs:sequence>
<xs:attribute name="id" type="NN-NEURON-ID" use="required" />
</xs:complexType>
</xs:element>
Restrictions: A numeric input field must not appear more than once in the
input layer. Similarly, a pair consisting of a categorical input field and an
input value must not appear more than once in the input layer.
Neural Network Neurons
<xs:element name="Neuron">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element maxOccurs="unbounded" ref="Con" />
</xs:sequence>
<xs:attribute name="id" type="NN-NEURON-ID" use="required" />
<xs:attribute name="bias" type="REAL-NUMBER" />
<xs:attribute name="width" type="REAL-NUMBER" />
<xs:attribute name="altitude" type="REAL-NUMBER" />
</xs:complexType>
</xs:element>
Neuron contains an identifier id which must be unique across all layers.
The attribute bias implicitly defines a connection to a bias unit where the
unit's value is 1.0 and the weight is the value of bias.
The activation function and normalization method for Neuron can be defined in
NeuralLayer. If either one is not
defined for the layer then the default one specified for NeuralNetwork
applies.
If the activation function is radialBasis, the attribute width
must be specified in Neuron, NeuralLayer, or
NeuralNetwork. A width specified in Neuron
overrides the respective value from NeuralLayer, which in turn
overrides a value given in NeuralNetwork; a sketch of this resolution
order follows.
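A minimal Python sketch of this precedence, using plain dicts to stand in for the parsed attributes of the three elements (the name resolve is illustrative):

def resolve(attr, neuron, layer, network):
    """Return the most specific setting of an attribute such as "width":
    Neuron overrides NeuralLayer, which overrides NeuralNetwork."""
    for element in (neuron, layer, network):
        if element.get(attr) is not None:
            return element[attr]
    return None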
Weighted connections between neural net nodes are represented by Con elements.
<xs:element name="Con">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="from" type="NN-NEURON-IDREF" use="required" />
<xs:attribute name="weight" type="REAL-NUMBER" use="required" />
</xs:complexType>
</xs:element>
Con elements are always part of a Neuron. They define the connections
coming into that parent element. The neuron identified by from may be
part of any layer.
NN-NEURON-IDs of all nodes must be unique across the combined set of
NeuralInput and Neuron nodes. The from attributes of connections and
NeuralOutputs refer to these identifiers.
In parallel to input neurons, there are output neurons which are connected to
output fields via some normalization. While the activation of an input neuron is
defined by the value of the corresponding input field, the activation of an
output neuron is computed by the activation function. Therefore, an output
neuron is defined by a Neuron. In networks with supervised learning, the
computed activation of the output neurons is compared with the normalized values
of the corresponding target fields; these values are often called teach
values. The difference between the neuron's activation and the normalized
target field determines the prediction error. For scoring, the normalization for
the target field is used to denormalize the predicted value in the output
neuron. Therefore, each instance of Neuron which represents an output
neuron is additionally connected to a normalized field. Note that the scoring
procedure must apply the inverse of the normalization in order to map the neuron
activation to a value in the original domain; a sketch of this denormalization
follows.
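A Python sketch of inverting a piecewise linear NormContinuous mapping at scoring time; the name denormalize is hypothetical, and the norm values are assumed to be strictly increasing:

def denormalize(y, knots):
    """knots: (orig, norm) pairs from the LinearNorm elements, in order.
    Interpolates norm -> orig, the inverse of the target normalization."""
    for (o1, n1), (o2, n2) in zip(knots, knots[1:]):
        if n1 <= y <= n2:
            return o1 + (y - n1) * (o2 - o1) / (n2 - n1)
    # outside the normalized range: extrapolate from the nearest segment
    # (an assumption; PMML configures out-of-range behavior elsewhere)
    (o1, n1), (o2, n2) = (knots[0], knots[1]) if y < knots[0][1] else (knots[-2], knots[-1])
    return o1 + (y - n1) * (o2 - o1) / (n2 - n1)

For the example model at the end of this section, denormalize(0.5, [(0, 0.1), (1291.68, 0.5), (5327.26, 0.9)]) returns 1291.68.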
A NeuralOutput element connects a neuron's output to the output of the network.
<xs:element name="NeuralOutput">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="DerivedField" />
</xs:sequence>
<xs:attribute name="outputNeuron" type="NN-NEURON-IDREF" use="required" />
</xs:complexType>
</xs:element>
For neural value prediction with backpropagation, the output layer contains
a single neuron; its activation is denormalized to give the predicted value.
For neural classification with backpropagation, the output layer contains
one or more neurons. The neuron with maximal activation determines the predicted
class label. If there is no unique neuron with maximal activation, then the
predicted value is determined by the first output neuron with maximal
activation, as in the sketch below.
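A Python sketch of this selection rule, assuming the output neurons' class labels and activations have been collected into parallel lists (the name predicted_class is illustrative):

def predicted_class(labels, activations):
    """Return the label of the neuron with maximal activation."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    # max() returns the first index attaining the maximum, which implements
    # the "first output neuron with maximal activation" tie-break
    return labels[best]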
Example model
<?xml version="1.0" ?>
<PMML version="4.0" xmlns="https://www.dmg.org/PMML-4_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="DMG.org"/>
<DataDictionary numberOfFields="5">
<DataField name="gender" optype="categorical" dataType="string">
<Value value=" female"/>
<Value value=" male"/>
</DataField>
<DataField name="no of claims" optype="categorical" dataType="string">
<Value value=" 0"/>
<Value value=" 1"/>
<Value value=" 3"/>
<Value value=" > 3"/>
<Value value=" 2"/>
</DataField>
<DataField name="domicile" optype="categorical" dataType="string">
<Value value="suburban"/>
<Value value=" urban"/>
<Value value=" rural"/>
</DataField>
<DataField name="age of car" optype="continuous" dataType="double"/>
<DataField name="amount of claims" optype="continuous" dataType="integer"/>
</DataDictionary>
<NeuralNetwork modelName="Neural Insurance"
functionName="regression"
activationFunction="logistic"
numberOfLayers="2">
<MiningSchema>
<MiningField name="gender"/>
<MiningField name="no of claims"/>
<MiningField name="domicile"/>
<MiningField name="age of car"/>
<MiningField name="amount of claims" usageType="predicted"/>
</MiningSchema>
<NeuralInputs numberOfInputs="10">
<NeuralInput id="0">
<DerivedField optype="continuous" dataType="double">
<NormContinuous field="age of car">
<LinearNorm orig="0.01" norm="0"/>
<LinearNorm orig="3.07897" norm="0.5"/>
<LinearNorm orig="11.44" norm="1"/>
</NormContinuous>
</DerivedField>
</NeuralInput>
<NeuralInput id="1">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="gender" value=" male"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="2">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="no of claims" value=" 0"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="3">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="no of claims" value=" 1"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="4">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="no of claims" value=" 3"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="5">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="no of claims" value=" > 3"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="6">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="no of claims" value=" 2"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="7">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="domicile" value="suburban"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="8">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="domicile" value=" urban"/>
</DerivedField>
</NeuralInput>
<NeuralInput id="9">
<DerivedField optype="continuous" dataType="double">
<NormDiscrete field="domicile" value=" rural"/>
</DerivedField>
</NeuralInput>
</NeuralInputs>
<NeuralLayer numberOfNeurons="3">
<Neuron id="10">
<Con from="0" weight="-2.08148"/>
<Con from="1" weight="3.69657"/>
<Con from="2" weight="-1.89986"/>
<Con from="3" weight="5.61779"/>
<Con from="4" weight="0.427558"/>
<Con from="5" weight="-1.25971"/>
<Con from="6" weight="-6.55549"/>
<Con from="7" weight="-4.62773"/>
<Con from="8" weight="1.97525"/>
<Con from="9" weight="-1.0962"/>
</Neuron>
<Neuron id="11">
<Con from="0" weight="-0.698997"/>
<Con from="1" weight="-3.54943"/>
<Con from="2" weight="-3.29632"/>
<Con from="3" weight="-1.20931"/>
<Con from="4" weight="1.00497"/>
<Con from="5" weight="0.033502"/>
<Con from="6" weight="1.12016"/>
<Con from="7" weight="0.523197"/>
<Con from="8" weight="-2.96135"/>
<Con from="9" weight="-0.398626"/>
</Neuron>
<Neuron id="12">
<Con from="0" weight="0.904057"/>
<Con from="1" weight="1.75084"/>
<Con from="2" weight="2.51658"/>
<Con from="3" weight="-0.151895"/>
<Con from="4" weight="-2.88008"/>
<Con from="5" weight="0.920063"/>
<Con from="6" weight="-3.30742"/>
<Con from="7" weight="-1.72251"/>
<Con from="8" weight="-1.13156"/>
<Con from="9" weight="-0.758563"/>
</Neuron>
</NeuralLayer>
<NeuralLayer numberOfNeurons="1">
<Neuron id="13">
<Con from="10" weight="0.76617"/>
<Con from="11" weight="-1.5065"/>
<Con from="12" weight="0.999797"/>
</Neuron>
</NeuralLayer>
<NeuralOutputs numberOfOutputs="1">
<NeuralOutput outputNeuron="13">
<DerivedField optype="continuous" dataType="double">
<NormContinuous field="amount of claims">
<LinearNorm orig="0" norm="0.1"/>
<LinearNorm orig="1291.68" norm="0.5"/>
<LinearNorm orig="5327.26" norm="0.9"/>
</NormContinuous>
</DerivedField>
</NeuralOutput>
</NeuralOutputs>
</NeuralNetwork>
</PMML>
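As an illustrative scoring sketch for this example (not part of the specification), take a hypothetical input case: age of car 3.07897 (which normalizes to 0.5), gender " male", " 1" claim, and domicile " urban". The weights are transcribed from the Con elements above; this model has no bias attributes, so no bias terms appear.

import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Input activations for the hypothetical case, keyed by NeuralInput id:
# id 0 is the normalized "age of car"; ids 1-9 are the NormDiscrete flags.
inputs = {0: 0.5, 1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0,
          5: 0.0, 6: 0.0, 7: 0.0, 8: 1.0, 9: 0.0}

# Hidden-layer weights keyed by Neuron id; index i holds the weight on
# the connection from input i.
hidden_weights = {
    10: [-2.08148, 3.69657, -1.89986, 5.61779, 0.427558,
         -1.25971, -6.55549, -4.62773, 1.97525, -1.0962],
    11: [-0.698997, -3.54943, -3.29632, -1.20931, 1.00497,
         0.033502, 1.12016, 0.523197, -2.96135, -0.398626],
    12: [0.904057, 1.75084, 2.51658, -0.151895, -2.88008,
         0.920063, -3.30742, -1.72251, -1.13156, -0.758563],
}

# Layers are evaluated in sequence, as required by the model.
hidden = {nid: logistic(sum(w * inputs[i] for i, w in enumerate(ws)))
          for nid, ws in hidden_weights.items()}
y13 = logistic(0.76617 * hidden[10] - 1.5065 * hidden[11]
               + 0.999797 * hidden[12])

# Finally, y13 is mapped through the inverse of the NeuralOutput's
# NormContinuous (knots (0, 0.1), (1291.68, 0.5), (5327.26, 0.9))
# to obtain the predicted "amount of claims" in the original domain.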