PMML Normalization
PMML 1.1 -- DTD for Normalization


This DTD subset for normalization provides a basic framework for mapping input values to specific value ranges, usually the numeric range [0 .. 1]. It is used by the DTD for neural networks, and similar constructs are used in regression models. The general advice for PMML models is that any normalization a model needs should be represented by mappings that look like the elements defined below. A model need not use exactly these elements, but following the same basic structure is recommended for ease of presentation and implementation.


<!ENTITY % NORM-INPUT "( NormContinuous | NormDiscrete )" >

<!ELEMENT NormContinuous ( Extension*, LinearNorm* ) >
<!ATTLIST NormContinuous
field %FIELD-NAME; #REQUIRED
>

NormContinuous defines how to normalize an input field. field must refer to a field in the data dictionary. If LinearNorm is missing then the input field is not normalized.
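As an illustration, the following Python sketch reads a NormContinuous instance into a list of (orig, norm) points using the standard library's xml.etree.ElementTree. The fragment, the field name "age", the point values, and the helper name read_norm_continuous are all invented for this example and are not part of PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment: the field name and the points are invented.
FRAGMENT = """
<NormContinuous field="age">
  <LinearNorm orig="0" norm="0"/>
  <LinearNorm orig="50" norm="0.5"/>
  <LinearNorm orig="100" norm="1"/>
</NormContinuous>
"""


def read_norm_continuous(xml_text):
    """Return (field, points); points is a list of (orig, norm) floats."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "NormContinuous":
        raise ValueError("expected a NormContinuous element")
    field = elem.get("field")
    if field is None:
        raise ValueError("attribute 'field' is required")
    # Extension children are ignored in this sketch.
    points = [
        (float(p.get("orig")), float(p.get("norm")))
        for p in elem.findall("LinearNorm")
    ]
    return field, points


field, points = read_norm_continuous(FRAGMENT)
```

An empty list of points corresponds to the case in which LinearNorm is missing and the field is left unnormalized.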


<!ELEMENT LinearNorm EMPTY >
<!ATTLIST LinearNorm
orig %NUMBER; #REQUIRED
norm %NUMBER; #REQUIRED
>

The LinearNorm elements define a sequence of points for a piecewise linear interpolation function. The sequence must contain at least two elements. To simplify processing, the LinearNorm elements within NormContinuous must be strictly sorted by ascending value of 'orig'. Given two adjacent points (a1, b1) and (a2, b2), that is, two points such that there is no other point (a3, b3) with a1 < a3 < a2, the normalized value of an input x is

b1 + (x - a1) / (a2 - a1) * (b2 - b1)    for a1 <= x <= a2

Missing input values are mapped to missing output. If the input value is not within the range [a1 .. an], it is treated as an outlier. The specific method for outlier treatment must be provided by the caller; e.g., an outlier could be mapped to a missing value, or it could be mapped to the minimal or maximal value.
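The interpolation rule and the outlier handling can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name, the use of None for missing values, and the two outlier policy names ("as_missing", "as_extreme") are invented here. In particular, "as_extreme" reads "the minimal or maximal value" as the norm value of the nearest end point, which is one possible interpretation.

```python
from bisect import bisect_right


def normalize_continuous(x, points, outliers="as_missing"):
    """Apply the piecewise linear mapping given by points to x.

    points   -- list of (orig, norm) pairs, strictly ascending by orig
    x        -- input value; None stands for a missing value (assumption)
    outliers -- caller-supplied policy; the two names are hypothetical
    """
    if x is None:
        return None  # missing input is mapped to missing output
    if not points:
        return x  # no LinearNorm: the field is not normalized
    if len(points) < 2:
        raise ValueError("at least two LinearNorm points are required")
    origs = [a for a, _ in points]
    if any(a2 <= a1 for a1, a2 in zip(origs, origs[1:])):
        raise ValueError("points must be strictly ascending by orig")

    if x < origs[0] or x > origs[-1]:
        if outliers == "as_missing":
            return None
        if outliers == "as_extreme":
            return points[0][1] if x < origs[0] else points[-1][1]
        raise ValueError("unknown outlier policy: %r" % outliers)

    # Index of the right-hand point of the segment that contains x.
    i = min(bisect_right(origs, x), len(points) - 1)
    (a1, b1), (a2, b2) = points[i - 1], points[i]
    return b1 + (x - a1) / (a2 - a1) * (b2 - b1)
```

For example, with the points (0, 0), (10, 0.5), (100, 1), the input 55 falls into the second segment and is mapped to 0.5 + 45/90 * 0.5 = 0.75.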


<!ELEMENT NormDiscrete ( Extension* ) >
<!ATTLIST NormDiscrete
field %FIELD-NAME; #REQUIRED
method ( indicator | thermometer ) #FIXED "indicator"
value CDATA #REQUIRED
>


An element (f, v) defines that the unit has value 1.0 if the value of input field f is v; otherwise the value is 0.0. The set of NormDiscrete instances that refer to a given input field defines a fan-out function which maps a single input field to a set of normalized fields. Missing input values are mapped to missing output.
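The fan-out performed by a set of NormDiscrete instances on one field might look like the following Python sketch. The function name, the use of None for missing values, and the colour categories in the usage note are assumptions made for illustration.

```python
def indicator_units(field_value, values):
    """Map one input value to a dict {v: unit value}, one unit per NormDiscrete.

    values      -- the 'value' attributes of the NormDiscrete instances
                   that refer to the same field
    field_value -- input value; None stands for a missing value (assumption)
    """
    if field_value is None:
        # Missing input is mapped to missing output on every unit.
        return {v: None for v in values}
    return {v: 1.0 if field_value == v else 0.0 for v in values}
```

With the hypothetical values "red", "green", and "blue", the input "green" yields 0.0, 1.0, and 0.0 on the three units respectively.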

PMML 1.1 supports only one kind of discrete normalization; future versions could support other techniques such as thermometer encoding. Thermometer encoding can be used for ordinal values: the output is 1.0 if the value of input field f is greater than or equal to v, otherwise it is 0.0. Furthermore, there could also be a linear index mapping for ordinal values: given an ordering (a1, a2, ..., an), the normalized value for value ai is the number i.
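Neither technique is part of PMML 1.1, so the following Python sketch only illustrates the two descriptions above. The function names, the representation of the ordering as a list, and the use of None for missing values are assumptions.

```python
def thermometer_units(value, ordering):
    """Thermometer encoding: the unit for v is 1.0 if value >= v in the ordering."""
    if value is None:
        return {v: None for v in ordering}
    rank = ordering.index(value)  # raises ValueError for an unknown value
    return {v: 1.0 if rank >= i else 0.0 for i, v in enumerate(ordering)}


def linear_index(value, ordering):
    """Linear index mapping: the value ai is mapped to the number i (1-based)."""
    if value is None:
        return None
    return ordering.index(value) + 1
```

For a hypothetical ordering ("low", "medium", "high"), the input "medium" switches on the units for "low" and "medium" under thermometer encoding and maps to 2 under the linear index mapping.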


e-mail info at dmg.org