PMML 4.0 - Built-in functions
Almost all programming languages come with a set of predefined functions that perform
low-level operations. PMML has a similar set of functions.
- +, -, * and /
- min, max, sum and avg
- log10, ln, sqrt, abs, exp, pow, threshold, floor, ceil, round
- isMissing, isNotMissing
- equal, notEqual, lessThan, lessOrEqual, greaterThan, greaterOrEqual
- and, or
- not
- isIn, isNotIn
- if
- uppercase
- substring
- trimBlanks
- formatNumber
- formatDatetime
- dateDaysSinceYear
- dateSecondsSinceYear
- dateSecondsSinceMidnight
The definitions of functions in PMML generally follow the design
of functions and operators in XQuery.
Further ideas are taken from MathML, XPath, and Java Date formats.
Functions for simple arithmetic.
Pseudo-declaration of PMML built-in function +:
<DefineFunction name="+" optype="continuous">
<ParameterField name="a" optype="continuous"/>
<ParameterField name="b" optype="continuous"/>
... implementation built-in ...
</DefineFunction>
The functions -, *, and / are defined in the same way and have two parameters.
Example: Return the difference between the input fields named A and B.
<Apply function="-">
<FieldRef field="A"/>
<FieldRef field="B"/>
</Apply>
Assuming A=2.5 and B=4, the result corresponding to this Apply element is -1.5.
Note: If one of the input fields of a simple arithmetic function is a missing
value, the result evaluates to a missing value.
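The missing-value rule for the arithmetic functions can be illustrated outside PMML. The following Python sketch is not part of the specification: the helper name apply_arithmetic and the use of None to stand for a missing value are assumptions made for illustration only.

```python
from typing import Optional


def apply_arithmetic(op: str, a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Evaluate a two-parameter arithmetic function; None stands for a missing value."""
    if a is None or b is None:
        # A missing operand makes the whole result missing.
        return None
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        # Division by zero is not addressed here; Python raises ZeroDivisionError.
        return a / b
    raise ValueError(f"unsupported function: {op}")


print(apply_arithmetic("-", 2.5, 4))   # -1.5
print(apply_arithmetic("-", None, 4))  # None
```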
Returns an aggregation of a variable number of input fields.
Pseudo-declaration of PMML built-in function min:
<DefineFunction name="min" optype="continuous">
The function takes a variable number
of <FieldRef/> as parameters
... implementation built-in ...
</DefineFunction>
The aggregation functions max, sum, and avg are defined in the same way.
Note that the number of input parameters is variable, but these functions do not aggregate
values coming from multiple input records.
Example: Return the minimum value of input fields named A, B, and C.
<Apply function="min">
<FieldRef field="A"/>
<FieldRef field="B"/>
<FieldRef field="C"/>
</Apply>
Assuming A=2.5, B=4, and C=1.5, the result corresponding to this Apply element is 1.5.
Missing values in the input to an aggregate function are simply ignored. This affects
how avg is computed in particular: in the above example, if B is a missing value,
the result of applying avg to A, B and C is 2.
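The effect of ignoring missing values can be sketched in Python. This is an illustration rather than a normative definition: the helper name aggregate and the use of None for a missing value are assumptions, and returning None when every input is missing is likewise an assumption, since the text above does not cover that case.

```python
from typing import Optional, Sequence


def aggregate(name: str, values: Sequence[Optional[float]]) -> Optional[float]:
    """Apply min, max, sum or avg to the non-missing values of one input record."""
    present = [v for v in values if v is not None]  # missing values are ignored
    if not present:
        return None  # assumption: nothing to aggregate yields a missing result
    if name == "min":
        return min(present)
    if name == "max":
        return max(present)
    if name == "sum":
        return sum(present)
    if name == "avg":
        # The divisor counts only the values that are present.
        return sum(present) / len(present)
    raise ValueError(f"unsupported function: {name}")


print(aggregate("min", [2.5, 4, 1.5]))     # 1.5
print(aggregate("avg", [2.5, None, 1.5]))  # 2.0
```

Because the divisor counts only the non-missing values, avg over A=2.5 and C=1.5 yields 2 rather than 4/3.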
Further mathematical functions.
Pseudo-declaration of PMML built-in function log10:
<DefineFunction name="log10" optype="continuous">
<ParameterField name="x" optype="continuous"/>
... implementation built-in ...
</DefineFunction>
The function log10 returns the logarithm to the base 10.
The functions ln (natural log), sqrt (square root), abs (absolute value),
and exp (exponential) are defined in the same way.
Semantics are as usual.
See also MathML.
Example: Return the logarithm to the base 10 of an input field A.
<Apply function="log10">
<FieldRef field="A"/>
</Apply>
Assuming A=2.5, the result corresponding to this Apply element is
approximately 0.397940008672038.
Pseudo-declaration of PMML built-in functions pow and floor:
<DefineFunction name="pow" optype="continuous">
<ParameterField name="x" optype="continuous"/>
<ParameterField name="y" optype="continuous"/>
... implementation built-in ...
</DefineFunction>
<DefineFunction name="floor" dataType="integer">
<ParameterField name="x" optype="continuous"/>
... implementation built-in ...
</DefineFunction>
The function pow(x,y) returns the number x raised to the power y, and
threshold(x,y) returns 1 if x>y and 0 otherwise. The functions
floor, ceil, and round return an integer obtained by rounding the numeric
argument down, up, and to the closest integer, respectively. See the section on
the respective functionality in Targets for examples.
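As a rough illustration of these definitions, the following Python sketch mirrors the four functions. The function names are hypothetical. The text above does not say how round treats values exactly halfway between two integers, so the tie-breaking rule used here (round half up) is an assumption.

```python
import math


def threshold(x: float, y: float) -> int:
    """Return 1 if x > y and 0 otherwise."""
    return 1 if x > y else 0


def pmml_floor(x: float) -> int:
    """Round down to an integer."""
    return math.floor(x)


def pmml_ceil(x: float) -> int:
    """Round up to an integer."""
    return math.ceil(x)


def pmml_round(x: float) -> int:
    """Round to the closest integer; ties go up (assumption, not stated in the text)."""
    return math.floor(x + 0.5)


print(threshold(3, 2), threshold(2, 2))          # 1 0
print(pmml_floor(-1.5), pmml_ceil(-1.5))         # -2 -1
print(pmml_round(1.4), pmml_round(1.6))          # 1 2
```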
Example: Return the cube of an input field A.
<Apply function="pow">
<FieldRef field="A"/>
<Constant dataType="integer">3</Constant>
</Apply>
Assuming A=5.0, the result corresponding to this Apply element is 125.0.
Functions for boolean operations.
The functions isMissing and isNotMissing return true or false.
Each is applied to a single input parameter.
Pseudo-declaration of PMML built-in function isMissing:
<DefineFunction name="isMissing" dataType="boolean">
<ParameterField name="input"/>
... implementation built-in ...
</DefineFunction>
Example: Check if field Str is missing. If so, returns true, else false.
<Apply function="isMissing">
<FieldRef field="Str"/>
</Apply>
Further boolean functions.
The comparison functions equal, notEqual, lessThan, lessOrEqual, greaterThan,
and greaterOrEqual return true or false. Each is applied to two input parameters.
Pseudo-declaration of PMML built-in function lessThan:
<DefineFunction name="lessThan" dataType="boolean">
<ParameterField name="x"/>
<ParameterField name="y"/>
... implementation built-in ...
</DefineFunction>
Example: Check if field A is less than field B. If so, returns true, else false.
<Apply function="lessThan">
<FieldRef field="A"/>
<FieldRef field="B"/>
</Apply>
Further boolean functions.
Evaluate the results of two or more boolean values.
- and: True only if all input values are true, false otherwise.
- or: True if at least one input value is true, false only if all input values are false.
Pseudo-declaration of PMML built-in function and:
<DefineFunction name="and" dataType="boolean">
The function takes a variable number
of fields as parameters
... implementation built-in ...
</DefineFunction>
Example: Check if field A is less than 3 and field B is less than 4. If so, returns true, else false.
<Apply function="and">
<Apply function="lessThan">
<FieldRef field="A"/>
<Constant dataType="integer">3</Constant>
</Apply>
<Apply function="lessThan">
<FieldRef field="B"/>
<Constant dataType="integer">4</Constant>
</Apply>
</Apply>
Further boolean function.
Negates input boolean value.
Pseudo-declaration of PMML built-in function not:
<DefineFunction name="not" dataType="boolean">
<ParameterField name="x" dataType="boolean"/>
... implementation built-in ...
</DefineFunction>
Example: Check if field A is not less than B (i.e. greater than or equal to B). If so, returns true, else false.
<Apply function="not">
<Apply function="lessThan">
<FieldRef field="A"/>
<FieldRef field="B"/>
</Apply>
</Apply>
Further boolean functions.
Evaluates if a field value is contained in a given list of values.
- isIn: True if the field value is contained in list of values.
- isNotIn: True if the field value is not contained in list of values.
Pseudo-declaration of PMML built-in function isIn:
<DefineFunction name="isIn" dataType="boolean">
<ParameterField name="x"/>
The list takes a variable number
of fields as parameters
... implementation built-in ...
</DefineFunction>
Example: Check if field color is in (red, green, blue). If so, returns true, else false.
<Apply function="isIn">
<FieldRef field="color"/>
<Constant dataType="string">red</Constant>
<Constant dataType="string">green</Constant>
<Constant dataType="string">blue</Constant>
</Apply>
Implements IF-THEN-ELSE logic. The ELSE part is optional.
Pseudo-declaration of PMML built-in function if:
<DefineFunction name="if">
<ParameterField name="x" dataType="boolean"/>
<ParameterField name="A"/> <!-- THEN part is required -->
<ParameterField name="B"/> <!-- ELSE part is optional -->
... implementation built-in ...
</DefineFunction>
Example: Check if field color is in (red, green, blue). If so, returns "primary", else "other".
<Apply function="if">
<Apply function="isIn">
<FieldRef field="color"/>
<Constant dataType="string">red</Constant>
<Constant dataType="string">green</Constant>
<Constant dataType="string">blue</Constant>
</Apply>
<Constant dataType="string">primary</Constant>
<Constant dataType="string">other</Constant>
</Apply>
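The nesting of isIn inside if can be traced with a small Python sketch. The names is_in and pmml_if are hypothetical. The text above only says that the ELSE part is optional, so returning None (standing for a missing value) when the condition is false and no ELSE part is given is an assumption.

```python
from typing import Any, Optional


def is_in(x: Any, *values: Any) -> bool:
    """True if x is contained in the given list of values."""
    return x in values


def pmml_if(condition: bool, then_value: Any, else_value: Optional[Any] = None) -> Any:
    """IF-THEN-ELSE; without an ELSE part a false condition yields None (assumption)."""
    return then_value if condition else else_value


for color in ("red", "pink"):
    label = pmml_if(is_in(color, "red", "green", "blue"), "primary", "other")
    print(color, label)  # red primary, then pink other
```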
Returns a string where all lowercase characters in the input string
are replaced by their uppercase variants.
Pseudo-declaration of PMML built-in function uppercase:
<DefineFunction name="uppercase" dataType="string">
<ParameterField name="input" dataType="string"/>
... implementation built-in ...
</DefineFunction>
The function uppercase uses the Unicode definitions for
classifying characters as uppercase / lowercase.
See XQuery fn:upper-case.
Example: Return the field Str with all characters in upper case.
<Apply function="uppercase">
<FieldRef field="Str"/>
</Apply>
Assuming Str="aBc9", the result corresponding to this Apply element is "ABC9".
Extracts a substring from an input string.
Pseudo-declaration of PMML built-in function substring:
<DefineFunction name="substring" dataType="string">
<ParameterField name="input" dataType="string"/>
<ParameterField name="startPos" dataType="integer"/>
<ParameterField name="length" dataType="integer"/>
...
See XQuery fn:substring
...
</DefineFunction>
startPos and length must be positive integers.
The first character of a string is located at position 1 (not position 0).
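The 1-based position is the detail most likely to cause off-by-one errors when the function is reimplemented in a language with 0-based indexing. A minimal Python sketch, with a hypothetical helper name and with rejection of non-positive arguments chosen as one possible way to enforce the constraint above:

```python
def substring(text: str, start_pos: int, length: int) -> str:
    """Return `length` characters of `text`, starting at 1-based position `start_pos`."""
    if start_pos < 1 or length < 1:
        # The text requires positive integers; raising an error is an assumption.
        raise ValueError("startPos and length must be positive integers")
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


print(substring("aBc9x", 2, 3))  # Bc9
```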
Example: Return the 3 characters of field Str beginning at position 2.
<Apply function="substring">
<FieldRef field="Str"/>
<Constant dataType="integer">2</Constant>
<Constant dataType="integer">3</Constant>
</Apply>
Assuming Str="aBc9x", the result corresponding to this Apply element is "Bc9".
Returns a string where leading and trailing blanks in the input string
are removed. Note that trailing blanks in PMML, by definition, are not
significant when strings are compared.
Pseudo-declaration of PMML built-in function trimBlanks:
<DefineFunction name="trimBlanks" dataType="string">
<ParameterField name="input" dataType="string"/>
... implementation built-in ...
</DefineFunction>
Blanks include tab and newline characters.
Use definitions according to Unicode.
Example: Trim blanks of field Str.
<Apply function="trimBlanks">
<FieldRef field="Str"/>
</Apply>
Assuming Str=" aBc9x ", the result corresponding to this Apply element is "aBc9x".
Formats numbers according to a pattern.
The pattern uses the POSIX descriptors as used, e.g., in the C function printf.
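Python's % string operator accepts printf-style descriptors, so it can be used to preview what a given pattern produces. The helper name format_number is hypothetical, and a PMML consumer may differ from Python in corner cases.

```python
def format_number(value: float, pattern: str) -> str:
    """Format a number with a printf-style pattern such as '%3d' or '%.2f'."""
    return pattern % value


print(repr(format_number(2, "%3d")))         # '  2'
print(repr(format_number(3.14159, "%.2f")))  # '3.14'
```

The pattern %3d pads the result to a width of 3, so the number 2 becomes two blanks followed by the digit.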
Pseudo-declaration of PMML built-in function formatNumber:
<DefineFunction name="formatNumber" dataType="string">
<ParameterField name="input" optype="continuous"/>
<ParameterField name="pattern" dataType="string"/>
... implementation built-in ...
</DefineFunction>
Example: Convert a number in the field Num into a string of
length 3 with leading blanks.
<Apply function="formatNumber">
<FieldRef field="Num"/>
<Constant>%3d</Constant>
</Apply>
Assuming Num=2, the result corresponding to this Apply element is the
string "  2" (two leading blanks followed by the digit).
Formats date and time value according to a pattern.
The pattern uses POSIX descriptors as used, e.g., in the C function strftime or the Unix command date.
Pseudo-declaration of PMML built-in function formatDatetime:
<DefineFunction name="formatDatetime" optype="categorical">
<ParameterField name="input" optype="ordinal"/>
<ParameterField name="pattern" dataType="string"/>
... implementation built-in ...
</DefineFunction>
input must be a date or time or dateTime.
Example: Format a date value as 'Month/Day/Year'.
<DerivedField name="StartDateUS" optype="categorical">
<Apply function="formatDatetime">
<FieldRef field="StartDate"/>
<Constant>%m/%d/%y</Constant>
</Apply>
</DerivedField>
With StartDate being the date August 20th, 2004, the result is StartDateUS="08/20/04".
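The same descriptors are understood by strftime in Python's datetime module, which makes it easy to check a pattern before putting it into a model. The helper name format_datetime is hypothetical.

```python
from datetime import date


def format_datetime(value: date, pattern: str) -> str:
    """Format a date (or datetime) with strftime-style descriptors."""
    return value.strftime(pattern)


start_date = date(2004, 8, 20)
print(format_datetime(start_date, "%m/%d/%y"))  # 08/20/04
```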
Function for transforming dates into integers.
The type dateDaysSinceYear is a variant of the type date where the values are
represented as the number of days since Year-01-01. The date January 1 of Year
is represented by the number 0. January 2 of Year is represented by 1,
February 1 of Year is represented by 31, etc. Dates before January 1 of Year
are represented as negative numbers. For example, values of type
dateDaysSince[1960] are the number of days since January 1, 1960.
The date January 1, 1960 is represented by the number 0.
For example, the date April 1, 2003 can be converted to the value 15796 of
type dateDaysSince[1960].
Pseudo-declaration of PMML built-in function dateDaysSinceYear:
<DefineFunction name="dateDaysSinceYear" optype="continuous">
<ParameterField name="input" optype="ordinal"/>
<ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
input must be of datatype date or dateTime.
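The day counts quoted above can be reproduced with plain date arithmetic. The following Python sketch uses a hypothetical helper name. Reducing a dateTime input to its calendar date before counting days is an assumption.

```python
from datetime import date, datetime
from typing import Union


def date_days_since_year(value: Union[date, datetime], reference_year: int) -> int:
    """Number of days between January 1 of reference_year and the given date."""
    if isinstance(value, datetime):
        # datetime is a subclass of date; keep only the calendar date (assumption).
        value = value.date()
    return (value - date(reference_year, 1, 1)).days


print(date_days_since_year(date(1960, 2, 1), 1960))    # 31
print(date_days_since_year(date(2003, 4, 1), 1960))    # 15796
print(date_days_since_year(date(1959, 12, 31), 1960))  # -1
```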
Example: Calculate days since 1970.
<DerivedField name="PurchaseDateDays" optype="continuous">
<Apply function="dateDaysSinceYear">
<FieldRef field="PurchaseDate"/>
<Constant>1970</Constant>
</Apply>
</DerivedField>
Function for transforming dates into integers.
The type dateSecondsSinceYear is a variant of the type date where the
values are represented as the number of seconds since midnight starting
the first day of Year (which is represented by 0). 1 minute after midnight
on January 1 of Year is represented by 60, 1 hour after midnight on
January 1 of Year is represented by 3600, etc. Times before January 1 of
Year are represented as negative numbers. For example, values of type
dateSecondsSince[1960] are the number of seconds since the midnight starting
January 1, 1960. 30 minutes and 3 seconds after 3 o'clock in the morning of
January 3, 1960 can be converted to the value 185403 of type
dateSecondsSince[1960].
Pseudo-declaration of PMML built-in function dateSecondsSinceYear:
<DefineFunction name="dateSecondsSinceYear" optype="continuous">
<ParameterField name="input" optype="ordinal"/>
<ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
input must be of datatype date or dateTime.
If input is of datatype date, it is assumed that the time is 00:00:00 at this date.
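A short Python sketch of this conversion follows. The helper name is hypothetical, and time zones and leap seconds are ignored, which is a simplification. A plain date is promoted to midnight of that day, as described above.

```python
from datetime import date, datetime
from typing import Union


def date_seconds_since_year(value: Union[date, datetime], reference_year: int) -> int:
    """Seconds elapsed since the midnight that starts January 1 of reference_year."""
    if not isinstance(value, datetime):
        # A plain date is taken to mean 00:00:00 on that date.
        value = datetime(value.year, value.month, value.day)
    delta = value - datetime(reference_year, 1, 1)
    return delta.days * 86400 + delta.seconds


print(date_seconds_since_year(datetime(1960, 1, 1, 1, 0, 0), 1960))   # 3600
print(date_seconds_since_year(datetime(1960, 1, 3, 3, 30, 3), 1960))  # 185403
```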
Example: Create a new field PurchaseDateSeconds from the
PurchaseDate attribute relative to the year 1970.
<DerivedField name="PurchaseDateSeconds" optype="continuous">
<Apply function="dateSecondsSinceYear">
<FieldRef field="PurchaseDate"/>
<Constant>1970</Constant>
</Apply>
</DerivedField>
Function for transforming dates into integers.
For example, midnight returns a value of 0, 1 second after
midnight (00:00:01) would return a value of 1, one minute after midnight
would return a value of 60, etc. 23 minutes and 30 seconds after 5 o'clock
in the morning should return 19410.
Pseudo-declaration of PMML built-in function dateSecondsSinceMidnight:
<DefineFunction name="dateSecondsSinceMidnight" optype="continuous">
<ParameterField name="input" optype="ordinal"/>
</DefineFunction>
input must be of datatype time or dateTime.
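The arithmetic behind these values is hours times 3600 plus minutes times 60 plus seconds. A Python sketch with a hypothetical helper name, which discards fractional seconds as a simplification:

```python
from datetime import datetime, time
from typing import Union


def date_seconds_since_midnight(value: Union[time, datetime]) -> int:
    """Seconds elapsed since midnight for a time or dateTime value."""
    if isinstance(value, datetime):
        value = value.time()  # only the time-of-day part matters
    return value.hour * 3600 + value.minute * 60 + value.second


print(date_seconds_since_midnight(time(0, 0, 1)))    # 1
print(date_seconds_since_midnight(time(5, 23, 30)))  # 19410
```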
Example: Create a new field PurchaseDateSeconds from the
PurchaseDate attribute relative to midnight.
<DerivedField name="PurchaseDateSeconds" optype="continuous">
<Apply function="dateSecondsSinceMidnight">
<FieldRef field="PurchaseDate"/>
</Apply>
</DerivedField>