PMML 3.2 - Built-in functions
Almost all programming languages come with a set of predefined functions that perform low-level operations. PMML has a similar set of functions.
- +, -, * and /
- min, max, sum and avg
- log10, ln, sqrt, abs, exp, pow, threshold, floor, ceil, round
- uppercase
- substring
- trimBlanks
- formatNumber
- formatDatetime
- dateDaysSinceYear
- dateSecondsSinceYear
- dateSecondsSinceMidnight
The definitions of functions in PMML generally follow the design of functions and operators in XQuery. Further ideas are taken from MathML, XPath, and Java date formats.
+, -, * and /
Functions for simple arithmetic.
Pseudo-declaration of PMML built-in function +:
<DefineFunction name="+" optype="continuous">
  <ParameterField name="a" optype="continuous"/>
  <ParameterField name="b" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>
Example: Return the difference between the input fields named A and B.
<Apply function="-">
  <FieldRef field="A"/>
  <FieldRef field="B"/>
</Apply>
min, max, sum and avg
Returns an aggregation of a variable number of input fields.
Pseudo-declaration of PMML built-in function min:
<DefineFunction name="min" optype="continuous">
  The function takes a variable number of <FieldRef/> as parameters
  ... implementation built-in ...
</DefineFunction>
Example: Return the minimum value of input fields named A, B, and C.
<Apply function="min">
  <FieldRef field="A"/>
  <FieldRef field="B"/>
  <FieldRef field="C"/>
</Apply>
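The Apply examples above can be read as small expression trees: each FieldRef is looked up in the current record and the named function is applied to the resulting values. The following is only a sketch of that reading in Python, not part of PMML. The names `FUNCTIONS` and `evaluate` are hypothetical, missing values are not handled, and every Constant is assumed to be numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical dispatch table: PMML function name -> callable over a list of numbers.
FUNCTIONS = {
    "+": lambda args: args[0] + args[1],
    "-": lambda args: args[0] - args[1],
    "*": lambda args: args[0] * args[1],
    "/": lambda args: args[0] / args[1],
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda args: sum(args) / len(args),
}


def evaluate(node, record):
    """Evaluate an Apply, FieldRef or Constant element against a dict of field values."""
    if node.tag == "FieldRef":
        return record[node.get("field")]
    if node.tag == "Constant":
        # Assumption: only numeric constants occur in this sketch.
        return float(node.text)
    if node.tag == "Apply":
        args = [evaluate(child, record) for child in node]
        return FUNCTIONS[node.get("function")](args)
    raise ValueError(f"unsupported element: {node.tag}")


fragment = ET.fromstring(
    '<Apply function="min">'
    '<FieldRef field="A"/><FieldRef field="B"/><FieldRef field="C"/>'
    '</Apply>'
)
result = evaluate(fragment, {"A": 4.0, "B": 1.5, "C": 3.0})
```

Because `evaluate` recurses into child elements, nested Apply elements are handled by the same code path.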
log10, ln, sqrt, abs, exp, pow, threshold, floor, ceil, round
Further mathematical functions.
Pseudo-declaration of PMML built-in function log10:
<DefineFunction name="log10" optype="continuous">
  <ParameterField name="x" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>
Example: Return the logarithm to the base 10 of an input field A.
<Apply function="log10">
  <FieldRef field="A"/>
</Apply>
Pseudo-declaration of PMML built-in functions pow and floor:
<DefineFunction name="pow" optype="continuous">
  <ParameterField name="x" optype="continuous"/>
  <ParameterField name="y" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>
<DefineFunction name="floor" dataType="integer">
  <ParameterField name="x" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>
Example: Return the cube of an input field A.
<Apply function="pow">
  <FieldRef field="A"/>
  <Constant dataType="integer">3</Constant>
</Apply>
uppercase
Returns a string where all lowercase characters in the input string are replaced by their uppercase variants.
Pseudo-declaration of PMML built-in function uppercase:
<DefineFunction name="uppercase" dataType="string">
  <ParameterField name="input" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>
Example: Return the field Str with all characters in upper case.
<Apply function="uppercase">
  <FieldRef field="Str"/>
</Apply>
substring
Extracts a substring from an input string.
Pseudo-declaration of PMML built-in function substring:
<DefineFunction name="substring" dataType="string">
  <ParameterField name="input" dataType="string"/>
  <ParameterField name="startPos" dataType="integer"/>
  <ParameterField name="length" dataType="integer"/>
  ... See XQuery fn:substring ...
</DefineFunction>
startPos and length must be positive integers.
The first character of a string is located at position 1 (not position 0).
Example: Return the 3 characters of field Str beginning at position 2.
<Apply function="substring">
  <FieldRef field="Str"/>
  <Constant dataType="integer">2</Constant>
  <Constant dataType="integer">3</Constant>
</Apply>
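Since positions are 1-based, the substring example is easy to get wrong by one character when it is ported to a 0-based language. A minimal Python sketch of the behaviour described above follows. The helper name `pmml_substring` is hypothetical, and the sketch only covers the case stated in the text, where startPos and length are positive integers.

```python
def pmml_substring(text, start_pos, length):
    """Return `length` characters of `text`, starting at 1-based position `start_pos`."""
    if start_pos < 1 or length < 1:
        raise ValueError("startPos and length must be positive integers")
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


# The 3 characters of "abcdef" beginning at position 2.
example = pmml_substring("abcdef", 2, 3)
```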
trimBlanks
Returns a string where leading and trailing blanks in the input string are removed. Note that trailing blanks in PMML, by definition, are not significant when strings are compared.
Pseudo-declaration of PMML built-in function trimBlanks:
<DefineFunction name="trimBlanks" dataType="string">
  <ParameterField name="input" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>
Example: Trim blanks of field Str.
<Apply function="trimBlanks">
  <FieldRef field="Str"/>
</Apply>
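The two statements about blanks (trimming at both ends, and trailing blanks being insignificant in comparisons) can be sketched in Python as follows. Both helper names are hypothetical, and the sketch assumes that "blank" means the space character only.

```python
def trim_blanks(text):
    """Remove leading and trailing spaces (assumption: a blank is U+0020 only)."""
    return text.strip(" ")


def equal_ignoring_trailing_blanks(left, right):
    """Compare two strings while treating trailing spaces as insignificant."""
    return left.rstrip(" ") == right.rstrip(" ")


trimmed = trim_blanks("  abc  ")
```

Under this reading, leading blanks still matter in a comparison, which is one reason to apply trimBlanks explicitly.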
formatNumber
Formats numbers according to a pattern. The pattern uses the POSIX descriptors as used, e.g., in the C function printf.
Pseudo-declaration of PMML built-in function formatNumber:
<DefineFunction name="formatNumber" dataType="string">
  <ParameterField name="input" optype="continuous"/>
  <ParameterField name="pattern" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>
Example: Convert a number in the field Num into a string of length 3 with leading blanks.
<Apply function="formatNumber">
  <FieldRef field="Num"/>
  <Constant>%3d</Constant>
</Apply>
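To see what the pattern %3d produces, the example can be reproduced with Python's printf-style `%` operator, which follows the same descriptors for common cases. This is only an illustration: the helper name `format_number` is hypothetical, and Python's formatting is not guaranteed to match a PMML implementation for every descriptor.

```python
def format_number(value, pattern):
    """Format `value` with a printf-style pattern, using Python's % operator."""
    return pattern % value


# A width of 3 pads shorter numbers with leading blanks.
padded = format_number(7, "%3d")
```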
formatDatetime
Formats a date and time value according to a pattern. The pattern uses POSIX descriptors as used, e.g., in the C function strftime or the Unix command date. See, e.g., the POSIX datetime descriptors.
Pseudo-declaration of PMML built-in function formatDatetime:
<DefineFunction name="formatDatetime" optype="categorical">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="pattern" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>
input must be a date, time, or dateTime.
Example: Format a date value as 'Month/Day/Year'.
<DerivedField name="StartDateUS" optype="categorical">
  <Apply function="formatDatetime">
    <FieldRef field="StartDate"/>
    <Constant>%m/%d/%y</Constant>
  </Apply>
</DerivedField>
With StartDate being the date August 20th, 2004, the result is StartDateUS="08/20/04".
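The date example can be checked with Python's `strftime`, which accepts the same descriptors as the C function. The helper name `format_datetime` is hypothetical; the sketch only shows how the pattern %m/%d/%y maps August 20th, 2004 to the string given above.

```python
from datetime import date


def format_datetime(value, pattern):
    """Format a date, time or datetime value with a strftime pattern."""
    return value.strftime(pattern)


start_date_us = format_datetime(date(2004, 8, 20), "%m/%d/%y")
```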
dateDaysSinceYear
Function for transforming dates into integers. The type dateDaysSinceYear is a variant of the type date where the values are represented as the number of days since Year-01-01. The date January 1 of Year is represented by the number 0, January 2 of Year by 1, February 1 of Year by 31, etc. Dates before January 1 of Year are represented as negative numbers.
For example, values of type dateDaysSince[1960] are the number of days since January 1, 1960. The date January 1, 1960 is represented by the number 0, and the date April 1, 2003 can be converted to the value 15796 of type dateDaysSince[1960].
Pseudo-declaration of PMML built-in function dateDaysSinceYear:
<DefineFunction name="dateDaysSinceYear" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
input must be of datatype date or dateTime.
Example: Calculate days since 1970.
<DerivedField name="PurchaseDateDays" optype="continuous">
  <Apply function="dateDaysSinceYear">
    <FieldRef field="PurchaseDate"/>
    <Constant>1970</Constant>
  </Apply>
</DerivedField>
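The values quoted in the description (0 for January 1, 31 for February 1, 15796 for April 1, 2003 relative to 1960) can be reproduced with plain date arithmetic. The following Python sketch uses the hypothetical helper name `date_days_since_year` and assumes that the time part of a dateTime input is simply dropped.

```python
from datetime import date, datetime


def date_days_since_year(value, reference_year):
    """Number of days between January 1 of `reference_year` and `value`."""
    if isinstance(value, datetime):
        value = value.date()  # assumption: the time of day is ignored
    return (value - date(reference_year, 1, 1)).days


days = date_days_since_year(date(2003, 4, 1), 1960)
```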
dateSecondsSinceYear
Function for transforming dates into integers. The type dateSecondsSinceYear is a variant of the type date where the values are represented as the number of seconds since midnight starting the first day of Year (which is represented by 0). 1 minute after midnight on January 1 of Year is represented by 60, 1 hour after midnight on January 1 of Year is represented by 3600, etc. Times before January 1 of Year are represented as negative numbers.
For example, values of type dateSecondsSince[1960] are the number of seconds since the midnight starting January 1, 1960. 30 minutes and 3 seconds after 3 o'clock in the morning of January 3, 1960 can be converted to the value 185403 of type dateSecondsSince[1960].
Pseudo-declaration of PMML built-in function dateSecondsSinceYear:
<DefineFunction name="dateSecondsSinceYear" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
input must be of datatype date or dateTime. If input is of datatype date, it is assumed that the time is 00:00:00 at this date.
Example: Create a new field PurchaseDateSeconds from the PurchaseDate attribute relative to the year 1970.
<DerivedField name="PurchaseDateSeconds" optype="continuous">
  <Apply function="dateSecondsSinceYear">
    <FieldRef field="PurchaseDate"/>
    <Constant>1970</Constant>
  </Apply>
</DerivedField>
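The value 185403 from the description can be verified as 2 days (172800 seconds) plus 3 hours, 30 minutes and 3 seconds (12603 seconds). The Python sketch below uses the hypothetical helper name `date_seconds_since_year`, treats a plain date as midnight as stated above, and ignores time zones and leap seconds.

```python
from datetime import date, datetime


def date_seconds_since_year(value, reference_year):
    """Seconds between midnight on January 1 of `reference_year` and `value`."""
    if not isinstance(value, datetime):
        # A plain date is taken to mean 00:00:00 on that date.
        value = datetime(value.year, value.month, value.day)
    delta = value - datetime(reference_year, 1, 1)
    return delta.days * 86400 + delta.seconds


seconds = date_seconds_since_year(datetime(1960, 1, 3, 3, 30, 3), 1960)
```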
dateSecondsSinceMidnight
Function for transforming times into integers. For example, midnight returns a value of 0, 1 second after midnight (00:00:01) returns a value of 1, one minute after midnight returns a value of 60, etc. 23 minutes and 30 seconds after 5 o'clock in the morning returns 19410.
Pseudo-declaration of PMML built-in function dateSecondsSinceMidnight:
<DefineFunction name="dateSecondsSinceMidnight" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
</DefineFunction>
input must be of datatype time or dateTime.
Example: Create a new field PurchaseDateSeconds from the PurchaseDate attribute relative to midnight.
<DerivedField name="PurchaseDateSeconds" optype="continuous">
  <Apply function="dateSecondsSinceMidnight">
    <FieldRef field="PurchaseDate"/>
  </Apply>
</DerivedField>
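The value 19410 quoted in the description is 5 * 3600 + 23 * 60 + 30. A minimal Python sketch of the same calculation follows. The helper name `date_seconds_since_midnight` is hypothetical, and fractional seconds are ignored.

```python
from datetime import datetime, time


def date_seconds_since_midnight(value):
    """Seconds elapsed since 00:00:00 for a time or datetime value."""
    if isinstance(value, datetime):
        value = value.time()  # keep only the time-of-day part
    return value.hour * 3600 + value.minute * 60 + value.second


seconds_of_day = date_seconds_since_midnight(time(5, 23, 30))
```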