PMML 4.4.1 - Built-in functions
Almost all programming languages come with a set of predefined functions that perform low-level operations. PMML has a similar set of functions.
The definitions of functions in PMML generally follow the design of functions and operators in XQuery. Further ideas are taken from MathML, XPath, and Java Date formats.
Reference is made herein to the constants NaN, INF, and -INF. They are defined in Transformations.html. Except as noted below, any missing inputs to a built-in function will result in a missing value being returned.
Arithmetic Functions
+, -, * and /
Functions for simple arithmetic. Pseudo-declaration of PMML built-in function
Function | Missing | Valid | Invalid |
---|---|---|---|
isMissing | true | false | false |
isNotMissing | false | true | true |
isValid | false | true | false |
isNotValid | false | false | true |
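The table above can be read as four predicates over the status of a value. A minimal Python sketch follows; the `Status` enum and the function names are hypothetical stand-ins for illustration and are not part of PMML.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical value status: one column of the table each."""
    MISSING = "missing"
    VALID = "valid"
    INVALID = "invalid"


def is_missing(status):
    return status is Status.MISSING


def is_not_missing(status):
    return status is not Status.MISSING


def is_valid(status):
    return status is Status.VALID


def is_not_valid(status):
    # Per the table, only an invalid value yields true here;
    # a missing value yields false.
    return status is Status.INVALID
```

Note that, according to the table, isNotValid is not simply the negation of isValid: both return false for a missing value.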
Further boolean functions.
Each returns true or false. The result depends on applying the function to two input parameters.
lessThan
<DefineFunction name="lessThan" optype="categorical" dataType="boolean"> <ParameterField name="x"/> <ParameterField name="y"/> ... implementation built-in ... </DefineFunction>
Example: Check if field A is less than field B. If so, returns true, else false.
<Apply function="lessThan"> <FieldRef field="A"/> <FieldRef field="B"/> </Apply>
By definition, INF is taken as greater than any finite number or -INF, and -INF is taken as less than any finite number or INF. But any attempt to compare either with itself is indeterminate and will result in a missing value being returned.
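The comparison rule for INF and -INF can be sketched in Python as follows. This is only an illustration of the rule stated above: `MISSING` (represented by `None`) and the name `less_than` are assumptions of the sketch, and only numeric inputs are considered.

```python
import math

MISSING = None  # hypothetical stand-in for a PMML missing value


def less_than(x, y):
    """Sketch of lessThan for numeric inputs, including INF and -INF."""
    # General rule: a missing input yields a missing result.
    if x is MISSING or y is MISSING:
        return MISSING
    # INF compared with INF (or -INF with -INF) is indeterminate.
    if math.isinf(x) and x == y:
        return MISSING
    return x < y
```

For example, `less_than(1, math.inf)` is true, while `less_than(math.inf, math.inf)` yields the missing value.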
Further boolean functions.
Evaluate the results of two or more boolean values.
and: true only if all input values are true, false otherwise.
or: true if at least one input value is true, false only if all input values are false.
and
<DefineFunction name="and" optype="categorical" dataType="boolean"> The function takes a variable number of fields as parameters ... implementation built-in ... </DefineFunction>
Example: Check if field A is less than 3 and field B is less than 4. If so, return true, else false.
<Apply function="and"> <Apply function="lessThan"> <FieldRef field="A"/> <Constant dataType="integer">3</Constant> </Apply> <Apply function="lessThan"> <FieldRef field="B"/> <Constant dataType="integer">4</Constant> </Apply> </Apply>
Further boolean function.
Negates input boolean value.
not
<DefineFunction name="not" optype="categorical" dataType="boolean"> <ParameterField name="x" dataType="boolean"/> ... implementation built-in ... </DefineFunction>
Example: Check if field A is not less than B (i.e. greater than or equal to B). If so, returns true, else false.
<Apply function="not"> <Apply function="lessThan"> <FieldRef field="A"/> <FieldRef field="B"/> </Apply> </Apply>
Further boolean functions.
Evaluates if a field value is contained in a given list of values.
isIn
<DefineFunction name="isIn" optype="categorical" dataType="boolean"> <ParameterField name="x"/> The list takes a variable number of fields as parameters ... implementation built-in ... </DefineFunction>
Example: Check if field color is in (red, green, blue). If so, return true, else false.
<Apply function="isIn"> <FieldRef field="color"/> <Constant dataType="string">red</Constant> <Constant dataType="string">green</Constant> <Constant dataType="string">blue</Constant> </Apply>
Implements IF-THEN-ELSE logic. The ELSE part is optional. If the ELSE part is absent and the boolean value is false then a missing value is returned.
if
<DefineFunction name="if"> <ParameterField name="x" dataType="boolean"/> <ParameterField name="A"/> THEN part is required <ParameterField name="B"/> ELSE part is optional ... implementation built-in ... </DefineFunction>
Example: Check if field color is in (red, green, blue). If so, returns "primary", else "other".
<Apply function="if"> <Apply function="isIn"> <FieldRef field="color"/> <Constant dataType="string">red</Constant> <Constant dataType="string">green</Constant> <Constant dataType="string">blue</Constant> </Apply> <Constant dataType="string">primary</Constant> <Constant dataType="string">other</Constant> </Apply>
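The optional ELSE part is the subtle point here. The following Python sketch mirrors the example above under stated assumptions: `MISSING` (as `None`), `pmml_if`, and `is_in` are hypothetical names, and a missing condition is mapped to a missing result following the general rule for missing inputs.

```python
MISSING = None        # hypothetical stand-in for a PMML missing value
_NO_ELSE = object()   # sentinel: the ELSE part was not supplied


def is_in(x, *values):
    """True if x equals one of the listed values."""
    return x in values


def pmml_if(condition, then_value, else_value=_NO_ELSE):
    """IF-THEN-ELSE where the ELSE part may be absent."""
    if condition is MISSING:
        return MISSING
    if condition:
        return then_value
    # Condition is false: use ELSE if present, otherwise missing.
    return MISSING if else_value is _NO_ELSE else else_value


color = "green"
label = pmml_if(is_in(color, "red", "green", "blue"), "primary", "other")
```

With `color = "green"` the label is "primary". Calling `pmml_if(False, "primary")` without an ELSE value returns the missing value.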
Returns a string where all lowercase characters in the input string are replaced by their uppercase variants.
uppercase
<DefineFunction name="uppercase" optype="categorical" dataType="string"> <ParameterField name="input" dataType="string"/> ... implementation built-in ... </DefineFunction>
The function uppercase uses the Unicode definitions for classifying characters as uppercase / lowercase. See XQuery fn:upper-case.
Example: Return the field Str with all characters in upper case.
<Apply function="uppercase"> <FieldRef field="Str"/> </Apply>
Assuming Str="aBc9" the result corresponding to this Apply element is "ABC9".
Returns a string where all uppercase characters in the input string are replaced by their lowercase variants.
lowercase
<DefineFunction name="lowercase" optype="categorical" dataType="string"> <ParameterField name="input" dataType="string"/> ... implementation built-in ... </DefineFunction>
The function lowercase uses the Unicode definitions for classifying characters as uppercase / lowercase. See XQuery fn:lower-case.
Example: Return the field Str with all characters in lower case.
<Apply function="lowercase"> <FieldRef field="Str"/> </Apply>
Assuming Str="aBc9" the result corresponding to this Apply element is "abc9".
Returns the string length for an input string.
stringLength
<DefineFunction name="stringLength" optype="continuous" dataType="integer"> <ParameterField name="input" dataType="string"/> ... See XQuery fn:string-length ... </DefineFunction>
Example: Return the length of string in the field Str.
<Apply function="stringLength"> <FieldRef field="Str"/> </Apply>
Assuming Str="aBc9x" the result corresponding to this Apply element is 5.
Extracts a substring from an input string.
substring
<DefineFunction name="substring" optype="categorical" dataType="string"> <ParameterField name="input" dataType="string"/> <ParameterField name="startPos" dataType="integer"/> <ParameterField name="length" dataType="integer"/> ... See XQuery fn:substring ... </DefineFunction>
startPos and length must be positive integers. The first character of a string is located at position 1 (not position 0).
Example: Return the 3 characters of field Str beginning at position 2.
<Apply function="substring"> <FieldRef field="Str"/> <Constant dataType="integer">2</Constant> <Constant dataType="integer">3</Constant> </Apply>
Assuming Str="aBc9x" the result corresponding to this Apply element is "Bc9".
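Because positions start at 1 rather than 0, an off-by-one error is easy to make when mapping substring onto a zero-based language. A minimal Python sketch, assuming the hypothetical helper name `substring` and treating non-positive arguments as an error (the source only states that they must be positive):

```python
def substring(text, start_pos, length):
    """Extract `length` characters starting at 1-based position `start_pos`."""
    if start_pos < 1 or length < 1:
        raise ValueError("startPos and length must be positive integers")
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]
```

With the values from the example, `substring("aBc9x", 2, 3)` gives "Bc9".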
Returns a string where leading and trailing blanks in the input string are removed. Note that trailing blanks in PMML, by definition, are not significant when strings are compared.
trimBlanks
<DefineFunction name="trimBlanks" optype="categorical" dataType="string"> <ParameterField name="input" dataType="string"/> ... implementation built-in ... </DefineFunction>
Blanks include tab and newline characters. Use definitions according to Unicode.
Example: Trim blanks of field Str.
<Apply function="trimBlanks"> <FieldRef field="Str"/> </Apply>
Assuming Str=" aBc9x " the result corresponding to this Apply element is "aBc9x".
Returns a string as a result of the concatenation of two or more parameters.
concat
<DefineFunction name="concat" optype="categorical" dataType="string"> <ParameterField name="x"/> <ParameterField name="y"/> ... See XQuery fn:concat ... </DefineFunction>
Example: Concatenates field month, constant value "-" and field year.
<Apply function="concat"> <FieldRef field="month"/> <Constant>-</Constant> <FieldRef field="year"/> </Apply>
Assuming month=2 and year=2000 the result corresponding to this Apply element is "2-2000".
Replaces each substring in a given input string that matches a given pattern or regular expression by another string. Note that for regular expressions, PMML follows the specification implemented in the PCRE (Perl Compatible Regular Expressions) library.
replace
<DefineFunction name="replace" optype="categorical" dataType="string"> <ParameterField name="input" dataType="string"/> <ParameterField name="pattern" dataType="string"/> <ParameterField name="replacement" dataType="string"/> ... See XQuery fn:replace ... </DefineFunction>
Example: Replaces a sequence of "B" letters by letter "c".
<Apply function="replace"> <Constant>BBBB</Constant> <Constant>B+</Constant> <Constant>c</Constant> </Apply>
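The example above can be mimicked with Python's `re` module. This is only an approximation: `re` is not PCRE, so the sketch is reliable for simple patterns such as `B+` but may differ for advanced constructs or replacement syntax. The function name `replace` is a hypothetical stand-in.

```python
import re


def replace(text, pattern, replacement):
    """Replace every match of `pattern` in `text` by `replacement`."""
    return re.sub(pattern, replacement, text)


# The whole run "BBBB" is one match of "B+", so a single "c" remains.
result = replace("BBBB", "B+", "c")
```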
Attempts to match a pattern or regular expression against a given string. It returns a Boolean: true if a match is found or false if not. Note that for regular expressions, PMML follows the specification implemented in the PCRE (Perl Compatible Regular Expressions) library.
matches
<DefineFunction name="matches" optype="categorical" dataType="boolean"> <ParameterField name="input" dataType="string"/> <ParameterField name="pattern" dataType="string"/> ... See XQuery fn:matches ... </DefineFunction>
Example: Attempts to match pattern "ar?y" against the value of field month.
<Apply function="matches"> <FieldRef field="month"/> <Constant>ar?y</Constant> </Apply>
Assuming month is either "January", "February", or "May", the result corresponding to this Apply element is true. For any other month, the result is false.
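The claim about which month names match can be checked with a short Python sketch. It uses Python's `re` module as an approximation of PCRE, which is adequate for this simple pattern. The helper name `matches` and the month list are illustrative assumptions.

```python
import re


def matches(text, pattern):
    """True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None


months = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# "ar?y" means: "a", an optional "r", then "y".
matching = [m for m in months if matches(m, "ar?y")]
```

The pattern is not anchored, so a match anywhere in the string counts: "ary" in January and February, "ay" in May.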
Formats numbers according to a pattern. The pattern uses the POSIX descriptors as used, e.g., in the C function printf.
formatNumber
<DefineFunction name="formatNumber" optype="categorical" dataType="string"> <ParameterField name="input" optype="continuous"/> <ParameterField name="pattern" dataType="string"/> ... implementation built-in ... </DefineFunction>
Example: Convert a number in the field Num into a string of length 3 with leading blanks.
<Apply function="formatNumber"> <FieldRef field="Num"/> <Constant>%3d</Constant> </Apply>
Assuming Num=2 the result corresponding to this Apply element is the string "  2" (two blanks followed by the digit 2).
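Python's `%` string formatting accepts printf-style descriptors, so it can serve as a rough illustration of the example above. The helper name `format_number` is hypothetical, and descriptors outside the common printf subset may behave differently.

```python
def format_number(value, pattern):
    """Format `value` using a printf-style `pattern` such as "%3d"."""
    return pattern % value


# Width 3, right-aligned, padded with leading blanks.
formatted = format_number(2, "%3d")
```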
Formats a date and time value according to a pattern. The pattern uses POSIX descriptors as used, e.g., in the C function strftime or the Unix command date.
formatDatetime
<DefineFunction name="formatDatetime" optype="categorical" dataType="string"> <ParameterField name="input" optype="ordinal"/> <ParameterField name="pattern" dataType="string"/> ... implementation built-in ... </DefineFunction>
input must be a date or time or dateTime.
Example: Format a date value as 'Month/Day/Year'.
<DerivedField name="StartDateUS" dataType="string" optype="categorical"> <Apply function="formatDatetime"> <FieldRef field="StartDate"/> <Constant>%m/%d/%y</Constant> </Apply> </DerivedField>
With StartDate being the date August 20th, 2004 the result is StartDateUS="08/20/04".
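Python's `datetime` types expose the same strftime descriptors, which makes the example easy to reproduce. The helper name `format_datetime` is a hypothetical stand-in for illustration.

```python
from datetime import date


def format_datetime(value, pattern):
    """Format a date, time, or datetime using strftime descriptors."""
    return value.strftime(pattern)


# %m = two-digit month, %d = two-digit day, %y = two-digit year.
start_date_us = format_datetime(date(2004, 8, 20), "%m/%d/%y")
```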
Function for transforming dates into integers.
The type dateDaysSinceYear is a variant of the type date where the values are represented as the number of days since Year-01-01. The date January 1 of Year is represented by the number 0. January 2 of Year is represented by 1, February 1 of Year is represented by 31, etc. Dates before January 1 of Year are represented as negative numbers. For example, values of type dateDaysSince[1960] are the number of days since January 1, 1960. The date January 1, 1960 is represented by the number 0. For example, the date April 1, 2003 can be converted to the value 15796 of type dateDaysSince[1960].
dateDaysSinceYear
<DefineFunction name="dateDaysSinceYear" optype="continuous"> <ParameterField name="input" optype="ordinal"/> <ParameterField name="referenceYear" optype="continuous"/> </DefineFunction>
input must be of datatype date or dateTime.
Example: Calculate days since 1970.
<DerivedField name="PurchaseDateDays" dataType="integer" optype="continuous"> <Apply function="dateDaysSinceYear"> <FieldRef field="PurchaseDate"/> <Constant>1970</Constant> </Apply> </DerivedField>
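The day counts quoted in the description can be reproduced with Python's `datetime` module. This is a sketch under stated assumptions: the helper name `date_days_since_year` is hypothetical, and dropping the time part of a dateTime input is an assumption of the sketch rather than something the text specifies.

```python
from datetime import date, datetime


def date_days_since_year(value, reference_year):
    """Number of days between January 1 of `reference_year` and `value`."""
    # Assumption: for a dateTime input only the calendar date is used.
    day = value.date() if isinstance(value, datetime) else value
    return (day - date(reference_year, 1, 1)).days


days = date_days_since_year(date(2003, 4, 1), 1960)
```

This reproduces the value 15796 for April 1, 2003 relative to 1960, as well as 31 for February 1 and a negative count for dates before the reference year.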
Function for transforming dates into integers.
The type dateSecondsSinceYear is a variant of the type date where the values are represented as the number of seconds since midnight starting the first day of Year (which is represented by 0). 1 minute after midnight on January 1 of Year is represented by 60, 1 hour after midnight on January 1 of Year is represented by 3600, etc. Times before January 1 of Year are represented as negative numbers.
For example, values of type dateSecondsSince[1960] are the number of seconds since the midnight starting January 1, 1960. 30 minutes and 3 seconds after 3 o'clock in the morning of January 3, 1960 can be converted to the value 185403 of type dateSecondsSince[1960].
dateSecondsSinceYear
<DefineFunction name="dateSecondsSinceYear" optype="continuous"> <ParameterField name="input" optype="ordinal"/> <ParameterField name="referenceYear" optype="continuous"/> </DefineFunction>
input must be of datatype date or dateTime. If input is of datatype date, it is assumed that the time is 00:00:00 at this date.
Example: Create a new field PurchaseDateSeconds from the PurchaseDate attribute relative to the year 1970.
<DerivedField name="PurchaseDateSeconds" dataType="integer" optype="continuous"> <Apply function="dateSecondsSinceYear"> <FieldRef field="PurchaseDate"/> <Constant>1970</Constant> </Apply> </DerivedField>
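The arithmetic behind dateSecondsSinceYear can be sketched in Python. The helper name `date_seconds_since_year` is hypothetical, and time zones and leap seconds are ignored in this illustration.

```python
from datetime import date, datetime, time


def date_seconds_since_year(value, reference_year):
    """Seconds between midnight starting January 1 of `reference_year` and `value`."""
    # A plain date is taken as 00:00:00 on that date.
    if not isinstance(value, datetime):
        value = datetime.combine(value, time.min)
    delta = value - datetime(reference_year, 1, 1)
    return delta.days * 86400 + delta.seconds


# 03:30:03 on January 3, 1960: 2 full days + 3 h + 30 min + 3 s.
seconds = date_seconds_since_year(datetime(1960, 1, 3, 3, 30, 3), 1960)
```

The result is 2 * 86400 + 3 * 3600 + 30 * 60 + 3 = 185403, matching the value given in the description.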
Function for transforming dates into integers.
For example, Midnight returns a value of 0; 1 second after midnight (00:00:01) would return a value of 1; one minute after midnight would return a value of 60; etc. 23 minutes and 30 seconds after 5 o'clock in the morning should return 19410.
dateSecondsSinceMidnight
<DefineFunction name="dateSecondsSinceMidnight" optype="continuous"> <ParameterField name="input" optype="ordinal"/> </DefineFunction>
input must be of datatype time or dateTime.
Example: Create a new field PurchaseDateSeconds from the PurchaseDate attribute relative to midnight.
<DerivedField name="PurchaseDateSeconds" dataType="integer" optype="continuous"> <Apply function="dateSecondsSinceMidnight"> <FieldRef field="PurchaseDate"/> </Apply> </DerivedField>
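The values quoted for dateSecondsSinceMidnight follow from simple arithmetic, sketched here in Python. The helper name `date_seconds_since_midnight` is hypothetical, and fractional seconds are ignored.

```python
from datetime import datetime, time


def date_seconds_since_midnight(value):
    """Seconds elapsed since midnight for a time or datetime value."""
    clock = value.time() if isinstance(value, datetime) else value
    return clock.hour * 3600 + clock.minute * 60 + clock.second


# 05:23:30 -> 5 * 3600 + 23 * 60 + 30
seconds = date_seconds_since_midnight(time(5, 23, 30))
```

This gives 19410 for 05:23:30, as stated above.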
Functions for normal distribution are widely used in statistical applications. Wikipedia has the following information at https://en.wikipedia.org/wiki/Normal_distribution:
In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables.
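The probability density function (PDF) of the normal distribution with mean μ and standard deviation σ is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
If μ = 0 and σ = 1, the distribution is called the standard normal distribution or the unit normal distribution.
The cumulative distribution function (CDF) of the standard normal distribution, usually denoted with the capital Greek letter Φ (phi), is the integral
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$$
In statistics one often uses the related error function, or erf(x), defined as the probability of a random variable with normal distribution of mean 0 and variance 1/2 falling in the range [-x, x], that is:
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\, dt$$
These integrals cannot be expressed in terms of elementary functions, and are often said to be special functions. However, many numerical approximations are known.
The two functions are closely related, namely
$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$
For a generic normal distribution f with mean μ and standard deviation σ, the cumulative distribution function is:
$$\Phi(x, \mu, \sigma) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
The inverse of normal CDF is called the quantile function. The quantile function of the standard normal distribution is called the probit function, and can be expressed in terms of the inverse error function:
$$\Phi^{-1}(p) = \sqrt{2}\, \operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1)$$
For a normal random variable with mean μ and variance σ², the quantile function is:
$$\Phi^{-1}(p, \mu, \sigma) = \mu + \sigma\, \Phi^{-1}(p) = \mu + \sigma\sqrt{2}\, \operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1)$$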
PMML defines the following built-in functions related to the normal distribution: normalCDF, normalPDF, normalIDF, stdNormalCDF, stdNormalPDF, stdNormalIDF, and erf.
normalCDF
<DefineFunction name="normalCDF" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> <ParameterField name="mu" optype="continuous" dataType="double"/> <ParameterField name="sigma" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function normalCDF(x, mu, sigma) returns the value Φ(x, μ, σ) defined above.
The function stdNormalCDF(x) returns the cumulative distribution function value of x for the standard normal distribution. Its pseudo-declaration is:
<DefineFunction name="stdNormalCDF" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
Note that σ (the sigma parameter of normalCDF) must be positive.
Functions normalPDF(x, μ, σ), normalIDF(p, μ, σ), stdNormalPDF(x), and stdNormalIDF(x) have pseudo-declarations similar to the above and compute the probability density functions and inverse distribution functions of the normal distribution with mean μ and positive standard deviation σ and of the standard normal distribution, respectively.
The PMML function erf is defined similarly to stdNormalCDF and computes erf(x) as described above.
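The normal distribution functions can be sketched in Python using only the standard library. The CDF uses the well-known identity Φ(x) = ½·[1 + erf(x/√2)]; the inverse relies on `statistics.NormalDist.inv_cdf` (Python 3.8 or later). The snake_case function names are hypothetical, and raising an error for a non-positive sigma is an assumption of the sketch.

```python
import math
from statistics import NormalDist


def std_normal_cdf(x):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and std deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return std_normal_cdf((x - mu) / sigma)


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_idf(p, mu, sigma):
    """Quantile (inverse CDF) for a probability p strictly between 0 and 1."""
    return NormalDist(mu, sigma).inv_cdf(p)
```

As a consistency check, applying `normal_cdf` to the output of `normal_idf` with the same mu and sigma should return the original probability up to rounding.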
expm1
<DefineFunction name="expm1" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The expm1 function returns e^x - 1, where x is the argument, and e is the base of the natural logarithms. The domain of this function is the whole real line. If the input is INF then INF is returned. If the input is -INF then -1 is returned.
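Python's `math.expm1` behaves the same way at the infinite inputs described above, so it can illustrate the function. The wrapper name `expm1` below is only a thin stand-in. The small-argument comparison shows the usual motivation for a dedicated function: computing `exp(x) - 1` directly loses precision when x is close to zero.

```python
import math


def expm1(x):
    """e**x - 1, computed accurately even for x close to zero."""
    return math.expm1(x)


tiny = 1e-10
accurate = expm1(tiny)          # close to 1e-10
naive = math.exp(tiny) - 1.0    # suffers from cancellation

at_inf = expm1(math.inf)        # INF
at_neg_inf = expm1(-math.inf)   # -1
```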
hypot
<DefineFunction name="hypot" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> <ParameterField name="y" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
hypot is a mathematical function that computes the square root of the sum of the squares of x and y. Therefore, the hypot(x,y) function returns sqrt(x^2 + y^2), where x and y are two parameters. This function is defined for all real numbers x and y.
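A short Python illustration using `math.hypot` follows. The comparison with a naive formula is an added observation, not something stated in the text: squaring very large doubles overflows to infinity, whereas a library hypot typically avoids that intermediate overflow. The name `naive_hypot` is hypothetical.

```python
import math


def naive_hypot(x, y):
    """Direct translation of sqrt(x^2 + y^2); intermediate squares may overflow."""
    return math.sqrt(x * x + y * y)


classic = math.hypot(3.0, 4.0)        # the 3-4-5 right triangle
large = math.hypot(1e200, 1e200)      # finite, about 1.414e200
overflowed = naive_hypot(1e200, 1e200)  # squares exceed the double range
```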
ln1p
<DefineFunction name="ln1p" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
ln1p is a mathematical function that returns ln(x+1), where x > -1. If x = -1, the result is negative infinity, and for x < -1 the result is NaN.
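The boundary behaviour described above differs from Python's `math.log1p`, which raises `ValueError` for x <= -1. A minimal sketch that maps those cases onto -INF and NaN instead; the function name `ln1p` mirrors the PMML name, but the implementation is only an illustration.

```python
import math


def ln1p(x):
    """ln(x + 1) with -INF at x == -1 and NaN for x < -1."""
    if math.isnan(x) or x < -1.0:
        return math.nan
    if x == -1.0:
        return -math.inf
    return math.log1p(x)
```

For example, `ln1p(0.0)` is 0.0, `ln1p(-1.0)` is negative infinity, and `ln1p(-2.0)` is NaN.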
rint
<DefineFunction name="rint" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The rint function returns the closest whole number to x, rounding toward the nearest even number if the fractional part is exactly one-half. If x is NaN, a NaN shall be returned.
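Round-half-to-even is also what Python's built-in `round` does for floats, so it can illustrate rint. In this sketch, passing infinities through unchanged is an assumption (the text only specifies the NaN case), and the sign of a zero result is not preserved.

```python
import math


def rint(x):
    """Round to the nearest whole number, ties going to the even neighbour."""
    # round() raises on NaN and infinities, so handle them first.
    if math.isnan(x) or math.isinf(x):
        return x
    return float(round(x))
```

Ties go to the even neighbour: `rint(2.5)` is 2.0 while `rint(3.5)` is 4.0.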
sin
<DefineFunction name="sin" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function sin(x) returns the trigonometric sine of x, which is assumed to be in radians. The domain of this function is the whole real line. The range is [-1, 1].
asin
<DefineFunction name="asin" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function asin(x) is an inverse trigonometric function and returns the arc-sine of x as an angle in radians between −π/2 and π/2. The domain of this function is [-1, 1]. Beyond this domain, the result is NaN.
sinh
<DefineFunction name="sinh" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function sinh(x) returns the hyperbolic sine of x, which is equal to (e^x - e^-x)/2. The domain of this function is the whole real line.
cos
<DefineFunction name="cos" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function cos(x) returns the trigonometric cosine of x, which is assumed to be in radians. The domain of this function is the whole real line. The range is [-1, 1].
acos
<DefineFunction name="acos" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function acos(x) is an inverse trigonometric function and returns the arc-cosine of x as an angle in radians between 0 and π. The domain of this function is [-1, 1]. Beyond this domain, the result is NaN.
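For both asin and acos, inputs outside [-1, 1] yield NaN. Python's `math.asin` and `math.acos` raise `ValueError` in that case, so a sketch of the described behaviour needs an explicit domain check. The wrappers below are illustrative only.

```python
import math


def asin(x):
    """Arc-sine in radians, or NaN outside the domain [-1, 1]."""
    return math.asin(x) if -1.0 <= x <= 1.0 else math.nan


def acos(x):
    """Arc-cosine in radians, or NaN outside the domain [-1, 1]."""
    return math.acos(x) if -1.0 <= x <= 1.0 else math.nan
```

A NaN input also falls through to NaN, because every comparison with NaN is false.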
cosh
<DefineFunction name="cosh" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function cosh(x) returns the hyperbolic cosine of x, which is equal to (e^x + e^-x)/2. The domain of this function is the whole real line.
tan
<DefineFunction name="tan" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function tan(x) returns the trigonometric tangent of x, which is assumed to be in radians. The domain is all real numbers except ±π/2, ±3π/2, ±5π/2, …, where the tan function is undefined. The range of this function is the whole real line.
tanh
<DefineFunction name="tanh" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function tanh(x) returns the hyperbolic tangent of x, which is equal to (e^x - e^-x)/(e^x + e^-x). The domain of this function is the whole real line.
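The exponential definitions of sinh, cosh, and tanh can be checked numerically against Python's `math` module. The `*_by_definition` helpers are hypothetical names for this sketch; for large |x| the direct formulas overflow, so the comparison is limited to moderate arguments.

```python
import math


def sinh_by_definition(x):
    return (math.exp(x) - math.exp(-x)) / 2.0


def cosh_by_definition(x):
    return (math.exp(x) + math.exp(-x)) / 2.0


def tanh_by_definition(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
agrees = all(
    math.isclose(sinh_by_definition(x), math.sinh(x), rel_tol=1e-12, abs_tol=1e-12)
    and math.isclose(cosh_by_definition(x), math.cosh(x), rel_tol=1e-12, abs_tol=1e-12)
    and math.isclose(tanh_by_definition(x), math.tanh(x), rel_tol=1e-12, abs_tol=1e-12)
    for x in samples
)
```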
atan
<DefineFunction name="atan" optype="continuous" dataType="double"> <ParameterField name="x" optype="continuous" dataType="double"/> ... implementation built-in ... </DefineFunction>
The function atan(x) is an inverse trigonometric function and returns the arc-tangent of x as an angle in radians between -π/2 and π/2. The domain of this function is the whole real line.