Using Functions

## PMML 3.0 - Definition and application of functions

PMML provides a number of predefined functions that support fine-grained transformations such as changing characters to upper case or converting date and time values to strings. The predefined functions are built into PMML because they cannot be defined by expressions in PMML itself or because a definition would be too complex.

Without support for such functions, an application would have to perform these transformations before using a PMML model, and the transformations applied when the model was created must be equivalent to those applied when the model is later used on new data. By integrating some of the transformations directly into the PMML model, the definition and execution of the data flow become less error-prone.

PMML also supports the definition of new functions whose bodies consist of other PMML expressions. Such a function represents a parameterized expression. The semantics of applying a 'user-defined' function in PMML is:

1. replace the formal function parameters by the actual argument values, and then
2. replace the function application by the resulting expression.

That is, function definitions are just a means of writing certain expressions more compactly.
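The two-step semantics can be illustrated with a small Python sketch. Expressions are modeled as nested tuples, and the function `DOUBLE(x) = x + x` is hypothetical; neither the representation nor the helper names (`substitute`, `expand`, `evaluate`) come from PMML. The point is only that applying a user-defined function gives the same result as evaluating the expanded expression.

```python
import operator
from typing import Any, Dict, Tuple

# Hypothetical expression model: a constant, ("param", name) for a formal
# parameter, or ("apply", function_name, arg1, ..., argN).
Expr = Any

OPS = {"+": operator.add, "*": operator.mul}  # tiny stand-in for built-ins


def substitute(body: Expr, bindings: Dict[str, Expr]) -> Expr:
    """Step 1: replace each formal-parameter reference by its actual argument."""
    if isinstance(body, tuple) and body[0] == "param":
        return bindings[body[1]]
    if isinstance(body, tuple) and body[0] == "apply":
        _, fn, *args = body
        return ("apply", fn, *(substitute(a, bindings) for a in args))
    return body  # constants are left unchanged


def expand(params: Tuple[str, ...], body: Expr, actuals: Tuple[Expr, ...]) -> Expr:
    """Bind actuals to formals by position and return the resulting expression
    (step 2: this expression replaces the function application)."""
    if len(params) != len(actuals):
        raise ValueError("wrong number of arguments")
    return substitute(body, dict(zip(params, actuals)))


def evaluate(expr: Expr) -> Any:
    """Evaluate an expression that no longer contains parameter references."""
    if isinstance(expr, tuple) and expr[0] == "apply":
        _, fn, *args = expr
        return OPS[fn](*(evaluate(a) for a in args))
    return expr


# Hypothetical user-defined function DOUBLE(x) = x + x
DOUBLE_PARAMS = ("x",)
DOUBLE_BODY = ("apply", "+", ("param", "x"), ("param", "x"))

# DOUBLE(3 * 4) is shorthand for (3 * 4) + (3 * 4).
expanded = expand(DOUBLE_PARAMS, DOUBLE_BODY, (("apply", "*", 3, 4),))
print(expanded)            # ('apply', '+', ('apply', '*', 3, 4), ('apply', '*', 3, 4))
print(evaluate(expanded))  # 24
```

The argument expression `3 * 4` appears twice in the expanded form; because these arithmetic expressions have no side effects, evaluating it once or twice does not change the result.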

A function can be applied to one or more other expressions, such as constants, fields, or the results of other transformations; see the group EXPRESSION in Transformations.html. When a function is applied, the actual arguments are identified by position. A function application is itself a PMML transformation expression, so function invocations can be nested.

### Schema

The XML Schema for the definition and application of functions is
 ``` ```
The element Apply defines the application of a function. The function itself is identified by a name. The actual parameters of the function application are given in the content of the element. Each actual argument value is given by an EXPRESSION. The actual arguments are mapped by position to the formal parameters in the corresponding function definition.
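A consumer can evaluate `Apply` recursively: each child element is evaluated first, and the resulting values are passed to the function in document order. The Python sketch below uses `xml.etree.ElementTree`; it handles only `Constant`, `FieldRef` and nested `Apply`, and it assumes that the function name is carried in a `function` attribute and that the arithmetic built-ins are named `+`, `-`, `*` and `/`. The field names `price` and `quantity` are hypothetical.

```python
import operator
import xml.etree.ElementTree as ET

# Assumed built-in names; a real consumer supports many more functions.
BUILTINS = {"+": operator.add, "-": operator.sub,
            "*": operator.mul, "/": operator.truediv}


def eval_expr(elem: ET.Element, record: dict) -> float:
    """Evaluate a Constant, FieldRef, or (possibly nested) Apply element."""
    if elem.tag == "Constant":
        return float(elem.text)
    if elem.tag == "FieldRef":
        return record[elem.get("field")]
    if elem.tag == "Apply":
        # Child elements are the actual arguments; document order determines
        # which formal parameter each one is bound to.
        args = [eval_expr(child, record) for child in elem]
        return BUILTINS[elem.get("function")](*args)
    raise ValueError(f"unsupported expression: {elem.tag}")


# Hypothetical fragment: (price * quantity) - 5
FRAGMENT = """
<Apply function="-">
  <Apply function="*">
    <FieldRef field="price"/>
    <FieldRef field="quantity"/>
  </Apply>
  <Constant>5</Constant>
</Apply>
"""

expr = ET.fromstring(FRAGMENT.strip())
print(eval_expr(expr, {"price": 2.5, "quantity": 4}))  # 5.0
```

Swapping the two children of the outer `Apply` would compute `5 - (price * quantity)` instead, which is why the positional mapping matters.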

The EXPRESSION in the content of DefineFunction is the function body that actually defines the meaning of the new function.

The function body must not refer to fields other than the parameter fields.
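The rule that a function body may refer only to its parameter fields can be checked mechanically. This sketch assumes that the formal parameters are declared as `ParameterField` children of `DefineFunction` and that field references in the body use `FieldRef`; the function `MARGIN` and the field `tax` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def foreign_field_refs(define_function: ET.Element) -> List[str]:
    """Return the names referenced via FieldRef that are not parameters."""
    params = {p.get("name") for p in define_function.findall("ParameterField")}
    refs = {r.get("field") for r in define_function.iter("FieldRef")}
    return sorted(refs - params)


VALID = """
<DefineFunction name="MARGIN" optype="continuous">
  <ParameterField name="revenue"/>
  <ParameterField name="cost"/>
  <Apply function="-">
    <FieldRef field="revenue"/>
    <FieldRef field="cost"/>
  </Apply>
</DefineFunction>
"""

# Same function, but the body also reads a field that is not a parameter.
INVALID = VALID.replace('<FieldRef field="cost"/>',
                        '<Apply function="+"><FieldRef field="cost"/>'
                        '<FieldRef field="tax"/></Apply>')

print(foreign_field_refs(ET.fromstring(VALID.strip())))    # []
print(foreign_field_refs(ET.fromstring(INVALID.strip())))  # ['tax']
```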

### Example applying a built-in function

Data cleansing is one of the common tasks in preparing data for mining, and some of these operations can be supported directly in a PMML model. The following example demonstrates how to convert string values to upper case by applying the built-in function upper-case. Assuming that the input data contains names of product groups in the field "prodgroup", we define a new derived field named "PGNorm" in which all values are in upper case.
 ``` ```
For example, when the value of the field "prodgroup" is "Non-Food", the value of the field "PGNorm" becomes "NON-FOOD".

A DerivedField can contain a transformation expression such as `<MapValues>` or `<Discretize>`; the element `<Apply>` is just another transformation expression.
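What the derived field computes can be spelled out in a few lines. This Python sketch is not a PMML consumer; it only shows that "PGNorm" depends on "prodgroup" alone, so the same step produces the same values when the model is built and when it is applied. It uses Python's `str.upper()`, which is assumed to agree with the built-in upper-case for plain ASCII values like these.

```python
from typing import Dict, List


def derive_pgnorm(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Add the derived field PGNorm = upper-case(prodgroup) to every record.

    The input records are not modified; new dictionaries are returned.
    """
    return [{**r, "PGNorm": r["prodgroup"].upper()} for r in records]


rows = derive_pgnorm([{"prodgroup": "Non-Food"}, {"prodgroup": "Food"}])
for row in rows:
    print(row)
# {'prodgroup': 'Non-Food', 'PGNorm': 'NON-FOOD'}
# {'prodgroup': 'Food', 'PGNorm': 'FOOD'}
```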

### Example for user-defined function

A derived field can be defined by a possibly complex transformation. If a certain transformation has to be applied to multiple fields, it makes sense to encapsulate the transformation expression in a function and then apply that function multiple times. This reduces both the complexity and the size of PMML models.

New 'user-defined' functions can be specified in a model using the element `DefineFunction` in the transformation dictionary.

Examples:

 ``` ... %H ... ```
Example: We assume that the field "StartTime" is defined with dataType="timeSeconds". An actual time value '09:39:02' is represented as the number 34742, that is, the number of seconds elapsed since midnight. The transformation in the function AMPM maps this value to the string "AM", which becomes the actual value of the field "Shift". The input field "Shift" is also used in the definition of the derived field "StartHour". This categorical field has the actual value "09", produced by the date formatting function.
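The arithmetic behind this example can be checked directly: 9 * 3600 + 39 * 60 + 2 = 32400 + 2340 + 2 = 34742. In the Python sketch below, the body of `ampm` (values before noon map to "AM") is an assumption, since the original definition of AMPM is not reproduced here, and the `%H` directive is assumed to behave like C `strftime` (hour of day, zero-padded, 00 to 23).

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60
NOON = 12 * 60 * 60


def to_time_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Represent a time of day as seconds since midnight (timeSeconds-style)."""
    return hours * 3600 + minutes * 60 + seconds


def ampm(t: int) -> str:
    """Hypothetical body of AMPM: "AM" before noon, "PM" from noon on."""
    return "AM" if t % SECONDS_PER_DAY < NOON else "PM"


def start_hour(t: int) -> str:
    """Format the hour with %H. gmtime() treats t as seconds after midnight
    UTC on 1970-01-01, so the resulting fields are exactly the time of day,
    whatever the local time zone is."""
    return time.strftime("%H", time.gmtime(t % SECONDS_PER_DAY))


start_time = to_time_seconds(9, 39, 2)
print(start_time, ampm(start_time), start_hour(start_time))  # 34742 AM 09
```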

Note that we use the notation `<Constant>HH</Constant>` for constants. This notation is shorter and easier to handle than the combination `<Constant><Value value="HH"/></Constant>`.

In general, the application of a function looks like this:

 ```
<Apply function="functionName">
  parameter expression 1
  parameter expression 2
  ...
  parameter expression n
</Apply>
```
The expressions are mapped by position to the formal parameters in the function definition.

Another example:
The new function with name "STATEGROUP" accepts one argument. The definition uses the transformation MapValues. For example, if the function is applied with a value "CA" the result is the string "West". The example also defines a new derived field with name "Group" as the result of applying the function "STATEGROUP" to the field "State".

| State | Group |
|-------|-------|
| CA    | West  |
| OR    | West  |
| NC    | East  |
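Functionally, STATEGROUP is a table lookup from a state code to a group. The Python sketch below uses only the three rows given in the example; how MapValues treats a state that is not in the table is not shown here, so returning `None` for unmatched values is an assumption of this sketch. Because the lookup is defined once, it can be reused for any number of fields, which is the motivation for `DefineFunction` given earlier.

```python
from typing import Dict, List, Optional

# Lookup rows from the STATEGROUP example.
STATE_TO_GROUP: Dict[str, str] = {"CA": "West", "OR": "West", "NC": "East"}


def stategroup(state: str) -> Optional[str]:
    """One-argument function: map a state code to its group.

    Unmatched codes return None (an assumption of this sketch)."""
    return STATE_TO_GROUP.get(state)


def derive_group(records: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Derived field Group = STATEGROUP(State) for every record."""
    return [{**r, "Group": stategroup(r["State"])} for r in records]


print(derive_group([{"State": "CA"}, {"State": "NC"}, {"State": "TX"}]))
# [{'State': 'CA', 'Group': 'West'}, {'State': 'NC', 'Group': 'East'}, {'State': 'TX', 'Group': None}]
```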
 e-mail info at dmg.org