PMML 4.4.1 - Definition and Application of Functions

PMML provides a number of predefined functions that support fine-grained transformations such as changing characters to upper case or converting date and time values to strings. The predefined functions are built into PMML because they cannot be defined by expressions in PMML itself or because a definition would be too complex. Without support for such functions, an application would have to perform the transformations before using a PMML model. The transformations that were applied when the model was created must be equivalent to those applied when the model is used on new data. By integrating some of the transformations directly into the PMML model, the definition and execution of the data flow become less error-prone.

PMML also supports the definition of new functions that have other PMML expressions in the function body. Such a function represents a parameterized expression. The semantics of applying a 'user-defined' function in PMML is:
A function can be applied to one or more other expressions such as
constants, fields or results of transformations; see the group EXPRESSION.
In order to allow a single function specification to be applicable for
multiple
For example, if an For ParameterFields, both dataType and optype are
optional. When the specified In cases where the specified
The Schema

The XML Schema for the definition and application of functions is

<xs:element name="DefineFunction">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ParameterField" minOccurs="1" maxOccurs="unbounded"/>
      <xs:group ref="EXPRESSION"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="optype" type="OPTYPE" use="required"/>
    <xs:attribute name="dataType" type="DATATYPE"/>
  </xs:complexType>
</xs:element>

<xs:element name="ParameterField">
  <xs:complexType>
    <xs:attribute name="name" type="FIELD-NAME" use="required"/>
    <xs:attribute name="optype" type="OPTYPE"/>
    <xs:attribute name="dataType" type="DATATYPE"/>
    <xs:attribute name="displayName" type="xs:string"/>
  </xs:complexType>
</xs:element>

<xs:element name="Apply">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="EXPRESSION" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="function" type="xs:string" use="required"/>
    <xs:attribute name="mapMissingTo" type="xs:string"/>
    <xs:attribute name="defaultValue" type="xs:string"/>
    <xs:attribute name="invalidValueTreatment" type="INVALID-VALUE-TREATMENT-METHOD" default="returnInvalid"/>
  </xs:complexType>
</xs:element>

The DefineFunction is used to define new (user-defined) functions
as variations or compositions of existing functions or transformations. The
function's name must be unique and must not conflict with other
function names, either defined by PMML or other user-defined functions. The
The element Apply defines the application of a function. The
function itself is identified by name with the function attribute. The
actual parameters of the function application are given in the content of the
element. Each actual argument value is given by an expression.

The optional attribute mapMissingTo defines the output value for the cases when any of the function inputs are missing. If it is specified and any of the input values of the function is missing, the function is not applied at all and the mapMissingTo value is returned instead. This is useful when the applied function cannot handle missing values.

In contrast, the optional attribute defaultValue defines an output value for the case when the function returns a missing value. In other words, when a defaultValue is provided, the function is applied first, and if it produces a missing value, the defaultValue is returned instead.

The application of a function may sometimes yield invalid results (e.g. a
division by zero). As in the case of a

Output table for Apply ('*' stands for any combination; an empty cell stands for no value)
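
The effect of mapMissingTo described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the field names comment and CommentNorm and the replacement value "UNKNOWN" are hypothetical and do not come from the specification; only the built-in function uppercase and the attribute mapMissingTo are taken from the text.

```xml
<!-- Hypothetical sketch: the field "comment" is assumed to exist in the model. -->
<DerivedField name="CommentNorm" dataType="string" optype="categorical">
  <!-- If "comment" is missing, uppercase is not applied at all
       and the mapMissingTo value "UNKNOWN" is returned instead. -->
  <Apply function="uppercase" mapMissingTo="UNKNOWN">
    <FieldRef field="comment"/>
  </Apply>
</DerivedField>
```

With this definition, a present value such as "Non-Food" would still be passed to uppercase as usual; only a missing input short-circuits to the mapMissingTo value.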
Example applying a built-in function

Data cleansing is one of the common tasks performed when preparing data for mining. Some of these operations can be supported directly in a PMML model. The following example demonstrates how to convert string values to upper case by applying the built-in function uppercase.

Assuming that the original input data contains names of product groups and
the names are provided in the field "prodgroup", we define a new DerivedField named PGNorm:

<DerivedField name="PGNorm" dataType="string" optype="categorical">
  <Apply function="uppercase">
    <FieldRef field="prodgroup"/>
  </Apply>
</DerivedField>

That is, when the value of the field prodgroup is, e.g., "Non-Food", the value of the field PGNorm becomes "NON-FOOD".

Example of user-defined function

New user-defined functions can be specified in a model using the element DefineFunction.
Example:

<TransformationDictionary>

  <!-- define a new function called "AMPM" -->
  <DefineFunction name="AMPM" dataType="string" optype="categorical">
    <!-- result type is "string" -->
    <!-- declaration of formal parameters -->
    <ParameterField name="TimeVal" optype="continuous" dataType="integer" displayName="Time value"/>
    <!-- there can be more than one parameter field -->
    <!-- The function body can be any expression -->
    <!-- Parameter names are used like field names in the expression -->
    <Discretize field="TimeVal">
      <!-- uses name of parameter field -->
      <DiscretizeBin binValue="AM">
        <Interval closure="closedClosed" leftMargin="0" rightMargin="43199"/>
      </DiscretizeBin>
      <DiscretizeBin binValue="PM">
        <Interval closure="closedOpen" leftMargin="43200" rightMargin="86400"/>
      </DiscretizeBin>
    </Discretize>
  </DefineFunction>

  <!-- use function "AMPM" in a DerivedField -->
  <DerivedField name="Shift" dataType="string" optype="categorical">
    <Apply function="AMPM">
      <FieldRef field="StartTime"/>
    </Apply>
  </DerivedField>

  <!-- extract the hour from a time value -->
  <DerivedField name="StartHour" dataType="string" optype="categorical">
    <Apply function="formatDatetime">
      <FieldRef field="StartTime"/>
      <Constant>%H</Constant>
    </Apply>
  </DerivedField>

</TransformationDictionary>

We assume that the field StartTime is defined with
Note that we use a notation

In general, the application of a function looks like:

<Apply function="MyFunc" xmlns="http://www.dmg.org/PMML-4_4">
  parameter expression 1
  parameter expression 2
  ...
  parameter expression n
</Apply>

The expressions are mapped by position to the arguments in the function definition.

Another example:

<DefineFunction name="STATEGROUP" dataType="string" optype="categorical">
  <ParameterField name="#1" optype="categorical" dataType="string"/>
  <MapValues outputColumn="Region">
    <FieldColumnPair field="#1" column="State"/>
    <InlineTable>
      <row><State>CA</State><Region>West</Region></row>
      <row><State>OR</State><Region>West</Region></row>
      <row><State>NC</State><Region>East</Region></row>
    </InlineTable>
  </MapValues>
</DefineFunction>

<DerivedField name="Group" dataType="string" optype="categorical">
  <Apply function="STATEGROUP">
    <FieldRef field="State"/>
  </Apply>
</DerivedField>

The new function with name STATEGROUP accepts one argument. The
definition uses the transformation MapValues.
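
As a hedged variation on the STATEGROUP example, the following sketch applies the same function together with the defaultValue attribute from the Apply schema. It assumes that the function body yields a missing value for a state that is not listed in the InlineTable (for example "TX"); under that assumption the Apply would return "Other" instead. The field name GroupOrOther and the value "Other" are hypothetical.

```xml
<!-- Hypothetical sketch: relies on the STATEGROUP function defined above. -->
<DerivedField name="GroupOrOther" dataType="string" optype="categorical">
  <!-- STATEGROUP is applied first; only if its result is missing
       (assumed here for states absent from the InlineTable)
       is the defaultValue "Other" returned instead. -->
  <Apply function="STATEGROUP" defaultValue="Other">
    <FieldRef field="State"/>
  </Apply>
</DerivedField>
```

Unlike mapMissingTo, which bypasses the function when an input is missing, defaultValue only takes effect after the function has been evaluated.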
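
The positional mapping of actual arguments to formal parameters can be sketched with a function that takes two parameters. This is a minimal illustration, not part of the specification: the names RATIO, numerator, denominator, TotalCost, Units and CostPerUnit are hypothetical, and the body is assumed to use the built-in arithmetic function "/".

```xml
<TransformationDictionary>

  <!-- Hypothetical function with two formal parameters -->
  <DefineFunction name="RATIO" dataType="double" optype="continuous">
    <ParameterField name="numerator" optype="continuous" dataType="double"/>
    <ParameterField name="denominator" optype="continuous" dataType="double"/>
    <!-- Parameter names are referenced like field names in the body -->
    <Apply function="/">
      <FieldRef field="numerator"/>
      <FieldRef field="denominator"/>
    </Apply>
  </DefineFunction>

  <!-- Arguments are mapped by position:
       TotalCost -> numerator, Units -> denominator -->
  <DerivedField name="CostPerUnit" dataType="double" optype="continuous">
    <Apply function="RATIO">
      <FieldRef field="TotalCost"/>
      <FieldRef field="Units"/>
    </Apply>
  </DerivedField>

</TransformationDictionary>
```

Swapping the two FieldRef elements in the DerivedField would swap the roles of the fields, since no parameter names appear at the call site.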