PMML 1.1 -- Mining Schema
Each model contains one mining schema, which lists the fields used in that model. These fields are a subset of the fields defined in the data dictionary. While the mining schema contains information that is specific to a certain model, the data dictionary contains data definitions which do not vary per model.
<!ELEMENT MiningSchema (Extension*, MiningField+) >
<!ENTITY % FIELD-USAGE-TYPE "(active | predicted | supplementary)" >
<!ENTITY % OUTLIER-TREATMENT-METHOD "( asIs | asMissingValues | asExtremeValues ) " >
usageType
active: field used as input (independent field)
predicted: field whose value is predicted by the model
supplementary: field holding additional descriptive information
Supplementary fields are not required to apply a model; they are provided as additional information for explanatory purposes. When a field has gone through preprocessing transformations before a model is built, an additional supplementary field is typically used to describe the statistics for the original field values.
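As an illustration of the three usage types, a mining schema might look like the following sketch. The field names (age, income, region, risk) are hypothetical and are assumed to be defined in the data dictionary; only elements and attributes declared in this section are used.

```xml
<!-- Illustrative sketch only: all field names are hypothetical. -->
<MiningSchema>
  <!-- usageType defaults to "active", so it can be omitted for input fields. -->
  <MiningField name="age"/>
  <MiningField name="income" usageType="active"/>
  <!-- The field whose value the model predicts. -->
  <MiningField name="risk" usageType="predicted"/>
  <!-- Descriptive information only; not needed to apply the model. -->
  <MiningField name="region" usageType="supplementary"/>
</MiningSchema>
```

A consumer that only applies the model could ignore the supplementary entry without affecting the prediction.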
outliers
asIs: field values treated at face value
asMissingValues: outlier values are treated as if they were missing
asExtremeValues: outlier values are changed to a specific high or low value defined in MiningField
<!ELEMENT MiningField (Extension*) >
<!ATTLIST MiningField
    name       %FIELD-NAME;                #REQUIRED
    usageType  %FIELD-USAGE-TYPE;          "active"
    outliers   %OUTLIER-TREATMENT-METHOD;  "asIs"
    lowValue   %NUMBER;                    #IMPLIED
    highValue  %NUMBER;                    #IMPLIED
>
name: symbolic name of the field; it is the same as the name of a field in the data dictionary
highValue and lowValue: used in conjunction with the %OUTLIER-TREATMENT-METHOD; value "asExtremeValues" as replacement values for records with outliers in this field: if x < lowValue then x = lowValue, and likewise if x > highValue then x = highValue
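The outlier attributes described above could be combined as in the following sketch. The field names and the bounds 18 and 90 are assumptions chosen purely for illustration.

```xml
<!-- Illustrative sketch only: field names and bounds are hypothetical. -->
<MiningSchema>
  <!-- Outliers are replaced by the given extremes:
       a value below 18 becomes 18, a value above 90 becomes 90. -->
  <MiningField name="age" outliers="asExtremeValues" lowValue="18" highValue="90"/>
  <!-- Outlier values in this field are treated as if they were missing. -->
  <MiningField name="income" outliers="asMissingValues"/>
  <!-- outliers defaults to "asIs": values are taken at face value. -->
  <MiningField name="risk" usageType="predicted"/>
</MiningSchema>
```

Because lowValue and highValue are #IMPLIED, they can be left out for fields that use "asIs" or "asMissingValues". A schema like this one relies on treatments other than "asIs", so it goes beyond the core described under Conformance.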
Conformance
- outlier treatment 'asIs', i.e. the default value of the attribute outliers in MiningField, is in core; other options are not in core.