Overview of most significant Changes
- more builtin functions
- specification of the dataType is now required in all places where applicable
- new link functions loglog and cauchit
- MiningSchema can now specify how to treat invalid input values
- Sequence models have been reworked:
  - SetPredicate is now deprecated
  - clarified distinction between informal attributes and actual constraints from the model building phase
  - changes break compatibility
- more post-processing capabilities in Targets element
- various clarifications in TreeModel:
  - added mechanism to specify confidences
  - missing value treatment within the tree is now configurable
  - behaviour when no child node applies is now configurable
List of all Changes
Associations
- typo: (Paragraph 1): The PMML spec says "a certain product" is associated with a set of other products. Text changed to say "a certain product or set of products"
- typo: "Here is a description of the attributes in an item" changed to itemset.
- added scoring procedure description.
BuiltinFunctions
- added further functions exp, pow, threshold, floor, ceil and round
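
To illustrate what the added functions compute, here is a minimal Python sketch. It is not taken from the specification: the dispatch table `BUILTINS` and the helper `apply_builtin` are hypothetical names, `threshold(x, y)` is assumed to return 1 when x > y and 0 otherwise, and `round` is assumed to round halves upward. The BuiltinFunctions section of the specification remains the authority on exact semantics.

```python
import math

def pmml_threshold(x, y):
    # Assumed semantics: 1 if x is strictly greater than y, else 0.
    return 1 if x > y else 0

def pmml_round(x):
    # Assumed semantics: nearest integer, halves rounded upward.
    # Python's built-in round() rounds halves to even, so it is not used here.
    return math.floor(x + 0.5)

# Hypothetical dispatch table keyed by the function names listed above.
BUILTINS = {
    "exp": math.exp,
    "pow": math.pow,
    "threshold": pmml_threshold,
    "floor": math.floor,
    "ceil": math.ceil,
    "round": pmml_round,
}

def apply_builtin(function, *args):
    """Evaluate one of the sketched built-in functions by name."""
    return BUILTINS[function](*args)
```

For example, `apply_builtin("threshold", 3, 2)` yields 1 and `apply_builtin("round", 2.5)` yields 3 under the assumptions stated above.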
ClusteringModel
- introduce default for compareFunction in element
ComparisonMeasure
- add default 1 for fieldWeight (already present in wording)
- typo: Reworded awkward sentence from "In particular this allows to map categorical input fields..." to "In particular this allows the mapping of categorical input fields..."
- typo: Eliminated typo in "NUM-ARRAY;." to "NUM-ARRAY."
Conformance
- typo: Added 3.0 models (Text, Rules and SVM) to bullet list
DataDictionary
- dataType is required for DataField
- continuous fields can have an unlimited number of value range intervals
- removed dataType: boolean
Functions
- make attribute function in element Apply required
- optype is required for DefineFunction
- dataType clarification, i.e. casts
GeneralRegression
- typo: Factor List: Changed text from "Each name in the list must match a name from the dictionary..." to "Each name in the list must match a DataField name or a DerivedField name."
- typo: targetVariableName: Changed text from "Each name in the list must match a name from the dictionary..." to "Each name in the list must match a DataField name or a DerivedField name."
- typo: Covariate List: Changed text from "Each name in the list must match a name from the dictionary..." to "Each name in the list must match a DataField name or a DerivedField name." The old wording did not state which dictionary was meant.
- added new link functions loglog and cauchit
- added scoring procedure
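
As an illustration of the two new link functions, the sketch below uses their conventional statistical definitions: loglog as -log(-log(p)) and cauchit as tan(pi * (p - 0.5)), together with the inverses that map a linear predictor back to a probability. These formulas are assumed here rather than quoted from the specification, and the function names are hypothetical.

```python
import math

def loglog(p):
    # Conventional loglog link: maps a probability in (0, 1) to the real line.
    return -math.log(-math.log(p))

def inverse_loglog(eta):
    # Inverse of the loglog link: maps a linear predictor to a probability.
    return math.exp(-math.exp(-eta))

def cauchit(p):
    # Conventional cauchit link: quantile function of the standard Cauchy distribution.
    return math.tan(math.pi * (p - 0.5))

def inverse_cauchit(eta):
    # Inverse of the cauchit link: standard Cauchy cumulative distribution function.
    return 0.5 + math.atan(eta) / math.pi
```

A linear predictor of 0 maps to a probability of 0.5 under cauchit and to exp(-1) under loglog, which shows that loglog, unlike logit or cauchit, is not symmetric around 0.5.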
GeneralStructure
- clarified thousand separator
- typo: Changed xmlns="https://www.dmg.org/PMML-3_0" to xmlns="https://www.dmg.org/PMML-3_1" in three places
- typo: Extension Mechanism: Changed attribute to an x- attribute, "<X-DataFieldSource x-sourceKnown="yes" >"
- typo: x- attributes have been deprecated so removed the "x-author" attribute in example
MiningSchema
- added attribute invalidValueTreatment for invalid value handling
- clarified missingValueReplacement strategy asMean
- typo: highValue and lowValue: Changed text to say these attributes are required, "...used in conjunction with, and are required"
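
The effect of invalidValueTreatment can be sketched as follows. This is an illustration under assumptions, not the specification's scoring procedure: the attribute is assumed to accept the values returnInvalid, asIs and asMissing, the "invalid result" of returnInvalid is modelled as a Python exception, and the function and parameter names are hypothetical.

```python
class InvalidValueError(ValueError):
    """Stands in for the invalid result returned by the model (an assumption)."""

def treat_input(value, valid_values, treatment="returnInvalid",
                missing_replacement=None):
    """Prepare one input value for scoring.

    value               -- raw input, None meaning missing
    valid_values        -- values declared valid for the field
    treatment           -- assumed invalidValueTreatment setting
    missing_replacement -- value substituted for missing input, if any
    """
    if value is None:
        return missing_replacement
    if value in valid_values:
        return value
    if treatment == "asIs":
        # Pass the invalid value through unchanged.
        return value
    if treatment == "asMissing":
        # Handle the invalid value exactly like a missing one.
        return missing_replacement
    if treatment == "returnInvalid":
        raise InvalidValueError(f"invalid input value: {value!r}")
    raise ValueError(f"unknown treatment: {treatment!r}")
```

With `valid_values={"red", "green"}`, the input `"blue"` is passed through under asIs, replaced under asMissing, and rejected under returnInvalid.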
ModelComposition
- removed final example
- clarified where final prediction comes from
- added new attributes from tree to DecisionTree (missingValueStrategy, missingValuePenalty, noTrueChildStrategy)
NaiveBayes
- re-added example model (gone since PMML 2.1 for unknown reasons)
NeuralNetwork
- clarify usage of attribute width
- define what to do in case of ties with classification
Output
- made attribute targetField in OutputField optional
- corrected definition of attribute targetField
Regression
- correct formulas for softmax and logit functions
- add new link functions loglog and cauchit in accordance with GeneralRegression
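
For orientation, the textbook forms of the two normalizations mentioned above are sketched below. They are assumed to correspond to the corrected formulas, but the Regression section of the specification is authoritative, in particular for how logit is applied with more than two target categories. The function names are hypothetical.

```python
import math

def softmax(scores):
    # p_j = exp(y_j) / sum_i exp(y_i); the maximum is subtracted first
    # for numerical stability, which does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logit_normalization(score):
    # p = 1 / (1 + exp(-y)) for a single regression value y.
    return 1.0 / (1.0 + math.exp(-score))
```

Softmax always yields probabilities that sum to 1 across categories, whereas the logit form transforms each value independently.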
Sequence
- rework, most important changes: distinction between information and constraints, deprecated SetPredicate
SupportVectorMachine
- detail how to support categorical input variables, binary and non-binary classification
Targets
- TargetValue is required only for classification models
- added further post-processing capabilities
Transformations
- make InlineTable/TableLocator in MapValues optional to allow indicator variables for missing values
- added example for multi-dimensional case with MapValues
- DerivedFields can only have a list of valid values to define the order of ordinal fields. Value ranges for categorical or continuous fields are no longer possible.
- dataType clarification
- Clarify that Discretization is a mapping to discrete values, not to strings as previously stated
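
The idea behind making the table optional in MapValues can be sketched as follows: with no table at all, the mapping degenerates to "one value for missing input, another for everything else", which is exactly an indicator variable. The parameters `map_missing_to` and `default_value` are hypothetical stand-ins modelled loosely on the mapMissingTo and defaultValue attributes; the exact attribute semantics are an assumption here.

```python
def map_values(value, table=None, map_missing_to=None, default_value=None):
    """Sketch of a single-column MapValues lookup.

    value -- input value, None meaning missing
    table -- optional dict from input value to output value
    """
    if value is None:
        return map_missing_to
    if table and value in table:
        return table[value]
    # No table, or no matching row: fall back to the default.
    return default_value

def is_missing_indicator(value):
    # Without a table, the mapping yields 1 for missing input and 0 otherwise.
    return map_values(value, table=None, map_missing_to=1, default_value=0)
```

Here `is_missing_indicator(None)` gives 1 and `is_missing_indicator("red")` gives 0, while a call with a table still behaves as an ordinary lookup.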
TreeModel
- Confidences and missing value handling: added attributes defaultChild (to Node), missingValueStrategy and missingValuePenalty (both to TreeModel), and confidence (to ScoreDistribution). Note that missingValuePenalty applies only to the use of surrogate rules or to the missingValueStrategy defaultChild.
- Clarify evaluation of xor operator
- Correction: specify that field can also be a DerivedField in the LocalTransformations or TransformationDictionary
- allow Nodes with only a single child
- define noTrueChildStrategy to catch cases where no children can be chosen
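
How the new attributes interact during scoring can be sketched with a small tree walker. This is a simplified illustration, not the specification's procedure: only the strategy values lastPrediction, nullPrediction and defaultChild (for missingValueStrategy) and returnNullPrediction and returnLastPrediction (for noTrueChildStrategy) are handled, those value names are assumed, and surrogate rules, missingValuePenalty and confidences are left out. The `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    score: str
    # Predicate returns True, False, or None when the tested field is missing.
    predicate: Callable[[dict], Optional[bool]] = lambda record: True
    children: list = field(default_factory=list)
    default_child: Optional[int] = None  # index into children (sketch only)

def less_than(name, limit):
    """Build a predicate that is unknown (None) when the field is missing."""
    def predicate(record):
        value = record.get(name)
        return None if value is None else value < limit
    return predicate

def score_tree(root, record, missing_value_strategy="nullPrediction",
               no_true_child_strategy="returnNullPrediction"):
    node = root
    while node.children:
        chosen = None
        for child in node.children:
            result = child.predicate(record)
            if result is None:
                # A missing value was hit: apply the missing value strategy.
                if missing_value_strategy == "lastPrediction":
                    return node.score
                if (missing_value_strategy == "defaultChild"
                        and node.default_child is not None):
                    chosen = node.children[node.default_child]
                    break
                return None
            if result:
                chosen = child
                break
        if chosen is None:
            # No child predicate was true: apply noTrueChildStrategy.
            if no_true_child_strategy == "returnLastPrediction":
                return node.score
            return None
        node = chosen
    return node.score
```

For a root scored "maybe" with children for x < 10 ("low") and x < 20 ("mid"), a record with x = 25 matches no child, so the result is either no prediction or "maybe" depending on noTrueChildStrategy; a record without x is resolved by missingValueStrategy instead.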