PMML 1.1 -- Polynomial Regression
Model DTD and Tag Description
The regression functions are used to determine the relationship between the dependent variable (target variable) and one or more independent variables. The dependent variable is the one whose values you want to predict, whereas the independent variables are the variables that you base your prediction on.
The regression formula is:
Dependent variable = intercept + Sum_i (coefficient_i * independent variable_i) + error
<!-- regression model. -->
<!ELEMENT RegressionModel (Extension*, RegressionTable)>
<!ATTLIST RegressionModel
    modelType (linearRegression | stepwisePolynomialRegression) #REQUIRED
    targetVariableName %FIELD-NAME; #REQUIRED
    modelName CDATA #IMPLIED
>
<!ELEMENT RegressionTable ((NumericPredictor*), (CategoricalPredictor*))>
<!ATTLIST RegressionTable
    intercept %REAL-NUMBER; #REQUIRED
>
<!ELEMENT NumericPredictor EMPTY>
<!ATTLIST NumericPredictor
    name %FIELD-NAME; #REQUIRED
    exponent %INT-NUMBER; #REQUIRED
    coefficient %REAL-NUMBER; #REQUIRED
    mean %REAL-NUMBER; #IMPLIED
>
<!ELEMENT CategoricalPredictor EMPTY>
<!ATTLIST CategoricalPredictor
    name %FIELD-NAME; #REQUIRED
    value CDATA #REQUIRED
    coefficient %REAL-NUMBER; #REQUIRED
>
RegressionModel : The root element of an XML regression model. Each instance of a regression model must start with this element.
modelName : This is a unique identifier specifying the name of the regression model.
modelType : Specifies the type of a regression model. This information is used to select the appropriate mathematical formulas during the scoring phase. The supported values are linearRegression and stepwisePolynomialRegression.
targetVariableName : The name of the target variable (also called response variable).
RegressionTable : A table that lists all predictors (independent variables) together with their coefficients. Its required intercept attribute holds the intercept of the regression formula.
NumericPredictor : Defines a numeric independent variable. The list of valid attributes comprises the name of the variable, the exponent to be used, and the coefficient by which the values of this variable must be multiplied. If the independent variable contains missing values, the mean attribute is used to replace the missing values with the mean value.
CategoricalPredictor : Defines a categorical independent variable. The list of attributes comprises the name of the variable, the value attribute, and the coefficient by which the values of this variable must be multiplied. Because categorical values cannot enter the calculation directly, each CategoricalPredictor acts as an indicator term. If the independent variable takes the specified value, the term variable_name(value) is replaced with 1, so the coefficient is multiplied by 1. If the variable takes any other value, the term variable_name(value) is replaced with 0, so that the product
coefficient × variable_name(value)
yields 0. Consequently, the product contributes nothing to the prediction.
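The NumericPredictor description above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name numeric_term is hypothetical, and raising an error when a value is missing and no mean attribute is present is an assumption, since the text does not say what should happen in that case.

```python
from typing import Optional


def numeric_term(value: Optional[float], coefficient: float, exponent: int,
                 mean: Optional[float] = None) -> float:
    """Contribution of one NumericPredictor: coefficient * value ** exponent."""
    if value is None:
        if mean is None:
            # Assumption: the text does not define this case, so fail loudly.
            raise ValueError("missing value and no mean attribute to substitute")
        value = mean  # replace the missing value with the mean attribute
    return coefficient * value ** exponent
```

For instance, numeric_term(None, 2.0, 1, mean=5.0) substitutes 5.0 for the missing value before multiplying by the coefficient.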
Example:
The following regression formula is used to predict the number of insurance claims:
number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )
If the value carpark was specified for car location in a particular record, you would get the following formula:
number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 × 1 + 325.03 × 0
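The substitution of 1 and 0 for the categorical terms can be written out as a short Python sketch. The coefficients come from the example formula; the function names and the sample record (age 30, salary 20000) are made up for illustration.

```python
def categorical_term(record_value: str, predictor_value: str,
                     coefficient: float) -> float:
    """coefficient * indicator, where the indicator is 1 on a match, else 0."""
    indicator = 1 if record_value == predictor_value else 0
    return coefficient * indicator


def number_of_claims(age: float, salary: float, car_location: str) -> float:
    """Evaluate the example formula for one record."""
    return (132.37
            + 7.1 * age
            + 0.01 * salary
            + categorical_term(car_location, "carpark", 41.1)
            + categorical_term(car_location, "street", 325.03))


# Hypothetical record: only the carpark term survives, the street term is 0.
claims = number_of_claims(age=30, salary=20000, car_location="carpark")
```

With these illustrative inputs the sum is 132.37 + 213 + 200 + 41.1, roughly 586.47, up to floating-point rounding.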
Linear Regression Sample
This is a linear regression equation that predicts the number of insurance claims from the values of the independent variables age, salary, and car location. Car location is the only categorical variable. Its value attribute can take on two possible values, carpark and street.
number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )
The corresponding XML model is:
<RegressionModel modelName="Sample for linear regression" modelType="linearRegression" targetVariableName="number of claims">
  <RegressionTable intercept="132.37">
    <NumericPredictor name="age" exponent="1" coefficient="7.1"/>
    <NumericPredictor name="salary" exponent="1" coefficient="0.01"/>
    <CategoricalPredictor name="car location" value="carpark" coefficient="41.1"/>
    <CategoricalPredictor name="car location" value="street" coefficient="325.03"/>
  </RegressionTable>
</RegressionModel>
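As a rough sketch of how a consumer might score a record against this model, the following Python reads the XML with the standard library's xml.etree.ElementTree and applies the regression formula. The score function and the sample record are hypothetical, and the sketch assumes every numeric value is present, so the optional mean attribute is not consulted.

```python
import xml.etree.ElementTree as ET

MODEL_XML = """
<RegressionModel modelName="Sample for linear regression"
                 modelType="linearRegression"
                 targetVariableName="number of claims">
  <RegressionTable intercept="132.37">
    <NumericPredictor name="age" exponent="1" coefficient="7.1"/>
    <NumericPredictor name="salary" exponent="1" coefficient="0.01"/>
    <CategoricalPredictor name="car location" value="carpark" coefficient="41.1"/>
    <CategoricalPredictor name="car location" value="street" coefficient="325.03"/>
  </RegressionTable>
</RegressionModel>
"""


def score(model_xml: str, record: dict) -> float:
    """Intercept plus the sum of all predictor terms for one record."""
    table = ET.fromstring(model_xml).find("RegressionTable")
    total = float(table.get("intercept"))
    for p in table.findall("NumericPredictor"):
        # Assumption: no missing values, so record[...] is always present.
        value = float(record[p.get("name")])
        total += float(p.get("coefficient")) * value ** int(p.get("exponent"))
    for p in table.findall("CategoricalPredictor"):
        # Indicator term: add the coefficient only when the value matches.
        if record.get(p.get("name")) == p.get("value"):
            total += float(p.get("coefficient"))
    return total


# Hypothetical record used only for illustration.
prediction = score(MODEL_XML, {"age": 30, "salary": 20000, "car location": "street"})
```

Because the loop keys on the exponent attribute, the same function would also handle a model that lists one variable several times with different exponents.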
Stepwise Polynomial Regression Sample
This is a stepwise polynomial regression equation that predicts the number of insurance claims from the values of the independent variables salary and car location. Car location is a categorical variable. Its value attribute can take on two possible values, carpark and street.
number of claims = 3216.38 - 0.08 salary + 9.54E-7 salary**2 - 2.67E-12 salary**3 + 93.78 car location( carpark ) + 288.75 car location( street )
The corresponding XML model is:
<RegressionModel modelName="Sample for stepwise polynomial regression" modelType="stepwisePolynomialRegression" targetVariableName="number of claims">
  <RegressionTable intercept="3216.38">
    <NumericPredictor name="salary" exponent="1" coefficient="-0.08"/>
    <NumericPredictor name="salary" exponent="2" coefficient="9.54E-7"/>
    <NumericPredictor name="salary" exponent="3" coefficient="-2.67E-12"/>
    <CategoricalPredictor name="car location" value="carpark" coefficient="93.78"/>
    <CategoricalPredictor name="car location" value="street" coefficient="288.75"/>
  </RegressionTable>
</RegressionModel>
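To make the role of the repeated salary predictors concrete, the sketch below evaluates the polynomial directly in Python. The constants are transcribed from the sample model; the names SALARY_TERMS, CAR_LOCATION_COEFFICIENTS, and predict_claims, as well as the salary of 50000, are illustrative assumptions.

```python
# Intercept and terms transcribed from the sample model above.
INTERCEPT = 3216.38
SALARY_TERMS = [(1, -0.08), (2, 9.54e-7), (3, -2.67e-12)]  # (exponent, coefficient)
CAR_LOCATION_COEFFICIENTS = {"carpark": 93.78, "street": 288.75}


def predict_claims(salary: float, car_location: str) -> float:
    """Sum one term per NumericPredictor, then add the matching categorical term."""
    polynomial = sum(coef * salary ** exp for exp, coef in SALARY_TERMS)
    # A value listed in no CategoricalPredictor makes every indicator 0.
    categorical = CAR_LOCATION_COEFFICIENTS.get(car_location, 0.0)
    return INTERCEPT + polynomial + categorical


# Hypothetical input: salary 50000, car parked in a carpark.
example = predict_claims(50000, "carpark")
```

For this input the terms are 3216.38 - 4000 + 2385 - 333.75 + 93.78, which comes to roughly 1361.41.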