PMML Polynomial Regression

PMML 1.1 -- Polynomial Regression

Model DTD and Tag Description

The regression functions are used to determine the relationship between the dependent variable (target variable) and one or more independent variables. The dependent variable is the one whose values you want to predict, whereas the independent variables are the variables that you base your prediction on.

The regression formula is:

    Dependent variable = intercept + Sum_i (coefficient_i * independent variable_i) + error
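As a minimal sketch of how the deterministic part of this formula can be evaluated, the following Python function sums the coefficient-value products and adds the intercept. The error term is unknown at scoring time and is left out. The function name `predict` and its arguments are illustrative assumptions, not part of PMML.

```python
def predict(intercept, coefficients, values):
    """Return intercept + sum_i(coefficient_i * value_i).

    The error term of the regression formula is not modeled here.
    """
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(c * v for c, v in zip(coefficients, values))


# Hypothetical model with two independent variables.
print(predict(1.0, [2.0, 0.5], [3.0, 4.0]))  # 1.0 + 6.0 + 2.0 = 9.0
```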

    			
    	<!-- regression model.  -->
    	
    	<!ELEMENT RegressionModel (Extension*, RegressionTable) >
    	
    	<!ATTLIST RegressionModel
    	    modelType (linearRegression| stepwisePolynomialRegression)    #REQUIRED
    	    targetVariableName         %FIELD-NAME;                       #REQUIRED
    	    modelName                  CDATA                              #IMPLIED
    	>
    		
    	<!ELEMENT RegressionTable ((NumericPredictor*), (CategoricalPredictor*))>
    		
    	<!ATTLIST RegressionTable
    	     intercept          %REAL-NUMBER;                #REQUIRED
    	>
    		
    	<!ELEMENT NumericPredictor EMPTY>
                    
    	<!ATTLIST NumericPredictor
    	     name              %FIELD-NAME;                  #REQUIRED
    	     exponent          %INT-NUMBER;                  #REQUIRED
    	     coefficient       %REAL-NUMBER;                 #REQUIRED
    	     mean              %REAL-NUMBER;                 #IMPLIED
    	>
    		
    	<!ELEMENT CategoricalPredictor EMPTY>
    	
    	<!ATTLIST CategoricalPredictor
    	     name              %FIELD-NAME;                  #REQUIRED
    	     value             CDATA                         #REQUIRED
    	     coefficient       %REAL-NUMBER;                 #REQUIRED
    	>
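The constraints expressed by the DTD above can be spot-checked in code. The following Python sketch is not a full DTD validator: it only checks the root element, the enumerated modelType values, the required attributes, and the order of predictors inside RegressionTable. The function name `check_model` and the wording of the messages are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MODEL_TYPES = {"linearRegression", "stepwisePolynomialRegression"}
REQUIRED = {
    "RegressionModel": ["modelType", "targetVariableName"],
    "RegressionTable": ["intercept"],
    "NumericPredictor": ["name", "exponent", "coefficient"],
    "CategoricalPredictor": ["name", "value", "coefficient"],
}


def check_model(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "RegressionModel":
        return ["root element must be RegressionModel"]
    problems = []
    if root.get("modelType") not in MODEL_TYPES:
        problems.append("modelType must be one of the enumerated values")
    tables = root.findall("RegressionTable")
    if len(tables) != 1:
        problems.append("exactly one RegressionTable is expected")
    for element in root.iter():
        for attribute in REQUIRED.get(element.tag, []):
            if element.get(attribute) is None:
                problems.append(f"{element.tag} lacks required attribute {attribute}")
    for table in tables:
        tags = [child.tag for child in table]
        # A stable sort that puts NumericPredictor first leaves the list
        # unchanged only if all NumericPredictor elements already come first.
        if tags != sorted(tags, key=lambda t: t != "NumericPredictor"):
            problems.append("NumericPredictor elements must come first")
    return problems


good = (
    '<RegressionModel modelType="linearRegression" targetVariableName="y">'
    '<RegressionTable intercept="0"/></RegressionModel>'
)
print(check_model(good))  # []
```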
    	
    

    RegressionModel : The root element of an XML regression model. Each instance of a regression model must start with this element.

      modelName : This is a unique identifier specifying the name of the regression model.

      modelType : Specifies the type of a regression model. This information is used to select the appropriate mathematical formulas during the scoring phase. The supported values are linearRegression and stepwisePolynomialRegression.

      targetVariableName : The name of the target variable (also called response variable).

    RegressionTable : A table that holds the intercept and lists all predictors (independent variables) together with their coefficients.

    NumericPredictor : Defines a numeric independent variable. The attributes are the name of the variable, the exponent to which its value is raised, and the coefficient by which the result is multiplied. If the value of the independent variable is missing, it is replaced with the value of the mean attribute.
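The contribution of a single NumericPredictor can be sketched as follows. The function name `numeric_term` is hypothetical, and raising an error when a value is missing and no mean is available is an assumption; the description above does not say what should happen in that case.

```python
def numeric_term(coefficient, exponent, value, mean=None):
    """Contribution of one NumericPredictor: coefficient * value ** exponent.

    A missing value (None) is replaced by `mean` when one was supplied.
    """
    if value is None:
        if mean is None:
            # Assumption: the specification text does not cover this case.
            raise ValueError("value is missing and no mean was given")
        value = mean
    return coefficient * value ** exponent


print(numeric_term(2.0, 2, 3.0))             # 2.0 * 3.0**2 = 18.0
print(numeric_term(2.0, 1, None, mean=4.0))  # missing value -> 2.0 * 4.0 = 8.0
```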

    CategoricalPredictor : Defines a categorical independent variable. The attributes are the name of the variable, the value attribute, and the coefficient by which the indicator for that value is multiplied. Categorical values cannot enter the calculation directly, so each value is encoded as an indicator. If the specified value of the independent variable occurs, the term variable_name(value) is replaced with 1, so the coefficient is multiplied by 1. If the value does not occur, the term variable_name(value) is replaced with 0, so that the product

    coefficient × variable_name(value)

      yields 0. Consequently, the product drops out of the sum.

    Example:

    The following regression formula is used to predict the number of insurance claims:

      number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )

    If the value carpark was specified for car location in a particular record, you would get the following formula:

      number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 × 1 + 325.03 × 0
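The indicator substitution can be sketched in Python as follows. The helper `categorical_term` is a hypothetical name, and the age and salary in the record are made-up values chosen for illustration; only the car location value carpark comes from the example.

```python
def categorical_term(coefficient, predictor_value, record_value):
    """coefficient * indicator, where the indicator is 1 on a match, else 0."""
    indicator = 1 if record_value == predictor_value else 0
    return coefficient * indicator


# Hypothetical record: age and salary are illustrative assumptions.
record = {"age": 30, "salary": 50000, "car location": "carpark"}

claims = (
    132.37
    + 7.1 * record["age"]
    + 0.01 * record["salary"]
    + categorical_term(41.1, "carpark", record["car location"])
    + categorical_term(325.03, "street", record["car location"])
)
print(round(claims, 2))  # approximately 886.47
```

Because car location is carpark, the street term contributes 0 and only the 41.1 coefficient is added.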


    Linear Regression Sample

    This is a linear regression equation that predicts the number of insurance claims from the values of the independent variables age, salary and car location. Car location is the only categorical variable. Its value attribute can take on two possible values, carpark and street.

      number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )

    The corresponding XML model is:

    				
    	<RegressionModel modelName="Sample for linear regression" modelType="linearRegression" targetVariableName="number of claims">
    				
    		<RegressionTable intercept="132.37">
    			<NumericPredictor     name="age"          exponent="1"    coefficient="7.1"/>
    			<NumericPredictor     name="salary"       exponent="1"    coefficient="0.01"/>
    			<CategoricalPredictor name="car location" value="carpark" coefficient="41.1"/>
    			<CategoricalPredictor name="car location" value="street"  coefficient="325.03"/>
    		</RegressionTable>
    	
    	</RegressionModel>
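To illustrate how such a model could be applied, here is a minimal Python sketch that parses the sample with the standard library's `xml.etree.ElementTree` and scores one record. The function name `score`, the dictionary-based record, and the record's values are assumptions made for illustration; the sample is reproduced inside the block so that it is self-contained.

```python
import xml.etree.ElementTree as ET

MODEL_XML = """
<RegressionModel modelName="Sample for linear regression"
                 modelType="linearRegression"
                 targetVariableName="number of claims">
  <RegressionTable intercept="132.37">
    <NumericPredictor name="age" exponent="1" coefficient="7.1"/>
    <NumericPredictor name="salary" exponent="1" coefficient="0.01"/>
    <CategoricalPredictor name="car location" value="carpark" coefficient="41.1"/>
    <CategoricalPredictor name="car location" value="street" coefficient="325.03"/>
  </RegressionTable>
</RegressionModel>
"""


def score(model_xml, record):
    """Evaluate a RegressionModel document for one record (dict of field values)."""
    table = ET.fromstring(model_xml).find("RegressionTable")
    result = float(table.get("intercept"))
    for p in table.findall("NumericPredictor"):
        value = record.get(p.get("name"))
        if value is None:
            # Missing value: fall back to the optional mean attribute.
            # This raises TypeError if the model carries no mean.
            value = float(p.get("mean"))
        result += float(p.get("coefficient")) * value ** int(p.get("exponent"))
    for p in table.findall("CategoricalPredictor"):
        # The indicator is 1 only when the record holds this predictor's value.
        if record.get(p.get("name")) == p.get("value"):
            result += float(p.get("coefficient"))
    return result


# Hypothetical record, not taken from the specification.
record = {"age": 30, "salary": 50000, "car location": "street"}
print(round(score(MODEL_XML, record), 2))
```

For this record the street indicator is 1, so the sum is 132.37 + 7.1 × 30 + 0.01 × 50000 + 325.03.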
    	
    

    Stepwise Polynomial Regression Sample

    This is a stepwise polynomial regression equation that predicts the number of insurance claims from the values of the independent variables salary and car location. Car location is a categorical variable. Its value attribute can take on two possible values, carpark and street.

      number of claims = 3216.38 - 0.08 salary + 9.54E-7 salary**2 - 2.67E-12 salary**3 + 93.78 car location( carpark ) + 288.75 car location( street )

    The corresponding XML model is:

    
    	<RegressionModel modelName="Sample for stepwise polynomial regression" modelType="stepwisePolynomialRegression" targetVariableName="number of claims">
    				
    		<RegressionTable intercept="3216.38">
    		  	<NumericPredictor     name="salary"       exponent="1"    coefficient="-0.08"/>
    			<NumericPredictor     name="salary"       exponent="2"    coefficient="9.54E-7"/>
    			<NumericPredictor     name="salary"       exponent="3"    coefficient="-2.67E-12"/>
    			<CategoricalPredictor name="car location" value="carpark" coefficient="93.78"/>
    			<CategoricalPredictor name="car location" value="street"  coefficient="288.75"/>
    		</RegressionTable>
    			
    	</RegressionModel>
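The three NumericPredictor elements that share the name salary together form a cubic polynomial in salary. A minimal Python sketch that evaluates the sample equation directly is shown here; the function name `claims` and the input values are illustrative assumptions, and treating an unlisted car location as contributing 0 follows from the indicator rule for categorical predictors.

```python
def claims(salary, car_location):
    """Evaluate the stepwise polynomial sample equation for one record."""
    numeric = (
        -0.08 * salary
        + 9.54e-7 * salary ** 2
        - 2.67e-12 * salary ** 3
    )
    # Each categorical coefficient applies only when its value occurs.
    categorical = {"carpark": 93.78, "street": 288.75}.get(car_location, 0.0)
    return 3216.38 + numeric + categorical


# Hypothetical input values.
print(round(claims(10000, "carpark"), 2))  # approximately 2602.89
```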
    
    
e-mail info at dmg.org