PMML 4.4.1 - Statistics

This schema for statistics provides a basic framework for representing variable statistics. It is used by schemas for specific models as ModelStats.

<xs:element name="ModelStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="UnivariateStats" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MultivariateStats" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The statistics for a model can consist of statistics for fields in isolation from other fields (UnivariateStats) and statistics for fields in the presence of other fields (MultivariateStats and ANOVA).

Univariate Statistics

<xs:element name="UnivariateStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Counts" minOccurs="0"/>
      <xs:element ref="NumericInfo" minOccurs="0"/>
      <xs:element ref="DiscrStats" minOccurs="0"/>
      <xs:element ref="ContStats" minOccurs="0"/>
      <xs:element ref="Anova" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="field" type="FIELD-NAME"/>
    <xs:attribute name="weighted" default="0">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="0"/>
          <xs:enumeration value="1"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute> 
  </xs:complexType>
</xs:element>

A UnivariateStats element contains statistical information about a single MiningField or DerivedField and should reflect the data used to create the model (before any missing/outlier/invalid value handling is performed by the MiningSchema). UnivariateStats elements in ModelStats must name the field they refer to in the attribute field. In other contexts field may be omitted. The attribute weighted indicates whether the counts for the field are computed using the weight variable. By default the weight is not used. Discrete and continuous statistics are possible simultaneously for numeric fields. This may be important if a numeric field has too many discrete values. The statistics can include the most frequent values and also a complete histogram distribution.

Statistics for ordinal fields are contained in DiscrStats.

<xs:element name="Counts">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="totalFreq" type="NUMBER" use="required"/>
    <xs:attribute name="missingFreq" type="NUMBER"/>
    <xs:attribute name="invalidFreq" type="NUMBER"/>
    <xs:attribute name="cardinality" type="xs:nonNegativeInteger"/>
  </xs:complexType>
</xs:element>

The element Counts carries counters for frequency of values with respect to their state of being missing, invalid, or valid. The counts can be non-integer if they are weighted.

totalFreq counts all records, same as for statistics of all MiningFields.
missingFreq counts the number of records where the value is missing.
invalidFreq counts the number of records with values other than valid. The total frequency includes the missing values and invalid values.
The attribute cardinality is the number of unique, or distinct, values that the variable has.
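
The relationship between these counters can be illustrated with a short Python sketch. It assumes that the number of valid records is whatever remains after subtracting missingFreq and invalidFreq from totalFreq, and that an absent optional attribute counts as zero. The helper name valid_freq and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def valid_freq(counts_xml: str) -> float:
    """Return totalFreq minus the missing and invalid frequencies."""
    counts = ET.fromstring(counts_xml)
    total = float(counts.get("totalFreq"))          # required attribute
    missing = float(counts.get("missingFreq", 0))   # optional, assumed 0 if absent
    invalid = float(counts.get("invalidFreq", 0))   # optional, assumed 0 if absent
    return total - missing - invalid

# Hypothetical Counts element; the numbers are illustrative only.
sample = '<Counts totalFreq="240" missingFreq="5" invalidFreq="3" cardinality="48"/>'
print(valid_freq(sample))
```

Frequencies are read as floating-point numbers because weighted counts need not be integers.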

<xs:element name="NumericInfo">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Quantile" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="minimum" type="NUMBER"/>
    <xs:attribute name="maximum" type="NUMBER"/>
    <xs:attribute name="mean" type="NUMBER"/>
    <xs:attribute name="standardDeviation" type="NUMBER"/>
    <xs:attribute name="median" type="NUMBER"/>
    <xs:attribute name="interQuartileRange" type="NUMBER"/>
  </xs:complexType>
</xs:element>

The values for mean, minimum, maximum and standardDeviation are defined as usual. median is calculated as the 50% quantile; interQuartileRange is calculated as (75% quantile - 25% quantile).
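
As an illustration, the attributes of NumericInfo could be derived from raw values as in the following Python sketch. The quantile interpolation method and the use of the sample (rather than population) standard deviation are assumptions, since the text above does not fix them; the function name numeric_info is hypothetical.

```python
import statistics

def numeric_info(values):
    """Compute the NumericInfo attributes for a list of numbers."""
    # Quartiles with linear interpolation between data points (assumed method).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.fmean(values),
        "standardDeviation": statistics.stdev(values),  # sample standard deviation (assumption)
        "median": q2,                   # 50% quantile
        "interQuartileRange": q3 - q1,  # 75% quantile - 25% quantile
    }

info = numeric_info([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(info["median"], info["interQuartileRange"])
```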

<xs:element name="Quantile">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="quantileLimit" type="PERCENTAGE-NUMBER" use="required"/>
    <xs:attribute name="quantileValue" type="NUMBER" use="required"/>
  </xs:complexType>
</xs:element>

quantileLimit is a percentage number between 0 and 100. quantileValue is the corresponding value in the domain of field values.

Discrete Statistics

<xs:element name="DiscrStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Array" minOccurs="0" maxOccurs="2"/>
    </xs:sequence>
    <xs:attribute name="modalValue" type="xs:string"/>
  </xs:complexType>
</xs:element>

modalValue is the most frequent discrete value. If only a single array is present, it must be of type NUM-ARRAY and contain a compact representation of all frequency numbers. If there is an additional STRING-ARRAY then the frequency numbers in the NUM-ARRAY correspond to the string values one by one. Otherwise the frequency numbers correspond to the list of (valid) values as given in the DataDictionary. If the statistics refer to a DerivedField, then all values are listed in the STRING-ARRAY because a DerivedField has no means to specify a list of valid values. If DiscrStats contains an array of string values and the corresponding DataField defines a list of valid values, then the set of values listed in the array in DiscrStats must be a subset of the set of valid values specified in the DataField.
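
The pairing of frequencies with values described above can be sketched in Python as follows. The function name discrete_frequencies is hypothetical, and the sketch simplifies array parsing by splitting on whitespace, so quoted string values that contain spaces are not handled.

```python
import xml.etree.ElementTree as ET

def discrete_frequencies(discr_stats, data_field=None):
    """Map each discrete value to its frequency for one DiscrStats element."""
    arrays = discr_stats.findall("Array")
    # The numeric array holds the frequencies; a string array, if present, holds the values.
    freqs = next(a for a in arrays if a.get("type") in ("int", "real"))
    labels = next((a for a in arrays if a.get("type") == "string"), None)
    if labels is not None:
        values = labels.text.split()
    else:
        # Fall back to the valid values listed for the DataField.
        values = [v.get("value") for v in data_field.findall("Value")
                  if v.get("property", "valid") == "valid"]
    return dict(zip(values, (float(x) for x in freqs.text.split())))

field = ET.fromstring(
    '<DataField name="Sex" optype="categorical" dataType="string">'
    '<Value value="female"/><Value value="male"/></DataField>'
)
stats = ET.fromstring('<DiscrStats><Array type="int" n="2"> 166 74</Array></DiscrStats>')
print(discrete_frequencies(stats, field))
```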

Continuous Statistics

<xs:element name="ContStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Interval" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="FrequenciesType" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="totalValuesSum" type="NUMBER"/>
    <xs:attribute name="totalSquaresSum" type="NUMBER"/>
  </xs:complexType>
</xs:element>

<xs:group name="FrequenciesType">
  <xs:sequence>
    <xs:group ref="NUM-ARRAY" minOccurs="1" maxOccurs="3"/>
  </xs:sequence>
</xs:group>

The three ARRAYs contain the frequencies, sum of values, and sum of squared values for each interval.

Note: Interval is defined in the schema for DataDictionary.
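
One use of the three arrays is to recover per-interval moments. The following Python sketch, with the hypothetical function name interval_moments and made-up input arrays, derives the mean and the population variance of each interval from its frequency, sum of values, and sum of squared values.

```python
def interval_moments(freqs, sums, square_sums):
    """Return (mean, population variance) per interval, or None for empty intervals."""
    moments = []
    for n, s, ss in zip(freqs, sums, square_sums):
        if n == 0:
            moments.append(None)
            continue
        mean = s / n
        variance = ss / n - mean * mean  # E[x^2] - (E[x])^2
        moments.append((mean, variance))
    return moments

# Hypothetical arrays for two intervals holding the values {30, 32} and {35, 36, 37}.
print(interval_moments([2, 3], [62, 108], [1924, 3890]))
```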

Example

Here is an example excerpt which gives univariate statistics for the continuous field Age and the categorical field Sex:

...
<DataDictionary>
  <DataField name="Sex" optype="categorical" dataType="string">
    <Value value="female"/>
    <Value value="male"/>
  </DataField>
  <DataField name="Age" optype="continuous" dataType="double"/>
  ...
</DataDictionary>
...
  <ModelStats>
    <UnivariateStats field="Age">
      <Counts totalFreq="240"/>
      <NumericInfo mean="54.43" minimum="29" maximum="77"/>
      <ContStats>
        <Interval closure="openClosed" leftMargin="29" rightMargin="33.8"/>
        <Interval closure="openClosed" leftMargin="33.8" rightMargin="38.6"/>
        <Interval closure="openClosed" leftMargin="38.6" rightMargin="43.4"/>
        <Interval closure="openClosed" leftMargin="43.4" rightMargin="48.2"/>
        <Interval closure="openClosed" leftMargin="48.2" rightMargin="53"/>
        <Interval closure="openClosed" leftMargin="53" rightMargin="57.8"/>
        <Interval closure="openClosed" leftMargin="57.8" rightMargin="62.6"/>
        <Interval closure="openClosed" leftMargin="62.6" rightMargin="67.4"/>
        <Interval closure="openClosed" leftMargin="67.4" rightMargin="72.2"/>
        <Interval closure="openClosed" leftMargin="72.2" rightMargin="77"/>
        <Array type="int" n="10"> 1 8 28 30 30 43 51 33 13 3</Array>
      </ContStats>
    </UnivariateStats>
    <UnivariateStats field="Sex">
      <Counts totalFreq="240"/>
      <DiscrStats>
        <Array type="int" n="2"> 166 74</Array>
      </DiscrStats>
    </UnivariateStats>
    ...
  </ModelStats>
  ...

The array in DiscrStats for field Sex refers to the respective Value elements in the DataDictionary. Out of the total 240 instances, 166 are female while 74 are male.
Field Age has a mean of 54.43 years, with a minimum of 29 and a maximum of 77. The ContStats gives more details about the actual distribution. For instance, in 43 records the value of Age is larger than 53 years and less than or equal to 57.8.
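
The way a value is matched to an interval in this example can be sketched in Python. The helper names in_interval and frequency_for are hypothetical; the margins and frequencies are copied from the ContStats element above.

```python
def in_interval(x, left, right, closure="openClosed"):
    """Test membership using the Interval closure names (openOpen, openClosed, closedOpen, closedClosed)."""
    left_ok = x >= left if closure.startswith("closed") else x > left
    right_ok = x <= right if closure.endswith("Closed") else x < right
    return left_ok and right_ok

margins = [29, 33.8, 38.6, 43.4, 48.2, 53, 57.8, 62.6, 67.4, 72.2, 77]
freqs = [1, 8, 28, 30, 30, 43, 51, 33, 13, 3]

def frequency_for(age):
    """Return the frequency of the interval containing age, or None if no interval matches."""
    for left, right, freq in zip(margins, margins[1:], freqs):
        if in_interval(age, left, right):
            return freq
    return None

print(frequency_for(55))    # falls in (53, 57.8]
print(frequency_for(57.8))  # the right margin is included
print(frequency_for(29))    # the left margin of the first interval is excluded
```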

Multivariate Statistics

<xs:element name="MultivariateStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MultivariateStat" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="targetCategory" type="xs:string" use="optional"/>
  </xs:complexType>
</xs:element>

<xs:element name="MultivariateStat">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string"/>
    <xs:attribute name="category" type="xs:string"/>
    <xs:attribute name="exponent" type="INT-NUMBER" default="1"/>
    <xs:attribute name="isIntercept" type="xs:boolean" default="false"/>
    <xs:attribute name="importance" type="PROB-NUMBER"/>
    <xs:attribute name="stdError" type="NUMBER"/>
    <xs:attribute name="tValue" type="NUMBER"/>
    <xs:attribute name="chiSquareValue" type="NUMBER"/>
    <xs:attribute name="fStatistic" type="NUMBER"/>
    <xs:attribute name="dF" type="NUMBER"/>
    <xs:attribute name="pValueAlpha" type="PROB-NUMBER"/>
    <xs:attribute name="pValueInitial" type="PROB-NUMBER"/>
    <xs:attribute name="pValueFinal" type="PROB-NUMBER"/>
    <xs:attribute name="confidenceLevel" type="PROB-NUMBER" default="0.95"/>
    <xs:attribute name="confidenceLowerBound" type="NUMBER"/>
    <xs:attribute name="confidenceUpperBound" type="NUMBER"/>
  </xs:complexType>
</xs:element>

A MultivariateStats element is a container for MultivariateStat elements, each of which contains statistical information about a single MiningField, DerivedField or regression parameter or regression intercept. These statistics differ from those in UnivariateStats since they reflect the influence of the dependent variable and potentially other variables present in the model. The targetCategory attribute is used with classification models to specify the target category to which these statistics refer.

Each MultivariateStat's name attribute refers to either:

A MiningField from the MiningSchema of this model element (i.e., the MultivariateStats parent element)
A DerivedField defined within the context of this model element
A regression parameter for a regression model:
- When isIntercept attribute is true, the statistics are for the intercept. name, exponent and category are to be ignored.
- When isIntercept attribute is false (or omitted):
  - For RegressionModel parameters:
    - NumericPredictors are referenced by their name and the exponent attribute (or its default) is used to identify the appropriate NumericPredictor.
    - CategoricalPredictors are referenced by their name and the category attribute is required to identify the appropriate CategoricalPredictor.
    - PredictorTerms are referenced by their name.
  - For GeneralRegressionModel parameters:
    - PCells are referenced by their parameterName attribute and (optional) targetCategory attribute (which is referenced using the MultivariateStats element's targetCategory attribute).
  - Note: In the case that the MultivariateStat's name attribute refers to a regression parameter that has the same name as a MiningField or a DerivedField, by convention the statistics apply to the regression parameter.

importance has the same definition as the MiningField attribute and states the relative importance of the field, which in this case can be a MiningField or a DerivedField. This indicator is typically used in predictive models in order to rank fields by their predictive contribution. A value of 1.0 suggests that the target field is directly correlated to this field. A value of 0.0 suggests that the field is completely irrelevant.

stdError is the standard error (i.e., the estimated standard deviation) of the estimate of the individual coefficient.

tValue is for the T-distribution, used to assess the significance of individual coefficients. tValue = Coefficient/stdError

chiSquareValue can be used to test the true value of the parameter based on the sample estimate. It is computed for regression parameters as Coefficient*Coefficient/Variance.

fStatistic is used to determine whether the observed relationship between an independent variable and its dependent variable occurs by chance. It is usually computed for model effects.

dF is the Degrees of Freedom, dF = n - p - 1, where n is the number of records and p is the number of active variables.
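
The formulas for tValue, chiSquareValue and dF given above translate directly into code. The following Python sketch uses hypothetical function names and made-up input values; it is only meant to show the arithmetic.

```python
def t_value(coefficient, std_error):
    """tValue = Coefficient / stdError."""
    return coefficient / std_error

def chi_square_value(coefficient, variance):
    """chiSquareValue = Coefficient * Coefficient / Variance."""
    return coefficient * coefficient / variance

def degrees_of_freedom(n_records, n_active_variables):
    """dF = n - p - 1."""
    return n_records - n_active_variables - 1

# Hypothetical coefficient 10.34 with standard error 2.2.
print(round(t_value(10.34, 2.2), 2))
print(round(chi_square_value(10.34, 2.2 ** 2), 2))
print(degrees_of_freedom(101, 3))
```

When the variance is taken to be the square of stdError, chiSquareValue equals the square of tValue.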

For applications that perform automatic variable reduction, where independent variables whose p-Value (representing the probability that the observed relationship with its dependent variable occurs by chance) exceeds a certain threshold (typically called Alpha) are automatically eliminated:

pValueAlpha is the variable elimination threshold.
pValueInitial indicates why the variable is or is not in the final model.
pValueFinal is the pValue with the variables present in the final model, after any variable reduction has taken place.
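
A consumer could use these attributes to list the variables that were dropped. The following Python sketch assumes that a variable counts as eliminated when its pValueInitial exceeds pValueAlpha, which is one reading of the description above rather than a rule stated by the schema. The function name eliminated_variables and the abbreviated sample document are illustrative.

```python
import xml.etree.ElementTree as ET

PMML = """
<MultivariateStats>
  <MultivariateStat isIntercept="true" pValueAlpha="0.05" pValueInitial="0.01" pValueFinal="0.02"/>
  <MultivariateStat name="Quarter_Idx" pValueAlpha="0.05" pValueInitial="0.02" pValueFinal="0.04"/>
  <MultivariateStat name="Quarter_4th" pValueAlpha="0.05" pValueInitial="0.06"/>
</MultivariateStats>
"""

def eliminated_variables(stats_xml):
    """List the names whose pValueInitial exceeds pValueAlpha (assumed elimination rule)."""
    root = ET.fromstring(stats_xml)
    dropped = []
    for stat in root.iter("MultivariateStat"):
        if stat.get("isIntercept", "false") in ("true", "1"):
            continue  # the intercept is not treated as a candidate variable in this sketch
        alpha = stat.get("pValueAlpha")
        initial = stat.get("pValueInitial")
        if alpha is not None and initial is not None and float(initial) > float(alpha):
            dropped.append(stat.get("name"))
    return dropped

print(eliminated_variables(PMML))
```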

For applications that use confidence interval statistics to indicate the reliability of an estimate, where a particular confidence level is intended to give the assurance that, if the statistical model is correct, then taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that included the true value of the parameter the proportion of the time set by the confidence level:

confidenceLevel is the confidence level (a confidence level of 95% is typical)
confidenceLowerBound is the lower boundary of the confidence interval
confidenceUpperBound is the upper boundary of the confidence interval
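
The schema does not prescribe how the bounds are derived. As a rough illustration only, the following Python sketch builds a symmetric interval around a coefficient using a normal approximation; regression software more commonly uses the t-distribution with dF degrees of freedom. The function name confidence_bounds and the input values are hypothetical.

```python
from statistics import NormalDist

def confidence_bounds(coefficient, std_error, confidence_level=0.95):
    """Return (lower, upper) bounds under a normal approximation (an assumption)."""
    # Two-sided interval: put half of the remaining probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return coefficient - z * std_error, coefficient + z * std_error

lower, upper = confidence_bounds(10.34, 2.2)
print(round(lower, 2), round(upper, 2))
```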

Example

Here is an example excerpt which gives multivariate statistics for the independent variables of a regression model (note that Quarter_4th was eliminated by automatic variable reduction):

...
<ModelStats>
  ...
  <MultivariateStats>
    <MultivariateStat isIntercept="true" importance="0.98" stdError="0.3" tValue="3.6" pValueAlpha="0.05" pValueInitial="0.01" pValueFinal="0.02" />
    <MultivariateStat name="Quarter_Idx" importance="0.96" stdError="2.2" tValue="4.7" pValueAlpha="0.05" pValueInitial="0.02" pValueFinal="0.04" />
    <MultivariateStat name="Quarter_2nd" importance="0.95" stdError="3.2" tValue="3.5" pValueAlpha="0.05" pValueInitial="0.03" pValueFinal="0.05" />
    <MultivariateStat name="Quarter_3rd" importance="0.94" stdError="2.1" tValue="7.3" pValueAlpha="0.05" pValueInitial="0.04" pValueFinal="0.06" />
    <MultivariateStat name="Quarter_4th" importance="0" stdError="2.3" tValue="4.4" pValueAlpha="0.05" pValueInitial="0.06" />
  </MultivariateStats>
  ...
</ModelStats>
...

ANOVA

PMML supports inclusion of ANOVA (Analysis of Variance) information that, while descriptive in nature, can help understand the relationship between certain independent variables and the target (dependent) variable. Specifically, the analysis is performed using one independent categorical variable, X, and a continuous target (dependent) variable, Y (usually found in types of regression and time series models).

<xs:element name="Anova">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="AnovaRow" minOccurs="3" maxOccurs="3"/>
    </xs:sequence>
    <xs:attribute name="target" type="FIELD-NAME"/>
  </xs:complexType>
</xs:element>

<xs:element name="AnovaRow">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" use="required">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="Model"/>
          <xs:enumeration value="Error"/>
          <xs:enumeration value="Total"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="sumOfSquares" type="NUMBER" use="required"/>
    <xs:attribute name="degreesOfFreedom" type="NUMBER" use="required"/>
    <xs:attribute name="meanOfSquares" type="NUMBER"/>
    <xs:attribute name="fValue" type="NUMBER"/>
    <xs:attribute name="pValue" type="PROB-NUMBER"/>
  </xs:complexType>
</xs:element>

There can be an Anova element for each target (dependent) variable defined in the model. The target attribute is used to specify the target variable of this ANOVA. If there is only one target variable, then the target attribute is optional. But if the model specifies more than one target, this attribute must indicate the name of the target variable analyzed with respect to this variable (identified in the parent UnivariateStats element's field attribute).

The fValue and pValue attributes are required when the AnovaRow is of type Model. They are not used for Error and Total AnovaRow types. The meanOfSquares is required when the AnovaRow type is Model or Error.
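
These requirements go beyond what the schema itself enforces, so a consumer may want to check them separately. The following Python sketch does so; it additionally assumes that the three rows are one each of Model, Error and Total, which the schema does not state explicitly. The function name check_anova is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes that must be present for each AnovaRow type.
REQUIRED = {
    "Model": {"sumOfSquares", "degreesOfFreedom", "meanOfSquares", "fValue", "pValue"},
    "Error": {"sumOfSquares", "degreesOfFreedom", "meanOfSquares"},
    "Total": {"sumOfSquares", "degreesOfFreedom"},
}

def check_anova(anova_xml):
    """Return a list of problems found in one Anova element."""
    anova = ET.fromstring(anova_xml)
    rows = anova.findall("AnovaRow")
    problems = []
    # Assumption: exactly one row of each type is expected.
    if sorted(row.get("type", "") for row in rows) != ["Error", "Model", "Total"]:
        problems.append("expected one Model, one Error and one Total row")
    for row in rows:
        row_type = row.get("type", "")
        missing = REQUIRED.get(row_type, set()) - set(row.attrib)
        for name in sorted(missing):
            problems.append(f"{row_type} row lacks {name}")
    return problems

example = """
<Anova>
  <AnovaRow type="Model" sumOfSquares="26811371" degreesOfFreedom="6" meanOfSquares="4468561" fValue="1.699" pValue="0.13"/>
  <AnovaRow type="Error" sumOfSquares="247218517" degreesOfFreedom="94" meanOfSquares="2629984"/>
  <AnovaRow type="Total" sumOfSquares="274029888" degreesOfFreedom="100"/>
</Anova>
"""
print(check_anova(example))
```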

ANOVA is used to estimate the effect of X, an independent variable, on Y, the target (dependent) variable. It does this by testing the hypothesis that X has no effect on Y, called the null hypothesis. The null hypothesis is rejected if the probability is very low (less than the specified α) and not rejected otherwise. The probability is obtained using the F-distribution.

Let's assume X contains k categories and there are a total of N records (or samples, rows, etc.). By analyzing the Y values amongst and within each category, we can test the null hypothesis and determine if X has an impact on Y. Sum-of-squares (SS) calculations are made which total the squared difference between all the Y values of a group and the average (or mean) of the Y values for that group. The sum-of-squares for Y can be partitioned into two groups: the sum-of-squares within each category and between the k categories. Thus,

SS_Total = SS_Model + SS_Error

Where SS_Total is the SS for all records, SS_Model is the SS between each category's mean and the total population's mean, and SS_Error is the SS between each record and the mean of that record's category.

Similarly, the degrees of freedom (DF) of the data can be partitioned into two groups as well:

DF_Total = DF_Model + DF_Error

Where DF_Total is N - 1, DF_Model is k - 1 and DF_Error is N - k.

These calculations can then be used to compute the Mean of Squares (MS, from SS/DF), F-Value (from MS_Model/MS_Error) and its resulting p-Value. These results are summarized in an ANOVA table, represented as:

Type	Sum of Squares	Degrees of Freedom	Mean of SS	F-Value	p-Value
Model	SS_Model	DF_Model	MS_Model	MS_Model/MS_Error	p-Value
Error	SS_Error	DF_Error	MS_Error
Total	SS_Model + SS_Error	DF_Model + DF_Error

Conventionally, in the context of ANOVA between a single independent variable and the target variable (also called one-factor ANOVA), the model and error variabilities are also referred to as between-sample variability and within-sample variability, respectively.
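
The partitioning described above can be sketched in Python for a one-factor ANOVA. The function name one_way_anova and the sample data are made up for illustration. The p-Value is left out because it requires the survival function of the F-distribution, which the Python standard library does not provide (SciPy's scipy.stats.f.sf is one option).

```python
def one_way_anova(groups):
    """Build the ANOVA table for groups, a dict mapping each category of X to its Y values."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n
    ss_model = ss_error = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_model += len(ys) * (mean - grand_mean) ** 2   # between categories
        ss_error += sum((y - mean) ** 2 for y in ys)     # within the category
    df_model, df_error = k - 1, n - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "Model": {"SS": ss_model, "DF": df_model, "MS": ms_model, "F": ms_model / ms_error},
        "Error": {"SS": ss_error, "DF": df_error, "MS": ms_error},
        "Total": {"SS": ss_model + ss_error, "DF": df_model + df_error},
    }

# Hypothetical data: three categories with three records each.
table = one_way_anova({"AA": [1, 2, 3], "BB": [2, 3, 4], "CC": [5, 6, 7]})
for row_type, row in table.items():
    print(row_type, row)
```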

ANOVA Example

This example uses ANOVA to study two categorical independent variables, X1 and X2, and a continuous target (dependent) variable, Y. We want to know the effect of these two variables on Y.

X1 has three categories: AA, BB, CC. X2 has seven categories: a, b, c, d, e, f, g. The number of records analyzed, N, is 101. We'll use α = 0.20 for our hypothesis test.

For X1, we have the following ANOVA table. The p-value is greater than our α value of 0.20, so the data do not provide significant evidence against the null hypothesis. We therefore do not reject the null hypothesis, which says that X1 has no effect on Y:

Type	Sum of Squares	Degrees of Freedom	Mean of SS	F-Value	p-Value
Model	21389708	2	10694854	1.30	0.28
Error	809210617	98	8257251
Total	830600325	100

For X2, we have another ANOVA table. The p-value is less than our α value of 0.20, so the observed relationship is unlikely to have occurred by chance if X2 had no effect on Y. We therefore reject the null hypothesis and conclude that X2 does have an effect on Y:

Type	Sum of Squares	Degrees of Freedom	Mean of SS	F-Value	p-Value
Model	26811371	6	4468561	1.70	0.13
Error	247218517	94	2629984
Total	274029888	100

The PMML for both ANOVA tables in this example is shown below:

...
<ModelStats>
  <UnivariateStats field="X1">
    ...
    <Anova>
      <AnovaRow type="Model" sumOfSquares="21389708" degreesOfFreedom="2" meanOfSquares="10694854" fValue="1.30" pValue="0.28" />
      <AnovaRow type="Error" sumOfSquares="809210617" degreesOfFreedom="98" meanOfSquares="8257251"/>
      <AnovaRow type="Total" sumOfSquares="830600325" degreesOfFreedom="100" />
    </Anova>
  </UnivariateStats>
  <UnivariateStats field="X2">
    ...
    <Anova>
      <AnovaRow type="Model" sumOfSquares="26811371" degreesOfFreedom="6" meanOfSquares="4468561" fValue="1.699" pValue="0.13" />
      <AnovaRow type="Error" sumOfSquares="247218517" degreesOfFreedom="94" meanOfSquares="2629984"/>
      <AnovaRow type="Total" sumOfSquares="274029888" degreesOfFreedom="100" />
    </Anova>
  </UnivariateStats>
  ...
</ModelStats>
...

Partitions

A Partition contains statistics for a subset of records, for example it can describe the population in a cluster. The content of a Partition mirrors the definition of the general univariate statistics. That is, each Partition describes the distribution per field. For each field there can be information about frequencies, numeric moments, etc.

<xs:element name="Partition">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="PartitionFieldStats" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="size" type="NUMBER"/>
  </xs:complexType>
</xs:element>

The attribute name identifies the Partition. The attribute size is the number of records. All aggregates in PartitionFieldStats must have size = totalFreq in Counts if specified.

<xs:element name="PartitionFieldStats">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Counts" minOccurs="0"/>
      <xs:element ref="NumericInfo" minOccurs="0"/>
      <xs:group ref="FrequenciesType" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="field" type="FIELD-NAME" use="required"/>
    <xs:attribute name="weighted" default="0">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="0"/>
          <xs:enumeration value="1"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

field refers to (the name of) a MiningField for background statistics. The sequence of NUM-ARRAYs is the same as for ContStats. For categorical fields there is only one array containing the frequencies; for numeric fields, the second and third array contain the sums of values and the sums of squared values, respectively. The number of values in each array must match the number of categories or intervals in UnivariateStats of the field.
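
The consistency rules for Partition and PartitionFieldStats can be checked mechanically. The following Python sketch uses a hypothetical function check_partition; the caller supplies the number of categories or intervals per field, which would normally be read from the DataDictionary and UnivariateStats.

```python
import xml.etree.ElementTree as ET

def check_partition(partition_xml, bins_per_field):
    """bins_per_field maps a field name to its number of categories or intervals."""
    partition = ET.fromstring(partition_xml)
    size = partition.get("size")
    problems = []
    for stats in partition.findall("PartitionFieldStats"):
        field = stats.get("field")
        counts = stats.find("Counts")
        # size must equal totalFreq whenever both are given.
        if size is not None and counts is not None:
            if float(counts.get("totalFreq")) != float(size):
                problems.append(f"{field}: totalFreq differs from size")
        # Every array needs one entry per category or interval of the field.
        for array in stats.findall("Array"):
            if len(array.text.split()) != bins_per_field[field]:
                problems.append(f"{field}: array length does not match UnivariateStats")
    return problems

node_partition = """
<Partition name="1.1" size="134">
  <PartitionFieldStats field="Age">
    <Array type="int" n="10"> 1 5 17 24 17 19 23 17 8 3</Array>
  </PartitionFieldStats>
  <PartitionFieldStats field="Sex">
    <Array type="int" n="2"> 70 64</Array>
  </PartitionFieldStats>
</Partition>
"""
print(check_partition(node_partition, {"Age": 10, "Sex": 2}))
```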

Example
Here is an example from a tree model:

...
<DataDictionary>
  <DataField name="Sex" optype="categorical" dataType="string">
    <Value value="female"/>
    <Value value="male"/>
  </DataField>
  <DataField name="Age" optype="continuous" dataType="double"/>
  ...
</DataDictionary>
...
  <ModelStats>
    <UnivariateStats field="Age" weighted="0">
      <Counts totalFreq="240"/>
      <NumericInfo mean="54.43" minimum="29" maximum="77"/>
      <ContStats>
        <Interval closure="openClosed" leftMargin="29" rightMargin="33.8"/>
        <Interval closure="openClosed" leftMargin="33.8" rightMargin="38.6"/>
        <Interval closure="openClosed" leftMargin="38.6" rightMargin="43.4"/>
        <Interval closure="openClosed" leftMargin="43.4" rightMargin="48.2"/>
        <Interval closure="openClosed" leftMargin="48.2" rightMargin="53"/>
        <Interval closure="openClosed" leftMargin="53" rightMargin="57.8"/>
        <Interval closure="openClosed" leftMargin="57.8" rightMargin="62.6"/>
        <Interval closure="openClosed" leftMargin="62.6" rightMargin="67.4"/>
        <Interval closure="openClosed" leftMargin="67.4" rightMargin="72.2"/>
        <Interval closure="openClosed" leftMargin="72.2" rightMargin="77"/>
        <Array type="int" n="10"> 1 8 28 30 30 43 51 33 13 3</Array>
      </ContStats>
    </UnivariateStats>
    ...
  ...
  <Node score="1" recordCount="134">
    <Partition name="1.1" size="134">
      <PartitionFieldStats field="Age" weighted="0">
        <Array type="int" n="10"> 1 5 17 24 17 19 23 17 8 3</Array>
      </PartitionFieldStats>
      <PartitionFieldStats field="Sex">
        <Array type="int" n="2"> 70 64</Array>
      </PartitionFieldStats>
    ...
  ...
  <Targets>
    <Target field="response" optype="categorical">
      <TargetValue value="Yes">
        <Partition name="Yes_classified" size="103">
          <PartitionFieldStats field="Age" weighted="0">
            <NumericInfo mean="56.6796116504854" minimum="35" maximum="77"/>
            <Array type="int" n="10"> 0 3 7 8 10 17 34 18 5 1</Array>
          </PartitionFieldStats>
          <PartitionFieldStats field="Sex">
            <Array type="int" n="2"> 88 15</Array>
          </PartitionFieldStats>
        </Partition>
      </TargetValue>
      ...
    </Target>
  </Targets>
  ...
...

There is a total of 134 records in the Node. This is already obvious from the recordCount of the Node, but is reflected in the Partition element as well. However, there might be cases in which statistics are given only for a sample, in which case the counts would differ. The Array inside PartitionFieldStats gives the distribution according to the intervals defined in UnivariateStats. For example, in 24 instances the Age of the records in the Node was larger than 43.4 years and less than or equal to 48.2. Likewise, in 70 cases the value of Sex was female, and male in 64.

The Partition element in Target indicates that 103 records were classified Yes by the model. Again, more details on the distributions for fields Age and Sex are given in PartitionFieldStats: 88 instances had the value female for field Sex, while 15 had the value male. There were no cases in which Age was less than or equal to 33.8 years, and only 1 in which it was larger than 72.2 and less than or equal to 77 years.
