PMML 4.4 - Taxonomies and Hierarchies

The values of a categorical field can be organized in a hierarchy. Well-known examples are product groups and geographical hierarchies such as City, Region, State, Country. Hierarchies are also known as taxonomies or categorization graphs.

The representation of hierarchies in PMML is based on parent/child relationships. A tabular format is used to provide the data for these relationships.

A taxonomy is constructed from a sequence of one or more parent/child tables. The actual values can be stored in external tables. Such a table is referenced by a TableLocator, which acts as a kind of URL for tables. The tabular data can also be part of the PMML document itself, in which case the element InlineTable is used instead of TableLocator. Examples of both kinds of table reference are given below. However, the actual definition of the content is outside the scope of PMML. Other standards that deal with the representation of database tables in XML are expected to fill this gap. The current definitions in PMML are intended as a framework that will be made more specific in the future.

<xs:element name="Taxonomy">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ChildParent" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

<xs:element name="ChildParent">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="FieldColumnPair"/>
      <xs:choice>
        <xs:element ref="TableLocator"/>
        <xs:element ref="InlineTable"/>
      </xs:choice>
    </xs:sequence>
    <xs:attribute name="childField" type="xs:string" use="required"/>
    <xs:attribute name="parentField" type="xs:string" use="required"/>
    <xs:attribute name="parentLevelField" type="xs:string" use="optional"/>
    <xs:attribute name="isRecursive" use="optional" default="no">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="no"/>
          <xs:enumeration value="yes"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

<xs:element name="TableLocator">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="InlineTable">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="row" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="row">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:any processContents="skip" minOccurs="2" maxOccurs="unbounded"/>
        </xs:sequence> 
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
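
The structural rules that the schema places on ChildParent can be spot-checked in a few lines. The following is a minimal sketch in Python, not a substitute for validating against the XML Schema. The function name check_child_parent is hypothetical, and the sketch assumes the element is written without an XML namespace, as in the examples on this page.

```python
import xml.etree.ElementTree as ET


def check_child_parent(elem):
    """Return a list of problems found in one ChildParent element."""
    problems = []
    # childField and parentField are declared with use="required".
    for attr in ("childField", "parentField"):
        if attr not in elem.attrib:
            problems.append(f"missing required attribute {attr}")
    # isRecursive is optional, defaults to "no", and allows only "no" or "yes".
    if elem.get("isRecursive", "no") not in ("no", "yes"):
        problems.append("isRecursive must be 'no' or 'yes'")
    # The xs:choice requires exactly one TableLocator or InlineTable.
    tables = [c for c in elem if c.tag in ("TableLocator", "InlineTable")]
    if len(tables) != 1:
        problems.append("expected exactly one TableLocator or InlineTable")
    return problems


good = ET.fromstring(
    '<ChildParent childField="member" parentField="group">'
    "<InlineTable/></ChildParent>"
)
bad = ET.fromstring('<ChildParent childField="member" isRecursive="maybe"/>')

print(check_child_parent(good))
print(check_child_parent(bad))
```

The first call reports no problems. The second reports the missing parentField, the invalid isRecursive value, and the missing table element.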

Attribute Description

childField: Defines the name of the field which contains the child value per record.

parentField: Defines the name of the field which contains the parent value per record.

parentLevelField: Defines the name of the field which contains the level number for the parent. The children at the lowest level are usually called level 0 members. That is, the level numbers for parents start at 1. The parentLevelField is optional because the levels can be derived from the child/parent data itself.

isRecursive: Flags a recursive table. A recursive table can define a complete taxonomy in one table. That is, a value in the parent field can also appear in the child field. The default value is "no".
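
The description of parentLevelField states that levels can be derived from the child/parent data. One way to do that is sketched below in Python. The function name derive_levels is hypothetical. The sketch assumes that the data contains no cycles and that a parent sits one level above its highest child; PMML does not prescribe this rule for unbalanced hierarchies.

```python
def derive_levels(pairs):
    """Return {value: level}; leaves get 0, a parent gets 1 + its highest child level."""
    children = {}
    for child, parent in pairs:
        children.setdefault(parent, []).append(child)

    levels = {}

    def level(value):
        # Memoised recursion; assumes the child/parent data contains no cycles.
        if value not in levels:
            kids = children.get(value, [])
            levels[value] = (1 + max(level(k) for k in kids)) if kids else 0
        return levels[value]

    for child, parent in pairs:
        level(child)
        level(parent)
    return levels


pairs = [
    ("San Jose", "CA"), ("Gilroy", "CA"), ("Oak Lawn", "IL"),
    ("CA", "USA"), ("IL", "USA"),
]
levels = derive_levels(pairs)
print(levels["Gilroy"], levels["CA"], levels["USA"])  # 0 1 2
```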

A TableLocator may contain any description which helps an application to locate a certain table. The PMML standard does not yet define the content, so PMML users have to supply their own extensions. The same applies to InlineTable.
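
Because the content of a TableLocator is application-defined, a consumer has to agree on its own conventions. As an illustration only, the Python sketch below collects the name/value attributes of the Extension children into a dictionary. The extension names "dbname" and "tableName" are the ones used in the examples on this page, not names defined by PMML, and the function name locator_settings is hypothetical.

```python
import xml.etree.ElementTree as ET


def locator_settings(locator):
    """Collect the name/value attributes of the Extension children of a TableLocator."""
    return {ext.get("name"): ext.get("value") for ext in locator.findall("Extension")}


locator = ET.fromstring(
    "<TableLocator>"
    '<Extension name="dbname" value="myDB"/>'
    '<Extension name="tableName" value="ZIP_City"/>'
    "</TableLocator>"
)
settings = locator_settings(locator)
print(settings)  # {'dbname': 'myDB', 'tableName': 'ZIP_City'}
```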

Example:

A hierarchy with ZIP codes, cities, and states could be defined by:

<Taxonomy name="ZIP-City-State">

  <ChildParent childField="ZIP code" parentField="City">
    <TableLocator>
      <Extension name="dbname" value="myDB"/>
      <Extension name="tableName" value="ZIP_City"/>
    </TableLocator>
  </ChildParent>

  <ChildParent childField="cities" parentField="states">
    <TableLocator>
      <Extension name="dbname" value="myDB"/>
      <Extension name="tableName" value="Cities_States"/>
    </TableLocator>
  </ChildParent>

  <ChildParent childField="State" parentField="Country">
    <TableLocator>
      <Extension name="dbname" value="myDB"/>
      <Extension name="tableName" value="AllStates"/>
    </TableLocator>
  </ChildParent>

</Taxonomy>

The actual data would be in tables with names as given by the locators.

ZIP_City | ZIP code    | City
===================================
         | CA 95126    | San Jose
         | CA 95020    | Gilroy
         | CA 90806    | Long Beach
         | IL 60463    | Oak Lawn
         | MA 02149    | Everett


Cities_States | cities     | states
===================================
              | San Jose   | CA
              | Gilroy     | CA
              | Long Beach | CA
              | Oak Lawn   | IL
              | Everett    | MA


AllStates | State | Country
===========================
          | CA    | USA
          | IL    | USA
          | MA    | USA

Note that there is no need to store the child/parent data in three different tables. We could also extend the table ZIP_City with a column for the state. This would come close to the typical address format 'City, State ZIP'.
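
To make the relationship between the three tables concrete, the Python sketch below holds each table as a child-to-parent dictionary, follows one ZIP code up to the country, and then builds the widened ZIP_City table mentioned above. The variable and function names are hypothetical. Using a dictionary assumes that each child value has exactly one parent, which holds for the sample data but not for city names in general.

```python
# The three sample tables as child -> parent mappings.
zip_city = {
    "CA 95126": "San Jose", "CA 95020": "Gilroy", "CA 90806": "Long Beach",
    "IL 60463": "Oak Lawn", "MA 02149": "Everett",
}
cities_states = {
    "San Jose": "CA", "Gilroy": "CA", "Long Beach": "CA",
    "Oak Lawn": "IL", "Everett": "MA",
}
all_states = {"CA": "USA", "IL": "USA", "MA": "USA"}


def ancestors(value, tables):
    """Follow the child -> parent mappings in order and return the chain of parents."""
    chain = []
    for table in tables:
        if value not in table:
            break  # no parent recorded at this level
        value = table[value]
        chain.append(value)
    return chain


print(ancestors("CA 95020", [zip_city, cities_states, all_states]))
# ['Gilroy', 'CA', 'USA']

# ZIP_City extended with a state column: one row per ZIP code.
wide = [(z, city, cities_states[city]) for z, city in zip_city.items()]
print(wide[0])  # ('CA 95126', 'San Jose', 'CA')
```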

A hierarchy can be represented with two tables as follows: one table for level 0 items and their parents, and another table for upper-level elements/categories and their parents. The latter table would be recursive in the sense that a value in the parent column can also appear in the child column. The additional attribute 'isRecursive' is intended to flag a table that defines the parent/child relationships up to the root level. If a recursive table appears in the content of the element Taxonomy, then it must be the last table in the sequence.
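
The two rules in this paragraph can be sketched in Python: a recursive table may only occupy the last position, and a recursive table can be followed from any value up to the root. Both function names are hypothetical. The cycle check is an added safeguard that PMML itself does not require.

```python
def recursive_table_is_last(flags):
    """flags: the isRecursive value of each ChildParent, in document order."""
    # A recursive table is only allowed in the last position.
    return "yes" not in flags[:-1]


def path_to_root(value, parent_of):
    """Follow a recursive child -> parent mapping until no parent is left."""
    path = [value]
    seen = {value}
    while path[-1] in parent_of:
        nxt = parent_of[path[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt!r}")
        seen.add(nxt)
        path.append(nxt)
    return path


print(recursive_table_is_last(["no", "yes"]))  # True
print(recursive_table_is_last(["yes", "no"]))  # False

upper_levels = {"Gilroy": "CA", "Oak Lawn": "IL", "CA": "USA", "IL": "USA"}
print(path_to_root("Gilroy", upper_levels))  # ['Gilroy', 'CA', 'USA']
```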

TableLocator and InlineTable

The elements TableLocator and InlineTable are not yet completely defined by PMML. However, the rows in an InlineTable should be written according to the default representation of tables and rows in SQL/XML. A row can contain a sequence of elements. Each element tag corresponds to a field name. The content of an element defines the field value. See the following example.

<Taxonomy name="ZIP-City-State">
  <ChildParent childField="ZIP code" parentField="City">
    <TableLocator>
      <Extension name="dbname" value="myDB"/>
      <Extension name="tableName" value="ZIP_City"/>
    </TableLocator>
  </ChildParent>
  <ChildParent childField="member" parentField="group" isRecursive="yes">
    <InlineTable>
      <row><member>San Jose</member><group>CA</group></row>
      <row><member>Gilroy</member><group>CA</group></row>
      <row><member>Long Beach</member><group>CA</group></row>
      <row><member>Oak Lawn</member><group>IL</group></row>
      <row><member>Everett</member><group>MA</group></row>
      <row><member>CA</member><group>USA</group></row>
      <row><member>IL</member><group>USA</group></row>
      <row><member>MA</member><group>USA</group></row>
    </InlineTable>
  </ChildParent>
</Taxonomy>
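
Reading the rows of an InlineTable could look like the following Python sketch, which uses the childField and parentField attributes to pick the matching element in each row. The function name inline_pairs is hypothetical, the fragment is a shortened copy of the example above, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

PMML_FRAGMENT = """
<ChildParent childField="member" parentField="group" isRecursive="yes">
  <InlineTable>
    <row><member>San Jose</member><group>CA</group></row>
    <row><member>CA</member><group>USA</group></row>
  </InlineTable>
</ChildParent>
"""


def inline_pairs(child_parent):
    """Return (child, parent) tuples from the InlineTable of one ChildParent."""
    child_field = child_parent.get("childField")
    parent_field = child_parent.get("parentField")
    pairs = []
    for row in child_parent.iterfind("InlineTable/row"):
        # Each element tag inside a row is a field name; its text is the value.
        cells = {cell.tag: (cell.text or "").strip() for cell in row}
        pairs.append((cells[child_field], cells[parent_field]))
    return pairs


pairs = inline_pairs(ET.fromstring(PMML_FRAGMENT.strip()))
print(pairs)  # [('San Jose', 'CA'), ('CA', 'USA')]
```

Since the element tag carries the field name, this lookup only works for field names that are valid XML names. A name such as "ZIP code" contains a space and could not be used as a tag.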