PMML 2.1 -- Taxonomies and Hierarchies

The values of a categorical field can be organized in a hierarchy. Well-known examples are product groups and geographical hierarchies such as City, Region, State, Country. Hierarchies are also known as taxonomies or categorization graphs.

The representation of hierarchies in PMML is based on parent/child relationships. A tabular format is used to provide the data for these relationships.

A taxonomy is constructed from a sequence of one or more parent/child tables. The actual values can be stored in external tables, which are referenced by a TableLocator, a kind of URL for tables. Alternatively, the tabular data can be part of the PMML document itself; in that case the element InlineTable is used instead of TableLocator. Examples of both kinds of table reference are given below. However, the actual definition of their content is outside the scope of PMML. Other standards dealing with the representation of database tables in XML are expected to become available soon; the current definitions in PMML are intended as a framework that will be specialized further in the future.


 <xs:element name='Taxonomy'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='ChildParent' maxOccurs='unbounded'/>
   </xs:sequence>
   <xs:attribute name='name' type='xs:string' use='required'/>
  </xs:complexType>
 </xs:element>

 <xs:element name='ChildParent'>
  <xs:complexType>
   <xs:choice>
    <xs:element ref='TableLocator'/>
    <xs:element ref='InlineTable'/>
   </xs:choice>
   <xs:attribute name='childField' type='xs:string' use='required'/>
   <xs:attribute name='parentField' type='xs:string' use='required'/>
   <xs:attribute name='parentLevelField' type='xs:string' use='optional'/>
   <xs:attribute name='isRecursive' use='optional' default='no'>
    <xs:simpleType>
     <xs:restriction base='xs:string'>
      <xs:enumeration value='no'/>
      <xs:enumeration value='yes'/>
     </xs:restriction>
    </xs:simpleType>
   </xs:attribute>
  </xs:complexType>
 </xs:element>

 <xs:element name='TableLocator'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Extension' minOccurs='0' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='InlineTable'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Extension' minOccurs='0' maxOccurs='unbounded'/>
    <xs:element ref='row' minOccurs='0' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>


 <xs:element name="row" type="xs:anyType"/>
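The constraints expressed by this schema can also be checked programmatically. The following Python sketch is illustrative only: the function name `check_taxonomy` is hypothetical, it uses only the standard-library `xml.etree.ElementTree` module, it assumes the PMML elements carry no namespace, and it checks just a handful of rules (required attributes, the TableLocator/InlineTable choice, the allowed `isRecursive` values, and the rule stated further below that a recursive table must come last). It is not a substitute for validating against the full PMML schema.

```python
import xml.etree.ElementTree as ET


def check_taxonomy(xml_text):
    """Hypothetical helper: return a list of problems in a <Taxonomy> element.

    Only a few rules from the schema and text are checked; an empty list
    means none of those rules is violated. Namespaces are assumed absent.
    """
    tax = ET.fromstring(xml_text)
    if tax.tag != "Taxonomy":
        return [f"root element is <{tax.tag}>, expected <Taxonomy>"]
    problems = []
    if not tax.get("name"):
        problems.append("Taxonomy: missing required attribute 'name'")
    links = tax.findall("ChildParent")  # direct children only
    if not links:
        problems.append("Taxonomy: needs at least one ChildParent")
    for i, cp in enumerate(links):
        for attr in ("childField", "parentField"):
            if cp.get(attr) is None:
                problems.append(f"ChildParent #{i}: missing required attribute '{attr}'")
        recursive = cp.get("isRecursive", "no")  # schema default is 'no'
        if recursive not in ("no", "yes"):
            problems.append(f"ChildParent #{i}: isRecursive must be 'no' or 'yes'")
        # xs:choice: exactly one TableLocator or InlineTable child
        tables = [c for c in cp if c.tag in ("TableLocator", "InlineTable")]
        if len(tables) != 1:
            problems.append(f"ChildParent #{i}: needs exactly one TableLocator or InlineTable")
        if recursive == "yes" and i != len(links) - 1:
            problems.append(f"ChildParent #{i}: a recursive table must be the last one")
    return problems


# Usage: a taxonomy with a single InlineTable-based ChildParent passes.
doc = ('<Taxonomy name="demo">'
       '<ChildParent childField="member" parentField="group"><InlineTable/></ChildParent>'
       '</Taxonomy>')
print(check_taxonomy(doc))  # []
```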


Attribute Description

    childField: Defines the name of the field which contains the child value per record.

    parentField: Defines the name of the field which contains the parent value per record.

    parentLevelField: Defines the name of the field which contains the level number for the parent. The children at the lowest level are usually called level 0 members; that is, the level numbers for parents start at 1. The parentLevelField is optional because the levels can be derived from the child/parent data itself.

    isRecursive: Flags a recursive table (the default is no). A recursive table can define a complete taxonomy in one table; that is, a value in the parent field can also appear in the child field of another record.
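Because parentLevelField is optional, an application may need to derive levels from the child/parent pairs. The sketch below assumes one plausible rule that PMML does not prescribe: a value that never occurs as a parent is a level 0 member, and a parent's level is one more than the highest level among its children. The function name `derive_levels` is hypothetical, and the input is assumed to be acyclic.

```python
from collections import defaultdict


def derive_levels(pairs):
    """Hypothetical helper: map every value in (child, parent) pairs to a level.

    Assumed rule (not mandated by PMML): leaves are level 0, and a parent's
    level is 1 + the maximum level among its children. Pairs must be acyclic.
    """
    children = defaultdict(list)
    values = set()
    for child, parent in pairs:
        children[parent].append(child)
        values.update((child, parent))

    levels = {}

    def level(value):
        if value not in levels:
            kids = children.get(value, [])
            levels[value] = 0 if not kids else 1 + max(level(k) for k in kids)
        return levels[value]

    for value in values:
        level(value)
    return levels


# Usage with pairs from the ZIP/city/state example later in this section.
pairs = [("CA 95126", "San Jose"), ("San Jose", "CA"), ("CA", "USA")]
levels = derive_levels(pairs)
print(levels["CA 95126"], levels["San Jose"], levels["CA"], levels["USA"])  # 0 1 2 3
```

For unbalanced hierarchies, other rules (for example, counting the distance from the root) give different answers, so the rule used here is an assumption.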

A TableLocator may contain any description that helps an application locate a particular table. PMML 2.1 does not yet define this content, so PMML users have to use their own extensions. The same applies to InlineTable.


Example:

A hierarchy with ZIP codes, cities, and states could be defined by:


<Taxonomy name="ZIP-City-State">

<ChildParent childField="ZIP code" parentField="City">
     <TableLocator x-dbname="myDB" x-tableName="ZIP_City" />
</ChildParent>

<ChildParent childField="cities" parentField="states">
     <TableLocator x-dbname="myDB" x-tableName="Cities_States" />
</ChildParent>

<ChildParent childField="State" parentField="Country">
     <TableLocator x-dbname="myDB" x-tableName="AllStates" />
</ChildParent>

</Taxonomy>

The actual data would be in tables with names as given by the locators.

ZIP_City | ZIP code    | City
===================================
         | CA 95126    | San Jose
         | CA 95020    | Gilroy
         | CA 90806    | Long Beach
         | IL 60463    | Oak Lawn
         | MA 02149    | Everett


Cities_States | cities     | states
===================================
              | San Jose   | CA
              | Gilroy     | CA
              | Long Beach | CA
              | Oak Lawn   | IL
              | Everett    | MA


AllStates | State | Country
===========================
          | CA    | USA
          | IL    | USA
          | MA    | USA
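Taken together, the three tables let an application walk from a ZIP code up to the country. The following sketch is a hypothetical illustration, not part of PMML: the function name `ancestors` is made up, each table is modelled as an in-memory Python dict from child value to parent value, the dicts are ordered like the ChildParent elements (lowest level first), and a table that does not contain the current value is simply skipped.

```python
def ancestors(value, tables):
    """Hypothetical helper: list the ancestors of `value`, nearest first.

    `tables` is a list of child -> parent dicts ordered from the lowest level
    upward, mirroring the order of the ChildParent elements. Tables that do
    not contain the current value are skipped.
    """
    path = []
    for table in tables:
        if value in table:
            value = table[value]
            path.append(value)
    return path


# In-memory stand-ins for the tables ZIP_City, Cities_States and AllStates.
zip_city = {"CA 95126": "San Jose", "CA 95020": "Gilroy", "CA 90806": "Long Beach",
            "IL 60463": "Oak Lawn", "MA 02149": "Everett"}
cities_states = {"San Jose": "CA", "Gilroy": "CA", "Long Beach": "CA",
                 "Oak Lawn": "IL", "Everett": "MA"}
all_states = {"CA": "USA", "IL": "USA", "MA": "USA"}

taxonomy = [zip_city, cities_states, all_states]
print(ancestors("MA 02149", taxonomy))  # ['Everett', 'MA', 'USA']
print(ancestors("Oak Lawn", taxonomy))  # ['IL', 'USA']
```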

Note that there is no need to store the child/parent data in three different tables. We could also extend the table ZIP_City with a column for the state. This would come close to the typical address format 'City, State ZIP'.

A hierarchy can also be represented with two tables: one table for the level 0 items and their parents, and another table for the upper-level elements (categories) and their parents. The latter table is recursive in the sense that a value in the parent column can also appear in the child column. The attribute 'isRecursive' is intended to flag a table that defines the parent/child relationships up to the root level. If a recursive table appears in the content of the element Taxonomy, it must be the last table in the sequence.
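With the two-table layout just described, a lookup first consults the level 0 table and then follows the recursive table repeatedly until it reaches a value that has no parent. The sketch below is a hypothetical illustration: the function name `ancestors_recursive` is made up, each table is modelled as a Python dict from child value to parent value (an illustrative assumption), and the cycle check is an extra safeguard rather than a PMML requirement.

```python
def ancestors_recursive(value, level0_tables, recursive):
    """Hypothetical helper: resolve all ancestors when the last table is recursive.

    `level0_tables` are the non-recursive child -> parent dicts (lowest level
    first); `recursive` is the final table, whose parent values may also occur
    as child values, so it is followed until a value has no parent (the root).
    """
    path = []
    for table in level0_tables:
        if value in table:
            value = table[value]
            path.append(value)
    seen = {value}
    while value in recursive:
        value = recursive[value]
        if value in seen:  # guard: a cyclic table would otherwise loop forever
            raise ValueError(f"cycle in recursive table at {value!r}")
        seen.add(value)
        path.append(value)
    return path


# Level 0 items and their parents, plus one recursive table for all upper levels.
zip_city = {"CA 95126": "San Jose", "IL 60463": "Oak Lawn"}
groups = {"San Jose": "CA", "Oak Lawn": "IL", "CA": "USA", "IL": "USA"}
print(ancestors_recursive("IL 60463", [zip_city], groups))  # ['Oak Lawn', 'IL', 'USA']
```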


TableLocator and InlineTable

The elements TableLocator and InlineTable are not yet completely defined by PMML. However, the rows in an InlineTable should be written according to the default representation of tables and rows in SQL/XML: a row contains a sequence of elements, each element tag corresponds to a field name, and the content of the element defines the field value. See the following example.


<Taxonomy name="ZIP-City-State">

<ChildParent childField="ZIP code" parentField="City">
     <TableLocator x-dbname="myDB" x-tableName="ZIP_City" />
</ChildParent>

<ChildParent childField="member" parentField="group" isRecursive="yes">
     <InlineTable>
         <row><member>San Jose</member><group>CA</group></row>
         <row><member>Gilroy</member><group>CA</group></row>
         <row><member>Long Beach</member><group>CA</group></row>
         <row><member>Oak Lawn</member><group>IL</group></row>
         <row><member>Everett</member><group>MA</group></row>
         <row><member>CA</member><group>USA</group></row>
         <row><member>IL</member><group>USA</group></row>
         <row><member>MA</member><group>USA</group></row>
     </InlineTable>
</ChildParent>

</Taxonomy>
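Under the SQL/XML-style row convention described above, the rows of an InlineTable map directly onto records. The following sketch, using only Python's standard `xml.etree.ElementTree` module, turns each row into a dict and builds a child-to-parent mapping from the childField and parentField attributes defined in the schema. The function names `inline_rows` and `child_parent_map` are hypothetical, namespaces are assumed absent, and TableLocator is not handled because its content is not yet defined by PMML.

```python
import xml.etree.ElementTree as ET


def inline_rows(inline_table):
    """Hypothetical helper: turn the <row> children of an <InlineTable> into dicts.

    Each child element of a row is one field: its tag is the field name and
    its text content is the field value (empty string if the element is empty).
    """
    return [{cell.tag: (cell.text or "") for cell in row}
            for row in inline_table.findall("row")]


def child_parent_map(child_parent):
    """Hypothetical helper: build a child -> parent dict from a <ChildParent>."""
    table = child_parent.find("InlineTable")
    if table is None:
        raise ValueError("this sketch only handles InlineTable, not TableLocator")
    child = child_parent.get("childField")
    parent = child_parent.get("parentField")
    return {row[child]: row[parent] for row in inline_rows(table)}


snippet = """<ChildParent childField="member" parentField="group" isRecursive="yes">
  <InlineTable>
    <row><member>San Jose</member><group>CA</group></row>
    <row><member>CA</member><group>USA</group></row>
  </InlineTable>
</ChildParent>"""
mapping = child_parent_map(ET.fromstring(snippet))
print(mapping)  # {'San Jose': 'CA', 'CA': 'USA'}
```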
e-mail info at dmg.org