|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PMML 4.4.1 - Time Series ModelsFor Time Series models, please see the Notice of Essential Claims. A Time Series is a sequence of data points, measured at points in time, usually, but not necessarily, forming equidistant intervals. Time series analysis strives to understand such time series, often with the goal of making forecasts (predictions) or of filling in missing values between known data points. Time series prediction is the use of a model to predict future events based on known past events before they are measured. Interpolation is the use of a model to complement or amend values between known data points. The model must contain information on the general trend, a description of periodic behavior and an overall fitting function that can be used for forecasting and/or interpolation. It may also contain detailed information on various aspects of the time series and the expected forecasting accuracy. In addition to the entries common to all models, a TimeSeriesModel contains results of at least one time series algorithm, for example SpectralAnalysis, ARIMA, ExponentialSmoothing or SeasonalTrendDecomposition. In PMML 4.4, only Exponential Smoothing, ARIMA, GARCH, and State Space models are defined, the other algorithms are planned for later versions. TimeSeries elements can hold original or predicted time series values. Please find the documentation for particular model types in the following sections: <xs:element name="TimeSeriesModel"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="MiningSchema"/> <xs:element ref="Output" minOccurs="0"/> <xs:element ref="ModelStats" minOccurs="0"/> <xs:element ref="ModelExplanation" minOccurs="0"/> <xs:element ref="LocalTransformations" minOccurs="0"/> <xs:element ref="TimeSeries" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="SpectralAnalysis" minOccurs="0"/> <xs:element ref="ARIMA" minOccurs="0"/> <xs:element ref="ExponentialSmoothing" minOccurs="0"/> <xs:element ref="SeasonalTrendDecomposition" minOccurs="0"/> <xs:element ref="StateSpaceModel" minOccurs="0"/> <xs:element ref="GARCH" minOccurs="0"/> <xs:element ref="ModelVerification" minOccurs="0"/> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="modelName" type="xs:string" use="optional"/> <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/> <xs:attribute name="algorithmName" type="xs:string" use="optional"/> <xs:attribute name="bestFit" type="TIMESERIES-ALGORITHM" use="required"/> <xs:attribute name="isScorable" type="xs:boolean" default="true"/> </xs:complexType> </xs:element> <xs:simpleType name="TIMESERIES-ALGORITHM"> <xs:restriction base="xs:string"> <xs:enumeration value="ARIMA"/> <xs:enumeration value="ExponentialSmoothing"/> <xs:enumeration value="SeasonalTrendDecomposition"/> <xs:enumeration value="SpectralAnalysis"/> <xs:enumeration value="StateSpaceModel"/> <xs:enumeration value="GARCH"/> </xs:restriction> </xs:simpleType> modelName and algorithmName can have arbitrary strings describing the specific model and algorithm. functionName must be timeSeries. The isScorable attribute indicates whether the model is valid for scoring. If this attribute is true or if it is missing, then the model should be processed normally. However, if the attribute is false, then the model producer has indicated that this model is intended for information purposes only and should not be used to generate results. In order to be valid PMML, all required elements and attributes must be present, even for non-scoring models. For more details, see General Structure. The attribute bestFit is required, and indicates which of the time series algorithms provided by the model results in the best-fitting model. This algorithm should be used when scoring the model.<xs:element name="TimeSeries"> <xs:complexType> <xs:sequence> <xs:element ref="TimeAnchor" minOccurs="0" maxOccurs="1"/> <xs:element ref="TimeValue" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="usage" type="TIMESERIES-USAGE" default="original"/> <xs:attribute name="startTime" type="REAL-NUMBER"/> <xs:attribute name="endTime" type="REAL-NUMBER"/> <xs:attribute name="interpolationMethod" type="INTERPOLATION-METHOD" default="none"/> <xs:attribute name="field" type="FIELD-NAME"/> </xs:complexType> </xs:element> <xs:simpleType name="TIMESERIES-USAGE"> <xs:restriction base="xs:string"> <xs:enumeration value="original"/> <xs:enumeration value="logical"/> <xs:enumeration value="prediction"/> </xs:restriction> </xs:simpleType> <xs:element name="TimeValue"> <xs:complexType> <xs:sequence> <xs:element ref="Timestamp" minOccurs="0" maxOccurs="1"/> </xs:sequence> <xs:attribute name="index" type="INT-NUMBER" use="optional"/> <xs:attribute name="time" type="NUMBER" use="optional"/> <xs:attribute name="value" type="REAL-NUMBER" use="required"/> <xs:attribute name="standardError" type="REAL-NUMBER" use="optional"/> </xs:complexType> </xs:element> <xs:element name="TimeAnchor"> <xs:complexType> <xs:sequence> <xs:element ref="TimeCycle" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="TimeException" minOccurs="0" maxOccurs="2"/> </xs:sequence> <xs:attribute name="type" type="TIME-ANCHOR"/> <xs:attribute name="offset" type="INT-NUMBER"/> <xs:attribute name="stepsize" type="INT-NUMBER"/> <xs:attribute name="displayName" type="xs:string" use="optional"/> </xs:complexType> </xs:element> <xs:element name="TimeCycle"> <xs:complexType> <xs:sequence> <xs:group ref="INT-ARRAY" minOccurs="0" maxOccurs="1"/> </xs:sequence> <xs:attribute name="length" type="INT-NUMBER"/> <xs:attribute name="type" type="VALID-TIME-SPEC"/> <xs:attribute name="displayName" type="xs:string" use="optional"/> </xs:complexType> </xs:element> <xs:simpleType name="TIME-ANCHOR"> <xs:restriction base="xs:string"> <xs:enumeration value="dateTimeMillisecondsSince[0]"/> <xs:enumeration value="dateTimeMillisecondsSince[1960]"/> <xs:enumeration value="dateTimeMillisecondsSince[1970]"/> <xs:enumeration value="dateTimeMillisecondsSince[1980]"/> <xs:enumeration value="dateTimeSecondsSince[0]"/> <xs:enumeration value="dateTimeSecondsSince[1960]"/> <xs:enumeration value="dateTimeSecondsSince[1970]"/> <xs:enumeration value="dateTimeSecondsSince[1980]"/> <xs:enumeration value="dateDaysSince[0]"/> <xs:enumeration value="dateDaysSince[1960]"/> <xs:enumeration value="dateDaysSince[1970]"/> <xs:enumeration value="dateDaysSince[1980]"/> <xs:enumeration value="dateMonthsSince[0]"/> <xs:enumeration value="dateMonthsSince[1960]"/> <xs:enumeration value="dateMonthsSince[1970]"/> <xs:enumeration value="dateMonthsSince[1980]"/> <xs:enumeration value="dateYearsSince[0]"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="VALID-TIME-SPEC"> <xs:restriction base="xs:string"> <xs:enumeration value="includeAll"/> <xs:enumeration value="includeFromTo"/> <xs:enumeration value="excludeFromTo"/> <xs:enumeration value="includeSet"/> <xs:enumeration value="excludeSet"/> </xs:restriction> </xs:simpleType> <xs:element name="TimeException"> <xs:complexType> <xs:sequence> <xs:group ref="INT-ARRAY" minOccurs="1"/> </xs:sequence> <xs:attribute name="type" type="TIME-EXCEPTION-TYPE"/> <xs:attribute name="count" type="INT-NUMBER"/> </xs:complexType> </xs:element> <xs:simpleType name="TIME-EXCEPTION-TYPE"> <xs:restriction base="xs:string"> <xs:enumeration value="exclude"/> <xs:enumeration value="include"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="INTERPOLATION-METHOD"> <xs:restriction base="xs:string"> <xs:enumeration value="none"/> <xs:enumeration value="linear"/> <xs:enumeration value="exponentialSpline"/> <xs:enumeration value="cubicSpline"/> </xs:restriction> </xs:simpleType> The element TimeSeries contains a time series consisting of one or more TimeValue objects. The time series can be an original time series as read from the input data; in this case the attribute interpolationMethod is 'none'. Or it can be a pre-processed and interpolated time series; pre-processing and interpolation may be necessary to produce a logical time series, for most time series algorithms require a sequence of logically equidistant time steps. If a logical time series is present, it was used as input into the algorithm. Finally, the time series (usage = 'prediction') may hold values predicted by the best-fitting model. StartTime and endTime refer to points in an input time series
between which the points were used for fitting. They can be integers
indicating the index into a logical time series or real numbers indicating
original points in time. These values are required when the usage type is
original or logical. The attribute field provides the field name for the time series variable and is required if there are several time series fields in the context. TimeValue contains one single point of a time series. The point can
either be a known point from the past; in this case, only the attribute
value is required. In addition, time or index must be
used. In case of a logical TimeSeries, index values must be present. Or the
time point is a predicted future value; in this case, the attribute
standardError can contain the incertitude (predicted standard error)
of the prediction standard based on the empirically determined
error. TimeAnchor optionally defines the relationship between time points in a time series and a calendar. It is not used for computing predictions, but it may be used by applications or visualization tools that want to come up with predictions based on points of time in a calendar as opposed to just a look-ahead index. Time is anchored at an offset with respect to a specified calendar point given by type. And the flow of time is defined in smallest steps of size stepsize. Both offset and stepsize are (long) integer values in the units specified in type. An optional displayName, e.g. "day" can be provided as a name for the time step. TimeCycle allows to express the situation where time steps are not
contiguous on a calendar. As an example, consider hourly revenue data of a
store that opens Mo-Sat from 7am to 9pm. One has to represent hours as being
the step size of the data, but one also wants to be able to specify that
Sundays and night-shifts should be disregarded, i.e. for the time series
prediction, the value for Monday 8am (aggregated revenue between 7am and 8am)
immediately follows that of Saturday 9pm. <TimeAnchor type="dateTimeSecondsSince[1960]" offset="1530543600" stepsize="3600" displayName="hour"> <TimeCycle length="24" type="includeFromTo" displayName="day"> <Array type="int" n="2">7 20</Array> </TimeCycle> <TimeCycle length="7" type="excludeSet" displayName="week"> <Array type="int" n="1">6</Array> </TimeCycle> </TimeAnchor> Calendar entries can now be described as a sequence of values. The 15th hour of the 6th day in the 30th week since the time anchor (1530543600 seconds after the beginning of 1960) would become <29, 5, 14>. In addition to the regular behavior, there may be exceptions to the TimeStep specification. The store may, for example, be closed on July 4th, but exceptionally open late because of an event on some other day. This is captured by up to two TimeExceptions, which contain lists of unsystematic exclusions or inclusions as arrays of index values. All index values of a certain TimeCycle can be specified by using the length value instead of a valid index; -1 is used for the regular indexes.. The following TimeExceptions specify additional shop closure and opening hours. <TimeExceptions type="exclude" count="2"> <!-- closed in the 5th week on the 6th day at the 8th hour --> <Array type="int">4 5 7</Array> <!-- closed in the 33rd week, throughout the 1st day --> <Array type="int">32 0 24</Array> </TimeExceptions%gt; <TimeExceptions type="include" count="2"> <!-- open in the 1st week on the 7th day at regular hours --> <Array type="int">0 6 -1</Array> <!-- open in the 34th week on the 6th day at the 20th hour --> <Array type="int">33 5 19</Array> </TimeExceptions> Exponential SmoothingExponentialSmoothing contains an exponential smoothing model for the time series. It is one out of the 15 possible model type combinations (no trend N, additive trend A, damped additive trend DA, multiplicative trend M, damped multiplicative trend DM) * (no seasonality N, additive seasonality A, multiplicative seasonality M). If the model contains a seasonality, the seasonality info is captured in the Seasonality sub-element. Each TimeValue sub-element contains one predicted time point. The predicted time points are calculated from Gardner, Jr., E.S., "Exponential smoothing: The state of the art - Part II", 3 Jun. 2005 for the given Trend combined with the Seasonality type. The number of predicted time points contained in the model may be determined by the modeling kernel, for example by using the incertitude ranges of each prediction. This model also supports the multiple smoothing techniques described in the Multiple Smoothing for Higher-order Polynomials found in Brown, R.G., Smoothing, Forecasting and Prediction of Discrete Time Series, Mineola, New York: Dover Publications, Inc., 2004. <xs:element name="ExponentialSmoothing"> <xs:complexType> <xs:sequence> <xs:element ref="Level" minOccurs="1" maxOccurs="1"/> <xs:element ref="Trend_ExpoSmooth" minOccurs="0" maxOccurs="1"/> <xs:element ref="Seasonality_ExpoSmooth" minOccurs="0" maxOccurs="1"/> <xs:element ref="TimeValue" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="RMSE" type="REAL-NUMBER"/> <xs:attribute name="transformation" default="none"> <xs:simpleType> <xs:restriction base="xs:NMTOKEN"> <xs:enumeration value="none"/> <xs:enumeration value="logarithmic"/> <xs:enumeration value="squareroot"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> </xs:element> RMSE is the root mean squared error of the predictions. Transformation specifies what transformation has been applied to the time series prior to executing the algorithm. Possible values are "none," "logarithmic" and "squareroot." This attribute is informational only, and does not affect scoring. Level specifies smoothedValue the smoothed value of the time series at the last known point of the history. The optional attribute alpha is the optimal smoothing parameter for the level. It can be used to continue the fitting process if more data become known, but it is not needed for scoring. However, it may be used to compute theoretical confidence intervals. <xs:element name="Level"> <xs:complexType> <xs:attribute name="alpha" type="REAL-NUMBER" use="optional"/> <xs:attribute name="smoothedValue" type="REAL-NUMBER"/> </xs:complexType> </xs:element> Trend_ExpoSmooth specifies the smoothed value or coefficients of the trend at the last known point of the history. For the Gardner models, the smoothed value can be found in the smoothedValue attribute; for Brown's polynomial models, the smoothed coefficients can be found in the array sub-element. The smoothed coefficients are required if the specified trend is "polynomial_exponential;" otherwise, the smoothed value is required. The optional attribute gamma is the optimal smoothing parameter for the trend. It can be used to continue the fitting process if more data become known, but it is not needed for scoring. The damping parameter phi is needed when scoring the damped versions of the Gardner models. <xs:element name="Trend_ExpoSmooth"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY" minOccurs="0"/> </xs:sequence> <xs:attribute name="trend" default="additive"> <xs:simpleType> <xs:restriction base="xs:NMTOKEN"> <xs:enumeration value="additive"/> <xs:enumeration value="damped_additive"/> <xs:enumeration value="multiplicative"/> <xs:enumeration value="damped_multiplicative"/> <xs:enumeration value="polynomial_exponential"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="gamma" type="REAL-NUMBER" use="optional"/> <xs:attribute name="phi" type="REAL-NUMBER" use="optional" default="1"/> <xs:attribute name="smoothedValue" type="REAL-NUMBER" use="optional"/> </xs:complexType> </xs:element> Seasonality_ExpoSmooth describes a periodic oscillation cycle with a length of period time units, where period must be a positive integer. The phase indicates the season index of the last known data point; it defaults to period. The oscillation can be additive, that means of the form 'trend + oscillation' or multiplicative, that means of the form 'trend * oscillation'. Unit is a string used for naming the cycles, such as "week" or "year." It is optional and serves only for explanatory purposes. delta is seasonal smoothing weight. The sub-element RealArray (of size period) contains floating point numbers which describe the local values of the oscillation at each of the season indices. In the additive case, the sum of all these numbers may be normalized to 0. In the multiplicative case, the product of all these numbers may be normalized to 1. <xs:element name="Seasonality_ExpoSmooth"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> <xs:attribute name="type" use="required"> <xs:simpleType> <xs:restriction base="xs:NMTOKEN"> <xs:enumeration value="additive"/> <xs:enumeration value="multiplicative"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="period" type="INT-NUMBER" use="required"/> <xs:attribute name="unit" type="xs:string" use="optional"/> <xs:attribute name="phase" type="INT-NUMBER" use="optional"/> <xs:attribute name="delta" type="REAL-NUMBER" use="optional"/> </xs:complexType> </xs:element> Scoring ProcedurePredictions of future events in a time series are based on a set of well-defined formulae. The formula used to score a particular model is based on the specified trend and seasonality. The formulae use the following notation:
The complete set of formulae are listed below. The first 15 formulae are identified by the combination of trend and seasonality specified by the model (i.e. <trend>-<seasonality>). The remaining formula is used for Brown's polynomial models. In this case, the number of values found in the Array element within the Trend_ExpoSmooth element will dictate the order of the polynomial model. A single coefficient indicates a constant model, for which predictions are calculated by applying the formula Xt(m) = a0. For a linear model (two coefficients), the formula Xt(m) = a0 + a1m is used, while a quadratic model (three coefficients) would use the formula Xt(m) = a0 + a1m + (1/2)a2m2 . N-N: Xt(m) = St N-A: Xt(m) = St + It-p+m N-M: Xt(m) = StIt-p+m A-N: Xt(m) = St + mTt A-A: Xt(m) = St + mTt + It-p+m A-M: Xt(m) = (St + mTt)It-p+m DA-N: Xt(m) = St + (Sum[i in 1..m]φiTt) DA-A: Xt(m) = St + (Sum[i in 1..m]φiTt) + It-p+m DA-M: Xt(m) = (St + (Sum[i in 1..m]φiTt))It-p+m M-N: Xt(m) = StRtm M-A: Xt(m) = StRtm + It-p+m M-M: Xt(m) = (StRtm)It-p+m DM-N: Xt(m) = StRt(Sum[i in 1..m]φi) DM-A: Xt(m) = StRt(Sum[i in 1..m]φi) + It-p+m DM-M: Xt(m) = (StRt(Sum[i in 1..m]φi))It-p+m Brown: Xt(m) = a0 + a1m + (1/2)a2m2 + ... + (1/n!)anmn Example for a time series model: <PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="2018 DMG.org"> <Application name="test application"/> <Timestamp>2008-06-23 10:30:00</Timestamp> </Header> <DataDictionary numberOfFields="2"> <DataField dataType="integer" optype="continuous" name="TS" displayName="TS"/> <DataField dataType="double" optype="continuous" name="VALUE" displayName="TS-VALUE"/> </DataDictionary> <TimeSeriesModel modelName="AA2Model" functionName="timeSeries" algorithmName="exponential smoothing" bestFit="ExponentialSmoothing"> <MiningSchema> <MiningField name="TS" usageType="order"/> <MiningField name="VALUE" usageType="target"/> </MiningSchema> <TimeSeries usage="logical" startTime="1" endTime="24" interpolationMethod="none"> <TimeValue index="1" value="112"/> <TimeValue index="2" value="118"/> <TimeValue index="3" value="132"/> <TimeValue index="4" value="129"/> <TimeValue index="5" value="121"/> <TimeValue index="6" value="135"/> <TimeValue index="7" value="148"/> <TimeValue index="8" value="148"/> <TimeValue index="9" value="136"/> <TimeValue index="10" value="119"/> <TimeValue index="11" value="104"/> <TimeValue index="12" value="118"/> <TimeValue index="13" value="115"/> <TimeValue index="14" value="126"/> <TimeValue index="15" value="141"/> <TimeValue index="16" value="135"/> <TimeValue index="17" value="125"/> <TimeValue index="18" value="149"/> <TimeValue index="19" value="170"/> <TimeValue index="20" value="170"/> <TimeValue index="21" value="158"/> <TimeValue index="22" value="133"/> <TimeValue index="23" value="114"/> <TimeValue index="24" value="140"/> </TimeSeries> <TimeSeries usage="prediction" interpolationMethod="none"> <TimeValue index="25" value="145" standardError="7.3"/> <TimeValue index="26" value="150" standardError="8.3"/> <TimeValue index="27" value="178" standardError="9.3"/> <TimeValue index="28" value="163" standardError="10.3"/> <TimeValue index="29" value="172" standardError="11.3"/> <TimeValue index="30" value="178" standardError="12.3"/> <TimeValue index="31" value="199" standardError="13.3"/> <TimeValue index="32" value="199" standardError="14.3"/> <TimeValue index="33" value="184" standardError="15.3"/> <TimeValue index="34" value="162" standardError="16.3"/> <TimeValue index="35" value="146" standardError="17.3"/> <TimeValue index="36" value="166" standardError="18.3"/> </TimeSeries> <ExponentialSmoothing RMSE="7.3"> <Level alpha="0.233984" smoothedValue="139.8"/> <Trend_ExpoSmooth smoothedValue="4.139" gamma="3.910E-4" phi="1.006" trend="damped_additive"/> <Seasonality_ExpoSmooth type="multiplicative" period="12" unit="month" delta="0.8254" phase="12"> <Array n="12" type="real"> .900 .840 .924 .976 .994 1.120 0.981 1.025 1.038 1.038 0.908 1.259 </Array> </Seasonality_ExpoSmooth> </ExponentialSmoothing> </TimeSeriesModel> </PMML> This is an example of an exponential smoothing time series model using Brown's multiple smoothing technique: <PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="DMG.org"> <Application name="test application"/> </Header> <DataDictionary numberOfFields="2"> <DataField dataType="double" name="Month_Index" optype="continuous"> <Interval closure="closedClosed" leftMargin="1" rightMargin="50"/> </DataField> <DataField dataType="double" name="QuadraticMonth" optype="continuous"/> </DataDictionary> <TimeSeriesModel bestFit="ExponentialSmoothing" functionName="timeSeries" modelName="QuadraticMonth Predictor"> <MiningSchema> <MiningField invalidValueTreatment="asIs" name="Month_Index" outliers="asIs" usageType="order"/> <MiningField name="QuadraticMonth" outliers="asIs" usageType="target"/> </MiningSchema> <Output> <OutputField dataType="double" feature="predictedValue" name="QuadraticMonth Predictor - Predicted Value" optype="continuous"/> </Output> <TimeSeries> <TimeValue index="50" value="2550"/> </TimeSeries> <ExponentialSmoothing RMSE="1.469296539064514"> <Level alpha="0.29"/> <Trend_ExpoSmooth trend="polynomial_exponential"> <Array type="real" n="3">2549.999972 100.9999732 1.999994714</Array> </Trend_ExpoSmooth> <TimeValue index="1" value="1.499999999999986"/> <TimeValue index="2" value="4.434999999999974"/> <TimeValue index="3" value="9.422699999999999"/> <TimeValue index="4" value="16.69814500000001"/> <TimeValue index="5" value="26.30456892500002"/> <TimeValue index="6" value="38.19968310645001"/> <TimeValue index="7" value="52.31215520655601"/> <TimeValue index="8" value="68.56947439659567"/> <TimeValue index="9" value="86.90993292618671"/> <TimeValue index="10" value="107.2862207849398"/> <TimeValue index="11" value="129.6649358584095"/> <TimeValue index="12" value="154.0243926888923"/> <TimeValue index="13" value="180.3519824960263"/> <TimeValue index="14" value="208.6416885111437"/> <TimeValue index="15" value="238.8920018116692"/> <TimeValue index="16" value="271.1042947305884"/> <TimeValue index="17" value="305.281618813078"/> <TimeValue index="18" value="341.4278584112037"/> <TimeValue index="19" value="379.5471635161653"/> <TimeValue index="20" value="419.6435914851766"/> <TimeValue index="21" value="461.7208987097413"/> <TimeValue index="22" value="505.7824356899514"/> <TimeValue index="23" value="551.8311103729045"/> <TimeValue index="24" value="599.8693941784766"/> <TimeValue index="25" value="649.8993527234378"/> <TimeValue index="26" value="701.9226890294883"/> <TimeValue index="27" value="755.9407912489685"/> <TimeValue index="28" value="811.9547799736052"/> <TimeValue index="29" value="869.965552291198"/> <TimeValue index="30" value="929.9738211628786"/> <TimeValue index="31" value="991.980149602085"/> <TimeValue index="32" value="1055.984979693915"/> <TimeValue index="33" value="1121.988656811033"/> <TimeValue index="34" value="1189.991449540624"/> <TimeValue index="35" value="1259.993565893434"/> <TimeValue index="36" value="1331.99516636062"/> <TimeValue index="37" value="1405.996374344018"/> <TimeValue index="38" value="1481.997284428074"/> <TimeValue index="39" value="1559.997968898639"/> <TimeValue index="40" value="1639.998482851363"/> <TimeValue index="41" value="1721.998868174932"/> <TimeValue index="42" value="1805.999156642965"/> <TimeValue index="43" value="1891.999372304384"/> <TimeValue index="44" value="1979.999533324537"/> <TimeValue index="45" value="2069.999653398587"/> <TimeValue index="46" value="2161.999742833126"/> <TimeValue index="47" value="2255.999809371593"/> <TimeValue index="48" value="2351.9998588225"/> <TimeValue index="49" value="2449.999895536411"/> <TimeValue index="50" value="2549.999922767288"/> </ExponentialSmoothing> </TimeSeriesModel> </PMML> This model can be scored using Brown's formula for multiple smoothing. For instance, to predict the next value in the series (i.e. index=51, which corresponds to m=1), we use the following calculation: Xt(m) = a0 + a1m + (1/2)a2m2 ARIMA and Transfer Function models
The element ARIMA contains an ARIMA (Autoregressive Integrated Moving Average) model
for the time series data to better understand the data or to predict values for
future time points in the series. An ARIMA model, also known as Box-Jenkins model,
is composed of two parts: an autoregressive (AR) part and a moving average (MA) part.
The AR part specifies the current value of the time series as a linear combination of
the previous values of the time series plus a stochastic term. The MA part is a
linear function of white noise error terms or random shocks at current and previous
time points. A Transfer Function model is ARIMA model with predictor series.
The element ARIMA supports both the non-seasonal and seasonal ARIMA models,
with or without a set of predictors. Other features supported in the element include
the specification of the numerator and denominator polynomials for each predictor,
the definition of each detected outlier, and the input of extra information needed
for forecasting. The technical details of the model are described in
Brockwell, P. J., and R. A. Davis. Time series: theory and methods.
Springer Science & Business Media, 1991. Overview of models and methods.Denote the target time series to be modeled and predicted Yt. We will use B to represent a backward shift (or lag) operator: BYt = Yt-1, its k-th degree is BkYt = Yt-k. Δd is a differencing operator of order d: Δd = ( 1 - B )d. The general form for an AR polynomial of order p is defined as: ϕp (B) = 1 - φ1 B - φ2 B2 - ... - φp Bpand a MA polynomial of order q: θq (B) = 1 - θ1 B - θ2 B2 - ... - θq BqThose represent non-seasonal part of the model, and seasonal polynomials of order P and Q are used for the seasonal components of AR and MA: ΦP (Bs) = 1 - Φ1 Bs - Φ2 B2s - ... - ΦP BPsand ΘQ (Bs) = 1 - Θ1 Bs - Θ2 B2s - ... - ΘQ BQs where s is the seasonal length or period. The coefficients φi, θi, Φi, Θi of the above polynomials constitute a major part of an ARIMA model. Often the values of a time series are affected by one or more other time series, called predictor time series or dynamic regressors. We will denote them by Xit, i=1,..., k. Consider the numerator polynomial for the i-th predictor Xit: Numi = ( ωi0 - ωi1B - ... - ωiuBu ) * ( 1 - Ωi1Bs - ... - ΩivBvs )and denominator polynomial for that predictor: Deni = ( 1 - δi1B - ... - δirBr ) * ( 1 - Δi1Bs - ... )The following equations are given in terms of Yt and Xit, but they can be log or square root transformations of the original series. If the target time series was transformed then its predictions should be back-transformed to the original scale. ARIMA model includes the following widely used variations:ARMA( p, q ): Yt = μ + θq( B ) / ϕp( B ) * at Here μ is a constant, at is a white noise series normally distributed with mean zero and variance σ2. ARIMA( p, d, q ): Δd Yt = μ + θq( B ) / ϕp( B ) * at Note: ARIMA( p, 0, q ) = ARMA( p, q ).ARIMA( p, d, q )( P, D, Q ): Δd ΔD Yt = μ + θq( B ) ΘQ( Bs ) / ϕp( B ) ΦP( Bs ) * at Here Δd and ΔD are the difference operator and the seasonal difference operator for Yt. Note: ARIMA( p, 0, q )( P, 0, Q ) = ARMA( p, q )( P, Q ).Transfer Function model: Δd ΔD Yt = μ + Σi=1 to K (Numi / Deni Δdi ΔDi Bli Xit) + Nt, where Nt is the noise series that follows zero mean ARMA(p,q)(P,Q) model: Nt = θq( B ) ΘQ( Bs ) / ϕp( B ) ΦP( Bs ) * at Here Δdi and ΔDi are the difference operator and
seasonal difference operator for Xit, respectively, and Bli
is delay term for Xit , li is the order of the delay. Transfer function model with outliers Outliers can have significant effect on the predictions of time series models.We follow the paper "Outliers, Level Shifts, and Variance Changes in Time Series" by
Ruey S. Tsay. Journal of Forecasting, Vol. 7, I-20 (1988).
Six types of outliers are allowed: additive outliers (AO), innovational outliers (IO),
level shift (LS), temporary (or transient) change (TC), seasonal additive (SA), local trend (LT).
Suppose there are M outliers at times T1, T2, ..., TM with types O1,
O2, ..., OM and magnitudes w1, w2, ..., wM.
The transfer function model incorporating all these outliers is
Since ITk(t) can be regarded as a predictor series, i.e. the series takes value 1 if t=Tk, otherwise is 0, we can consider each outlier part LOk(B) Δd ΔD ITk(t) as a transfer function. Thus, in the scoring descriptions below they should be treated as transfer functions. ARIMA element in PMML 4.4 and later includes all the parameters mentioned above and provides three forecasting methods: conditional least squares, Theta recursion, Kalman filter. Theta recursion and Kalman filter are two kinds of exact least squares prediction method. This method is normally used to predict the noise term Nt, and Yt is then obtained by adding the other terms such as the mean constant μ or transfer functions for predictors or outliers. Theta recursion methodFor a series with zero mean, and without predictors, outliers, or differencing operators, one-step-ahead forecast of Yt+1 based on the past observations Y1,...,Yt and past predicted values Ŷ1,...,Ŷt is: for t between 1 and m = max( p, q ):Ŷt+1 = Σj=1 to t θt,j (Yt+1-j-Ŷt+1-j ), for t > m: Ŷt+1= φ1 Yt+φ2 Yt-1+...+φp Yt+1-p+ Σj=1 to q θt,j (Yt+1-j-Ŷt+1-j ) θ(t,j) is computed recursively based on theoretical auto-covariance of ARMA(p,q) model and the
forecast variance at previous time point. The forecast variance of Ŷt+1 is estimated as
σt+12=σ2*νt,
where the forecast variance factor νt is also recursively computed based on theoretical auto-covariance
of ARMA(p,q) model, θt,j, and the forecast variance factor at previous time point.
The computation details can be found in proposition 5.2.2 and equation 5.3.9 in Chapter 5 of Brockwell and Davis(1991)
where this is called "The innovation algorithm".
Kalman Filter methodLet m = max( p, q ). The state-space representation of a time series Yt with zero mean and without predictors, outliers, or differencing operators is:St+1 = F St + H et Yt = G St + et where St is a state vector of m by 1, and G=(1,0,...,0)(1×m) HT=(ψ1,...,ψm ) where ψi are coefficients in (1-θ1 B-θ2 B2-...-θq Bq)/ (1-φ1 B-φ1 B2-...-φp Bp )= 1+ψ1 B+ψ2 B2+... F is an m×m matrix with 1's above the main diagonal, the bottom row containing φm, φm-1, ..., φ1, and zeros elsewhere. Here φi = 0 if i > p. The noise term et has expectation E(et)=0 and variance Var(et ) = σ2. The details of one-step-ahead forecast and forecast variance are given by the proposition 12.2.2 in Chapter 12 of Brockwell and Davis (1991). After model estimation, the following statistics need to be saved into PMML:
Maximum likelihood forecasting methodsThe unknown parameters of ARMA model are estimated iteratively by maximizing the log likelihood function. In each iteration, we use the theta recursive method or Kalman filter method to compute one-step-ahead forecast values based on parameters in the last iteration, and then compute the log-likelihood value. Based on log-likelihood value, we update the parameters based on derivatives. These steps are repeated until the criteria of convergence are reached. Since the theta recursive method or Kalman filter method are involved in the maximum likelihood estimation of parameters, we call the PMML element containing statistics for the two forecasting methods "MaximumLikelihoodStats".The schemaHere is the definition of ARIMA element:<xs:element name="ARIMA"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="NonseasonalComponent" minOccurs="0"/> <xs:element ref="SeasonalComponent" minOccurs="0"/> <xs:element ref="DynamicRegressor" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="MaximumLikelihoodStat" minOccurs="0"/> <xs:element ref="OutlierEffect" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="RMSE" type="REAL-NUMBER"/> <xs:attribute name="transformation" default="none"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="none"/> <xs:enumeration value="logarithmic"/> <xs:enumeration value="squareroot"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="constantTerm" type="REAL-NUMBER" default="0"/> <xs:attribute name="predictionMethod" default="conditionalLeastSquares"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="conditionalLeastSquares"/> <xs:enumeration value="exactLeastSquares"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> </xs:element> The attribute RMSE is the root mean squared error, or standard deviation of the residuals. The attribute transformation is similar to the one in ExponentialSmoothing, which specifies what transformation has been applied to the target series prior to executing the algorithm. If present, the inverse of this transform should be applied to the model predictions to get the predicted value of the original series. The attribute constantTerm specifies the constant value μ in ARIMA part. This value should be added to the predicted value (before applying the inverse of the transformation, if specified) to get the prediction of the original series. The attribute predictionMethod specifies the method of prediction of the ARIMA model. Description of each method is given in the Prediction Methods section. Note that if predictionMethod is exactLeastSquares then element MaximumLikelihoodStat must be present. Nonseasonal Component and Seasonal ComponentThe AutoRegressive and MovingAverage components of the non-seasonal and seasonal component, respectively, of the time series model are described by the following elements: <xs:element name="NonseasonalComponent"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="AR" minOccurs="0"/> <xs:element ref="MA" minOccurs="0"/> </xs:sequence> <xs:attribute name="p" type="xs:nonNegativeInteger" default="0"/> <xs:attribute name="d" type="xs:nonNegativeInteger" default="0"/> <xs:attribute name="q" type="xs:nonNegativeInteger" default="0"/> </xs:complexType> </xs:element> <xs:element name="SeasonalComponent"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="AR" minOccurs="0"/> <xs:element ref="MA" minOccurs="0"/> </xs:sequence> <xs:attribute name="P" type="xs:nonNegativeInteger" default="0"/> <xs:attribute name="D" type="xs:nonNegativeInteger" default="0"/> <xs:attribute name="Q" type="xs:nonNegativeInteger" default="0"/> <xs:attribute name="period" type="xs:nonNegativeInteger" use="required"/> </xs:complexType> </xs:element> <xs:element name="AR"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="MA"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="MACoefficients" minOccurs="0"/> <xs:element ref="Residuals" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="MACoefficients"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Residuals"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> The attribute p is the order of the non-seasonal autoregressive part of the model. The attribute d is the order of the non-seasonal differencing. The attribute q is the order of the non-seasonal moving average part of the model. The attribute P is the order of the seasonal autoregressive part of the model. The attribute D is the order of the seasonal differencing. The attribute Q is the order of the seasonal moving average part of the model. The attribute period is the seasonal length. The REAL-ARRAY in AR specifies the polynomial coefficients of autoregressive (AR) part from order 1 to order p (or order P for seasonal AR). The REAL-ARRAYs in MACoefficients specifies the polynomial coefficients of moving average (MA) part from order 1 to order q (or order Q for seasonal MA), as well as the residuals to which the coefficients are to be applied to. The REAL-ARRAY in Residuals specifies the residual error values, that is the past differences between the actual time series values and the model predictions, to which the MA coefficients are multiplied by. These values are needed for the conditionalLeastSquares prediction method and for GARCH. Dynamic RegressorTime series predictions can usually depend on not just the previous values of the series but other independent time series as well. The fields containing such time series are called external or dynamic regressors. They are also referred to as exogenous regressors in State Space models. The effects of the dynamic regressors can be added to the predictions of an ARIMA model or a State Space model (defined later in this document). <xs:element name="DynamicRegressor"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Numerator" minOccurs="0"/> <xs:element ref="Denominator" minOccurs="0"/> <xs:element ref="RegressorValues" minOccurs="0"/> </xs:sequence> <xs:attribute name="field" type="FIELD-NAME" use="required"/> <xs:attribute name="transformation" default="none"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="none"/> <xs:enumeration value="logarithmic"/> <xs:enumeration value="squareroot"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="delay" type="INT-NUMBER" default="0"/> <xs:attribute name="futureValuesMethod" default="constant"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="constant"/> <xs:enumeration value="trend"/> <xs:enumeration value="stored"/> <xs:enumeration value="otherModel"/> <xs:enumeration value="userSupplied"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="targetField" type="FIELD-NAME"/> </xs:complexType> </xs:element> <xs:element name="Numerator"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="NonseasonalFactor" minOccurs="0"/> <xs:element ref="SeasonalFactor" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Denominator"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="NonseasonalFactor" minOccurs="0"/> <xs:element ref="SeasonalFactor" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="SeasonalFactor"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> <xs:attribute name="difference" type="INT-NUMBER" default="0"/> <xs:attribute name="maximumOrder" type="INT-NUMBER"/> </xs:complexType> </xs:element> <xs:element name="NonseasonalFactor"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> <xs:attribute name="difference" type="INT-NUMBER" default="0"/> <xs:attribute name="maximumOrder" type="INT-NUMBER"/> </xs:complexType> </xs:element> Numerator and Denominator define the numerator and denominator polynomials of the fraction for a specified predictor. More details are given in the prediction methods section. field is the field name to which all the operations defined in this element are to be applied to. transformation specifies the transformation applied to this field. Possible values are "none" "logarithmic" and "squareroot", with default "none". delay specifies the number of intervals by which the input field's influence is delayed. For example, if delay="5", the value of the input predictor at time t doesn't affect forecast until five periods have elapsed (t+5). difference specifies the order of differencing applied to the selected predictor series before estimating models. maximumOrder defines the maximum order of numerator or denominator polynomial. All positive lower orders should be included in the model and represented in the REAL-ARRAY. In addition, order zero is always included for the numerator polynomial. RegressorValues contains the observed (and possibly future) values of the predictor. It can also have additional information needed for predicting future values of the predictor. This element is defined below. futureValuesMethod specifies a way to obtain future values of the predictor needed for predicting the target values. The following options are available:
targetField the target time series field name to which the regressor value should be added to. This is only needed for the case of multidimensional time series. <xs:element name="RegressorValues"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="TimeSeries" minOccurs="0"/> <xs:element ref="TrendCoefficients" minOccurs="0"/> <xs:element ref="TransferFunctionValues" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="TrendCoefficients"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="REAL-SparseArray" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="TransferFunctionValues"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> The coefficients ci used for trend computation described above are stored in the TrendCoefficients element, with the index of the array corresponding to the order of the polynomial term to which the coefficient is applied. TransferFunctionValues can contain the past values of transfer function, which help in computation of the future values of transfer function, see scoring description below. The Exact Least Squares methodKalman filter and Theta recursion information is stored in the following element:<xs:element name="MaximumLikelihoodStat"> <xs:complexType> <xs:sequence> <xs:choice> <xs:element ref="KalmanState"/> <xs:element ref="ThetaRecursionState"/> </xs:choice> </xs:sequence> <xs:attribute name="method" use="required"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="kalman"/> <xs:enumeration value="thetaRecursion"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="periodDeficit" default="0" type="INT-NUMBER"/> </xs:complexType> </xs:element> method specifies the method used to do the prediction. Different methods involve different parameters that are needed when the model is used to forecast into the future. Supported methods are “kalman” and “thetaRecursion”. The Kalman filter method can be used for time series with or without missing, and the theta recursion method can be used for time series without missing value. The predicted time points are calculated based on Chapters 5 and 12 of Brockwell and Davis’s book. periodDeficit specifies the number of missing values at the end of differenced series. The actual model fitting period ends at (endTime – periodDeficit). For example, the original dependent values Y[97], Y[98], Y[99], Y[100] are 11, 10, missing, 12, respectively. If the model calls for first-differencing, the differenced series dY[98], dY[99], dY[100] will be -1, missing, missing. The model fitting period will end at (100 – 2), and the periodDeficit=2 means that user expects to forecast from this model to start at 101. Noise prediction for Kalman filter method<xs:element name="KalmanState"> <xs:complexType> <xs:sequence> <xs:element ref="FinalOmega"/> <xs:element ref="FinalStateVector"/> <xs:element ref="HVector" minOccurs="0" maxOccurs="1"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FinalOmega"> <xs:complexType> <xs:sequence> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FinalStateVector"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="HVector"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> FinalOmega includes a symmetric m*m matrix which represents the final omega matrix where the m = max(p, q). FinalStateVector includes a REAL-ARRAY. The length of the ARRAY should equal to m, ‘the larger of the degree of autoregression and the degree of moving average’. HVector includes a REAL-ARRAY to define the first m psi weights instead of recalculating at scoring time. Noise prediction for theta recursion method<xs:element name="ThetaRecursionState"> <xs:complexType> <xs:sequence> <xs:element ref="FinalNoise"/> <xs:element ref="FinalPredictedNoise"/> <xs:element ref="FinalTheta"/> <xs:element ref="FinalNu"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FinalNoise"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FinalPredictedNoise"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FinalTheta"> <xs:complexType> <xs:sequence> <xs:element ref="Theta" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Theta"> <xs:complexType> <xs:attribute name="i" type="INT-NUMBER"/> <xs:attribute name="j" type="INT-NUMBER"/> <xs:attribute name="theta" type="xs:double"/> </xs:complexType> </xs:element> <xs:element name="FinalNu"> <xs:complexType> <xs:sequence> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> FinalNoise includes a REAL-ARRAY which represents the final values of noise. The length of the ARRAY equals to m, ‘the larger of the degree of autoregression and the degree of moving average’. FinalPredictedNoise includes a REAL-ARRAY which represents the final predictions of noise. The length of the ARRAY equals to q, the degree of moving average. FinalTheta contains a collection of θi,i-k values needed for predicting future values of the series. Those values are kept in triples with two indices to avoid ambiguity. Each triple is kept in element Theta that has attributes for i, j=i-k, and the corresponding θi,j value. FinalNu contain values for νi. OutlierEffect<xs:element name="OutlierEffect"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="type" use="required"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="additive"/> <xs:enumeration value="level"/> <xs:enumeration value="transient"/> <xs:enumeration value="seasonalAdditive"/> <xs:enumeration value="trend"/> <xs:enumeration value="innovational"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="startTime" use="required" type="REAL-NUMBER"/> <xs:attribute name="magnitude" use="required" type="REAL-NUMBER"/> <xs:attribute name="dampingCoefficient" use="optional" type="REAL-NUMBER"/> </xs:complexType> </xs:element> OutlierEffect supports a number of detected outliers during the modeling process. Their types, locations, and magnitudes are represented by attributes type, startTime, and magnitude respectively. dampingCoefficient specifies the value of delta for the transient outlier. Details of their application are given in the Prediction Methods section. additive means there is an additive outlier, as a pulse function, at the time of startTime which needs to add a magnitude for predicted values at that time. level means there is a level shift, as a step function, from the time of startTime which needs to add a magnitude for all predicted values from that time on. transient means there is a transient change, as a damp function, from the time of startTime which needs to add a magnitude for predicted values from that time on, and the magnitude is damped over time. seasonalAdditive is similar with level, but the magnitude, as a seasonal pulse function, is only added to the subsequent time points with the same seasonal phase of startTime. trend means there is a local trend from the time of startTime which needs to add a magnitude for predicted values from that time, and the magnitude is increased or decreased over time based on its sign. innovational means there is an innovational outlier at the time of startTime which needs to add a magnitude for the noise at that time. Prediction Methods for ARIMAFor ARIMA models, the conditionalLeastSquares prediction method is simple. Given all information up to time t, one predicts the time series at time t+1, and then uses that to predict the series at time t+2 and so on. Given all information up to a given time t, the prediction at time t+1 is calculated using the ARIMA equations defined, the information required is the AR and MA coefficients and the residuals. To calculate predictions at time t+2 (or t+h in general), one needs the residuals and time series values up to time t+1 (or t+h-1 in general) to apply the AR and MA coefficients to respectively. One assumes all residuals after time t are 0 and the time series values after time t are given by the prediction calculations done for those time values. Scoring examples below provide all the details. The future values of the regressors are either calculated based on a trend, or are assumed not to change with time, or are supplied in PMML or are provided by the user during scoring. Example for a non-seasonal ARIMA model with conditional least squares predictionConsider an ARIMA(3,1,1) model with auto-regressive coefficients φ1, φ2, φ3 and a moving-average coefficient θ1. The model is represented as: Δ1Yt = θ1(B)/ϕ3(B) * at Multiplying both sides by ϕ3 (B) this can be written as: (1 - φ1B - φ2B2 - φ3B3)(1-B)yt = (1 - θ1B)at Rewriting this, the prediction at time t+1 is: yt+1 = (1 + φ1)yt - (φ1 - φ2)yt-1 - (φ2 - φ3)yt-2 - φ3yt-3 - θ1at here at is the residual at time t. The AR and MA elements give the coefficients at each time step. The coefficient array gives the values of the coefficients starting at time t, t-1 and so on. Assume that the AR and MA coefficients at each time step were calculated as φ1=0.05, φ2=0.1, φ3=0.3, θ1=-0.4 and the fit error (residual) was 2: <PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="DMG.org"/> <DataDictionary> <DataField name="orders" optype="continuous" dataType="double" displayName="TS"/> <DataField name="date" optype="continuous" dataType="dateDaysSince[1970]" displayName="Time"/> </DataDictionary> <TimeSeriesModel functionName="timeSeries" bestFit="ARIMA"> <MiningSchema> <MiningField name="date" usageType="order"/> <MiningField name="orders" usageType="target"/> </MiningSchema> <TimeSeries usage="logical" startTime="1" endTime="6"> <TimeAnchor type="dateDaysSince[1970]" offset="6940" stepsize="1" displayName="Month"> <TimeCycle length="12" type="includeFromTo" displayName="Year"> <Array type="int" n="2">0 11</Array> </TimeCycle> </TimeAnchor> <TimeValue index="1" value="11357.92"/> <TimeValue index="2" value="10605.95"/> <TimeValue index="3" value="16998.57"/> <TimeValue index="4" value="6563.75"/> <TimeValue index="5" value="6607.69"/> <TimeValue index="6" value="9839.0"/> </TimeSeries> <ARIMA RMSE="218.6" transformation="none"> <NonseasonalComponent p="3" d="1" q="1"> <AR> <Array type="real" n="3">0.05 0.1 0.3 </Array> </AR> <MA> <MACoefficients> <Array type="real">-0.4</Array> </MACoefficients> <Residuals> <Array type="real">2</Array> </Residuals> </MA> </NonseasonalComponent> </ARIMA> </TimeSeriesModel> </PMML> The prediction of 'orders' at time index "7" is therefore: orders = 1.05*9839 + 0.05*6607.69 + 0.2*6563.75 - 0.3*16998.57 - (-0.4)*2 Transfer Function Scoring and ExamplesForecasting in the historical dataSuppose that a data set {Yt,X1t,…,XKt} is given for t=1,...,n, also a time series model is given. Then the one-step-ahead forecast can be obtained using the following steps: Step 1. Computation of transfer functionFor each predictor series, transfer function needs to be calculated. For a predictor series Xit, let the transfer function be Vit = Numi / Deni Δdi ΔDi Bli. It can be calculated as follows.Calculate Uit = Δdi ΔDi Bli Xit Recursively calculate Vit = -Σj=1 to di Deni.coeff(j) * Vit-j + Σj=0 to ni Numi.coeff(j) * Uit-j where ni and di are degrees of numerator and denominator polynomials for the i-th predictor, respectively, Numi.coeff(j) and Deni.coeff(j) are the coefficients of those polynomials. All missing Vit-j in the first term are taken to be Vi,-∞ and missing Uit-j in the second term are taken to be Ui,-∞, where Ui,-∞ is the first non-missing measurement of Uit-j and Vi,-∞ = Numi.coeff(1)/ Deni.coeff(1) * Ui,-∞. Step 2. Computation of noise processNt = Δd ΔD Yt - μ - Σi=1 to K Numi / Deni Δdi ΔDi Bli XitStep 3. One-step-ahead forecast of noise processNt is a zero mean ARMA(p+sP,q+sQ) process. One-step-ahead forecast can be obtained by theta recursive method or Kalman filter method, denote it as N̂t.Step 4. One-step-ahead of forecast for the series Δd Δd YtLet Zt = Δd ΔD Yt, then one-step-ahead of series Zt isẐt = μ + Σi=1 to K Vit + N̂t Step 5. One-step-ahead of forecast the YtŶt = Ẑt - Σj=1 to d+Ds τj Yt-jwhere τj is the coefficient corresponding to power j of the difference operator Δd ΔD Note: If Yt is a transformation of an original series, then we need to back transform results to get the prediction of the original series. Forecasting beyond the historical dataSuppose that time series {Yt, X1t,…,XKt} are given up to time n, and that future predictor series {X1t,…,XKt} are given for up to time n+H or can be computed based on futureValuesMethod specified in DynamicRegressor element. Now we can use the transfer function model that is built on the historical data to forecast future values of target for t=n+1 to n+H using the method described in the above section except that we cannot compute the noise values Nt, t = n+1,...,n+H because there is no target value Yt, t = n+1,...,n+H. So based on Nt and N̂t in historical data, we will compute one-step-ahead forecast for noise process if t=n+1, and h-step-ahead forecast if h is between 1 and n+H. Kalman filter example and scoring<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="DMG.org"/> <DataDictionary> <DataField name="VALUE" optype="continuous" dataType="double" displayName=" TS-VALUE"/> <DataField name="TS" optype="continuous" dataType="double" displayName="TS"/> </DataDictionary> <TimeSeriesModel functionName="timeSeries" bestFit="ARIMA"> <MiningSchema> <MiningField name="TS" usageType="order"/> <MiningField name="VALUE" usageType="target"/> </MiningSchema> <TimeSeries usage="logical" startTime="1" endTime="240"> <TimeValue index="1" value="1.00"/> <TimeValue index="2" value="0.58"/> <TimeValue index="239" value="0.49"/> <TimeValue index="240" value="0.48"/> </TimeSeries> <ARIMA RMSE="1.00854505389749" transformation="none" constantTerm="0.375271182246529" predictionMethod="exactLeastSquares"> <NonseasonalComponent p="1" d="0" q="1"> <AR> <Array type="real"> 0.441912328691372 </Array> </AR> <MA> <MACoefficients> <Array type="real">-0.135110189708823</Array> </MACoefficients> </MA> </NonseasonalComponent> <MaximumLikelihoodStat method="kalman" periodDeficit="0"> <KalmanState> <FinalOmega> <Matrix kind="symmetric" nbRows="1" nbCols="1"> <Array type="real" n="1">0</Array> </Matrix> </FinalOmega> <FinalStateVector> <Array n="1" type="real">0.00493251439212172</Array> </FinalStateVector> </KalmanState> </MaximumLikelihoodStat> </ARIMA> </TimeSeriesModel> </PMML> This is an ARIMA(1,0,1) model
Yt = μ + ( 1 - θ B ) / ( 1 - φ B ) at In this case, m=1, G=1, F=φ, and S241 = 0.00493251439212172 which is computed after model
building and saved in PMML.
The one-step-ahead forecast of N241 is Theta Recursion Example and Scoring<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="DMG.org"/> <DataDictionary> <DataField name="TS" optype="continuous" dataType="double" displayName="T"/> <DataField name="x" optype="continuous" dataType="double" displayName="predictor-series"/> <DataField name="y" optype="continuous" dataType="double" displayName="target-series"/> </DataDictionary> <TimeSeriesModel functionName="timeSeries" bestFit="ARIMA"> <MiningSchema> <MiningField name="TS" usageType="order"/> <MiningField name="x" usageType="active"/> <MiningField name="y" usageType="target"/> </MiningSchema> <TimeSeries usage="logical" startTime="1" endTime="63"> <TimeValue index="1" value="3750.1797916378027"/> <TimeValue index="2" value="3846.066273602286"/> <TimeValue index="63" value="12141.488999636887"/> </TimeSeries> <ARIMA RMSE="90.8960561930059" transformation="none" constantTerm="342.594932405502" predictionMethod="exactLeastSquares"> <NonseasonalComponent p="1" d="1" q="1"> <AR> <Array type="real">0.590549588503187</Array> </AR> <MA> <MACoefficients> <Array type="real">0.459274266537189</Array> </MACoefficients> </MA> </NonseasonalComponent> <DynamicRegressor field="x" futureValuesMethod="userSupplied"> <Numerator> <NonseasonalFactor> <Array type="real">-0.0114728000479396 -.0115880130277021</Array> </NonseasonalFactor> </Numerator> <RegressorValues> <TimeSeries usage="logical" startTime="63" endTime="63"> <TimeValue index="63" value="2538309.727789499"/> </TimeSeries> </RegressorValues> </DynamicRegressor> <MaximumLikelihoodStat method="thetaRecursion" periodDeficit="0"> <ThetaRecursionState> <FinalNoise> <Array n="1" type="real">127.264876980187</Array> </FinalNoise> <FinalPredictedNoise> <Array n="1" type="real">-2.10096285515485</Array> </FinalPredictedNoise> <FinalTheta> <Theta i="63" j="1" theta="-0.459274266537189"/> </FinalTheta> <FinalNu> <Array n="1" type="real">1</Array> </FinalNu> </ThetaRecursionState> </MaximumLikelihoodStat> </ARIMA> </TimeSeriesModel> </PMML> The dataset has two variables: target Y and predictor X. The dataset has 63 points and the model
is ARIMA(1,1,1) with a predictor Xt: Step 1. Compute transfer function for Xt: Step 2. Compute the noise prediction Step 3. Let Z64 = ( 1 - B ) Y64. Its prediction is Ẑ64 = μ + V64 + N̂64 = 650.783 Step 4. The forecasted value for target at t=64 is Ŷ64 = Y63 + Ẑ64 = 12141.488999636887 + 650.783 = 12792.27 More generally, theta recursion scoring is computed as follows. Let Nt follow ARMA( p, q ) model. One-step-ahead noise forecast is given by N̂n+1 = φ1 Nn + φ2 Nn-1 +...+ φp Nn+1-p + Σj=1 to q θn,j (Nn+1-j - N̂n+1-j )h-step-ahead noise forecast when h is less than or equal q is N̂n(h) = Σi=1 to p φi N̂n (h-i) + Σj=h to q θn+h-1,j (Nn+h-j - N̂n+h-j ) and for h > q N̂n(h) = Σi=1 to p φi N̂n (h-i) where N̂n (j) = Nn-j for j less than or equal to 0. And θij are computed recursively as follows: For i=n+1,n+2,... when k < i - q we set θi,i-k = 0. For k between i-q and i-1 we have θi,i-k = νk-1 (κ(i+1,k+1) - Σj=i-q to k-1 θk,k-j θi,i-j νj ) νi = κ(i+1,i+1) - Σj=i-q to i-1 θi,i-j2 νj = Σr=0 to q ϑr2 - Σk=1 to q θi,k2 νi-k where κ(i+1,k+1) = Σr=0 to q-|i-k| ϑr ϑr+|i-k|, for i-q-1 < k < i, with ϑ0 = 1, ϑi = -θi for i=1,...,q, and ϑi = 0 if i > q. Consider a slightly more complicated example of a theta recursion model. <PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="DMG.org"/> <DataDictionary> <DataField name="TS" optype="continuous" dataType="double" displayName="T"/> <DataField name="x" optype="continuous" dataType="double" displayName="predictor-series"/> <DataField name="y" optype="continuous" dataType="double" displayName="target-series"/> </DataDictionary> <TimeSeriesModel functionName="timeSeries" bestFit="ARIMA"> <MiningSchema> <MiningField name="TS" usageType="order"/> <MiningField name="x" usageType="active"/> <MiningField name="y" usageType="target"/> </MiningSchema> <TimeSeries usage="logical" startTime="1" endTime="63"> <TimeValue index="1" value="3750.1797916378027"/> <TimeValue index="2" value="3846.066273602286"/> <TimeValue index="63" value="12141.488999636887"/> </TimeSeries> <ARIMA transformation="none" constantTerm="320.636778635366" predictionMethod="exactLeastSquares"> <NonseasonalComponent p="1" d="1" q="2"> <AR> <Array type="real">-0.09114206662863</Array> </AR> <MA> <MACoefficients> <Array type="real">-0.21505809467966 -0.182217351518243 </Array> </MACoefficients> </MA> </NonseasonalComponent> <DynamicRegressor field="x" futureValuesMethod="userSupplied"> <Numerator> <NonseasonalFactor> <Array type="real">-0.0101715603339601 -0.0102725915582332</Array> </NonseasonalFactor> </Numerator> <RegressorValues> <TimeSeries usage="logical" startTime="63" endTime="63"> <TimeValue index="63" value="2538309.727789499"/> </TimeSeries> </RegressorValues> </DynamicRegressor> <MaximumLikelihoodStat method="thetaRecursion" periodDeficit="0"> <ThetaRecursionState> <FinalNoise> <Array n="2" type="real">-0.948468422344376 130.727431958243</Array> </FinalNoise> <FinalPredictedNoise> <Array n="2" type="real">-9.3141625486384 27.6726476335164</Array> </FinalPredictedNoise> <FinalTheta> <Theta i="63" j="1" theta="0.21505809467966"/> <Theta i="63" j="2" theta="0.182217351518243"/> </FinalTheta> <FinalNu> <Array n="2" type="real">1 1</Array> </FinalNu> </ThetaRecursionState> </MaximumLikelihoodStat> </ARIMA> </TimeSeriesModel> </PMML> The model is ARIMA(1,1,2) with a predictor Xt: Step 1. Compute transfer function for Xt: Step 2. Compute the noise prediction Step 3. Let Z64 = ( 1 - B ) Y64. Its prediction is Ẑ64 = μ + V64 + N̂64 = 588.8577 Step 4. The forecasted value for target at t=64 is Ŷ64 = Y63 + Ẑ64 = 12141.488999636887 + 588.8577 = 12730.35 To predict the time series at t=65 we will use the same steps, but we also need to compute the new θi,i-k. Step 1. Compute transfer function for Xt, with user-supplied predictor values: Step 2. Compute the noise prediction Step 3. Let Z65 = (1-B)Y65. Its prediction is Ẑ65 = μ + V65 + N̂63(2) = 594.8083 Step 4. The forecast value for target at t=65 is Ŷ65 = Ŷ64 + Ẑ65 = 12730.35 + 594.8083 = 13325.16 Confidence of Prediction CalculationStep 1. Compute the variance estimate of h-step ahead of noise process: If the theta recursive method is used, the variance of h-step ahead of noise process can be computed as σNt2(h) = σ2Σj=0 to h-1 (Σr=0 to j χ2θt+h-r-1,j-r)2νt+h-j-1 Where σ2 is estimated as the squared RMSE, the constants χr are calculated recursively as χ0 = 1 χr = Σk=1 to min(p,r) φkχr-k, r=1,2,3,... And the computation details of νt and θij can be found in proposition 5.2.2 and equation 5.3.9 in Chapter 5 of Brockwell and Davis (1991) where this is called "The innovation algorithm". If the Kalman Filter method is used, the variance of h-step ahead of noise process can be computed as σNt2(h) = σ2(Ωt(h)(1,1) + 1) Where σ2 is estimated as the squared RMSE, and the Ωt(h)(1,1) can be computed as Ωt(h) = FΩt(h-1)FT + HHT with Ωt(1) = Ωt+1 Step 2. Compute the confidence interval of h-step-ahead prediction: The 100(1-α)% confidence interval of prediction at the time point t+h is computed as (Ŷt(h) - tdf,α/2*σNt(h), Ŷt(h) + tdf,α/2*σNt(h)) where tdf,α/2 is the (1-α/2)100th percentile of the t distribution with degree of freedom df which can be computed by the number of valid noise residuals minus the number of parameters. GARCHGARCH (Generalized Auto Regressive Conditional Heteroscedasticity) models (Bollerslev, T. P., 1986, Generalized Autoregressive Conditional Heteroscedasticity, Journal of Econometrics, 31: 307-327.) are widely used to characterize and model time series with time-varying volatility. In GARCH models the conditional variance of the error terms are assumed to have a structure very similar to the structure of the conditional expectation in an ARMA model, i.e., GARCH models assume the variance of the current error term to be a linear function of the squares of the previous errors and their variances: Yt = μ + Σi=1 to p φiYt-i + Σj=1 to q θjεt-j + εtht = α0 + Σi=1 to gq αiεt-i2 + Σj=1 to gp βjht-j Here εt are residuals from the MA part of the non-seasonal component, ht are variances, αi and βj are residual square coefficients and coefficients of past variances, respectively. gp and gq are order of variance and order of residual squares, respectively. The following element contains both the ARMA part and a GARCH part for a complete GARCH model. <xs:element name="GARCH"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="ARMAPart"/> <xs:element ref="GARCHPart"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="ARMAPart"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="AR"/> <xs:element ref="MA"/> </xs:sequence> <xs:attribute name="constant" default="0" type="REAL-NUMBER"/> <xs:attribute name="p" use="required" type="INT-NUMBER"/> <xs:attribute name="q" use="required" type="INT-NUMBER"/> </xs:complexType> </xs:element> <xs:element name="GARCHPart"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="ResidualSquareCoefficients"/> <xs:element ref="VarianceCoefficients"/> </xs:sequence> <xs:attribute name="constant" default="0" type="REAL-NUMBER"/> <xs:attribute name="gp" use="required" type="INT-NUMBER"/> <xs:attribute name="gq" use="required" type="INT-NUMBER"/> </xs:complexType> </xs:element> <xs:element name="ResidualSquareCoefficients"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Residuals" minOccurs="0"/> <xs:element ref="MACoefficients" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="VarianceCoefficients"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="PastVariances" minOccurs="0"/> <xs:element ref="MACoefficients" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="PastVariances"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> GARCH Example and Scoring<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header copyright="(c) DMG 2016" description="Time Series Example"/> <DataDictionary numberOfFields="1"> <DataField dataType="double" displayName="Time Series" name="ts" optype="continuous"/> </DataDictionary> <TimeSeriesModel algorithmName="GARCH" bestFit="GARCH" functionName="timeSeries"> <MiningSchema> <MiningField name="ts" usageType="target"/> </MiningSchema> <TimeSeries usage="original" startTime="97" endTime="100" interpolationMethod="none"> <TimeValue index="97" value="1.1"/> <TimeValue index="98" value="1.2"/> <TimeValue index="99" value="1.3"/> <TimeValue index="100" value="1.4"/> </TimeSeries> <GARCH> <ARMAPart p="4" q="2" constant="-3.4"> <AR> <Array n="4" type="real">1.5 1.4 1.3 1.2</Array> </AR> <MA> <MACoefficients> <Array n="2" type="real">1.1 1.2</Array> </MACoefficients> <Residuals> <Array n="2" type="real">1.1 1.2</Array> </Residuals> </MA> </ARMAPart> <GARCHPart constant="0.12" gp="2" gq="3"> <ResidualSquareCoefficients> <Residuals> <Array n="3" type="real">1.1 1.2 1.3</Array> </Residuals> <MACoefficients> <Array n="3" type="real">1.3 1.2 1.1</Array> </MACoefficients> </ResidualSquareCoefficients> <VarianceCoefficients> <PastVariances> <Array n="2" type="real">1.1 1.2</Array> </PastVariances> <MACoefficients> <Array n="2" type="real">1.3 1.2</Array> </MACoefficients> </VarianceCoefficients> </GARCHPart> </GARCH> </TimeSeriesModel> </PMML>
The steps to score such a model one step ahead are:
In this example PMML the values of Yt are: 1.1, 1.2, 1.3, 1.4, μ=-3.4, p=4, q=2, gp=2, gq=3; values of φi are found in AR element of ARMAPart: 1.5, 1.4, 1.3, 1.2; values of θj are in MACoefficients of MA in ARMAPart: 1.1, 1.2; αi's are the MACoefficients in ResidualSquareCoefficients: 1.3, 1.2, 1.1; βi's are the MACoefficients in VarianceCoefficients: 1.3, 1.2. State Space modelsState Space models are multi-dimensional time series models which can be represented in the following form: Yt = GXt + WtXt = FXt-1 + Vt-1 The Observable vector Y is a vector representing an m-dimensional time series yt at time t. The State vector X is a size p vector describing how the series is advanced from time t-1 to time t. The Measurement matrix G is a m by p matrix converting the State Vector X at time t to an observable vector Yt. The Transition matrix F is a p by p matrix describing how the state of the series changes from the previous time step to the present time. The Intercept vector is a vector of intercepts which is used for multivariate models. In the case of univariate models and single intercept values the attribute "intercept" can be used. The Predicted State Covariance matrix is the estimate of the state uncertainty matrix. The Selected State Covariance matrix is utilized to calculate the estimate of the state uncertainity matrix. The Observation Variance matrix defines the observation variance of that time series. W and V represent noise and error, not important for prediction of the value of each time series itself. If dynamic regressors are present, the final predicted value is the sum of the value predicted by the dynamic regressors and the value predicted by the matrix multiplication method. Note that general ARIMA models can be expressed as state space models, but not all state space models are ARIMA models. <xs:element name="StateSpaceModel"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="StateVector" minOccurs="0"/> <xs:element ref="TransitionMatrix" minOccurs="0"/> <xs:element ref="MeasurementMatrix" minOccurs="0"/> <xs:element ref="InterceptVector" minOccurs="0"/> <xs:element ref="PredictedStateCovarianceMatrix" minOccurs="0"/> <xs:element ref="SelectedStateCovarianceMatrix" minOccurs="0"/> <xs:element ref="ObservationVarianceMatrix" minOccurs="0"/> <xs:element ref="PsiVector" minOccurs="0"/> <xs:element ref="DynamicRegressor" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="variance" type="REAL-NUMBER"/> <xs:attribute name="period" default="none"/> <xs:attribute name="intercept" type="REAL-NUMBER" default="0"/> </xs:complexType> </xs:element> <xs:element name="StateVector"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="TransitionMatrix"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="MeasurementMatrix"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="InterceptVector"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> <xs:attribute name="type" default="state"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="state"/> <xs:enumeration value="observation"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> </xs:element> <xs:element name="PredictedStateCovarianceMatrix"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="SelectedStateCovarianceMatrix"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="ObservationVarianceMatrix"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="Matrix"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="PsiVector"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> <xs:group ref="REAL-ARRAY"/> </xs:sequence> <xs:attribute name="targetField" type="xs:string"/> <xs:attribute name="variance" type="xs:string"/> </xs:complexType> </xs:element>
The vector "PsiVector" is a representation of an ARMA model as a MA model of infinite order.
The elements of this vector, ψ, can be used to forecast the variance of the next step prediction;
in most cases, a very good approximation of the
variance of the predicted value h time steps in the future is: Example for a state space model with two target seriesThe result of fitting a state space model to a time series is the three parameters; the state vector Xt,
the transition matrix F and the Measurement matrix G. For the purposes of prediction, the errors are ignored.
Using the equations defining s state space model, the prediction of the observable vector Y is: G =
F =
X = [-0.53, -2.41, 0.78, 88.74]T Next suppose the time series depends on 1 regressor variable z, given by Y1(z) = zt + zt-1 Y2(z) = zt In other words, the final prediction for the time series at time t+1 is: Yt+1 = G (FXt) + Z where Z1 = zt+1 + zt Z2 = zt+1 The PMML representation is given as: <PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4.1"> <Header copyright="DMG.org"/> <DataDictionary> <DataField name="Y1" optype="continuous" dataType="double"/> <DataField name="Y2" optype="continuous" dataType="double"/> <DataField name="date" optype="continuous" dataType="dateDaysSince[1970]" displayName="TS-VALUE"/> <DataField name="z" optype="continuous" dataType="double" displayName="ExternalRegressor"/> </DataDictionary> <TimeSeriesModel functionName="timeSeries" bestFit="StateSpaceModel"> <MiningSchema> <MiningField name="date" usageType="order"/> <MiningField name="z" usageType="active"/> <MiningField name="Y1" usageType="target"/> <MiningField name="Y2" usageType="target"/> </MiningSchema> <Output> <OutputField name="orders" feature="predictedValue" targetField="Y1" dataType="double"/> <OutputField name="profit" feature="predictedValue" targetField="Y2" dataType="double"/> </Output> <TimeSeries usage="logical" startTime="1" endTime="6" field="Y1"> <TimeAnchor type="dateDaysSince[1970]" offset="6940" stepsize="1" displayName="Month"> <TimeCycle length="12" type="includeFromTo" displayName="Year"> <Array type="int" n="2">0 11</Array> </TimeCycle> </TimeAnchor> <TimeValue index="1" value="11357.92"/> <TimeValue index="2" value="10605.95"/> <TimeValue index="3" value="16998.57"/> <TimeValue index="4" value="6563.75"/> <TimeValue index="5" value="6607.69"/> <TimeValue index="6" value="9839.0"/> </TimeSeries> <TimeSeries usage="logical" startTime="1" endTime="6" field="Y2"> <TimeAnchor type="dateDaysSince[1970]" offset="6940" stepsize="1" displayName="Month"> <TimeCycle length="12" type="includeFromTo" displayName="Year"> <Array type="int" n="2">0 11</Array> </TimeCycle> </TimeAnchor> <TimeValue index="1" value="11357.92"/> <TimeValue index="2" value="10605.95"/> <TimeValue index="3" value="16998.57"/> <TimeValue index="4" value="6563.75"/> <TimeValue index="5" value="6607.69"/> <TimeValue index="6" value="9839.0"/> </TimeSeries> <StateSpaceModel> <StateVector> <Array type="real">-0.52 -2.41 0.78 88.74</Array> </StateVector> <TransitionMatrix> <Matrix nbRows="4" nbCols="4"> <Array type="real">0.052 1 0 0</Array> <Array type="real">0.119 0 1 0</Array> <Array type="real">0.373 0 0 0</Array> <Array type="real">1.000 0 0 1</Array> </Matrix> </TransitionMatrix> <MeasurementMatrix> <Matrix nbRows="1" nbCols="4"> <Array type="real">1 0 0 1</Array> <Array type="real">0 0 1 0</Array> </Matrix> </MeasurementMatrix> <PsiVector targetField="Y1" variance="10"> <Array type="real">-0.402 0.098 0.330</Array> </PsiVector> <DynamicRegressor field="z" targetField="Y1" delay="0" futureValuesMethod="constant"> <Numerator> <NonseasonalFactor> <Array type="real">1 -1</Array> </NonseasonalFactor> </Numerator> <Denominator> <NonseasonalFactor> <Array type="real">1</Array> </NonseasonalFactor> </Denominator> <RegressorValues> <TimeSeries> <TimeValue index="6" value="9.1"/> </TimeSeries> </RegressorValues> </DynamicRegressor> <DynamicRegressor field="z" targetField="Y2" delay="0" futureValuesMethod="constant"> <Numerator> <NonseasonalFactor> <Array type="real">1</Array> </NonseasonalFactor> </Numerator> <Denominator> <NonseasonalFactor> <Array type="real">1</Array> </NonseasonalFactor> </Denominator> <RegressorValues> <TimeSeries> <TimeValue index="6" value="9.1"/> </TimeSeries> </RegressorValues> </DynamicRegressor> </StateSpaceModel> </TimeSeriesModel> </PMML> Y1t = 85.77 Y2t = -0.19 The coefficients of the regressors are stored in the Numerator and Denominator element. Substituting those, we get the expected equation for the regressor prediction: Y1(z) = z7 + z6 Y2(z) = z7 The regressor value at t=6 is stored in the RegressorValue element z6 = 9.1 The regressor value at t=7 is unknown, but the futureValuesMethod="constant" attribute in the DynamicRegressor element indicates that we should take it to be same as that at t=6: z7 = 9.1 We therefore get the final prediction: orders = 85.77 + 1*9.1 + 1*9.1 = 103.97 profit = -0.19 + 1*9.1 = 8.91 The variance of the series Y1at time index 6 is given as 10 and the first element of the PsiVector is given; the predicted variance of the series prediction at time index 7 is therefore: 10*(1 + (-0.402)2) = 11.6 Since the variance of the series Y2 is not given, we cannot calculate the variance of the predicted value of Y2 ARIMA, SARIMAX and VARMAX Models in State Space formVARMAX are Vector AutoRegressive Moving-Average models with optional eXogenous regressors. SARIMAX are Seasonal ARIMA models with optional eXogenous regressors. To express ARIMA, SARIMAX and VARMAX models in state space form, we add intercept terms to the state
and observation equations. For SARIMAX and multivariate VARMAX models, μo is 0.
For ARIMA models, μs is 0. These intercepts are stored in the InterceptVector element, and the attribute type (state or observation) distinguishes between the intercepts. To calculate prediction intervals exactly, we use the following: Here, P is the predicted state covariance matrix which is updated using the transition matrix F and the selected state covariance matrix RQRT. The conditional mean square error matrix J is calculated using the observation variance matrix H, measurement matrix G, and P. The variance of the point forecast h steps in the future is calculated using the estimate of the variance of the residuals σ̂2 and J. For a multivariate model, J is a matrix, and its diagonal elements are used in this step. For a univariate model, J is of size 1x1. The confidence of prediction intervals are then calculated using the multiplier tdf,α/2 and the variance from the previous step. The multiplier tdf,α/2 is computed in the same way as in the Confidence of Prediction Calculation section above. Note: Notations like [{a11, a12}, {a21, a22}] represent matrix. Here the number of rows and columns are 2, the element at row 1 and column 1 is a11, row 1 and column 2 is a12, row 2 and column 1 is a21 and row 2 and column 2 is a22. Example for univariate ARIMA model with intercept<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4"> <Header description="ARIMA Time Series Model"/> <DataDictionary numberOfFields="2"> <DataField name="ts_value" optype="continuous" dataType="double"/> <DataField name="h" optype="continuous" dataType="double"/> </DataDictionary> <TimeSeriesModel modelName="ARIMA_model" functionName="timeSeries" bestFit="StateSpaceModel"> <MiningSchema> <MiningField name="ts_value" usageType="predicted" invalidValueTreatment="returnInvalid"/> <MiningField name="h" usageType="supplementary" invalidValueTreatment="returnInvalid"/> </MiningSchema> <Output> <OutputField name="Predicted_ts_value" optype="continuous" dataType="string" feature="predictedValue"/> <OutputField name="cpi_80_lower" optype="continuous" dataType="string" feature="confidenceIntervalLower" value="80"/> <OutputField name="cpi_80_upper" optype="continuous" dataType="string" feature="confidenceIntervalUpper" value="80"/> </Output> <TimeSeries usage="original" startTime="1" endTime="100"> <TimeValue index="1" value="88"/> <TimeValue index="2" value="84"/> <TimeValue index="99" value="222"/> <TimeValue index="100" value="220"/> </TimeSeries> <StateSpaceModel variance="9.97357634862506"> <StateVector> <Array type="real" n="2">74.9567433980672 -52.3002235159934</Array> </StateVector> <TransitionMatrix> <Matrix nbRows="2" nbCols="2"> <Array type="real">1.66128548696599 1</Array> <Array type="real">-0.679287914563663 0</Array> </Matrix> </TransitionMatrix> <MeasurementMatrix> <Matrix nbRows="1" nbCols="2"> <Array type="real">1 0</Array> </Matrix> </MeasurementMatrix> <InterceptVector type="observation"> <Array type="real" n="1">143.007281014872</Array> </InterceptVector> <PredictedStateCovarianceMatrix> <Matrix nbRows="2" nbCols="2"> <Array type="real">0 0</Array> <Array type="real">0 0</Array> </Matrix> </PredictedStateCovarianceMatrix> <SelectedStateCovarianceMatrix> <Matrix nbRows="2" nbCols="2"> <Array type="real">1 0.508921473603515</Array> <Array type="real">0.508921473603515 0.259001066294773</Array> </Matrix> </SelectedStateCovarianceMatrix> <ObservationVarianceMatrix> <Matrix nbRows="1" nbCols="1"> <Array type="real">0</Array> </Matrix> </ObservationVarianceMatrix> </StateSpaceModel> </TimeSeriesModel> </PMML> This example is an ARIMA(2,0,1) model expressed in state space form. For h=1 (one step ahead), the point forecast is calculated as follows. Since interceptVector contains type = "observation", the intercept μo is used when calculating Y.
Xt+1 = FXt Other types of models.The following elements are not used in this version of PMML and only serve
as placeholders for future versions. <xs:element name="SpectralAnalysis"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="SeasonalTrendDecomposition"> <xs:complexType> <xs:sequence> <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|