weka.classifiers.timeseries
Class TSLagMaker

java.lang.Object
  extended by weka.classifiers.timeseries.TSLagMaker
All Implemented Interfaces:
java.io.Serializable

public class TSLagMaker
extends java.lang.Object
implements java.io.Serializable

A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables.

If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or artificial, are used for modeling trends rather than using a differencing-based approach.

Also has routines for dealing with a date time stamp, i.e. it can detect a monthly time period (which needs special handling because months are different lengths) and map date time stamps to equally spaced time intervals. For example, in general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (the difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.

Also has routines for adding new attributes derived from a date time stamp to the data, e.g. AM indicator, day of the week, month, quarter etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is that in all these cases (nominal periodic and date-derived periodics), the values of these variables in future instances can be determined (computed from the last known historic instance).
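
The monthly remapping described above can be illustrated with a small, self-contained sketch. This is not the TSLagMaker implementation: the class name MonthlyIndexSketch, the helper monthIndex and the zero-based month-within-year convention are all assumptions made for illustration only.

```java
import java.time.YearMonth;

/** Illustrative only: maps calendar months onto equally spaced integer steps. */
class MonthlyIndexSketch {

  /**
   * Hypothetical helper: month within the current year plus twelve times the
   * number of years since baseYear. Assumes January maps to 0.
   */
  static int monthIndex(YearMonth date, int baseYear) {
    int interveningYears = date.getYear() - baseYear;
    int monthWithinYear = date.getMonthValue() - 1;
    return monthWithinYear + 12 * interveningYears;
  }

  public static void main(String[] args) {
    int baseYear = 2010;
    // Months have different lengths in days, but consecutive months are
    // always exactly one step apart after remapping.
    System.out.println(monthIndex(YearMonth.of(2010, 1), baseYear)); // prints 0
    System.out.println(monthIndex(YearMonth.of(2010, 2), baseYear)); // prints 1
    System.out.println(monthIndex(YearMonth.of(2011, 3), baseYear)); // prints 14
  }
}
```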

Version:
$Revision: $
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
Serialized Form

Nested Class Summary
static class TSLagMaker.Periodicity
           
 
Constructor Summary
TSLagMaker()
           
 
Method Summary
 double advanceSuppliedTimeValue(double valueToAdvance)
          Utility method to advance a supplied time value by one unit according to the periodicity set for this LagMaker.
 double advanceSuppliedTimeValue(double valueToAdvance, TSLagMaker.Periodicity dateBasedPeriodicity)
          Utility method to advance a supplied time value by one unit.
 void clearLagHistories()
          Clears any history accumulated in the lag creating filters.
 Instances createTimeLagCrossProducts(Instances insts)
           
 double decrementSuppliedTimeValue(double valueToDecrement)
           
 double decrementSuppliedTimeValue(double valueToDecrement, TSLagMaker.Periodicity dateBasedPeriodicity)
           
static TSLagMaker.Periodicity determinePeriodicity(Instances insts, java.lang.String timeName)
           
 boolean getAddAMIndicator()
          Return true if an AM indicator attribute is to be created.
 boolean getAddDayOfWeek()
          Return true if a day of the week attribute is to be created.
 boolean getAddMonthOfYear()
          Returns true if a month of the year attribute is to be created.
 boolean getAddQuarterOfYear()
          Returns true if a quarter attribute is to be created.
 boolean getAddWeekendIndicator()
          Returns true if a weekend indicator attribute is to be created.
 boolean getAdjustForTrends()
          Returns true if we are adjusting for trends via a real or artificial time stamp.
 boolean getAdjustForVariance()
          Returns true if we are adjusting for variance by taking the log of the target(s).
 double getArtificialTimeStartValue()
          Returns the initial value of the artificial time stamp.
 boolean getAverageConsecutiveLongLags()
          Returns true if consecutive long lagged variables are to be averaged.
 int getAverageLagsAfter()
          Return the point after which long lagged variables will be averaged.
 double getDeltaTime()
          Return the difference between time values.
 java.util.List<java.lang.String> getFieldsToLag()
          Get the names of the fields to create lagged variables for.
 java.lang.String getFineTuneLags()
           
 java.lang.String getLagRange()
          Get the ranges used to fine tune lag selection.
 int getMaxLag()
          Get the maximum lag to create.
 int getMinLag()
          Get the minimum lag to create.
 int getNumConsecutiveLongLagsToAverage()
          Get the number of consecutive long lagged variables to average.
 java.lang.String[] getOptions()
          Gets the current settings of the LagMaker.
 java.lang.String getPrimaryPeriodicFieldName()
          The name of the primary periodic attribute or null if one hasn't been specified.
 java.lang.String getTimeStampField()
          Get the name of the time stamp field.
 Instances getTransformedData(Instances insts)
          Creates a transformed data set based on the user's settings.
 void incrementArtificialTimeValue(int increment)
          Increment the artificial time value by the supplied increment value.
 boolean isUsingAnArtificialTimeIndex()
          Returns true if an artificial time index is in use.
 java.util.Enumeration<Option> listOptions()
          Returns an enumeration describing the available options.
 Instance processInstance(Instance source, boolean incrementTime, boolean setAnyPeriodic)
           
 Instance processInstance(Instance source, boolean incrementTime, boolean setAnyPeriodic, boolean temporary)
          Process an instance in the original format and produce a transformed instance as output.
 Instance processInstancePreview(Instance source, boolean incrementTime, boolean setAnyPeriodic)
           
 Instances replaceMissing(Instances toReplace, boolean dateOnly, java.util.List<java.lang.Integer>... missingReport)
          Replace missing target values by interpolation.
 void reset()
          Reset the lag maker.
 void setAddAMIndicator(boolean am)
          Set whether to create an AM indicator attribute.
 void setAddDayOfWeek(boolean d)
          Set whether to create a day of the week attribute.
 void setAddMonthOfYear(boolean m)
          Set whether to create a month of the year attribute.
 void setAddQuarterOfYear(boolean q)
          Set whether to create a quarter attribute.
 void setAddWeekendIndicator(boolean w)
          Set whether to create a weekend indicator attribute.
 void setAdjustForTrends(boolean a)
          Set whether to adjust for trends or not.
 void setAdjustForVariance(boolean v)
          Set whether to adjust for variance in the data by taking the log of the target(s).
 void setArtificialTimeStartValue(double value)
          Set the starting value for the artificial time stamp.
 void setAverageConsecutiveLongLags(boolean avg)
          Sets whether to average consecutive long lagged variables.
 void setAverageLagsAfter(int a)
          Set at which point consecutive long lagged variables are to be averaged (default = 2, i.e. start replacing lagged variables after t-2 with averages).
 void setFieldsToLag(java.util.List<java.lang.String> names)
          Set the names of the fields to create lagged variables for.
 void setFineTuneLags(java.lang.String ranges)
           
 void setLagRange(java.lang.String lagRange)
          Set ranges to fine tune lag selection.
 void setMaxLag(int max)
          Set the maximum lag to create (default = 12, i.e. t-12).
 void setMinLag(int min)
          Set the minimum lag to create (default = 1, i.e. t-1).
 void setNumConsecutiveLongLagsToAverage(int c)
          Set the number of long lagged variables to average for each averaged variable created (default = 2).
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPrimaryPeriodicFieldName(java.lang.String p)
          Set the name of a periodic attribute in the data.
 void setTimeStampField(java.lang.String name)
          Set the name of the time stamp field in the data.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TSLagMaker

public TSLagMaker()
Method Detail

reset

public void reset()
Reset the lag maker.


listOptions

public java.util.Enumeration<Option> listOptions()
Returns an enumeration describing the available options.

Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the LagMaker.

Returns:
an array of strings suitable for passing to setOptions

setFieldsToLag

public void setFieldsToLag(java.util.List<java.lang.String> names)
                    throws java.lang.Exception
Set the names of the fields to create lagged variables for.

Parameters:
names - a List of field names for which to create lagged variables
Throws:
java.lang.Exception - if a problem occurs

getFieldsToLag

public java.util.List<java.lang.String> getFieldsToLag()
Get the names of the fields to create lagged variables for.

Returns:
a List of field names for which lagged variables will be created.

setTimeStampField

public void setTimeStampField(java.lang.String name)
Set the name of the time stamp field in the data.

Parameters:
name - the name of the time stamp field

getTimeStampField

public java.lang.String getTimeStampField()
Get the name of the time stamp field.

Returns:
the name of the time stamp field or null if one hasn't been specified.

setAdjustForTrends

public void setAdjustForTrends(boolean a)
Set whether to adjust for trends or not. If there is no time stamp field specified, and this is set to true, then an artificial time stamp will be created.

Parameters:
a - true if we are to adjust for trends via a real or artificial time stamp

getAdjustForTrends

public boolean getAdjustForTrends()
Returns true if we are adjusting for trends via a real or artificial time stamp.

Returns:
true if we are adjusting for trends via a real or artificial time stamp in the data.

setAdjustForVariance

public void setAdjustForVariance(boolean v)
Set whether to adjust for variance in the data by taking the log of the target(s).

Parameters:
v - true to adjust for variance by taking the log of the target(s).
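
As a rough illustration of adjusting for variance with a log transform, the sketch below logs a target series and maps values back with the exponential. It assumes the natural logarithm and strictly positive targets; the class VarianceAdjustSketch and its methods are hypothetical and are not part of TSLagMaker.

```java
import java.util.Arrays;

/** Illustrative only: log-transform a target and undo the transform. */
class VarianceAdjustSketch {

  /** Natural log of each value; assumes every value is strictly positive. */
  static double[] logTransform(double[] target) {
    double[] out = new double[target.length];
    for (int i = 0; i < target.length; i++) {
      out[i] = Math.log(target[i]);
    }
    return out;
  }

  /** Inverse of logTransform, e.g. for mapping forecasts back to the original scale. */
  static double[] backTransform(double[] logged) {
    double[] out = new double[logged.length];
    for (int i = 0; i < logged.length; i++) {
      out[i] = Math.exp(logged[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    double[] target = {10.0, 100.0, 1000.0};
    double[] logged = logTransform(target);
    // Equal ratios in the original series become equal differences after the log.
    System.out.println(Arrays.toString(logged));
    System.out.println(Arrays.toString(backTransform(logged)));
  }
}
```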

getAdjustForVariance

public boolean getAdjustForVariance()
Returns true if we are adjusting for variance by taking the log of the target(s).

Returns:
true if we are adjusting for variance.

setFineTuneLags

public void setFineTuneLags(java.lang.String ranges)

getFineTuneLags

public java.lang.String getFineTuneLags()

setMinLag

public void setMinLag(int min)
Set the minimum lag to create (default = 1, i.e. t-1).

Parameters:
min - the minimum lag to create

getMinLag

public int getMinLag()
Get the minimum lag to create.

Returns:
the minimum lag to create.

setMaxLag

public void setMaxLag(int max)
Set the maximum lag to create (default = 12, i.e. t-12).

Parameters:
max - the maximum lag to create.
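
To make the effect of the minimum and maximum lag concrete, here is a minimal, self-contained sketch that builds lagged copies of a series for every lag from minLag to maxLag. It does not use the TimeseriesTranslate filter; the class LagSketch, the method lagged and the use of NaN for rows without enough history are assumptions for illustration.

```java
import java.util.Arrays;

/** Illustrative only: builds lagged columns t-minLag .. t-maxLag for a series. */
class LagSketch {

  /**
   * Row t of the result holds series[t - lag] for each lag in [minLag, maxLag].
   * Positions with no history are NaN (a stand-in for a missing value).
   */
  static double[][] lagged(double[] series, int minLag, int maxLag) {
    if (minLag < 1 || maxLag < minLag) {
      throw new IllegalArgumentException("Require 1 <= minLag <= maxLag");
    }
    int numLags = maxLag - minLag + 1;
    double[][] out = new double[series.length][numLags];
    for (int t = 0; t < series.length; t++) {
      for (int j = 0; j < numLags; j++) {
        int source = t - (minLag + j);
        out[t][j] = source >= 0 ? series[source] : Double.NaN;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[][] rows = lagged(new double[] {10, 20, 30, 40}, 1, 2);
    System.out.println(Arrays.toString(rows[3])); // prints [30.0, 20.0]
  }
}
```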

getMaxLag

public int getMaxLag()
Get the maximum lag to create.

Returns:
the maximum lag to create.

setLagRange

public void setLagRange(java.lang.String lagRange)
Set ranges to fine tune lag selection.

Parameters:
lagRange - a set of ranges (e.g. 2,3,4,7-9).
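
The range string format shown in the example (2,3,4,7-9) can be read as a comma-separated list of single lags and inclusive lag intervals. The sketch below expands such a string into individual lag numbers. It is a hypothetical helper written for illustration, not the parser TSLagMaker uses, and it ignores any other syntax the real option may accept.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: expands a string such as "2,3,4,7-9" into lag numbers. */
class LagRangeSketch {

  static List<Integer> expand(String ranges) {
    List<Integer> lags = new ArrayList<>();
    for (String part : ranges.split(",")) {
      String token = part.trim();
      if (token.isEmpty()) {
        continue; // tolerate stray commas
      }
      int dash = token.indexOf('-');
      if (dash < 0) {
        lags.add(Integer.parseInt(token)); // single lag, e.g. "3"
      } else {
        // inclusive interval, e.g. "7-9" becomes 7, 8, 9
        int start = Integer.parseInt(token.substring(0, dash).trim());
        int end = Integer.parseInt(token.substring(dash + 1).trim());
        for (int lag = start; lag <= end; lag++) {
          lags.add(lag);
        }
      }
    }
    return lags;
  }

  public static void main(String[] args) {
    System.out.println(expand("2,3,4,7-9")); // prints [2, 3, 4, 7, 8, 9]
  }
}
```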

getLagRange

public java.lang.String getLagRange()
Get the ranges used to fine tune lag selection.

Returns:
the ranges (if any) used to fine tune lag selection

setAverageConsecutiveLongLags

public void setAverageConsecutiveLongLags(boolean avg)
Sets whether to average consecutive long lagged variables. Setting this to true creates new variables that are averages of long lags and the original lagged variables involved are removed.

Parameters:
avg - true if consecutive long lags are to be averaged.

getAverageConsecutiveLongLags

public boolean getAverageConsecutiveLongLags()
Returns true if consecutive long lagged variables are to be averaged.

Returns:
true if consecutive long lagged variables are to be averaged.

setAverageLagsAfter

public void setAverageLagsAfter(int a)
Set at which point consecutive long lagged variables are to be averaged (default = 2, i.e. start replacing lagged variables after t-2 with averages).

Parameters:
a - the point at which to start averaging consecutive long lagged variables.

getAverageLagsAfter

public int getAverageLagsAfter()
Return the point after which long lagged variables will be averaged.

Returns:
the point after which long lagged variables will be averaged.

setNumConsecutiveLongLagsToAverage

public void setNumConsecutiveLongLagsToAverage(int c)
Set the number of long lagged variables to average for each averaged variable created (default = 2, e.g. an average-lags-after value of 2 and a number of consecutive lags to average of 2 will average t-3 and t-4 into a new variable, t-5 and t-6 into a new variable, etc.).

Parameters:
c - the number of consecutive long lagged variables to average.
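
The grouping in the example above (average after 2, two consecutive lags per average) can be sketched as follows. The class LagAveragingSketch and the method groups are hypothetical, and the handling of a short final group (it is kept as a smaller group) is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: which lags would be combined into each averaged variable. */
class LagAveragingSketch {

  /** Returns groups of lag numbers; each group would be averaged into one new variable. */
  static List<int[]> groups(int averageAfter, int numConsecutive, int maxLag) {
    if (numConsecutive < 1) {
      throw new IllegalArgumentException("numConsecutive must be at least 1");
    }
    List<int[]> result = new ArrayList<>();
    for (int start = averageAfter + 1; start <= maxLag; start += numConsecutive) {
      // Assumption: a trailing group with fewer lags is kept as-is.
      int end = Math.min(start + numConsecutive - 1, maxLag);
      int[] group = new int[end - start + 1];
      for (int i = 0; i < group.length; i++) {
        group[i] = start + i;
      }
      result.add(group);
    }
    return result;
  }

  public static void main(String[] args) {
    for (int[] group : groups(2, 2, 6)) {
      System.out.println(Arrays.toString(group)); // prints [3, 4] then [5, 6]
    }
  }
}
```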

getNumConsecutiveLongLagsToAverage

public int getNumConsecutiveLongLagsToAverage()
Get the number of consecutive long lagged variables to average.

Returns:
the number of long lagged variables to average.

setPrimaryPeriodicFieldName

public void setPrimaryPeriodicFieldName(java.lang.String p)
Set the name of a periodic attribute in the data. This attribute has to be nominal and cyclic so that it is possible to know what the next value will be given the current one.

Parameters:
p - the name of the primary periodic attribute (if any) in the data.
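
The requirement that the attribute be cyclic means the next value follows from the current one by stepping through the nominal values and wrapping around at the end. A minimal sketch of that idea, using a hypothetical helper that is not part of TSLagMaker:

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: next value of a cyclic nominal attribute. */
class PeriodicSketch {

  /** Returns the value that follows current in the cycle, wrapping at the end. */
  static String next(List<String> cycle, String current) {
    int index = cycle.indexOf(current);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown value: " + current);
    }
    return cycle.get((index + 1) % cycle.size());
  }

  public static void main(String[] args) {
    List<String> months = Arrays.asList("jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec");
    System.out.println(next(months, "mar")); // prints apr
    System.out.println(next(months, "dec")); // prints jan
  }
}
```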

getPrimaryPeriodicFieldName

public java.lang.String getPrimaryPeriodicFieldName()
The name of the primary periodic attribute or null if one hasn't been specified.

Returns:
the name of the primary periodic attribute or null if one hasn't been specified.

setAddAMIndicator

public void setAddAMIndicator(boolean am)
Set whether to create an AM indicator attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
am - true if an AM indicator attribute is to be created.

getAddAMIndicator

public boolean getAddAMIndicator()
Return true if an AM indicator attribute is to be created.

Returns:
true if an AM indicator attribute is to be created.

setAddDayOfWeek

public void setAddDayOfWeek(boolean d)
Set whether to create a day of the week attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
d - true if a day of the week attribute is to be created.

getAddDayOfWeek

public boolean getAddDayOfWeek()
Return true if a day of the week attribute is to be created.

Returns:
true if a day of the week attribute is to be created.

setAddWeekendIndicator

public void setAddWeekendIndicator(boolean w)
Set whether to create a weekend indicator attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
w - true if a weekend indicator attribute is to be created.

getAddWeekendIndicator

public boolean getAddWeekendIndicator()
Returns true if a weekend indicator attribute is to be created.

Returns:
true if a weekend indicator attribute is to be created.

setAddMonthOfYear

public void setAddMonthOfYear(boolean m)
Set whether to create a month of the year attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
m - true if a month of the year attribute is to be created.

getAddMonthOfYear

public boolean getAddMonthOfYear()
Returns true if a month of the year attribute is to be created.

Returns:
true if a month of the year attribute is to be created.

setAddQuarterOfYear

public void setAddQuarterOfYear(boolean q)
Set whether to create a quarter attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
q - true if a quarter attribute is to be added.

getAddQuarterOfYear

public boolean getAddQuarterOfYear()
Returns true if a quarter attribute is to be created.

Returns:
true if a quarter attribute is to be created.
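
The date-derived attributes controlled by the setters above (AM indicator, day of the week, weekend indicator, month of the year, quarter) can all be computed from a single date value. The sketch below shows one way to derive them with java.time. The class DateFieldsSketch is hypothetical, and treating Saturday and Sunday as the weekend is an assumption.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

/** Illustrative only: fields that can be derived from a date time stamp. */
class DateFieldsSketch {

  static boolean isAm(LocalDateTime t) {
    return t.getHour() < 12;
  }

  static DayOfWeek dayOfWeek(LocalDateTime t) {
    return t.getDayOfWeek();
  }

  /** Assumption: the weekend is Saturday and Sunday. */
  static boolean isWeekend(LocalDateTime t) {
    DayOfWeek day = t.getDayOfWeek();
    return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
  }

  static int monthOfYear(LocalDateTime t) {
    return t.getMonthValue(); // 1 = January ... 12 = December
  }

  static int quarterOfYear(LocalDateTime t) {
    return (t.getMonthValue() - 1) / 3 + 1; // 1 ... 4
  }

  public static void main(String[] args) {
    LocalDateTime t = LocalDateTime.of(2021, 8, 14, 9, 30);
    System.out.println(isAm(t) + " " + dayOfWeek(t) + " " + isWeekend(t)
        + " " + monthOfYear(t) + " " + quarterOfYear(t));
    // prints true SATURDAY true 8 3
  }
}
```

Because each field is a pure function of the date, its value for a future instance is known as soon as the future time stamp is known.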

isUsingAnArtificialTimeIndex

public boolean isUsingAnArtificialTimeIndex()
Returns true if an artificial time index is in use.

Returns:
true if an artificial time index is in use.

setArtificialTimeStartValue

public void setArtificialTimeStartValue(double value)
                                 throws java.lang.Exception
Set the starting value for the artificial time stamp.

Parameters:
value - the value to initialize the artificial time stamp with.
Throws:
java.lang.Exception - if an artificial time stamp is not being used.

getArtificialTimeStartValue

public double getArtificialTimeStartValue()
                                   throws java.lang.Exception
Returns the initial value of the artificial time stamp.

Returns:
the initial value of the artificial time stamp.
Throws:
java.lang.Exception - if an artificial time stamp is not being used.

incrementArtificialTimeValue

public void incrementArtificialTimeValue(int increment)
Increment the artificial time value by the supplied increment value.

Parameters:
increment - the value to increment by.

getDeltaTime

public double getDeltaTime()
Return the difference between time values.

Returns:
the difference between time values.

createTimeLagCrossProducts

public Instances createTimeLagCrossProducts(Instances insts)
                                     throws java.lang.Exception
Throws:
java.lang.Exception
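
The class description mentions polynomials of time (time^2, time^3) and cross products between time and the lagged variables. The sketch below shows the arithmetic for a single row under the assumption that a cross product is simply the time value multiplied by each lagged value. The class TimeFeaturesSketch and its methods are hypothetical, and the attribute layout TSLagMaker actually produces is not shown here.

```java
import java.util.Arrays;

/** Illustrative only: time polynomials and time-by-lag cross products for one row. */
class TimeFeaturesSketch {

  /** Returns {time^2, time^3} for a (remapped) time value. */
  static double[] timePowers(double time) {
    return new double[] {time * time, time * time * time};
  }

  /** Multiplies each lagged (or averaged lagged) value by the time value. */
  static double[] crossProducts(double time, double[] laggedValues) {
    double[] out = new double[laggedValues.length];
    for (int i = 0; i < laggedValues.length; i++) {
      out[i] = time * laggedValues[i];
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(timePowers(3.0)));                            // prints [9.0, 27.0]
    System.out.println(Arrays.toString(crossProducts(2.0, new double[] {1.5, 4.0}))); // prints [3.0, 8.0]
  }
}
```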

determinePeriodicity

public static TSLagMaker.Periodicity determinePeriodicity(Instances insts,
                                                          java.lang.String timeName)

replaceMissing

public Instances replaceMissing(Instances toReplace,
                                boolean dateOnly,
                                java.util.List<java.lang.Integer>... missingReport)
Replace missing target values by interpolation. Also replaces missing date values (if a date timestamp has been specified and if possible).

Parameters:
toReplace - the instances to replace missing target values and time stamp values
dateOnly - if true, only replace missing date values and not missing target values (useful for hold-out test sets)
missingReport - a varargs parameter that, if provided, is expected to be two lists of Integer. The first list will be populated with the instance numbers (duplicates are possible) of instances that have missing targets replaced. The second list will be populated with the instance numbers of instances that have missing time stamp values replaced.
Returns:
the instances with missing targets and (possibly) missing time stamp values replaced.
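
As an illustration of replacing missing values by interpolation, the sketch below fills interior gaps in a numeric series linearly between the nearest known neighbours. The documentation does not state which interpolation scheme replaceMissing uses, so linear interpolation, the use of NaN for missing values, and leaving leading and trailing gaps untouched are all assumptions. The class InterpolateSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: linear interpolation over interior runs of missing (NaN) values. */
class InterpolateSketch {

  static double[] fillMissing(double[] values) {
    double[] out = values.clone();
    int lastKnown = -1; // index of the most recent non-missing value
    for (int i = 0; i < out.length; i++) {
      if (Double.isNaN(out[i])) {
        continue;
      }
      if (lastKnown >= 0 && i - lastKnown > 1) {
        // Fill the gap between lastKnown and i with equally spaced values.
        double step = (out[i] - out[lastKnown]) / (i - lastKnown);
        for (int k = lastKnown + 1; k < i; k++) {
          out[k] = out[lastKnown] + step * (k - lastKnown);
        }
      }
      lastKnown = i;
    }
    return out; // leading and trailing NaN values are left as they are
  }

  public static void main(String[] args) {
    double[] series = {1.0, Double.NaN, Double.NaN, 4.0};
    System.out.println(Arrays.toString(fillMissing(series))); // prints [1.0, 2.0, 3.0, 4.0]
  }
}
```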

getTransformedData

public Instances getTransformedData(Instances insts)
                             throws java.lang.Exception
Creates a transformed data set based on the user's settings.

Parameters:
insts - the instances to transform
Returns:
a transformed data set
Throws:
java.lang.Exception - if a problem occurs during the creation of lagged and auxiliary attributes.

processInstance

public Instance processInstance(Instance source,
                                boolean incrementTime,
                                boolean setAnyPeriodic)
                         throws java.lang.Exception
Throws:
java.lang.Exception

processInstancePreview

public Instance processInstancePreview(Instance source,
                                       boolean incrementTime,
                                       boolean setAnyPeriodic)
                                throws java.lang.Exception
Throws:
java.lang.Exception

processInstance

public Instance processInstance(Instance source,
                                boolean incrementTime,
                                boolean setAnyPeriodic,
                                boolean temporary)
                         throws java.lang.Exception
Process an instance in the original format and produce a transformed instance as output. Assumes that the lag maker has been configured and initialized with a call to getTransformedData().

Parameters:
source - an instance in original format
incrementTime - true if any time stamp value should be incremented based on the time stamp value from the last instance seen and set in the outputted instance
setAnyPeriodic - true if any user-specified periodic value should be set in the transformed instance based on the value from the last instance seen.
Returns:
a transformed instance
Throws:
java.lang.Exception - if something goes wrong.

clearLagHistories

public void clearLagHistories()
                       throws java.lang.Exception
Clears any history accumulated in the lag creating filters.

Throws:
java.lang.Exception - if something goes wrong.

advanceSuppliedTimeValue

public double advanceSuppliedTimeValue(double valueToAdvance)
Utility method to advance a supplied time value by one unit according to the periodicity set for this LagMaker.

Parameters:
valueToAdvance - the time value to advance
Returns:
the advanced value or the original value if this lag maker is not adjusting for trends

advanceSuppliedTimeValue

public double advanceSuppliedTimeValue(double valueToAdvance,
                                       TSLagMaker.Periodicity dateBasedPeriodicity)
Utility method to advance a supplied time value by one unit.

Parameters:
valueToAdvance - the time value to advance
dateBasedPeriodicity - the periodicity to use for date arithmetic
Returns:
the advanced value or the original value if this lag maker is not adjusting for trends.
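
Advancing a date-based time value by one unit generally needs calendar arithmetic rather than adding a fixed number of milliseconds, because units such as months vary in length. The sketch below illustrates this under several assumptions: time values are milliseconds since the epoch stored as a double, dates are interpreted in UTC, and the enum Unit is a hypothetical stand-in for TSLagMaker.Periodicity, whose constants are not listed on this page.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Illustrative only: advance an epoch-millisecond time value by one calendar unit. */
class AdvanceTimeSketch {

  /** Hypothetical stand-in for a date-based periodicity. */
  enum Unit { DAILY, MONTHLY, YEARLY }

  static double advance(double epochMillis, Unit unit) {
    ZonedDateTime t = Instant.ofEpochMilli((long) epochMillis).atZone(ZoneOffset.UTC);
    switch (unit) {
      case DAILY:
        t = t.plusDays(1);
        break;
      case MONTHLY:
        t = t.plusMonths(1); // calendar-aware: handles months of different lengths
        break;
      case YEARLY:
        t = t.plusYears(1);
        break;
    }
    return t.toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    double start = ZonedDateTime.of(2021, 1, 15, 0, 0, 0, 0, ZoneOffset.UTC)
        .toInstant().toEpochMilli();
    double next = advance(start, Unit.MONTHLY);
    System.out.println(Instant.ofEpochMilli((long) next)); // prints 2021-02-15T00:00:00Z
  }
}
```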

decrementSuppliedTimeValue

public double decrementSuppliedTimeValue(double valueToDecrement)

decrementSuppliedTimeValue

public double decrementSuppliedTimeValue(double valueToDecrement,
                                         TSLagMaker.Periodicity dateBasedPeriodicity)