Forecast
Siddhi allows you to forecast future events using linear regression on real-time data streams. The forecast function takes a dependent event stream (Y), an independent event stream (X), and a user-specified next X value, and returns the forecast Y value computed from the regression equation fitted to the historical data.
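For reference, the calculation below assumes the regression is an ordinary least-squares fit of y = β0 + β1·x over the n events considered. This page names simple linear regression but does not state the estimator, so read these as the standard textbook formulas rather than Siddhi's exact code.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\hat{y}_{\text{next}} &= \hat{\beta}_0 + \hat{\beta}_1 \, x_{\text{next}}
\end{aligned}
```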
The two implementations of the forecast function differ as follows.
- forecast: This allows you to specify a batch size (optional) that defines the number of events to be considered for the regression calculation when forecasting the Y value.
- lengthTimeForecast: This allows you to restrict the number of events considered for the regression calculation when forecasting the Y value based on a specified time window and/or batch size.
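The difference between the two implementations comes down to which historical events enter the regression. The Python sketch below uses a hypothetical helper, events_for_regression, with an illustrative (timestamp, X, Y) tuple layout; neither is part of Siddhi. It keeps the newest batch-size events and, when a time window is given, also drops events whose age has reached the window. Whether Siddhi still counts an event that is exactly one window old is an assumption here.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, float, float]  # hypothetical layout: (timestamp in ms, X, Y)


def events_for_regression(history: List[Event], now_ms: int, batch_size: int,
                          window_ms: Optional[int] = None) -> List[Event]:
    """Hypothetical helper: select the events a regression would use.

    forecast style:           window_ms=None, only the batch size applies.
    lengthTimeForecast style: events must also be younger than window_ms.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    recent = history[-batch_size:]  # the newest batch_size events
    if window_ms is None:
        return recent
    # Assumption: an event whose age equals window_ms has already expired.
    return [e for e in recent if now_ms - e[0] < window_ms]


history = [(0, 1.0, 2.0), (500, 2.0, 4.0), (1000, 3.0, 5.0), (1500, 4.0, 8.0), (3000, 5.0, 10.0)]
print(events_for_regression(history, now_ms=3000, batch_size=3))                  # count only
print(events_for_regression(history, now_ms=3000, batch_size=3, window_ms=2000))  # count and time
```

With a batch size of 3, both calls start from the last three events; the time restriction then also discards the event at 1000 ms, because it is exactly 2000 ms old at 3000 ms.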
Input parameters for the forecast function
The following table describes the input parameters available for the forecast function.
Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | 1 (i.e., for every event) |
Batch Size | The maximum number of events that should be used for a regression calculation. | Optional | 1,000,000,000 |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | 0.95 |
Next X Value | The value to be used to forecast the Y value. This can be a constant or an expression (e.g., x+5). | Required |  |
Y Stream | The data stream of the dependent variable. | Required |  |
X Stream | The data stream of the independent variable. | Required |  |
Format: forecast(nextX, Y, X)
or forecast(calculation interval, batch size, confidence interval, nextX, Y, X)
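The optional parameters can be pictured with the minimal Python sketch below. It assumes that batch size acts as a sliding count of the most recent events, that calculation interval means the coefficients are refit every N events and reused in between, and that the standard error is sqrt(SSE / (n - 2)). fit, Forecaster, and the result tuple layout are hypothetical names, and the confidence interval parameter is omitted because this page does not describe how it is applied.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Tuple

Model = Tuple[float, float, float]          # (beta0, beta1, std_error)
Result = Tuple[float, float, float, float]  # (forecast_y, std_error, beta0, beta1)


def fit(points: List[Tuple[float, float]]) -> Optional[Model]:
    """Ordinary least-squares fit of y = beta0 + beta1 * x (assumed estimator)."""
    n = len(points)
    if n < 3:
        return None  # the standard error needs n - 2 > 0 degrees of freedom
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all X values are equal, so the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in points)
    return beta0, beta1, math.sqrt(sse / (n - 2))


class Forecaster:
    """Hypothetical stand-in for forecast(calculation interval, batch size, ..., nextX, Y, X)."""

    def __init__(self, calc_interval: int = 1, batch_size: int = 1_000_000_000) -> None:
        self.calc_interval = calc_interval
        # Only the newest batch_size (x, y) pairs are kept.
        self.points: Deque[Tuple[float, float]] = deque(maxlen=batch_size)
        self.events_since_fit = 0
        self.model: Optional[Model] = None

    def on_event(self, next_x: float, y: float, x: float) -> Optional[Result]:
        self.points.append((x, y))
        self.events_since_fit += 1
        # Refit only every calc_interval events; reuse the last model otherwise.
        if self.events_since_fit >= self.calc_interval:
            self.model = fit(list(self.points))
            self.events_since_fit = 0
        if self.model is None:
            return None
        beta0, beta1, std_error = self.model
        return beta0 + beta1 * next_x, std_error, beta0, beta1
```

Feeding the points (1, 2), (2, 4), (3, 5), (4, 8) with next_x = x + 1 yields no result for the first two events and, on the fourth, β1 = 1.9, β0 ≈ 0, and a forecast of 9.5 at next X = 5.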
Input parameters for the lengthTimeForecast function
The following table describes the input parameters available for the lengthTimeForecast function.
Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Time Window | The maximum time duration that should be considered for a regression calculation. | Required |  |
Batch Size | The maximum number of events that should be used for a regression calculation. | Required |  |
Next X Value | The value to be used to forecast the Y value. This can be a constant or an expression (e.g., x+5). | Required |  |
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | 1 (i.e., for every event) |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | 0.95 |
Y Stream | The data stream of the dependent variable. | Required |  |
X Stream | The data stream of the independent variable. | Required |  |
Format: lengthTimeForecast(time window, batch size, nextX, Y, X)
or lengthTimeForecast(time window, batch size, nextX, calculation interval, confidence interval, Y, X)
Output parameters
The following table describes the output parameters, which are the same for both implementations.
Parameter | Name | Description |
---|---|---|
Forecast Y |  | The forecast Y value based on the next X value and the regression equation. |
Standard Error |  | The standard error of the regression equation. |
β coefficients |  | The β coefficients of the simple linear regression. |
Input Stream Data | The name given in the input stream. | All the items sent in the input stream. |
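A small worked example shows how the output parameters relate to each other. The data points are hypothetical, the fit is assumed to be ordinary least squares, and the standard error is assumed to be the usual standard error of the regression, s = sqrt(SSE / (n - 2)); this page does not give the exact formula Siddhi uses.

```latex
\begin{aligned}
&(x_i, y_i) = (1, 2),\ (2, 4),\ (3, 5),\ (4, 8), \qquad n = 4,\ \bar{x} = 2.5,\ \bar{y} = 4.75 \\
&\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{9.5}{5} = 1.9,
\qquad \hat{\beta}_0 = 4.75 - 1.9 \times 2.5 = 0 \\
&\text{SSE} = 0.1^2 + 0.2^2 + (-0.7)^2 + 0.4^2 = 0.70,
\qquad s = \sqrt{\frac{\text{SSE}}{n - 2}} = \sqrt{0.35} \approx 0.592 \\
&\hat{y}_{\text{next}} = 0 + 1.9 \times 5 = 9.5 \quad \text{for } x_{\text{next}} = 5
\end{aligned}
```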
Examples
The queries given in the examples below return the following when executed:
- Y value based on the regression equation established using the Y stream and the X stream
- The standard error of the regression equation (ε)
- β coefficients
- All the items available in the input stream
Example 1
The following query submits an expression to be used as the next X value (X+5), a dependent input stream (Y), and an independent input stream (X). These are used to perform linear regression between the Y and X streams and to compute the forecast Y value for the next X value you specify.
from StockExchangeStream#timeseries:forecast(X+5, Y, X) select * insert into StockForecaster
Example 2
The following query submits a time window (2 seconds), a batch size (100 events), a constant to be used as the next X value (10), a dependent input stream (Y) and an independent input stream (X) that are used to perform linear regression between Y and X streams, and compute the forecast Y value based on the next X value specified by you.
from StockExchangeStream#timeseries:lengthTimeForecast(2 sec, 100, 10, Y, X) select * insert into StockForecaster
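To make Example 2 concrete, the Python sketch below imitates what lengthTimeForecast(2 sec, 100, 10, Y, X) could emit for each incoming event under stated assumptions: it keeps at most 100 events that are less than 2,000 ms old, fits Y on X by ordinary least squares, and forecasts Y at the constant next X value 10. length_time_forecast, the (timestamp, X, Y) event layout, the expiry rule, and the minimum of three events are hypothetical choices for illustration, not Siddhi's implementation.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, float, float]                      # (timestamp in ms, X, Y)
Result = Optional[Tuple[float, float, float, float]]  # (forecast Y, std error, beta0, beta1)


def length_time_forecast(events: Iterable[Event], window_ms: int,
                         batch_size: int, next_x: float) -> Iterator[Result]:
    """Hypothetical sketch: yield one result per incoming event."""
    window: Deque[Event] = deque()
    for ts, x, y in events:
        window.append((ts, x, y))
        # Batch size: keep only the newest batch_size events.
        while len(window) > batch_size:
            window.popleft()
        # Time window (assumption): an event expires once its age reaches window_ms.
        while window and ts - window[0][0] >= window_ms:
            window.popleft()
        n = len(window)
        if n < 3:
            yield None  # too few events for a slope plus a standard error
            continue
        mean_x = sum(e[1] for e in window) / n
        mean_y = sum(e[2] for e in window) / n
        sxx = sum((e[1] - mean_x) ** 2 for e in window)
        if sxx == 0:
            yield None  # all X values are equal, so the slope is undefined
            continue
        sxy = sum((e[1] - mean_x) * (e[2] - mean_y) for e in window)
        beta1 = sxy / sxx
        beta0 = mean_y - beta1 * mean_x
        sse = sum((e[2] - beta0 - beta1 * e[1]) ** 2 for e in window)
        yield beta0 + beta1 * next_x, math.sqrt(sse / (n - 2)), beta0, beta1


# Example 2: lengthTimeForecast(2 sec, 100, 10, Y, X)
stream = [(0, 1.0, 2.0), (500, 2.0, 4.0), (1000, 3.0, 5.0), (1500, 4.0, 8.0), (3000, 5.0, 10.0)]
for result in length_time_forecast(stream, window_ms=2000, batch_size=100, next_x=10.0):
    print(result)
```

With these timestamps, the event at 3000 ms expires the three oldest events, leaving only two in the window, so this sketch emits no forecast for it.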