Regression
Siddhi enables users to perform linear regression on real-time data streams. The regress function takes a dependent event stream (Y) and any number of independent event streams (X1, X2, ..., Xn), and returns all coefficients of the regression equation.
The two implementations of regression differ as follows:
- regress: This allows you to specify the batch size (optional) that defines the number of events to be considered for the calculation of regression.
- lengthTimeRegress: This allows you to specify the time window and batch size (required). The number of events considered for the regression calculation can be restricted based on the time window and/or the batch size.
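The difference between restricting events by batch size alone and by batch size plus a time window can be sketched in Python. This is only an illustration of the windowing idea described above, not Siddhi's implementation. The function name `retain_events`, the `(timestamp_ms, payload)` event layout, and the exact boundary rule are assumptions made for the example.

```python
from collections import deque


def retain_events(events, batch_size, time_window_ms=None):
    """Return the events that would take part in the next calculation.

    events: iterable of (timestamp_ms, payload) tuples in arrival order.
    batch_size: maximum number of most recent events to keep.
    time_window_ms: if given, also drop events older than this many
        milliseconds relative to the newest event (assumed rule).
    """
    window = deque()
    for ts, payload in events:
        window.append((ts, payload))
        # Length restriction: keep at most batch_size events.
        while len(window) > batch_size:
            window.popleft()
        # Time restriction: drop events that fell out of the time window.
        if time_window_ms is not None:
            while window and ts - window[0][0] > time_window_ms:
                window.popleft()
    return list(window)


# Hypothetical events, used only to show the two behaviours.
events = [(0, "a"), (50, "b"), (250, "c"), (300, "d")]
length_only = retain_events(events, batch_size=3)
length_and_time = retain_events(events, batch_size=3, time_window_ms=200)
```

With these inputs, the batch size alone keeps the three most recent events, while adding a 200 ms time window also discards the event at 50 ms.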
Input parameters for regress function
The following table describes the input parameters available for the regress function.
Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | 1 (i.e., for every event) |
Batch Size | The maximum number of events to be used for a regression calculation. | Optional | 1,000,000,000 |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | 0.95 |
Y Stream | The data stream of the dependent variable. | Required | |
X Stream(s) | The data stream(s) of the independent variable(s). | Required | |
Format: regress(Y, X1, X2, ..., Xn)
or regress(calculation interval, batch size, confidence interval, Y, X1, X2, ..., Xn)
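The optional parameters and the effect of the calculation interval can be pictured with a short Python sketch. The defaults are taken from the table above. The names `RegressParams` and `calculation_points` are hypothetical, and the sketch assumes that a calculation runs on every k-th event counted from the first one, which the documentation does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class RegressParams:
    # Defaults as listed in the input parameter table.
    calculation_interval: int = 1
    batch_size: int = 1_000_000_000
    confidence_interval: float = 0.95


def calculation_points(num_events, params):
    """Return the 1-based event counts at which a calculation would run."""
    if params.calculation_interval < 1:
        raise ValueError("calculation_interval must be at least 1")
    return [
        n for n in range(1, num_events + 1)
        if n % params.calculation_interval == 0
    ]


# Short form: all optional parameters take their defaults.
every_event = calculation_points(5, RegressParams())
# Long form: calculate once for every 10 events.
every_tenth = calculation_points(35, RegressParams(calculation_interval=10))
```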
Input parameters for lengthTimeRegress function
The following table describes the input parameters available for the lengthTimeRegress function.
Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Time Window | The maximum time duration to be considered for the regression calculation. | Required | |
Batch Size | The maximum number of events to be used for a regression calculation. | Required | |
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | 1 (for every event) |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | 0.95 |
Y Stream | The data stream of the dependent variable. | Required | |
X Stream(s) | The data stream(s) of the independent variable(s). | Required | |
Format: lengthTimeRegress(time window, batch size, Y, X1, X2, ..., Xn)
or lengthTimeRegress(time window, batch size, calculation interval, confidence interval, Y, X1, X2, ..., Xn)
Output parameters
The following table describes the output parameters.
The same output parameters are available for each implementation.
Parameter | Name | Description |
---|---|---|
Standard Error | stdError | The standard error of the regression equation. |
β coefficients | beta0, beta1, beta2, etc. | n+1 β coefficients, where n is the number of X streams. |
Input Stream Data | The name given in the input stream | All the attributes sent in the input stream. |
The regress and lengthTimeRegress functions nullify any β coefficients that fail the T-test based on the confidence interval. You can access any of the output parameters using its name (as given in the table above).
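The idea of nullifying coefficients that fail a significance test can be sketched in Python as follows. This is a simplified illustration, not the code Siddhi runs. It assumes that "nullify" means setting the coefficient to 0, and it approximates the t-distribution critical value with the normal distribution, which is only reasonable for large samples. The function name `nullify_insignificant` and the sample numbers are made up for the example.

```python
from statistics import NormalDist


def nullify_insignificant(betas, std_errors, confidence=0.95):
    """Zero out coefficients whose |t| statistic is below the critical value."""
    # Two-sided critical value. Normal approximation to the t-distribution
    # (an assumption of this sketch; valid only for large sample sizes).
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    result = []
    for beta, se in zip(betas, std_errors):
        # t statistic: coefficient divided by its own standard error.
        t_stat = abs(beta / se) if se > 0 else float("inf")
        result.append(beta if t_stat >= critical else 0.0)
    return result


# Hypothetical coefficients and per-coefficient standard errors.
betas = [1.2, 0.05, 3.4]
std_errors = [0.3, 0.1, 0.5]
kept = nullify_insignificant(betas, std_errors)
```

Here the second coefficient has a t statistic of 0.5, well below the critical value of roughly 1.96 at a confidence of 0.95, so it is replaced by 0 while the other two are kept.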
Examples
Example 1
The following query submits a calculation interval (every 10 events), a batch size (100,000 events), a confidence interval (0.95), a dependent input stream (Y) and 3 independent input streams (X1, X2, X3) that are used to perform linear regression between Y and all the X streams.
from StockExchangeStream#timeseries:regress(10, 100000, 0.95, Y, X1, X2, X3) select * insert into StockForecaster
When this query is executed, it returns the standard error of the regression equation (ε), 4 β coefficients (β0, β1, β2, β3) and all the items available in the input stream. These results can be used to build a relationship between Y and all the Xs (the regression equation) as follows: Y = β0 + β1·X1 + β2·X2 + β3·X3 + ε
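Once the β coefficients are available, the relationship between Y and the X streams can be evaluated for new input values. A minimal Python sketch, assuming the coefficients are ordered beta0, beta1, ..., betan as in the output table; the function name `predict` and the numbers are hypothetical:

```python
def predict(betas, xs):
    """Evaluate beta0 + beta1*x1 + ... + betan*xn for one set of x values."""
    if len(betas) != len(xs) + 1:
        raise ValueError("expected one more coefficient than x values")
    # betas[0] is the intercept; the rest pair up with the x values.
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


# Made-up coefficients for a regression with three X streams.
betas = [2.0, 0.5, -1.0, 3.0]
y_hat = predict(betas, [4.0, 1.0, 2.0])
```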
Example 2
The following query submits a time window (200 milliseconds), a batch size (10,000 events), a calculation interval (every 2 events), a confidence interval (0.95), a dependent input stream (Y) and an independent input stream (X) that are used to perform linear regression between Y and X.
from StockExchangeStream#timeseries:lengthTimeRegress(200, 10000, 2, 0.95, Y, X) select * insert into StockForecaster
When this query is executed, it returns the standard error of the regression equation (ε), 2 β coefficients (β0, β1) and all the items available in the input stream.
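For a single X stream, the two coefficients and a standard error can be computed by ordinary least squares. The Python sketch below shows that calculation on a small batch. It illustrates the technique only and makes no claim about how Siddhi computes its values. In particular, it assumes that the standard error is the residual standard error with n - 2 degrees of freedom. The function name `simple_regression` and the data are hypothetical.

```python
import math


def simple_regression(ys, xs):
    """Fit y = beta0 + beta1 * x by ordinary least squares.

    Returns (beta0, beta1, std_error), where std_error is the residual
    standard error with n - 2 degrees of freedom (assumed definition).
    """
    n = len(ys)
    if n != len(xs) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residuals between observed and fitted values.
    residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
    std_error = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return beta0, beta1, std_error


# Made-up batch that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta0, beta1, std_error = simple_regression(ys, xs)
```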