Outlier
Siddhi enables users to identify outliers using linear regression on real-time data streams. The outlier function takes a dependent event stream (Y), an independent event stream (X), and a user-specified range for outliers. It returns whether the current event is an outlier, based on the regression equation fitted to historical data.
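The idea can be illustrated outside Siddhi with a minimal Python sketch. It assumes simple least-squares regression with one independent variable and flags an event when its distance from the fitted line exceeds `range` times the standard error of the regression. The function names (`fit_line`, `std_error`, `is_outlier`) and the sample data are hypothetical, and Siddhi's internal calculation may differ in detail (for example, in whether the current event is included in the fit).

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares fit of y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def std_error(xs, ys, beta0, beta1):
    """Standard error of the regression: sqrt(SSE / (n - 2))."""
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))


def is_outlier(history_x, history_y, x, y, rng):
    """Flag (x, y) if it lies more than rng standard errors from the line
    fitted to the historical events (assumed rule, not Siddhi's source)."""
    beta0, beta1 = fit_line(history_x, history_y)
    err = std_error(history_x, history_y, beta0, beta1)
    return abs(y - (beta0 + beta1 * x)) > rng * err


# Hypothetical history that roughly follows y = 2x
history_x = [1.0, 2.0, 3.0, 4.0, 5.0]
history_y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(is_outlier(history_x, history_y, 6.0, 12.0, 2))  # False: close to the trend
print(is_outlier(history_x, history_y, 6.0, 20.0, 2))  # True: far above the trend
```

In this sketch, `rng` plays the role of the Range parameter described below, and `beta0`, `beta1`, and the standard error correspond to the values the function returns alongside the outlier flag.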
Input Parameters
Parameter | Required/Optional | Description |
Calculation Interval | Optional | The frequency of regression calculation. Default value: 1 (i.e. at every event) |
Batch Size | Optional | The maximum number of events used for a regression calculation. Default value: 1,000,000,000 events |
Confidence Interval | Optional | The confidence interval to be used for the regression calculation. Default value: 0.95 |
Range | Required | The number of standard deviations from the regression equation beyond which an event is considered an outlier |
Y Stream | Required | Data stream of the dependent variable |
X Stream | Required | Data stream of the independent variable |
Output Parameters
Parameter | Name | Description |
Outlier | outlier | True if the event is an outlier, False if not |
Standard Error | stdError | Standard Error of the Regression Equation |
β coefficients | beta0, beta1 | β coefficients of the Regression Equation |
Input Stream Data | Name given in the input stream | All items sent in the input stream |
Examples
The following query passes the number of standard deviations to be used as the range (2), a dependent input stream (Y), and an independent input stream (X). These are used to perform linear regression between Y and X and to output whether the current event is an outlier.
from StockExchangeStream#transform.timeseries:outlier(2, Y, X)
select *
insert into StockForecaster
When executed, the above query returns whether the current event is an outlier, along with the standard error of the regression equation (ε), the β coefficients, and all the items available in the input stream.