Outlier Detection
Identify statistical outliers in numeric fields using the z-score or median absolute deviation (MAD) method.
Overview
Outlier detection identifies features with values that are statistically unusual compared to the dataset distribution.
Methods
Z-Score Method
Uses mean and standard deviation:
Z-score = (value - mean) / standard_deviation
Features with |z-score| > threshold are outliers
Sensitive to extreme values, which inflate the mean and standard deviation used in the calculation
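The z-score steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's implementation: the function names `z_scores` and `flag_outliers` are hypothetical, and it assumes the population standard deviation (dividing by n), which this page does not specify.

```python
import math

def z_scores(values):
    """Return (value - mean) / standard_deviation for each value."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def flag_outliers(scores, threshold=2.0):
    """Flag entries whose absolute score exceeds the threshold."""
    return [abs(s) > threshold for s in scores]

values = [10, 12, 11, 13, 12, 11, 10, 50]
scores = z_scores(values)
flags = flag_outliers(scores)  # only the last value (50) is flagged
```

In this sample the value 50 pulls the mean up to 16.125 and the standard deviation to about 12.84, so its own z-score is only about 2.64. This is the sensitivity noted above: an extreme value partly hides itself.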
MAD Method
Uses median and median absolute deviation:
Modified z-score = 0.6745 * (value - median) / MAD
Features with |modified z-score| > threshold are outliers
More robust, because the median and MAD are far less affected by extreme values than the mean and standard deviation
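A minimal Python sketch of the modified z-score, using the formula above. The function name `modified_z_scores` is hypothetical, and returning 0.0 when the MAD is zero is an assumption; this page does not say how that case is handled.

```python
import statistics

def modified_z_scores(values):
    """Return 0.6745 * (value - median) / MAD for each value."""
    median = statistics.median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # Assumption: with a zero MAD (more than half the values equal the
        # median) the score is undefined, so this sketch returns 0.0.
        return [0.0 for _ in values]
    return [0.6745 * (v - median) / mad for v in values]

values = [10, 12, 11, 13, 12, 11, 10, 50]
mad_scores = modified_z_scores(values)
mad_flags = [abs(s) > 2.0 for s in mad_scores]  # only 50 is flagged
```

Here the median is 11.5 and the MAD is 1.0, so the value 50 scores about 25.97, far above the threshold. The extreme value has almost no influence on the statistics used to judge it.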
Inputs
Dataset: Any dataset with a numeric field
Value Field: Numeric field to analyze
Method: "zscore" or "mad" (default: "zscore")
Threshold: Cutoff applied to the absolute z-score or modified z-score (default: 2.0)
Outputs
New dataset containing:
Original features
Outlier Score: Z-score or modified z-score, depending on the method
Is Outlier: Boolean flag, true when the absolute score exceeds the threshold
Original attributes
Example
{
"dataset_id": 123,
"value_field": "income",
"method": "zscore",
"threshold": 2.0
}
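To illustrate how these parameters could map to the outputs, here is a hedged Python sketch that applies them to an in-memory list of records instead of a stored dataset. The function `analyze`, the output keys `outlier_score` and `is_outlier`, and the handling of null rows (score `None`, never flagged) are assumptions made for illustration, as is the use of the population standard deviation.

```python
import statistics

def analyze(features, value_field, method="zscore", threshold=2.0):
    """Return copies of the features with a score and an outlier flag added."""
    # Null values are excluded from the statistics.
    values = [f[value_field] for f in features if f.get(value_field) is not None]
    if not values:
        center, spread, scale = 0.0, 0.0, 1.0
    elif method == "zscore":
        center = statistics.mean(values)
        spread = statistics.pstdev(values)  # assumption: population std
        scale = 1.0
    elif method == "mad":
        center = statistics.median(values)
        spread = statistics.median([abs(v - center) for v in values])
        scale = 0.6745
    else:
        raise ValueError(f"unknown method: {method!r}")

    results = []
    for f in features:
        v = f.get(value_field)
        if v is None:
            score = None   # assumption: null rows get no score
        elif spread == 0:
            score = 0.0    # assumption: no spread means no deviation
        else:
            score = scale * (v - center) / spread
        flagged = score is not None and abs(score) > threshold
        # Original attributes are kept; the two result fields are added.
        results.append({**f, "outlier_score": score, "is_outlier": flagged})
    return results

incomes = [10, 12, 11, 13, 12, 11, 10, 50, None]
features = [{"id": i, "income": v} for i, v in enumerate(incomes)]
results = analyze(features, value_field="income", method="zscore", threshold=2.0)
flagged_ids = [r["id"] for r in results if r["is_outlier"]]
```

With this sample data only the record with income 50 is flagged, and the record with a null income is passed through without a score.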
Background Jobs
This analysis runs as a background job. See Outlier Analysis Worker for details.
Use Cases
Data quality assessment
Anomaly detection
Error identification
Extreme value analysis
Notes
Null values are excluded from calculations
A threshold of 2.0 flags about 4.6% of values (roughly 5%) as outliers when the data are normally distributed
MAD method recommended for skewed distributions
Consider spatial context when interpreting results
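The share of values expected beyond a given threshold under a normal distribution can be checked with the standard library. This is a small sketch to help choose a threshold; the function name `two_sided_tail_fraction` is hypothetical, and the result applies only to the extent that the data are close to normal.

```python
import math

def two_sided_tail_fraction(threshold):
    """Fraction of a normal distribution with |z| greater than threshold."""
    # P(|Z| > t) = 2 * (1 - Phi(t)) = erfc(t / sqrt(2))
    return math.erfc(threshold / math.sqrt(2))

fraction_at_2 = two_sided_tail_fraction(2.0)  # about 0.0455
fraction_at_3 = two_sided_tail_fraction(3.0)  # about 0.0027
```

Raising the threshold from 2.0 to 3.0 reduces the expected share of flagged values from about 4.6% to about 0.3%, which may be preferable for large datasets where even a few percent means many flagged features.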