Outlier Analysis Worker

Processes outlier detection jobs to identify statistical outliers in spatial data.

Overview

The outlier analysis worker flags features whose value in a chosen numeric field is statistically unusual, using either the z-score method or the MAD (Median Absolute Deviation) method.

Job Type

outlier_analysis

Input Parameters

{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}

Parameters

  • dataset_id (required): Source dataset ID

  • value_field (required): Numeric field to analyze

  • method (optional): "zscore" or "mad" (default: "zscore")

  • threshold (optional): Cutoff applied to the absolute z-score (zscore) or the absolute modified z-score (mad) (default: 2.0)
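
The parameter rules above could be checked along the following lines. This is a minimal sketch, not the worker's actual validation: the function name validate_params is hypothetical, and the rule that threshold must be positive is an assumption not stated in this document.

```python
from typing import Any

ALLOWED_METHODS = {"zscore", "mad"}


def validate_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a normalized copy of the job parameters, or raise ValueError."""
    # dataset_id and value_field are the two required parameters.
    for key in ("dataset_id", "value_field"):
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")

    # method is optional and defaults to "zscore".
    method = params.get("method", "zscore")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {sorted(ALLOWED_METHODS)}")

    # threshold is optional and defaults to 2.0.
    try:
        threshold = float(params.get("threshold", 2.0))
    except (TypeError, ValueError):
        raise ValueError("threshold must be numeric")
    if threshold <= 0:  # assumption: a non-positive cutoff is rejected
        raise ValueError("threshold must be positive")

    return {
        "dataset_id": int(params["dataset_id"]),
        "value_field": str(params["value_field"]),
        "method": method,
        "threshold": threshold,
    }
```

Calling validate_params({"dataset_id": 123, "value_field": "income"}) fills in both documented defaults.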

Output

Creates a new dataset with outlier analysis results:

  • Original features marked as outliers

  • Outlier score (z-score or MAD score)

  • Outlier flag

  • Original attributes preserved

Methods

Z-Score Method

Calculates standardized z-scores:

  • Calculates the mean and standard deviation of value_field across all features

  • Z-score = (value - mean) / standard_deviation

  • Features with |z-score| > threshold are outliers
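
The z-score steps above can be sketched in a few lines of Python. The function name zscore_outliers is hypothetical. Two choices here are assumptions, since this document does not specify them: the population standard deviation is used, and a dataset with zero standard deviation yields no outliers.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return a list of (z_score, is_outlier) pairs, one per input value."""
    # First pass: mean and standard deviation.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std, not sample std

    # Assumption: when every value is identical the z-score is undefined,
    # so nothing is flagged.
    if std == 0:
        return [(0.0, False) for _ in values]

    # Second pass: score each value and compare |z| with the threshold.
    results = []
    for v in values:
        z = (v - mean) / std
        results.append((z, abs(z) > threshold))
    return results


incomes = [10, 12, 11, 13, 12, 100]
flags = [is_outlier for _, is_outlier in zscore_outliers(incomes)]
# Only the last value (100) is flagged at the default threshold of 2.0.
```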

MAD Method

Uses Median Absolute Deviation:

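  • Calculates the median of value_field across all features

  • MAD = median(|value - median|)

  • Modified z-score = 0.6745 * (value - median) / MAD, where the constant 0.6745 rescales the MAD so that the score is comparable to a z-score for normally distributed data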

  • Features with |modified z-score| > threshold are outliers
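
The MAD steps above can be sketched the same way. The function name mad_outliers is hypothetical, and treating a MAD of zero as "no outliers" is an assumption, since this document does not say how that case is handled.

```python
import statistics


def mad_outliers(values, threshold=2.0):
    """Return a list of (modified_z_score, is_outlier) pairs, one per value."""
    median = statistics.median(values)
    # MAD = median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in values])

    # Assumption: when more than half the values equal the median, MAD is 0
    # and the modified z-score is undefined, so nothing is flagged.
    if mad == 0:
        return [(0.0, False) for _ in values]

    results = []
    for v in values:
        score = 0.6745 * (v - median) / mad
        results.append((score, abs(score) > threshold))
    return results


incomes = [10, 12, 11, 13, 12, 100]
flags = [is_outlier for _, is_outlier in mad_outliers(incomes)]
# Median is 12 and MAD is 1, so only the last value (100) is flagged.
```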

Example

# Enqueue an outlier analysis job via API
curl -X POST "https://example.com/api/analysis/outlier_run.php" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": 123,
    "value_field": "income",
    "method": "zscore",
    "threshold": 2.0
  }'

Background Jobs

This analysis runs as a background job. The worker:

  1. Fetches queued outlier_analysis jobs

  2. Validates input parameters

  3. Calculates statistics (mean/std or median/MAD)

  4. Identifies outliers

  5. Creates output dataset

  6. Marks job as completed
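
The six steps above can be illustrated with an in-memory sketch. Everything about the storage layer here is hypothetical: the job dictionaries, the "queued", "completed" and "failed" status strings, the output field names outlier_score and is_outlier, and the choice to write every feature to the output with a flag. The real worker's queue and dataset storage are not described in this document.

```python
import statistics


def score(values, method):
    """Step 3: compute statistics, then a score per value."""
    if method == "zscore":
        center, scale, factor = statistics.fmean(values), statistics.pstdev(values), 1.0
    else:  # "mad"
        center = statistics.median(values)
        scale = statistics.median([abs(v - center) for v in values])
        factor = 0.6745
    if scale == 0:  # assumption: undefined scores are reported as 0.0
        return [0.0] * len(values)
    return [factor * (v - center) / scale for v in values]


def process_job(job, datasets):
    params = job["params"]
    # Step 2: validate input parameters (abbreviated).
    method = params.get("method", "zscore")
    threshold = params.get("threshold", 2.0)
    if method not in ("zscore", "mad") or params.get("dataset_id") not in datasets:
        job["status"] = "failed"  # hypothetical failure status
        return None

    field = params["value_field"]
    features = datasets[params["dataset_id"]]
    scores = score([f[field] for f in features], method)

    # Steps 4 and 5: flag outliers and build the output dataset,
    # keeping the original attributes of each feature.
    output = [
        {**f, "outlier_score": s, "is_outlier": abs(s) > threshold}
        for f, s in zip(features, scores)
    ]
    job["status"] = "completed"  # Step 6
    return output


def run_worker(jobs, datasets):
    """Step 1: pick up queued outlier_analysis jobs and process each one."""
    results = {}
    for job in jobs:
        if job["type"] == "outlier_analysis" and job["status"] == "queued":
            results[job["id"]] = process_job(job, datasets)
    return results
```

A job whose params match the Input Parameters example would be passed as {"id": 1, "type": "outlier_analysis", "status": "queued", "params": {...}} alongside a datasets mapping keyed by dataset ID.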

Performance Considerations

  • Processing time depends on dataset size

  • Z-score method requires two passes (mean/std, then scoring)

  • MAD method is more robust: the median and MAD are far less affected by extreme values than the mean and standard deviation

  • Consider filtering null values before analysis
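
One way to filter null values before analysis is sketched below. The helper name split_valid is hypothetical, and treating NaN, infinities, and non-numeric values as invalid alongside nulls is an assumption that goes slightly beyond what this document states.

```python
import math


def split_valid(features, field):
    """Split features into those with a usable numeric value and the rest."""
    valid, skipped = [], []
    for feature in features:
        value = feature.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        # Assumption: None, missing keys, NaN, infinities and non-numeric
        # values are all excluded from the statistics.
        if is_number and math.isfinite(value):
            valid.append(feature)
        else:
            skipped.append(feature)
    return valid, skipped
```

Only the valid list would then be scored. Keeping the skipped list makes it possible to report how many features were excluded.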