Overview

In addition to the Dependency Policy checks run during feed execution, we run aggregate checks defined in each feed’s Validation Policy.

Validation Policy checks compare numeric values against historic numeric values. For example, the total count of collected rows might be compared to the total row count from a previous run. We support both *custom policies* and *default policies*. Four components define the behavior of every validation:

  1. Dimensionality - whether a check operates on individual columns, entire rows, or the whole table.
  2. Type - which aggregate value is checked, e.g. the mean or the null count.
  3. Validation Rule - how the aggregated values are compared, e.g. lowerRatio checks the ratio of the current value to the baseline value.
  4. Seasonality - which baseline values are compared against, e.g. the previous run, the run from a week ago, or the run from a month ago.
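To make the four components concrete, here is a minimal Python sketch of a single check. Everything in it is hypothetical: the `ValidationCheck` class, the `lower_ratio` function (assumed to pass when the current value divided by the baseline is at least a threshold), and modeling Seasonality as a fixed `timedelta` lookback, which assumes one run per day so that "the previous run" would be `timedelta(days=1)`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Callable, Dict


class Dimensionality(Enum):
    COLUMN = "column"
    ROW = "row"
    TABLE = "table"


def lower_ratio(current: float, baseline: float, min_ratio: float) -> bool:
    """Hypothetical lowerRatio rule: pass if current / baseline >= min_ratio."""
    if baseline == 0:
        # Avoid division by zero; an empty baseline gives nothing to compare against.
        return True
    return current / baseline >= min_ratio


@dataclass
class ValidationCheck:
    dimensionality: Dimensionality                # what the aggregate is computed over
    agg_type: str                                 # e.g. "row_count", "null_count", "mean"
    rule: Callable[[float, float, float], bool]   # how current and baseline are compared
    threshold: float                              # parameter passed to the rule
    seasonality: timedelta                        # how far back the baseline run is (assumed daily runs)

    def evaluate(self, history: Dict[date, float], run_date: date) -> bool:
        """Compare the value for run_date with the value one seasonality period earlier."""
        baseline_date = run_date - self.seasonality
        if baseline_date not in history:
            # No baseline collected yet, so the check cannot fail.
            return True
        return self.rule(history[run_date], history[baseline_date], self.threshold)


# Usage: a table-level row count compared week over week.
weekly_rows = ValidationCheck(
    dimensionality=Dimensionality.TABLE,
    agg_type="row_count",
    rule=lower_ratio,
    threshold=0.8,
    seasonality=timedelta(days=7),
)
history = {date(2024, 1, 1): 1000.0, date(2024, 1, 8): 750.0}
print(weekly_rows.evaluate(history, date(2024, 1, 8)))  # False: 750 / 1000 = 0.75 < 0.8
```

Keeping the rule as a plain function makes it easy to swap in a different comparison without changing how the baseline is chosen.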

**Custom Policies** are defined per feed and can make arbitrary comparisons between the most recently collected data and any previously collected data.

**Default Policies** are also defined per feed; however, they must use an existing Validation Rule, which makes them much quicker to configure. In the absence of user-defined rules, default rules are applied based on either data profiling (once a baseline for the data's shape has been established) or business context.
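The difference between the two policy kinds can be sketched as follows, assuming default policies look up a rule by name in a registry of existing Validation Rules, while custom policies accept an arbitrary predicate over the full history. `RULES`, `DefaultPolicy`, `CustomPolicy`, and the lowerRatio semantics shown are illustrative assumptions, not the real configuration API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def lower_ratio(current: float, baseline: float, bound: float) -> bool:
    # Assumed lowerRatio semantics: current / baseline must not fall below bound.
    return baseline == 0 or current / baseline >= bound


# Hypothetical registry of existing Validation Rules that default policies may reference.
RULES: Dict[str, Callable[[float, float, float], bool]] = {
    "lowerRatio": lower_ratio,
}


@dataclass
class DefaultPolicy:
    rule_name: str  # must name an existing rule, e.g. "lowerRatio"
    bound: float

    def __post_init__(self) -> None:
        # Default policies cannot define new comparison logic, only pick an existing rule.
        if self.rule_name not in RULES:
            raise ValueError(f"unknown validation rule: {self.rule_name}")

    def check(self, current: float, history: List[float]) -> bool:
        # Compare against a single baseline: here, the previous run.
        if not history:
            return True
        return RULES[self.rule_name](current, history[-1], self.bound)


@dataclass
class CustomPolicy:
    # Arbitrary comparison between the newest value and any previously collected values.
    predicate: Callable[[float, List[float]], bool]

    def check(self, current: float, history: List[float]) -> bool:
        return self.predicate(current, history)


default = DefaultPolicy("lowerRatio", 0.8)
print(default.check(90.0, [100.0]))  # True: 90 / 100 = 0.9 >= 0.8

# Custom: the current value must not drop below the minimum of all previous runs.
custom = CustomPolicy(lambda cur, hist: not hist or cur >= min(hist))
print(custom.check(90.0, [100.0, 95.0]))  # False: 90 < 95
```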

Dimensionality

Dimensionality is currently one of three values:

  1. By Column - Compares values within a given column. This is the most common type.
  2. By Row - Compares across an entire row. Used primarily for duplication checks.
  3. By Table - Compares across an entire run, e.g. total bytes and total rows.
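A sketch of how an aggregate might be computed at each level of dimensionality, assuming a run arrives as raw CSV bytes. The function names are hypothetical, and treating an empty string as null is an assumption.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def parse(raw: bytes) -> List[Dict[str, str]]:
    # Assumed input format: UTF-8 CSV with a header row.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def column_null_count(rows: List[Dict[str, str]], column: str) -> int:
    # By Column: aggregate within one column; missing or empty values count as null.
    return sum(1 for r in rows if r.get(column) in (None, ""))


def row_duplicate_count(rows: List[Dict[str, str]]) -> int:
    # By Row: compare whole rows; count every row that repeats an earlier one.
    counts = Counter(tuple(r.items()) for r in rows)
    return sum(c - 1 for c in counts.values())


def table_metrics(raw: bytes, rows: List[Dict[str, str]]) -> Dict[str, int]:
    # By Table: aggregates over the entire run.
    return {"total_rows": len(rows), "total_bytes": len(raw)}


raw = b"id,name\n1,a\n2,\n1,a\n"
rows = parse(raw)
print(column_null_count(rows, "name"))  # 1
print(row_duplicate_count(rows))        # 1
print(table_metrics(raw, rows))         # {'total_rows': 3, 'total_bytes': 19}
```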

Type

Possible types currently include: