seeknal audit¶
Run data quality audits on cached node outputs from the last pipeline execution. Audits are defined in node YAML configurations and executed against the cached parquet files in the target directory.
Synopsis¶
Description¶
The audit command executes audit rules defined in your YAML node configurations
against the cached outputs from the last seeknal run execution. This allows you
to verify data quality without re-executing the entire pipeline.
Audits are defined using the audits: key in your node YAML files and can include
checks like row count thresholds, null value checks, uniqueness constraints, and
custom SQL assertions.
Options¶
| Option | Description |
|---|---|
NODE |
Specific node to audit (e.g., source.users). If omitted, audits all nodes |
--target-path |
Path to target directory (default: target) |
Examples¶
Audit all nodes¶
Audit a specific node¶
Audit a transform¶
Defining Audits in YAML¶
# seeknal/transforms/clean_data.yml
kind: transform
name: clean_data
description: Clean and validate user data
audits:
- type: row_count
min_count: 100
- type: not_null
columns: [user_id, email]
- type: unique
columns: [user_id]
Exit Codes¶
| Code | Description |
|---|---|
| 0 | All audits passed |
| 1 | One or more audits failed |
See Also¶
- seeknal validate-features - Validate feature group data
- seeknal validate - Validate configurations
- seeknal run - Execute pipeline