Common Configuration¶
Define reusable column mappings, SQL filters, and SQL snippets in a shared location so every transform in your project stays consistent.
Overview¶
As pipelines grow, the same column names, filter conditions, and JOIN patterns appear in many transforms. Common configuration centralises these values in seeknal/common/ so you can:
- Change a column name in one place and have it propagate everywhere.
- Share business rules (e.g., "active users only") across transforms.
- Keep SQL snippets (e.g., region enrichment JOINs) DRY.
Values are referenced in transform SQL with {{ dotted.key }} syntax and resolved before execution.
Directory Structure¶
seeknal/
  common/
    sources.yml           # Reusable source column mappings
    rules.yml             # Reusable SQL filter expressions
    transformations.yml   # Reusable SQL snippets
  sources/
  transforms/
  ...
The seeknal/common/ directory is created automatically by seeknal init. Files inside it are not treated as pipeline nodes -- they are loaded separately and made available to all transforms.
File Formats¶
sources.yml¶
Maps logical names to source parameters (column names, identifiers, etc.):
sources:
  - id: traffic
    ref: source.traffic_daily
    params:
      dateCol: date_id
      idCol: msisdn
      revenueCol: total_revenue
  - id: customers
    ref: source.customer_master
    params:
      idCol: customer_id
      segmentCol: segment_code
Each parameter becomes a flat key: {source_id}.{param_name}.
| Expression | Resolves to |
|---|---|
| {{ traffic.dateCol }} | date_id |
| {{ traffic.idCol }} | msisdn |
| {{ customers.segmentCol }} | segment_code |
The source ref is also available as {{ traffic.ref }} (resolves to source.traffic_daily).
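The flattening rule can be sketched in a few lines of Python. This is an illustration only, not Seeknal's actual loader: the function name `flatten_sources` is hypothetical, and the input is the already-parsed content of sources.yml written as a plain dict.

```python
def flatten_sources(config: dict) -> dict:
    """Flatten parsed sources.yml content into {source_id}.{param_name} keys."""
    flat = {}
    for source in config.get("sources", []):
        source_id = source["id"]
        # The source ref is exposed under the same prefix as the params.
        if "ref" in source:
            flat[f"{source_id}.ref"] = source["ref"]
        for name, value in source.get("params", {}).items():
            flat[f"{source_id}.{name}"] = value
    return flat


# Parsed form of the sources.yml example above.
parsed = {
    "sources": [
        {
            "id": "traffic",
            "ref": "source.traffic_daily",
            "params": {
                "dateCol": "date_id",
                "idCol": "msisdn",
                "revenueCol": "total_revenue",
            },
        },
        {
            "id": "customers",
            "ref": "source.customer_master",
            "params": {"idCol": "customer_id", "segmentCol": "segment_code"},
        },
    ]
}

keys = flatten_sources(parsed)
for key in sorted(keys):
    print(f"{key} -> {keys[key]}")
```

Because every source contributes keys under its own id, two sources can both define idCol without colliding: traffic.idCol and customers.idCol stay distinct.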
rules.yml¶
Defines reusable SQL filter expressions:
rules:
  - id: activeOnly
    value: "status = 'active'"
  - id: callExpression
    value: "service_type = 'Voice'"
  - id: recentData
    value: "date_id >= '2025-01-01'"
Each rule becomes rules.{rule_id}.
| Expression | Resolves to |
|---|---|
| {{ rules.activeOnly }} | status = 'active' |
| {{ rules.callExpression }} | service_type = 'Voice' |
transformations.yml¶
Defines reusable SQL snippets (JOINs, sub-selects, CTEs, etc.):
transformations:
  - id: enrichWithRegion
    sql: "LEFT JOIN ref('source.regions') r ON t.region_id = r.id"
  - id: excludeTestAccounts
    sql: "WHERE account_type != 'test'"
Each transformation becomes transforms.{transform_id}.
| Expression | Resolves to |
|---|---|
| {{ transforms.enrichWithRegion }} | LEFT JOIN ref('source.regions') r ON t.region_id = r.id |
| {{ transforms.excludeTestAccounts }} | WHERE account_type != 'test' |
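One detail is easy to miss: the file and its top-level key are called transformations, but the resulting keys use the shorter transforms. prefix. A minimal Python sketch of the key naming for rules and transformations follows. The function names are hypothetical and the inputs are the parsed YAML written as plain dicts, so this shows the naming rule rather than Seeknal's own code.

```python
def flatten_rules(config: dict) -> dict:
    """Map each rule in parsed rules.yml to a rules.{rule_id} key."""
    return {f"rules.{rule['id']}": rule["value"] for rule in config.get("rules", [])}


def flatten_transformations(config: dict) -> dict:
    """Map each snippet in parsed transformations.yml to a transforms.{id} key."""
    # The key prefix is "transforms.", not "transformations.".
    return {
        f"transforms.{item['id']}": item["sql"]
        for item in config.get("transformations", [])
    }


rules = flatten_rules(
    {"rules": [{"id": "activeOnly", "value": "status = 'active'"}]}
)
snippets = flatten_transformations(
    {"transformations": [{"id": "excludeTestAccounts", "sql": "WHERE account_type != 'test'"}]}
)

print(rules)
print(snippets)
```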
Usage in Transforms¶
Reference common config values in any transform SQL using {{ }} syntax:
name: daily_voice_revenue
kind: transform
inputs:
  - ref: source.traffic_daily
transform: |
  SELECT
    {{ traffic.idCol }},
    {{ traffic.dateCol }},
    SUM({{ traffic.revenueCol }}) AS revenue
  FROM input_0
  WHERE {{ rules.callExpression }}
    AND {{ rules.recentData }}
  GROUP BY {{ traffic.idCol }}, {{ traffic.dateCol }}
After resolution this becomes:
SELECT
  msisdn,
  date_id,
  SUM(total_revenue) AS revenue
FROM input_0
WHERE service_type = 'Voice'
  AND date_id >= '2025-01-01'
GROUP BY msisdn, date_id
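The substitution step itself is plain text replacement. The sketch below reproduces it in Python with a regular expression. The `render` function and the `PLACEHOLDER` pattern are hypothetical names used for illustration, and the real resolver may treat whitespace or unknown keys differently.

```python
import re

# Matches {{ key }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render(sql: str, values: dict) -> str:
    """Replace each {{ key }} with its value; unknown keys are left unchanged."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, sql)


values = {
    "traffic.idCol": "msisdn",
    "traffic.dateCol": "date_id",
    "traffic.revenueCol": "total_revenue",
    "rules.callExpression": "service_type = 'Voice'",
    "rules.recentData": "date_id >= '2025-01-01'",
}

template = (
    "SELECT {{ traffic.idCol }}, {{ traffic.dateCol }}, "
    "SUM({{ traffic.revenueCol }}) AS revenue "
    "FROM input_0 "
    "WHERE {{ rules.callExpression }} AND {{ rules.recentData }} "
    "GROUP BY {{ traffic.idCol }}, {{ traffic.dateCol }}"
)

resolved = render(template, values)
print(resolved)
```

Values are inserted verbatim, so a rule such as rules.callExpression must already be valid SQL for the position where it is used.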
Before / After Comparison¶
Without common config -- column names and filters are duplicated:
# transforms/voice_revenue.yml
transform: |
  SELECT msisdn, date_id, SUM(total_revenue) AS revenue
  FROM input_0
  WHERE service_type = 'Voice'
  GROUP BY msisdn, date_id

# transforms/data_usage.yml
transform: |
  SELECT msisdn, date_id, SUM(data_bytes) AS usage
  FROM input_0
  WHERE service_type = 'Data'
  GROUP BY msisdn, date_id
If the column msisdn is renamed to subscriber_id upstream, you must update every transform.
With common config -- single source of truth:
# common/sources.yml
sources:
  - id: traffic
    ref: source.traffic_daily
    params:
      dateCol: date_id
      idCol: msisdn

# transforms/voice_revenue.yml
transform: |
  SELECT {{ traffic.idCol }}, {{ traffic.dateCol }}, SUM(total_revenue) AS revenue
  FROM input_0
  WHERE service_type = 'Voice'
  GROUP BY {{ traffic.idCol }}, {{ traffic.dateCol }}

# transforms/data_usage.yml
transform: |
  SELECT {{ traffic.idCol }}, {{ traffic.dateCol }}, SUM(data_bytes) AS usage
  FROM input_0
  WHERE service_type = 'Data'
  GROUP BY {{ traffic.idCol }}, {{ traffic.dateCol }}
Now renaming msisdn to subscriber_id requires changing only common/sources.yml.
Resolution Priority¶
When Seeknal encounters a {{ expression }}, it checks the sources below in order and uses the first one that provides a value (priority 1 wins):
| Priority | Source | Example |
|---|---|---|
| 1 | CLI overrides | --param date=2025-01-01 |
| 2 | Context values | {{ run_id }}, {{ run_date }} |
| 3 | Built-in functions | {{ today }}, {{ month_start(-1) }} |
| 4 | Environment variables | {{ env:API_KEY }} |
| 5 | Common config | {{ traffic.dateCol }}, {{ rules.activeOnly }} |
| 6 | Pass-through | Unresolved expression left as-is |
Common config expressions are identified by containing a . (dot). Simple expressions like {{ today }} or {{ run_id }} are never looked up in common config.
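The cascade can be pictured as a chain of lookups that stops at the first hit. The Python sketch below is a simplified model under several assumptions: the `resolve` and `render` names are hypothetical, CLI parameters are assumed to be referenced by bare name (for example {{ date }}), built-in functions are modelled as zero-argument callables only, and the unknown-key error for common config is left out.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def resolve(expr, cli, context, builtins, env, common):
    """Resolve one expression by walking the layers from priority 1 to 6."""
    if expr in cli:                        # 1. CLI overrides
        return cli[expr]
    if expr in context:                    # 2. Context values
        return context[expr]
    if expr in builtins:                   # 3. Built-in functions (no arguments here)
        return builtins[expr]()
    if expr.startswith("env:"):            # 4. Environment variables
        name = expr[len("env:"):]
        if name in env:
            return env[name]
    if "." in expr and expr in common:     # 5. Common config: dotted keys only
        return common[expr]
    return None                            # 6. Pass-through


def render(sql, **layers):
    """Substitute every placeholder, leaving unresolved ones as-is."""
    def substitute(match):
        value = resolve(match.group(1), **layers)
        return match.group(0) if value is None else str(value)

    return PLACEHOLDER.sub(substitute, sql)


layers = dict(
    cli={"date": "2025-01-01"},
    context={"run_id": "run-42"},
    builtins={"today": lambda: "2025-06-01"},  # fixed value for the example
    env={"API_KEY": "secret"},
    common={"traffic.dateCol": "date_id"},
)

template = (
    "{{ date }} | {{ run_id }} | {{ today }} | "
    "{{ env:API_KEY }} | {{ traffic.dateCol }} | {{ unknown }}"
)
rendered = render(template, **layers)
print(rendered)
```

In this model an undotted name such as {{ unknown }} never reaches the common config lookup, which mirrors the rule that only dotted expressions are treated as common config keys.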
Error Handling¶
If common config is loaded and a dotted expression does not match any key, Seeknal raises an error with suggestions:
ValueError: Unknown common config key '{{ traffic.datCol }}'.
Available keys: traffic.dateCol, traffic.idCol, traffic.revenueCol.
Did you mean: traffic.dateCol?
If no seeknal/common/ directory exists, dotted expressions pass through unchanged for backward compatibility.
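A "did you mean" hint of this kind can be produced with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. Whether Seeknal uses difflib internally is an assumption, and `lookup_common` is a hypothetical name. Passing `None` for the config stands in for a project without a seeknal/common/ directory.

```python
import difflib


def lookup_common(key, common):
    """Look up a dotted key; return None (pass-through) when no config is loaded."""
    if common is None:
        # No seeknal/common/ directory: leave the expression untouched.
        return None
    if key in common:
        return common[key]
    message = (
        f"Unknown common config key '{{{{ {key} }}}}'. "
        f"Available keys: {', '.join(sorted(common))}."
    )
    close = difflib.get_close_matches(key, list(common), n=1)
    if close:
        message += f" Did you mean: {close[0]}?"
    raise ValueError(message)


common = {
    "traffic.dateCol": "date_id",
    "traffic.idCol": "msisdn",
    "traffic.revenueCol": "total_revenue",
}

try:
    lookup_common("traffic.datCol", common)
except ValueError as exc:
    error_text = str(exc)
    print(error_text)
```

Failing loudly on a typo is the safer default once common config exists: a silently passed-through {{ traffic.datCol }} would otherwise surface later as a confusing SQL syntax error.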
Interaction with ref() Syntax¶
Common config {{ }} expressions are resolved before ref() syntax. This means you can use common config inside ref() arguments or alongside them:
transform: |
  SELECT t.{{ traffic.idCol }}, r.region_name
  FROM ref('source.traffic_daily') t
  {{ transforms.enrichWithRegion }}
Resolution order:
1. {{ traffic.idCol }} and {{ transforms.enrichWithRegion }} are resolved first
2. ref('source.traffic_daily') is resolved to input_0
3. input_0 is bound to the actual data source
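The order matters because a snippet such as transforms.enrichWithRegion can itself contain a ref() call, which only becomes visible once the snippet has been expanded. The Python sketch below models the first two steps. The function names are hypothetical, and the mapping of each ref() to input_N by its position in an inputs list is an assumption; the text above only states that source.traffic_daily becomes input_0.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")
REF_CALL = re.compile(r"ref\(\s*'([^']+)'\s*\)")


def resolve_common(sql, common):
    """Step 1: expand {{ dotted.key }} placeholders from common config."""
    return PLACEHOLDER.sub(lambda m: common.get(m.group(1), m.group(0)), sql)


def resolve_refs(sql, inputs):
    """Step 2: replace ref('name') with input_N (N = position in inputs, assumed)."""
    return REF_CALL.sub(lambda m: f"input_{inputs.index(m.group(1))}", sql)


common = {
    "traffic.idCol": "msisdn",
    "transforms.enrichWithRegion": (
        "LEFT JOIN ref('source.regions') r ON t.region_id = r.id"
    ),
}
inputs = ["source.traffic_daily", "source.regions"]

sql = (
    "SELECT t.{{ traffic.idCol }}, r.region_name "
    "FROM ref('source.traffic_daily') t "
    "{{ transforms.enrichWithRegion }}"
)

after_common = resolve_common(sql, common)
after_refs = resolve_refs(after_common, inputs)
print(after_refs)

# Reversing the order leaves the ref() inside the snippet unresolved.
wrong_order = resolve_common(resolve_refs(sql, inputs), common)
print(wrong_order)
```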
Getting Started¶
To add common config to an existing project:
- Create the seeknal/common/ directory (on a new project, running seeknal init creates it for you).
- Add a sources file with your most-used column mappings:
cat > seeknal/common/sources.yml << 'EOF'
sources:
  - id: traffic
    ref: source.traffic_daily
    params:
      dateCol: date_id
      idCol: msisdn
EOF
- Replace hardcoded values in your transforms:
# Before
transform: |
  SELECT msisdn, date_id FROM input_0

# After
transform: |
  SELECT {{ traffic.idCol }}, {{ traffic.dateCol }} FROM input_0
- Run your pipeline with seeknal run -- common config is picked up automatically.
Add rules.yml and transformations.yml as needed. You only need the files that are useful for your project -- all three are optional.
Tips¶
- Keep IDs short and descriptive: traffic, not traffic_daily_source_v2.
- Use the rules. prefix for WHERE-clause conditions and the transforms. prefix for JOIN/snippet blocks.
- Common config works in both seeknal run and seeknal dry-run (preview).
- You can combine common config with built-in functions: WHERE {{ traffic.dateCol }} = '{{ today }}'.