Aggregations
Aggregations compute metrics and rollups across entities and time periods.
Overview
Seeknal supports first-order aggregations (standard GROUP BY) and second-order aggregations (hierarchical rollups).
First-Order Aggregations
Basic Aggregations
name: customer_metrics
kind: aggregation
entity: customer
time_grain: day
metrics:
  - name: total_revenue
    expression: SUM(amount)
  - name: transaction_count
    expression: COUNT(*)
  - name: avg_transaction_value
    expression: AVG(amount)
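To make the first-order case concrete, here is a minimal sketch of the equivalent GROUP BY using Python's built-in `sqlite3` module. It is not how Seeknal executes the configuration; the `transactions` table, its `customer_id`, `event_date`, and `amount` columns, and the `daily_customer_metrics` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer_id, event_date, amount).
ROWS = [
    ("c1", "2024-01-01", 10.0),
    ("c1", "2024-01-01", 30.0),
    ("c1", "2024-01-02", 5.0),
    ("c2", "2024-01-01", 20.0),
]


def daily_customer_metrics(rows):
    """Group by customer and day, computing the three metrics from the config."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (customer_id TEXT, event_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT customer_id,
               event_date,
               SUM(amount) AS total_revenue,
               COUNT(*)    AS transaction_count,
               AVG(amount) AS avg_transaction_value
        FROM transactions
        GROUP BY customer_id, event_date
        ORDER BY customer_id, event_date
        """
    ).fetchall()
    conn.close()
    return result


metrics = daily_customer_metrics(ROWS)
for row in metrics:
    print(row)
```

Each output row corresponds to one entity (customer) at one time grain (day), which is the shape the `entity` and `time_grain` options describe.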
Window Functions
name: rolling_metrics
kind: aggregation
entity: customer
window: 7d
metrics:
  - name: rolling_7day_revenue
    expression: SUM(amount)
  - name: rolling_7day_transactions
    expression: COUNT(*)
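The rolling window above can be sketched in plain Python. This assumes `window: 7d` means a trailing window of seven calendar days that includes the evaluation date; Seeknal's exact boundary semantics are not stated here, so treat that as an assumption. The `rolling_metrics` helper and the sample data are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical sample data: (customer_id, transaction_date, amount).
TRANSACTIONS = [
    ("c1", date(2024, 1, 1), 10.0),
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 1, 8), 5.0),
    ("c2", date(2024, 1, 3), 7.5),
]


def rolling_metrics(transactions, as_of, window_days=7):
    """Sum and count amounts per customer over the trailing window.

    Assumed window: the `window_days` calendar days ending at `as_of`, inclusive.
    """
    start = as_of - timedelta(days=window_days - 1)
    totals = {}
    for customer_id, day, amount in transactions:
        if start <= day <= as_of:
            revenue, count = totals.get(customer_id, (0.0, 0))
            totals[customer_id] = (revenue + amount, count + 1)
    return {
        customer_id: {
            "rolling_7day_revenue": revenue,
            "rolling_7day_transactions": count,
        }
        for customer_id, (revenue, count) in totals.items()
    }


rolling = rolling_metrics(TRANSACTIONS, as_of=date(2024, 1, 8))
print(rolling)
```

With `as_of` set to 2024-01-08, the window covers 2024-01-02 through 2024-01-08, so the 2024-01-01 transaction for `c1` falls outside it.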
Second-Order Aggregations
Multi-Level Rollups
name: regional_customer_metrics
kind: second_order_aggregation
base_entity: customer
aggregate_entity: region
metrics:
  - name: regional_total_revenue
    expression: SUM(customer.total_revenue)
  - name: regional_avg_customer_revenue
    expression: AVG(customer.total_revenue)
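A second-order aggregation takes the output of a first-order aggregation as its input. The sketch below shows the idea in plain Python: per-customer `total_revenue` values (already computed) are rolled up to the region level. The input rows, the `region` field on each customer, and the `rollup_by_region` helper are illustrative assumptions, not Seeknal's API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical first-order output: one row per customer.
CUSTOMER_METRICS = [
    {"customer_id": "c1", "region": "east", "total_revenue": 100.0},
    {"customer_id": "c2", "region": "east", "total_revenue": 50.0},
    {"customer_id": "c3", "region": "west", "total_revenue": 30.0},
]


def rollup_by_region(rows):
    """Aggregate per-customer totals (base entity) up to regions (aggregate entity)."""
    revenue_by_region = defaultdict(list)
    for row in rows:
        revenue_by_region[row["region"]].append(row["total_revenue"])
    return {
        region: {
            "regional_total_revenue": sum(values),
            "regional_avg_customer_revenue": mean(values),
        }
        for region, values in revenue_by_region.items()
    }


regional = rollup_by_region(CUSTOMER_METRICS)
print(regional)
```

Note that the average here is taken over customers, not over transactions, which is what distinguishes it from a first-order `AVG(amount)`.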
Conditional Aggregations
name: customer_segment_metrics
kind: second_order_aggregation
base_entity: customer
aggregate_entity: segment
condition: "signup_date >= '2024-01-01'"
metrics:
  - name: new_customer_revenue
    expression: SUM(total_revenue)
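One plausible reading of `condition` is a filter applied to the base-entity rows before they are rolled up. The following sketch assumes that reading; the sample customers and the `new_customer_revenue` helper are hypothetical, and segments with no qualifying customers are simply absent from the result.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(2024, 1, 1)

# Hypothetical per-customer rows with a signup date and a precomputed total.
CUSTOMERS = [
    {"customer_id": "c1", "segment": "smb", "signup_date": "2023-11-15", "total_revenue": 200.0},
    {"customer_id": "c2", "segment": "smb", "signup_date": "2024-02-01", "total_revenue": 80.0},
    {"customer_id": "c3", "segment": "enterprise", "signup_date": "2024-03-10", "total_revenue": 500.0},
    {"customer_id": "c4", "segment": "smb", "signup_date": "2024-01-01", "total_revenue": 20.0},
]


def new_customer_revenue(rows, cutoff=CUTOFF):
    """Sum total_revenue per segment for customers who signed up on or after cutoff."""
    totals = defaultdict(float)
    for row in rows:
        # Assumed semantics: the condition filters rows before aggregation.
        if date.fromisoformat(row["signup_date"]) >= cutoff:
            totals[row["segment"]] += row["total_revenue"]
    return dict(totals)


segment_revenue = new_customer_revenue(CUSTOMERS)
print(segment_revenue)
```

The comparison is inclusive, so a customer who signed up exactly on 2024-01-01 counts as new, matching the `>=` in the condition.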
Python Aggregations
Define aggregations using Python:
from seeknal.workflow.decorators import aggregation

@aggregation(
    name="python_aggregation",
    entity="customer",
    output="customer_features"
)
def compute_features(df):
    # df is expected to behave like a pandas DataFrame with
    # customer_id and amount columns.
    return df.groupby("customer_id").agg({
        "amount": ["sum", "count", "mean"]
    })
Aggregation Configuration
Common Options
| Option | Type | Description | Default |
|---|---|---|---|
| `name` | string | Unique identifier | Required |
| `entity` | string | Entity to aggregate over | Required |
| `time_grain` | string | Time granularity (day, week, month) | Optional |
| `metrics` | list | Metric definitions | Required |
| `window` | string | Rolling window size | Optional |
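The options in the table above can be checked before a configuration is submitted. The helper below is a hypothetical pre-flight check written for this page, not part of Seeknal; it only encodes the required/optional split and the types listed in the table, and assumes the configuration has already been loaded into a Python dict.

```python
# Option names and types taken from the table; the validator itself is hypothetical.
REQUIRED_OPTIONS = {"name": str, "entity": str, "metrics": list}
OPTIONAL_OPTIONS = {"time_grain": str, "window": str}


def validate_aggregation_config(config):
    """Return a list of problems found in an aggregation config dict."""
    errors = []
    for key, expected_type in REQUIRED_OPTIONS.items():
        if key not in config:
            errors.append(f"missing required option: {key}")
        elif not isinstance(config[key], expected_type):
            errors.append(f"{key} must be of type {expected_type.__name__}")
    for key, expected_type in OPTIONAL_OPTIONS.items():
        if key in config and not isinstance(config[key], expected_type):
            errors.append(f"{key} must be of type {expected_type.__name__}")
    return errors


valid_config = {
    "name": "customer_metrics",
    "entity": "customer",
    "time_grain": "day",
    "metrics": [{"name": "total_revenue", "expression": "SUM(amount)"}],
}
print(validate_aggregation_config(valid_config))
print(validate_aggregation_config({"name": "broken", "metrics": []}))
```

An empty list means no problems were found; anything else names the offending option.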
Best Practices
- Define clear entities for aggregation
- Use time grains for temporal aggregations
- Document metric logic in descriptions
- Test with sample data first
- Consider performance for large datasets
Related Topics
- Second-Order Aggregations - Conceptual overview
- Feature Groups - Aggregations for ML features
- Transforms - Data preparation
Next: Learn about Feature Groups or return to Building Blocks