Examples¶
This section provides practical code examples demonstrating common Seeknal usage patterns. Each example is designed to be self-contained and illustrates best practices for working with the Seeknal platform.
Available Examples¶
| Example | Description |
|---|---|
| Initialization | Project and entity setup, basic configuration |
| FeatureStore | Feature group creation, materialization, and retrieval |
| Flows | Data pipelines: YAML, Python decorators, and Spark flows |
| DAG Tutorial | DAG dependency tracking, manifest, and incremental builds |
| Error Handling | Exception handling and debugging patterns |
| Configuration | Config file usage and environment variables |
Quick Start¶
Seeknal supports three primary workflows. Choose the one that fits your use case.
Workflow 1: CLI Pipeline (Recommended for Data Engineering)¶
The draft → dry-run → apply CLI workflow is the primary way to build data pipelines:
```shell
# 1. Initialize a project
seeknal init --name my_project

# 2. Define sources and transforms in the seeknal/ directory (YAML or Python)
#    See the tutorials for detailed examples

# 3. Preview what will be executed
seeknal dry-run

# 4. Execute the pipeline
seeknal apply

# 5. Query results interactively
seeknal repl
```
Workflow 2: Python Decorator Pipeline¶
Define pipelines in Python using the `@source`, `@transform`, and `@materialize` decorators:
```python
# seeknal/pipelines/my_pipeline.py
# /// script
# dependencies = ["pandas", "duckdb"]
# ///
from seeknal.pipeline.decorators import source, transform, materialize


@source(name="raw_orders", source="csv", table="data/orders.csv")
def raw_orders():
    pass


@transform(name="order_summary", inputs=["source.raw_orders"])
@materialize(type="iceberg", table="atlas.analytics.order_summary")
def order_summary(ctx):
    df = ctx.ref("source.raw_orders")
    return ctx.duckdb.sql("""
        SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total
        FROM df GROUP BY customer_id
    """).df()
```
Workflow 3: Python API (Feature Store)¶
Programmatic feature group creation for ML feature engineering:
```python
from datetime import datetime

import pandas as pd

from seeknal.entity import Entity
from seeknal.featurestore.duckdbengine.feature_group import (
    FeatureGroupDuckDB,
    Materialization,
)

# Create a feature group backed by the DuckDB engine
entity = Entity(name="user", join_keys=["user_id"])
fg = FeatureGroupDuckDB(
    name="user_features",
    entity=entity,
    materialization=Materialization(event_time_col="event_date"),
)

# Load data and write features
df = pd.read_parquet("data/user_activity.parquet")
fg.set_dataframe(df).set_features()
fg.write(feature_start_time=datetime(2024, 1, 1))
```
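The DataFrame passed to `set_dataframe` must carry the entity's join key (`user_id`) and the configured event-time column (`event_date`). A minimal sketch of such a frame, with made-up feature columns:

```python
import pandas as pd

# Required columns: the entity join key ("user_id") and the event-time
# column ("event_date"); the remaining columns become features.
# Feature names here are illustrative, not a Seeknal convention.
df = pd.DataFrame({
    "user_id": [101, 102],
    "event_date": pd.to_datetime(["2024-01-05", "2024-01-06"]),
    "session_count": [3, 7],
    "total_spend": [42.0, 99.5],
})
```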
Prerequisites¶
Before running the examples, ensure you have:
- Seeknal installed: `pip install seeknal`, or from source (see the Installation Guide)
- A project initialized: `seeknal init --name my_project`
- A data engine: DuckDB (included) or Apache Spark (optional, for distributed processing)
Example Categories¶
Getting Started¶
If you're new to Seeknal, start with these:
- Initialization - Set up projects and entities
- Configuration - Understand configuration options
Data Engineering¶
For data pipeline and transformation workflows:
- Flows - YAML pipelines, Python decorators, and Spark flows
- DAG Tutorial - Dependency tracking and incremental builds
- Error Handling - Handle errors gracefully
Feature Engineering¶
For ML feature management:
- FeatureStore - Manage features for ML models (DuckDB and Spark)
Running Examples¶
Most examples can be run in a Python environment with Seeknal installed:
```shell
# Activate your virtual environment
source .venv/bin/activate

# CLI workflow
seeknal init --name example_project
seeknal apply

# Interactive REPL
seeknal repl

# Run Python scripts
python examples/my_example.py
```
Or in a Jupyter notebook for interactive exploration.
Best Practices¶
Code Patterns

- Use `seeknal init` to bootstrap new projects
- Use the draft → dry-run → apply CLI workflow for data pipelines
- Use the `@source`, `@transform`, and `@materialize` decorators for Python pipelines
- Use `FeatureGroupDuckDB` for single-node ML feature engineering
- Always use `get_or_create()` for idempotent Python API operations
Production Considerations

- Never hardcode credentials in your code
- Use environment variables or `profiles.yml` for connection configuration
- Use virtual environments (`seeknal plan dev` / `seeknal run --env dev`) for safe testing
- Test your pipelines with `seeknal dry-run` before production runs
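Reading connection settings from the environment rather than hardcoding them can be sketched as follows; the variable names are illustrative, not a Seeknal convention:

```python
import os


def load_db_config(env=os.environ):
    """Read connection settings from environment variables.

    Variable names here are hypothetical examples; adapt them to your
    deployment. Secrets must never be committed to source control.
    """
    password = env.get("SEEKNAL_DB_PASSWORD")
    if password is None:
        raise RuntimeError("SEEKNAL_DB_PASSWORD is not set")
    return {
        "host": env.get("SEEKNAL_DB_HOST", "localhost"),
        "password": password,
    }
```

Failing fast on a missing secret surfaces misconfiguration at startup instead of deep inside a pipeline run.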