Welcome to Seeknal
Build data pipelines and ML features in minutes, not days.
Seeknal is an all-in-one platform for data and AI/ML engineering. Define transformations in YAML or Python, run them with DuckDB or Spark, and deploy features to production with a single CLI.
What Can You Build With Seeknal?
| For ML Engineers | For Data Engineers | For Analytics Engineers |
|---|---|---|
| Feature stores with point-in-time correctness | ELT pipelines with incremental execution | Semantic layers with consistent metrics |
| Training datasets from raw events | Data transformations with SQL | Business metrics with change tracking |
| Online serving for real-time inference | Multi-engine workflows (DuckDB + Spark) | Self-serve analytics for stakeholders |
Common use cases: Recommendation systems, churn prediction, customer segmentation, real-time dashboards, A/B test analysis, fraud detection.
Get Started in 10 Minutes
- Install Seeknal (`pip install seeknal` or from GitHub Releases)
- Load your data (CSV, Parquet, database)
- Transform with SQL
- Run your first pipeline
No infrastructure required. Works on your laptop.
How Seeknal Works: The Pipeline Builder
Seeknal's workflow is inspired by infrastructure tools such as Terraform and kubectl:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ init │ → │ draft │ → │ apply │ → │ run --env │
│ (setup project)│ │ (write YAML) │ │ (save changes) │ │ (execute) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
Step-by-Step
- `seeknal init` - Create a new project
- `seeknal draft` - Generate YAML templates for sources, transforms, feature groups
- `seeknal apply` - Save your pipeline definition (like `git commit`)
- `seeknal run --env prod` - Execute in production with safety checks
Key benefits: Dry-run validation, change detection, rollback support, multi-environment support, and multi-target materialization (PostgreSQL + Iceberg).
Choose Your Learning Path
Seeknal supports different workflows depending on your role and goals.
🆕 New to Seeknal?
Start here if you're evaluating or just getting started:
Install, load data, create features, and run your first pipeline.
🏗️ Data Engineer Path
Goal: Build reliable ELT pipelines with incremental execution and production safety.
Start with: YAML Pipeline Tutorial (75 min)
Then learn:
- Environment Management - Safe development with isolated environments
- Incremental Models - Efficient incremental processing
- Change Categorization - Understand breaking vs. non-breaking changes
Typical use case: "I need to transform raw data into analytics-ready tables, incrementally, with production safety."
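To make "incremental execution" concrete, here is a minimal sketch of the general watermark technique in plain Python. It is not Seeknal's implementation; the names `run_incremental`, `state`, and `event_time` are hypothetical and chosen only for illustration. Each run processes rows newer than the stored watermark and then advances it.

```python
from datetime import datetime


def run_incremental(rows, state, transform):
    """Transform only rows newer than the stored watermark, then advance it.

    `state` is a plain dict standing in for whatever persistent store a real
    pipeline would use (an assumption made for this sketch).
    """
    watermark = state.get("watermark")
    new_rows = [
        r for r in rows
        if watermark is None or r["event_time"] > watermark
    ]
    results = [transform(r) for r in new_rows]
    if new_rows:
        state["watermark"] = max(r["event_time"] for r in new_rows)
    return results


def add_cents(row):
    # Example transformation: derive an integer amount in cents.
    return {**row, "amount_cents": round(row["amount"] * 100)}


rows = [
    {"order_id": 1, "amount": 10.0, "event_time": datetime(2024, 1, 1)},
    {"order_id": 2, "amount": 25.0, "event_time": datetime(2024, 1, 2)},
]
state = {}

first_run = run_incremental(rows, state, add_cents)   # processes both rows

rows.append({"order_id": 3, "amount": 5.0, "event_time": datetime(2024, 1, 3)})
second_run = run_incremental(rows, state, add_cents)  # processes only order 3
```

The second run skips the two rows already handled, which is the property that keeps repeated runs cheap as the source table grows.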
📊 Analytics Engineer Path
→ Start Analytics Engineer Path
Goal: Define metrics and build a semantic layer for self-serve analytics.
Start with: YAML Pipeline Tutorial (75 min)
Then learn:
- Semantic Layer & Metrics - Define and query consistent metrics
- Change Categorization - Track metric changes over time
- Testing & Audits - Validate data quality
Typical use case: "I need consistent metrics across dashboards and tools, with change tracking."
🤖 ML Engineer Path
Goal: Build feature stores with point-in-time joins for ML models.
Start with: Getting Started (30 min)
Then learn:
- Python Pipelines - Feature engineering with Python
- Training to Serving - End-to-end ML workflow
- Parallel Execution - Speed up large pipelines
Typical use case: "I need features for training that prevent data leakage, with online serving."
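As an illustration of what a point-in-time join does, the sketch below uses only the Python standard library rather than Seeknal's API. The function name `point_in_time_join`, the feature `purchases_7d`, and the integer timestamps are all assumptions made for the example. For each training label, it attaches the most recent feature value observed at or before the label's timestamp, so later values cannot leak into training data.

```python
from bisect import bisect_right


def point_in_time_join(labels, features):
    """Attach to each label the latest feature value known at label time."""
    # Group feature rows per user, ordered by timestamp.
    by_user = {}
    for f in sorted(features, key=lambda f: f["ts"]):
        by_user.setdefault(f["user_id"], []).append(f)

    joined = []
    for label in labels:
        history = by_user.get(label["user_id"], [])
        times = [f["ts"] for f in history]
        # Count of feature rows with ts <= label ts.
        idx = bisect_right(times, label["ts"])
        value = history[idx - 1]["purchases_7d"] if idx > 0 else None
        joined.append({**label, "purchases_7d": value})
    return joined


# Timestamps are plain integers to keep the example short.
features = [
    {"user_id": "u1", "ts": 1, "purchases_7d": 0},
    {"user_id": "u1", "ts": 5, "purchases_7d": 3},
    {"user_id": "u1", "ts": 9, "purchases_7d": 7},
]
labels = [
    {"user_id": "u1", "ts": 6, "churned": 0},
    {"user_id": "u2", "ts": 6, "churned": 1},
]

training_rows = point_in_time_join(labels, features)
```

For user `u1` the label at time 6 picks up the value recorded at time 5, not the later value from time 9. User `u2` has no feature history, so the feature is left as `None`.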
Concepts
Learn the mental model behind Seeknal.
- Glossary — Definitions of all key terms
- Point-in-Time Joins — Prevent data leakage in ML features
- Second-Order Aggregations — Hierarchical rollups and multi-level analytics
- Virtual Environments — Isolated workspaces for safe development
- Change Categorization — BREAKING, NON_BREAKING, and METADATA changes
- Python API vs YAML Workflows — Choose the right paradigm
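The change categories listed above can be pictured with a small classifier. This is a sketch under stated assumptions, not Seeknal's actual rules: the function `categorize_change`, the node dictionary layout, and the rule set (removed or retyped columns are breaking, added columns are non-breaking, description-only edits are metadata) are hypothetical.

```python
def categorize_change(old, new):
    """Classify the difference between two node definitions.

    Assumed rules (illustrative only):
      - a removed or retyped column  -> "BREAKING"
      - only added columns           -> "NON_BREAKING"
      - only a description change    -> "METADATA"
      - no difference                -> None
    """
    old_cols, new_cols = old["columns"], new["columns"]
    removed = old_cols.keys() - new_cols.keys()
    retyped = {
        name for name in old_cols.keys() & new_cols.keys()
        if old_cols[name] != new_cols[name]
    }
    if removed or retyped:
        return "BREAKING"
    if new_cols.keys() - old_cols.keys():
        return "NON_BREAKING"
    if old.get("description") != new.get("description"):
        return "METADATA"
    return None


base = {
    "description": "Daily orders",
    "columns": {"order_id": "int", "amount": "double"},
}
with_new_column = {
    **base,
    "columns": {**base["columns"], "currency": "varchar"},
}
with_retyped_column = {
    **base,
    "columns": {"order_id": "int", "amount": "varchar"},
}
with_new_description = {**base, "description": "Daily orders, deduplicated"}

results = [
    categorize_change(base, with_new_column),
    categorize_change(base, with_retyped_column),
    categorize_change(base, with_new_description),
    categorize_change(base, base),
]
```

The ordering of the checks matters: a change that both adds a column and retypes another is reported as breaking, since downstream consumers would still be affected.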
Guides
Task-oriented walkthroughs for specific workflows.
- Python Pipelines — Write Python feature transforms and custom logic
- Testing & Audits — Data quality validation with `seeknal audit`
- Semantic Layer & Metrics — Define and query metrics with `seeknal query`
- Training to Serving — End-to-end ML feature workflow
- Seeknal vs dbt vs SQLMesh vs Feast — Feature comparison
Reference
Lookup documentation for commands, schemas, and configuration.
- CLI Commands — All 35+ commands with flags and examples
- YAML Schema — Every field for all node kinds
- Configuration — Project files, profiles, and environment variables
- Python API — Module reference
- CLI Docs Search — Search documentation from the terminal (`seeknal docs`)
Tutorials
Step-by-step learning paths with copy-pasteable code.
- YAML Pipeline Tutorial — Build a complete pipeline from scratch (75 min)
- Mixed YAML + Python — Combine both paradigms (60 min)
- Environment Management — Safe development with environments (45 min)
- Parallel Execution — Speed up large pipelines (45 min)
- Change Categorization — Understand change impact (20 min)
- E-Commerce Walkthrough — Real-world example
Additional Resources
- DuckDB Getting Started — DuckDB engine quickstart
- DuckDB Flow Guide — DuckDB flow patterns
- Spark Transformers Reference — Spark-specific reference
- Iceberg Materialization — Apache Iceberg integration
- DAGRunner Documentation — Workflow runner internals