Welcome to Seeknal

Build data pipelines and ML features in minutes, not days.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define transformations in YAML or Python, run them with DuckDB or Spark, and deploy features to production with a single CLI.

What Can You Build With Seeknal?

For ML Engineers	For Data Engineers	For Analytics Engineers
Feature stores with point-in-time correctness	ELT pipelines with incremental execution	Semantic layers with consistent metrics
Training datasets from raw events	Data transformations with SQL	Business metrics with change tracking
Online serving for real-time inference	Multi-engine workflows (DuckDB + Spark)	Self-serve analytics for stakeholders

Common use cases: Recommendation systems, churn prediction, customer segmentation, real-time dashboards, A/B test analysis, fraud detection.

Get Started in 10 Minutes

→ Quick Start Guide

1. Install Seeknal (pip install seeknal, or download from GitHub Releases)
2. Load your data (CSV, Parquet, or a database)
3. Transform it with SQL
4. Run your first pipeline

No infrastructure required. Works on your laptop.

How Seeknal Works: The Pipeline Builder

Seeknal's workflow is inspired by modern infrastructure tools such as Terraform and kubectl:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     init        │ →   │     draft       │ →   │     apply       │ →   │   run --env     │
│  (setup project)│     │  (write YAML)   │     │  (save changes) │     │  (execute)      │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘

Step-by-Step

1. seeknal init - Create a new project
2. seeknal draft - Generate YAML templates for sources, transforms, and feature groups
3. seeknal apply - Save your pipeline definition (like git commit)
4. seeknal run --env prod - Execute in production with safety checks

Key benefits: Dry-run validation, change detection, rollback support, multi-environment support, and multi-target materialization (PostgreSQL + Iceberg).

Choose Your Learning Path

Seeknal supports different workflows depending on your role and goals.

🆕 New to Seeknal?

Start here if you're evaluating or just getting started:

→ Quick Start Guide (10 min)

Install, load data, create features, and run your first pipeline.

🏗️ Data Engineer Path

→ Start Data Engineer Path

Goal: Build reliable ELT pipelines with incremental execution and production safety.

Start with: YAML Pipeline Tutorial (75 min)

Then learn:
- Environment Management - Safe development with isolated environments
- Incremental Models - Efficient incremental processing
- Change Categorization - Understand breaking vs. non-breaking changes

Typical use case: "I need to transform raw data into analytics-ready tables, incrementally, with production safety."

📊 Analytics Engineer Path

→ Start Analytics Engineer Path

Goal: Define metrics and build a semantic layer for self-serve analytics.

Start with: YAML Pipeline Tutorial (75 min)

Then learn:
- Semantic Layer & Metrics - Define and query consistent metrics
- Change Categorization - Track metric changes over time
- Testing & Audits - Validate data quality

Typical use case: "I need consistent metrics across dashboards and tools, with change tracking."

🤖 ML Engineer Path

→ Start ML Engineer Path

Goal: Build feature stores with point-in-time joins for ML models.

Start with: Getting Started (30 min)

Then learn:
- Python Pipelines - Feature engineering with Python
- Training to Serving - End-to-end ML workflow
- Parallel Execution - Speed up large pipelines

Typical use case: "I need features for training that prevent data leakage, with online serving."
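To make the data-leakage point concrete, here is a minimal sketch of what a point-in-time lookup does, written in plain Python rather than with Seeknal's API. The feature history, the dates, and the feature_as_of helper are all hypothetical and only illustrate the idea: each training row may only see the feature value that was already known at the time of the event.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature history for one entity, sorted by effective date:
# (effective_date, feature_value). Not produced by Seeknal.
feature_history = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 10), 25),
    (date(2024, 1, 20), 40),
]


def feature_as_of(history, event_date):
    """Return the latest feature value effective on or before event_date.

    Returns None when no value existed yet, so later values never
    leak into earlier training rows.
    """
    dates = [effective for effective, _ in history]
    idx = bisect_right(dates, event_date)  # number of entries <= event_date
    if idx == 0:
        return None
    return history[idx - 1][1]


# Hypothetical label events that need a feature value attached.
label_events = [
    date(2023, 12, 31),
    date(2024, 1, 10),
    date(2024, 1, 15),
    date(2024, 2, 1),
]

training_rows = [(d, feature_as_of(feature_history, d)) for d in label_events]
for event_date, value in training_rows:
    print(event_date, value)
```

A naive join on "latest value" would attach 40 to every row, including events that happened before that value existed. The as-of lookup avoids that by construction.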

Concepts

Learn the mental model behind Seeknal.

Glossary — Definitions of all key terms
Point-in-Time Joins — Prevent data leakage in ML features
Second-Order Aggregations — Hierarchical rollups and multi-level analytics
Virtual Environments — Isolated workspaces for safe development
Change Categorization — BREAKING, NON_BREAKING, and METADATA changes
Python API vs YAML Workflows — Choose the right paradigm
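As a rough illustration of the second-order aggregation idea listed above, the sketch below aggregates raw events per user first, then aggregates those per-user results per region. It is plain Python, not Seeknal's API; the event data and all variable names are hypothetical, and it assumes "second-order" means an aggregation over already-aggregated values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw events: (region, user_id, amount).
events = [
    ("north", "u1", 10.0),
    ("north", "u1", 30.0),
    ("north", "u2", 20.0),
    ("south", "u3", 5.0),
    ("south", "u3", 15.0),
]

# First-order aggregation: total spend per (region, user).
user_totals = defaultdict(float)
for region, user, amount in events:
    user_totals[(region, user)] += amount

# Second-order aggregation: average of the per-user totals within each region.
totals_by_region = defaultdict(list)
for (region, _user), total in user_totals.items():
    totals_by_region[region].append(total)

region_avg_user_spend = {
    region: mean(totals) for region, totals in totals_by_region.items()
}
print(region_avg_user_spend)
```

Note that the regional figure is the mean of user totals, not the mean of raw event amounts; the two differ whenever users have different numbers of events.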

Guides

Task-oriented walkthroughs for specific workflows.

Python Pipelines — Write Python feature transforms and custom logic
Testing & Audits — Data quality validation with seeknal audit
Semantic Layer & Metrics — Define and query metrics with seeknal query
Training to Serving — End-to-end ML feature workflow
Seeknal vs dbt vs SQLMesh vs Feast — Feature comparison

Reference

Lookup documentation for commands, schemas, and configuration.

CLI Commands — All 35+ commands with flags and examples
YAML Schema — Every field for all node kinds
Configuration — Project files, profiles, and environment variables
Python API — Module reference
CLI Docs Search — Search documentation from the terminal (seeknal docs)

Tutorials

Step-by-step learning paths with copy-pasteable code.

YAML Pipeline Tutorial — Build a complete pipeline from scratch (75 min)
Mixed YAML + Python — Combine both paradigms (60 min)
Environment Management — Safe development with environments (45 min)
Parallel Execution — Speed up large pipelines (45 min)
Change Categorization — Understand change impact (20 min)
E-Commerce Walkthrough — Real-world example

Additional Resources

DuckDB Getting Started — DuckDB engine quickstart
DuckDB Flow Guide — DuckDB flow patterns
Spark Transformers Reference — Spark-specific reference
Iceberg Materialization — Apache Iceberg integration
DAGRunner Documentation — Workflow runner internals