Data Engineer Path¶
Duration: ~75 minutes | Format: YAML & Python | Prerequisites: SQL, CLI experience
Build production-ready ELT pipelines with Seeknal, from data ingestion to production deployment.
What You'll Learn¶
The Data Engineer path teaches you to build production-grade ELT pipelines with Seeknal. You'll learn to:
- Build ELT Pipelines - Load data from CSV/Parquet sources, transform with DuckDB, save results
- Add Incremental Models - Implement CDC, scheduling, and change data capture
- Deploy to Production - Use virtual environments, promotion workflows, and rollback procedures
Prerequisites¶
Before starting this path, ensure you have:
- Quick Start completed — Basic pipeline builder workflow
- Basic SQL knowledge (JOINs, window functions, aggregations)
- Familiarity with command line tools
- Understanding of data warehouse concepts
Chapters¶
Chapter 1: Build ELT Pipeline (~25 minutes)¶
Learn to build a complete e-commerce order processing pipeline:
You'll build: - CSV source that loads order data - DuckDB transformation with data quality checks - Clean output saved to Parquet format - Data validation and deduplication logic
Chapter 2: Add Incremental Models (~30 minutes)¶
Optimize your pipeline with incremental processing:
You'll build: - Incremental source that only processes new/changed data - CDC (Change Data Capture) patterns for detecting updates - Scheduled pipeline runs for automated processing - Monitoring and validation for incremental updates
Chapter 3: Deploy to Production Environments (~35 minutes)¶
Deploy your pipeline safely using virtual environments:
You'll learn: - Virtual environments for safe development and testing - Plan-apply-promote workflow for controlled deployments - Change categorization (breaking vs non-breaking changes) - Production promotion with approvals and rollback procedures
What You'll Build¶
By the end of this path, you'll have a complete production-ready pipeline:
| Component | Technology | Purpose |
|---|---|---|
| CSV/Parquet Source | File-based | Load order data |
| DuckDB Transform | SQL | Clean, validate, deduplicate |
| Parquet Output | Columnar storage | Store processed data |
| Incremental Loading | CDC | Only process new data |
| Scheduling | Cron | Automated pipeline runs |
| Virtual Environments | Isolation | Safe dev → prod workflow |
Key Commands You'll Learn¶
# Initialize a new project
seeknal init --name ecommerce-pipeline
# Draft a new source
seeknal draft source orders_data
# Validate and apply to project
seeknal dry-run seeknal/sources/orders_data.yml
seeknal apply seeknal/sources/orders_data.yml
# Run pipeline
seeknal run
# Plan changes for environment
seeknal plan dev
# Apply in isolated environment
seeknal env apply dev
# Promote to production
seeknal env promote dev prod
Resources¶
Reference¶
- CLI Reference — All commands and flags
- YAML Schema Reference — Pipeline YAML reference
- Troubleshooting — Debug common issues
Concepts¶
- Virtual Environments — Isolate dev and prod
- Change Categorization — Breaking vs non-breaking changes
- Pipeline Builder — Core workflow for all pipelines
Related Paths¶
- Analytics Engineer Path — Build semantic layers and metrics
- ML Engineer Path — Feature stores and ML pipelines
Getting Help¶
If you get stuck:
1. Check the Troubleshooting Guide
2. Review the CLI Reference
3. Run seeknal <command> --help for command-specific help
Last updated: February 2026 | Seeknal Documentation