API Reference
This section provides comprehensive API documentation for all Seeknal modules. The documentation is auto-generated from Python docstrings using mkdocstrings.
Module Overview
Seeknal is organized into several key modules, each serving a specific purpose in the data and ML engineering workflow.
Core Modules
| Module | Description |
|---|---|
| Core | Core classes including Project, Entity, Flow, and Context |
| FeatureStore | Feature store management, feature groups, and materialization |
| Tasks | Task definitions for different data processing engines |
Workflow Modules
| Module | Description |
|---|---|
| Intervals | Interval tracking and cron-based scheduling for incremental processing |
| Change Detection | SQL-aware change detection for efficient incremental rebuilds |
| State Backends | Pluggable state storage for distributed execution |
| Environments | Environment management for plan/apply workflow |
| Prefect Integration | Prefect flows for distributed execution |
Module Hierarchy
seeknal/
├── project.py # Project management
├── entity.py # Entity definitions
├── flow.py # Data pipeline flows
├── context.py # Execution context
├── configuration.py # Configuration management
├── featurestore/ # Feature store module
│ ├── featurestore.py # Feature store core
│ └── feature_group.py # Feature group definitions
├── tasks/ # Task definitions
│ ├── base.py # Base task class
│ ├── sparkengine/ # Spark engine tasks
│ └── duckdb/ # DuckDB tasks
└── exceptions/ # Exception classes
Quick Navigation
Core Classes
- Project - Manage Seeknal projects and their lifecycle
- Entity - Define entities with join keys for feature stores
- Flow - Create and manage data transformation pipelines
- Context - Execution context and session management
FeatureStore Classes
- FeatureStore - Feature store management
- FeatureGroup - Define and materialize feature groups
- Materialization - Configure feature materialization
- HistoricalFeatures - Retrieve historical feature values
Task Classes
- Task - Base class for all tasks
- SparkEngineTask - Spark-based data processing
- DuckDBTask - DuckDB-based data processing
Docstring Format
All API documentation follows the Google Python Style Guide for docstrings. Each documented item includes:
- Description: What the class/function does
- Args/Parameters: Input parameters with types and descriptions
- Returns: Return value type and description
- Raises: Exceptions that may be raised
- Example: Usage examples where applicable
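As an illustration of the sections listed above, here is a minimal sketch of a Google-style docstring. The function `moving_average` is hypothetical and is not part of Seeknal; it only shows how the Args, Returns, Raises, and Example sections are laid out.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Compute the simple moving average of a sequence.

    Hypothetical helper used only to demonstrate the docstring layout.

    Args:
        values: Input numbers, in order.
        window: Number of consecutive items to average. Must be at least 1.

    Returns:
        A list of averages, one per full window. The list is empty when
        ``values`` is shorter than ``window``.

    Raises:
        ValueError: If ``window`` is less than 1.

    Example:
        >>> moving_average([1.0, 2.0, 3.0, 4.0], window=2)
        [1.5, 2.5, 3.5]
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Slide a fixed-size window over the input and average each slice.
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A docstring written this way can be rendered by mkdocstrings into separate parameter, return, and exception sections.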
Filtering
Private Members
Private members (those starting with _) are excluded from the documentation by default. Only public APIs are documented.
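The rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention, assuming a plain "starts with an underscore" test; `ExampleStore` and `documented_members` are hypothetical names, and the filter actually applied by the documentation build may differ in detail (for example, in how special methods are handled).

```python
class ExampleStore:
    """Hypothetical class used only to illustrate the filtering rule."""

    def get(self, key):
        """Public method: would appear in the documentation."""

    def put(self, key, value):
        """Public method: would appear in the documentation."""

    def _evict(self):
        """Private helper: would be excluded from the documentation."""


def documented_members(cls) -> list[str]:
    """Return names defined on ``cls`` that do not start with an underscore."""
    return sorted(name for name in vars(cls) if not name.startswith("_"))


print(documented_members(ExampleStore))  # prints ['get', 'put']
```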