Troubleshooting Guide¶
Comprehensive troubleshooting for common Seeknal issues, errors, and problems.
Navigation: - Installation Issues - Configuration Issues - CLI Issues - Feature Store Issues - Performance Issues - DuckDB-Specific Issues - Spark-Specific Issues - Virtual Environment Issues - Getting Help
Quick Diagnosis¶
Before diving into specific issues, try these quick diagnostic steps:
# 1. Check Seeknal version
seeknal info
# 2. Validate configuration
seeknal validate
# 3. Test database connection
python -c "from seeknal.request import get_db_session; print('DB OK')"
# 4. Check Python version (must be 3.11+)
python --version
Installation Issues¶
"ModuleNotFoundError: No module named 'seeknal'"¶
Symptoms:
Causes: 1. Seeknal not installed in active Python environment 2. Virtual environment not activated 3. Wrong Python interpreter
Solutions:
# Check which Python you're using
which python
python --version
# Ensure virtual environment is activated
source .venv/bin/activate # Linux/macOS
.\.venv\Scripts\activate # Windows
# Verify Seeknal is installed
pip show seeknal
# If not installed, install it
pip install seeknal
"ImportError: cannot import name 'DuckDBTask'"¶
Symptoms:
Causes: 1. Outdated Seeknal version 2. Corrupted installation
Solutions:
# Check Seeknal version
pip show seeknal
# Reinstall
pip uninstall seeknal
pip install seeknal
# Verify installation
python -c "from seeknal.tasks.duckdb import DuckDBTask; print('OK')"
"SSLError: SSL: CERTIFICATE_VERIFY_FAILED"¶
Symptoms:
Causes: 1. SSL certificate issues on macOS 2. Corporate proxy blocking downloads
Solutions:
# macOS: Install certificates
/Applications/Python\ 3.11/Install\ Certificates.command
# Or use pip with trusted host
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org seeknal
# Behind proxy: Set proxy variables
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
Configuration Issues¶
"Config file not found"¶
Symptoms:
Causes: 1. Configuration directory doesn't exist 2. Config file missing
Solutions:
# Create config directory
mkdir -p ~/.seeknal
# Create minimal config file
cat > ~/.seeknal/config.toml << EOF
[default]
profile = "default"
EOF
# Verify config
seeknal validate
"profiles.yml not found"¶
Symptoms:
Causes: 1. Project not initialized 2. profiles.yml deleted
Solutions:
# Re-initialize project
seeknal init --name my_project
# Or create profiles.yml manually
cat > profiles.yml << EOF
default:
target: dev
outputs:
dev:
type: duckdb
path: target/dev.duckdb
EOF
"Database connection failed"¶
Symptoms:
Causes: 1. Database file corrupted 2. Insufficient permissions 3. Disk full
Solutions:
# Check database file exists and is readable
ls -la ~/.seeknal/metadata.db
# Check disk space
df -h
# Recreate database if corrupted
rm ~/.seeknal/metadata.db
seeknal init --name my_project
CLI Issues¶
"No project found"¶
Symptoms:
Causes: 1. Not in a project directory 2. seeknal_project.yml missing
Solutions:
# Check if project file exists
ls -la seeknal_project.yml
# Initialize project
seeknal init --name my_project
# Or navigate to existing project
cd /path/to/project
seeknal run
"Command not found: seeknal"¶
Symptoms:
Causes: 1. Seeknal installed but not in PATH 2. Virtual environment not activated
Solutions:
# Activate virtual environment
source .venv/bin/activate
# Check if seeknal is available
which seeknal
# If not, reinstall with pip
pip install seeknal
# Or use python -m seeknal
python -m seeknal.cli --help
"YAML parse error"¶
Symptoms:
Causes: 1. Invalid YAML syntax 2. Indentation errors 3. Special characters not quoted
Solutions:
# Validate YAML before applying
seeknal dry-run seeknal/sources/my_source.yml
# Check YAML syntax
python -c "import yaml; yaml.safe_load(open('seeknal/sources/my_source.yml'))"
# Common fixes:
# - Use spaces, not tabs for indentation
# - Quote strings with special characters
# - Ensure proper list formatting with dashes
Feature Store Issues¶
"Feature group not found"¶
Symptoms:
Causes: 1. Feature group not created 2. Wrong project context 3. Typo in name
Solutions:
# List available feature groups
seeknal list feature-groups
# Show specific feature group
seeknal show feature-group user_features
# Check if in correct project
seeknal info
# Create feature group if needed
# (Use Python API or YAML definition)
"Entity mismatch"¶
Symptoms:
Causes: 1. Wrong entity used for query 2. Entity keys don't match
Solutions:
# Check entity definition
from seeknal.entity import Entity
entity = Entity(name="user", join_keys=["user_id"])
# Verify feature group entity matches
fg = FeatureGroup.load(name="user_features")
print(fg.entity.name) # Should be "user"
print(fg.entity.join_keys) # Should be ["user_id"]
"Point-in-time join failed"¶
Symptoms:
Causes: 1. Duplicate entity keys in source data 2. Event time column not properly configured
Solutions:
# Check for duplicates
df[df.duplicated(subset=['user_id', 'event_time'], keep=False)]
# Configure event time column
from seeknal.featurestore.duckdbengine.feature_group import Materialization
materialization = Materialization(
event_time_col="event_time",
create_date_columns=True,
)
# Remove duplicates before writing
df = df.drop_duplicates(subset=['user_id', 'event_time'])
"Online table not found"¶
Symptoms:
Causes: 1. Entity not consolidated 2. Features not written to feature group
Solutions:
# Run the pipeline to materialize features
seeknal run --nodes feature_group.user_features
# Consolidate entities
seeknal consolidate
# Verify entity exists
seeknal entity show user
In transforms, use ctx.features():
@transform(name="predictions")
def predictions(ctx):
# Get features from consolidated entity store
features = ctx.features("user", [
"user_features.total_spend",
"user_features.order_count",
])
return features
Performance Issues¶
"Transformation is slow"¶
Symptoms: - DuckDB task taking >10 seconds for small datasets - Memory usage spiking
Diagnosis:
import time
start = time.time()
result = task.transform()
print(f"Duration: {time.time() - start:.2f}s")
Solutions:
-
Use Parquet instead of CSV:
-
Add filters early in SQL:
-
Use appropriate engine:
"Out of memory"¶
Symptoms:
Solutions:
-
Process data in chunks:
-
Use DuckDB instead of Pandas for large data:
-
Reduce data types:
"Virtual environment apply is slow"¶
Symptoms:
Solutions:
# Use parallel execution
seeknal env apply dev --parallel --max-workers 8
# Limit nodes to run
seeknal env apply dev --nodes transform.clean_data
# Show plan first to see what will run
seeknal env plan dev
DuckDB-Specific Issues¶
"Catalog Error"¶
Symptoms:
Causes: 1. DuckDB not properly initialized 2. Table not registered
Solutions:
import duckdb
# Create connection
con = duckdb.connect(database=':memory:')
# Register table explicitly
con.register('my_table', df)
# Or use SQL to create table
con.execute("CREATE TABLE my_table AS SELECT * FROM df")
"Type conversion error"¶
Symptoms:
Solutions:
# Convert types before DuckDB operation
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Explicitly specify schema in DuckDB
con.execute("""
CREATE TABLE my_table (
user_id VARCHAR,
amount DOUBLE,
timestamp TIMESTAMP
)
""")
Spark-Specific Issues¶
"Java gateway process exited"¶
Symptoms:
Py4JJavaError: An error occurred while calling o0.pyWriteDynamic.
: java.lang.OutOfMemoryError: Java heap space
Causes: 1. Insufficient memory for Spark 2. JVM not properly configured
Solutions:
import os
from pyspark.sql import SparkSession
# Increase memory allocation
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 4g --executor-memory 4g pyspark-shell'
spark = SparkSession.builder \
.config("spark.driver.memory", "4g") \
.config("spark.executor.memory", "4g") \
.getOrCreate()
"Delta Lake not found"¶
Symptoms:
Solutions:
# Install delta-spark
pip install delta-spark
# Configure Spark to use Delta Lake
from delta import configure_spark_with_delta_pip
spark = configure_spark_with_delta_pip(
SparkSession.builder
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
).getOrCreate()
Virtual Environment Issues¶
"Plan is stale"¶
Symptoms:
Causes: 1. YAML files changed after plan was created 2. Environment state out of sync
Solutions:
# Re-create plan
seeknal env plan dev
# Force apply even if stale
seeknal env apply dev --force
# Or use latest plan with --no-stale-check
seeknal env apply dev
"Environment not found"¶
Symptoms:
Solutions:
# List available environments
seeknal env list
# Create environment first
seeknal env plan dev
# Then apply
seeknal env apply dev
"Dependency cycle detected"¶
Symptoms:
Causes: 1. Circular dependency in YAML definitions 2. Incorrect deps configuration
Solutions:
# Incorrect: Circular dependency
# source_a.yml
deps:
- transform.b
# transform_b.yml
deps:
- source.source_a
# Correct: Remove cycle
# source_a.yml
deps: [] # No dependencies
# transform_b.yml
deps:
- source.source_a # Depends on source
Validation and Debugging¶
Enable Debug Logging¶
import logging
logging.basicConfig(level=logging.DEBUG)
# Or for specific module
logging.getLogger('seeknal').setLevel(logging.DEBUG)
Use Dry Run Mode¶
# Validate without executing
seeknal dry-run seeknal/transforms/my_transform.yml
# Show execution plan without running
seeknal run --show-plan
Inspect Intermediate Results¶
# Check intermediate data
from pathlib import Path
intermediate_dir = Path("target/intermediate")
for file in intermediate_dir.glob("*.parquet"):
print(f"{file.name}: {pd.read_parquet(file).shape}")
Getting Help¶
Diagnostic Information Collection¶
Before seeking help, collect this information:
# System information
seeknal info
python --version
pip list | grep seeknal
# Configuration
cat seeknal_project.yml
cat profiles.yml
# Error reproduction
# 1. Copy the exact error message
# 2. Note the command that failed
# 3. Include relevant YAML/Python code
Where to Get Help¶
- Documentation: Check Getting Started Guide
- CLI Reference: Review CLI Command Reference
- GitHub Issues: Search github.com/mta-tech/seeknal/issues
- Discussions: Ask in GitHub Discussions
When Creating an Issue¶
Include:
- Seeknal version:
seeknal info - Python version:
python --version - Operating system:
uname -a(Linux/macOS) or systeminfo (Windows) - Error message: Full traceback
- Steps to reproduce: Minimal code/example
- Expected behavior: What you expected to happen
- Actual behavior: What actually happened
Common Debugging Commands¶
# Validate entire project
seeknal validate
# Check database state
python -c "from seeknal.request import get_db_session; from sqlmodel import select; from seeknal.models import FeatureGroupTable; print([fg.name for fg in get_db_session().exec(select(FeatureGroupTable)).all()])"
# Test feature group loading
python -c "from seeknal.featurestore.feature_group import FeatureGroup; fg = FeatureGroup.load(name='my_features'); print(f'Loaded: {fg.name}')"
# Inspect DAG
seeknal parse --format json > manifest.json
cat manifest.json | jq '.nodes | length'
Last updated: February 2026 | Seeknal Documentation