Configuration¶
This guide demonstrates how to configure Seeknal using configuration files and environment variables. Proper configuration is essential for connecting to your data infrastructure and customizing Seeknal's behavior.
Overview¶
Seeknal uses a hierarchical configuration system that supports:
- TOML configuration files - For persistent settings
- Environment variables - For sensitive values and deployment flexibility
- Variable interpolation - Reference other config values or environment variables
- Programmatic access - Dot notation for easy access to nested settings
Configuration Files¶
Seeknal uses TOML format for configuration files. By default, it looks for:
- User config:
~/config.toml(customizable via__USER_CONFIG_PATHenv var) - Backend config:
~/backend.toml(customizable via__BACKEND_CONFIG_PATHenv var)
Basic Configuration File¶
Create a configuration file at ~/config.toml:
# Seeknal Configuration File
[logging]
level = "INFO"
format = "%(asctime)s - %(levelname)s - %(message)s"
datefmt = "%Y-%m-%d %H:%M:%S"
[tasks]
[tasks.defaults]
max_retries = 3
retry_delay = 60 # seconds
timeout = 3600 # 1 hour in seconds
[storage]
base_path = "/data/seeknal"
offline_store = "${storage.base_path}/offline"
online_store = "${storage.base_path}/online"
[database]
host = "localhost"
port = 5432
name = "seeknal_db"
Loading Configuration¶
from seeknal.configuration import load_configuration, config
# The default configuration is loaded automatically
# Access it via the global `config` object
print(f"Log level: {config.logging.level}")
# Or load a custom configuration
custom_config = load_configuration(
user_config_path="/path/to/custom/config.toml",
backend_config_path="/path/to/backend.toml",
env_var_prefix="SEEKNAL"
)
Environment Variables¶
Environment variables provide a flexible way to configure Seeknal, especially for sensitive values like credentials.
Setting Configuration via Environment Variables¶
Environment variables follow the pattern: PREFIX__SECTION__KEY
# Set configuration via environment variables
export SEEKNAL__LOGGING__LEVEL="DEBUG"
export SEEKNAL__DATABASE__HOST="production-db.example.com"
export SEEKNAL__DATABASE__PORT="5432"
export SEEKNAL__STORAGE__BASE_PATH="/mnt/data/seeknal"
Loading Configuration with Environment Variable Prefix¶
from seeknal.configuration import load_configuration
# Load configuration with environment variable override support
config = load_configuration(
user_config_path="~/.config.toml",
env_var_prefix="SEEKNAL"
)
# Environment variables override file settings
print(f"Database host: {config.database.host}")
# Output: production-db.example.com (from environment)
Type Conversion¶
Environment variable values are automatically converted to appropriate Python types:
# Boolean values (case-insensitive)
export SEEKNAL__DEBUG="true"
export SEEKNAL__VERBOSE="FALSE"
# Numeric values
export SEEKNAL__MAX_WORKERS="4"
export SEEKNAL__TIMEOUT="3.5"
# Lists and dicts (Python literal syntax)
export SEEKNAL__ALLOWED_HOSTS='["localhost", "127.0.0.1"]'
from seeknal.configuration import string_to_type
# Type conversion examples
print(string_to_type("true")) # True (bool)
print(string_to_type("FALSE")) # False (bool)
print(string_to_type("42")) # 42 (int)
print(string_to_type("3.14")) # 3.14 (float)
print(string_to_type('["a", "b"]')) # ["a", "b"] (list)
print(string_to_type("hello")) # "hello" (str)
Variable Interpolation¶
Seeknal supports variable interpolation using the ${section.key} syntax.
Referencing Other Config Values¶
# config.toml
[storage]
base_path = "/data/seeknal"
# Reference other config values with ${section.key}
offline_store = "${storage.base_path}/offline"
online_store = "${storage.base_path}/online"
archive_store = "${storage.base_path}/archive"
[project]
name = "my_project"
feature_path = "${storage.offline_store}/${project.name}/features"
Referencing Environment Variables¶
# config.toml
[database]
# Reference environment variables with standard shell syntax
host = "$DB_HOST"
port = "$DB_PORT"
username = "$DB_USER"
password = "$DB_PASSWORD"
[storage]
# Supports home directory expansion
base_path = "~/seeknal_data"
import os
from seeknal.configuration import load_configuration
# Set environment variables
os.environ["DB_HOST"] = "db.example.com"
os.environ["DB_PORT"] = "5432"
os.environ["DB_USER"] = "seeknal_user"
os.environ["DB_PASSWORD"] = "secret"
# Load configuration - environment variables are interpolated
config = load_configuration(user_config_path="config.toml")
print(f"Database: {config.database.username}@{config.database.host}")
# Output: seeknal_user@db.example.com
Nested Interpolation¶
from seeknal.configuration import interpolate_env_vars
# Supports nested environment variable expansion
os.environ["BASE"] = "$HOME/projects"
os.environ["PROJECT"] = "seeknal"
result = interpolate_env_vars("$BASE/$PROJECT/data")
print(result) # /home/user/projects/seeknal/data
DotDict Access¶
Configuration values can be accessed using dot notation for cleaner code.
Accessing Configuration Values¶
from seeknal.configuration import config
# Dot notation access
log_level = config.logging.level
db_host = config.database.host
# Dictionary-style access also works
log_level = config["logging"]["level"]
db_host = config.get("database", {}).get("host", "localhost")
# Mixed access patterns
storage_config = config.storage
offline_path = storage_config["offline_store"]
Working with DotDict¶
from seeknal.configuration import DotDict
# Create a DotDict from a regular dictionary
settings = DotDict({
"database": {
"host": "localhost",
"port": 5432
},
"logging": {
"level": "INFO"
}
})
# Access with dot notation
print(settings.database.host) # localhost
print(settings.logging.level) # INFO
# Modify values
settings.database.host = "production.example.com"
settings.database.ssl = True # Add new keys
# Convert back to regular dict
regular_dict = settings.to_dict()
Creating Nested Configurations¶
from seeknal.configuration import DotDict, as_nested_dict
# Convert nested dictionaries to DotDict for dot access
raw_config = {
"spark": {
"master": "local[*]",
"app_name": "seeknal",
"config": {
"spark.sql.shuffle.partitions": "200"
}
}
}
# Convert to nested DotDict
config = as_nested_dict(raw_config, dct_class=DotDict)
# Now accessible with dots
print(config.spark.master) # local[*]
print(config.spark.config["spark.sql.shuffle.partitions"]) # 200
Complete Configuration Example¶
Here's a comprehensive configuration setup for a production environment:
Configuration File (config.toml)¶
# Seeknal Production Configuration
[logging]
level = "INFO"
format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
datefmt = "%Y-%m-%d %H:%M:%S"
log_attributes = "[]"
[tasks]
[tasks.defaults]
max_retries = 3
retry_delay = 60
timeout = 7200
[storage]
base_path = "$SEEKNAL_DATA_PATH"
offline_store = "${storage.base_path}/offline"
online_store = "${storage.base_path}/online"
[database]
host = "$SEEKNAL_DB_HOST"
port = "$SEEKNAL_DB_PORT"
name = "$SEEKNAL_DB_NAME"
username = "$SEEKNAL_DB_USER"
password = "$SEEKNAL_DB_PASSWORD"
[spark]
master = "$SPARK_MASTER"
app_name = "seeknal-production"
executor_memory = "4g"
driver_memory = "2g"
Application Setup¶
import os
from seeknal.configuration import load_configuration, Config
def setup_environment():
"""Set up environment variables for Seeknal."""
# These would typically come from a secrets manager or deployment config
os.environ["SEEKNAL_DATA_PATH"] = "/mnt/data/seeknal"
os.environ["SEEKNAL_DB_HOST"] = "db.production.example.com"
os.environ["SEEKNAL_DB_PORT"] = "5432"
os.environ["SEEKNAL_DB_NAME"] = "seeknal_prod"
os.environ["SEEKNAL_DB_USER"] = "seeknal_app"
os.environ["SEEKNAL_DB_PASSWORD"] = os.getenv("DB_SECRET", "")
os.environ["SPARK_MASTER"] = "spark://spark-master:7077"
def load_app_config() -> Config:
"""Load the application configuration."""
config = load_configuration(
user_config_path="config.toml",
env_var_prefix="SEEKNAL"
)
return config
def main():
# Setup environment
setup_environment()
# Load configuration
config = load_app_config()
# Use configuration
print(f"Storage path: {config.storage.base_path}")
print(f"Database: {config.database.host}:{config.database.port}")
print(f"Spark master: {config.spark.master}")
print(f"Task timeout: {config.tasks.defaults.timeout} seconds")
if __name__ == "__main__":
main()
Converting Configuration to Environment Variables¶
Export configuration values as environment variables for subprocess calls:
from seeknal.configuration import config, to_environment_variables
# Convert entire configuration to environment variables
env_vars = to_environment_variables(config, prefix="SEEKNAL")
print(env_vars)
# Output: {'SEEKNAL__LOGGING__LEVEL': 'INFO', 'SEEKNAL__DATABASE__HOST': '...', ...}
# Convert only specific keys
env_vars = to_environment_variables(
config,
include=["logging.level", "database.host"],
prefix="APP"
)
print(env_vars)
# Output: {'APP__LOGGING__LEVEL': 'INFO', 'APP__DATABASE__HOST': 'localhost'}
Best Practices¶
Configuration Best Practices
- Never hardcode credentials - Use environment variables for sensitive data
- Use interpolation - Avoid repeating paths and values
- Environment-specific files - Maintain separate configs for dev/staging/prod
- Validate early - Load and validate configuration at application startup
Security Considerations
- Store sensitive values in environment variables or a secrets manager
- Never commit configuration files with credentials to version control
- Use file permissions to protect configuration files
- Rotate credentials regularly
Development vs Production¶
import os
from seeknal.configuration import load_configuration
def get_config():
"""Load environment-appropriate configuration."""
env = os.getenv("SEEKNAL_ENV", "development")
if env == "production":
config_path = "/etc/seeknal/config.toml"
elif env == "staging":
config_path = "~/.seeknal/staging.toml"
else:
config_path = "~/.seeknal/dev.toml"
return load_configuration(
user_config_path=config_path,
env_var_prefix="SEEKNAL"
)
Configuration Validation¶
from seeknal.configuration import load_configuration, validate_config
def load_and_validate():
"""Load configuration with validation."""
config = load_configuration(user_config_path="config.toml")
# Built-in validation is called automatically
# Add custom validation as needed
required_keys = ["database.host", "storage.base_path"]
for key in required_keys:
parts = key.split(".")
current = config
for part in parts:
if not hasattr(current, part) or getattr(current, part) is None:
raise ValueError(f"Required configuration missing: {key}")
current = getattr(current, part)
return config
Next Steps¶
After configuring Seeknal, you can:
- Initialize Projects - Set up your feature store (Initialization)
- Create Pipelines - Build data pipelines (Flows & Pipelines)
- Feature Store - Manage ML features (FeatureStore)
- Handle Errors - Implement robust error handling (Error Handling)