Core API Reference¶
This page documents the core Seeknal classes that form the foundation of the library. These classes handle project management, entity definitions, data flows, and execution context.
Overview¶
The core module provides the essential building blocks for working with Seeknal:
| Class | Purpose |
|---|---|
| Project | Manage Seeknal projects and their lifecycle |
| Entity | Define entities with join keys for feature stores |
| Flow | Create and manage data transformation pipelines |
| Context | Execution context and session management |
Project¶
The Project class is the top-level container for organizing Seeknal resources. Projects provide namespace isolation and resource management.
Classes¶
Project(name: str, description: str = '')
dataclass
¶
A class used to define and manage projects in Seeknal.
Projects serve as the top-level organizational unit for grouping related entities, feature views, and data sources. Each project has a unique name and optional description.
| PARAMETER | DESCRIPTION |
|---|---|
| name | The name of the project. Will be converted to snake_case. TYPE: str |
| description | A description of the project. Defaults to an empty string. TYPE: str |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| project_id | The unique identifier assigned after saving to the database. Only available after calling get_or_create(). |
Example
    project = Project(name="my_project", description="My feature store project")
    project = project.get_or_create()
    print(project.project_id)
Functions¶
get_or_create()
¶
Get an existing project by name or create a new one if it doesn't exist.
This method checks if a project with the current name exists in the database. If found, it loads the existing project's data into this instance. If not found, it creates a new project with the current attributes. The project_id is set on the global context for subsequent operations.
| RETURNS | DESCRIPTION |
|---|---|
| Project | The current instance with project_id populated. |
Example
    project = Project(name="my_project", description="Test project")
    project = project.get_or_create()
    print(project.project_id)
Source code in src/seeknal/project.py
update(name=None, description=None)
¶
Update the project's name and/or description.
Updates the project in the database with the provided values. If a parameter is None, the existing value is preserved. The project must be loaded via get_or_create() before calling this method.
| PARAMETER | DESCRIPTION |
|---|---|
| name | New name for the project. If None, keeps current name. |
| description | New description for the project. If None, keeps current description. |

| RETURNS | DESCRIPTION |
|---|---|
| Project | The current instance with updated attributes. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the project has not been loaded via get_or_create(). |
| ProjectNotFoundError | If the project no longer exists in the database. |
Example
    project = Project(name="my_project").get_or_create()
    project = project.update(description="Updated description")
Source code in src/seeknal/project.py
list()
staticmethod
¶
List all projects in a tabular format.
Retrieves all projects from the database and displays them in a formatted table using GitHub-style markdown format. The table includes the project name, description, creation time, and last update time.
Note
This is a static method that outputs directly to the console. It does not return any value.
Example
    Project.list()
    | name       | description   | created_at          | updated_at          |
    |------------|---------------|---------------------|---------------------|
    | my_project | Test project  | 2024-01-15 10:30:00 | 2024-01-15 10:30:00 |
Source code in src/seeknal/project.py
get_by_id(id)
staticmethod
¶
Retrieve a project by its unique identifier.
Fetches a project from the database using its ID and returns a new Project instance with the loaded data.
| PARAMETER | DESCRIPTION |
|---|---|
| id | The unique identifier of the project to retrieve. |

| RETURNS | DESCRIPTION |
|---|---|
| Project | A new Project instance with the loaded data. |

| RAISES | DESCRIPTION |
|---|---|
| ProjectNotFoundError | If no project exists with the given ID. |
Example
    project = Project.get_by_id(1)
    print(project.name)
Source code in src/seeknal/project.py
Entity¶
The Entity class defines entities with join keys that serve as the primary identifiers for feature lookups in the feature store.
Classes¶
Entity(name: str, join_keys: Optional[List[str]] = None, pii_keys: Optional[List[str]] = None, description: Optional[str] = None)
dataclass
¶
Represents an entity in the feature store.
An entity defines a domain object (e.g., user, product, transaction) that features are associated with. Entities have join keys that uniquely identify instances and can optionally specify PII (Personally Identifiable Information) keys for data privacy compliance.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| name | The entity name. Will be converted to snake_case automatically. TYPE: str |
| join_keys | List of column names that uniquely identify entity instances. These keys are used for joining features during retrieval. TYPE: Optional[List[str]] |
| pii_keys | Optional list of column names containing personally identifiable information. Used for data privacy and compliance purposes. TYPE: Optional[List[str]] |
| description | Optional human-readable description of the entity. TYPE: Optional[str] |
Example
    entity = Entity(
        name="customer",
        join_keys=["customer_id"],
        pii_keys=["email", "phone"],
        description="Customer entity for retail features"
    )
    entity.get_or_create()
Functions¶
get_or_create()
¶
Retrieve an existing entity or create a new one.
This method checks if an entity with the same name already exists in the feature store. If found, it loads the existing entity's properties into this instance. If not found, it creates a new entity with the current instance's properties.
After calling this method, the entity will have an 'entity_id' attribute set, which is required for operations like update().
| RETURNS | DESCRIPTION |
|---|---|
| Entity | The current instance with entity_id set and properties synchronized with the persisted entity. |
Example
    entity = Entity(name="user", join_keys=["user_id"])
    entity = entity.get_or_create()
    print(entity.entity_id)  # Now has an ID
Source code in src/seeknal/entity.py
list()
staticmethod
¶
List all registered entities in the feature store.
Retrieves all entities from the feature store and displays them in a formatted table. The table includes entity name, join keys, PII keys, and description.
This is a static method that can be called without instantiating an Entity object.
Example
    Entity.list()
    | name     | join_keys   | pii_keys | description          |
    |----------|-------------|----------|----------------------|
    | customer | customer_id | email    | Customer entity      |
    | product  | product_id  | None     | Product catalog item |
Source code in src/seeknal/entity.py
update(name=None, description=None, pii_keys=None)
¶
Update the entity's properties in the feature store.
Updates the entity with new values for name, description, or PII keys. Only the provided parameters will be updated; others retain their current values. The entity must have been previously saved via get_or_create() before calling this method.
Note
Join keys cannot be updated after entity creation as they define the entity's identity.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Optional new name for the entity. Will be converted to snake_case. DEFAULT: None |
| description | Optional new description for the entity. DEFAULT: None |
| pii_keys | Optional new list of PII key column names. DEFAULT: None |

| RAISES | DESCRIPTION |
|---|---|
| EntityNotSavedError | If the entity has not been saved via get_or_create() first. |
| EntityNotFoundError | If the entity no longer exists in the feature store. |
Example
    entity = Entity(name="user", join_keys=["user_id"])
    entity.get_or_create()
    entity.update(description="Updated user entity")
Source code in src/seeknal/entity.py
set_key_values(*args)
¶
Set specific values for the entity's join keys.
Maps positional arguments to the entity's join keys in order, storing them in the key_values attribute. This is useful for point lookups when retrieving features for a specific entity instance.
| PARAMETER | DESCRIPTION |
|---|---|
| *args | Values for each join key, in the same order as defined in join_keys. The number of arguments must match the number of join keys. |

| RETURNS | DESCRIPTION |
|---|---|
| Entity | The current instance with key_values set. |
Example
    entity = Entity(name="order", join_keys=["user_id", "order_id"])
    entity.get_or_create()
    entity.set_key_values("user123", "order456")
    print(entity.key_values)
Source code in src/seeknal/entity.py
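The positional mapping described for set_key_values can be sketched with a stand-in class. This is an illustration only, not Seeknal's implementation: EntitySketch is hypothetical, and both the dict shape of key_values and the ValueError on a count mismatch are assumptions that this reference does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntitySketch:
    """Hypothetical stand-in for Entity; only the join-key mapping is modelled."""

    name: str
    join_keys: List[str] = field(default_factory=list)
    key_values: Optional[Dict[str, object]] = None

    def set_key_values(self, *args: object) -> "EntitySketch":
        # Assumption: a mismatch in the number of values is rejected.
        if len(args) != len(self.join_keys):
            raise ValueError(
                f"expected {len(self.join_keys)} values, got {len(args)}"
            )
        # Pair each join key with the value given in the same position.
        self.key_values = dict(zip(self.join_keys, args))
        return self


order = EntitySketch(name="order", join_keys=["user_id", "order_id"])
order.set_key_values("user123", "order456")
print(order.key_values)
```

Returning the instance mirrors the documented return value and allows the call to be chained.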
Functions¶
require_saved(func)
¶
Decorator that ensures an entity has been saved before method execution.
This decorator checks if the entity instance has an 'entity_id' attribute, which indicates it has been persisted via get_or_create(). If not, it raises an EntityNotSavedError.
| PARAMETER | DESCRIPTION |
|---|---|
| func | The method to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| | A wrapper function that validates the entity is saved before calling the original method. |

| RAISES | DESCRIPTION |
|---|---|
| EntityNotSavedError | If the entity has not been saved or loaded. |
Source code in src/seeknal/entity.py
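A decorator of this kind can be sketched as follows. The check for an entity_id attribute comes from the description above; the exception class and EntitySketch are stand-ins defined here for illustration, and the real implementation may differ in its details.

```python
import functools


class EntityNotSavedError(Exception):
    """Stand-in for the exception named above; the real class may differ."""


def require_saved(func):
    """Reject calls on instances that have no entity_id attribute yet."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not hasattr(self, "entity_id"):
            raise EntityNotSavedError(
                "call get_or_create() before " + func.__name__ + "()"
            )
        return func(self, *args, **kwargs)

    return wrapper


class EntitySketch:
    """Hypothetical class used only to exercise the decorator."""

    def get_or_create(self):
        self.entity_id = 1  # pretend the backend assigned an ID
        return self

    @require_saved
    def update(self, description):
        self.description = description
        return self
```

Calling update() on a fresh EntitySketch raises EntityNotSavedError, while calling it after get_or_create() succeeds.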
Flow¶
The Flow class enables the creation and execution of data transformation pipelines. Flows connect inputs, tasks, and outputs to build complete data processing workflows.
Classes¶
FlowOutputEnum
¶
Bases: str, Enum
Enumeration of supported flow output types.
Defines the possible output formats for Flow execution results.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| SPARK_DATAFRAME | Output as a PySpark DataFrame. |
| ARROW_DATAFRAME | Output as a PyArrow Table. |
| PANDAS_DATAFRAME | Output as a Pandas DataFrame. |
| HIVE_TABLE | Write output to a Hive table. |
| PARQUET | Write output to Parquet files. |
| LOADER | Use a custom loader for output. |
| FEATURE_GROUP | Output to a feature group. |
| FEATURE_SERVING | Output for feature serving. |
FlowInputEnum
¶
Bases: str, Enum
Enumeration of supported flow input types.
Defines the possible input sources for Flow data ingestion.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| HIVE_TABLE | Read input from a Hive table. |
| PARQUET | Read input from Parquet files. |
| FEATURE_GROUP | Read input from a feature group. |
| EXTRACTOR | Use a custom extractor for input. |
| SOURCE | Read input from a defined Source. |
FlowInput(value: Optional[Union[str, dict, Extractor]] = None, kind: FlowInputEnum = FlowInputEnum.HIVE_TABLE)
dataclass
¶
Configuration for flow input data source.
Defines how data is loaded into a Flow for processing. Supports multiple input types including Hive tables, Parquet files, extractors, and sources.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| value | The input specification. Can be a table name (str), path (str), configuration (dict), or Extractor instance depending on the kind. TYPE: Optional[Union[str, dict, Extractor]] |
| kind | The type of input source (default: HIVE_TABLE). TYPE: FlowInputEnum |
Example
    # Read from a Hive table
    flow_input = FlowInput(value="my_database.my_table", kind=FlowInputEnum.HIVE_TABLE)

    # Read from Parquet files
    flow_input = FlowInput(value="/path/to/data.parquet", kind=FlowInputEnum.PARQUET)
FlowOutput(value: Optional[Any] = None, kind: Optional[FlowOutputEnum] = None)
dataclass
¶
Configuration for flow output destination.
Defines how flow results are returned or persisted. Supports multiple output formats including DataFrames, Hive tables, and Parquet files.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| value | The output destination. For file-based outputs, this is the path or table name. For DataFrame outputs, this is typically None. TYPE: Optional[Any] |
| kind | The type of output format (default: None, returns data as-is). TYPE: Optional[FlowOutputEnum] |
Example
    # Return as Spark DataFrame
    output = FlowOutput(kind=FlowOutputEnum.SPARK_DATAFRAME)

    # Write to Hive table
    output = FlowOutput(value="my_db.output_table", kind=FlowOutputEnum.HIVE_TABLE)
Flow(name: str, input: Optional[FlowInput] = None, input_date_col: Optional[dict] = None, tasks: Optional[List[Task]] = None, output: Optional[FlowOutput] = None, description: str = '')
dataclass
¶
A data processing pipeline that chains inputs, tasks, and outputs.
Flow is the core abstraction for defining data pipelines in seeknal. It connects a data source (input), a series of transformation tasks, and an output destination. Flows can be saved to and loaded from the seeknal backend for reuse and scheduling.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| name | Unique identifier for the flow (automatically converted to snake_case). TYPE: str |
| input | Configuration for the input data source. TYPE: Optional[FlowInput] |
| input_date_col | Optional date column configuration for filtering input data. Contains 'dateCol' (column name) and 'datePattern' (date format). TYPE: Optional[dict] |
| tasks | Optional list of Task instances to execute in sequence. TYPE: Optional[List[Task]] |
| output | Configuration for the output destination. TYPE: Optional[FlowOutput] |
| description | Human-readable description of the flow's purpose. TYPE: str |
Example
    from seeknal.flow import Flow, FlowInput, FlowOutput, FlowInputEnum, FlowOutputEnum
    from seeknal.tasks.sparkengine import SparkEngineTask

    # Create a simple flow
    flow = Flow(
        name="my_etl_flow",
        input=FlowInput(value="source_table", kind=FlowInputEnum.HIVE_TABLE),
        tasks=[SparkEngineTask()],
        output=FlowOutput(kind=FlowOutputEnum.SPARK_DATAFRAME),
        description="ETL flow for processing source data"
    )

    # Run the flow
    result = flow.run(start_date="2024-01-01", end_date="2024-01-31")
Functions¶
require_saved(func)
¶
Decorator that ensures the flow has been saved before method execution.
| PARAMETER | DESCRIPTION |
|---|---|
| func | The method to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| | Wrapped function that checks for flow_id before execution. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the flow has not been saved (no flow_id). |
Source code in src/seeknal/flow.py
set_input_date_col(date_col: str, date_pattern: str = 'yyyyMMdd')
¶
Configure the date column for input data filtering.
Sets up date-based filtering on the input data, allowing the flow to process data within specific date ranges.
| PARAMETER | DESCRIPTION |
|---|---|
| date_col | Name of the column containing date values. TYPE: str |
| date_pattern | Date format pattern (default: "yyyyMMdd"). TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| | Self for method chaining. |
Example
flow.set_input_date_col("event_date", "yyyy-MM-dd")
Source code in src/seeknal/flow.py
run(params=None, filters=None, date=None, start_date=None, end_date=None)
¶
Execute the flow pipeline.
Runs the complete flow: loads input data, applies filters, executes all tasks in sequence, and returns the output in the configured format.
| PARAMETER | DESCRIPTION |
|---|---|
| params | Optional dictionary of parameters to pass to tasks. DEFAULT: None |
| filters | Optional filters to apply to the input data. DEFAULT: None |
| date | Optional single date for filtering (mutually exclusive with start_date/end_date). DEFAULT: None |
| start_date | Optional start date for date range filtering. DEFAULT: None |
| end_date | Optional end date for date range filtering. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| | The processed data in the format specified by the output configuration. |
Example
    # Run with date range
    result = flow.run(start_date="2024-01-01", end_date="2024-01-31")

    # Run with parameters
    result = flow.run(params={"threshold": 0.5})
Source code in src/seeknal/flow.py
as_dict()
¶
Convert the flow to a dictionary representation.
Serializes the flow configuration to a dictionary suitable for storage or transmission. Removes Spark context references.
| RETURNS | DESCRIPTION |
|---|---|
| | Dictionary containing the flow's configuration including name, input, output, tasks, and description. |
Source code in src/seeknal/flow.py
as_yaml()
¶
Convert the flow to a YAML string representation.
| RETURNS | DESCRIPTION |
|---|---|
| | YAML-formatted string of the flow configuration. |
from_dict(flow_dict: dict)
staticmethod
¶
Create a Flow instance from a dictionary.
Deserializes a flow configuration dictionary back into a Flow object. Reconstructs input, output, and task configurations.
| PARAMETER | DESCRIPTION |
|---|---|
| flow_dict | Dictionary containing flow configuration with keys like 'name', 'input', 'output', 'tasks'. TYPE: dict |

| RETURNS | DESCRIPTION |
|---|---|
| | Flow instance with the configured settings. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If a task in the dictionary is missing 'class_name'. |
Example
    flow_config = {"name": "my_flow", "input": {...}, "output": {...}}
    flow = Flow.from_dict(flow_config)
Source code in src/seeknal/flow.py
get_or_create()
¶
Save or retrieve the flow from the backend.
If a flow with the same name exists in the current project, loads its configuration. Otherwise, saves this flow as a new entry.
| RETURNS | DESCRIPTION |
|---|---|
| | Self with flow_id populated. |
Note
Requires an active workspace and project context.
Source code in src/seeknal/flow.py
list()
staticmethod
¶
List all flows in the current project.
Displays a formatted table of flows including name, description, specification, and timestamps.
Note
Requires an active workspace and project context. Outputs directly to the console using typer.echo.
Source code in src/seeknal/flow.py
update(name: Optional[str] = None, input: Optional[FlowInput] = None, tasks: Optional[List[Task]] = None, output: Optional[FlowOutput] = None, description: str = '')
¶
Update the flow configuration in the backend.
Updates the saved flow with new configuration values. Any parameter not provided will retain its existing value.
| PARAMETER | DESCRIPTION |
|---|---|
| name | New name for the flow. TYPE: Optional[str] |
| input | New input configuration. TYPE: Optional[FlowInput] |
| tasks | New list of tasks. TYPE: Optional[List[Task]] |
| output | New output configuration. TYPE: Optional[FlowOutput] |
| description | New description. TYPE: str |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the flow has not been saved yet or not found. |
Note
Requires the flow to be saved first via get_or_create().
Source code in src/seeknal/flow.py
delete()
¶
Delete the flow from the backend.
Removes the saved flow from the seeknal backend permanently.
| RETURNS | DESCRIPTION |
|---|---|
| | Result of the delete operation. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the flow has not been saved yet. |
Note
Requires the flow to be saved first via get_or_create().
Source code in src/seeknal/flow.py
Functions¶
run_flow(flow_name: Optional[str] = None, flow: Optional[Flow] = None, params=None, filters=None, date=None, start_date=None, end_date=None, name='run_flow')
¶
Execute a flow by name or instance.
Convenience function to run a flow either by providing its name (loads from backend) or a Flow instance directly.
| PARAMETER | DESCRIPTION |
|---|---|
| flow_name | Name of a saved flow to load and run. TYPE: Optional[str] |
| flow | Flow instance to run directly. TYPE: Optional[Flow] |
| params | Optional dictionary of parameters to pass to tasks. DEFAULT: None |
| filters | Optional filters to apply to the input data. DEFAULT: None |
| date | Optional single date for filtering. DEFAULT: None |
| start_date | Optional start date for date range filtering. DEFAULT: None |
| end_date | Optional end date for date range filtering. DEFAULT: None |
| name | Internal name for the operation (default: "run_flow"). DEFAULT: 'run_flow' |

| RETURNS | DESCRIPTION |
|---|---|
| | The processed data from the flow execution. |
Example
    # Run by flow name
    result = run_flow(flow_name="my_saved_flow", start_date="2024-01-01")

    # Run by instance
    result = run_flow(flow=my_flow_instance, params={"key": "value"})
Source code in src/seeknal/flow.py
Context¶
The Context class manages the execution context and session state for Seeknal operations.
Classes¶
Context(*args: Any, **kwargs: Any)
¶
Bases: DotDict, local
A thread-safe context store for seeknal data.
The Context is a DotDict subclass, and can be instantiated the same way.
| PARAMETER | DESCRIPTION |
|---|---|
- *args
|
arguments to provide to the
TYPE:
|
- **kwargs
|
any key / value pairs to initialize this context with
TYPE:
|
Source code in src/seeknal/context.py
Functions¶
get(key: str, default: Any = None) -> Any
¶
This method is defined for MyPy, which otherwise tries to type
the inherited .get() method incorrectly.
| PARAMETER | DESCRIPTION |
|---|---|
- key
|
the key to retrieve
TYPE:
|
- default
|
a default value to return if the key is not found
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Any
|
|
Source code in src/seeknal/configuration.py
copy() -> DotDict
¶
to_dict() -> dict
¶
Converts current DotDict (and any DotDicts contained within)
to an appropriate nested dictionary.
Functions¶
configure_logging(testing: bool = False) -> logging.Logger
¶
Creates a "seeknal" root logger with a StreamHandler that has level and formatting
set from seeknal.config.
| PARAMETER | DESCRIPTION |
|---|---|
- testing
|
a boolean specifying whether this configuration is for testing purposes only; this helps us isolate any global state during testing by configuring a "seeknal-test-logger" instead of the standard "seeknal" logger
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Logger
|
|
Source code in src/seeknal/context.py
get_logger(name: str = None) -> logging.Logger
¶
Returns a logger.
| PARAMETER | DESCRIPTION |
|---|---|
- name
|
if
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Logger
|
|
Source code in src/seeknal/context.py
Configuration¶
The Configuration module handles configuration management, including loading and validating configuration files.
Classes¶
DotDict(init_dict: Optional[DictLike] = None, **kwargs: Any)
¶
Bases: MutableMapping
A dict that also supports attribute ("dot") access. Think of this as an extension
to the standard python dict object. Note: while any hashable object can be added to
a DotDict, only valid Python identifiers can be accessed with the dot syntax; this excludes
strings which begin in numbers, special characters, or double underscores.
| PARAMETER | DESCRIPTION |
|---|---|
- init_dict
|
dictionary to initialize the
TYPE:
|
- **kwargs
|
key, value pairs with which to initialize the
TYPE:
|
Example
Source code in src/seeknal/configuration.py
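The combination of item access and attribute ("dot") access can be illustrated with a small MutableMapping subclass. MiniDotDict below is a hypothetical sketch, not Seeknal's DotDict; backing the mapping with the instance's own `__dict__` is one possible design, chosen here because it gives attribute access without extra code.

```python
from collections.abc import MutableMapping


class MiniDotDict(MutableMapping):
    """Hypothetical sketch: a mapping whose keys are also attributes."""

    def __init__(self, init_dict=None, **kwargs):
        if init_dict:
            self.update(init_dict)
        self.update(kwargs)

    # Items live in the instance __dict__, so d.key and d["key"] agree.
    def __getitem__(self, key):
        return self.__dict__[key]

    def __setitem__(self, key, value):
        self.__dict__[key] = value

    def __delitem__(self, key):
        del self.__dict__[key]

    def __iter__(self):
        return iter(self.__dict__)

    def __len__(self):
        return len(self.__dict__)

    def to_dict(self):
        # Convert nested MiniDotDict values back to plain dicts.
        return {
            k: v.to_dict() if isinstance(v, MiniDotDict) else v
            for k, v in self.items()
        }


settings = MiniDotDict({"level": "INFO"}, debug=False)
settings.retries = 3          # attribute assignment creates a key
settings["1st"] = "ok"        # not a valid identifier: item access only
print(settings.level, settings["retries"], settings.to_dict())
```

Keys such as "1st" remain reachable through square brackets even though they cannot be written with dot syntax, which matches the restriction described above.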
Functions¶
get(key: str, default: Any = None) -> Any
¶
This method is defined for MyPy, which otherwise tries to type
the inherited .get() method incorrectly.
| PARAMETER | DESCRIPTION |
|---|---|
- key
|
the key to retrieve
TYPE:
|
- default
|
a default value to return if the key is not found
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Any
|
|
Source code in src/seeknal/configuration.py
copy() -> DotDict
¶
to_dict() -> dict
¶
Converts current DotDict (and any DotDicts contained within)
to an appropriate nested dictionary.
Config
¶
Bases: Box
A config is a Box subclass
Functions¶
copy() -> Config
¶
Create a recursive copy of the config. Each level of the Config is a new Config object, so modifying keys won't affect the original Config object. However, values are not deep-copied, and mutations can affect the original.
Source code in src/seeknal/configuration.py
Functions¶
merge_dicts(d1: DictLike, d2: DictLike) -> DictLike
¶
Updates d1 from d2 by replacing each (k, v1) pair in d1 with the
corresponding (k, v2) pair in d2.
If the value of each pair is itself a dict, then the value is updated recursively.
| PARAMETER | DESCRIPTION |
|---|---|
- d1
|
A dictionary to be replaced
TYPE:
|
- d2
|
A dictionary used for replacement
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DictLike
|
|
Source code in src/seeknal/configuration.py
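The recursive update described for merge_dicts can be sketched as below. This is an illustration under stated assumptions rather than the library's code: merge_dicts_sketch is a hypothetical name, it returns a new dictionary instead of mutating d1, and it adds keys that appear only in d2.

```python
from collections.abc import MutableMapping


def merge_dicts_sketch(d1, d2):
    """Return a copy of d1 updated from d2, recursing into nested mappings."""
    merged = dict(d1)
    for key, value in d2.items():
        both_nested = isinstance(merged.get(key), MutableMapping) and isinstance(
            value, MutableMapping
        )
        if both_nested:
            # Both sides hold a mapping: merge them instead of overwriting.
            merged[key] = merge_dicts_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"logging": {"level": "INFO", "format": "plain"}, "debug": False}
overrides = {"logging": {"level": "DEBUG"}, "debug": True}
print(merge_dicts_sketch(defaults, overrides))
```

Because nested mappings are merged rather than replaced, the "format" key survives even though the override only mentions "level".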
as_nested_dict(obj: Union[DictLike, Iterable[DictLike]], dct_class: type = DotDict) -> Union[DictLike, Iterable[DictLike]]
¶
Given an obj formatted as a dictionary, transforms it (and any nested dictionaries) into the provided dct_class.
| PARAMETER | DESCRIPTION |
|---|---|
- obj
|
An object that is formatted as a
TYPE:
|
- dct_class
|
the
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[DictLike, Iterable[DictLike]]
|
|
Source code in src/seeknal/configuration.py
flatdict_to_dict(dct: dict, dct_class: Optional[Type[D]] = None) -> D
¶
Converts a flattened dictionary back to a nested dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
- dct
|
The dictionary to be nested. Each key should be a
TYPE:
|
- dct_class
|
the type of the result; defaults to
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
D
|
|
Source code in src/seeknal/configuration.py
string_to_type(val: str) -> Union[bool, int, float, str]
¶
Helper function for transforming string env var values into typed values.
Maps:
- "true" (any capitalization) to True
- "false" (any capitalization) to False
- any other valid literal Python syntax interpretable by ast.literal_eval
| PARAMETER | DESCRIPTION |
|---|---|
- val
|
the string value of an environment variable
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[bool, int, float, str]
|
Union[bool, int, float, str, dict, list, None, tuple]: the type-cast env var value |
Source code in src/seeknal/configuration.py
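The mapping rules above can be sketched with the standard library. string_to_type_sketch is a hypothetical name, and returning the original string when ast.literal_eval cannot parse the value is an assumption suggested by the str member of the return type.

```python
import ast


def string_to_type_sketch(val: str):
    """Cast an environment-variable string to a Python value where possible."""
    if val.upper() == "TRUE":
        return True
    if val.upper() == "FALSE":
        return False
    try:
        # Handles ints, floats, quoted strings, lists, dicts, tuples, None.
        return ast.literal_eval(val)
    except (ValueError, SyntaxError):
        # Assumption: anything that is not a literal stays a plain string.
        return val


for raw in ["true", "42", "3.5", "[1, 2]", "/tmp/data"]:
    print(repr(raw), "->", repr(string_to_type_sketch(raw)))
```

Checking the boolean spellings first matters because lowercase "true" and "false" are not Python literals and would otherwise fall through to the string case.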
interpolate_env_vars(env_var: str) -> Optional[Union[bool, int, float, str]]
¶
Expands (potentially nested) env vars by repeatedly applying
expandvars and expanduser until interpolation stops having
any effect.
Source code in src/seeknal/configuration.py
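Repeated expansion until the value stops changing can be sketched with os.path.expandvars and os.path.expanduser. The function name, the GR_DEMO_* variables, and the max_rounds safeguard against self-referencing variables are all assumptions made for this illustration; the sketch also omits any type casting the real function may apply to the result.

```python
import os


def expand_until_stable(value: str, max_rounds: int = 10) -> str:
    """Apply expandvars and expanduser until the string stops changing."""
    for _ in range(max_rounds):
        expanded = os.path.expanduser(os.path.expandvars(value))
        if expanded == value:
            break
        value = expanded
    return value


# Hypothetical variables: one refers to the other, so two rounds are needed.
os.environ["GR_DEMO_ROOT"] = "/data"
os.environ["GR_DEMO_PATH"] = "$GR_DEMO_ROOT/features"
print(expand_until_stable("$GR_DEMO_PATH/daily"))
```

A single call to expandvars would stop at "$GR_DEMO_ROOT/features/daily"; looping resolves the inner reference as well.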
create_user_config(dest_path: str, source_path: str = '') -> None
¶
Copies the default configuration to a user-customizable file at dest_path
Source code in src/seeknal/configuration.py
dict_to_flatdict(dct: DictLike, parent: Optional[CompoundKey] = None) -> dict
¶
Converts a (nested) dictionary to a flattened representation.
Each key of the flat dict will be a CompoundKey tuple containing the "chain of keys" for the corresponding value.
| PARAMETER | DESCRIPTION |
|---|---|
- dct
|
The dictionary to flatten
TYPE:
|
- parent
|
Defaults to
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
|
Source code in src/seeknal/configuration.py
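Flattening to tuple keys, and the inverse operation described for flatdict_to_dict, can be sketched as a round trip. Plain tuples stand in for CompoundKey here, and both function names are hypothetical; this is a simplified illustration rather than the library's implementation.

```python
def flatten(dct, parent=()):
    """Map each leaf value to the tuple of keys that leads to it."""
    flat = {}
    for key, value in dct.items():
        path = parent + (key,)
        if isinstance(value, dict):
            # Note: an empty nested dict produces no entries in this sketch.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dictionary from tuple keys."""
    nested = {}
    for path, value in flat.items():
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


config = {"logging": {"level": "INFO"}, "debug": True}
flat = flatten(config)
print(flat)
print(unflatten(flat) == config)
```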
process_task_defaults(config: Config) -> Config
¶
Converts task defaults from basic types to Python objects like timedeltas
| PARAMETER | DESCRIPTION |
|---|---|
- config
|
the configuration to modify
TYPE:
|
Source code in src/seeknal/configuration.py
to_environment_variables(config: Config, include: Optional[Iterable[str]] = None, prefix: str = '') -> dict
¶
Convert a configuration object to environment variables
Values will be cast to strings using 'str'
| PARAMETER | DESCRIPTION |
|---|---|
| config | The configuration object to parse |
| include | An optional set of keys to include. Each key to include should be formatted as 'section.key' or 'section.section.key' |
| prefix | The prefix for the environment variables. Defaults to "". |

| RETURNS | DESCRIPTION |
|---|---|
| dict | |
Source code in src/seeknal/configuration.py
validate_config(config: Config) -> None
¶
Validates that the configuration file is valid:
- keys do not shadow Config methods
Note that this is performed when the config is first loaded, but not after.
Source code in src/seeknal/configuration.py
load_toml(path: str) -> dict
¶
interpolate_config(config: dict, env_var_prefix: Optional[str] = None) -> Config
¶
Processes a config dictionary, such as the one loaded from load_toml.
Source code in src/seeknal/configuration.py
load_configuration(path: str = '', user_config_path: Optional[str] = None, backend_config_path: Optional[str] = None, env_var_prefix: Optional[str] = None) -> Config
¶
Loads a configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| path | DEPRECATED - no longer used TYPE: str |
| user_config_path | an optional path to a user config file. If a user config is provided, it will be used to update the main config prior to interpolation TYPE: Optional[str] |
| backend_config_path | an optional path to a backend config file TYPE: Optional[str] |
| env_var_prefix | any env vars matching this prefix will be used to create configuration values TYPE: Optional[str] |

| RETURNS | DESCRIPTION |
|---|---|
| Config | |
Source code in src/seeknal/configuration.py
Validation Utilities¶
Seeknal provides comprehensive validation utilities for SQL identifiers, table names, column names, and file paths to prevent injection attacks and ensure data integrity.
SQL Identifier and Path Validation Module.
This module provides functions to validate SQL identifiers (table names, column names, database names) and file paths to prevent SQL injection attacks, DoS through long names, and schema confusion.
Functions¶
validate_sql_identifier(identifier: str, identifier_type: str = 'identifier', max_length: int = SQL_IDENTIFIER_MAX_LENGTH) -> str
¶
Validate a SQL identifier (table name, column name, database name).
SQL identifiers must:
- Start with a letter (a-z, A-Z) or underscore (_)
- Contain only alphanumeric characters (a-z, A-Z, 0-9) and underscores (_)
- Be no longer than max_length characters (default 128)
- Not be empty
| PARAMETER | DESCRIPTION |
|---|---|
| identifier | The SQL identifier to validate. TYPE: str |
| identifier_type | Type of identifier for error messages (e.g., "table name"). TYPE: str |
| max_length | Maximum allowed length for the identifier. TYPE: int |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated identifier (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the identifier is invalid. |
Source code in src/seeknal/validation.py
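The identifier rules listed for validate_sql_identifier can be expressed as a short check. The sketch below is written from those rules alone and is not the library's code: check_sql_identifier is a hypothetical name, the exception class is a stand-in, and the error messages are invented for illustration.

```python
import re

SQL_IDENTIFIER_MAX_LENGTH = 128  # default length limit stated in the rules

# A letter or underscore first, then letters, digits, or underscores.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class InvalidIdentifierError(ValueError):
    """Stand-in; the real exception's base class is not shown on this page."""


def check_sql_identifier(
    identifier: str,
    identifier_type: str = "identifier",
    max_length: int = SQL_IDENTIFIER_MAX_LENGTH,
) -> str:
    if not identifier:
        raise InvalidIdentifierError(f"{identifier_type} must not be empty")
    if len(identifier) > max_length:
        raise InvalidIdentifierError(
            f"{identifier_type} exceeds {max_length} characters"
        )
    if _IDENTIFIER_RE.fullmatch(identifier) is None:
        raise InvalidIdentifierError(
            f"{identifier_type} contains invalid characters: {identifier!r}"
        )
    return identifier


print(check_sql_identifier("customer_id", "column name"))
```

Using fullmatch rather than match ensures that trailing characters, including a newline, cause the check to fail.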
validate_table_name(table_name: str) -> str
¶
Validate a SQL table name.
Table names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters

| PARAMETER | DESCRIPTION |
|---|---|
| table_name | The table name to validate. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated table name (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the table name is invalid. |
Source code in src/seeknal/validation.py
validate_column_name(column_name: str) -> str
¶
Validate a SQL column name.
Column names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters

| PARAMETER | DESCRIPTION |
|---|---|
| column_name | The column name to validate. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated column name (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the column name is invalid. |
Source code in src/seeknal/validation.py
validate_database_name(database_name: str) -> str
¶
Validate a SQL database name.
Database names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters

| PARAMETER | DESCRIPTION |
|---|---|
| database_name | The database name to validate. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated database name (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the database name is invalid. |
Source code in src/seeknal/validation.py
validate_file_path(file_path: str, max_length: int = FILE_PATH_MAX_LENGTH, forbidden_chars: Optional[List[str]] = None) -> str
¶
Validate a file path to ensure it doesn't contain SQL injection characters.
File paths are validated to:
- Not be empty
- Not exceed max_length characters (default 4096)
- Not contain forbidden characters that could be used for SQL injection

| PARAMETER | DESCRIPTION |
|---|---|
| file_path | The file path to validate. TYPE: str |
| max_length | Maximum allowed length for the path. TYPE: int |
| forbidden_chars | List of forbidden character sequences. Defaults to common SQL injection characters: ', ", ;, --, /*, */ TYPE: Optional[List[str]] |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated file path (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidPathError | If the file path is invalid or contains forbidden characters. |
Source code in src/seeknal/validation.py
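The path checks described for validate_file_path amount to a length test plus a substring scan. The following sketch is built only from that description: check_file_path is a hypothetical name, the exception class is a stand-in, and the default list simply copies the sequences named in the parameter table.

```python
from typing import List, Optional

FILE_PATH_MAX_LENGTH = 4096  # default length limit stated above
DEFAULT_FORBIDDEN = ["'", '"', ";", "--", "/*", "*/"]


class InvalidPathError(ValueError):
    """Stand-in; the real exception's base class is not shown on this page."""


def check_file_path(
    file_path: str,
    max_length: int = FILE_PATH_MAX_LENGTH,
    forbidden_chars: Optional[List[str]] = None,
) -> str:
    forbidden = DEFAULT_FORBIDDEN if forbidden_chars is None else forbidden_chars
    if not file_path:
        raise InvalidPathError("file path must not be empty")
    if len(file_path) > max_length:
        raise InvalidPathError(f"file path exceeds {max_length} characters")
    for sequence in forbidden:
        # A plain substring test is enough for multi-character sequences.
        if sequence in file_path:
            raise InvalidPathError(
                f"file path contains forbidden sequence {sequence!r}"
            )
    return file_path


print(check_file_path("/data/features/daily.parquet"))
```

Passing an explicit forbidden_chars list replaces the defaults in this sketch, so a caller that needs quotes in paths could supply a narrower list.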
validate_sql_value(value: str, value_type: str = 'value', forbidden_patterns: Optional[List[str]] = None) -> str
¶
Validate a value used in SQL WHERE clauses to prevent SQL injection.
This function checks for common SQL injection patterns in values that will be used in WHERE clauses or other SQL contexts.
| PARAMETER | DESCRIPTION |
|---|---|
| value | The value to validate. TYPE: str |
| value_type | Type of value for error messages (e.g., "filter value"). TYPE: str |
| forbidden_patterns | List of forbidden patterns. Defaults to common SQL injection patterns like UNION, DROP, DELETE, etc. TYPE: Optional[List[str]] |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated value (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the value contains forbidden patterns. |
Source code in src/seeknal/validation.py
validate_column_names(column_names: List[str]) -> List[str]
¶
Validate a list of SQL column names.
| PARAMETER | DESCRIPTION |
|---|---|
| column_names | List of column names to validate. TYPE: List[str] |

| RETURNS | DESCRIPTION |
|---|---|
| List[str] | The validated list of column names (unchanged if all valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If any column name is invalid. |
Source code in src/seeknal/validation.py
validate_schema_name(schema_name: str) -> str
¶
Validate a SQL schema name.
Schema names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters

| PARAMETER | DESCRIPTION |
|---|---|
| schema_name | The schema name to validate. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | The validated schema name (unchanged if valid). |

| RAISES | DESCRIPTION |
|---|---|
| InvalidIdentifierError | If the schema name is invalid. |
Source code in src/seeknal/validation.py
Exceptions¶
Core exception classes for error handling throughout the Seeknal library.