FeatureStore API Reference¶
This page documents the FeatureStore module, which provides the core feature store functionality for managing and serving ML features for both offline (batch) and online (real-time) use cases.
Overview¶
The FeatureStore module provides essential classes for feature engineering and serving:
| Class | Purpose |
|---|---|
| `FeatureGroup` | Define and manage groups of features with customizable materialization |
| `FeatureLookup` | Specify feature lookups from feature groups |
| `HistoricalFeatures` | Retrieve historical feature data with point-in-time correctness |
| `OnlineFeatures` | Serve features in real-time for model inference |
| `OfflineStore` | Configure offline storage backends |
| `OnlineStore` | Configure online storage backends |
FeatureGroup¶
The FeatureGroup class is the primary abstraction for defining and managing groups of related features. It handles feature materialization to both offline and online stores.
Bases: FeatureStore
A feature group representing a set of features created from a data source.
A FeatureGroup is a logical grouping of related features that share the same entity and are typically computed from the same data source (Flow or DataFrame). It supports both offline and online materialization for feature storage and serving.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `name` | The unique name of the feature group within the project. |
| `materialization` | Configuration for how features are stored and served, including offline/online storage settings and TTL configurations. |
| `source` | The data source for computing features. Can be a Flow pipeline or a Spark DataFrame. If a Flow, its output will be set to SPARK_DATAFRAME. |
| `description` | Optional human-readable description of the feature group. |
| `features` | List of Feature objects to register. If None, all columns except join_keys and event_time will be registered as features. |
| `tag` | Optional list of tags for categorizing and filtering feature groups. |
| `validation_config` | Configuration for data validation, including the validators to run and the validation mode (FAIL or WARN). When set, enables declarative validation that can be used with the validate() method. |
| `feature_group_id` | Unique identifier assigned by the system after creation. |
| `offline_watermarks` | List of timestamps indicating when offline data was materialized. Used for tracking data freshness. |
| `online_watermarks` | List of timestamps indicating when online data was materialized. Used for tracking data freshness. |
| `version` | Schema version number for the feature group. |
| `created_at` | Timestamp when the feature group was created. |
| `updated_at` | Timestamp when the feature group was last updated. |
| `avro_schema` | Avro schema definition for the feature data structure. |
Example

```python
from seeknal.featurestore import FeatureGroup, Materialization
from seeknal.entity import Entity

# Create a feature group from a flow
fg = FeatureGroup(
    name="user_features",
    materialization=Materialization(event_time_col="event_date"),
)
fg.entity = Entity(name="user", join_keys=["user_id"])
fg.set_flow(my_flow).set_features().get_or_create()
```
Note
A SparkContext must be active before creating a FeatureGroup instance. Initialize your Project and Workspace first if you encounter SparkContext errors.
Functions¶
set_flow(flow: Flow)
¶
Set the data flow pipeline as the source for this feature group.
Configures a Flow as the data source for computing features. The flow's output will be automatically set to SPARK_DATAFRAME if not already configured. If the flow hasn't been persisted, it will be created.
| PARAMETER | DESCRIPTION |
|---|---|
| `flow` | The Flow pipeline to use for feature computation. |

| RETURNS | DESCRIPTION |
|---|---|
| `FeatureGroup` | The current instance for method chaining. |
Example

```python
fg = FeatureGroup(name="user_features")
fg.set_flow(my_transformation_flow)
```
Source code in src/seeknal/featurestore/feature_group.py
set_dataframe(dataframe: DataFrame)
¶
Set a Spark DataFrame as the source for this feature group.
Configures a pre-computed Spark DataFrame as the data source for features. This is useful when features have already been computed or when using ad-hoc data that doesn't require a Flow pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
| `dataframe` | The Spark DataFrame containing the feature data. |

| RETURNS | DESCRIPTION |
|---|---|
| `FeatureGroup` | The current instance for method chaining. |
Example

```python
fg = FeatureGroup(name="user_features")
fg.set_dataframe(my_spark_df)
```
Source code in src/seeknal/featurestore/feature_group.py
set_validation_config(config: ValidationConfig)
¶
Set validation configuration for this feature group.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Validation configuration containing the validators to run and the validation mode (FAIL or WARN). |

| RETURNS | DESCRIPTION |
|---|---|
| `self` | Returns the FeatureGroup instance for method chaining. |
Example

```python
from seeknal.feature_validation.models import (
    ValidationConfig,
    ValidationMode,
    ValidatorConfig,
)

config = ValidationConfig(
    mode=ValidationMode.WARN,
    validators=[
        ValidatorConfig(validator_type="null", columns=["user_id"]),
        ValidatorConfig(
            validator_type="range",
            columns=["age"],
            params={"min_val": 0, "max_val": 120},
        ),
    ],
)
feature_group.set_validation_config(config)
```
Source code in src/seeknal/featurestore/feature_group.py
validate(validators: List[BaseValidator], mode: Union[str, ValidationMode] = ValidationMode.FAIL, reference_date: Optional[str] = None) -> ValidationSummary
¶
Validate the feature group data using the provided validators.
This method runs a list of validators against the feature group's data and returns a summary of the validation results. The validation can be configured to either warn on failures (continue execution) or fail immediately on the first validation failure.
| PARAMETER | DESCRIPTION |
|---|---|
| `validators` | List of validators to run against the feature group data. Each validator should be an instance of a class that inherits from BaseValidator (e.g., NullValidator, RangeValidator, UniquenessValidator, FreshnessValidator, or CustomValidator). |
| `mode` | Validation execution mode. ValidationMode.FAIL or "fail" raises an exception on the first failure; ValidationMode.WARN or "warn" logs failures but continues execution. Defaults to ValidationMode.FAIL. |
| `reference_date` | Reference date for running the source Flow. Only used when the source is a Flow. Defaults to None. |

| RETURNS | DESCRIPTION |
|---|---|
| `ValidationSummary` | A summary containing all validation results, pass/fail status, and counts. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the source is not set (enforced by the @require_set_source decorator). |
| `ValidationException` | If mode is FAIL and any validator fails. |
Example

```python
from seeknal.feature_validation.validators import NullValidator, RangeValidator
from seeknal.feature_validation.models import ValidationMode

# Create validators
validators = [
    NullValidator(columns=["user_id", "email"]),
    RangeValidator(column="age", min_val=0, max_val=120),
]

# Run validation in warn mode (continues on failures)
summary = feature_group.validate(validators, mode=ValidationMode.WARN)
print(f"Passed: {summary.passed}, Failed: {summary.failed_count}")

# Run validation in fail mode (stops on first failure)
try:
    summary = feature_group.validate(validators, mode="fail")
except ValidationException as e:
    print(f"Validation failed: {e.message}")
```
Source code in src/seeknal/featurestore/feature_group.py
set_features(features: Optional[List[Feature]] = None, reference_date: Optional[str] = None)
¶
Set the features to be used for this feature group. If features is None, all columns except join_keys and event_time are used as features.
| PARAMETER | DESCRIPTION |
|---|---|
| `features` | Features to register. If None, features are derived automatically from the transformation result. You may also supply only a feature name and description; the data type is then fetched automatically from the transformation result. Defaults to None. |
| `reference_date` | Date used as a reference when deriving features from the transformation. Defaults to None. |
| `validate_with_source` | If set to False, skips validating the specified features against the transformation result. Defaults to True. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If a specified feature is not found in the transformation result. |

| RETURNS | DESCRIPTION |
|---|---|
| | Populates the features of the feature group. |
Source code in src/seeknal/featurestore/feature_group.py
get_or_create(version=None)
¶
Retrieve an existing feature group or create a new one based on the provided parameters.

| PARAMETER | DESCRIPTION |
|---|---|
| `version` | The version of the feature group to retrieve. If a version is provided, that specific version is loaded; if None, the latest version is loaded. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| | The FeatureGroup instance, after performing the create-or-load operations and updating its attributes. |
Source code in src/seeknal/featurestore/feature_group.py
update_materialization(offline: Optional[bool] = None, online: Optional[bool] = None, offline_materialization: Optional[OfflineMaterialization] = None, online_materialization: Optional[OnlineMaterialization] = None)
¶
Update the materialization settings for this feature group.
Modifies the materialization configuration and persists the changes to the feature store backend. This allows changing storage settings, TTL values, and enabling/disabling offline or online storage.
| PARAMETER | DESCRIPTION |
|---|---|
| `offline` | Enable or disable offline storage. If None, keeps the current setting. |
| `online` | Enable or disable online storage. If None, keeps the current setting. |
| `offline_materialization` | New offline materialization configuration. If None, keeps the current setting. |
| `online_materialization` | New online materialization configuration. If None, keeps the current setting. |

| RETURNS | DESCRIPTION |
|---|---|
| `FeatureGroup` | The current instance for method chaining. |
Example

```python
fg.update_materialization(
    online=True,
    online_materialization=OnlineMaterialization(ttl=2880),
)
```
Source code in src/seeknal/featurestore/feature_group.py
list_versions()
¶
List all versions of this feature group.
Returns a list of dictionaries containing version metadata:

- version: The version number
- avro_schema: The Avro schema for this version (as dict)
- created_at: When the version was created
- updated_at: When the version was last updated
- feature_count: Number of features in this version
| RETURNS | DESCRIPTION |
|---|---|
| `List[dict]` | A list of version metadata dictionaries, ordered by version number descending (latest first). Returns an empty list if the feature group has not been saved or has no versions. |
Example

```python
fg = FeatureGroup(name="user_features").get_or_create()
versions = fg.list_versions()
for v in versions:
    print(f"Version {v['version']}: {v['feature_count']} features")
```
Source code in src/seeknal/featurestore/feature_group.py
get_version(version: int) -> Optional[dict]
¶
Get metadata for a specific version of this feature group.
| PARAMETER | DESCRIPTION |
|---|---|
| `version` | The version number to retrieve. |

| RETURNS | DESCRIPTION |
|---|---|
| `Optional[dict]` | A dictionary containing version metadata if found, or None if the version doesn't exist. The dictionary includes: version (the version number), avro_schema (the Avro schema for this version, as dict), created_at, updated_at, and feature_count (number of features in this version). |
Example

```python
fg = FeatureGroup(name="user_features").get_or_create()
v1 = fg.get_version(1)
if v1:
    print(f"Version 1 has {v1['feature_count']} features")
```
Source code in src/seeknal/featurestore/feature_group.py
compare_versions(from_version: int, to_version: int) -> Optional[dict]
¶
Compare schemas between two versions of this feature group.
Identifies added, removed, and modified features between the two versions by comparing their Avro schemas.
| PARAMETER | DESCRIPTION |
|---|---|
| `from_version` | The base version number to compare from. |
| `to_version` | The target version number to compare to. |

| RETURNS | DESCRIPTION |
|---|---|
| `Optional[dict]` | A dictionary containing the comparison result if both versions exist, or None if either version doesn't exist or the feature group has not been saved. The dictionary includes: from_version, to_version, added (field names added in to_version), removed (field names removed in to_version), and modified (dicts with field name and type changes). |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If from_version equals to_version. |
Example

```python
fg = FeatureGroup(name="user_features").get_or_create()
diff = fg.compare_versions(1, 2)
if diff:
    print(f"Added features: {diff['added']}")
    print(f"Removed features: {diff['removed']}")
    print(f"Modified features: {diff['modified']}")
```
Source code in src/seeknal/featurestore/feature_group.py
delete()
¶
Delete this feature group and its associated data.
Removes the feature group from the feature store backend along with any data stored in the offline store. This operation is irreversible.
| RETURNS | DESCRIPTION |
|---|---|
| `FeatureGroupRequest` | The request object used to perform the deletion. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the feature group has not been saved (no feature_group_id). |
Note
This requires an active workspace and project context. The feature group must have been previously saved using get_or_create().
Source code in src/seeknal/featurestore/feature_group.py
write(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None, output_date_pattern: str = 'yyyyMMdd')
¶
Writes the feature group data to the offline store, using the specified feature start and end times and output date pattern.
| PARAMETER | DESCRIPTION |
|---|---|
| `feature_start_time` | The start time for the feature data. If None, the current date is used. |
| `feature_end_time` | The end time for the feature data. If None, all available data is used. |
| `output_date_pattern` | The output date pattern for the feature data. |

| RETURNS | DESCRIPTION |
|---|---|
| | None |
Source code in src/seeknal/featurestore/feature_group.py
FeatureLookup¶
The FeatureLookup class specifies how to look up features from a feature group, with optional feature selection and exclusion.
A class that represents a feature lookup operation in a feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `source` | The feature store to perform the lookup on. |
| `features` | A list of feature names to include in the lookup. If None, all features in the store will be included. |
| `exclude_features` | A list of feature names to exclude from the lookup. If None, no features will be excluded. |
Materialization¶
The Materialization class configures how features are materialized to offline and online stores.
Bases: BaseModel
Materialization options
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `event_time_col` | Column that contains the event time. Defaults to None. |
| `date_pattern` | Date pattern used in event_time_col. Defaults to "yyyy-MM-dd". |
| `offline` | Whether the feature group should be stored in the offline store. Defaults to True. |
| `online` | Whether the feature group should be stored in the online store. Defaults to False. |
| `serving_ttl_days` | Look-back window, in days, for features in the online store. Determines how long features live in the online store; shorter TTLs improve performance and reduce computation. Defaults to 1. For example, with a TTL of 1, only one day of data is available in the online store. |
| `force_update_online` | Force an update of the data in the online store, without checking whether the data being materialized is newer than the data already stored there. Defaults to False. |
| `online_write_mode` | Write mode when materializing to the online store. Defaults to "Append". |
| `schema_version` | Schema version to use for the feature group. Defaults to None. |
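To make the `serving_ttl_days` semantics concrete, here is a minimal plain-Python sketch of the look-back window check (an illustration of the behavior described above, not the library's implementation):

```python
from datetime import datetime, timedelta

def in_serving_window(event_time: datetime, now: datetime,
                      serving_ttl_days: int = 1) -> bool:
    """Illustrative check: a row is servable online only if its event
    time falls within the TTL look-back window ending at `now`."""
    return now - timedelta(days=serving_ttl_days) <= event_time <= now

now = datetime(2024, 1, 10, 12, 0)
print(in_serving_window(datetime(2024, 1, 10, 3, 0), now))  # within 1 day -> True
print(in_serving_window(datetime(2024, 1, 8, 3, 0), now))   # older than 1 day -> False
```

With `serving_ttl_days=1`, any row whose event time is more than one day older than the serving timestamp falls outside the window, which is why shorter TTLs keep the online store small.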
HistoricalFeatures¶
The HistoricalFeatures class retrieves historical feature data with point-in-time correctness for training ML models.
A class for retrieving historical features from a feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `lookups` | A list of FeatureLookup objects representing the features to retrieve. |
Functions¶
using_spine(spine: pd.DataFrame, date_col: Optional[str] = None, offset: int = 0, length: Optional[int] = None, keep_cols: Optional[List[str]] = None)
¶
Adds a spine DataFrame to the feature store serving pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
| `spine` | The spine DataFrame to add to the pipeline. |
| `date_col` | Name of the column containing the date to use for point-in-time joins. If not provided, point-in-time joins will not be performed. |
| `offset` | Number of days to use as a reference point for the join. For example, offset=3 with how='past' means that feature dates equal to (and older than) three days before the application date will be joined. Defaults to 0. |
| `length` | When how is not 'point in time', limits the period of feature dates to join. Defaults to no limit. |
| `keep_cols` | A list of column names to keep from the spine DataFrame. If not provided, no extra spine columns are kept. |
Source code in src/seeknal/featurestore/feature_group.py
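The offset behavior described above can be sketched with pandas (a hypothetical spine and feature table; `pd.merge_asof` stands in for the library's point-in-time join logic, it is not what seeknal runs internally):

```python
import pandas as pd

# Hypothetical spine: one row per (user_id, application date).
spine = pd.DataFrame({
    "user_id": ["u1", "u1"],
    "app_date": pd.to_datetime(["2024-01-05", "2024-01-10"]),
})
# Hypothetical materialized features, one row per (user_id, event_date).
features = pd.DataFrame({
    "user_id": ["u1", "u1", "u1"],
    "event_date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-09"]),
    "score": [0.1, 0.2, 0.3],
})

offset = 3  # join features dated at or before (app_date - offset days)
spine["cutoff"] = spine["app_date"] - pd.Timedelta(days=offset)
joined = pd.merge_asof(
    spine.sort_values("cutoff"),
    features.sort_values("event_date"),
    left_on="cutoff", right_on="event_date",
    by="user_id", direction="backward",
)
print(joined[["user_id", "app_date", "event_date", "score"]])
```

Each spine row picks up the most recent feature row at or before its cutoff date, so with offset=3 the 2024-01-10 application sees the 2024-01-04 features rather than the fresher 2024-01-09 ones.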
to_dataframe(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None) -> DataFrame
¶
Returns a pandas DataFrame containing the transformed feature data within the specified time range.
| PARAMETER | DESCRIPTION |
|---|---|
| `feature_start_time` | The start time of the range used to filter the feature data. |
| `feature_end_time` | The end time of the range used to filter the feature data. |
Source code in src/seeknal/featurestore/feature_group.py
OnlineFeatures¶
The OnlineFeatures class serves features in real-time for model inference with low-latency access patterns.
GetLatestTimeStrategy¶
The GetLatestTimeStrategy class defines strategies for retrieving the latest feature values based on timestamp.
OfflineStore¶
The OfflineStore class configures offline storage backends for batch feature storage. Supports Hive tables and Delta files.
Configuration for offline feature store storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `value` | The storage configuration (path for FILE, database for HIVE_TABLE). A security warning is logged if a file path is in an insecure location (e.g., /tmp). |
| `kind` | The storage type (FILE or HIVE_TABLE). |
| `name` | Optional name for this offline store configuration. |
Functions¶
get_or_create()
¶
Retrieve an existing offline store or create a new one.
If an offline store with the specified name exists, it is retrieved and its configuration is loaded. Otherwise, a new offline store is created with the current configuration.
| RETURNS | DESCRIPTION |
|---|---|
| `OfflineStore` | The current instance with id populated. |
Note
The store name defaults to "default" if not specified.
Source code in src/seeknal/featurestore/featurestore.py
list()
staticmethod
¶
List all registered offline stores.
Displays a formatted table of all offline stores with their names, kinds, and configuration values. If no stores are found, displays an appropriate message.
| RETURNS | DESCRIPTION |
|---|---|
| `None` | Output is printed to the console. |
Source code in src/seeknal/featurestore/featurestore.py
delete(spark: Optional[SparkSession] = None, *args, **kwargs) -> bool
¶
Delete storage for a feature group from the offline store.
For FILE type: Deletes the directory containing the delta table. For HIVE_TABLE type: Drops the Hive table using Spark SQL.
| PARAMETER | DESCRIPTION |
|---|---|
| `spark` | SparkSession instance (required for HIVE_TABLE, optional for FILE). |
| `**kwargs` | Must include 'project' and 'entity' to construct the table name. |

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | True if deletion was successful or the resource didn't exist. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If required kwargs (project, entity) are missing. |
Source code in src/seeknal/featurestore/featurestore.py
OnlineStore¶
The OnlineStore class configures online storage backends for real-time feature serving.
Configuration for online feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `value` | The storage configuration (file path or Hive table). A security warning is logged if a file path is in an insecure location (e.g., /tmp). |
| `kind` | Type of online store (FILE or HIVE_TABLE). |
| `name` | Optional name for the store. |
Functions¶
delete(*args, **kwargs)
¶
Delete feature data from the online store.
Removes the directory containing the feature data for the specified project and feature name.
| PARAMETER | DESCRIPTION |
|---|---|
| `*args` | Additional positional arguments (unused). |
| `**kwargs` | Keyword arguments, including project (str), the project name used for file naming, and name (str), the feature name used for file naming. |

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | True if the deletion was successful or the path did not exist. |
Source code in src/seeknal/featurestore/featurestore.py
Feature¶
The Feature class represents an individual feature definition with name, data type, and fill null handling.
Bases: BaseModel
Define a Feature.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `name` | Feature name. |
| `feature_id` | Feature ID (assigned by seeknal). |
| `description` | Feature description. |
| `data_type` | Data type for the feature. |
| `online_data_type` | Data type when stored in the online store. |
| `created_at` | Creation timestamp. |
| `updated_at` | Last update timestamp. |
Functions¶
to_dict()
¶
Convert the feature definition to a dictionary representation.
Creates a dictionary suitable for API requests with metadata and data type information.
| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with structure: metadata (dict containing 'name' and optionally 'description'), datatype (the feature's data type), and onlineDatatype (the feature's online store data type). |
Source code in src/seeknal/featurestore/featurestore.py
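Based on the structure described above, the returned dictionary can be sketched as plain Python (an illustrative shape, not output generated by the library; the sample feature name, description, and type values are made up):

```python
# Illustrative shape of Feature.to_dict() output, following the
# documented structure: metadata / datatype / onlineDatatype.
feature_dict = {
    "metadata": {
        "name": "age",                        # hypothetical feature name
        "description": "User age in years",   # omitted when not set
    },
    "datatype": "int",
    "onlineDatatype": "int",
}
print(sorted(feature_dict))
```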
FillNull¶
The FillNull class defines how null values should be handled for features.
Bases: BaseModel
Configuration for filling null values in columns.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `value` | Value to use for filling nulls. |
| `dataType` | Data type for the value (e.g., 'double', 'string'). |
| `columns` | Optional list of columns to fill. If None, applies to all columns. |
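The fill behavior can be illustrated with pandas' `fillna` (an analogy for the semantics, not the library's code; the column names and the `FillNull(value="0", dataType="double", columns=["age"])` configuration are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"age": [25.0, None, 40.0], "city": ["a", None, "c"]})

# A FillNull like value="0", dataType="double", columns=["age"] would
# behave roughly like: cast the value to double, fill only "age".
value, columns = 0.0, ["age"]
df[columns] = df[columns].fillna(value)
print(df["age"].tolist())  # [25.0, 0.0, 40.0]; "city" nulls are untouched
```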
Storage Enums¶
Enumerations for storage backend configuration.
OfflineStoreEnum¶
Bases: str, Enum
Enumeration of supported offline storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `HIVE_TABLE` | Store features as a Hive table in a database. |
| `FILE` | Store features as files on the filesystem (e.g., Delta format). |
| `ICEBERG` | Store features in Apache Iceberg tables with ACID transactions, time travel, and cloud storage compatibility. |
OnlineStoreEnum¶
Bases: str, Enum
Enumeration of supported online storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `HIVE_TABLE` | Store features as a Hive table for online serving. |
| `FILE` | Store features as Parquet files for online serving. |
FileKindEnum¶
Bases: str, Enum
Enumeration of supported file formats for feature storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `DELTA` | Delta Lake format, providing ACID transactions and versioning. |
Output Configurations¶
Classes for configuring feature store output destinations.
FeatureStoreHiveTableOutput¶
FeatureStoreFileOutput¶
Configuration for file-based feature store output.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `path` | The filesystem path for storing feature data. A security warning is logged if this path is in an insecure location (e.g., /tmp). |
| `kind` | The file format to use (default: DELTA). |
Module Reference¶
Complete module reference for the featurestore package.
Feature store module for managing and serving ML features.
This module provides the core feature store functionality for Seeknal, enabling storage, management, and serving of machine learning features for both offline (batch) and online (real-time) use cases.
Key Components
- FeatureGroup: Define and manage groups of features with customizable materialization options for offline and online storage.
- FeatureLookup: Specify feature lookups from feature groups with optional feature selection and exclusion.
- HistoricalFeatures: Retrieve historical feature data with point-in-time correctness for training ML models.
- OnlineFeatures: Serve features in real-time for model inference with low-latency access patterns.
- OfflineStore: Configure offline storage backends (Hive tables or Delta files).
- OnlineStore: Configure online storage backends for real-time serving.
Storage Backends
- Hive Tables: Managed table storage with SQL access.
- Delta Files: File-based storage with ACID transactions and time travel.
Typical Usage
```python
from seeknal.featurestore import FeatureGroup, FeatureLookup
from seeknal.featurestore import HistoricalFeatures, OnlineFeatures

# Define a feature group
feature_group = FeatureGroup(
    name="user_features",
    entity=user_entity,
    materialization=Materialization(
        offline=True,
        online=True,
    ),
)

# Create and materialize features
feature_group.set_flow(my_flow).set_features().get_or_create()
feature_group.write()

# Retrieve historical features for training
historical = HistoricalFeatures(
    lookups=[FeatureLookup(source=feature_group)]
)
training_df = historical.using_spine(spine_df).to_dataframe()

# Serve features online for inference
online = OnlineFeatures(
    lookup_key=user_entity,
    lookups=[FeatureLookup(source=feature_group)],
)
features = online.get_features(keys=[{"user_id": "123"}])
```
See Also
seeknal.entity: Entity definitions for feature store join keys. seeknal.flow: Data flow definitions for feature transformations. seeknal.tasks: Task definitions for data processing pipelines.
Classes¶
Modules¶
duckdbengine
¶
DuckDB-based Feature Store implementation.
This module provides a feature store implementation using DuckDB instead of Spark, enabling in-process feature engineering and serving without distributed infrastructure.
Classes¶
OfflineStoreDuckDB(value: Optional[Union[str, FeatureStoreFileOutput]] = None, kind: OfflineStoreEnum = OfflineStoreEnum.PARQUET, name: Optional[str] = None, connection: Optional[duckdb.DuckDBPyConnection] = None)
dataclass
¶
DuckDB-based offline feature store.
Stores features in DuckDB tables or Parquet files with metadata tracking. Provides ACID-like guarantees through atomic file operations and metadata.
Functions¶
write(df: pd.DataFrame, project: str, entity: str, name: str, mode: str = 'overwrite', start_date: Optional[datetime] = None, end_date: Optional[datetime] = None, **kwargs) -> None
¶Write features to offline store.
| PARAMETER | DESCRIPTION |
|---|---|
| `df` | Features to write. |
| `project` | Project name. |
| `entity` | Entity name. |
| `name` | Feature group name. |
| `mode` | Write mode: 'overwrite', 'append', or 'merge'. |
| `start_date` | Start date for this batch. |
| `end_date` | End date for this batch. |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
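The 'overwrite' and 'append' modes can be sketched with plain pandas frames (a toy in-memory stand-in for the DuckDB/Parquet backend; the `store` dict and key string are hypothetical, and 'merge' is omitted):

```python
import pandas as pd

store: dict[str, pd.DataFrame] = {}  # stand-in for the backing storage

def write(df: pd.DataFrame, key: str, mode: str = "overwrite") -> None:
    # 'overwrite' replaces any existing data; 'append' concatenates new rows.
    if mode == "append" and key in store:
        store[key] = pd.concat([store[key], df], ignore_index=True)
    else:
        store[key] = df.reset_index(drop=True)

write(pd.DataFrame({"user_id": ["u1"], "score": [0.1]}), "proj/user/fg")
write(pd.DataFrame({"user_id": ["u2"], "score": [0.2]}), "proj/user/fg",
      mode="append")
print(len(store["proj/user/fg"]))  # 2 rows after the append
```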
read(project: str, entity: str, name: str, start_date: Optional[datetime] = None, end_date: Optional[datetime] = None, **kwargs) -> pd.DataFrame
¶Read features from offline store.
| PARAMETER | DESCRIPTION |
|---|---|
| `project` | Project name. |
| `entity` | Entity name. |
| `name` | Feature group name. |
| `start_date` | Filter start date. |
| `end_date` | Filter end date. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | DataFrame with features. |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
get_watermarks(project: str, entity: str, name: str) -> list
¶Get watermarks for a feature group.
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
delete(project: str, entity: str, name: str) -> bool¶
Delete a feature group from the offline store.
Removes all data files and metadata associated with the feature group.
| PARAMETER | DESCRIPTION |
|---|---|
| project | Project name. TYPE: str |
| entity | Entity name. TYPE: str |
| name | Feature group name. TYPE: str |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if deletion was successful, False otherwise |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
OnlineStoreDuckDB(value: Optional[Union[str, FeatureStoreFileOutput]] = None, kind: OnlineStoreEnum = OnlineStoreEnum.DUCKDB_TABLE, name: Optional[str] = None, connection: Optional[duckdb.DuckDBPyConnection] = None) dataclass¶
DuckDB-based online feature store.
Stores features for low-latency serving using DuckDB tables.
Functions¶
write(df: pd.DataFrame, table_name: str, **kwargs) -> None¶
Write features to the online store.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Features to write. TYPE: pd.DataFrame |
| table_name | Name of the online table. TYPE: str |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
read(table_name: str, keys: Optional[list] = None, **kwargs) -> pd.DataFrame¶
Read features from the online store.
| PARAMETER | DESCRIPTION |
|---|---|
| table_name | Name of the online table. TYPE: str |
| keys | Filter by specific keys (list of dicts with key-value pairs). TYPE: Optional[list], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with features |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
delete(name: str, project: str, entity: Optional[str] = None) -> bool¶
Delete an online table from the store.
Removes the DuckDB table and any associated Parquet files.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Table name. TYPE: str |
| project | Project name. TYPE: str |
| entity | Entity name (optional; used for file-based storage paths). TYPE: Optional[str], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if deletion was successful, False otherwise |
Source code in src/seeknal/featurestore/duckdbengine/featurestore.py
FeatureStoreFileOutput(path: str) dataclass¶
File-based feature store output configuration.
FeatureGroupDuckDB(name: str, entity: Entity, materialization: Materialization = Materialization(), dataframe: Optional[pd.DataFrame] = None, description: Optional[str] = None, features: Optional[List[str]] = None, project: str = 'default', offline_watermarks: List[str] = list(), online_watermarks: List[str] = list(), version: Optional[int] = None) dataclass¶
DuckDB-based Feature Group.
A feature group is a collection of related features computed from source data.
Functions¶
set_dataframe(dataframe: pd.DataFrame) -> FeatureGroupDuckDB¶
Set a pandas DataFrame as the source for this feature group.
set_features(features: Optional[List[str]] = None) -> FeatureGroupDuckDB¶
Set which features to include.
If features is None, all columns except entity keys and event_time are used.
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
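The default selection rule above can be sketched in a few lines. This is an illustration of the rule, not the library's code; the column names and the `event_time` default are assumptions for the example.

```python
# Sketch of the default feature-selection rule: every column except the
# entity join keys and the event_time column becomes a feature.
# Column names below are illustrative.

def infer_features(columns, join_keys, event_time="event_time"):
    excluded = set(join_keys) | {event_time}
    return [c for c in columns if c not in excluded]

cols = ["user_id", "event_time", "clicks_7d", "purchases_30d"]
print(infer_features(cols, join_keys=["user_id"]))  # → ['clicks_7d', 'purchases_30d']
```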
write(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None, mode: str = 'overwrite') -> None¶
Materialize features to the offline/online stores.
| PARAMETER | DESCRIPTION |
|---|---|
| feature_start_time | Start time for this batch. TYPE: Optional[datetime], DEFAULT: None |
| feature_end_time | End time for this batch. TYPE: Optional[datetime], DEFAULT: None |
| mode | Write mode: 'overwrite', 'append', or 'merge'. TYPE: str, DEFAULT: 'overwrite' |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
get_or_create() -> FeatureGroupDuckDB¶
Get or create this feature group (idempotent operation).
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
delete() -> bool¶
Delete this feature group.
Removes all data files from the offline store.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if deletion was successful |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
HistoricalFeaturesDuckDB(lookups: List[FeatureLookup], fill_nulls: Optional[List[FillNull]] = None, spine: Optional[pd.DataFrame] = None, date_col: Optional[str] = None, keep_cols: Optional[List[str]] = None, latest_strategy: Optional[GetLatestTimeStrategy] = None) dataclass¶
Point-in-time historical feature retrieval using DuckDB.
Performs point-in-time correct joins to get features as they existed at specific points in time.
Functions¶
to_dataframe(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None) -> pd.DataFrame¶
Retrieve historical features as a DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
| feature_start_time | Start time for features. TYPE: Optional[datetime], DEFAULT: None |
| feature_end_time | End time for features. TYPE: Optional[datetime], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with features |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
using_spine(spine: pd.DataFrame, date_col: str, keep_cols: Optional[List[str]] = None) -> HistoricalFeaturesDuckDB¶
Use a spine (entity-date pairs) for point-in-time feature retrieval.
| PARAMETER | DESCRIPTION |
|---|---|
| spine | DataFrame with entity keys and application dates. TYPE: pd.DataFrame |
| date_col | Name of the date column in the spine. TYPE: str |
| keep_cols | Columns from the spine to keep in the result. TYPE: Optional[List[str]], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| HistoricalFeaturesDuckDB | Self, for chaining |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
using_latest(fetch_strategy: GetLatestTimeStrategy = GetLatestTimeStrategy.REQUIRE_ALL) -> HistoricalFeaturesDuckDB¶
Get the latest available features.
| PARAMETER | DESCRIPTION |
|---|---|
| fetch_strategy | Strategy for handling missing features. TYPE: GetLatestTimeStrategy, DEFAULT: GetLatestTimeStrategy.REQUIRE_ALL |
| RETURNS | DESCRIPTION |
|---|---|
| HistoricalFeaturesDuckDB | Self, for chaining |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
to_dataframe_with_spine() -> pd.DataFrame¶
Retrieve features using a point-in-time join with the spine.
For each row in the spine, gets features as they existed at or before the application date.
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
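The point-in-time rule above can be pictured as: for each spine row, take the feature row for that entity with the latest event time at or before the spine's application date. A minimal pure-Python sketch of that rule (not the DuckDB implementation; the `user_id` and `event_time` names are illustrative):

```python
from datetime import date

# Illustrative point-in-time lookup: for one (entity, as_of) pair, return
# the feature row with the latest event_time <= as_of, or None if no row
# existed yet at that date.
def pit_lookup(feature_rows, key, as_of):
    candidates = [
        r for r in feature_rows
        if r["user_id"] == key and r["event_time"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["event_time"])

rows = [
    {"user_id": 1, "event_time": date(2024, 1, 1), "clicks": 3},
    {"user_id": 1, "event_time": date(2024, 1, 5), "clicks": 7},
]
# As of Jan 3, only the Jan 1 row existed: no leakage from the Jan 5 row.
print(pit_lookup(rows, 1, date(2024, 1, 3))["clicks"])  # → 3
```

This "at or before" constraint is what makes training sets built from a spine free of label leakage: a row never sees feature values computed after its application date.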
serve(name: Optional[str] = None, target: Optional[OnlineStoreDuckDB] = None, ttl: Optional[timedelta] = None) -> OnlineFeaturesDuckDB¶
Materialize features to the online store for serving.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Name for the online table. TYPE: Optional[str], DEFAULT: None |
| target | Target online store. TYPE: Optional[OnlineStoreDuckDB], DEFAULT: None |
| ttl | Time-to-live for features. TYPE: Optional[timedelta], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| OnlineFeaturesDuckDB | OnlineFeaturesDuckDB instance for serving |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
OnlineFeaturesDuckDB(name: str, lookup_key: Entity, online_store: Optional[OnlineStoreDuckDB] = None, lookups: Optional[List[FeatureLookup]] = None, project: str = 'default', id: Optional[str] = None) dataclass¶
Online feature serving using DuckDB.
Provides low-latency feature lookups for real-time predictions. Also known as OnlineTableDuckDB in some contexts.
Functions¶
get_features(keys: List[Union[Entity, Dict[str, Any]]]) -> pd.DataFrame¶
Get features for specific entity keys.
| PARAMETER | DESCRIPTION |
|---|---|
| keys | List of entity instances or key dictionaries. TYPE: List[Union[Entity, Dict[str, Any]]] |
| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with features for the requested keys |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
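Conceptually, a key-dictionary lookup keeps the rows whose key columns match any of the requested dicts. The sketch below illustrates that matching rule only; the real OnlineFeaturesDuckDB queries DuckDB, and the column names here are assumptions.

```python
# Illustration of key-based online lookup: keep the rows whose columns
# match every key/value pair of at least one requested key dict.

def lookup(rows, keys):
    def matches(row, key):
        return all(row.get(k) == v for k, v in key.items())
    return [row for row in rows if any(matches(row, key) for key in keys)]

table = [
    {"user_id": 1, "score": 0.9},
    {"user_id": 2, "score": 0.4},
    {"user_id": 3, "score": 0.7},
]
print(lookup(table, keys=[{"user_id": 1}, {"user_id": 3}]))
```

Supporting multi-column key dicts (rather than a single key value) is what allows composite entities, e.g. `{"user_id": 1, "region": "eu"}`.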
delete() -> bool¶
Delete this online table.
Removes all data files from the online store and cleans up metadata from the database.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if deletion was successful |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If file deletion fails (metadata cleanup is still attempted) |
Source code in src/seeknal/featurestore/duckdbengine/feature_group.py
FeatureLookup(source: FeatureGroupDuckDB, features: Optional[List[str]] = None) dataclass¶
Defines which features to retrieve from a feature group.
FillNull(value: str, dataType: str) dataclass¶
Configuration for filling null values.
GetLatestTimeStrategy¶
Bases: str, Enum
Strategy for getting latest features.
Materialization(event_time_col: Optional[str] = None, offline: bool = True, online: bool = False, offline_store: Optional[OfflineStoreDuckDB] = None, online_store: Optional[OnlineStoreDuckDB] = None) dataclass¶
Materialization configuration for a feature group.
Modules¶
feature_group¶
DuckDB-based Feature Group implementation.
Provides feature group management, historical feature retrieval, and online serving using DuckDB instead of Spark.
Classes¶
This module defines the classes documented in full above:
GetLatestTimeStrategy: Strategy for getting the latest features.
FillNull: Configuration for filling null values.
FeatureLookup: Defines which features to retrieve from a feature group.
Materialization: Materialization configuration for a feature group.
FeatureGroupDuckDB: DuckDB-based feature group.
HistoricalFeaturesDuckDB: Point-in-time historical feature retrieval using DuckDB.
OnlineFeaturesDuckDB: Online feature serving using DuckDB.
featurestore¶
DuckDB-based offline and online feature stores.
This module replaces Spark/Delta Lake with DuckDB for feature storage and retrieval.
Classes¶
OfflineStoreEnum¶
Bases: str, Enum
Offline store backend types.
OnlineStoreEnum¶
Bases: str, Enum
Online store backend types.
This module also defines the classes documented in full above:
FeatureStoreFileOutput: File-based feature store output configuration.
OfflineStoreDuckDB: DuckDB-based offline feature store.
OnlineStoreDuckDB: DuckDB-based online feature store.
feature_group¶
Classes¶
OfflineStoreEnum¶
Bases: str, Enum
Enumeration of supported offline storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| HIVE_TABLE | Store features as a Hive table in a database. |
| FILE | Store features as files on the filesystem (e.g., Delta format). |
| ICEBERG | Store features in Apache Iceberg tables with ACID transactions, time travel, and cloud storage compatibility. |
OnlineStoreEnum¶
Bases: str, Enum
Enumeration of supported online storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| HIVE_TABLE | Store features as a Hive table for online serving. |
| FILE | Store features as Parquet files for online serving. |
FileKindEnum¶
Bases: str, Enum
Enumeration of supported file formats for feature storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| DELTA | Delta Lake format, providing ACID transactions and versioning. |
FeatureStoreFileOutput(path: str, kind: FileKindEnum = FileKindEnum.DELTA) dataclass¶
Configuration for file-based feature store output.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| path | The filesystem path for storing feature data. A security warning is logged if this path is in an insecure location (e.g., /tmp). TYPE: str |
| kind | The file format to use. TYPE: FileKindEnum, DEFAULT: FileKindEnum.DELTA |
FeatureStoreHiveTableOutput(database: str) dataclass¶
Configuration for Hive-table-based feature store output.
IcebergStoreOutput(table: str, catalog: str = 'lakekeeper', namespace: str = 'default', warehouse: Optional[str] = None, mode: str = 'append') dataclass¶
Iceberg storage configuration for feature group materialization.
This configuration enables storing features in Apache Iceberg tables with ACID transactions, time travel, and cloud storage compatibility.
| PARAMETER | DESCRIPTION |
|---|---|
| table | Table name within the namespace. TYPE: str |
| catalog | Catalog name from profiles.yml. TYPE: str, DEFAULT: 'lakekeeper' |
| namespace | Iceberg namespace/database. TYPE: str, DEFAULT: 'default' |
| warehouse | Optional warehouse path override (s3://, gs://, azure://). TYPE: Optional[str], DEFAULT: None |
| mode | Write mode: 'append' or 'overwrite'. TYPE: str, DEFAULT: 'append' |
OfflineStore(value: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput, IcebergStoreOutput]] = None, kind: OfflineStoreEnum = OfflineStoreEnum.HIVE_TABLE, name: Optional[str] = None) dataclass¶
Configuration for offline feature store storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| value | The storage configuration (path for FILE, database for HIVE_TABLE). A security warning is logged if a file path is in an insecure location (e.g., /tmp). TYPE: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput, IcebergStoreOutput]] |
| kind | The storage type (FILE, HIVE_TABLE, or ICEBERG). TYPE: OfflineStoreEnum, DEFAULT: OfflineStoreEnum.HIVE_TABLE |
| name | Optional name for this offline store configuration. TYPE: Optional[str], DEFAULT: None |
Functions¶
get_or_create()¶
Retrieve an existing offline store or create a new one.
If an offline store with the specified name exists, it is retrieved and its configuration is loaded. Otherwise, a new offline store is created with the current configuration.
| RETURNS | DESCRIPTION |
|---|---|
| OfflineStore | The current instance with id populated. |
Note
The store name defaults to "default" if not specified.
Source code in src/seeknal/featurestore/featurestore.py
list() staticmethod¶
List all registered offline stores.
Displays a formatted table of all offline stores with their names, kinds, and configuration values. If no stores are found, an appropriate message is displayed.
| RETURNS | DESCRIPTION |
|---|---|
| None | Output is printed to the console. |
Source code in src/seeknal/featurestore/featurestore.py
delete(spark: Optional[SparkSession] = None, *args, **kwargs) -> bool¶
Delete storage for a feature group from the offline store.
For the FILE type, deletes the directory containing the Delta table. For the HIVE_TABLE type, drops the Hive table using Spark SQL.
| PARAMETER | DESCRIPTION |
|---|---|
| spark | SparkSession instance (required for HIVE_TABLE, optional for FILE). TYPE: Optional[SparkSession], DEFAULT: None |
| **kwargs | Must include 'project' and 'entity' to construct the table name. |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if deletion was successful or the resource did not exist. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If required kwargs (project, entity) are missing. |
Source code in src/seeknal/featurestore/featurestore.py
OnlineStore(value: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput]] = None, kind: OnlineStoreEnum = OnlineStoreEnum.FILE, name: Optional[str] = None) dataclass¶
Configuration for the online feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| value | The storage configuration (file path or Hive table). A security warning is logged if a file path is in an insecure location (e.g., /tmp). TYPE: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput]] |
| kind | Type of online store (FILE or HIVE_TABLE). TYPE: OnlineStoreEnum, DEFAULT: OnlineStoreEnum.FILE |
| name | Optional name for the store. TYPE: Optional[str], DEFAULT: None |
Functions¶
delete(*args, **kwargs)¶
Delete feature data from the online store.
Removes the directory containing the feature data for the specified project and feature name.
| PARAMETER | DESCRIPTION |
|---|---|
| *args | Additional positional arguments (unused). |
| **kwargs | Keyword arguments including: project (str), the project name for file naming; name (str), the feature name for file naming. |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the deletion was successful or if the path did not exist. |
Source code in src/seeknal/featurestore/featurestore.py
OfflineMaterialization¶
Bases: BaseModel
Configuration for offline materialization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| store | The offline store configuration. |
| mode | Write mode ('overwrite', 'append', or 'merge'). |
| ttl | Time-to-live in days for data retention. |
OnlineMaterialization¶
Bases: BaseModel
Configuration for online materialization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| store | The online store configuration. |
| ttl | Time-to-live in minutes for online store data. |
Materialization¶
Bases: BaseModel
Materialization options.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| event_time_col | The column that contains the event time. Defaults to None. |
| date_pattern | Date pattern used in event_time_col. Defaults to "yyyy-MM-dd". |
| offline | Whether the feature group should be stored in the offline store. Defaults to True. |
| online | Whether the feature group should be stored in the online store. Defaults to False. |
| serving_ttl_days | Look-back window for features in the online store, in days. This parameter determines how long features live in the online store; shorter TTLs improve performance and reduce computation. Defaults to 1, meaning only one day of data is available in the online store. |
| force_update_online | Force an update of the data in the online store, without checking whether the data being materialized is newer than the data already stored there. Defaults to False. |
| online_write_mode | Write mode when materializing to the online store. Defaults to "Append". |
| schema_version | The schema version for the feature group. Defaults to None. |
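The serving_ttl_days option behaves like a sliding look-back filter over event times. A minimal sketch of that semantics (an illustration, not the library's implementation; dates and column names are assumptions):

```python
from datetime import date, timedelta

# Sketch of serving_ttl_days: with a TTL of n days, only rows whose
# event time falls within the last n days (relative to "now") remain
# visible in the online store.

def apply_serving_ttl(rows, now, serving_ttl_days=1):
    cutoff = now - timedelta(days=serving_ttl_days)
    return [r for r in rows if r["event_time"] > cutoff]

rows = [
    {"user_id": 1, "event_time": date(2024, 3, 1)},
    {"user_id": 2, "event_time": date(2024, 3, 3)},
]
# With the default TTL of 1 day, only the most recent day survives.
print(apply_serving_ttl(rows, now=date(2024, 3, 3)))
```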
FeatureGroup(name: str, entity: Optional[Entity] = None, id: Optional[str] = None) dataclass¶
Bases: FeatureStore
A feature group representing a set of features created from a data source.
A FeatureGroup is a logical grouping of related features that share the same entity and are typically computed from the same data source (Flow or DataFrame). It supports both offline and online materialization for feature storage and serving.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| name | The unique name of the feature group within the project. |
| materialization | Configuration for how features are stored and served, including offline/online storage settings and TTL configuration. |
| source | The data source for computing features. Can be a Flow pipeline or a Spark DataFrame. If a Flow, its output will be set to SPARK_DATAFRAME. |
| description | Optional human-readable description of the feature group. |
| features | List of Feature objects to register. If None, all columns except join_keys and event_time will be registered as features. |
| tag | Optional list of tags for categorizing and filtering feature groups. |
| validation_config | Configuration for data validation, including the validators to run and the validation mode (FAIL or WARN). When set, enables declarative validation that can be used with the validate() method. |
| feature_group_id | Unique identifier assigned by the system after creation. |
| offline_watermarks | List of timestamps indicating when offline data was materialized. Used for tracking data freshness. |
| online_watermarks | List of timestamps indicating when online data was materialized. Used for tracking data freshness. |
| version | Schema version number for the feature group. |
| created_at | Timestamp when the feature group was created. |
| updated_at | Timestamp when the feature group was last updated. |
| avro_schema | Avro schema definition for the feature data structure. |
Example
from seeknal.featurestore import FeatureGroup, Materialization
from seeknal.entity import Entity
# Create a feature group from a flow
fg = FeatureGroup(
    name="user_features",
    materialization=Materialization(event_time_col="event_date"),
)
fg.entity = Entity(name="user", join_keys=["user_id"])
fg.set_flow(my_flow).set_features().get_or_create()
Note
A SparkContext must be active before creating a FeatureGroup instance. Initialize your Project and Workspace first if you encounter SparkContext errors.
Functions¶
set_flow(flow: Flow)¶
Set the data flow pipeline as the source for this feature group.
Configures a Flow as the data source for computing features. The flow's output will automatically be set to SPARK_DATAFRAME if not already configured. If the flow has not been persisted, it will be created.
| PARAMETER | DESCRIPTION |
|---|---|
| flow | The Flow pipeline to use for feature computation. TYPE: Flow |
| RETURNS | DESCRIPTION |
|---|---|
| FeatureGroup | The current instance, for method chaining. |
Example
fg = FeatureGroup(name="user_features")
fg.set_flow(my_transformation_flow)
Source code in src/seeknal/featurestore/feature_group.py
set_dataframe(dataframe: DataFrame)¶
Set a Spark DataFrame as the source for this feature group.
Configures a pre-computed Spark DataFrame as the data source for features. This is useful when features have already been computed, or when using ad-hoc data that does not require a Flow pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
| dataframe | The Spark DataFrame containing the feature data. TYPE: DataFrame |
| RETURNS | DESCRIPTION |
|---|---|
| FeatureGroup | The current instance, for method chaining. |
Example
fg = FeatureGroup(name="user_features")
fg.set_dataframe(my_spark_df)
Source code in src/seeknal/featurestore/feature_group.py
set_validation_config(config: ValidationConfig)¶
Set the validation configuration for this feature group.
| PARAMETER | DESCRIPTION |
|---|---|
| config | Validation configuration containing the validators to run and the validation mode (FAIL or WARN). TYPE: ValidationConfig |
| RETURNS | DESCRIPTION |
|---|---|
| self | The FeatureGroup instance, for method chaining. |
Example
from seeknal.feature_validation.models import ValidationConfig, ValidatorConfig
config = ValidationConfig(
    mode=ValidationMode.WARN,
    validators=[
        ValidatorConfig(validator_type="null", columns=["user_id"]),
        ValidatorConfig(validator_type="range", columns=["age"],
                        params={"min_val": 0, "max_val": 120}),
    ],
)
feature_group.set_validation_config(config)
Source code in src/seeknal/featurestore/feature_group.py
validate(validators: List[BaseValidator], mode: Union[str, ValidationMode] = ValidationMode.FAIL, reference_date: Optional[str] = None) -> ValidationSummary¶
Validate the feature group data using the provided validators.
This method runs a list of validators against the feature group's data and returns a summary of the validation results. Validation can be configured either to warn on failures and continue execution, or to fail immediately on the first validation failure.
| PARAMETER | DESCRIPTION |
|---|---|
| validators | List of validators to run against the feature group data. Each validator should be an instance of a class that inherits from BaseValidator (e.g., NullValidator, RangeValidator, UniquenessValidator, FreshnessValidator, or CustomValidator). TYPE: List[BaseValidator] |
| mode | Validation execution mode. ValidationMode.FAIL or "fail": raise an exception on the first failure. ValidationMode.WARN or "warn": log failures but continue execution. TYPE: Union[str, ValidationMode], DEFAULT: ValidationMode.FAIL |
| reference_date | Reference date for running the source Flow. Only used when the source is a Flow. TYPE: Optional[str], DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| ValidationSummary | A summary containing all validation results, pass/fail status, and counts. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the source is not set (enforced by the @require_set_source decorator). |
| ValidationException | If mode is FAIL and any validator fails. |
Example
from seeknal.feature_validation.validators import NullValidator, RangeValidator
from seeknal.feature_validation.models import ValidationMode

# Create validators
validators = [
    NullValidator(columns=["user_id", "email"]),
    RangeValidator(column="age", min_val=0, max_val=120)
]

# Run validation in warn mode (continues on failures)
summary = feature_group.validate(validators, mode=ValidationMode.WARN)
print(f"Passed: {summary.passed}, Failed: {summary.failed_count}")

# Run validation in fail mode (stops on first failure)
try:
    summary = feature_group.validate(validators, mode="fail")
except ValidationException as e:
    print(f"Validation failed: {e.message}")
Source code in src/seeknal/featurestore/feature_group.py
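The FAIL/WARN semantics above can be illustrated with a small, self-contained sketch. This is not the seeknal implementation; the validator callables and record data below are hypothetical stand-ins:

```python
# Illustrative sketch (not the seeknal implementation): how FAIL vs. WARN
# validation modes can be interpreted when running a list of validators.
import logging

class ValidationException(Exception):
    pass

def run_validators(validators, data, mode="fail"):
    """Run callables returning (passed, message); honor fail/warn mode."""
    results = []
    for validator in validators:
        passed, message = validator(data)
        results.append((passed, message))
        if not passed:
            if mode == "fail":
                # FAIL mode: raise on the first failing validator
                raise ValidationException(message)
            # WARN mode: log the failure and keep going
            logging.warning(message)
    return results

# Hypothetical validators over a list of records
not_null = lambda rows: (all(r.get("user_id") is not None for r in rows),
                         "null user_id found")
in_range = lambda rows: (all(0 <= r.get("age", 0) <= 120 for r in rows),
                         "age out of range")

rows = [{"user_id": 1, "age": 30}, {"user_id": None, "age": 200}]
results = run_validators([not_null, in_range], rows, mode="warn")
failed = sum(1 for passed, _ in results if not passed)
```

In WARN mode both validators run and both failures are recorded; in FAIL mode the first failure raises immediately, matching the behavior documented for `validate`.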
set_features(features: Optional[List[Feature]] = None, reference_date: Optional[str] = None)
¶Set the features to be used for this feature group. If features is None, all columns except join_keys and event_time are used as features.
| PARAMETER | DESCRIPTION |
|---|---|
features
|
Specify the features explicitly. If None, features are automatically derived from the transformation result. The user may also provide only a feature name and description; the data type is then fetched automatically from the transformation result. Defaults to None.
TYPE:
|
reference_date
|
Reference date used when fetching features from the transformation result. Defaults to None.
TYPE:
|
validate_with_source
|
If set to true, the specified features are validated against the transformation result. Defaults to True.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If a specified feature is not found in the transformation result.
| RETURNS | DESCRIPTION |
|---|---|
|
Populate features of the feature group |
Source code in src/seeknal/featurestore/feature_group.py
get_or_create(version=None)
¶The get_or_create function retrieves an existing feature group or creates a new one based on the
provided parameters.
| PARAMETER | DESCRIPTION |
|---|---|
version
|
The version of the feature group to retrieve. If a version is provided, the feature group with that specific version is loaded. If no version is provided, the latest version of the feature group is loaded.
DEFAULT:
None
|
| RETURNS | DESCRIPTION |
|---|---|
|
Returns the current FeatureGroup instance after performing the retrieval or creation operations and updating its attributes. |
Source code in src/seeknal/featurestore/feature_group.py
update_materialization(offline: Optional[bool] = None, online: Optional[bool] = None, offline_materialization: Optional[OfflineMaterialization] = None, online_materialization: Optional[OnlineMaterialization] = None)
¶Update the materialization settings for this feature group.
Modifies the materialization configuration and persists the changes to the feature store backend. This allows changing storage settings, TTL values, and enabling/disabling offline or online storage.
| PARAMETER | DESCRIPTION |
|---|---|
offline
|
Enable or disable offline storage. If None, keeps current setting.
TYPE:
|
online
|
Enable or disable online storage. If None, keeps current setting.
TYPE:
|
offline_materialization
|
New offline materialization configuration. If None, keeps current setting.
TYPE:
|
online_materialization
|
New online materialization configuration. If None, keeps current setting.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FeatureGroup
|
The current instance for method chaining. |
Example
fg.update_materialization(
    online=True,
    online_materialization=OnlineMaterialization(ttl=2880)
)
Source code in src/seeknal/featurestore/feature_group.py
list_versions()
¶List all versions of this feature group.
Returns a list of dictionaries containing version metadata including:
- version: The version number
- avro_schema: The Avro schema for this version (as dict)
- created_at: When the version was created
- updated_at: When the version was last updated
- feature_count: Number of features in this version
| RETURNS | DESCRIPTION |
|---|---|
|
List[dict]: A list of version metadata dictionaries, ordered by version number descending (latest first). Returns an empty list if the feature group has not been saved or has no versions. |
Example
fg = FeatureGroup(name="user_features").get_or_create()
versions = fg.list_versions()
for v in versions:
    print(f"Version {v['version']}: {v['feature_count']} features")
Source code in src/seeknal/featurestore/feature_group.py
get_version(version: int) -> Optional[dict]
¶Get metadata for a specific version of this feature group.
| PARAMETER | DESCRIPTION |
|---|---|
version
|
The version number to retrieve.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Optional[dict]
|
Optional[dict]: A dictionary containing version metadata if found, None if the version doesn't exist. The dictionary includes: - version: The version number - avro_schema: The Avro schema for this version (as dict) - created_at: When the version was created - updated_at: When the version was last updated - feature_count: Number of features in this version |
Example
fg = FeatureGroup(name="user_features").get_or_create()
v1 = fg.get_version(1)
if v1:
    print(f"Version 1 has {v1['feature_count']} features")
Source code in src/seeknal/featurestore/feature_group.py
compare_versions(from_version: int, to_version: int) -> Optional[dict]
¶Compare schemas between two versions of this feature group.
Identifies added, removed, and modified features between the two versions by comparing their Avro schemas.
| PARAMETER | DESCRIPTION |
|---|---|
from_version
|
The base version number to compare from.
TYPE:
|
to_version
|
The target version number to compare to.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Optional[dict]
|
Optional[dict]: A dictionary containing the comparison result if both versions exist, None if either version doesn't exist or the feature group has not been saved. The dictionary includes: - from_version: The base version number - to_version: The target version number - added: List of field names added in to_version - removed: List of field names removed in to_version - modified: List of dicts with field name and type changes |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If from_version equals to_version. |
Example
fg = FeatureGroup(name="user_features").get_or_create()
diff = fg.compare_versions(1, 2)
if diff:
    print(f"Added features: {diff['added']}")
    print(f"Removed features: {diff['removed']}")
    print(f"Modified features: {diff['modified']}")
Source code in src/seeknal/featurestore/feature_group.py
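The added/removed/modified computation described above can be sketched as a plain dictionary diff over two Avro record schemas. This is an illustrative sketch, not seeknal's implementation, and the field names are hypothetical:

```python
# Illustrative sketch of comparing two Avro record schemas the way
# compare_versions reports added, removed, and modified fields.
def diff_schemas(from_schema, to_schema):
    from_fields = {f["name"]: f["type"] for f in from_schema["fields"]}
    to_fields = {f["name"]: f["type"] for f in to_schema["fields"]}
    return {
        "added": sorted(set(to_fields) - set(from_fields)),
        "removed": sorted(set(from_fields) - set(to_fields)),
        "modified": [
            {"name": n, "from": from_fields[n], "to": to_fields[n]}
            for n in sorted(set(from_fields) & set(to_fields))
            if from_fields[n] != to_fields[n]
        ],
    }

v1 = {"type": "record", "name": "user_features",
      "fields": [{"name": "age", "type": "int"},
                 {"name": "city", "type": "string"}]}
v2 = {"type": "record", "name": "user_features",
      "fields": [{"name": "age", "type": "long"},
                 {"name": "income", "type": "double"}]}
diff = diff_schemas(v1, v2)
# "income" was added, "city" was removed, and "age" changed type.
```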
delete()
¶Delete this feature group and its associated data.
Removes the feature group from the feature store backend along with any data stored in the offline store. This operation is irreversible.
| RETURNS | DESCRIPTION |
|---|---|
FeatureGroupRequest
|
The request object used to perform the deletion. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the feature group has not been saved (no feature_group_id). |
Note
This requires an active workspace and project context. The feature group must have been previously saved using get_or_create().
Source code in src/seeknal/featurestore/feature_group.py
write(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None, output_date_pattern: str = 'yyyyMMdd')
¶Writes the feature group data to the offline store, using the specified feature start and end times and output date pattern.
| PARAMETER | DESCRIPTION |
|---|---|
feature_start_time
|
The start time for the feature data. If None, the current date is used.
TYPE:
|
feature_end_time
|
The end time for the feature data. If None, all available data is used.
TYPE:
|
output_date_pattern
|
The output date pattern for the feature data.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
Source code in src/seeknal/featurestore/feature_group.py
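The default output_date_pattern 'yyyyMMdd' appears to use Java-style date tokens. As a rough illustration only (this token mapping is an assumption, not seeknal's implementation), the pattern can be rendered with Python's strftime:

```python
# Sketch (assumption: Java-style date tokens) mapping the default
# output_date_pattern 'yyyyMMdd' onto Python strftime for illustration.
from datetime import date

_PATTERN_MAP = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def format_output_date(d: date, pattern: str = "yyyyMMdd") -> str:
    for java_token, strftime_token in _PATTERN_MAP.items():
        pattern = pattern.replace(java_token, strftime_token)
    return d.strftime(pattern)

# format_output_date(date(2024, 3, 5)) == "20240305"
```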
FeatureLookup
dataclass
¶
A class that represents a feature lookup operation in a feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
source |
The feature store to perform the lookup on.
TYPE:
|
features |
A list of feature names to include in the lookup. If None, all features in the store will be included.
TYPE:
|
exclude_features |
A list of feature names to exclude from the lookup. If None, no features will be excluded.
TYPE:
|
HistoricalFeatures
dataclass
¶
A class for retrieving historical features from a feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
lookups |
A list of FeatureLookup objects representing the features to retrieve.
TYPE:
|
Functions¶
using_spine(spine: pd.DataFrame, date_col: Optional[str] = None, offset: int = 0, length: Optional[int] = None, keep_cols: Optional[List[str]] = None)
¶Adds a spine DataFrame to the feature store serving pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
spine
|
The spine DataFrame to add to the pipeline.
TYPE:
|
date_col
|
The name of the column containing the date to use for point-in-time joins. If not provided, point-in-time joins will not be performed.
TYPE:
|
offset
|
Number of days to use as a reference point for the join. E.g., offset=3 with how='past' means that feature dates equal to (and older than) three days before the application date will be joined. Defaults to 0.
TYPE:
|
length
|
When how is not 'point in time', limits the period of feature dates to join. Defaults to no limit.
TYPE:
|
keep_cols
|
A list of column names to keep from the spine DataFrame. If not provided, no columns will be kept.
TYPE:
|
Source code in src/seeknal/featurestore/feature_group.py
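The offset and length semantics above can be illustrated with a pure-Python sketch of a point-in-time join (hypothetical data; seeknal performs this on Spark DataFrames):

```python
# Illustrative sketch of point-in-time join semantics: for each spine row,
# join the latest feature row whose date is on or before the application
# date minus `offset` days, optionally limited to a `length`-day window.
from datetime import date, timedelta

def point_in_time_join(spine, features, offset=0, length=None):
    joined = []
    for row in spine:
        cutoff = row["app_date"] - timedelta(days=offset)
        candidates = [f for f in features
                      if f["key"] == row["key"] and f["date"] <= cutoff]
        if length is not None:
            earliest = cutoff - timedelta(days=length)
            candidates = [f for f in candidates if f["date"] >= earliest]
        best = max(candidates, key=lambda f: f["date"], default=None)
        joined.append({**row, "value": best["value"] if best else None})
    return joined

spine = [{"key": "u1", "app_date": date(2024, 1, 10)}]
features = [{"key": "u1", "date": date(2024, 1, 6), "value": 1.0},
            {"key": "u1", "date": date(2024, 1, 9), "value": 2.0}]
result = point_in_time_join(spine, features, offset=3)
# With offset=3 the cutoff is 2024-01-07, so the 2024-01-06 value is joined.
```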
to_dataframe(feature_start_time: Optional[datetime] = None, feature_end_time: Optional[datetime] = None) -> DataFrame
¶Returns a pandas DataFrame containing the transformed feature data within the specified time range.
| PARAMETER | DESCRIPTION |
|---|---|
feature_start_time
|
The start time of the time range to filter the feature data.
TYPE:
|
feature_end_time
|
The end time of the time range to filter the feature data.
TYPE:
|
Source code in src/seeknal/featurestore/feature_group.py
Functions¶
validate_database_name(database_name: str) -> str
¶
Validate a SQL database name.
Database names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters
| PARAMETER | DESCRIPTION |
|---|---|
database_name
|
The database name to validate.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The validated database name (unchanged if valid). |
| RAISES | DESCRIPTION |
|---|---|
InvalidIdentifierError
|
If the database name is invalid. |
Source code in src/seeknal/validation.py
validate_table_name(table_name: str) -> str
¶
Validate a SQL table name.
Table names must follow SQL identifier rules:
- Start with a letter or underscore
- Contain only alphanumeric characters and underscores
- Be no longer than 128 characters
| PARAMETER | DESCRIPTION |
|---|---|
table_name
|
The table name to validate.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The validated table name (unchanged if valid). |
| RAISES | DESCRIPTION |
|---|---|
InvalidIdentifierError
|
If the table name is invalid. |
Source code in src/seeknal/validation.py
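The identifier rules shared by both validators can be sketched with a regular expression. This is an independent sketch of the stated rules, not the code in src/seeknal/validation.py:

```python
# Sketch of the SQL identifier rules documented above: start with a letter
# or underscore, alphanumerics/underscores only, at most 128 characters.
import re

_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_identifier(name: str) -> bool:
    return bool(_IDENTIFIER_RE.match(name)) and len(name) <= 128

assert is_valid_identifier("feature_store")
assert is_valid_identifier("_staging")
assert not is_valid_identifier("1users")       # must not start with a digit
assert not is_valid_identifier("user-events")  # hyphen is not allowed
```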
warn_if_insecure_path(path: str, context: Optional[str] = None, logger: Optional[logging.Logger] = None) -> Tuple[bool, Optional[str]]
¶
Log a warning if the path is in an insecure location.
This function checks if the provided path is in a world-writable or otherwise insecure location, and logs a warning with a secure alternative recommendation.
| PARAMETER | DESCRIPTION |
|---|---|
path
|
The filesystem path to check.
TYPE:
|
context
|
Optional context string describing what the path is used for (e.g., "offline store", "feature store").
TYPE:
|
logger
|
Optional logger instance. If not provided, uses the module logger.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tuple[bool, Optional[str]]
|
A tuple of (is_insecure, secure_alternative), where is_insecure is True when the path is in an insecure location, and secure_alternative is a recommended secure path (or None if the path is already secure). |
Examples:
>>> is_insecure, alt = warn_if_insecure_path("/tmp/data", "offline store")
>>> is_insecure
True
>>> alt # Will be something like '/home/user/.seeknal/data'
Source code in src/seeknal/utils/path_security.py
featurestore
¶
Classes¶
OfflineStoreEnum
¶
Bases: str, Enum
Enumeration of supported offline storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
HIVE_TABLE |
Store features as a Hive table in a database.
|
FILE |
Store features as files on the filesystem (e.g., Delta format).
|
ICEBERG |
Store features in Apache Iceberg tables with ACID transactions, time travel, and cloud storage compatibility.
|
OnlineStoreEnum
¶
Bases: str, Enum
Enumeration of supported online storage types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
HIVE_TABLE |
Store features as a Hive table for online serving.
|
FILE |
Store features as Parquet files for online serving.
|
FileKindEnum
¶
Bases: str, Enum
Enumeration of supported file formats for feature storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
DELTA |
Delta Lake format, providing ACID transactions and versioning.
|
FeatureStoreFileOutput(path: str, kind: FileKindEnum = FileKindEnum.DELTA)
dataclass
¶
Configuration for file-based feature store output.
| ATTRIBUTE | DESCRIPTION |
|---|---|
path |
The filesystem path for storing feature data. A security warning will be logged if this path is in an insecure location (e.g., /tmp).
TYPE:
|
kind |
The file format to use (default: DELTA).
TYPE:
|
FeatureStoreHiveTableOutput(database: str)
dataclass
¶
IcebergStoreOutput(table: str, catalog: str = 'lakekeeper', namespace: str = 'default', warehouse: Optional[str] = None, mode: str = 'append')
dataclass
¶
Iceberg storage configuration for feature group materialization.
This configuration enables storing features in Apache Iceberg tables with ACID transactions, time travel, and cloud storage compatibility.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name within namespace
TYPE:
|
catalog
|
Catalog name from profiles.yml (default: "lakekeeper")
TYPE:
|
namespace
|
Iceberg namespace/database (default: "default")
TYPE:
|
warehouse
|
Optional warehouse path override (s3://, gs://, azure://)
TYPE:
|
mode
|
Write mode - "append" or "overwrite" (default: "append")
TYPE:
|
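The shape of this configuration can be sketched as a stand-in dataclass with a guard on the write mode. This mirrors the fields listed above but is not the seeknal class itself:

```python
# Stand-in dataclass sketch mirroring the IcebergStoreOutput fields above,
# with a guard on the documented "append"/"overwrite" write modes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IcebergStoreOutputSketch:
    table: str
    catalog: str = "lakekeeper"
    namespace: str = "default"
    warehouse: Optional[str] = None
    mode: str = "append"

    def __post_init__(self):
        if self.mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode: {self.mode}")

cfg = IcebergStoreOutputSketch(table="user_features",
                               warehouse="s3://my-bucket/warehouse")
# Defaults apply: catalog "lakekeeper", namespace "default", mode "append".
```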
OfflineStore(value: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput, IcebergStoreOutput]] = None, kind: OfflineStoreEnum = OfflineStoreEnum.HIVE_TABLE, name: Optional[str] = None)
dataclass
¶
Configuration for offline feature store storage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
value |
The storage configuration (path for FILE, database for HIVE_TABLE). A security warning will be logged if a file path is in an insecure location (e.g., /tmp).
TYPE:
|
kind |
The storage type (FILE or HIVE_TABLE).
TYPE:
|
name |
Optional name for this offline store configuration.
TYPE:
|
Functions¶
get_or_create()
¶Retrieve an existing offline store or create a new one.
If an offline store with the specified name exists, it is retrieved and its configuration is loaded. Otherwise, a new offline store is created with the current configuration.
| RETURNS | DESCRIPTION |
|---|---|
OfflineStore
|
The current instance with id populated. |
Note
The store name defaults to "default" if not specified.
Source code in src/seeknal/featurestore/featurestore.py
list()
staticmethod
¶List all registered offline stores.
Displays a formatted table of all offline stores with their names, kinds, and configuration values. If no stores are found, displays an appropriate message.
| RETURNS | DESCRIPTION |
|---|---|
None
|
Output is printed to the console. |
Source code in src/seeknal/featurestore/featurestore.py
delete(spark: Optional[SparkSession] = None, *args, **kwargs) -> bool
¶Delete storage for a feature group from the offline store.
For FILE type: Deletes the directory containing the delta table. For HIVE_TABLE type: Drops the Hive table using Spark SQL.
| PARAMETER | DESCRIPTION |
|---|---|
spark
|
SparkSession instance (required for HIVE_TABLE, optional for FILE).
TYPE:
|
**kwargs
|
Must include 'project' and 'entity' to construct the table name.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if deletion was successful or resource didn't exist.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If required kwargs (project, entity) are missing. |
Source code in src/seeknal/featurestore/featurestore.py
OnlineStore(value: Optional[Union[str, FeatureStoreFileOutput, FeatureStoreHiveTableOutput]] = None, kind: OnlineStoreEnum = OnlineStoreEnum.FILE, name: Optional[str] = None)
dataclass
¶
Configuration for online feature store.
| ATTRIBUTE | DESCRIPTION |
|---|---|
value |
The storage configuration (file path or hive table). A security warning will be logged if a file path is in an insecure location (e.g., /tmp).
TYPE:
|
kind |
Type of online store (FILE or HIVE_TABLE).
TYPE:
|
name |
Optional name for the store.
TYPE:
|
Functions¶
delete(*args, **kwargs)
¶Delete feature data from the online store.
Removes the directory containing the feature data for the specified project and feature name.
| PARAMETER | DESCRIPTION |
|---|---|
*args
|
Additional positional arguments (unused).
DEFAULT:
|
**kwargs
|
Keyword arguments including: - project (str): Project name for file naming. - name (str): Feature name for file naming.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if the deletion was successful or if the path did not exist. |
Source code in src/seeknal/featurestore/featurestore.py
FillNull
¶
Bases: BaseModel
Configuration for filling null values in columns.
| ATTRIBUTE | DESCRIPTION |
|---|---|
value |
Value to use for filling nulls.
TYPE:
|
dataType |
Data type for the value (e.g., 'double', 'string').
TYPE:
|
columns |
Optional list of columns to fill. If None, applies to all columns.
TYPE:
|
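The FillNull behavior can be illustrated over plain records (a sketch with hypothetical data, not the seeknal implementation, which operates on DataFrames):

```python
# Illustrative sketch of FillNull: replace null values with a given value,
# either in selected columns or, if columns is None, in all columns.
def fill_null(rows, value, columns=None):
    filled = []
    for row in rows:
        new_row = dict(row)
        targets = columns if columns is not None else list(row)
        for col in targets:
            if new_row.get(col) is None:
                new_row[col] = value
        filled.append(new_row)
    return filled

rows = [{"age": None, "score": 0.5}, {"age": 30, "score": None}]
out = fill_null(rows, value=0.0, columns=["age"])
# Only "age" is filled; the null "score" in the second row is untouched.
```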
OfflineMaterialization
¶
Bases: BaseModel
Configuration for offline materialization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
store |
The offline store configuration.
TYPE:
|
mode |
Write mode ('overwrite', 'append', 'merge').
TYPE:
|
ttl |
Time-to-live in days for data retention.
TYPE:
|
OnlineMaterialization
¶
Bases: BaseModel
Configuration for online materialization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
store |
The online store configuration.
TYPE:
|
ttl |
Time-to-live in minutes for online store data.
TYPE:
|
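Note the differing TTL units: offline TTL is expressed in days, online TTL in minutes. A minimal sketch of the resulting expiry arithmetic (an illustration, not seeknal's retention logic):

```python
# Sketch of the TTL units documented above: offline TTL in days,
# online TTL in minutes (e.g. ttl=2880 minutes is two days online).
from datetime import datetime, timedelta

def offline_expiry(written_at: datetime, ttl_days: int) -> datetime:
    return written_at + timedelta(days=ttl_days)

def online_expiry(written_at: datetime, ttl_minutes: int) -> datetime:
    return written_at + timedelta(minutes=ttl_minutes)

t0 = datetime(2024, 1, 1)
# 2880 minutes online covers the same window as 2 days offline.
assert online_expiry(t0, 2880) == offline_expiry(t0, 2)
```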
Feature
¶
Bases: BaseModel
Define a Feature.
| ATTRIBUTE | DESCRIPTION |
|---|---|
name |
Feature name.
TYPE:
|
feature_id |
Feature ID (assigned by seeknal).
TYPE:
|
description |
Feature description.
TYPE:
|
data_type |
Data type for the feature.
TYPE:
|
online_data_type |
Data type when stored in online-store.
TYPE:
|
created_at |
Creation timestamp.
TYPE:
|
updated_at |
Last update timestamp.
TYPE:
|
Functions¶
to_dict()
¶Convert the feature definition to a dictionary representation.
Creates a dictionary suitable for API requests with metadata and data type information.
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with structure: - metadata: dict containing 'name' and optionally 'description' - datatype: The feature's data type - onlineDatatype: The feature's online store data type |
Source code in src/seeknal/featurestore/featurestore.py
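The dictionary shape described for to_dict() can be sketched as follows (a hypothetical stand-in function; the field values are illustrative):

```python
# Sketch of the dictionary structure documented for Feature.to_dict():
# a "metadata" dict plus "datatype" and "onlineDatatype" entries.
def feature_to_dict(name, data_type, online_data_type, description=None):
    metadata = {"name": name}
    if description is not None:
        metadata["description"] = description
    return {"metadata": metadata,
            "datatype": data_type,
            "onlineDatatype": online_data_type}

d = feature_to_dict("age", "int", "integer", description="User age")
# d carries the feature metadata plus offline and online data types.
```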
FeatureStore(name: str, entity: Optional[Entity] = None, id: Optional[str] = None)
dataclass
¶
Bases: ABC
Abstract base class for feature stores.
| ATTRIBUTE | DESCRIPTION |
|---|---|
name |
Feature store name.
TYPE:
|
entity |
Associated entity.
TYPE:
|
id |
Feature store ID.
TYPE:
|
Functions¶
get_or_create()
abstractmethod
¶Retrieve an existing feature store or create a new one.
This abstract method must be implemented by subclasses to handle the creation or retrieval of feature store instances.
| RETURNS | DESCRIPTION |
|---|---|
FeatureStore
|
The feature store instance. |