Validation¶
Schema validation and dry-run preview for TransformPlan pipelines.
Overview¶
TransformPlan validates operations against DataFrame schemas before execution. This catches errors like:
- Referencing non-existent columns
- Applying string operations to numeric columns
- Creating columns that already exist
from transformplan import TransformPlan

# df is an existing Polars DataFrame
plan = TransformPlan().col_drop("nonexistent")
result = plan.validate(df)
if not result.is_valid:
    for error in result.errors:
        print(error)
        # Step 1 (col_drop): Column 'nonexistent' does not exist
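To make the idea of validating before execution concrete, here is a minimal, library-free sketch that walks a list of steps against a copy of the schema and collects messages. It is not TransformPlan's implementation: the `check_steps` helper, the `(operation, column)` tuple format, the placeholder dtype, and the wording of the "already exists" message are all assumptions made for illustration.

```python
# Illustrative stand-in for schema tracking; not TransformPlan's own code.
def check_steps(schema, steps):
    """Walk the steps against a copy of the schema and collect error messages."""
    current = dict(schema)  # never mutate the caller's schema
    errors = []
    for number, (op, column) in enumerate(steps, start=1):
        if op == "col_drop":
            if column not in current:
                errors.append(f"Step {number} ({op}): Column '{column}' does not exist")
            else:
                del current[column]
        elif op == "col_add":
            if column in current:
                # Message wording is assumed for this sketch.
                errors.append(f"Step {number} ({op}): Column '{column}' already exists")
            else:
                current[column] = "Int64"  # placeholder dtype for the sketch
    return errors


errors = check_steps(
    {"name": "String", "age": "Int64"},
    [("col_drop", "nonexistent"), ("col_add", "age")],
)
for message in errors:
    print(message)
```

Because the schema is updated after every step, a later step is checked against the columns that would exist at that point in the pipeline, not against the original input.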
ValidationResult¶
ValidationResult
Result of schema validation.
Initialize an empty validation result.
Source code in transformplan/validation.py
is_valid
property
Check if validation passed.
Returns:
| Type | Description |
|---|---|
| bool | True if no errors, False otherwise. |
errors
property
errors: list[ValidationError]
Get list of validation errors.
Returns:
| Type | Description |
|---|---|
| list[ValidationError] | List of ValidationError instances. |
add_error
raise_if_invalid
Raise SchemaValidationError if validation failed.
Raises:
| Type | Description |
|---|---|
| SchemaValidationError | If validation failed with errors. |
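The members described above follow a common collect-then-raise pattern. The sketch below imitates that pattern with stand-in classes so it runs without the library. `SketchResult` and `SketchValidationError` are hypothetical names, and the single-string argument to `add_error` is an assumption, since the real signature is not shown here.

```python
from dataclasses import dataclass, field


class SketchValidationError(Exception):
    """Stand-in for SchemaValidationError in this sketch."""


@dataclass
class SketchResult:
    """Stand-in collector that reuses the member names of ValidationResult."""

    errors: list = field(default_factory=list)

    @property
    def is_valid(self):
        # Valid means no errors have been collected so far.
        return not self.errors

    def add_error(self, message):
        # Assumed signature: one preformatted message string.
        self.errors.append(message)

    def raise_if_invalid(self):
        if not self.is_valid:
            raise SketchValidationError("; ".join(self.errors))


result = SketchResult()
result.raise_if_invalid()  # nothing collected yet, so nothing is raised
result.add_error("Step 1 (col_drop): Column 'nonexistent' does not exist")

try:
    result.raise_if_invalid()
    raised = False
except SketchValidationError as exc:
    raised = True
    print(f"validation failed: {exc}")
```

Collecting every error before raising lets a caller report all problems in one pass instead of fixing them one at a time.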
ValidationError¶
ValidationError
dataclass
A single validation error.
__str__
Return error message string.
Returns:
| Type | Description |
|---|---|
| str | Formatted error message. |
SchemaValidationError¶
SchemaValidationError
Bases: Exception
Raised when schema validation fails.
DryRunResult¶
DryRunResult
DryRunResult(
    input_schema: dict[str, DataType],
    steps: list[DryRunStep],
    validation: ValidationResult,
)
Result of a dry run showing what a pipeline will do.
Initialize DryRunResult.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_schema | dict[str, DataType] | Initial schema as column name to dtype mapping. | required |
| steps | list[DryRunStep] | List of dry run steps. | required |
| validation | ValidationResult | Validation result with any errors. | required |
is_valid
property
Whether the pipeline passed validation.
Returns:
| Type | Description |
|---|---|
| bool | True if validation passed, False otherwise. |
errors
property
errors: list[ValidationError]
steps
property
steps: list[DryRunStep]
input_schema
property
Input schema.
Returns:
| Type | Description |
|---|---|
| dict[str, DataType] | Dictionary mapping column names to dtypes. |
output_schema
property
Predicted output schema after all operations.
Returns:
| Type | Description |
|---|---|
| dict[str, str] | Dictionary mapping column names to dtype names. |
input_columns
property
Input column names.
Returns:
| Type | Description |
|---|---|
| list[str] | List of input column names. |
output_columns
property
Predicted output column names.
Returns:
| Type | Description |
|---|---|
| list[str] | List of predicted output column names. |
summary
Generate a human-readable summary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| show_params | bool | Whether to show operation parameters. | True |
| show_schema | bool | Whether to show full schema at each step. | False |
Returns:
| Type | Description |
|---|---|
| str | Formatted string. |
print
DryRunStep¶
DryRunStep
dataclass
DryRunStep(
    step: int,
    operation: str,
    params: dict[str, Any],
    schema_before: dict[str, str],
    schema_after: dict[str, str],
    columns_added: list[str],
    columns_removed: list[str],
    columns_modified: list[str],
    error: str | None = None,
)
A single step in a dry run.
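One way to read the `columns_added`, `columns_removed`, and `columns_modified` fields is as a diff between `schema_before` and `schema_after`. The sketch below computes such a diff from two plain dictionaries. The `diff_schemas` helper is hypothetical, and treating "modified" as "dtype name changed" is an assumption; the library may instead record every column an operation touches.

```python
def diff_schemas(before, after):
    """Compare two column-name-to-dtype-name mappings.

    Returns (added, removed, modified), where modified lists columns that
    are present in both mappings but whose dtype name differs (an assumption).
    """
    added = [name for name in after if name not in before]
    removed = [name for name in before if name not in after]
    modified = [
        name for name in after
        if name in before and before[name] != after[name]
    ]
    return added, removed, modified


schema_before = {"temp": "Int64", "salary": "Int64", "name": "String"}
schema_after = {"salary": "Float64", "name": "String", "bonus": "Int64"}

added, removed, modified = diff_schemas(schema_before, schema_after)
print(f"+{added} -{removed} ~{modified}")
```

The `+`, `-`, and `~` prefixes in the print call mirror the markers used in the Changes column of the dry-run preview.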
Example: Validation¶
import polars as pl
from transformplan import TransformPlan, Col

df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "salary": [50000, 60000],
})

plan = (
    TransformPlan()
    .col_drop("age")
    .rows_filter(Col("age") > 18)  # Error: age was dropped!
)

result = plan.validate(df)
print(result)
# ValidationResult(valid=False, errors=1)

for error in result.errors:
    print(error)
    # Step 2 (rows_filter): Column 'age' does not exist
Example: Dry Run¶
# df must contain "temp" and "salary" columns (the preview reports 3 input columns)
plan = (
    TransformPlan()
    .col_drop("temp")
    .col_add("bonus", value=1000)
    .math_multiply("salary", 1.1)
)

preview = plan.dry_run(df)
preview.print()
Output:
======================================================================
DRY RUN PREVIEW
======================================================================
Validation: PASSED
----------------------------------------------------------------------
Input: 3 columns
----------------------------------------------------------------------
# Operation Columns Changes
----------------------------------------------------------------------
1 col_drop 2 -['temp']
-> column='temp'
2 col_add 3 +['bonus']
-> new_column='bonus', value=1000
3 math_multiply 3 ~['salary']
-> column='salary', value=1.1
======================================================================
Output: 3 columns
Type Checking¶
Validation includes type checking for operations that require specific types:
| Operation Type | Required Column Type |
|---|---|
| math_* | Numeric (Int, Float) |
| str_* | String (Utf8) |
| dt_* | Datetime (Date, Datetime, Time) |
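The table above can be read as a lookup from operation prefix to required column kind. The following library-free sketch applies that lookup. The `type_error` helper, the dtype-name-to-kind mapping, the message wording, and the `str_lower` operation name are assumptions for illustration and are not taken from TransformPlan.

```python
# Operation prefix -> required column kind, following the table above.
REQUIRED_KINDS = {"math_": "numeric", "str_": "string", "dt_": "datetime"}

# Assumed mapping from dtype names to kinds, for this sketch only.
DTYPE_KINDS = {
    "Int64": "numeric",
    "Float64": "numeric",
    "Utf8": "string",
    "Date": "datetime",
    "Datetime": "datetime",
    "Time": "datetime",
}


def type_error(operation, column, dtype):
    """Return an error message if the dtype does not fit the operation, else None."""
    for prefix, required in REQUIRED_KINDS.items():
        if operation.startswith(prefix):
            if DTYPE_KINDS.get(dtype) != required:
                return f"{operation} requires a {required} column, but '{column}' is {dtype}"
            return None
    return None  # operations without a typed prefix are not type-checked here


print(type_error("math_multiply", "salary", "Int64"))
print(type_error("str_lower", "age", "Int64"))  # str_lower is a hypothetical name
```

Operations whose names carry none of the three prefixes, such as `col_drop`, pass through this check unchanged.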