Backends¶
TransformPlan uses a pluggable backend system. Each backend implements the Backend ABC, providing all 88 operations plus meta methods for hashing, schema inspection, and type classification.
Overview¶
The backend determines how data is stored and transformed:
- PolarsBackend (default): Operates on Polars DataFrames using native Polars expressions
- DuckDBBackend (optional): Operates on DuckDB relations using SQL generation
A TransformPlan is a pure, backend-agnostic recipe of operations. The backend is chosen at execution time by passing it to process(), validate(), or dry_run(). If no backend is specified, PolarsBackend is used by default. Pipelines serialized with one backend can be loaded and executed with another.
from transformplan import TransformPlan
# Build a plan — no backend needed
plan = TransformPlan().col_drop("temp").math_add("age", 1)
# Execute with default PolarsBackend
result, protocol = plan.process(polars_df)
# Execute with DuckDB backend
import duckdb
from transformplan.backends.duckdb import DuckDBBackend
con = duckdb.connect()
result, protocol = plan.process(duckdb_rel, backend=DuckDBBackend(con))
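The split between recipe and execution can be pictured with a toy model. The sketch below is not TransformPlan's implementation: `ToyPlan`, `ListBackend`, and `ColumnarBackend` are hypothetical names, the parameter names are assumed, and `process()` here returns only the data rather than a `(result, protocol)` pair. It only illustrates how a plan can record steps without knowing which backend will run them.

```python
from dataclasses import dataclass, field


class ListBackend:
    """Hypothetical backend: data is a list of row dicts."""

    name = "list"

    def col_drop(self, data, column):
        return [{k: v for k, v in row.items() if k != column} for row in data]

    def math_add(self, data, column, value):
        return [{**row, column: row[column] + value} for row in data]


class ColumnarBackend:
    """Hypothetical backend: data is a dict mapping column names to lists."""

    name = "columnar"

    def col_drop(self, data, column):
        return {k: v for k, v in data.items() if k != column}

    def math_add(self, data, column, value):
        return {
            k: ([x + value for x in v] if k == column else v)
            for k, v in data.items()
        }


@dataclass
class ToyPlan:
    """Records (operation, params) pairs and holds no backend reference."""

    steps: list = field(default_factory=list)

    def col_drop(self, column):
        self.steps.append(("col_drop", {"column": column}))
        return self

    def math_add(self, column, value):
        self.steps.append(("math_add", {"column": column, "value": value}))
        return self

    def process(self, data, backend=None):
        # The backend is resolved only here, at execution time.
        backend = backend or ListBackend()
        for op, params in self.steps:
            data = getattr(backend, op)(data, **params)
        return data


plan = ToyPlan().col_drop("temp").math_add("age", 1)
rows = plan.process([{"age": 25, "temp": 0}])
cols = plan.process({"age": [25], "temp": [0]}, backend=ColumnarBackend())
```

The same `plan` object runs against both data layouts because it stores only operation names and parameters.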
Backend ABC¶
The abstract base class that all backends must implement.
Backend¶
Bases: ABC
Abstract base class defining the operation interface for backends.
Each backend must implement every operation. Methods receive data and operation-specific parameters, and return transformed data.
Subclasses must set the name class variable to a unique identifier
(e.g. "polars", "duckdb").
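As a rough illustration of this contract, the sketch below defines a cut-down abstract base class with two abstract methods and a `name` class variable. `ToyBackend`, `DictBackend`, `IncompleteBackend`, and `can_instantiate` are hypothetical, and the method signatures are assumptions; the real `Backend` ABC declares far more methods.

```python
from abc import ABC, abstractmethod
from typing import Any, ClassVar


class ToyBackend(ABC):
    """Hypothetical, cut-down stand-in for the Backend ABC."""

    name: ClassVar[str]  # unique identifier, set by each subclass

    @abstractmethod
    def col_drop(self, data: Any, column: str) -> Any:
        """Return data without the given column."""

    @abstractmethod
    def get_schema(self, data: Any) -> dict:
        """Return a mapping of column name to type name."""


class DictBackend(ToyBackend):
    """Implements every abstract method, so it can be instantiated."""

    name = "dict"

    def col_drop(self, data, column):
        return {k: v for k, v in data.items() if k != column}

    def get_schema(self, data):
        return {k: type(v[0]).__name__ if v else "unknown" for k, v in data.items()}


class IncompleteBackend(ToyBackend):
    """Omits get_schema, so Python refuses to instantiate it."""

    name = "incomplete"

    def col_drop(self, data, column):
        return data


def can_instantiate(cls) -> bool:
    # ABCs raise TypeError when abstract methods are left unimplemented.
    try:
        cls()
    except TypeError:
        return False
    return True
```

Because the check happens at instantiation, a backend that misses an operation fails immediately rather than partway through a pipeline.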
PolarsBackend¶
The default backend, using Polars DataFrames.
PolarsBackend¶
import polars as pl
from transformplan import TransformPlan, Col
df = pl.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})
plan = TransformPlan().rows_filter(Col("age") >= 18)
result, protocol = plan.process(df)
DuckDBBackend¶
Optional backend using DuckDB relations and SQL generation. Requires duckdb to be installed.
Optional Dependency
Install DuckDB separately: pip install duckdb or uv add duckdb
DuckDBBackend¶
Bases: Backend
Backend implementation using DuckDB for all operations.
Initialize DuckDBBackend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| con | DuckDBPyConnection \| None | DuckDB connection. If None, creates an in-memory connection. | None |
Source code in transformplan/backends/duckdb.py
compute_hash¶
get_shape¶
get_schema¶
import duckdb
from transformplan import TransformPlan, Col
from transformplan.backends.duckdb import DuckDBBackend
con = duckdb.connect()
rel = con.sql("SELECT * FROM 'data.parquet'")
plan = (
    TransformPlan()
    .col_rename(column="ID", new_name="id")
    .rows_filter(Col("age") >= 18)
    .math_standardize(column="score", new_column="z_score")
)
result, protocol = plan.process(rel, backend=DuckDBBackend(con))
Cross-Backend Serialization¶
Pipelines are inherently backend-agnostic. The same serialized plan can be executed with any backend:
import duckdb
from transformplan import TransformPlan, Col
from transformplan.backends.duckdb import DuckDBBackend
# Build and serialize
plan = (
    TransformPlan()
    .col_rename(column="ID", new_name="id")
    .rows_filter(Col("age") >= 18)
)
plan.to_json("pipeline.json")
# Load and execute with Polars (default)
restored = TransformPlan.from_json("pipeline.json")
result, protocol = restored.process(polars_df)
# Or execute with DuckDB
con = duckdb.connect()
rel = con.sql("SELECT * FROM 'data.parquet'")
result, protocol = restored.process(rel, backend=DuckDBBackend(con))
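One way to see why this works is to look at what a serialized plan could contain. The sketch below assumes a hypothetical JSON layout (a list of `op`/`params` entries) that is not TransformPlan's actual file format, and `run_on_rows` and `run_on_columns` are illustrative executors, not library functions. Nothing in the serialized text refers to a backend, so either executor can replay it.

```python
import json

# Hypothetical serialized form: operation names and parameters only.
steps = [
    {"op": "col_rename", "params": {"column": "ID", "new_name": "id"}},
    {"op": "math_add", "params": {"column": "id", "value": 1}},
]
restored = json.loads(json.dumps(steps))  # round trip through JSON text


def run_on_rows(steps, rows):
    """Replay the steps on row-oriented data (a list of dicts)."""
    for step in steps:
        p = step["params"]
        if step["op"] == "col_rename":
            rows = [
                {(p["new_name"] if k == p["column"] else k): v for k, v in row.items()}
                for row in rows
            ]
        elif step["op"] == "math_add":
            rows = [{**row, p["column"]: row[p["column"]] + p["value"]} for row in rows]
        else:
            raise ValueError(f"unknown operation: {step['op']}")
    return rows


def run_on_columns(steps, columns):
    """Replay the steps on column-oriented data (a dict of lists)."""
    for step in steps:
        p = step["params"]
        if step["op"] == "col_rename":
            columns = {
                (p["new_name"] if k == p["column"] else k): v
                for k, v in columns.items()
            }
        elif step["op"] == "math_add":
            columns = {
                k: ([x + p["value"] for x in v] if k == p["column"] else v)
                for k, v in columns.items()
            }
        else:
            raise ValueError(f"unknown operation: {step['op']}")
    return columns


row_result = run_on_rows(restored, [{"ID": 1}, {"ID": 2}])
col_result = run_on_columns(restored, {"ID": [1, 2]})
```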
Type System¶
Each backend classifies its native types into categories used by the validation system:
| Method | PolarsBackend | DuckDBBackend |
|---|---|---|
| is_numeric_type() | Polars Int/Float/Decimal dtypes | INTEGER, BIGINT, DOUBLE, FLOAT, etc. |
| is_string_type() | pl.Utf8, pl.String | VARCHAR, TEXT, etc. |
| is_datetime_type() | pl.Date, pl.Datetime, pl.Time | DATE, TIMESTAMP, TIME, etc. |
| is_boolean_type() | pl.Boolean | BOOLEAN |
Type factory methods (float_type(), int_type(), string_type(), bool_type()) return the appropriate native type for each backend, used by operations that create new columns.
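To make the classification idea concrete, here is a minimal sketch of how SQL-style type names could be sorted into the four categories above. The `classify` function, the helper `_base`, and the exact membership of each set are assumptions for illustration; the real DuckDBBackend may recognize more types and use different logic.

```python
# Hypothetical category sets, seeded from the type names in the table above.
NUMERIC = frozenset(
    {"TINYINT", "SMALLINT", "INTEGER", "BIGINT", "HUGEINT", "FLOAT", "DOUBLE", "DECIMAL"}
)
STRING = frozenset({"VARCHAR", "TEXT"})
DATETIME = frozenset({"DATE", "TIMESTAMP", "TIME"})
BOOLEAN = frozenset({"BOOLEAN"})


def _base(type_name: str) -> str:
    """Strip parameters and normalize case: 'decimal(18, 3)' becomes 'DECIMAL'."""
    return type_name.split("(", 1)[0].strip().upper()


def classify(type_name: str) -> str:
    """Return the category name for a SQL-style type name."""
    base = _base(type_name)
    if base in NUMERIC:
        return "numeric"
    if base in STRING:
        return "string"
    if base in DATETIME:
        return "datetime"
    if base in BOOLEAN:
        return "boolean"
    return "other"


categories = {t: classify(t) for t in ["DECIMAL(18, 3)", "varchar", "TIMESTAMP", "BLOB"]}
```

A validator could then reject, for example, a numeric operation applied to a column whose category is not `"numeric"` before any data is touched.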