API Reference
This section provides detailed API documentation for all TransformPlan classes and functions.
Core Classes
| Class | Description |
|---|---|
| TransformPlan | Main class for building transformation pipelines |
| Protocol | Audit trail capturing transformation history |
| Col | Column reference for building filter expressions |
| Filter | Base class for serializable filter expressions |
Validation Classes
| Class | Description |
|---|---|
| ValidationResult | Result of schema validation |
| DryRunResult | Preview of pipeline execution |
| SchemaValidationError | Exception raised on validation failure |
Chunked Processing Classes
| Class | Description |
|---|---|
| ChunkedProtocol | Protocol for tracking chunked file processing |
| ChunkValidationResult | Result of validating pipeline for chunked processing |
| ChunkingError | Exception raised when pipeline is incompatible with chunking |
Operation Categories
TransformPlan provides operations organized by category:
| Category | Description | Examples |
|---|---|---|
| Column Operations | Add, drop, rename, cast columns | col_drop, col_rename, col_cast |
| Math Operations | Arithmetic on numeric columns | math_add, math_multiply, math_round |
| Row Operations | Filter, sort, deduplicate rows | rows_filter, rows_sort, rows_unique |
| String Operations | Text manipulation | str_replace, str_lower, str_split |
| Datetime Operations | Date and time extraction | dt_year, dt_month, dt_parse |
| Map Operations | Value mapping and discretization | map_values, map_discretize |
Complete Method Reference
All TransformPlan operations at a glance, grouped by category. Each method has its own detailed documentation entry.
Column Operations
| Method | Description |
|---|---|
| col_drop | Drop a column from the DataFrame |
| col_rename | Rename a column |
| col_cast | Cast a column to a different dtype |
| col_reorder | Reorder columns (drops unlisted) |
| col_select | Keep only the specified columns |
| col_duplicate | Duplicate a column under a new name |
| col_fill_null | Fill null values in a column |
| col_drop_null | Drop rows with null values in specified columns |
| col_drop_zero | Drop rows where the specified column is zero |
| col_add | Add a new column with a constant value or expression |
| col_add_uuid | Add a column with unique random identifiers |
| col_hash | Hash one or more columns into a new column |
| col_coalesce | Take the first non-null value across multiple columns |
Math Operations
| Method | Description |
|---|---|
| math_add | Add a scalar value to a column |
| math_subtract | Subtract a scalar value from a column |
| math_multiply | Multiply a column by a scalar value |
| math_divide | Divide a column by a scalar value |
| math_clamp | Clamp column values to a range |
| math_abs | Take absolute value of a column |
| math_round | Round a column to specified decimal places |
| math_set_min | Set a minimum value for a column |
| math_set_max | Set a maximum value for a column |
| math_add_columns | Add two columns together into a new column |
| math_subtract_columns | Subtract one column from another |
| math_multiply_columns | Multiply two columns together |
| math_divide_columns | Divide one column by another |
| math_percent_of | Calculate percentage of one column relative to another |
| math_cumsum | Calculate cumulative sum (optionally grouped) |
| math_rank | Calculate rank of values |
| math_standardize | Z-score standardization (mean=0, std=1) |
| math_minmax | Min-max normalization to a range |
| math_robust_scale | Robust scaling using median and IQR |
| math_log | Logarithmic transform |
| math_sqrt | Square root transform |
| math_power | Power transform |
| math_winsorize | Clip values to percentiles or bounds |
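The three scaling entries above (math_standardize, math_minmax, math_robust_scale) name standard statistical transforms. The following is a minimal standard-library sketch of the arithmetic behind each one, not TransformPlan's implementation: the function names, the use of the population standard deviation, and the inclusive quartile method are all assumptions made for illustration.

```python
import statistics


def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; a library may use the sample one.
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def minmax(values, lower=0.0, upper=1.0):
    """Linearly rescale values so the minimum maps to lower and the maximum to upper."""
    lo, hi = min(values), max(values)
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # Assumption: quartiles computed with the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# One large outlier (100.0) dominates the mean and the range,
# but barely affects the median and the IQR.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
z_scores = standardize(data)
rescaled = minmax(data)
robust = robust_scale(data)
```

All three helpers divide by a spread measure, so a constant column (zero spread) would raise ZeroDivisionError in this sketch; how the library handles that case is not stated here.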
Row Operations
| Method | Description |
|---|---|
| rows_filter | Filter rows using a Filter expression |
| rows_drop | Drop rows matching a filter |
| rows_drop_nulls | Drop rows with null values |
| rows_flag | Add a flag column based on a filter condition |
| rows_unique | Keep unique rows based on specified columns |
| rows_deduplicate | Deduplicate by keeping first/last based on sort order |
| rows_sort | Sort rows by one or more columns |
| rows_head | Keep only the first n rows |
| rows_tail | Keep only the last n rows |
| rows_sample | Sample rows from the DataFrame |
| rows_explode | Explode a list column into multiple rows |
| rows_melt | Unpivot from wide to long format |
| rows_pivot | Pivot from long to wide format |
String Operations
| Method | Description |
|---|---|
| str_lower | Convert string column to lowercase |
| str_upper | Convert string column to uppercase |
| str_strip | Strip leading and trailing characters |
| str_pad | Pad a string column to a specified length |
| str_slice | Extract a substring from a string column |
| str_truncate | Truncate strings to a maximum length |
| str_replace | Replace occurrences of a pattern |
| str_extract | Extract substring using regex capture group |
| str_split | Split a string column by separator |
| str_concat | Concatenate multiple string columns |
Datetime Operations
| Method | Description |
|---|---|
| dt_year | Extract year from a datetime column |
| dt_month | Extract month from a datetime column |
| dt_day | Extract day from a datetime column |
| dt_week | Extract ISO week number |
| dt_quarter | Extract quarter (1-4) |
| dt_year_month | Create a year-month string |
| dt_quarter_year | Create a quarter-year string (e.g., 'Q1-2024') |
| dt_calendar_week | Create a year-week string (e.g., '2024-W05') |
| dt_format | Format a datetime column as a string |
| dt_parse | Parse a string column into a datetime |
| dt_diff_days | Calculate difference in days between two dates |
| dt_age_years | Calculate age in years from a birth date |
| dt_truncate | Truncate datetime to a specified precision |
| dt_is_between | Check if date falls within a range |
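The example strings in the table ('Q1-2024' and '2024-W05') can be reproduced with Python's standard datetime module. The sketch below only illustrates the two string formats. The helper names are hypothetical, and pairing the ISO week with the ISO year (rather than the calendar year) is an assumption about how a year-week label would usually be built, not a statement about TransformPlan's behavior.

```python
from datetime import date


def quarter_year(d):
    """Build a quarter-year label such as 'Q1-2024'."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return f"Q{quarter}-{d.year}"


def calendar_week(d):
    """Build a year-week label such as '2024-W05' from the ISO calendar."""
    # Assumption: the ISO year is used, so days around New Year
    # can belong to a week of the neighbouring year.
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


sample = date(2024, 2, 1)
quarter_label = quarter_year(sample)
week_label = calendar_week(sample)

# 30 December 2024 is a Monday that already belongs to ISO week 1 of 2025.
year_boundary_label = calendar_week(date(2024, 12, 30))
```

The year-boundary case is the main design point: mixing the calendar year with the ISO week number would label 30 December 2024 as week 1 of 2024, which collides with the first week of January 2024.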
Map Operations
| Method | Description |
|---|---|
| map_values | Map values in a column using a dictionary |
| map_case | Apply case-when logic to a column |
| map_from_column | Map values using another column as lookup |
| map_discretize | Discretize a numeric column into bins |
| map_bool_to_int | Convert boolean to integer (True=1, False=0) |
| map_null_to_value | Replace null values with a specific value |
| map_value_to_null | Replace a specific value with null |
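Discretization, as named by map_discretize, assigns each numeric value to a labeled bin defined by a sorted list of edges. The following standard-library sketch shows one common convention (left-closed bins, nulls passed through). The function name, its parameters, and the bin-boundary convention are assumptions for illustration and may differ from the library's own behavior.

```python
from bisect import bisect_right


def discretize(values, edges, labels):
    """Assign each value to a bin; edges must be sorted in ascending order.

    With edges [18, 65] the bins are (-inf, 18), [18, 65) and [65, +inf),
    so len(labels) must equal len(edges) + 1.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("expected exactly one more label than edges")
    # bisect_right counts how many edges are <= v, which is the bin index.
    return [None if v is None else labels[bisect_right(edges, v)] for v in values]


ages = [5, 17, 18, 42, None, 70]
groups = discretize(ages, edges=[18, 65], labels=["minor", "adult", "senior"])
```

Using bisect_right puts a value equal to an edge into the upper bin (18 becomes "adult"); swapping in bisect_left would make the bins right-closed instead.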
Utility Functions
| Function | Description |
|---|---|
| frame_hash | Compute deterministic hash of a DataFrame |
| validate_chunked_pipeline | Validate pipeline compatibility with chunked processing |
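A deterministic hash, as described for frame_hash, returns the same digest whenever the data is the same, which makes it useful for confirming that two runs saw identical inputs. The sketch below illustrates the idea on a plain dictionary of columns using only the standard library. It is not the frame_hash implementation: the table_hash name, the JSON serialization, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json


def table_hash(columns):
    """Hash a table given as a dict mapping column name -> list of values."""
    # A canonical serialization is what makes the hash deterministic:
    # keys are sorted and separators are fixed, so equal data yields equal bytes.
    # Assumption: non-JSON values (dates, decimals) are serialized via str().
    payload = json.dumps(columns, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
same_data_other_order = {"name": ["a", "b", "c"], "id": [1, 2, 3]}
modified = {"id": [1, 2, 4], "name": ["a", "b", "c"]}

digest = table_hash(original)
digest_reordered = table_hash(same_data_other_order)
digest_modified = table_hash(modified)
```

Sorting keys makes this sketch insensitive to column order while staying sensitive to row order; a real implementation may reasonably treat column order as significant too.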