API Reference

This section provides detailed API documentation for all TransformPlan classes and functions.

Core Classes

| Class | Description |
|---|---|
| TransformPlan | Main class for building transformation pipelines |
| Protocol | Audit trail capturing transformation history |
| Col | Column reference for building filter expressions |
| Filter | Base class for serializable filter expressions |
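
How these four roles fit together can be pictured with a small, self-contained sketch. It is not TransformPlan's implementation or API: `MiniPlan`, `MiniCol`, `MiniFilter`, and the `process` method are hypothetical stand-ins that work on plain lists of dicts. The sketch only illustrates a chainable pipeline, a filter expression that serializes to a dict, and an audit trail of the applied steps.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]
Step = tuple[str, dict[str, Any], Callable[[list[Row]], list[Row]]]


@dataclass(frozen=True)
class MiniFilter:
    """Hypothetical serializable filter: a column, an operator, a value."""
    column: str
    op: str
    value: Any

    def to_dict(self) -> dict[str, Any]:
        return {"column": self.column, "op": self.op, "value": self.value}

    def matches(self, row: Row) -> bool:
        if self.op == "gt":
            return row[self.column] > self.value
        if self.op == "eq":
            return row[self.column] == self.value
        raise ValueError(f"unsupported operator: {self.op}")


class MiniCol:
    """Hypothetical column reference; comparisons build MiniFilter objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: Any) -> MiniFilter:
        return MiniFilter(self.name, "gt", value)

    def __eq__(self, value: Any) -> MiniFilter:  # type: ignore[override]
        return MiniFilter(self.name, "eq", value)


class MiniPlan:
    """Hypothetical pipeline: records steps first, replays them on data later."""

    def __init__(self) -> None:
        self._steps: list[Step] = []

    def _add(self, name: str, params: dict[str, Any],
             fn: Callable[[list[Row]], list[Row]]) -> "MiniPlan":
        self._steps.append((name, params, fn))
        return self  # returning self is what allows method chaining

    def col_drop(self, column: str) -> "MiniPlan":
        return self._add(
            "col_drop", {"column": column},
            lambda rows: [{k: v for k, v in r.items() if k != column} for r in rows],
        )

    def rows_filter(self, flt: MiniFilter) -> "MiniPlan":
        return self._add(
            "rows_filter", {"filter": flt.to_dict()},
            lambda rows: [r for r in rows if flt.matches(r)],
        )

    def process(self, rows: list[Row]) -> tuple[list[Row], list[dict[str, Any]]]:
        """Run every step and return the result plus an audit trail."""
        audit: list[dict[str, Any]] = []
        for name, params, fn in self._steps:
            rows_in = len(rows)
            rows = fn(rows)
            audit.append({"operation": name, "params": params,
                          "rows_in": rows_in, "rows_out": len(rows)})
        return rows, audit


data = [{"id": 1, "score": 40, "tmp": "a"}, {"id": 2, "score": 90, "tmp": "b"}]
plan = MiniPlan().col_drop("tmp").rows_filter(MiniCol("score") > 50)
result, audit = plan.process(data)
```

Because every step is stored as a name plus plain parameters, the audit trail can be written out as JSON, which is the property the "serializable" wording above points at.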

Validation Classes

| Class | Description |
|---|---|
| ValidationResult | Result of schema validation |
| DryRunResult | Preview of pipeline execution |
| SchemaValidationError | Exception raised on validation failure |
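
The idea behind validating a pipeline against a schema before running it can be sketched in a few lines. This is an illustration only: `validate_steps`, the step tuples, and the error message format are hypothetical and do not reflect how `ValidationResult` or `DryRunResult` are actually structured.

```python
def validate_steps(schema: dict[str, str],
                   steps: list[tuple[str, dict[str, str]]]) -> list[str]:
    """Walk the steps against a copy of the schema and collect error messages."""
    current = dict(schema)  # work on a copy so the caller's schema is untouched
    errors: list[str] = []
    for index, (op, params) in enumerate(steps):
        column = params["column"]
        if column not in current:
            errors.append(f"step {index} ({op}): unknown column '{column}'")
            continue
        # Track how each step changes the schema so later steps see the result.
        if op == "col_drop":
            del current[column]
        elif op == "col_rename":
            current[params["new_name"]] = current.pop(column)
    return errors


schema = {"id": "int", "price": "float"}
steps = [
    ("col_rename", {"column": "price", "new_name": "amount"}),
    ("col_drop", {"column": "price"}),  # invalid: "price" was renamed above
]
errors = validate_steps(schema, steps)
```

No data is touched here. The check works purely on column names and types, which is what makes a preview of this kind cheap compared with a full run.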

Chunked Processing Classes

| Class | Description |
|---|---|
| ChunkedProtocol | Protocol for tracking chunked file processing |
| ChunkValidationResult | Result of validating pipeline for chunked processing |
| ChunkingError | Exception raised when pipeline is incompatible with chunking |
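
Why a pipeline can be incompatible with chunking is easiest to see on a toy example. The sketch below uses plain Python lists and a hypothetical `apply_chunked` helper. Which operations TransformPlan itself rejects is not stated here, so treat the row-wise versus whole-table distinction as an assumption about the general principle.

```python
from itertools import chain


def apply_chunked(values: list[int], chunk_size: int, fn) -> list[int]:
    """Apply fn to each chunk separately and concatenate the results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return list(chain.from_iterable(fn(chunk) for chunk in chunks))


def double(chunk: list[int]) -> list[int]:
    """Row-wise operation: each output depends on one input value only."""
    return [v * 2 for v in chunk]


values = [3, 1, 2, 0]

# A row-wise operation gives the same answer whether or not the data is chunked.
row_wise_matches = apply_chunked(values, 2, double) == double(values)

# Sorting needs the whole table: sorting each chunk separately is not enough.
sort_matches = apply_chunked(values, 2, sorted) == sorted(values)
```

Here `row_wise_matches` is true and `sort_matches` is false, which is the kind of mismatch a chunk validation step would need to catch before processing starts.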

Operation Categories

TransformPlan provides operations organized by category:

| Category | Description | Examples |
|---|---|---|
| Column Operations | Add, drop, rename, cast columns | col_drop, col_rename, col_cast |
| Math Operations | Arithmetic on numeric columns | math_add, math_multiply, math_round |
| Row Operations | Filter, sort, deduplicate rows | rows_filter, rows_sort, rows_unique |
| String Operations | Text manipulation | str_replace, str_lower, str_split |
| Datetime Operations | Date and time extraction | dt_year, dt_month, dt_parse |
| Map Operations | Value mapping and discretization | map_values, map_discretize |

Complete Method Reference

All TransformPlan operations at a glance, grouped by category.

Column Operations

| Method | Description |
|---|---|
| col_drop | Drop a column from the DataFrame |
| col_rename | Rename a column |
| col_cast | Cast a column to a different dtype |
| col_reorder | Reorder columns (drops unlisted) |
| col_select | Keep only the specified columns |
| col_duplicate | Duplicate a column under a new name |
| col_fill_null | Fill null values in a column |
| col_drop_null | Drop rows with null values in specified columns |
| col_drop_zero | Drop rows where the specified column is zero |
| col_add | Add a new column with a constant value or expression |
| col_add_uuid | Add a column with unique random identifiers |
| col_hash | Hash one or more columns into a new column |
| col_coalesce | Take the first non-null value across multiple columns |
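
"First non-null value across multiple columns" is the least self-explanatory entry in this table, so here is a minimal sketch of the semantics on a single row. The `coalesce` helper is hypothetical and is not the signature of `col_coalesce`.

```python
from typing import Any, Optional


def coalesce(row: dict[str, Any], columns: list[str]) -> Optional[Any]:
    """Return the first non-None value found in the given column order."""
    return next((row[c] for c in columns if row.get(c) is not None), None)


row = {"mobile": None, "home": "555-0100", "work": "555-0199"}
phone = coalesce(row, ["mobile", "home", "work"])
```

The column order acts as a priority list: `home` wins over `work` here only because it is listed first.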

Math Operations

| Method | Description |
|---|---|
| math_add | Add a scalar value to a column |
| math_subtract | Subtract a scalar value from a column |
| math_multiply | Multiply a column by a scalar value |
| math_divide | Divide a column by a scalar value |
| math_clamp | Clamp column values to a range |
| math_abs | Take absolute value of a column |
| math_round | Round a column to specified decimal places |
| math_set_min | Set a minimum value for a column |
| math_set_max | Set a maximum value for a column |
| math_add_columns | Add two columns together into a new column |
| math_subtract_columns | Subtract one column from another |
| math_multiply_columns | Multiply two columns together |
| math_divide_columns | Divide one column by another |
| math_percent_of | Calculate percentage of one column relative to another |
| math_cumsum | Calculate cumulative sum (optionally grouped) |
| math_rank | Calculate rank of values |
| math_standardize | Z-score standardization (mean=0, std=1) |
| math_minmax | Min-max normalization to a range |
| math_robust_scale | Robust scaling using median and IQR |
| math_log | Logarithmic transform |
| math_sqrt | Square root transform |
| math_power | Power transform |
| math_winsorize | Clip values to percentiles or bounds |
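
The three scaling methods differ only in which statistics they subtract and divide by. The sketch below spells out the textbook formulas with the standard library. The helper names are hypothetical, and two choices are assumptions rather than documented TransformPlan behavior: the population standard deviation for z-scores and the "inclusive" quartile method for the IQR.

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Z-score: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std, not sample std
    return [(v - mean) / std for v in values]


def minmax(values: list[float], low: float = 0.0, high: float = 1.0) -> list[float]:
    """Rescale linearly so the minimum maps to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


def robust_scale(values: list[float]) -> list[float]:
    """Subtract the median, divide by the interquartile range (Q3 - Q1)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


values = [1.0, 2.0, 3.0, 4.0, 5.0]
z_scores = standardize(values)
scaled = minmax(values)
robust = robust_scale(values)
```

All three helpers divide by a spread measure, so a constant column (zero spread) would raise `ZeroDivisionError` in this sketch. A real implementation has to decide how to handle that case.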

Row Operations

| Method | Description |
|---|---|
| rows_filter | Filter rows using a Filter expression |
| rows_drop | Drop rows matching a filter |
| rows_drop_nulls | Drop rows with null values |
| rows_flag | Add a flag column based on a filter condition |
| rows_unique | Keep unique rows based on specified columns |
| rows_deduplicate | Deduplicate by keeping first/last based on sort order |
| rows_sort | Sort rows by one or more columns |
| rows_head | Keep only the first n rows |
| rows_tail | Keep only the last n rows |
| rows_sample | Sample rows from the DataFrame |
| rows_explode | Explode a list column into multiple rows |
| rows_melt | Unpivot from wide to long format |
| rows_pivot | Pivot from long to wide format |
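
The difference between keeping unique rows and deduplicating "by keeping first/last based on sort order" is that the second needs a sort column to decide which duplicate survives. A minimal sketch of that idea follows. The `deduplicate` helper and its parameters are hypothetical and do not mirror the signature of `rows_deduplicate`.

```python
from typing import Any


def deduplicate(rows: list[dict[str, Any]], key: str, sort_by: str,
                keep: str = "first") -> list[dict[str, Any]]:
    """Keep one row per key: the first or last one after sorting by sort_by."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    ordered = sorted(rows, key=lambda r: r[sort_by])
    if keep == "last":
        ordered.reverse()  # the last row in sort order is now seen first
    survivors: dict[Any, dict[str, Any]] = {}
    for row in ordered:
        survivors.setdefault(row[key], row)  # only the first row seen per key is kept
    return list(survivors.values())


rows = [
    {"id": 1, "updated": 2, "status": "paid"},
    {"id": 1, "updated": 1, "status": "open"},
    {"id": 2, "updated": 5, "status": "open"},
]
oldest = deduplicate(rows, key="id", sort_by="updated", keep="first")
newest = deduplicate(rows, key="id", sort_by="updated", keep="last")
```

For `id` 1, `oldest` keeps the row with status `open` and `newest` keeps the row with status `paid`.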

String Operations

| Method | Description |
|---|---|
| str_lower | Convert string column to lowercase |
| str_upper | Convert string column to uppercase |
| str_strip | Strip leading and trailing characters |
| str_pad | Pad a string column to a specified length |
| str_slice | Extract a substring from a string column |
| str_truncate | Truncate strings to a maximum length |
| str_replace | Replace occurrences of a pattern |
| str_extract | Extract substring using regex capture group |
| str_split | Split a string column by separator |
| str_concat | Concatenate multiple string columns |
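
Extraction with a regex capture group returns only the parenthesized part of the match, not the whole match. The sketch below shows this with Python's `re` module. The `extract` helper is hypothetical, and returning `None` when nothing matches is an assumption, not documented behavior of `str_extract`.

```python
import re
from typing import Optional


def extract(text: str, pattern: str, group: int = 1) -> Optional[str]:
    """Return the requested capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


order_number = extract("order-1234-eu", r"order-(\d+)")
missing = extract("n/a", r"order-(\d+)")
```

The pattern matches `order-1234`, but only the digits inside the parentheses are returned.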

Datetime Operations

| Method | Description |
|---|---|
| dt_year | Extract year from a datetime column |
| dt_month | Extract month from a datetime column |
| dt_day | Extract day from a datetime column |
| dt_week | Extract ISO week number |
| dt_quarter | Extract quarter (1-4) |
| dt_year_month | Create a year-month string |
| dt_quarter_year | Create a quarter-year string (e.g., 'Q1-2024') |
| dt_calendar_week | Create a year-week string (e.g., '2024-W05') |
| dt_format | Format a datetime column as a string |
| dt_parse | Parse a string column into a datetime |
| dt_diff_days | Calculate difference in days between two dates |
| dt_age_years | Calculate age in years from a birth date |
| dt_truncate | Truncate datetime to a specified precision |
| dt_is_between | Check if date falls within a range |
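
The string formats quoted in the table ('Q1-2024', '2024-W05') can be reproduced with the standard library, which also shows one subtlety of ISO weeks. The helper names are hypothetical. Pairing the ISO week with the ISO year, and the `YYYY-MM` shape of the year-month string, are assumptions rather than documented behavior.

```python
from datetime import date


def quarter_year(d: date) -> str:
    """Quarter-year string such as 'Q1-2024'."""
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter}-{d.year}"


def calendar_week(d: date) -> str:
    """Year-week string such as '2024-W05', using ISO 8601 week numbering."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def year_month(d: date) -> str:
    """Year-month string; the 'YYYY-MM' shape is an assumption."""
    return f"{d.year}-{d.month:02d}"


sample = date(2024, 2, 1)
labels = (quarter_year(sample), calendar_week(sample), year_month(sample))
year_end = calendar_week(date(2024, 12, 30))
```

The last line shows the subtlety: 30 December 2024 is a Monday that already belongs to ISO week 1 of 2025, so combining the ISO week with the calendar year instead would label it as week 1 of 2024.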

Map Operations

| Method | Description |
|---|---|
| map_values | Map values in a column using a dictionary |
| map_case | Apply case-when logic to a column |
| map_from_column | Map values using another column as lookup |
| map_discretize | Discretize a numeric column into bins |
| map_bool_to_int | Convert boolean to integer (True=1, False=0) |
| map_null_to_value | Replace null values with a specific value |
| map_value_to_null | Replace a specific value with null |
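
Discretizing a numeric column means assigning each value to a bin defined by sorted edges. A minimal sketch using `bisect` is shown below. The `discretize` helper is hypothetical, and the choice that a value equal to an edge falls into the upper bin is an assumption, not documented behavior of `map_discretize`.

```python
from bisect import bisect_right


def discretize(value: float, edges: list[float], labels: list[str]) -> str:
    """Map a number to a label; edges must be sorted in ascending order."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


edges = [18, 65]
labels = ["minor", "adult", "senior"]
groups = [discretize(age, edges, labels) for age in (17, 18, 64, 65)]
```

Two edges produce three bins, which is why the helper insists on one more label than edges.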

Utility Functions

| Function | Description |
|---|---|
| frame_hash | Compute deterministic hash of a DataFrame |
| validate_chunked_pipeline | Validate pipeline compatibility with chunked processing |
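
What "deterministic" means for a data hash can be shown with a small sketch: the same content must always yield the same digest, regardless of incidental details such as dict key order. The `rows_hash` helper below is hypothetical and works on lists of dicts. It says nothing about how `frame_hash` is implemented or which algorithm it uses.

```python
import hashlib
import json
from typing import Any


def rows_hash(rows: list[dict[str, Any]]) -> str:
    """SHA-256 over a canonical JSON encoding of the rows."""
    # sort_keys and fixed separators make the encoding independent of key order
    # and whitespace; default=str covers values such as dates.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = [{"id": 1, "price": 9.5}, {"id": 2, "price": 3.0}]
same_content = [{"price": 9.5, "id": 1}, {"price": 3.0, "id": 2}]
changed = [{"id": 1, "price": 9.5}, {"id": 2, "price": 3.1}]

stable = rows_hash(original) == rows_hash(same_content)
detects_change = rows_hash(original) != rows_hash(changed)
```

In this sketch row order is part of the content, so reordering rows changes the digest. Whether that is desirable depends on whether the pipeline treats row order as meaningful.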