API Reference

This section provides detailed API documentation for all TransformPlan classes and functions.

Core Classes

Class	Description
`TransformPlan`	Main class for building transformation pipelines
`Protocol`	Audit trail capturing transformation history
`Col`	Column reference for building filter expressions
`Filter`	Base class for serializable filter expressions

Class	Description
`ValidationResult`	Result of schema validation
`DryRunResult`	Preview of pipeline execution
`SchemaValidationError`	Exception raised on validation failure

Class	Description
`ChunkedProtocol`	Protocol for tracking chunked file processing
`ChunkValidationResult`	Result of validating pipeline for chunked processing
`ChunkingError`	Exception raised when pipeline is incompatible with chunking

TransformPlan provides operations organized by category:

Category	Description	Examples
Column Operations	Add, drop, rename, cast columns	`col_drop`, `col_rename`, `col_cast`
Math Operations	Arithmetic on numeric columns	`math_add`, `math_multiply`, `math_round`
Row Operations	Filter, sort, deduplicate rows	`rows_filter`, `rows_sort`, `rows_unique`
String Operations	Text manipulation	`str_replace`, `str_lower`, `str_split`
Datetime Operations	Date and time extraction	`dt_year`, `dt_month`, `dt_parse`
Map Operations	Value mapping and discretization	`map_values`, `map_discretize`

All TransformPlan operations at a glance, grouped by category. See each method's own entry for detailed documentation.

Column Operations

Method	Description
`col_drop`	Drop a column from the DataFrame
`col_rename`	Rename a column
`col_cast`	Cast a column to a different dtype
`col_reorder`	Reorder columns; columns not listed are dropped
`col_select`	Keep only the specified columns
`col_duplicate`	Duplicate a column under a new name
`col_fill_null`	Fill null values in a column
`col_drop_null`	Drop rows with null values in specified columns
`col_drop_zero`	Drop rows where the specified column is zero
`col_add`	Add a new column with a constant value or expression
`col_add_uuid`	Add a column with unique random identifiers
`col_hash`	Hash one or more columns into a new column
`col_coalesce`	Take the first non-null value across multiple columns
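
To make the "first non-null value" rule behind `col_coalesce` concrete, here is a minimal plain-Python sketch. It does not use TransformPlan itself. The helper `coalesce_row` and the sample column names are hypothetical, and the library's actual signature and null handling may differ.

```python
from typing import Any, Optional


def coalesce_row(row: dict[str, Any], columns: list[str]) -> Optional[Any]:
    """Return the first non-None value among `columns`, checked in order.

    Hypothetical helper for illustration only; not part of TransformPlan.
    """
    for name in columns:
        value = row.get(name)
        if value is not None:
            return value
    return None  # every candidate column was null


rows = [
    {"mobile": None, "home": "555-0101", "work": "555-0199"},
    {"mobile": "555-0142", "home": None, "work": None},
    {"mobile": None, "home": None, "work": None},
]

# Column priority is given by list order: mobile, then home, then work.
phones = [coalesce_row(r, ["mobile", "home", "work"]) for r in rows]
```

The order of the column list sets the priority, so the first row resolves to the `home` value even though `work` is also populated.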

Math Operations

Method	Description
`math_add`	Add a scalar value to a column
`math_subtract`	Subtract a scalar value from a column
`math_multiply`	Multiply a column by a scalar value
`math_divide`	Divide a column by a scalar value
`math_clamp`	Clamp column values to a range
`math_abs`	Take absolute value of a column
`math_round`	Round a column to specified decimal places
`math_set_min`	Set a minimum value for a column
`math_set_max`	Set a maximum value for a column
`math_add_columns`	Add two columns together into a new column
`math_subtract_columns`	Subtract one column from another
`math_multiply_columns`	Multiply two columns together
`math_divide_columns`	Divide one column by another
`math_percent_of`	Calculate percentage of one column relative to another
`math_cumsum`	Calculate cumulative sum (optionally grouped)
`math_rank`	Calculate rank of values
`math_standardize`	Z-score standardization (mean=0, std=1)
`math_minmax`	Min-max normalization to a range
`math_robust_scale`	Robust scaling using median and IQR
`math_log`	Logarithmic transform
`math_sqrt`	Square root transform
`math_power`	Power transform
`math_winsorize`	Clip values to percentiles or bounds
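
The three scaling operations above (`math_standardize`, `math_minmax`, `math_robust_scale`) correspond to standard formulas. The sketch below computes them with the Python standard library only, as an illustration of the math rather than of TransformPlan's implementation. The function names are hypothetical, and choices such as population versus sample standard deviation or the quantile method are assumptions that the library may make differently.

```python
import statistics


def standardize(values):
    """Z-score: (x - mean) / std, giving mean 0 and std 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std (ddof=0)
    return [(v - mean) / std for v in values]


def minmax(values, lower=0.0, upper=1.0):
    """Rescale linearly so min(values) -> lower and max(values) -> upper."""
    lo, hi = min(values), max(values)
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


def robust_scale(values):
    """(x - median) / IQR, which is less sensitive to outliers than z-scores."""
    # assumption: "inclusive" quartiles; other quantile methods give other IQRs
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# All three helpers assume a non-constant column (std, range and IQR > 0).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(data)
scaled = minmax(data)
robust = robust_scale(data)
```

A constant column would divide by zero in all three helpers, so a real pipeline needs an explicit rule for that case.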

Row Operations

Method	Description
`rows_filter`	Filter rows using a Filter expression
`rows_drop`	Drop rows matching a filter
`rows_drop_nulls`	Drop rows with null values
`rows_flag`	Add a flag column based on a filter condition
`rows_unique`	Keep unique rows based on specified columns
`rows_deduplicate`	Deduplicate by keeping first/last based on sort order
`rows_sort`	Sort rows by one or more columns
`rows_head`	Keep only the first n rows
`rows_tail`	Keep only the last n rows
`rows_sample`	Sample rows from the DataFrame
`rows_explode`	Explode a list column into multiple rows
`rows_melt`	Unpivot from wide to long format
`rows_pivot`	Pivot from long to wide format
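
The difference between keeping unique rows and deduplicating by sort order (as in `rows_deduplicate`) is easiest to see on a small example. The following plain-Python sketch is illustrative only. The `deduplicate` helper and its parameters `key_columns`, `sort_by`, and `keep` are hypothetical names, not the library's signature.

```python
def deduplicate(rows, key_columns, sort_by, keep="first"):
    """Keep one row per key: the first or last one in `sort_by` order.

    Hypothetical helper for illustration; not TransformPlan's API.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")

    ordered = sorted(rows, key=lambda r: r[sort_by])
    if keep == "last":
        ordered.reverse()  # walk from the largest sort value down

    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:  # the first row seen per key wins
            seen.add(key)
            kept.append(row)

    # Return the survivors in ascending sort order for a stable result.
    return sorted(kept, key=lambda r: r[sort_by])


orders = [
    {"id": 1, "updated": 3, "status": "shipped"},
    {"id": 1, "updated": 1, "status": "new"},
    {"id": 2, "updated": 2, "status": "new"},
]

oldest = deduplicate(orders, ["id"], sort_by="updated", keep="first")
latest = deduplicate(orders, ["id"], sort_by="updated", keep="last")
```

With `keep="last"`, order 1 survives as its most recent record (`shipped`). With `keep="first"`, it survives as its earliest record (`new`).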

String Operations

Method	Description
`str_lower`	Convert string column to lowercase
`str_upper`	Convert string column to uppercase
`str_strip`	Strip leading and trailing characters
`str_pad`	Pad a string column to a specified length
`str_slice`	Extract a substring from a string column
`str_truncate`	Truncate strings to a maximum length
`str_replace`	Replace occurrences of a pattern
`str_extract`	Extract substring using regex capture group
`str_split`	Split a string column by separator
`str_concat`	Concatenate multiple string columns
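
As a rough illustration of what "extract substring using regex capture group" means for `str_extract`, the sketch below uses Python's `re` module directly. The `extract` helper and its `group` parameter are hypothetical. How TransformPlan selects the group and represents non-matches is not specified here, so returning `None` is an assumption.

```python
import re
from typing import Optional


def extract(value: str, pattern: str, group: int = 1) -> Optional[str]:
    """Return the text captured by `group`, or None when nothing matches.

    Hypothetical helper for illustration; not TransformPlan's API.
    """
    match = re.search(pattern, value)
    return match.group(group) if match else None


pattern = r"order-(\d{4})-(\d+)"
year = extract("order-2024-00017", pattern)             # first capture group
number = extract("order-2024-00017", pattern, group=2)  # second capture group
missing = extract("invoice-77", pattern)                # pattern does not match
```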

Datetime Operations

Method	Description
`dt_year`	Extract year from a datetime column
`dt_month`	Extract month from a datetime column
`dt_day`	Extract day from a datetime column
`dt_week`	Extract ISO week number
`dt_quarter`	Extract quarter (1-4)
`dt_year_month`	Create a year-month string
`dt_quarter_year`	Create a quarter-year string (e.g., 'Q1-2024')
`dt_calendar_week`	Create a year-week string (e.g., '2024-W05')
`dt_format`	Format a datetime column as a string
`dt_parse`	Parse a string column into a datetime
`dt_diff_days`	Calculate difference in days between two dates
`dt_age_years`	Calculate age in years from a birth date
`dt_truncate`	Truncate datetime to a specified precision
`dt_is_between`	Check if date falls within a range
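
The label formats quoted above ('Q1-2024' and '2024-W05') can be reproduced with the standard `datetime` module. This sketch shows one way to do it, not the library's implementation. The function names are hypothetical. Using the ISO year for the week label and counting age by completed birthdays are both assumptions.

```python
from datetime import date


def quarter_year(d: date) -> str:
    """Format a date as a quarter-year label such as 'Q1-2024'."""
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter}-{d.year}"


def calendar_week(d: date) -> str:
    """Format a date as a year-week label such as '2024-W05'.

    Assumption: ISO 8601 weeks, labelled with the ISO year, which can
    differ from the calendar year around New Year.
    """
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def age_years(birth: date, today: date) -> int:
    """Whole years elapsed, counting a year only once the birthday has passed."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


sample = date(2024, 2, 1)
quarter_label = quarter_year(sample)
week_label = calendar_week(sample)
year_end_label = calendar_week(date(2024, 12, 30))  # falls in ISO week 1 of 2025
```

The year-end case is the main edge to watch: 30 December 2024 belongs to ISO week 1 of 2025, so a label built from the calendar year would put that date in the wrong week.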

Map Operations

Method	Description
`map_values`	Map values in a column using a dictionary
`map_case`	Apply case-when logic to a column
`map_from_column`	Map values using another column as lookup
`map_discretize`	Discretize a numeric column into bins
`map_bool_to_int`	Convert boolean to integer (True=1, False=0)
`map_null_to_value`	Replace null values with a specific value
`map_value_to_null`	Replace a specific value with null
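
Discretization, as in `map_discretize`, assigns each numeric value to a labelled bin. Below is a minimal standard-library sketch of the idea. The `discretize` helper, its `edges` and `labels` parameters, and the left-closed bin convention are all assumptions made for illustration, so check the method's own documentation for the actual behavior.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def discretize(
    value: Optional[float], edges: Sequence[float], labels: Sequence[str]
) -> Optional[str]:
    """Map a number to the label of the bin it falls into.

    `edges` must be sorted ascending and need one more label than edges.
    Bins are left-closed here: a value equal to an edge goes to the upper bin.
    Hypothetical helper for illustration; not TransformPlan's API.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    if value is None:
        return None  # assumption: nulls pass through unchanged
    return labels[bisect_right(edges, value)]


edges = [18, 65]
labels = ["minor", "adult", "senior"]
ages = [17, 18, 64, 65, None]
groups = [discretize(a, edges, labels) for a in ages]
```

Whether a boundary value such as 18 belongs to the lower or the upper bin is the detail most likely to differ between implementations.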

Function	Description
`frame_hash`	Compute deterministic hash of a DataFrame
`validate_chunked_pipeline`	Validate pipeline compatibility with chunked processing
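
To illustrate what a deterministic hash of tabular data involves, the sketch below hashes a canonical serialization of column data with SHA-256. It is not the implementation of `frame_hash`. The name `sketch_frame_hash`, the dict-of-lists input, and the choice to make column order significant are all assumptions.

```python
import hashlib
import json


def sketch_frame_hash(columns: dict[str, list]) -> str:
    """Return a SHA-256 hex digest of column names and values.

    Hypothetical stand-in for illustration; not TransformPlan's frame_hash.
    The JSON encoding is fixed (compact separators, insertion order kept),
    so equal data always produces an equal digest.
    """
    payload = json.dumps(columns, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


frame_a = {"id": [1, 2], "name": ["ada", "bob"]}
frame_b = {"id": [1, 2], "name": ["ada", "bob"]}   # same content, new object
frame_c = {"id": [1, 2], "name": ["ada", "eve"]}   # one value changed

hash_a = sketch_frame_hash(frame_a)
hash_b = sketch_frame_hash(frame_b)
hash_c = sketch_frame_hash(frame_c)
```

Comparing such a digest before and after a step gives a cheap check of whether the data changed, which is the kind of evidence an audit trail can record.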