Filters¶
Serializable filter expressions for row filtering operations.
Overview¶
The filter system provides a way to build complex filter conditions that can be serialized to JSON and deserialized back. This enables reproducible pipelines that can be saved and shared.
from transformplan import Col, Filter, TransformPlan
# Build a filter
filter_expr = (Col("age") >= 18) & (Col("status") == "active")
# Use in pipeline
plan = TransformPlan().rows_filter(filter_expr)
# Serialize
filter_dict = filter_expr.to_dict()
# Deserialize
restored = Filter.from_dict(filter_dict)
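Because `to_dict()` returns a plain dictionary, the result can be stored as JSON with the standard library. The sketch below uses a hand-written dictionary in place of a real `to_dict()` result so that it runs without the library. The `"eq"` entry follows the `from_dict` example later on this page; the `"and"` and `"ge"` type names are assumptions made for illustration.

```python
import json

# Hand-written stand-in for filter_expr.to_dict().
# The "and" and "ge" type names are assumed, not confirmed by the library.
filter_dict = {
    "type": "and",
    "left": {"type": "ge", "column": "age", "value": 18},
    "right": {"type": "eq", "column": "status", "value": "active"},
}

# Serialize to a JSON string that can be stored or shared.
payload = json.dumps(filter_dict, indent=2)

# Load it back; the result equals the original dictionary.
restored_dict = json.loads(payload)
print(restored_dict == filter_dict)  # True
```

With the library installed, passing a dictionary produced by `to_dict()` through the same round trip and then to `Filter.from_dict` should rebuild the filter.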
Col Class¶
Col
¶
Column reference for building filter expressions.
Col provides a fluent interface for creating filter conditions on DataFrame columns. Use comparison operators and methods to build filters that can be combined using & (and), | (or), and ~ (not).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the column to reference. | required |
Example
# Comparison operators
Col("age") >= 18
Col("status") == "active"
Col("price") < 100
# String methods
Col("email").str_contains("@company.com")
Col("name").str_starts_with("A")
# Null checks
Col("optional").is_null()
Col("required").is_not_null()
# Membership
Col("country").is_in(["US", "CA", "MX"])
Col("age").between(18, 65)
# Combining conditions
(Col("age") >= 18) & (Col("status") == "active")
(Col("role") == "admin") | (Col("role") == "moderator")
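The fluent interface relies on Python operator overloading: a comparison on a column reference returns a filter object rather than a boolean. The following is a minimal sketch of that idea using hypothetical `ToyCol`, `Ge`, and `Eq` classes. It is a simplified stand-in, not the library's source.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ge:
    """Toy stand-in for a greater-or-equal filter."""
    column: str
    value: Any


@dataclass(frozen=True)
class Eq:
    """Toy stand-in for an equality filter."""
    column: str
    value: Any


class ToyCol:
    """Hypothetical, simplified column reference (not the library's Col)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __ge__(self, value: Any) -> Ge:
        # Return a description of the comparison instead of a bool.
        return Ge(self.name, value)

    def __eq__(self, value: object) -> Eq:  # type: ignore[override]
        return Eq(self.name, value)

    # Defining __eq__ removes the default __hash__; restore identity hashing.
    __hash__ = object.__hash__


adult = ToyCol("age") >= 18
active = ToyCol("status") == "active"
print(adult)   # Ge(column='age', value=18)
print(active)  # Eq(column='status', value='active')
```

In Python, `&` and `|` bind more tightly than comparison operators, which is why each comparison in a combined condition is wrapped in parentheses.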
Initialize a column reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the column to reference. | required |
Source code in transformplan/filters.py
__eq__
¶
__eq__(value: object) -> Eq
Create an equality filter (column == value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `object` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Eq` | Eq filter for column equals value. |
__ne__
¶
__ne__(value: object) -> Ne
Create an inequality filter (column != value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `object` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Ne` | Ne filter for column not equals value. |
__gt__
¶
__gt__(value: Any) -> Gt
Create a greater-than filter (column > value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `Any` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Gt` | Gt filter for column greater than value. |
__ge__
¶
__ge__(value: Any) -> Ge
Create a greater-or-equal filter (column >= value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `Any` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Ge` | Ge filter for column greater than or equal to value. |
__lt__
¶
__lt__(value: Any) -> Lt
Create a less-than filter (column < value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `Any` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Lt` | Lt filter for column less than value. |
__le__
¶
__le__(value: Any) -> Le
Create a less-or-equal filter (column <= value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `Any` | Value to compare against. | required |
Returns:
| Type | Description |
|---|---|
| `Le` | Le filter for column less than or equal to value. |
is_in
¶
is_in(values: Sequence[Any]) -> IsIn
Create a membership filter (column in values).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `Sequence[Any]` | Sequence of values to check membership against. | required |
Returns:
| Type | Description |
|---|---|
| `IsIn` | IsIn filter for column value in the given sequence. |
Example
Col("status").is_in(["active", "pending"])
is_null
¶
is_null() -> IsNull
Create a null check filter (column is null).
Returns:
| Type | Description |
|---|---|
| `IsNull` | IsNull filter for column is null. |
Example
Col("optional_field").is_null()
is_not_null
¶
is_not_null() -> IsNotNull
Create a not-null check filter (column is not null).
Returns:
| Type | Description |
|---|---|
| `IsNotNull` | IsNotNull filter for column is not null. |
Example
Col("required_field").is_not_null()
str_contains
¶
str_contains(pattern: str, *, literal: bool = True) -> StrContains
Create a string contains filter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pattern` | `str` | Substring or regex pattern to search for. | required |
| `literal` | `bool` | If True, treat pattern as a literal string. If False, treat it as a regex. | `True` |
Returns:
| Type | Description |
|---|---|
| `StrContains` | StrContains filter for column containing pattern. |
Example
Col("email").str_contains("@company.com")
Col("description").str_contains(r"\d+", literal=False)
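The difference between the two `literal` modes can be illustrated in plain Python. This is only an approximation of the semantics using a hypothetical `contains` helper and the standard `re` module; the library delegates to Polars, whose regex syntax is not identical to Python's.

```python
import re


def contains(text: str, pattern: str, *, literal: bool = True) -> bool:
    """Plain-Python approximation of a literal-or-regex containment check."""
    if literal:
        # Literal mode: regex metacharacters have no special meaning.
        return pattern in text
    # Regex mode: match the pattern anywhere in the text.
    return re.search(pattern, text) is not None


print(contains("price: 42", r"\d+"))                 # False
print(contains("price: 42", r"\d+", literal=False))  # True
print(contains("a.b", "."))                          # True
print(contains("ab", "."))                           # False
print(contains("ab", ".", literal=False))            # True
```

Keeping `literal=True` as the default means that characters such as `.` or `+` in a search string are matched as written unless regex matching is requested explicitly.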
str_starts_with
¶
str_starts_with(prefix: str) -> StrStartsWith
Create a string starts-with filter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prefix` | `str` | Prefix to check for. | required |
Returns:
| Type | Description |
|---|---|
| `StrStartsWith` | StrStartsWith filter for column starting with prefix. |
Example
Col("code").str_starts_with("PRD-")
str_ends_with
¶
str_ends_with(suffix: str) -> StrEndsWith
Create a string ends-with filter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `suffix` | `str` | Suffix to check for. | required |
Returns:
| Type | Description |
|---|---|
| `StrEndsWith` | StrEndsWith filter for column ending with suffix. |
Example
Col("filename").str_ends_with(".csv")
between
¶
between(lower: Any, upper: Any) -> Between
Create a range filter (lower <= column <= upper).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `lower` | `Any` | Lower bound (inclusive). | required |
| `upper` | `Any` | Upper bound (inclusive). | required |
Returns:
| Type | Description |
|---|---|
| `Between` | Between filter for column within range. |
Example
Col("age").between(18, 65)
Col("date").between("2024-01-01", "2024-12-31")
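Both bounds are inclusive, which is easy to check with a plain-Python equivalent of `lower <= column <= upper`. The helper name `in_range` is hypothetical, and the sketch says nothing about how the library or Polars compares string bounds against a date-typed column.

```python
from typing import Any


def in_range(value: Any, lower: Any, upper: Any) -> bool:
    """Inclusive range check mirroring lower <= column <= upper."""
    return lower <= value <= upper


ages = [17, 18, 40, 65, 66]
print([a for a in ages if in_range(a, 18, 65)])  # [18, 40, 65]

# ISO 8601 date strings sort lexicographically in chronological order,
# so string bounds behave like date bounds in this plain-Python check.
dates = ["2023-12-31", "2024-01-01", "2024-12-31", "2025-01-01"]
print([d for d in dates if in_range(d, "2024-01-01", "2024-12-31")])
# ['2024-01-01', '2024-12-31']
```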
Filter Base Class¶
Filter
¶
Bases: ABC
Abstract base class for all filter expressions.
Filters are composable, serializable expressions that define row selection criteria. They can be combined using logical operators (&, |, ~) and serialized to dictionaries for storage and transmission.
Subclasses must implement:
- to_expr(): Convert to a Polars expression
- to_dict(): Serialize to a dictionary
- _from_dict(): Deserialize from a dictionary (classmethod)
Example
filter1 = Col("age") >= 18
filter2 = Col("status") == "active"
combined = filter1 & filter2  # And filter
inverted = ~filter1  # Not filter
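One way to picture how a base class supports composition and nested serialization is sketched below. The `Toy*` classes are hypothetical and model only `&`, `~`, and `to_dict()`; the `"type"` strings are assumed, and the Polars-facing `to_expr()` is left out entirely.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class ToyFilter(ABC):
    """Hypothetical base class; only serialization is modelled here."""

    @abstractmethod
    def to_dict(self) -> dict[str, Any]: ...

    def __and__(self, other: "ToyFilter") -> "ToyAnd":
        return ToyAnd(self, other)

    def __invert__(self) -> "ToyNot":
        return ToyNot(self)


@dataclass
class ToyGe(ToyFilter):
    column: str
    value: Any

    def to_dict(self) -> dict[str, Any]:
        return {"type": "ge", "column": self.column, "value": self.value}


@dataclass
class ToyAnd(ToyFilter):
    left: ToyFilter
    right: ToyFilter

    def to_dict(self) -> dict[str, Any]:
        # Nested filters serialize themselves recursively.
        return {
            "type": "and",
            "left": self.left.to_dict(),
            "right": self.right.to_dict(),
        }


@dataclass
class ToyNot(ToyFilter):
    operand: ToyFilter

    def to_dict(self) -> dict[str, Any]:
        return {"type": "not", "operand": self.operand.to_dict()}


combined = ToyGe("age", 18) & ~ToyGe("score", 50)
print(combined.to_dict())
```

Because each node only knows how to serialize itself and its children, arbitrarily deep combinations serialize without any special handling.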
to_expr
abstractmethod
¶
Convert to a Polars expression.
Returns:
| Type | Description |
|---|---|
| `Expr` | A Polars expression that can be used with DataFrame.filter(). |
to_dict
abstractmethod
¶
Serialize to a dictionary for JSON storage.
The dictionary includes a 'type' key identifying the filter class, plus any parameters needed to reconstruct the filter.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary representation of the filter. |
from_dict
classmethod
¶
from_dict(data: dict[str, Any]) -> Filter
Deserialize a filter from a dictionary.
Uses the 'type' key to determine which filter class to instantiate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with 'type' key and filter parameters. | required |
Returns:
| Type | Description |
|---|---|
| `Filter` | Reconstructed Filter instance. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If 'type' is missing or unknown. |
Example
data = {"type": "eq", "column": "status", "value": "active"}
filter_obj = Filter.from_dict(data)
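Dispatching on the `'type'` key can be pictured as a lookup in a registry of constructors. The sketch below is one possible shape for such a dispatcher, with a hypothetical `REGISTRY`, `ToyEq`, and `toy_from_dict`; how the library actually resolves types is not shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyEq:
    """Toy stand-in for an equality filter."""
    column: str
    value: Any


# Hypothetical registry mapping 'type' strings to constructors.
REGISTRY: dict[str, Callable[[dict[str, Any]], Any]] = {
    "eq": lambda d: ToyEq(d["column"], d["value"]),
}


def toy_from_dict(data: dict[str, Any]) -> Any:
    """Pick a constructor from the 'type' key, as from_dict is described to do."""
    kind = data.get("type")
    if kind is None:
        raise ValueError("filter dict is missing the 'type' key")
    if kind not in REGISTRY:
        raise ValueError(f"unknown filter type: {kind!r}")
    return REGISTRY[kind](data)


print(toy_from_dict({"type": "eq", "column": "status", "value": "active"}))
# ToyEq(column='status', value='active')
```

Both failure modes raise `ValueError`, matching the behaviour documented in the Raises table.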
Comparison Filters¶
Eq (Equal)¶
Eq
dataclass
¶
Bases: Filter
Equality filter: column == value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
Ne (Not Equal)¶
Ne
dataclass
¶
Bases: Filter
Inequality filter: column != value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
Gt (Greater Than)¶
Gt
dataclass
¶
Bases: Filter
Greater-than filter: column > value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
Ge (Greater Than or Equal)¶
Ge
dataclass
¶
Bases: Filter
Greater-or-equal filter: column >= value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
Lt (Less Than)¶
Lt
dataclass
¶
Bases: Filter
Less-than filter: column < value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
Le (Less Than or Equal)¶
Le
dataclass
¶
Bases: Filter
Less-or-equal filter: column <= value.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to compare. |
| `value` | `Any` | Value to compare against. |
IsIn¶
IsIn
dataclass
¶
Bases: Filter
Membership filter: column value in list of values.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to check. |
| `values` | `Sequence[Any]` | Sequence of values to check membership against. |
Between¶
Between
dataclass
¶
Bases: Filter
Range filter: lower <= column <= upper.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to check. |
| `lower` | `Any` | Lower bound (inclusive). |
| `upper` | `Any` | Upper bound (inclusive). |
Null Filters¶
IsNull¶
IsNull
dataclass
¶
Bases: Filter
Null check filter: column is null.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to check. |
IsNotNull¶
IsNotNull
dataclass
¶
Bases: Filter
Not-null check filter: column is not null.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the column to check. |
String Filters¶
StrContains¶
StrContains
dataclass
¶
Bases: Filter
String contains filter: column contains pattern.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the string column to search. |
| `pattern` | `str` | Substring or regex pattern to find. |
| `literal` | `bool` | If True, treat pattern as a literal string. If False, treat it as a regex. |
to_expr
¶
Convert to Polars str.contains expression.
Returns:
| Type | Description |
|---|---|
| `Expr` | Polars expression for string containment check. |
to_dict
¶
Serialize to dictionary.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary representation with type, column, pattern, and literal. |
StrStartsWith¶
StrStartsWith
dataclass
¶
Bases: Filter
String starts-with filter: column starts with prefix.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the string column to check. |
| `prefix` | `str` | Prefix to match at the start. |
StrEndsWith¶
StrEndsWith
dataclass
¶
Bases: Filter
String ends-with filter: column ends with suffix.
Attributes:
| Name | Type | Description |
|---|---|---|
| `column` | `str` | Name of the string column to check. |
| `suffix` | `str` | Suffix to match at the end. |
Logical Combinators¶
And¶
And
dataclass
¶
Bases: Filter
Logical AND filter: both conditions must be true.
Typically created using the & operator between filters.
Attributes:
| Name | Type | Description |
|---|---|---|
| `left` | `Filter` | First filter condition. |
| `right` | `Filter` | Second filter condition. |
Example
(Col("age") >= 18) & (Col("status") == "active")
to_expr
¶
Convert to Polars AND expression.
Returns:
| Type | Description |
|---|---|
| `Expr` | Polars expression combining both conditions with AND. |
to_dict
¶
Serialize to dictionary with nested filter dicts.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary representation with type, left, and right. |
Or¶
Or
dataclass
¶
Bases: Filter
Logical OR filter: at least one condition must be true.
Typically created using the | operator between filters.
Attributes:
| Name | Type | Description |
|---|---|---|
| `left` | `Filter` | First filter condition. |
| `right` | `Filter` | Second filter condition. |
Example
(Col("role") == "admin") | (Col("role") == "moderator")
to_expr
¶
Convert to Polars OR expression.
Returns:
| Type | Description |
|---|---|
| `Expr` | Polars expression combining both conditions with OR. |
to_dict
¶
Serialize to dictionary with nested filter dicts.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary representation with type, left, and right. |
Not¶
Not
dataclass
¶
Not(operand: Filter)
Bases: Filter
Logical NOT filter: inverts the condition.
Typically created using the ~ operator on a filter.
Attributes:
| Name | Type | Description |
|---|---|---|
| `operand` | `Filter` | Filter condition to invert. |
Example
~(Col("deleted") == True)
Examples¶
Simple Comparisons¶
from transformplan import Col
# Numeric comparisons
Col("age") >= 18
Col("price") < 100
Col("quantity") == 0
# String equality
Col("status") == "active"
Col("country") != "US"
String Matching¶
# Contains substring
Col("email").str_contains("@company.com")
# Starts/ends with
Col("code").str_starts_with("PRD-")
Col("filename").str_ends_with(".csv")
Membership Tests¶
# Check if value is in list
Col("status").is_in(["active", "pending"])
# Range check
Col("age").between(18, 65)
Null Checks¶
# Rows where the column is null
Col("optional_field").is_null()
# Rows where the column is not null
Col("required_field").is_not_null()
Combining Conditions¶
# AND: both conditions must be true
(Col("age") >= 18) & (Col("status") == "active")
# OR: at least one condition must be true
(Col("role") == "admin") | (Col("role") == "moderator")
# NOT: invert condition
~(Col("deleted") == True)
# Complex combinations
(
(Col("age") >= 18) &
(Col("country").is_in(["US", "CA"])) &
~(Col("status") == "banned")
)
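The complex combination above can be read as an ordinary boolean test applied to each row. The sketch below spells that out in plain Python on a few made-up sample rows; it does not use the library, and it does not model how missing values are handled.

```python
# Made-up sample rows for illustration only.
rows = [
    {"age": 25, "country": "US", "status": "active"},
    {"age": 16, "country": "US", "status": "active"},
    {"age": 30, "country": "MX", "status": "active"},
    {"age": 41, "country": "CA", "status": "banned"},
]


def keep(row: dict) -> bool:
    """Plain-Python equivalent of the combined condition."""
    return (
        row["age"] >= 18
        and row["country"] in ["US", "CA"]
        and not row["status"] == "banned"
    )


kept = [row for row in rows if keep(row)]
print(kept)  # [{'age': 25, 'country': 'US', 'status': 'active'}]
```

Only the first row passes all three conditions: the second fails the age check, the third fails the country check, and the fourth is excluded by the negated status check.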