Row Operations
Operations for filtering, sorting, and transforming rows.
Overview
Row operations modify which rows are included in the DataFrame and how they are ordered. Use the Col class to build filter expressions.
from transformplan import TransformPlan, Col

plan = (
    TransformPlan()
    .rows_filter(Col("status") == "active")
    .rows_sort("created_at", descending=True)
    .rows_unique(columns=["email"])
)
Class Reference
RowOps
Mixin providing row-level operations.
rows_filter
rows_filter(filter: Filter | dict[str, Any]) -> Self
Filter rows using a serializable Filter expression.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
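Example
from transformplan import TransformPlan
from transformplan.filters import Col

# Single condition
plan = TransformPlan().rows_filter(Col("age") > 18)
# Combined conditions
plan = TransformPlan().rows_filter((Col("status") == "active") & (Col("score") >= 50))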
Source code in transformplan/ops/rows.py
rows_drop
rows_drop(filter: Filter | dict[str, Any]) -> Self
Drop rows matching a filter (inverse of rows_filter).
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Example
plan = TransformPlan().rows_drop(Col("status") == "deleted")
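To make the "inverse of rows_filter" relationship concrete, here is a minimal pure-Python sketch that models rows as dictionaries and the filter as an ordinary predicate function. The helpers `keep_matching` and `drop_matching` are hypothetical names, not part of transformplan, and the sketch ignores how null values are treated.

```python
# Illustrative model only: rows are dicts, the filter is a plain predicate.
rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "deleted"},
    {"id": 3, "status": "active"},
]


def keep_matching(rows, predicate):
    """Model of rows_filter: keep rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def drop_matching(rows, predicate):
    """Model of rows_drop: remove rows for which the predicate is true."""
    return [row for row in rows if not predicate(row)]


def is_deleted(row):
    return row["status"] == "deleted"


matching = keep_matching(rows, is_deleted)   # what a filter would keep
remaining = drop_matching(rows, is_deleted)  # what a drop would keep
```

In this model the two results partition the input: every row lands in exactly one of `matching` or `remaining`.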
rows_drop_nulls
Drop rows that contain null values in the specified columns, or in any column if `columns` is None.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
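The difference between checking every column and checking a subset can be sketched in plain Python. This is only a model of the behavior described above, with `None` standing in for null; the helper `drop_nulls` is a hypothetical name and not the library's implementation.

```python
def drop_nulls(rows, columns=None):
    """Keep rows with no None in `columns` (all columns when `columns` is None)."""
    def has_null(row):
        checked = row.keys() if columns is None else columns
        return any(row[name] is None for name in checked)

    return [row for row in rows if not has_null(row)]


rows = [
    {"id": 1, "email": "a@example.com", "phone": None},
    {"id": 2, "email": None, "phone": "555-0100"},
    {"id": 3, "email": "c@example.com", "phone": "555-0101"},
]

any_column = drop_nulls(rows)                     # only row 3 is fully populated
email_only = drop_nulls(rows, columns=["email"])  # rows 1 and 3 have an email
```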
rows_unique
rows_unique(
    columns: str | Sequence[str] | None = None,
    keep: Literal["first", "last", "any", "none"] = "first",
) -> Self
Keep unique rows based on specified columns.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
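The four `keep` values are not described on this page. The sketch below assumes they follow the same meaning as the `keep` option of Polars' `DataFrame.unique`: "first" and "last" keep one row per key, "any" keeps an arbitrary one, and "none" discards every row whose key occurs more than once. The helper `unique_rows` is a hypothetical pure-Python model, not the library's code.

```python
from collections import Counter


def unique_rows(rows, columns, keep="first"):
    """Model of the assumed `keep` semantics on a list of dict rows."""
    if keep not in {"first", "last", "any", "none"}:
        raise ValueError(f"unsupported keep value: {keep!r}")

    def key_of(row):
        return tuple(row[name] for name in columns)

    if keep == "none":
        # Discard every row whose key appears more than once.
        counts = Counter(key_of(row) for row in rows)
        return [row for row in rows if counts[key_of(row)] == 1]

    chosen = {}  # key -> position of the row to keep
    for position, row in enumerate(rows):
        key = key_of(row)
        # "any" may keep whichever row is convenient; this sketch treats it as "first".
        if keep == "last" or key not in chosen:
            chosen[key] = position
    return [rows[position] for position in sorted(chosen.values())]


rows = [
    {"email": "a@example.com", "visit": 1},
    {"email": "b@example.com", "visit": 2},
    {"email": "a@example.com", "visit": 3},
]
first = unique_rows(rows, ["email"], keep="first")
last = unique_rows(rows, ["email"], keep="last")
none = unique_rows(rows, ["email"], keep="none")
```

The sketch preserves input order in its output; whether the real operation does so is not stated here.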
rows_deduplicate
rows_deduplicate(
    columns: str | Sequence[str],
    sort_by: str,
    keep: Literal["first", "last"] = "first",
    *,
    descending: bool = False,
) -> Self
Deduplicate rows by keeping first/last based on sort order.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `str \| Sequence[str]` | Columns that define duplicates. | required |
| `sort_by` | `str` | Column to sort by before deduplication. | required |
| `keep` | `Literal['first', 'last']` | Keep 'first' or 'last' after sorting. | `'first'` |
| `descending` | `bool` | Sort in descending order. | `False` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
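Because `keep` is applied after sorting, `keep` and `descending` interact. The following pure-Python sketch models the sort-then-keep behavior described in the parameter table; `deduplicate` is a hypothetical helper operating on a list of dicts, not the library's implementation.

```python
def deduplicate(rows, columns, sort_by, keep="first", descending=False):
    """Sort by `sort_by`, then keep the first or last row of each duplicate group."""
    key_columns = [columns] if isinstance(columns, str) else list(columns)
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=descending)
    chosen = {}  # duplicate key -> row kept for that key
    for row in ordered:
        key = tuple(row[name] for name in key_columns)
        if keep == "last" or key not in chosen:
            chosen[key] = row
    return list(chosen.values())


rows = [
    {"user_id": 1, "updated_at": "2024-01-01"},
    {"user_id": 1, "updated_at": "2024-03-01"},
    {"user_id": 2, "updated_at": "2024-02-01"},
]

# Ascending sort + keep="last" keeps the most recent row per user.
newest = deduplicate(rows, "user_id", "updated_at", keep="last")
# Descending sort + keep="last" keeps the earliest row per user.
oldest = deduplicate(rows, "user_id", "updated_at", keep="last", descending=True)
```

Under this model, `keep="first"` with `descending=True` selects the same rows as `keep="last"` with the default ascending order.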
rows_sort
Sort rows by one or more columns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `by` | `str \| Sequence[str]` | Column(s) to sort by. | required |
| `descending` | `bool \| Sequence[bool]` | Sort direction (single bool or list matching columns). | `False` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
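A per-column `descending` list means each sort key gets its own direction. One way to picture this is with repeated stable sorts in plain Python, applied from the last key to the first. The helper `sort_rows` below is a hypothetical model on a list of dicts, not how the library sorts.

```python
def sort_rows(rows, by, descending=False):
    """Multi-key sort with a direction per key, using stable sorts."""
    columns = [by] if isinstance(by, str) else list(by)
    if isinstance(descending, bool):
        flags = [descending] * len(columns)
    else:
        flags = list(descending)
    if len(flags) != len(columns):
        raise ValueError("descending must be a bool or match the number of columns")

    ordered = list(rows)
    # Sort by the least significant key first; stability preserves earlier passes.
    for column, reverse in reversed(list(zip(columns, flags))):
        ordered.sort(key=lambda row: row[column], reverse=reverse)
    return ordered


rows = [
    {"category": "b", "price": 1},
    {"category": "a", "price": 1},
    {"category": "a", "price": 5},
]
# Category ascending, then price descending within each category.
result = sort_rows(rows, by=["category", "price"], descending=[False, True])
```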
rows_flag
rows_flag(
    filter: Filter | dict[str, Any],
    new_column: str,
    *,
    true_value: Any = True,
    false_value: Any = False,
) -> Self
Add a flag column based on a filter condition (without dropping rows).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filter` | `Filter \| dict[str, Any]` | Filter condition. | required |
| `new_column` | `str` | Name for the flag column. | required |
| `true_value` | `Any` | Value when condition is True. | `True` |
| `false_value` | `Any` | Value when condition is False. | `False` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
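Since `true_value` and `false_value` accept any value, the flag column does not have to be boolean. The pure-Python sketch below models that behavior with a predicate function in place of a Filter; `flag_rows` is a hypothetical helper, not part of transformplan.

```python
def flag_rows(rows, predicate, new_column, true_value=True, false_value=False):
    """Add `new_column` to every row without dropping any rows."""
    return [
        {**row, new_column: true_value if predicate(row) else false_value}
        for row in rows
    ]


rows = [
    {"name": "Ada", "score": 95},
    {"name": "Bo", "score": 72},
]
flagged = flag_rows(
    rows,
    lambda row: row["score"] >= 90,
    "grade",
    true_value="excellent",
    false_value="standard",
)
```

Every input row is still present in `flagged`; only the new column distinguishes rows that matched from rows that did not.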
rows_head
Keep only the first n rows.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
rows_tail
Keep only the last n rows.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
rows_sample
Sample rows from the DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int \| None` | Number of rows to sample. | `None` |
| `fraction` | `float \| None` | Fraction of rows to sample (0.0 to 1.0). | `None` |
| `seed` | `int \| None` | Random seed for reproducibility. | `None` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
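The role of `seed` can be illustrated with Python's standard `random` module. The sketch below is a model only: `sample_rows` is a hypothetical helper, and two of its rules are assumptions not stated on this page, namely that exactly one of `n` and `fraction` must be given and that a fraction is converted to a row count by truncation.

```python
import random


def sample_rows(rows, n=None, fraction=None, seed=None):
    """Sample without replacement, by absolute count or by fraction."""
    if (n is None) == (fraction is None):
        # Assumption: the two sizing options are mutually exclusive.
        raise ValueError("pass exactly one of n or fraction")
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0.0 and 1.0")
        n = int(len(rows) * fraction)  # assumption: truncate to a whole row count
    # A dedicated generator keeps the result reproducible for a given seed.
    return random.Random(seed).sample(rows, n)


rows = [{"id": i} for i in range(100)]
by_count = sample_rows(rows, n=10, seed=42)
by_count_again = sample_rows(rows, n=10, seed=42)
by_fraction = sample_rows(rows, fraction=0.1, seed=42)
```

With the same seed and the same input, `by_count` and `by_count_again` contain the same rows in the same order.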
rows_explode
Explode a list column into multiple rows.
Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
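Exploding repeats the other columns of a row once per list element. A minimal pure-Python model follows; `explode_rows` is a hypothetical helper, and because the handling of empty or null lists is not stated here, the example avoids them.

```python
def explode_rows(rows, column):
    """Emit one output row per element of the list stored in `column`."""
    exploded = []
    for row in rows:
        for item in row[column]:
            # Copy the row and replace the list with a single element.
            exploded.append({**row, column: item})
    return exploded


rows = [
    {"id": 1, "tags": ["red", "blue"]},
    {"id": 2, "tags": ["green"]},
]
exploded = explode_rows(rows, "tags")
```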
rows_melt
rows_melt(
    id_columns: Sequence[str],
    value_columns: Sequence[str],
    variable_name: str = "variable",
    value_name: str = "value",
) -> Self
Unpivot a DataFrame from wide to long format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `id_columns` | `Sequence[str]` | Columns to keep as identifiers. | required |
| `value_columns` | `Sequence[str]` | Columns to unpivot. | required |
| `variable_name` | `str` | Name for the variable column. | `'variable'` |
| `value_name` | `str` | Name for the value column. | `'value'` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
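The wide-to-long reshape can be pictured in plain Python: each value column of a wide row becomes its own long row, tagged with the column it came from. `melt_rows` below is a hypothetical model on a list of dicts, and the row order it produces may differ from the library's.

```python
def melt_rows(rows, id_columns, value_columns,
              variable_name="variable", value_name="value"):
    """Turn each value column of each wide row into a separate long row."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            record = {name: row[name] for name in id_columns}
            record[variable_name] = column    # which column the value came from
            record[value_name] = row[column]  # the value itself
            long_rows.append(record)
    return long_rows


wide = [{"id": 1, "q1": 10, "q2": 20}]
long_rows = melt_rows(
    wide,
    id_columns=["id"],
    value_columns=["q1", "q2"],
    variable_name="quarter",
    value_name="sales",
)
```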
rows_pivot
rows_pivot(
    index: str | Sequence[str],
    columns: str,
    values: str,
    aggregate_function: PivotAgg = "first",
) -> Self
Pivot from long to wide format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Sequence[str]` | Column(s) to use as row identifiers. | required |
| `columns` | `str` | Column whose unique values become new columns. | required |
| `values` | `str` | Column containing values to fill. | required |
| `aggregate_function` | `PivotAgg` | How to aggregate ('first', 'sum', 'mean', 'count', etc.). | `'first'` |

Returns:

| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
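The aggregate function matters when several long rows map to the same output cell. The pure-Python sketch below models that case; `pivot_rows` is a hypothetical helper that takes a callable rather than the string names listed above, and it leaves missing index/column combinations out of the result instead of filling them.

```python
def pivot_rows(rows, index, columns, values, aggregate=sum):
    """Group by the index columns and spread `columns` values into new keys."""
    index_columns = [index] if isinstance(index, str) else list(index)
    groups = {}  # index key -> {new column name -> collected values}
    for row in rows:
        key = tuple(row[name] for name in index_columns)
        groups.setdefault(key, {}).setdefault(row[columns], []).append(row[values])

    wide = []
    for key, cells in groups.items():
        record = dict(zip(index_columns, key))
        for name, collected in cells.items():
            record[name] = aggregate(collected)  # collapse duplicates into one cell
        wide.append(record)
    return wide


long_rows = [
    {"id": 1, "quarter": "q1", "sales": 10},
    {"id": 1, "quarter": "q1", "sales": 5},
    {"id": 1, "quarter": "q2", "sales": 20},
    {"id": 2, "quarter": "q1", "sales": 7},
]
summed = pivot_rows(long_rows, "id", "quarter", "sales", aggregate=sum)
first = pivot_rows(long_rows, "id", "quarter", "sales", aggregate=lambda v: v[0])
```

For `id` 1 and `q1`, summing combines both sales values, while taking the first keeps only the earlier one.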
Examples
Filtering Rows
from transformplan import TransformPlan, Col
# Keep rows matching condition
plan = TransformPlan().rows_filter(Col("age") >= 18)
# Drop rows matching condition
plan = TransformPlan().rows_drop(Col("status") == "deleted")
# Complex filters
plan = TransformPlan().rows_filter(
    (Col("score") >= 50) & (Col("active") == True)
)
Flagging Rows
Add a boolean column based on a condition without removing rows:
plan = TransformPlan().rows_flag(
    filter=Col("score") >= 90,
    new_column="is_excellent",
    true_value=True,
    false_value=False,
)
Sorting
# Sort by single column
plan = TransformPlan().rows_sort("name")
# Sort descending
plan = TransformPlan().rows_sort("score", descending=True)
# Sort by multiple columns
plan = TransformPlan().rows_sort(
    by=["category", "price"],
    descending=[False, True],
)
Removing Duplicates
# Keep first occurrence of each unique value
plan = TransformPlan().rows_unique(columns=["email"])
# Keep last occurrence
plan = TransformPlan().rows_unique(columns=["user_id"], keep="last")
# Deduplicate with an explicit sort order: rows are sorted by updated_at
# (descending), then the last row per user_id is kept. Per the parameter
# descriptions, that is the row with the earliest updated_at.
plan = TransformPlan().rows_deduplicate(
    columns=["user_id"],
    sort_by="updated_at",
    keep="last",
    descending=True,
)
Handling Nulls
# Drop rows with nulls in any column
plan = TransformPlan().rows_drop_nulls()
# Drop rows with nulls in specific columns
plan = TransformPlan().rows_drop_nulls(columns=["required_field"])
Limiting Rows
# Keep first n rows
plan = TransformPlan().rows_head(10)
# Keep last n rows
plan = TransformPlan().rows_tail(10)
# Random sample
plan = TransformPlan().rows_sample(n=100, seed=42)
plan = TransformPlan().rows_sample(fraction=0.1, seed=42)
Reshaping
# Explode list column into multiple rows
plan = TransformPlan().rows_explode("tags")
# Unpivot from wide to long format
plan = TransformPlan().rows_melt(
    id_columns=["id", "name"],
    value_columns=["q1", "q2", "q3", "q4"],
    variable_name="quarter",
    value_name="sales",
)
# Pivot from long to wide format
plan = TransformPlan().rows_pivot(
    index=["id"],
    columns="quarter",
    values="sales",
    aggregate_function="sum",
)