String Operations¶
Text manipulation operations on string columns.
Overview¶
String operations allow you to transform text data in DataFrame columns. Operations include case conversion, trimming, splitting, concatenation, and pattern matching.
from transformplan import TransformPlan
plan = (
    TransformPlan()
    .str_lower("email")
    .str_strip("name")
    .str_replace("phone", "-", "")
)
Class Reference¶
StrOps¶
Mixin providing string operations on columns.
str_replace¶
Replace occurrences of a pattern in a string column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to modify. | required |
| `pattern` | `str` | Pattern to search for. | required |
| `replacement` | `str` | String to replace with. | required |
| `literal` | `bool` | If True, treat pattern as literal string. If False, treat as regex. | `True` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
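The effect of the `literal` flag can be sketched in plain Python. This is an illustration of the documented semantics only, not the library's implementation: `replace_in_values` is a hypothetical helper, it assumes every occurrence is replaced, and it uses Python's `re` syntax, which may differ from the regex engine the library uses.

```python
import re


def replace_in_values(values, pattern, replacement, literal=True):
    """Hypothetical stand-in for str_replace applied to a list of strings."""
    if literal:
        # Literal mode: regex metacharacters such as "." match only themselves.
        return [v.replace(pattern, replacement) for v in values]
    # Regex mode: the pattern is interpreted as a regular expression.
    return [re.sub(pattern, replacement, v) for v in values]


phones = ["555-123-4567", "555.123.4567"]

# Literal "." removes only real dots.
literal_dots = replace_in_values(phones, ".", "")

# A regex character class removes both separators.
regex_seps = replace_in_values(phones, r"[.-]", "", literal=False)

# As a regex, "." matches every character, which is rarely what is wanted.
regex_dot = replace_in_values(["a.b"], ".", "", literal=False)
```

The last call shows why literal matching is a sensible default: an unescaped metacharacter in regex mode can match far more than intended.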
str_slice¶
Extract a substring from a string column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to modify. | required |
| `offset` | `int` | Start position (0-indexed, negative counts from end). | required |
| `length` | `int \| None` | Number of characters to extract (None = to end). | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
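The `offset` and `length` rules can be illustrated on a single Python string. `slice_value` is a hypothetical helper, not part of the library, and the clamping of a negative offset that reaches past the start of the string is an assumption.

```python
def slice_value(value, offset, length=None):
    """Hypothetical per-value equivalent of the documented slice rules."""
    if offset < 0:
        # Negative offsets count from the end of the string.
        # Assumption: an offset past the start is clamped to index 0.
        start = max(len(value) + offset, 0)
    else:
        start = offset
    # length=None means "take everything up to the end".
    end = None if length is None else start + length
    return value[start:end]


prefix = slice_value("ABC-12345", offset=0, length=3)
tail = slice_value("ABC-12345", offset=-5)
tail_head = slice_value("ABC-12345", offset=-5, length=2)
```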
str_truncate¶
Truncate strings to a maximum length with optional suffix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to modify. | required |
| `max_length` | `int` | Maximum length of the string (including suffix). | required |
| `suffix` | `str` | Suffix to append to truncated strings. | `'...'` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
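Because `max_length` includes the suffix, the text that is kept is shorter than `max_length` by the suffix length. A minimal sketch of that arithmetic, using a hypothetical `truncate` helper and assuming that strings already within the limit are returned unchanged:

```python
def truncate(value, max_length, suffix="..."):
    """Hypothetical per-value version of the documented truncation rule."""
    if len(value) <= max_length:
        # Assumption: short strings are left alone and get no suffix.
        return value
    # Reserve room for the suffix so the result is max_length characters long.
    keep = max(max_length - len(suffix), 0)
    return value[:keep] + suffix


shortened = truncate("A long description", max_length=10)
untouched = truncate("short", max_length=10)
```

With the default suffix, `max_length=10` keeps seven characters of the original text and appends three dots.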
str_lower¶
Convert string column to lowercase.
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
str_upper¶
Convert string column to uppercase.
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
str_strip¶
Strip leading and trailing characters from a string column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to modify. | required |
| `chars` | `str \| None` | Characters to strip (None = whitespace). | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
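Python's built-in `str.strip` follows the same shape and makes a convenient mental model. The sketch below assumes `chars` is treated as a set of characters rather than as an exact prefix or suffix; `strip_value` is a hypothetical helper.

```python
def strip_value(value, chars=None):
    """Hypothetical per-value model: chars=None strips whitespace."""
    # str.strip removes any leading/trailing run of the given characters
    # and never touches characters in the middle of the string.
    return value.strip(chars)


name = strip_value("  Alice \n")
code = strip_value("--AB_12__", chars="-_")
```

Under this model, the interior underscore in `AB_12` is preserved because only the ends of the string are stripped.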
str_pad¶
Pad a string column to a specified length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to modify. | required |
| `length` | `int` | Target length. | required |
| `fill_char` | `str` | Character to pad with. | `' '` |
| `side` | `str` | 'left' or 'right'. | `'left'` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
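One plausible reading of `side` is that it names the side on which fill characters are added, so `side="left"` with `fill_char="0"` produces zero-padded identifiers. The sketch below encodes that assumption with a hypothetical `pad` helper. It also assumes strings longer than `length` are left unchanged.

```python
def pad(value, length, fill_char=" ", side="left"):
    """Hypothetical per-value model of padding to a target length."""
    if side == "left":
        # Fill characters go on the left, so the text is right-aligned.
        return value.rjust(length, fill_char)
    if side == "right":
        # Fill characters go on the right, so the text is left-aligned.
        return value.ljust(length, fill_char)
    raise ValueError("side must be 'left' or 'right'")


left_padded = pad("42", length=5, fill_char="0")
right_padded = pad("42", length=5, fill_char="0", side="right")
too_long = pad("1234567", length=5, fill_char="0")
```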
str_split¶
str_split(
    column: str,
    separator: str,
    new_columns: list[str] | None = None,
    *,
    keep_original: bool = False,
) -> Self
Split a string column by separator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to split. | required |
| `separator` | `str` | String to split on. | required |
| `new_columns` | `list[str] \| None` | Names for the resulting columns. If None, explodes into rows. | `None` |
| `keep_original` | `bool` | Whether to keep the original column. | `False` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
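The two modes selected by `new_columns` change the shape of the data in different ways: exploding adds rows, while naming columns adds fields. The sketch below models both on a list of dicts. The helpers `split_to_rows` and `split_to_columns` are hypothetical, and the handling of missing or extra parts is an assumption rather than documented behavior.

```python
def split_to_rows(rows, column, separator):
    """Hypothetical explode: one output row per split part."""
    return [
        {**row, column: part}
        for row in rows
        for part in row[column].split(separator)
    ]


def split_to_columns(rows, column, separator, new_columns, keep_original=False):
    """Hypothetical column split: the i-th part goes to new_columns[i]."""
    result = []
    for row in rows:
        parts = row[column].split(separator)
        # Drop the source column unless keep_original is set.
        new_row = {k: v for k, v in row.items() if keep_original or k != column}
        for i, name in enumerate(new_columns):
            # Assumption: missing parts become None and extra parts are dropped.
            new_row[name] = parts[i] if i < len(parts) else None
        result.append(new_row)
    return result


tagged = split_to_rows([{"id": 1, "tags": "a,b"}], "tags", ",")
named = split_to_columns(
    [{"id": 2, "full_name": "Ada Lovelace"}],
    "full_name",
    " ",
    ["first_name", "last_name"],
)
```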
str_concat¶
Concatenate multiple string columns into one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `list[str]` | Columns to concatenate. | required |
| `new_column` | `str` | Name for the new column. | required |
| `separator` | `str` | Separator between values. | `''` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
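Row by row, concatenation amounts to joining the selected values with the separator. A minimal sketch on a list of dicts, using a hypothetical `concat_columns` helper, is shown below. It assumes all values are non-null strings and that the source columns are kept; the library's handling of nulls is not described here.

```python
def concat_columns(rows, columns, new_column, separator=""):
    """Hypothetical row-wise model of concatenating string columns."""
    return [
        # Values are joined in the order the columns are listed.
        {**row, new_column: separator.join(row[c] for c in columns)}
        for row in rows
    ]


people = [{"first_name": "Ada", "last_name": "Lovelace"}]
with_full_name = concat_columns(
    people,
    columns=["first_name", "last_name"],
    new_column="full_name",
    separator=" ",
)
```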
str_extract¶
str_extract(
    column: str,
    pattern: str,
    group_index: int = 1,
    new_column: str | None = None,
) -> Self
Extract substring using regex capture group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column to extract from. | required |
| `pattern` | `str` | Regex pattern with capture group(s). | required |
| `group_index` | `int` | Which capture group to extract (1-indexed). | `1` |
| `new_column` | `str \| None` | Name for result column (None = modify in place). | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | Self for method chaining. |
Source code in transformplan/ops/string.py
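Capture-group indexing can be illustrated with Python's `re` module, where group 1 is the first parenthesized group. `extract_group` is a hypothetical helper. Returning `None` for non-matching values is an assumption, and the library's regex engine may differ from Python's in syntax details.

```python
import re


def extract_group(value, pattern, group_index=1):
    """Hypothetical per-value model of extracting one capture group."""
    match = re.search(pattern, value)
    # Assumption: values that do not match yield None.
    return match.group(group_index) if match else None


pattern = r"@([^.]+)\.(\w+)$"  # group 1: domain name, group 2: top-level domain

domain = extract_group("user@example.com", pattern)
tld = extract_group("user@example.com", pattern, group_index=2)
missing = extract_group("not-an-email", pattern)
```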
Examples¶
Case Conversion¶
# Convert to lowercase
plan = TransformPlan().str_lower("email")
# Convert to uppercase
plan = TransformPlan().str_upper("code")
Trimming and Padding¶
# Strip whitespace
plan = TransformPlan().str_strip("name")
# Strip specific characters
plan = TransformPlan().str_strip("code", chars="-_")
# Pad to fixed length
plan = TransformPlan().str_pad("id", length=10, fill_char="0", side="left")
Replacement¶
# Replace literal string
plan = TransformPlan().str_replace("phone", "-", "")
# Replace with regex
plan = TransformPlan().str_replace(
    column="text",
    pattern=r"\s+",
    replacement=" ",
    literal=False
)
Substring Operations¶
# Extract substring by position
plan = TransformPlan().str_slice("code", offset=0, length=3)
# Truncate with suffix
plan = TransformPlan().str_truncate("description", max_length=100, suffix="...")
Splitting¶
# Split into rows (explode)
plan = TransformPlan().str_split("tags", separator=",")
# Split into columns
plan = TransformPlan().str_split(
    column="full_name",
    separator=" ",
    new_columns=["first_name", "last_name"],
    keep_original=False
)
Concatenation¶
# Concatenate columns
plan = TransformPlan().str_concat(
    columns=["first_name", "last_name"],
    new_column="full_name",
    separator=" "
)