CleanerCrafter
CleanerCrafter handles missing values in a dataset using one of several configurable strategies.
Overview
from mlfcrafter import CleanerCrafter

crafter = CleanerCrafter(
    strategy="auto",
    str_fill="missing",
    int_fill=0.0
)
Parameters
strategy (str)
Default: "auto"
Available Strategies:
- "auto": Automatically choose the fill value based on data type (numerical columns use int_fill, categorical columns use str_fill)
- "mean": Fill numerical columns with the column mean (categorical columns are left unchanged)
- "median": Fill numerical columns with the column median (categorical columns are left unchanged)
- "mode": Fill all columns with the most frequent value
- "drop": Drop rows containing any missing values
- "constant": Fill with constant values (str_fill for strings, int_fill for numbers)
str_fill (str)
Default: "missing"
Fill value for categorical/string columns.
int_fill (float)
Default: 0.0
Fill value for numerical columns.
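To make the strategy descriptions concrete, here is a minimal pandas sketch of the behavior described above. It is not mlfcrafter's implementation: the function name `fill_missing` is hypothetical, and the handling of edge cases (for example, a column whose values are all missing) is an assumption.

```python
import pandas as pd


def fill_missing(df, strategy="auto", str_fill="missing", int_fill=0.0):
    """Illustrative stand-in for the documented strategies (not mlfcrafter code)."""
    allowed = {"auto", "mean", "median", "mode", "drop", "constant"}
    if strategy not in allowed:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "drop":
        # Remove every row that has at least one missing value.
        return df.dropna()

    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        numeric = pd.api.types.is_numeric_dtype(out[col])
        if strategy in ("auto", "constant"):
            # Numerical columns get int_fill, everything else gets str_fill.
            out[col] = out[col].fillna(int_fill if numeric else str_fill)
        elif strategy == "mean" and numeric:
            out[col] = out[col].fillna(out[col].mean())
        elif strategy == "median" and numeric:
            out[col] = out[col].fillna(out[col].median())
        elif strategy == "mode":
            modes = out[col].mode()
            if not modes.empty:  # assumption: leave all-missing columns as-is
                out[col] = out[col].fillna(modes.iloc[0])
    return out


df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", None, "Paris"],
})

auto_filled = fill_missing(df, strategy="auto")   # age -> 0.0, city -> "missing"
mean_filled = fill_missing(df, strategy="mean")   # age -> 30.0, city unchanged
dropped = fill_missing(df, strategy="drop")       # keeps the two complete rows
```

In this sketch "mean" and "median" leave the missing `city` value in place, so a pipeline that needs fully populated data would pair them with another strategy or use "auto".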
Context Input
- data: Dataset to clean (required)
Context Output
- data: Cleaned dataset
- cleaned_shape: Shape after cleaning
- missing_values_handled: Flag indicating cleaning was performed
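The input and output keys above can be pictured as a step that reads `data` from a context and writes three keys back. The sketch below assumes the context is a plain dictionary and uses a hypothetical `clean_step` function. It illustrates the contract only and does not reproduce how mlfcrafter passes context between crafters.

```python
import pandas as pd


def clean_step(context, str_fill="missing", int_fill=0.0):
    """Hypothetical cleaning step that mirrors the documented context keys."""
    if "data" not in context:
        # "data" is the only required input key.
        raise ValueError("context must contain a 'data' entry")

    data = context["data"].copy()
    for col in data.columns:
        numeric = pd.api.types.is_numeric_dtype(data[col])
        data[col] = data[col].fillna(int_fill if numeric else str_fill)

    # Return a new context carrying the three documented output keys.
    return {
        **context,
        "data": data,
        "cleaned_shape": data.shape,
        "missing_values_handled": True,
    }


context = {"data": pd.DataFrame({"x": [1.0, None], "label": ["a", None]})}
result = clean_step(context)
print(result["cleaned_shape"], result["missing_values_handled"])
```

Returning a new dictionary instead of mutating the input keeps the original `data` available for comparison, which is a design choice of this sketch.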
Example Usage
from mlfcrafter import MLFChain
from mlfcrafter.crafters import *

# Automatic cleaning
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter("data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="auto"))
result = pipeline.run()

# Mean imputation for numerical columns
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter("data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="mean"))
result = pipeline.run()

# Custom fill values
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter("data.csv"))
pipeline.add_crafter(CleanerCrafter(
    strategy="constant",
    str_fill="Unknown",
    int_fill=-1
))
result = pipeline.run()