MLFChain API Reference
The MLFChain class is the core orchestrator for building and executing ML pipelines in mlfcrafter.
Class Definition
class MLFChain:
    """
    Initialize MLFChain with multiple crafters
    Usage: MLFChain(DataIngestCrafter(...), CleanerCrafter(...), ...)
    """
Constructor
__init__(self, *crafters)
Creates a new MLFChain instance with the specified crafters.
Parameters:
*crafters: Variable number of crafter instances to add to the pipeline
Example:
from mlfcrafter import MLFChain
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter, ModelCrafter
# Initialize with crafters
pipeline = MLFChain(
    DataIngestCrafter(data_path="data.csv"),
    CleanerCrafter(strategy="auto"),
    ModelCrafter(model_name="random_forest"),
)
# Or initialize empty and add crafters later
pipeline = MLFChain()
Methods
add_crafter(crafter)
Adds a single crafter to the pipeline chain.
Parameters:
crafter: The crafter instance to add to the pipeline
Example:
from mlfcrafter import MLFChain
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data_path="data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="auto"))
run(target_column=None, **kwargs)
Runs the entire pipeline chain.
Parameters:
target_column (str, optional): Target column name for ML tasks
**kwargs: Additional parameters to pass to the first crafter
Returns: dict - Context dictionary containing all pipeline results
Raises:
RuntimeError: If any crafter in the pipeline fails
TypeError: If a crafter doesn't return a dict (context)
ValueError: If required data or parameters are missing
Example:
# Run pipeline with target column
results = pipeline.run(target_column="target")
# Access results
print(f"Model accuracy: {results['test_score']:.4f}")
print(f"Dataset shape: {results['original_shape']}")
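The chaining behaviour described for run() can be pictured with a small, self-contained sketch. It does not use mlfcrafter and is not its implementation: MiniChain, AddShape, and ReturnsNone are hypothetical names, and the run(context) method on each crafter is an assumed interface. Letting ValueError pass through unwrapped is also an assumption, made only so that the three documented exception types stay distinct.

```python
class MiniChain:
    """Toy stand-in for a crafter chain (hypothetical, not mlfcrafter code)."""

    def __init__(self, *crafters):
        self.crafters = list(crafters)

    def add_crafter(self, crafter):
        self.crafters.append(crafter)

    def run(self, target_column=None, **kwargs):
        # Seed the context with the caller's arguments.
        context = {"target_column": target_column, **kwargs}
        for crafter in self.crafters:
            name = type(crafter).__name__
            try:
                # Assumption: every crafter exposes run(context) -> dict.
                result = crafter.run(context)
            except ValueError:
                raise  # assumed: missing data/parameters surface unchanged
            except Exception as exc:
                raise RuntimeError(f"{name} failed: {exc}") from exc
            if not isinstance(result, dict):
                raise TypeError(
                    f"{name} returned {type(result).__name__}, expected dict"
                )
            context = result  # the next crafter sees the updated context
        return context


class AddShape:
    """Hypothetical crafter that records a made-up data shape."""

    def run(self, context):
        context["original_shape"] = (3, 2)
        return context


class ReturnsNone:
    """Hypothetical crafter that breaks the contract by returning None."""

    def run(self, context):
        return None


results = MiniChain(AddShape()).run(target_column="target")
print(results)
```

In this sketch a crafter that returns anything other than a dict stops the chain with TypeError before the next crafter runs, which is one plausible reading of the Raises list above.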
Properties
crafters
List of all crafters in the pipeline.
Type: list
Example:
print(f"Pipeline has {len(pipeline.crafters)} crafters")
for i, crafter in enumerate(pipeline.crafters, 1):
    print(f"Crafter {i}: {type(crafter).__name__}")
Context Dictionary
The run() method returns a context dictionary containing all pipeline results. Key elements include:
Data Processing Results
data (pd.DataFrame): Current state of the data
original_shape (tuple): Shape of original data (rows, columns)
cleaned_shape (tuple): Shape after cleaning (if cleaning was performed)
missing_values_handled (bool): Whether missing values were processed
Model Training Results
model: Trained model object
X_train, X_test (pd.DataFrame): Feature splits
y_train, y_test (pd.Series): Target splits
y_pred (np.ndarray): Predictions on test set
train_score (float): Training accuracy
test_score (float): Test accuracy
model_name (str): Name of algorithm used
features (list): List of feature column names
Scaling Results
scaler: Fitted scaler object for future use
scaled_columns (list): Names of columns that were scaled
scaler_type (str): Type of scaler used
Encoding Results
encoder: Fitted encoder object for future use
encoded_columns (list): Names of columns that were encoded
encoder_type (str): Type of encoder used
Scoring Results
scores (dict): Dictionary containing calculated metrics
Deployment Results
deployment_path (str): Path where model was saved
artifacts_saved (list): List of artifact keys that were saved
deployment_successful (bool): Whether deployment completed successfully
Example:
results = pipeline.run(target_column="target")
# Access various results
print(f"Original data shape: {results['original_shape']}")
print(f"Model used: {results['model_name']}")
print(f"Test accuracy: {results['test_score']:.4f}")
print(f"Model saved to: {results.get('deployment_path', 'Not saved')}")
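Because several of the keys listed above depend on which crafters ran (for example, cleaned_shape is described as present only if cleaning was performed), reading them defensively may be safer than indexing directly. The following is a minimal sketch under that assumption: summarize is a hypothetical helper, not part of mlfcrafter, and the sample dictionary holds made-up values.

```python
def summarize(results):
    """Build a short report from a context dict, skipping absent keys."""
    labels = {
        "original_shape": "Original shape",
        "cleaned_shape": "Cleaned shape",
        "model_name": "Model",
        "test_score": "Test accuracy",
        "deployment_path": "Saved to",
    }
    lines = []
    for key, label in labels.items():
        if key not in results:
            continue  # assumed: the crafter that writes this key did not run
        value = results[key]
        if isinstance(value, float):
            value = f"{value:.4f}"  # same precision as the examples above
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Hypothetical context with made-up values; no cleaning or deployment keys.
sample = {
    "original_shape": (150, 5),
    "model_name": "random_forest",
    "test_score": 0.9,
}
report = summarize(sample)
print(report)
```

With a real pipeline, the dictionary returned by run() would be passed in place of sample.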
Error Handling
MLFChain reports errors that occur during pipeline execution by raising exceptions:
Common Exceptions:
RuntimeError: Raised when a crafter fails during execution
TypeError: Raised when a crafter doesn't return a dict (context)
ValueError: Raised when required data or parameters are missing
Example:
try:
    results = pipeline.run(target_column="target")
    print("Pipeline completed successfully!")
except RuntimeError as e:
    print(f"Pipeline failed: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")
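When the same handling is needed in several places, the try/except pattern can be folded into a small wrapper. This is a sketch only: run_safely and FailingPipeline are hypothetical names, and the wrapper assumes nothing about the pipeline beyond a run(**kwargs) method that raises the three exception types listed above.

```python
def run_safely(pipeline, **run_kwargs):
    """Run a pipeline and return (results, error); exactly one is None."""
    try:
        return pipeline.run(**run_kwargs), None
    except ValueError as exc:
        return None, f"Configuration error: {exc}"
    except TypeError as exc:
        return None, f"Crafter contract error: {exc}"
    except RuntimeError as exc:
        return None, f"Pipeline failed: {exc}"


class FailingPipeline:
    """Hypothetical stand-in that always reports a missing parameter."""

    def run(self, **kwargs):
        raise ValueError("target_column is required")


results, error = run_safely(FailingPipeline(), target_column=None)
if error is not None:
    print(error)
```

Returning the error as a value keeps the calling code linear; any exception type not listed above still propagates to the caller.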
Examples
Basic Pipeline
from mlfcrafter import MLFChain
from mlfcrafter.crafters import *
# Create pipeline with all crafters
pipeline = MLFChain(
    DataIngestCrafter(data_path="data.csv"),
    CleanerCrafter(strategy="auto"),
    CategoricalCrafter(encoder_type="onehot"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter(),
)
# Run pipeline
results = pipeline.run(target_column="target")
print(f"Model accuracy: {results['test_score']:.4f}")
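Adding Crafters Individually
from mlfcrafter import MLFChain
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter, ModelCrafter
# Create empty pipeline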
pipeline = MLFChain()
# Add crafters one by one
pipeline.add_crafter(DataIngestCrafter(data_path="data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="median"))
pipeline.add_crafter(ModelCrafter(model_name="xgboost"))
# Run pipeline
results = pipeline.run(target_column="target")
Error Handling
try:
    results = pipeline.run(target_column="target")
    print(f"Success! Model accuracy: {results['test_score']:.4f}")
    print(f"Model saved to: {results.get('deployment_path', 'Not saved')}")
except RuntimeError as e:
    print(f"Pipeline failed: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")