MLFChain API Reference¶
The MLFChain class is the core orchestrator for building and executing ML pipelines in mlfcrafter.
Class Definition¶
class MLFChain:
    """
    Initialize MLFChain with multiple crafters
    Usage: MLFChain(DataIngestCrafter(...), CleanerCrafter(...), ...)
    """
Constructor¶
__init__(self, *crafters)¶
Creates a new MLFChain instance with the specified crafters.
Parameters:
*crafters: Variable number of crafter instances to add to the pipeline
Example:
from mlfcrafter import MLFChain
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter, ModelCrafter
# Initialize with crafters
pipeline = MLFChain(
    DataIngestCrafter(data_path="data.csv"),
    CleanerCrafter(strategy="auto"),
    ModelCrafter(model_name="random_forest"),
)
# Or initialize empty and add crafters later
pipeline = MLFChain()
Methods¶
add_crafter(crafter)¶
Adds a single crafter to the pipeline chain.
Parameters:
crafter: The crafter instance to add to the pipeline
Example:
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data_path="data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="auto"))
run(target_column=None, **kwargs)¶
Runs the entire pipeline chain.
Parameters:
target_column (str, optional): Target column name for ML tasks
**kwargs: Additional parameters to pass to the first crafter
Returns: dict - Context dictionary containing all pipeline results
Raises:
RuntimeError: If any crafter in the pipeline fails
TypeError: If a crafter doesn't return a dict (context)
ValueError: If required data or parameters are missing
Example:
# Run pipeline with target column
results = pipeline.run(target_column="target")
# Access results
print(f"Model accuracy: {results['test_score']:.4f}")
print(f"Dataset shape: {results['original_shape']}")
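The contract described for run() (each crafter receives the context, must return a dict, and failures surface as exceptions) can be pictured with a small stand-in class. This is a sketch under stated assumptions, not mlfcrafter's implementation: SketchChain, CountRows, ReturnsNone, and the idea that each crafter exposes a run(context) method are hypothetical names chosen only to illustrate the documented behavior.

```python
class SketchChain:
    """Hypothetical stand-in for MLFChain; not the library's actual code."""

    def __init__(self, *crafters):
        self.crafters = list(crafters)

    def add_crafter(self, crafter):
        self.crafters.append(crafter)

    def run(self, target_column=None, **kwargs):
        # Seed the context with the caller's parameters.
        context = {"target_column": target_column, **kwargs}
        for crafter in self.crafters:
            name = type(crafter).__name__
            try:
                # Assumed crafter interface: run(context) -> dict
                result = crafter.run(context)
            except ValueError:
                # In this sketch, missing data/parameters propagate unchanged.
                raise
            except Exception as exc:
                raise RuntimeError(f"{name} failed: {exc}") from exc
            if not isinstance(result, dict):
                raise TypeError(
                    f"{name} must return a dict, got {type(result).__name__}"
                )
            context = result
        return context


class CountRows:
    """Toy crafter that adds a 'row_count' key to the context."""

    def run(self, context):
        return {**context, "row_count": len(context.get("rows", []))}


class ReturnsNone:
    """Toy crafter that breaks the contract by returning None."""

    def run(self, context):
        return None


chain = SketchChain(CountRows())
results = chain.run(target_column="target", rows=[1, 2, 3])
print(results["row_count"])  # 3
```

Passing ReturnsNone() to the same sketch raises TypeError, which mirrors the documented rule that every crafter must hand a dict to the next step.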
Properties¶
crafters¶
List of all crafters in the pipeline.
Type: list
Example:
print(f"Pipeline has {len(pipeline.crafters)} crafters")
for i, crafter in enumerate(pipeline.crafters, 1):
    print(f"Crafter {i}: {type(crafter).__name__}")
Context Dictionary¶
The run() method returns a context dictionary containing all pipeline results. Key elements include:
Data Processing Results¶
data (pd.DataFrame): Current state of the data
original_shape (tuple): Shape of original data (rows, columns)
cleaned_shape (tuple): Shape after cleaning (if cleaning was performed)
missing_values_handled (bool): Whether missing values were processed
Model Training Results¶
model: Trained model object
X_train, X_test (pd.DataFrame): Feature splits
y_train, y_test (pd.Series): Target splits
y_pred (np.ndarray): Predictions on test set
train_score (float): Training accuracy
test_score (float): Test accuracy
model_name (str): Name of algorithm used
features (list): List of feature column names
Scaling Results¶
scaler: Fitted scaler object for future use
scaled_columns (list): Names of columns that were scaled
scaler_type (str): Type of scaler used
Encoding Results¶
encoder: Fitted encoder object for future use
encoded_columns (list): Names of columns that were encoded
encoder_type (str): Type of encoder used
Scoring Results¶
scores (dict): Dictionary containing calculated metrics
Deployment Results¶
deployment_path (str): Path where model was saved
artifacts_saved (list): List of artifact keys that were saved
deployment_successful (bool): Whether deployment completed successfully
Example:
results = pipeline.run(target_column="target")
# Access various results
print(f"Original data shape: {results['original_shape']}")
print(f"Model used: {results['model_name']}")
print(f"Test accuracy: {results['test_score']:.4f}")
print(f"Model saved to: {results.get('deployment_path', 'Not saved')}")
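Several of the keys listed above are described as present only when the corresponding step ran, so reading them with dict.get or a membership check avoids a KeyError. The helper below is a sketch of that pattern: summarize_context and the sample values are illustrative and not part of mlfcrafter; only the key names come from the lists above.

```python
def summarize_context(context):
    """Return summary lines for whichever documented keys are present."""
    lines = []
    if "original_shape" in context:
        rows, cols = context["original_shape"]
        lines.append(f"Original data: {rows} rows x {cols} columns")
    if "model_name" in context:
        lines.append(f"Model: {context['model_name']}")
    test_score = context.get("test_score")
    if test_score is not None:
        lines.append(f"Test accuracy: {test_score:.4f}")
    # Fall back to a placeholder when no deployment step ran.
    lines.append(f"Saved to: {context.get('deployment_path', 'Not saved')}")
    return lines


# Made-up values standing in for a real pipeline result.
sample = {
    "original_shape": (150, 5),
    "model_name": "random_forest",
    "test_score": 0.9,
}
summary = summarize_context(sample)
print("\n".join(summary))
```

Returning a list of lines rather than printing directly keeps the helper easy to reuse in logs or reports.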
Error Handling¶
MLFChain handles errors that occur during pipeline execution:
Common Exceptions:
RuntimeError: Raised when a crafter fails during execution
TypeError: Raised when a crafter doesn't return a dict (context)
ValueError: Raised when required data or parameters are missing
Example:
try:
    results = pipeline.run(target_column="target")
    print("Pipeline completed successfully!")
except RuntimeError as e:
    print(f"Pipeline failed: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")
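The example above handles RuntimeError and ValueError but not the TypeError listed among the common exceptions. One way to cover all three in a single place is a small wrapper, sketched here. run_safely and FailingPipeline are hypothetical names for illustration; the wrapper only assumes the pipeline object has a run() method that accepts keyword arguments.

```python
def run_safely(pipeline, **run_kwargs):
    """Run a pipeline and return (results, error_message); one of them is None."""
    try:
        return pipeline.run(**run_kwargs), None
    except ValueError as exc:
        return None, f"Configuration error: {exc}"
    except TypeError as exc:
        return None, f"Crafter contract error: {exc}"
    except RuntimeError as exc:
        return None, f"Pipeline failed: {exc}"


class FailingPipeline:
    """Hypothetical stand-in that mimics a pipeline with missing input."""

    def run(self, **kwargs):
        raise ValueError("target_column is required")


results, error = run_safely(FailingPipeline(), target_column=None)
print(error)  # Configuration error: target_column is required
```

Returning the error as a value lets calling code decide whether to log, retry, or abort, instead of scattering try/except blocks around every run() call.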
Examples¶
Basic Pipeline¶
from mlfcrafter import MLFChain
from mlfcrafter.crafters import (
    DataIngestCrafter,
    CleanerCrafter,
    CategoricalCrafter,
    ScalerCrafter,
    ModelCrafter,
    ScorerCrafter,
    DeployCrafter,
)
# Create pipeline with all crafters
pipeline = MLFChain(
    DataIngestCrafter(data_path="data.csv"),
    CleanerCrafter(strategy="auto"),
    CategoricalCrafter(encoder_type="onehot"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter(),
)
# Run pipeline
results = pipeline.run(target_column="target")
print(f"Model accuracy: {results['test_score']:.4f}")
Adding Crafters Individually¶
from mlfcrafter import MLFChain
from mlfcrafter.crafters import DataIngestCrafter, CleanerCrafter, ModelCrafter
# Create empty pipeline
pipeline = MLFChain()
# Add crafters one by one
pipeline.add_crafter(DataIngestCrafter(data_path="data.csv"))
pipeline.add_crafter(CleanerCrafter(strategy="median"))
pipeline.add_crafter(ModelCrafter(model_name="xgboost"))
# Run pipeline
results = pipeline.run(target_column="target")
Error Handling¶
try:
    results = pipeline.run(target_column="target")
    print(f"Success! Model accuracy: {results['test_score']:.4f}")
    print(f"Model saved to: {results.get('deployment_path', 'Not saved')}")
except RuntimeError as e:
    print(f"Pipeline failed: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")