Pipeline Basics¶
Learn the fundamental concepts of MLFCrafter pipelines and how to build effective ML workflows.
What is an MLFChain?¶
An MLFChain is the core orchestrator in MLFCrafter that manages the execution of multiple processing steps (crafters) in sequence. It provides:
- Sequential Processing: Crafters execute in the order they are added
- Data Flow Management: Each crafter's output is passed automatically to the next step
- Error Handling: Failures are handled gracefully, with support for rollback
- State Management: Pipeline state and intermediate results are tracked throughout the run
- Logging: Comprehensive logging of each step
Basic Pipeline Structure¶
```python
from mlfcrafter import MLFChain

# Create a new pipeline
pipeline = MLFChain()

# Add processing steps (CrafterA/B/C are placeholders for real crafters)
pipeline.add_crafter(CrafterA())
pipeline.add_crafter(CrafterB())
pipeline.add_crafter(CrafterC())

# Run the pipeline
result = pipeline.run()
```
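Depending on the MLFCrafter version, it may also be possible to pass crafters straight to the MLFChain constructor instead of calling add_crafter() repeatedly. The shorthand below, using the same placeholder crafters, is an assumption for illustration rather than a documented guarantee:

```python
from mlfcrafter import MLFChain

# Hypothetical shorthand -- assumes MLFChain accepts crafters as constructor arguments
pipeline = MLFChain(CrafterA(), CrafterB(), CrafterC())
result = pipeline.run()
```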
Crafter Lifecycle¶
Each crafter goes through several phases during execution, as sketched in the example after this list:
- Initialization: Crafter receives data from previous step
- Validation: Input data is validated
- Processing: Main logic execution
- Output: Results are prepared for next step
- Cleanup: Resources are released
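To make these phases concrete, here is a minimal sketch of a custom crafter. MLFCrafter's actual base class and method signature are not shown on this page, so the run(data) interface below is an assumption used purely for illustration:

```python
import pandas as pd

class LoggingCrafter:
    """Hypothetical crafter that walks through the lifecycle phases."""

    def run(self, data: pd.DataFrame) -> pd.DataFrame:
        # Initialization: data arrives here from the previous crafter

        # Validation: reject input the crafter cannot work with
        if data.empty:
            raise ValueError("LoggingCrafter received an empty DataFrame")

        try:
            # Processing: main logic (here, just a row count)
            result = data.copy()
            print(f"LoggingCrafter saw {len(result)} rows")

            # Output: the returned DataFrame becomes the next crafter's input
            return result
        finally:
            # Cleanup: release any resources (nothing to release in this sketch)
            pass
```

A crafter shaped like this could then be appended with pipeline.add_crafter(LoggingCrafter()) like any built-in step, assuming it matches the interface MLFCrafter expects.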
Data Flow Between Crafters¶
Data flows automatically between crafters:
```python
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data))  # Outputs: loaded DataFrame
pipeline.add_crafter(CleanerCrafter())         # Inputs: DataFrame, Outputs: clean DataFrame
pipeline.add_crafter(ScalerCrafter())          # Inputs: DataFrame, Outputs: scaled DataFrame
pipeline.add_crafter(ModelCrafter())           # Inputs: DataFrame, Outputs: trained model
```
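Putting the same flow together on a small in-memory dataset might look like the sketch below. The top-level crafter imports and the argument-free constructors are assumptions; real usage may need extra configuration, such as a target column for ModelCrafter:

```python
import pandas as pd

# Assumption: the crafters are importable from the top-level package, like MLFChain
from mlfcrafter import MLFChain, DataIngestCrafter, CleanerCrafter, ScalerCrafter, ModelCrafter

# Tiny made-up dataset so the data flow is concrete
data = pd.DataFrame({
    "feature_a": [1.0, 2.0, None, 4.0],
    "feature_b": [10, 20, 30, 40],
    "target": [0, 1, 0, 1],
})

pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data))  # DataFrame enters the chain
pipeline.add_crafter(CleanerCrafter())         # missing feature_a value handled here
pipeline.add_crafter(ScalerCrafter())          # numeric columns rescaled
pipeline.add_crafter(ModelCrafter())           # model trained on the prepared data

result = pipeline.run()                        # each crafter's output feeds the next
```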
Conditional Execution¶
Add conditional logic to pipelines:
```python
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data))  # data: a pandas DataFrame loaded earlier

# Conditional cleaning based on data quality
if data.isnull().sum().sum() > 0:
    pipeline.add_crafter(CleanerCrafter(strategy='impute'))

pipeline.add_crafter(ModelCrafter())
result = pipeline.run()
```
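Because the chain is assembled with ordinary Python before run() is called, any build-time logic can decide which crafters end up in it. A slightly more general sketch, where the needs_scaling flag is hypothetical and stands in for whatever you learned during exploration:

```python
# Continues from the snippets above: `data`, MLFChain, and the crafters are already in scope
pipeline = MLFChain()
pipeline.add_crafter(DataIngestCrafter(data))

# Decide optional steps at build time with plain Python
needs_scaling = True  # hypothetical flag, e.g. decided during data exploration
optional_steps = []
if data.isnull().sum().sum() > 0:  # missing values present?
    optional_steps.append(CleanerCrafter(strategy='impute'))
if needs_scaling:
    optional_steps.append(ScalerCrafter())

for step in optional_steps:
    pipeline.add_crafter(step)

pipeline.add_crafter(ModelCrafter())
result = pipeline.run()
```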
Next Steps¶
- Explore Model Training options
- Read about Deployment strategies