Key Concepts
Understanding the fundamental concepts behind Orchestrator will help you build more effective pipelines and make the most of the framework’s capabilities.
Pipelines
A pipeline is a collection of interconnected tasks that work together to achieve a specific goal. Think of it as a recipe or workflow that can be executed automatically.
Input-Agnostic Design
One of Orchestrator’s core innovations is input-agnostic pipelines. This means a single pipeline definition can work with different inputs to produce different outputs:
# One pipeline definition
name: research-pipeline
inputs:
  topic: { type: string, required: true }
  depth: { type: string, default: "medium" }
steps:
  - id: research
    action: search_web
    parameters:
      query: "{{ inputs.topic }}"
import orchestrator as orc

# Many different uses
pipeline = orc.compile("research-pipeline.yaml")
ai_report = pipeline.run(topic="artificial intelligence")
climate_report = pipeline.run(topic="climate change")
space_report = pipeline.run(topic="space exploration")
This design promotes reusability and maintainability: write once, use many times.
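As a rough illustration of the input handling described above, the sketch below merges caller-supplied values with the declared `inputs:` spec: defaults fill gaps, and a missing required input is rejected before any step runs. `INPUT_SPEC` and `resolve_inputs` are hypothetical names, and the merge rules are an assumption modeled on the YAML, not Orchestrator's own implementation.

```python
from typing import Any

# Hypothetical Python mirror of the YAML `inputs:` block above.
INPUT_SPEC: dict[str, dict[str, Any]] = {
    "topic": {"type": "string", "required": True},
    "depth": {"type": "string", "default": "medium"},
}


def resolve_inputs(spec: dict[str, dict[str, Any]], provided: dict[str, Any]) -> dict[str, Any]:
    """Combine provided values with declared defaults (illustrative sketch)."""
    resolved: dict[str, Any] = {}
    for name, rules in spec.items():
        if name in provided:
            resolved[name] = provided[name]
        elif "default" in rules:
            resolved[name] = rules["default"]
        elif rules.get("required", False):
            raise ValueError(f"missing required input: {name!r}")
    return resolved


# One definition, many runs: only the inputs change.
for topic in ["artificial intelligence", "climate change", "space exploration"]:
    print(resolve_inputs(INPUT_SPEC, {"topic": topic}))
# {'topic': 'artificial intelligence', 'depth': 'medium'}  (and so on)
```

Because the definition never hard-codes a topic, the same spec object serves every run; only the dictionary passed in changes.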
Tasks
A task is the fundamental unit of work in a pipeline. Each task represents a single operation or action.
Task Anatomy
Every task has these key components:
- id: unique_identifier         # Required: Unique name
  action: what_to_do            # Required: Action to perform
  description: "What it does"   # Optional: Human description
  parameters:                   # Optional: Input parameters
    key: value
  depends_on: [other_task]      # Optional: Dependencies
  condition: "when_to_run"      # Optional: Conditional execution
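The required/optional split above maps naturally onto a typed record. The sketch below is a hypothetical model (`TaskSpec` and `validate_steps` are not Orchestrator classes) that fails fast on a missing `id` or `action`, duplicate ids, or a `depends_on` entry naming a task that does not exist.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TaskSpec:
    """Hypothetical in-memory model of one task entry (not Orchestrator's class)."""

    id: str                                   # required: unique name
    action: str                               # required: action to perform
    description: str = ""                     # optional: human description
    parameters: dict[str, Any] = field(default_factory=dict)
    depends_on: list[str] = field(default_factory=list)
    condition: Optional[str] = None           # optional: when to run


def validate_steps(raw_steps: list[dict[str, Any]]) -> list[TaskSpec]:
    """Fail fast on missing required fields, duplicate ids, or unknown dependencies."""
    tasks = []
    for raw in raw_steps:
        for key in ("id", "action"):
            if key not in raw:
                raise ValueError(f"task {raw!r} is missing required field {key!r}")
        tasks.append(TaskSpec(**raw))  # unknown keys raise TypeError here
    ids = [t.id for t in tasks]
    if len(ids) != len(set(ids)):
        raise ValueError("task ids must be unique")
    for t in tasks:
        missing = set(t.depends_on) - set(ids)
        if missing:
            raise ValueError(f"{t.id} depends on unknown tasks: {sorted(missing)}")
    return tasks


tasks = validate_steps([
    {"id": "fetch_data", "action": "download_file"},
    {"id": "process_data", "action": "transform_data", "depends_on": ["fetch_data"]},
])
print([t.id for t in tasks])  # ['fetch_data', 'process_data']
```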
Task Dependencies
Tasks can depend on other tasks; these dependencies determine the order of execution:
steps:
  - id: fetch_data
    action: download_file
    parameters:
      url: "{{ inputs.data_url }}"

  - id: process_data
    depends_on: [fetch_data]     # Runs after fetch_data
    action: transform_data
    parameters:
      data: "$results.fetch_data"

  - id: save_results
    depends_on: [process_data]   # Runs after process_data
    action: write_file
    parameters:
      content: "$results.process_data"
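One way to turn `depends_on` lists into an execution order is a topological sort. This sketch uses Python's standard-library `graphlib` (Python 3.9+) on the three tasks above; it illustrates the idea and is not a description of how Orchestrator's scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the tasks it depends on (its depends_on list).
steps = {
    "fetch_data": [],
    "process_data": ["fetch_data"],
    "save_results": ["process_data"],
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())
print(order)  # ['fetch_data', 'process_data', 'save_results']

# A circular dependency can never be scheduled, so it is rejected up front.
try:
    list(TopologicalSorter({"a": ["b"], "b": ["a"]}).static_order())
except CycleError as err:
    print("cycle detected:", err.args[1])
```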
Templates and References
Orchestrator uses Jinja2 templating to make pipelines dynamic and data-driven, alongside $results.<task_id> references for passing one task’s output to another.
Template Syntax
# Access input values
query: "{{ inputs.search_term }}"
# Reference results from other tasks
data: "$results.previous_task"
# Use filters and functions
filename: "{{ inputs.name | slugify }}.pdf"
# Conditional expressions
mode: "{{ 'advanced' if inputs.premium else 'basic' }}"
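Filters and conditional expressions need a real Jinja2 environment, but the two reference forms can be illustrated with the standard library alone. The sketch below handles plain dotted `{{ inputs.name }}` lookups and whole-value `$results.<task_id>` references; `resolve` and `lookup` are hypothetical helpers, and returning the raw result object for `$results` is an assumption, not documented Orchestrator behavior.

```python
import re
from typing import Any

TEMPLATE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")   # {{ dotted.path }} only, no filters
RESULT_REF = re.compile(r"^\$results\.(\w+)$")      # whole-value $results.task_id


def lookup(path: str, context: dict[str, Any]) -> Any:
    """Follow a dotted path such as 'inputs.search_term' through nested dicts."""
    value: Any = context
    for part in path.split("."):
        value = value[part]
    return value


def resolve(value: str, context: dict[str, Any]) -> Any:
    # A whole-value `$results.task` reference returns the raw result object.
    match = RESULT_REF.match(value)
    if match:
        return context["results"][match.group(1)]
    # Otherwise substitute every {{ dotted.path }} into the string.
    return TEMPLATE.sub(lambda m: str(lookup(m.group(1), context)), value)


context = {
    "inputs": {"search_term": "quantum computing"},
    "results": {"previous_task": {"rows": 42}},
}
print(resolve("{{ inputs.search_term }}", context))  # quantum computing
print(resolve("$results.previous_task", context))    # {'rows': 42}
```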
Runtime vs Compile-Time Resolution
Templates are resolved at different stages:
Compile-time: Static values resolved once, when the pipeline is compiled
Runtime: Dynamic values resolved during each execution
steps:
  - id: example
    parameters:
      # Compile-time: resolved once during compilation
      timestamp: "{{ compile_time.now }}"
      # Runtime: resolved during each execution
      user_input: "{{ inputs.query }}"
      previous_result: "$results.other_task"
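The two stages can be pictured as two rendering passes over the same parameters: the first fills only `compile_time.*` slots and leaves everything else in place, and the second fills `inputs.*` on every run. This is a hypothetical sketch (the `render` helper and the namespace rule are assumptions), not Orchestrator's compiler.

```python
import re
from datetime import datetime, timezone
from typing import Any

TEMPLATE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(text: str, context: dict[str, Any], namespaces: set[str]) -> str:
    """Fill {{ namespace.key }} slots for the given namespaces; leave the rest untouched."""

    def replace(match: re.Match) -> str:
        namespace, _, key = match.group(1).partition(".")
        if namespace in namespaces:
            return str(context[namespace][key])
        return match.group(0)  # not resolvable yet: keep it for a later pass

    return TEMPLATE.sub(replace, text)


params = {"timestamp": "{{ compile_time.now }}", "user_input": "{{ inputs.query }}"}

# "Compile": resolve compile_time.* exactly once.
compile_context = {"compile_time": {"now": datetime.now(timezone.utc).isoformat()}}
compiled = {k: render(v, compile_context, {"compile_time"}) for k, v in params.items()}

# "Run": resolve inputs.* on every execution; the timestamp stays frozen.
runs = [
    {k: render(v, {"inputs": {"query": q}}, {"inputs"}) for k, v in compiled.items()}
    for q in ["first query", "second query"]
]
for run in runs:
    print(run)
```

Both runs print the same timestamp because it was fixed during the compile pass, while `user_input` differs per run.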
Tools and Actions
Tools provide real-world capabilities to your pipelines: they’re how pipelines interact with the outside world.
Tool Categories
Web Tools:
- Search the internet
- Scrape websites
- Interact with web pages

System Tools:
- Execute commands
- Manage files
- Run scripts

Data Tools:
- Process and transform data
- Validate information
- Convert between formats

AI Tools:
- Generate content
- Analyze text
- Extract information
Action Names
Actions are how you invoke tools in pipelines:
# Web search
- action: search_web
  parameters:
    query: "machine learning"

# File operations
- action: write_file
  parameters:
    path: "output.txt"
    content: "Hello world"

# Shell commands (prefix with !)
- action: "!ls -la"

# AI generation
- action: generate_content
  parameters:
    prompt: "Write a summary about {{ topic }}"
Automatic Tool Detection
Orchestrator automatically detects required tools from your pipeline:
steps:
  - action: search_web            # → Requires web tool
  - action: "!python script.py"   # → Requires terminal tool
  - action: write_file            # → Requires filesystem tool
Tools are registered and made available via the Model Context Protocol (MCP).
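The detection rule in the example can be sketched as a scan over action names: a leading `!` implies the terminal tool, and anything else is looked up in a registry. The `ACTION_TOOLS` table and `required_tools` function are hypothetical stand-ins; Orchestrator's real mapping comes from its tool registry.

```python
# Hypothetical action-to-tool table; the real mapping lives in Orchestrator's tool registry.
ACTION_TOOLS = {
    "search_web": "web",
    "write_file": "filesystem",
    "generate_content": "ai",
}


def required_tools(actions: list[str]) -> set[str]:
    """Collect the tools a list of actions needs, failing on unknown actions."""
    tools: set[str] = set()
    for action in actions:
        if action.startswith("!"):  # shell command shorthand
            tools.add("terminal")
        elif action in ACTION_TOOLS:
            tools.add(ACTION_TOOLS[action])
        else:
            raise KeyError(f"no tool registered for action {action!r}")
    return tools


print(sorted(required_tools(["search_web", "!python script.py", "write_file"])))
# ['filesystem', 'terminal', 'web']
```

Failing on an unknown action at this stage catches typos before any step has run.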
Models and Intelligence
Models provide the AI capabilities that power AUTO tag resolution and content generation.
Model Types
Local Models (via Ollama):
- Run on your machine
- No API costs
- Privacy and control

Cloud Models (OpenAI, Anthropic):
- Powerful capabilities
- API-based
- Pay per use

Specialized Models:
- Code generation
- Data analysis
- Specific domains
Intelligent Model Selection
Orchestrator chooses the best model for each task based on:
Task requirements (reasoning, coding, analysis)
Available resources (memory, GPU, time)
Performance history (success rates, quality scores)
Cost considerations (API costs, efficiency)
import orchestrator as orc

# Models are selected automatically
registry = orc.init_models()
# Available models are ranked by capability
print(registry.list_models())
# Example output (depends on which models are installed):
# ['ollama:gemma2:27b', 'ollama:llama3.2:1b', 'huggingface:gpt2']
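The selection criteria above can be read as a filter-then-rank rule: drop models that lack a required capability or do not fit the available memory, then rank the rest by past success minus a cost penalty. `ModelInfo`, `select_model`, the model names, their numbers, and the scoring formula are all invented for illustration; they do not describe Orchestrator's real selection logic or any real model.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    name: str
    capabilities: set[str]   # e.g. {"reasoning", "coding"}
    memory_gb: float         # memory needed to run locally (0 for API models)
    success_rate: float      # from performance history, 0..1
    cost_per_call: float     # API cost in dollars (0 for local models)


def select_model(models: list[ModelInfo], required: set[str],
                 free_memory_gb: float, cost_weight: float = 1.0) -> ModelInfo:
    """Pick the best eligible model (hypothetical scoring rule)."""
    eligible = [
        m for m in models
        if required <= m.capabilities and m.memory_gb <= free_memory_gb
    ]
    if not eligible:
        raise LookupError("no model satisfies the task requirements")
    # Higher success rate is better; cost counts against the model.
    return max(eligible, key=lambda m: m.success_rate - cost_weight * m.cost_per_call)


models = [
    ModelInfo("local-small", {"analysis"}, 2, 0.70, 0.0),
    ModelInfo("local-large", {"analysis", "reasoning", "coding"}, 20, 0.85, 0.0),
    ModelInfo("cloud-large", {"analysis", "reasoning", "coding"}, 0, 0.95, 0.02),
]
print(select_model(models, {"coding"}, free_memory_gb=8).name)                    # cloud-large
print(select_model(models, {"coding"}, free_memory_gb=32, cost_weight=10).name)  # local-large
```

With little free memory only the API model qualifies; with enough memory and a heavier cost penalty, the free local model wins.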
State Management
State management ensures pipeline reliability and recovery.
Checkpointing
Orchestrator can save pipeline state at task boundaries:
config:
  checkpoint: true        # Enable automatic checkpointing

steps:
  - id: expensive_task
    action: long_running_process
    checkpoint: true      # Force checkpoint after this step
Recovery
If a pipeline fails, it can resume from the last checkpoint:
# Pipeline fails at step 5
pipeline.run(inputs)   # Raises an error

# Resume from the last checkpoint (saved after step 4)
pipeline.resume()      # Restores state after step 4, then re-runs step 5
This is especially valuable for:
- Long-running pipelines
- Expensive operations
- Unreliable external services
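To show the resume idea in miniature, this sketch saves all completed results to a JSON file at each task boundary and skips already-saved steps on the next run. `run_with_checkpoints` and the single-file layout are hypothetical, and the sketch assumes every step result is JSON-serializable.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_with_checkpoints(steps: list[Step], checkpoint: Path) -> dict[str, Any]:
    """Run steps in order, saving results after each; skip steps already saved."""
    results: dict[str, Any] = {}
    if checkpoint.exists():
        results = json.loads(checkpoint.read_text())  # resume point
    for step_id, fn in steps:
        if step_id in results:
            continue                                  # completed in an earlier run
        results[step_id] = fn(results)                # may raise; saved results survive
        checkpoint.write_text(json.dumps(results))    # checkpoint at the task boundary
    return results


calls: list[str] = []
service_down = {"value": True}


def step_a(results: dict[str, Any]) -> int:
    calls.append("a")
    return 1


def step_b(results: dict[str, Any]) -> int:
    calls.append("b")
    if service_down["value"]:
        raise RuntimeError("external service unavailable")
    return results["a"] + 1


path = Path(tempfile.mkdtemp()) / "checkpoint.json"
steps = [("a", step_a), ("b", step_b)]
try:
    run_with_checkpoints(steps, path)  # fails at step b; step a is already saved
except RuntimeError:
    pass
service_down["value"] = False
final = run_with_checkpoints(steps, path)
print(final)  # {'a': 1, 'b': 2}
print(calls)  # ['a', 'b', 'b']  (step a was not re-run)
```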
Control Systems
Control systems are the execution engines that run your pipelines.
Built-in Control Systems
MockControlSystem:
- For testing and development
- Simulates tool execution
- Fast and predictable

ToolIntegratedControlSystem:
- Real tool execution
- Full MCP integration
- Production-ready
Custom Control Systems
You can create custom control systems for specific needs:
from orchestrator.core.control_system import ControlSystem


class MyControlSystem(ControlSystem):
    async def execute_task(self, task, context):
        # Custom execution logic goes here
        pass
Pipeline Composition
Complex workflows can be built by composing smaller pipelines.
Pipeline Imports
imports:
  - common/validation.yaml as validator
  - workflows/analysis.yaml as analyzer

steps:
  - id: validate
    pipeline: validator
    inputs:
      data: "{{ inputs.raw_data }}"

  - id: analyze
    pipeline: analyzer
    inputs:
      validated_data: "$results.validate"
Modular Design
This enables:
- Reusability: Share common patterns
- Maintainability: Update once, use everywhere
- Collaboration: Teams can work on different components
- Testing: Test pipelines in isolation
Error Handling
Robust pipelines handle errors gracefully.
Error Strategies
steps:
  - id: risky_task
    action: external_api_call
    error_handling:
      # Retry with backoff
      retry:
        max_attempts: 3
        backoff: exponential
      # Fallback action
      fallback:
        action: use_cached_data
      # Continue pipeline on error
      continue_on_error: true
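The three strategies compose naturally in sequence. The sketch below retries with exponential backoff, then tries the fallback, then either returns nothing (continue on error) or re-raises. `run_with_error_handling` is a hypothetical helper, and the precedence order (fallback before continue_on_error) is an assumption, not documented Orchestrator behavior.

```python
import time
from typing import Any, Callable, Optional


def run_with_error_handling(
    action: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    fallback: Optional[Callable[[], Any]] = None,
    continue_on_error: bool = False,
) -> Any:
    """Retry with exponential backoff, then fall back, then optionally continue."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as err:  # a real engine would likely retry only transient errors
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # base_delay, 2x, 4x, ... between tries
    if fallback is not None:
        return fallback()
    if continue_on_error:
        return None  # no result for this step; downstream steps carry on
    raise last_error


attempts: list[int] = []


def external_api_call() -> str:
    attempts.append(1)
    raise ConnectionError("timed out")


result = run_with_error_handling(
    external_api_call, base_delay=0.01, fallback=lambda: "cached data"
)
print(result, len(attempts))  # cached data 3
```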
Error Types
Network errors: Connection failures, timeouts
Data errors: Invalid formats, missing fields
Logic errors: Failed validation, impossible conditions
Resource errors: Out of memory, out of disk space
Performance Concepts
Understanding performance helps you build efficient pipelines.
Parallel Execution
Tasks without dependencies can run in parallel:
steps:
  # These run in parallel
  - id: source1
    action: fetch_data_a

  - id: source2
    action: fetch_data_b

  # This waits for both
  - id: combine
    depends_on: [source1, source2]
    action: merge_data
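A minimal sketch of the parallel behavior, assuming I/O-bound tasks: `graphlib` hands out every task whose dependencies are finished, and `asyncio.gather` runs that batch concurrently. `run_task` and `run_pipeline` are hypothetical helpers that simulate work with `asyncio.sleep`; this is not Orchestrator's executor.

```python
import asyncio
from graphlib import TopologicalSorter

# depends_on edges from the example above.
DEPENDS_ON = {"source1": [], "source2": [], "combine": ["source1", "source2"]}


async def run_task(task_id: str, log: list[str]) -> None:
    log.append(f"start {task_id}")
    await asyncio.sleep(0.05)  # stand-in for network or disk I/O
    log.append(f"end {task_id}")


async def run_pipeline(graph: dict[str, list[str]]) -> list[str]:
    log: list[str] = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorter.get_ready()  # all tasks whose dependencies have finished
        await asyncio.gather(*(run_task(task_id, log) for task_id in ready))
        sorter.done(*ready)
    return log


log = asyncio.run(run_pipeline(DEPENDS_ON))
print(log)
```

This runs in waves: `combine` starts only after the whole first batch ends. A scheduler that reacts to each completion individually could start downstream work sooner; that refinement is left out to keep the sketch short.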
Caching
Expensive operations can be cached:
steps:
  - id: expensive_computation
    action: complex_analysis
    cache:
      enabled: true
      key: "{{ inputs.data_hash }}"
      ttl: 3600   # seconds (1 hour)
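The `key` and `ttl` settings correspond to a classic time-to-live cache: a hit within the TTL skips the expensive work, and an expired entry is recomputed. `TTLCache` is a hypothetical sketch, not Orchestrator's cache; it takes an injectable clock so expiry can be demonstrated without waiting an hour.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Minimal time-to-live cache illustrating the `cache:` settings (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: skip the expensive work
        value = compute()
        self._entries[key] = (now, value)
        return value


fake_now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
calls: list[int] = []


def complex_analysis() -> dict[str, float]:
    calls.append(1)
    return {"score": 0.9}


cache.get_or_compute("data-hash-abc", complex_analysis)  # computes
cache.get_or_compute("data-hash-abc", complex_analysis)  # served from cache
fake_now[0] = 3600.0                                     # one hour later
cache.get_or_compute("data-hash-abc", complex_analysis)  # expired: recomputes
print(len(calls))  # 2
```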
Resource Management
Resource usage can be limited in the pipeline config:
config:
  resources:
    max_memory: "8GB"
    max_threads: 4
    gpu_enabled: false
Security Concepts
Security is built into Orchestrator’s design.
Sandboxing
Code execution happens in isolated environments:
- Docker containers for full isolation
- Restricted permissions for file access
- Network controls for external access
Input Validation
Inputs are validated against the rules declared for them:
inputs:
  email:
    type: string
    validation:
      pattern: "^[\\w.-]+@[\\w.-]+\\.\\w+$"
  amount:
    type: number
    validation:
      min: 0
      max: 10000
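The `pattern`, `min`, and `max` rules above can be checked with a few lines of standard Python. `RULES` and `validate` are hypothetical names, and returning a list of error messages (rather than raising on the first failure) is a design assumption that lets all problems be reported at once. The regex is the YAML pattern with its double-quoted escaping removed.

```python
import re
from typing import Any

# Mirrors the YAML rules above; the checker itself is a hypothetical sketch.
RULES: dict[str, dict[str, Any]] = {
    "email": {"type": str, "pattern": r"^[\w.-]+@[\w.-]+\.\w+$"},
    "amount": {"type": (int, float), "min": 0, "max": 10000},
}


def validate(inputs: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the inputs are valid."""
    errors: list[str] = []
    for name, rule in RULES.items():
        if name not in inputs:
            errors.append(f"{name}: missing")
            continue
        value = inputs[name]
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if not isinstance(value, rule["type"]) or isinstance(value, bool):
            errors.append(f"{name}: wrong type {type(value).__name__}")
            continue
        if "pattern" in rule and not re.match(rule["pattern"], value):
            errors.append(f"{name}: does not match pattern")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: above maximum {rule['max']}")
    return errors


print(validate({"email": "ada@example.com", "amount": 250}))  # []
print(validate({"email": "not-an-email", "amount": 20000}))
# ['email: does not match pattern', 'amount: above maximum 10000']
```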
Secret Management
Sensitive data is handled securely:
parameters:
  api_key: "{{ env.SECRET_API_KEY }}"    # From environment
  password: "{{ vault.db_password }}"    # From secret vault
Best Practices
Design Principles
Single Responsibility: Each task does one thing well
Loose Coupling: Tasks don’t depend on each other’s implementation details
High Cohesion: Related tasks are grouped together
Fail Fast: Validate inputs and catch errors early
Idempotence: Running the same pipeline multiple times is safe
Pipeline Organization
pipelines/
├── common/ # Shared components
│ ├── validation.yaml
│ └── formatting.yaml
├── workflows/ # Complete workflows
│ ├── research.yaml
│ └── analysis.yaml
└── specialized/ # Domain-specific
    ├── finance.yaml
    └── healthcare.yaml
Testing Strategy
Unit test individual tasks
Integration test complete pipelines
Smoke test with real data
Performance test under load
Next Steps
Now that you understand the concepts:
Practice with the Tutorials
Explore the API Reference for detailed documentation
Build your own pipelines for real problems
Share your patterns with the community
Tip
The best way to internalize these concepts is to start building. Begin with simple pipelines and gradually add complexity as you become more comfortable with the framework.