Quickstart

This quickstart walks you through creating and running your first Orchestrator pipeline in under five minutes.

Your First Pipeline

Let’s create a simple pipeline that generates a summary of any topic.

Step 1: Create the Pipeline Definition

Create a file called summarize.yaml:

name: topic-summarizer
description: Generate a concise summary of any topic

inputs:
  topic:
    type: string
    description: The topic to summarize
    required: true

  length:
    type: integer
    description: Approximate word count for the summary
    default: 200

outputs:
  summary:
    type: string
    value: "{{ inputs.topic }}_summary.txt"

steps:
  - id: research
    action: generate_content
    parameters:
      prompt: |
        Research and provide key information about: {{ inputs.topic }}
        Focus on the most important and interesting aspects.
      max_length: 500

  - id: summarize
    action: generate_summary
    parameters:
      content: "$results.research"
      target_length: "{{ inputs.length }}"
      style: <AUTO>Choose appropriate style for the topic</AUTO>

  - id: save_summary
    action: write_file
    parameters:
      path: "{{ outputs.summary }}"
      content: "$results.summarize"
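
To make the inputs block concrete, here is a minimal Python sketch of how declared inputs with required and default could be resolved and then substituted into {{ inputs.name }} placeholders. It is an illustration only: resolve_inputs and render are hypothetical helpers, not Orchestrator functions, and the regex substitution stands in for whatever templating Orchestrator actually performs.

```python
import re

# Input declarations mirroring the inputs block of summarize.yaml.
INPUT_SPEC = {
    "topic": {"type": "string", "required": True},
    "length": {"type": "integer", "default": 200},
}

# Matches placeholders such as "{{ inputs.topic }}".
PLACEHOLDER = re.compile(r"\{\{\s*inputs\.(\w+)\s*\}\}")


def resolve_inputs(spec, provided):
    """Apply defaults and reject missing required inputs (hypothetical helper)."""
    resolved = {}
    for name, rules in spec.items():
        if name in provided:
            resolved[name] = provided[name]
        elif "default" in rules:
            resolved[name] = rules["default"]
        elif rules.get("required"):
            raise ValueError(f"missing required input: {name}")
    return resolved


def render(template, inputs):
    """Replace each {{ inputs.name }} placeholder with the resolved value."""
    return PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), template)


inputs = resolve_inputs(INPUT_SPEC, {"topic": "quantum computing"})
print(inputs["length"])  # falls back to the declared default
print(render("Research and provide key information about: {{ inputs.topic }}", inputs))
```

Omitting topic raises an error in this sketch, while omitting length silently falls back to 200, which is the behavior the required and default keys suggest.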

Step 2: Run the Pipeline

Create a Python script to run your pipeline:

import orchestrator as orc

# Initialize the model pool
orc.init_models()

# Compile the pipeline
pipeline = orc.compile("summarize.yaml")

# Run with different topics
result1 = pipeline.run(
    topic="quantum computing",
    length=150
)

result2 = pipeline.run(
    topic="sustainable energy",
    length=250
)

print(f"Created summaries: {result1}, {result2}")

Step 3: Check the Results

Your pipeline will create two files:

- quantum_computing_summary.txt
- sustainable_energy_summary.txt

Each file contains a summary of approximately the requested length.
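
If you want to confirm the results from a script, a small check like the following may help. It is a sketch that uses only the Python standard library; summary_word_count and report are hypothetical helper names, and the file names are the ones listed above.

```python
from pathlib import Path


def summary_word_count(path):
    """Return the number of whitespace-separated words in a summary file."""
    return len(Path(path).read_text(encoding="utf-8").split())


def report(paths):
    """Map each path to its word count, or None if the file does not exist."""
    return {
        str(p): summary_word_count(p) if Path(p).is_file() else None
        for p in paths
    }


if __name__ == "__main__":
    expected = ["quantum_computing_summary.txt", "sustainable_energy_summary.txt"]
    for name, count in report(expected).items():
        print(name, "missing" if count is None else f"{count} words")
```

Because length is described as an approximate word count, expect the reported numbers to be near 150 and 250 rather than exact.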

Building More Complex Pipelines

Research Report Pipeline

Let’s create a more sophisticated pipeline that generates research reports:

name: research-report-generator
description: Generate comprehensive research reports with citations

inputs:
  topic:
    type: string
    required: true
  focus_areas:
    type: array
    description: Specific areas to focus on
    default: []

outputs:
  report_pdf:
    type: string
    value: "reports/{{ inputs.topic }}_report.pdf"

steps:
  # Web search for recent information
  - id: search_recent
    action: search_web
    parameters:
      query: "{{ inputs.topic }} 2024 latest developments"
      max_results: 10

  # Search academic sources
  - id: search_academic
    action: search_web
    parameters:
      query: "{{ inputs.topic }} research papers scholarly"
      max_results: 5

  # Compile all sources
  - id: compile_sources
    action: compile_markdown
    parameters:
      sources:
        - "$results.search_recent"
        - "$results.search_academic"
      include_citations: true

  # Generate the report
  - id: write_report
    action: generate_report
    parameters:
      research: "$results.compile_sources"
      topic: "{{ inputs.topic }}"
      focus_areas: "{{ inputs.focus_areas }}"
      style: "academic"
      sections:
        - "Executive Summary"
        - "Introduction"
        - "Current State"
        - "Recent Developments"
        - "Future Outlook"
        - "Conclusions"

  # Quality check
  - id: validate
    action: validate_report
    parameters:
      report: "$results.write_report"
      checks:
        - completeness
        - citation_accuracy
        - readability

  # Generate PDF
  - id: create_pdf
    action: "!pandoc -o {{ outputs.report_pdf }} --pdf-engine=xelatex"
    parameters:
      input: "$results.write_report"
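
The $results references in this pipeline imply an ordering: compile_sources needs both searches, and both validate and create_pdf need write_report. The sketch below derives that ordering with the standard library's graphlib module (Python 3.9+). Whether Orchestrator schedules steps by dependency or simply in listed order is not stated here, so treat this as an illustration of the data flow rather than a description of the engine; dependencies and execution_order are hypothetical helpers.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step ids from the research report pipeline, each with the parameter
# values that reference results of earlier steps.
STEPS = {
    "search_recent": [],
    "search_academic": [],
    "compile_sources": ["$results.search_recent", "$results.search_academic"],
    "write_report": ["$results.compile_sources"],
    "validate": ["$results.write_report"],
    "create_pdf": ["$results.write_report"],
}

RESULT_REF = re.compile(r"\$results\.(\w+)")


def dependencies(steps):
    """Map each step id to the set of step ids it references via $results."""
    return {
        step_id: {ref for value in values for ref in RESULT_REF.findall(value)}
        for step_id, values in steps.items()
    }


def execution_order(steps):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(dependencies(steps)).static_order())


order = execution_order(STEPS)
print(order)
```

The two search steps have no dependencies on each other, so any valid order may list them either way round; an engine that supports parallelism could run them concurrently.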

Working with Tools

Orchestrator automatically detects and configures tools based on your pipeline actions.

Available Tool Actions

Web Tools:

# Web search
- action: search_web
  parameters:
    query: "your search query"

# Scrape webpage
- action: scrape_page
  parameters:
    url: "https://example.com"

System Tools:

# Run shell commands (prefix with !)
- action: "!ls -la"

# File operations
- action: read_file
  parameters:
    path: "data.txt"

- action: write_file
  parameters:
    path: "output.txt"
    content: "Your content"

Data Tools:

# Process data
- action: transform_data
  parameters:
    input: "$results.previous_step"
    operations:
      - type: filter
        condition: "value > 100"

# Validate data
- action: validate_data
  parameters:
    data: "$results.data"
    schema:
      type: object
      required: ["name", "value"]
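
For intuition, the two data steps above correspond roughly to the following plain Python. This is a sketch of the intended semantics only, assuming that the filter condition "value > 100" compares a field named value and that the schema's required list names keys that must be present; the function names are hypothetical and not part of Orchestrator.

```python
def filter_greater_than(records, field, threshold):
    """Keep only the records whose field value exceeds threshold."""
    return [record for record in records if record[field] > threshold]


def missing_required(record, required):
    """Return the required keys that are absent from record."""
    return [key for key in required if key not in record]


rows = [
    {"name": "a", "value": 50},
    {"name": "b", "value": 150},
]
print(filter_greater_than(rows, "value", 100))
print(missing_required({"name": "c"}, ["name", "value"]))
```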

Using AUTO Tags

AUTO tags delegate a parameter value to an AI model. Wrap a natural-language instruction in <AUTO>...</AUTO> and the model chooses the value:

steps:
  - id: analyze
    action: analyze_data
    parameters:
      data: "$results.fetch"
      method: <AUTO>Choose best analysis method based on data type</AUTO>
      visualization: <AUTO>Determine if visualization would be helpful</AUTO>
      depth: <AUTO>Set analysis depth (shallow/medium/deep)</AUTO>

The AI model examines the surrounding context and chooses a value for each tagged parameter.
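
One way to picture this is as a substitution pass over the step parameters before the step runs. The sketch below is an assumption about the mechanism, not Orchestrator's implementation: resolve_auto is a hypothetical helper, and stub_model is a stand-in that returns fixed answers where a real model call would go.

```python
import re

# Captures the instruction between <AUTO> and </AUTO>.
AUTO_TAG = re.compile(r"<AUTO>(.*?)</AUTO>", re.DOTALL)


def resolve_auto(parameters, decide):
    """Return a copy of parameters with each <AUTO> tag replaced by decide(instruction)."""
    resolved = {}
    for key, value in parameters.items():
        if isinstance(value, str):
            value = AUTO_TAG.sub(lambda m: decide(m.group(1).strip()), value)
        resolved[key] = value
    return resolved


def stub_model(instruction):
    """Stand-in for a model call: a fixed answer keyed on the instruction text."""
    return "deep" if "depth" in instruction else "auto-selected"


params = {
    "data": "$results.fetch",
    "depth": "<AUTO>Set analysis depth (shallow/medium/deep)</AUTO>",
}
print(resolve_auto(params, stub_model))
```

Parameters without a tag, such as data here, pass through unchanged.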

Pipeline Composition

You can compose pipelines from smaller, reusable components:

name: composite-pipeline

imports:
  - common/data_fetcher.yaml as fetcher
  - common/validator.yaml as validator

steps:
  # Use imported pipeline
  - id: fetch_data
    pipeline: fetcher
    parameters:
      source: "api"

  # Local step
  - id: process
    action: process_data
    parameters:
      data: "$results.fetch_data"

  # Use another import
  - id: validate
    pipeline: validator
    parameters:
      data: "$results.process"
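
Each imports entry pairs a file path with the alias that steps later use in their pipeline key. As a rough illustration of that mapping, the following sketch splits the entries into an alias-to-path dictionary. parse_import is a hypothetical helper, and how Orchestrator itself resolves imports is not described in this guide.

```python
def parse_import(entry):
    """Split 'path/to/file.yaml as alias' into (alias, path)."""
    path, separator, alias = entry.rpartition(" as ")
    if not separator or not path.strip() or not alias.strip():
        raise ValueError(f"expected '<path> as <alias>', got: {entry!r}")
    return alias.strip(), path.strip()


imports = dict(
    parse_import(entry)
    for entry in [
        "common/data_fetcher.yaml as fetcher",
        "common/validator.yaml as validator",
    ]
)
print(imports["fetcher"])
```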

Error Handling

Add error handling to make pipelines robust:

steps:
  - id: risky_operation
    action: fetch_external_data
    parameters:
      url: "{{ inputs.data_source }}"
    error_handling:
      retry:
        max_attempts: 3
        backoff: exponential
      fallback:
        action: use_cached_data
        parameters:
          cache_key: "{{ inputs.topic }}"
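
The retry and fallback settings describe a familiar pattern: try the operation a fixed number of times, wait longer after each failure, and fall back to an alternative if every attempt fails. Here is a minimal Python sketch of that pattern. run_with_retry and its base_delay parameter are hypothetical, and the doubling schedule is one common reading of "exponential"; Orchestrator's actual delays are not specified here.

```python
import time


def run_with_retry(operation, fallback, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation up to max_attempts times, doubling the wait after each failure.

    If every attempt raises, return fallback() instead.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Wait only if another attempt will follow.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    return fallback()


calls = []


def flaky():
    """Simulated external fetch that always fails."""
    calls.append(1)
    raise ConnectionError("source unavailable")


# The sleep function is replaced with a no-op so the demo finishes instantly.
result = run_with_retry(flaky, lambda: "cached", sleep=lambda seconds: None)
print(result, len(calls))
```

In this demo the operation is attempted three times before the fallback value is returned, matching max_attempts: 3 in the YAML.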

Debugging Pipelines

Enable debug mode for detailed execution logs:

import logging
import orchestrator as orc

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Compile with debug flag
pipeline = orc.compile("pipeline.yaml", debug=True)

# Run with verbose output
result = pipeline.run(
    topic="test",
    _verbose=True,
    _step_callback=lambda step: print(f"Executing: {step.id}")
)
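
A lambda is enough for printing, but a small callable object can also keep state between steps. The sketch below records when each step was first reported. It assumes, as the lambda above suggests, that the callback receives an object with an id attribute once per step; StepTimer is a hypothetical helper, and the loop only simulates the calls a pipeline run would make.

```python
import time
from types import SimpleNamespace


class StepTimer:
    """Callable that records when each step id was first reported."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.started = {}

    def __call__(self, step):
        # setdefault keeps the first timestamp if a step is reported again.
        self.started.setdefault(step.id, self._clock())
        print(f"Executing: {step.id}")

    def order(self):
        """Step ids sorted by the time they were first seen."""
        return sorted(self.started, key=self.started.get)


timer = StepTimer()
for step_id in ["research", "summarize", "save_summary"]:
    timer(SimpleNamespace(id=step_id))  # simulated callback invocations
print(timer.order())
```

Under the same assumption, an instance could be passed as _step_callback=timer in place of the lambda.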

Best Practices

Use Descriptive IDs: Make step IDs self-documenting
Leverage Templates: Use Jinja2 templates for dynamic values
Handle Errors: Always consider what could go wrong
Validate Inputs: Define clear input schemas
Document Purpose: Add descriptions to pipelines and steps

Next Steps

Now that you’ve built your first pipelines:

Explore Tutorials for in-depth walkthroughs
Check out Examples for real-world pipelines
Learn about Key Concepts for deeper understanding
Review the API Reference for advanced features

Tip

Try modifying the examples above to create your own custom pipelines. The best way to learn is by experimenting!