Maestro

On GitHub

Bring rigor, reproducibility, and shareability to your computational science following the model set by the experimental disciplines. Specify computational processes in a generalized way such that they can be documented, shared, executed, and easily reproduced.

Why Maestro?

Maestro gives an easy path to automating and orchestrating your workflows, building upon your existing shell and batch (HPC scheduled scripts/tasks) script tasks to layer on parameterization, task dependencies, and output isolation. Additionally, Maestro's workflow specification layer enables documenting those scripts and their interdependencies if chaining them together, and makes them more repeatable and shareable for enhanced collaboration with your peers.

Getting Started is Quick and Easy

Install: build a virtual environment and install maestro into it (python 3.6 or greater)
```
python -m virtualenv maestroenv

source maestroenv/bin/activate

pip install maestrowf
```
And you now have the maestro command available in your shell

Create a YAML file named study.yaml and paste the following content into the file

description: # (1)
    name: hello_bye_parameterized_funnel
    description: A study that says hello and bye to multiple people, and a final good bye to all.

env:
    variables: # (2)
        OUTPUT_PATH: ./samples/hello_bye_parameterized_funnel
    labels: # (3)
        HELLO_FORMAT: $(GREETING)_$(NAME).txt
        BYE_FORMAT: $(FAREWELL)_$(NAME).txt

study:
    - name: say-hello
      description: Say hello to someone!
      run: # (4)
          cmd: |
            echo "$(GREETING), $(NAME)!" > $(HELLO_FORMAT)

    - name: say-bye
      description: Say bye to someone!
      run:
          cmd: |
            echo "$(FAREWELL), $(NAME)!" > $(BYE_FORMAT)
          depends: [say-hello] # (5)

    - name: bye-all
      description: Say bye to everyone!
      run:
          cmd: |
            echo "Good-bye, World!" > good_bye_all.txt
          depends: [say-bye_*]

global.parameters:
    NAME:  # (6)
        values: [Pam, Jim, Michael, Dwight]
        label: NAME.%%
    GREETING:
        values: [Hello, Ciao, Hey, Hi]
        label: GREETING.%%
    FAREWELL:
        values: [Goodbye, Farewell, So long, See you later]
        label: FAREWELL.%%

Mandatory name and description fields to encourage documenting your workflows in a meaningful way
Define single valued variable tokens for use in your workflow steps
Define nested tokens for combining variables and parameters for more flexible token/substitution in your workflow steps
cmd is a multline string, written in bash to harness the robust existing ecosystem of tools users are already familiar with
Specify step dependencies using steps name to control execution order
Define parameter tokens $(NAME) and lists of values to use in your steps such that Maestro can parameterize them for you

PHILOSOPHY: Maestro believes in the principle of a clearly defined process, specified as a list of tasks, that are self-documenting and clear in their intent.

Running the hello_world study is as simple as...
```
maestro run study.yaml
```

Maestro run turns your workflow process specification and parameters into a task graph and executes them for you

Hello World Workflow GraphHello World Executed Workspace

flowchart TD
    A(study root) --> COMBO1
    subgraph COMBO1 [Combo #1]
      subgraph say_hello1 [say-hello]
        B(Hello, Pam)
      end
      subgraph say_bye1 [say-bye]
        C(Goodbye, Pam)
      end
      say_hello1 --> say_bye1
    end
    A --> COMBO2
    subgraph COMBO2 [Combo #2]
      subgraph say_hello2 [say-hello]
        D(Ciao, Jim)
      end
      subgraph say_bye2 [say-bye]
        E(Farewell, Jim)
      end
      say_hello2 --> say_bye2
    end
    A --> COMBO3
    subgraph COMBO3 [Combo #3]
      subgraph say_hello3 [say-hello]
        F(Hey, Michael)
      end
      subgraph say_bye3 [say-bye]
        G(So long, Michael)
      end
      say_hello3 --> say_bye3
    end
    A --> COMBO4
    subgraph COMBO4 [Combo #4]
      subgraph say_hello4 [say-hello]
        H(Hi, Dwight)
      end
      subgraph say_bye4 [say-bye]
        I(See you later, Dwight)
      end
      say_hello4 --> say_bye4
    end
    J{{bye-all}}
    COMBO1 --> J
    COMBO2 --> J
    COMBO3 --> J
    COMBO4 --> J

Maestro manages the workspaces for each parameter set for you, ensuring isolation, and capturing the generated execution scripts with parameter/variable values inserted in place of the $(PARAMNAME) tokens

Maestro's Goals and Motivations

The primary goal of Maestro is to provide a lightweight tool for encouraging modularized workflow composition, a mental framework for thinking about these concepts, and a tool that users can flexibly utilize for a wide variety of use cases (science, software testing/deployment, and etc.). Maestro's vision aims to make steady progression towards making reproducible workflows user-friendly and easy to manage. We maintain a few high level principles:

Reproducibility is not free, nor is there a silver bullet to achieve it. It is a mixture of tooling, infrastructure, and best practices.
Workflow best practices should always be encouraged wherever possible, and enforced where reasonable.
It is not a workflow framework's place to force a user to use specific technologies -- a framework should couple minimally, but offer a high degree of flexibility.
Division of responsibility is critical. Data management, optimal performance, and orchestration are related but separate.
A workflow should be coherent and easy to communicate, with a framework providing users a mental framework to think and discuss such challenges.

We firmly believe that a user-friendly tool and environment that promotes provenance and best practices, while minimizing the effort needed for users to achieve progress, will greatly improve the quality of computational science.

Next Steps

Explore the complete set of features Maestro can bring to your computational workflows and how to use them in the User Guide:

Start from scratch with tutorials
Port an existing HPC (SLURM, LSF, etc) batch script
How-to guides
Full yaml input specification
Colleciton of example specifications in the repo's samples directory
and more

Or, read up on more of the philosophies behind Maestro's design: Philosopy.

Additionally, check out some of the contributed tools and recipes in the Maestro Sheetmusic repo. Contributions welcome if you create any generally useful tools in your workflow adventures with Maestro and want to share!

Contributors

Many thanks go to MaestroWF's contributors.

If you have any questions or to submit feature requests please open a ticket.

Release

MaestroWF is released under an MIT license. For more details see the NOTICE and LICENSE files.

LLNL-CODE-734340