Study Specification
The study specification is the core definition of work and the primary record of a user-specified workflow. A complete specification is made up of the following components:
| Key | Required? | Description |
| --- | --- | --- |
| `description` | Yes | General information about a study and its high-level purpose |
| `batch` | No | Settings for submission to batch systems |
| `env` | No | Fixed constants and other values that are globally set and referenced |
| `study` | Yes | Tasks that the study is composed of and are executed in a defined order |
| `global.parameters` | No | Parameters that are user varied and applied to the workflow |
This page will break down the keys available in each section and what they provide. But first, a look at the DSL embedded in the specification that appears in multiple sections.
Tokens: Maestro's Minimal Workflow DSL
Maestro takes a minimalist approach to the workflow language features available in the study specification. All of it is contained in the token replacement hooks available in the `study` and `env` blocks. These tokens are referenced using the `$(TOKEN_NAME)` syntax, with the `$( )` encapsulating this minimal DSL.
Note
These tokens are currently limited to simple data types, owing to the initial design being aimed at injecting values onto command lines. This means array and dict types are not supported in the current version of this DSL.
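As an illustration only (not Maestro's actual implementation), token replacement can be thought of as a simple string substitution over a table of names:

```python
import re

def expand_tokens(text, values):
    """Replace each $(NAME) occurrence with its value from a lookup table.

    A simplified sketch of the idea behind Maestro's token substitution;
    the real implementation lives inside Maestro itself.
    """
    def lookup(match):
        name = match.group(1)
        # Unknown tokens are left untouched, since some (e.g. parameters)
        # are expanded in later passes.
        return str(values.get(name, match.group(0)))

    return re.sub(r"\$\(([A-Za-z0-9_.-]+)\)", lookup, text)

cmd = "python $(SUPPORTING_TOOL1) -o sample_output.yaml"
print(expand_tokens(cmd, {"SUPPORTING_TOOL1": "my_helper_tool.py"}))
# → python my_helper_tool.py -o sample_output.yaml
```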
Default Tokens
There are a few special tokens that are always available:
| Name | Description | Notes |
| --- | --- | --- |
| `$(SPECROOT)` | Defines the base path of the study specification | Provides a portable relative path to use with associated dependencies and supporting scripts/tools |
| `$(OUTPUT_PATH)` | The path to the current study instance | `OUTPUT_PATH` can be specified in the `env` block's `variables` group, providing a way to group the timestamped instance directories instead of polluting the `$(SPECROOT)` path |
| `$(LAUNCHER)` | Abstracts HPC scheduler specific job launching wrappers such as `srun` (SLURM) | Primary mechanism for making study steps system agnostic and portable |
| `$(WORKSPACE)` | Can be used in a step to reference the current step's path | |
| `$(<step_name>.workspace)` | Can be used in a step to reference the paths of previous steps' workspaces | `<step_name>` is the `name` key in each study step |
The following example shows use of the first two to help with study inputs and outputs. It uses `$(SPECROOT)` to access a supporting tool that lives alongside the study specification, and writes all instances of this study (timestamped directories) into a `SAMPLE_STUDY_OUTPUTS` directory to prevent pollution of the directory the study is invoked from.
```yaml
env:
  variables:
    SUPPORTING_TOOL1: my_helper_tool.py
    OUTPUT_PATH: ./SAMPLE_STUDY_OUTPUTS
  labels:
    SUPPORTING_TOOL1_PATH: $(SPECROOT)/$(SUPPORTING_TOOL1)

study:
  - name: sample-step-1
    description: sample study step
    run:
      cmd: |
        cp $(SUPPORTING_TOOL1_PATH) .

        # Run the tool
        python $(SUPPORTING_TOOL1) -o sample_output.yaml
```
Environment Tokens
In the `env` block, every key in the `variables` and `labels` blocks can be referenced as a token. Additionally, the `dependencies` entries can be referenced via tokens, with the token names being the `name` keys in each one.
```yaml
env:
  variables:
    VAR1: value1
    VAR2: value2
    MODEL1: my_model.input
  labels:
    PATH1: /dev/$(VAR2)
  dependencies:
    paths:
      - name: CODE
        path: /path/to/simulation/code
    git:
      - name: MODEL_REPO
        path: $(OUTPUT_PATH)
        url: https://your.git.host/models.git
        tag: 2.9.15

study:
  - name: step1
    description: just a sample step
    run:
      cmd: |
        echo "The value of 'VAR1' is $(VAR1)"
        echo "And this is the value of 'PATH1': $(PATH1)"
        cp $(MODEL_REPO)/$(MODEL1) .
        $(LAUNCHER) $(CODE) -in $(MODEL1)
      procs: 1
      nodes: 1
      walltime: "00:01:00"
```
Parameter Tokens
Parameters follow the convention of the `env` block's `variables` and `labels`: the token name is the key in the `global.parameters` block (or the `pgen` equivalent). The big difference with substitution of parameter tokens is that only single values are replaced. The expansion process creates one step per value of these tokens, so using them in your steps/labels is akin to working with a single instance. Additionally, parameters have a string-formatted representation in the `label` key, which can be accessed similarly to step workspaces: `$(PARAM1.label)`. In the example below this is combined with the `OUTPUTNAME` label to include the parameter label in each generated step's output file, in place of a more generic single name for all instances of the step. Three files will be output by the model in this case: `MODEL_OUTPUT_PARAM1.1.out`, `MODEL_OUTPUT_PARAM1.2.out`, and `MODEL_OUTPUT_PARAM1.3.out`.
```yaml
env:
  variables:
    VAR1: value1
    VAR2: value2
    MODEL1: my_model.input
  labels:
    PATH1: /dev/$(VAR2)  # (1)
    OUTPUTNAME: MODEL_OUTPUT_$(PARAM1.label).out  # (2)
  dependencies:
    paths:
      - name: CODE
        path: /path/to/simulation/code
    git:
      - name: MODEL_REPO
        path: $(OUTPUT_PATH)
        url: https://git-url.llnl.gov/models.git
        tag: 2.9.15

study:
  - name: step1
    description: just a sample step
    run:
      cmd: |
        echo "The value of 'VAR1' is $(VAR1)"
        echo "And this is the value of 'PATH1': $(PATH1)"
        cp $(MODEL_REPO)/$(MODEL1) .
        $(LAUNCHER) $(CODE) -in $(MODEL1) -out $(OUTPUTNAME)
      procs: 1
      nodes: 1
      walltime: "00:01:00"

global.parameters:
  PARAM1:
    values: [1, 2, 3]
    label: PARAM1.%%
```
1. Build a label by substituting in the value of the `$(VAR2)` variable
2. Build a label by substituting in the label string of the `$(PARAM1)` parameter: happens at study/parameter expansion time
Description: description
This section is meant primarily for documentation purposes, providing a general overview of what the study is meant to achieve. It is both an important part of the provenance of the instantiated studies (via the workspace copy) and a way to enhance the shareability of the study with other users.
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `name` | Yes | str | Name of the study that is easily identifiable/indicative of purpose |
| `description` | Yes | str | A short overview/description of what this study intends to achieve |
```yaml
description:
  name: lulesh_sample1
  description: |
    A sample LULESH study that downloads, builds, and runs a parameter study
    of varying problem sizes and iterations.
```
Note
You can add other keys to this block for custom documentation. Maestro currently only verifies the presence of the required set enumerated above.
Environment: env
The environment block is where items describing the study's environment are defined. This includes static information that the study needs to know about and dependencies that the workflow requires for execution. This is a good place for global constants that do not vary from step to step.
Note
This block isn't strictly required as a study may not depend on anything.
| Key/Subsection | Description |
| --- | --- |
| `variables` | Static values that are substituted into steps ahead of all other values |
| `labels` | Static values that can contain variables and parameters which, like variables, can be substituted into all steps |
| `dependencies` | Items that must be "acquired" before the workflow can proceed |
Variables: variables
Variables represent static, one-time substitutions into the steps that make up a study. Variables are great for encouraging consistency throughout a workflow and are useful for things like propagating fixed settings or setting control-logic flags. They are similar in concept to Unix environment variables, but are more portable.
There are some special tokens/variables available in Maestro specifications, the first of which is shown above: `OUTPUT_PATH`. This is a keyword variable that Maestro looks for in order to set a custom output path for concrete study instance workspaces. These workspaces are usually timestamped folders named after the `name` key in the description block and stored inside `OUTPUT_PATH`. The `OUTPUT_PATH` can use relative pathing semantics, as shown above, where `./` starts from the same parent directory the study specification is read from.
Note
If not specified, `OUTPUT_PATH` is assumed to be the path where Maestro was launched from.
Note
If the `-o` flag is specified for the `run` subcommand, `OUTPUT_PATH` will be taken from there and a timestamped path will not be generated.
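For example, a study can group its timestamped instance directories under a dedicated folder; the path below is a placeholder:

```yaml
env:
  variables:
    # All study instances land in ./studies/<name>_<timestamp>/
    OUTPUT_PATH: ./studies
```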
Labels: labels
Labels are similar to variables, representing static, one-time substitutions into steps. The difference from variables is that they support variable and parameter substitution. This functionality can be useful for enforcing fixed formatting on output files, or fixed formatting of components of steps.
```yaml
env:
  labels:
    outfile: $(SIZE.label).$(ITERATIONS.label).log  # (1)

...

global.parameters:
  SIZE:
    values: [10, 20, 30]
    label: SIZE.%%
  ITERATIONS:
    values: [100, 200, 300]
    label: ITERATIONS.%%
```
1. Dynamic label construction based on parameter values. Each step/parameter combo will also have a corresponding label
Dependencies: dependencies
Dependencies represent external artifacts that should be present before a workflow can run. This includes acquirable inputs from a directory or version control system/repository, e.g. input files for programs, code, data, etc. They can be used in steps via Maestro's token syntax, using each dependency's `name` key as the token name. Labels and variables can also be used in the definition of these dependencies, as shown in the example.
There are currently two types of dependencies:

- `paths`: verifies the existence of the specified path before execution. This is a list of (`-` prefixed) dictionaries of paths to acquire. If a path's existence cannot be verified, then Maestro will throw an exception and halt the study launching process.

  | Key | Required? | Type | Description |
  | --- | --- | --- | --- |
  | `name` | Yes | str | Unique name for identifying/referring to the path dependency |
  | `path` | Yes | str | Path to acquire and make available for substitution into string data/steps |

  Info
  A path dependency will only check for the exact path that is specified. Maestro will not attempt to verify any sub-paths or sub-directories underneath that path.

- `git`: clones the specified repository before execution of the study. This is a list of (`-` prefixed) dictionaries of repositories to clone.

  | Key | Required? | Type | Description |
  | --- | --- | --- | --- |
  | `name` | Yes | str | Unique name for identifying/referring to the repository dependency |
  | `path` | Yes | str | Parent path in which to clone the repo |
  | `url` | Yes | str | URL/path of the repo to clone |
  | `tag` | No | str | Optional git tag to checkout after cloning |

The `git` type dependency will attempt to clone the specified remote repository and, on success, continue to the next step in the launch process; however, if the clone fails, the process will throw an exception without launching any part of the workflow.
Batch: batch
The `batch` block is an optional block that enables specification of HPC scheduler information, enabling steps that are decoupled from particular machines and thus more portable/reusable. The base/general keys that show up in this block are shown below. Each scheduler type may have some unique keys; those are discussed further in the scheduler-specific sections.
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `type` | Yes | str | Type of scheduler managing execution. Currently one of: {`local`, `slurm`, `lsf`, `flux`} |
| `shell` | No | str | Optional path to the shell to use for execution. Defaults to `"/bin/bash"` |
| `bank` | Yes | str | Account to charge computing time to |
| `host` | Yes | str | The name of the cluster to execute this study on |
| `queue` | Yes | str | Scheduler queue/partition to submit jobs (study steps) to |
| `nodes` | No | int | Number of compute nodes to be reserved for jobs; note this is also a per-step key |
| `reservation` | No | str | Optional reserved allocation to submit jobs to |
| `qos` | No | str | Quality of service specification, e.g. run in standby mode to use idle resources when user priority is low or job limits are already reached |
| `gpus` | No | str | Optional reservation of GPU resources for jobs |
| `procs` | No | int | Optional number of tasks in batch allocations; note this is also a per-step key |
| `flux_uri` | Yes* | str | URI of the flux instance to schedule jobs to; only required with `type=flux` |
| `version` | No | str | Optional version of the flux scheduler, for accommodating API changes |
| `args` | No | dict | Optional additional args to pass to the scheduler; keys are arg names, values are arg values |
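A minimal `batch` block might look like the following; the host, bank, and queue names are placeholders to be replaced with your own system's values:

```yaml
batch:
  type: slurm
  host: my_cluster   # placeholder cluster name
  bank: my_bank      # placeholder account/bank
  queue: pbatch
  nodes: 2
```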
Study: study
The `study` block is where the steps to be executed in the Maestro study are defined. This section represents the unexpanded set of tasks that the study is composed of. Here, unexpanded means no parameter substitution; the steps only contain references to the parameters. Steps are given as a list of (`-` prefixed) dictionaries of keys:
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `name` | Yes | str | Unique name for identifying and referring to a task |
| `description` | Yes | str | A general description of what this step is intended to do |
| `run` | Yes | dict | Properties that describe the actual specification of the task |
Note
Unlike the previous blocks, almost every key in the study section can accept parameter tokens. The primary benefit of this is in the resource specification keys, allowing easy parameterization of numbers of tasks, cores, nodes, walltime, etc. on a per-step basis.
Run: run
The `run` key contains several other keys that define what a task does and how it relates to other tasks. This is where you define the concrete shell commands the task needs to execute, any parameter or `env` tokens to inject, and the step/task dependencies that dictate the topology of the study task graph.
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `cmd` | Yes | str | The actual task (shell commands) to be executed |
| `depends` | Yes | list | List of other tasks which must successfully execute before this task can be executed |
| `restart` | No | str | Similar to `cmd`, providing optional alternate commands to run upon restarting, e.g. after a scheduler timeout |
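A sketch of a `run` block using all three keys; the script name and flags are hypothetical:

```yaml
study:
  - name: long-simulation
    description: A step that can resume after a scheduler timeout.
    run:
      cmd: |
        $(LAUNCHER) ./simulate.sh --start-fresh
      restart: |
        # Runs instead of cmd when the step is restarted,
        # e.g. after the walltime was exceeded
        $(LAUNCHER) ./simulate.sh --resume-from-checkpoint
      depends: []
```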
There are also a number of optional keys for describing resource requirements to pass to the scheduler and the associated `$(LAUNCHER)` tokens used to execute applications on HPC systems. Presence of the `nodes` and/or `procs` keys has particular importance here: they tell Maestro that this step needs to be scheduled and not run locally on login nodes.
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `nodes` | No | str | Number of nodes to reserve for executing this task |
| `procs` | No | str | Number of processors needed for task execution; primarily used by `$(LAUNCHER)` expansion |
| `walltime` | No | str | Specifies the maximum amount of time to reserve HPC resources for |
| `reservation` | No | str | Reservation to schedule this step to; overrides the batch block |
| `qos` | No | str | Quality of service options for this step; overrides the batch block |
Additionally, there are more fine-grained resource/scheduler controls enabled by the various schedulers.
Note
The remaining keys have been gradually appended and thus are not uniform across schedulers. Version 2.0 of the study specification will be refactoring these into a uniform/portable set. Full documentation/explanation of the resource keys can be seen in the scheduler specific sections: Local, SLURM, Flux, LSF
The following keys are all optional and get into scheduler specific features. See the respective sections before using them:
| Key | Type | Description |
| --- | --- | --- |
| `cores per task` | str/int | Number of cores to use for each task |
| `exclusive` | str | Flag for ensuring the batch job has exclusive access to its requested resources |
| `gpus` | str/int | Number of GPUs to allocate with this step |
| `tasks per rs` | str/int | Number of tasks per resource set (LSF/jsrun) |
| `rs per node` | str/int | Number of resource sets per node |
| `cpus per rs` | str/int | Number of CPUs in each resource set |
| `bind` | str | Controls binding of tasks in resource sets |
| `bind gpus` | str | Controls binding of GPUs in resource sets |
Parameters: global.parameters
The `global.parameters` block of the specification contains everything that you are going to vary in the study; this is where the parameter tokens and values that get substituted into the study steps are set up. This block is optional, and when present, the defined study steps are expanded into one concrete instance per parameter combination specified in this block. These parameter combinations are defined as a set of keys that are the parameter names (for use in steps via the `$(PARAM)` token syntax), each mapping to a dictionary defining the list of values in each parameter combination and a format string for constructing labels from those values.
Parameter keys:
| Key | Required? | Type | Description |
| --- | --- | --- | --- |
| `values` | Yes | list | List of values in each parameter combination used to expand the study |
| `label` | Yes | str | Format string for constructing labels from the `values` list |
In the example below we are generating three parameters with the tokens `$(TRIAL)`, `$(SIZE)`, and `$(ITERATIONS)` that can be referenced in study steps and the environment's labels block. We are also constructing 9 parameter combinations, meaning that steps which use these parameters will have 9 instances, one for each combination. The label format syntax is currently limited to simple `str(param)` type formatting that injects the string form of the parameter values in place of the `%%` placeholder, with the parameter name prefix being a user-settable string. For `TRIAL`, the parameters block below will create the following labels: `TRIAL.1`, `TRIAL.2`, `TRIAL.3`, `TRIAL.4`, `TRIAL.5`, `TRIAL.6`, `TRIAL.7`, `TRIAL.8`, `TRIAL.9`. These labels get used in naming the directories created for each step's outputs as well as in the logging.
```yaml
global.parameters:
  TRIAL:
    values: [1, 2, 3, 4, 5, 6, 7, 8, 9]
    label: TRIAL.%%
  SIZE:
    values: [10, 10, 10, 20, 20, 20, 30, 30, 30]
    label: SIZE.%%
  ITERATIONS:
    values: [10, 20, 30, 10, 20, 30, 10, 20, 30]
    label: ITER.%%
```
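The `%%` placeholder substitution can be sketched in Python as a plain string replacement (illustration only, not Maestro's code):

```python
def make_labels(values, label_format):
    """Build one label per parameter value by replacing the %% placeholder."""
    return [label_format.replace("%%", str(value)) for value in values]

print(make_labels([1, 2, 3], "TRIAL.%%"))
# → ['TRIAL.1', 'TRIAL.2', 'TRIAL.3']
```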
For more programmatic creation of these parameter combinations, see the section on the `pgen` functionality. This alternate mode acts as an override of the `global.parameters` block and is injected at run time rather than being baked into the study specification.
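Note that the values lists combine index-wise, as the repeated entries in the example above suggest: combination i takes the i-th value of every parameter, rather than forming a cross product. A sketch of that expansion (not Maestro's `pgen` API):

```python
def expand_combinations(parameters):
    """Zip parameter value lists index-wise into per-step combinations.

    Mimics how global.parameters lists line up: combination i takes the
    i-th value of every parameter. A sketch, not Maestro's implementation.
    """
    names = list(parameters)
    counts = {len(parameters[name]["values"]) for name in names}
    assert len(counts) == 1, "all parameters need the same number of values"
    return [
        {name: parameters[name]["values"][i] for name in names}
        for i in range(counts.pop())
    ]

params = {
    "SIZE": {"values": [10, 20], "label": "SIZE.%%"},
    "ITERATIONS": {"values": [100, 200], "label": "ITER.%%"},
}
print(expand_combinations(params))
# → [{'SIZE': 10, 'ITERATIONS': 100}, {'SIZE': 20, 'ITERATIONS': 200}]
```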
Full Example
Finally, we can pull all of this together into a complete example. This and other versions of the LULESH study specification, along with other problems, can be found in the samples directory in the repo.
**Local**

```yaml
description:
  name: lulesh_sample1
  description: A sample LULESH study that downloads, builds, and runs a parameter study of varying problem sizes and iterations.

env:
  variables:
    # DEFAULTS FOR MONTECARLO PGEN EXAMPLE
    SMIN: 10
    SMAX: 30
    TRIALS: 50
    ITER: 100
    OUTPUT_PATH: ./sample_output/lulesh
  labels:
    outfile: $(SIZE.label).$(ITERATIONS.label).log
  dependencies:
    git:
      - name: LULESH
        path: $(OUTPUT_PATH)
        url: https://github.com/LLNL/LULESH.git
        tag: 2.0.3

study:
  - name: make-lulesh
    description: Build the serial version of LULESH.
    run:
      cmd: |
        cd $(LULESH)
        sed -i 's/^CXX = $(MPICXX)/CXX = $(SERCXX)/' ./Makefile
        sed -i 's/^CXXFLAGS = -g -O3 -fopenmp/#CXXFLAGS = -g -O3 -fopenmp/' ./Makefile
        sed -i 's/^#LDFLAGS = -g -O3/LDFLAGS = -g -O3/' ./Makefile
        sed -i 's/^LDFLAGS = -g -O3 -fopenmp/#LDFLAGS = -g -O3 -fopenmp/' ./Makefile
        sed -i 's/^#CXXFLAGS = -g -O3 -I/CXXFLAGS = -g -O3 -I/' ./Makefile
        make clean
        make
      depends: []

  - name: run-lulesh
    description: Run LULESH.
    run:
      cmd: |
        $(LULESH)/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
      depends: [make-lulesh]

  - name: post-process-lulesh
    description: Post process all LULESH results.
    run:
      cmd: |
        echo "Unparameterized step with Parameter Independent dependencies." >> out.log
        echo $(run-lulesh.workspace) >> out.log
        ls $(run-lulesh.workspace) >> out.log
      depends: [run-lulesh_*]

  - name: post-process-lulesh-trials
    description: Post process all LULESH results.
    run:
      cmd: |
        echo "Parameterized step that has Parameter Independent dependencies" >> out.log
        echo "TRIAL = $(TRIAL)" >> out.log
        echo $(run-lulesh.workspace) >> out.log
        ls $(run-lulesh.workspace) >> out.log
      depends: [run-lulesh_*]

  - name: post-process-lulesh-size
    description: Post process all LULESH results.
    run:
      cmd: |
        echo "Parameterized step that has Parameter Independent dependencies" >> out.log
        echo "SIZE = $(SIZE)" >> out.log
        echo $(run-lulesh.workspace) >> out.log
        ls $(run-lulesh.workspace) | grep $(SIZE.label) >> out.log
      depends: [run-lulesh_*]

global.parameters:
  TRIAL:
    values: [1, 2, 3, 4, 5, 6, 7, 8, 9]
    label: TRIAL.%%
  SIZE:
    values: [10, 10, 10, 20, 20, 20, 30, 30, 30]
    label: SIZE.%%
  ITERATIONS:
    values: [10, 20, 30, 10, 20, 30, 10, 20, 30]
    label: ITER.%%
```
**SLURM**

```yaml
description:
  name: lulesh_sample1
  description: A sample LULESH study that downloads, builds, and runs a parameter study of varying problem sizes and iterations on SLURM.

env:
  variables:
    OUTPUT_PATH: ./sample_output/lulesh
  labels:
    outfile: $(SIZE.label).$(ITERATIONS.label).log
  dependencies:
    git:
      - name: LULESH
        path: $(OUTPUT_PATH)
        url: https://github.com/LLNL/LULESH.git

batch:
  type: slurm
  host: quartz
  bank: baasic
  queue: pbatch
  gres: ignore
  reservation: test_reservation

study:
  - name: make-lulesh
    description: Build the MPI enabled version of LULESH.
    run:
      cmd: |
        module load cmake/3.13.4
        cd $(LULESH)
        mkdir build
        cd build
        cmake -DCMAKE_BUILD_TYPE=Release -DMPI_CXX_COMPILER=`which mpicxx` ..
        make
      depends: []

  - name: run-lulesh
    description: Run LULESH.
    run:
      cmd: |
        $(LAUNCHER) $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
      depends: [make-lulesh]
      nodes: 2
      procs: 27
      exclusive: True
      walltime: "00:10:00"

global.parameters:
  SIZE:
    values: [100, 100, 100, 200, 200, 200, 300, 300, 300]
    label: SIZE.%%
  ITERATIONS:
    values: [100, 200, 300, 100, 200, 300, 100, 200, 300]
    label: ITER.%%
```
**LSF**

```yaml
description:
  name: lulesh_sample1_lsf
  description: A sample LULESH study that downloads, builds, and runs mpi and openmp weak scaling modes on LSF

env:
  variables:
    OUTPUT_PATH: ./sample_output/lulesh
  labels:
    outfile: $(SIZE.label).$(ITERATIONS.label).$(CPUS_PER_TASK.label).$(TASKS.label).log
  dependencies:
    git:
      - name: LULESH
        path: $(OUTPUT_PATH)
        url: https://github.com/LLNL/LULESH.git

batch:  # NOTE: UPDATE THESE FOR YOUR SYSTEM
  type: lsf
  host: lassen
  bank: wbronze
  queue: pdebug

study:
  - name: make-lulesh
    description: Build the MPI+OpenMP enabled version of LULESH.
    run:
      cmd: |
        # LLNL specific initialization of lmod setup on the allocation
        source /etc/profile.d/z00_lmod.sh

        # NOTE: ensure a compatible mpi install is available for specified compiler
        module load gcc/8.3.1
        module load cmake/3.14.5

        cd $(LULESH)
        mkdir build
        cd build
        cmake -DCMAKE_BUILD_TYPE=Release -DWITH_MPI=On -DWITH_OMP=On -DMPI_CXX_COMPILER=`which mpicxx` ..
        make
      depends: []
      nodes: 1
      procs: 1
      rs per node: 1
      tasks per rs: 1
      cpus per rs: 40
      walltime: "10"

  - name: run-lulesh
    description: Run LULESH.
    run:
      cmd: |
        # LLNL specific initialization of lmod setup on the allocation
        source /etc/profile.d/z00_lmod.sh

        # Ensure consistent mpi is active (LC systems reload mpi with compiler changes)
        module load gcc/8.3

        # Echo parallel resources for easier id in post
        echo "NODES: $(NODES)"
        echo "TASKS: $(TASKS)"
        echo "CPUS_PER_TASK: $(CPUS_PER_TASK)"

        # OPENMP settings
        export OMP_NUM_THREADS=$(CPUS_PER_TASK)
        echo "OPENMP THREADS: $OMP_NUM_THREADS"

        $(LAUNCHER) $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p >& $(outfile)
      depends: [make-lulesh]
      nodes: $(NODES)
      procs: $(TASKS)
      rs per node: $(TASKS)
      cpus per rs: $(CPUS_PER_TASK)
      exclusive: True
      walltime: "00:10"

global.parameters:
  SIZE:
    values: [100, 100, 200]
    label: SIZE.%%
  ITERATIONS:
    values: [100, 100, 100]
    label: ITER.%%
  TASKS:
    values: [1, 8, 1]
    label: TASKS.%%
  CPUS_PER_TASK:
    values: [1, 1, 8]
    label: CPT.%%
  NODES:
    values: [1, 1, 1]
    label: NODES.%%
```
**Flux**

```yaml
description:
  name: lulesh_sample1
  description: A sample LULESH study that downloads, builds, and runs a parameter study of varying problem sizes and iterations using Flux.

env:
  variables:
    OUTPUT_PATH: ./sample_output/lulesh
  labels:
    outfile: $(SIZE.label).$(ITERATIONS.label).log
  dependencies:
    git:
      - name: LULESH
        path: $(OUTPUT_PATH)
        url: https://github.com/LLNL/LULESH.git

batch:
  type: flux
  host: quartz
  bank: baasic
  queue: pbatch

study:
  - name: make-lulesh
    description: Build LULESH without MPI or OpenMP.
    run:
      cmd: |
        cd $(LULESH)
        mkdir build
        cd build
        cmake -DWITH_MPI=Off -DWITH_OPENMP=Off ..
        make
      depends: []

  - name: run-lulesh
    description: Run LULESH.
    run:
      cmd: |
        $(LAUNCHER) $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
      depends: [make-lulesh]
      nodes: 1
      procs: 1
      cores per task: 1
      nested: True
      priority: high
      walltime: "00:02:00"

global.parameters:
  SIZE:
    values: [100, 100, 100, 200, 200, 200, 300, 300, 300]
    label: SIZE.%%
  ITERATIONS:
    values: [10, 20, 30, 10, 20, 30, 10, 20, 30]
    label: ITER.%%
```