Parsl configuration guide
Note
This guide focuses on Slurm-based configurations for brevity. Parsl also supports a wide range of resource providers and launchers beyond Slurm, and can run on many different systems. See the Parsl documentation for other workload managers.
exa-AMD uses Parsl to schedule each workflow phase on your system. The Parsl configuration typically specifies Slurm accounts/queues, CPU/GPU placement, the number of nodes, per-node workers/threads, walltime, and how the software environment is initialized on compute nodes.
Parsl can technically run the workflow on a local machine. However, exa-AMD is designed for supercomputers and typically requires substantial computational resources, so local execution is not recommended.
Executor labels
exa-AMD uses labels to link tasks to executors. A task’s label selects the executor it will run on. The executor, in turn, defines the computational resources used by the task (e.g., CPUs/GPUs, node count).
First, the label is used when registering the executor:
executor = HighThroughputExecutor(
    label=EXECUTOR_LABEL,
    ...
)
Then, the same label is referenced by the task decorator to run that task on that executor:
@python_app(executors=[EXECUTOR_LABEL])
def task():
    ...
This keeps the workflow code independent of site details while precisely controlling resource placement per task.
exa-AMD uses fixed executor labels for each of the five workflow phases described in the workflow documentation.
| Phase | Parsl executor label |
|---|---|
| Structure generation | |
| CGCNN prediction | |
| Structure selection | |
| VASP (DFT) | |
| Post-processing | |
Selecting a config at runtime
At runtime, exa-AMD reads a JSON run configuration. The key:
parsl_config: selects which registered Parsl config to use (e.g., "perlmutter_premium").
This value must match the registry name defined in the Python config, e.g.:
# https://github.com/ML-AMD/exa-amd/blob/main/parsl_configs/perlmutter.py#L154
register_parsl_config("perlmutter_premium", PerlmutterConfig)
Built-in configurations
The repository provides four registered configurations:
register_parsl_config("chicoma", ChicomaConfig)
register_parsl_config("chicoma_debug", ChicomaConfigDebug)
register_parsl_config("chicoma_debug_cpu", ChicomaConfigDebugCPU)
register_parsl_config("perlmutter_premium", PerlmutterConfig)
Select any of these by setting parsl_config accordingly in your run JSON.
Using the provided configs
For LANL Chicoma and NERSC Perlmutter:
- Update worker_init in the config to load your site modules and activate your Conda environment on compute nodes.
- Provide accounts and other runtime knobs (e.g., number of nodes) in your run JSON.
- Update the remaining Slurm fields as needed (e.g., qos, constraint, launcher).
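The snippet below is a rough illustration of where these fields live on the Slurm provider inside each executor; the modules, Conda environment name, account, QOS, and constraint values are placeholders for your site, not the values shipped with the configs.

from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Placeholder values throughout -- substitute your site's modules,
# environment name, account, QOS, and node constraint.
provider = SlurmProvider(
    account="your_account",
    qos="regular",
    constraint="gpu",
    walltime="04:00:00",
    launcher=SrunLauncher(),
    worker_init="module load python\nconda activate exa-amd",
)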
Registering a new config
If your site differs substantially, you may want to register a new Parsl configuration:
1. Create a file under parsl_configs/ (e.g., my_system.py).
2. Implement a Config subclass with five executors (using the labels above); see the sketch after these steps.
3. Register it with a unique name:

   register_parsl_config("my_system", MySystemConfig)

4. Set parsl_config to "my_system" in your run JSON.
Important
The registry name must be unique across all registered configs in your environment.
Resource allocation & placement
Parsl’s provider/executor fields map directly to the resources you request from Slurm.
- Node type
  - constraint: choose CPU vs GPU nodes (e.g., "cpu" or "gpu").
  - available_accelerators: GPUs per node (e.g., 4 on Perlmutter).
- How many nodes
  - nodes_per_block: nodes in one Slurm allocation.
  - max_blocks / min_blocks / init_blocks: how many allocations Parsl may keep alive (see the sketch after this list).
    - One multi-node allocation: nodes_per_block = N, max_blocks = 1.
    - Many single-node allocations: nodes_per_block = 1, max_blocks = N.
- Per-node concurrency
  - cores_per_worker: CPUs per Parsl worker.
  - max_workers_per_node: limit on workers per node.
- Operational
  - account and qos: identical to their Slurm equivalents.
  - walltime: job time limit.
  - worker_init: environment setup on compute nodes (e.g., modules).
  - scheduler_options: raw #SBATCH directives when needed.
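As an example, the one-multi-node versus many-single-node trade-off looks like this at the provider level (account and walltime are placeholders):

from parsl.providers import SlurmProvider

# (a) One multi-node allocation: a single Slurm job holding 8 nodes.
multi_node = SlurmProvider(
    account="your_account",
    nodes_per_block=8,
    init_blocks=1,
    min_blocks=1,
    max_blocks=1,
    walltime="12:00:00",
)

# (b) Many single-node allocations: Parsl may keep up to 8 one-node jobs alive.
single_node = SlurmProvider(
    account="your_account",
    nodes_per_block=1,
    init_blocks=0,
    min_blocks=0,
    max_blocks=8,
    walltime="12:00:00",
)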
Quick mapping to Slurm
- Nodes: nodes_per_block → roughly -N.
- GPUs per node: available_accelerators → akin to --gpus-per-node.
- CPU threads per worker: cores_per_worker → similar to --cpus-per-task (per worker).
- Multiple allocations: max_blocks > 1 → multiple Slurm jobs managed by Parsl.
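To make the correspondence concrete, here is a hedged executor sketch annotated with the roughly equivalent sbatch flags; the label and all site values are placeholders.

from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Roughly the Parsl analogue of: sbatch -N 2 --gpus-per-node=4 --cpus-per-task=16
gpu_executor = HighThroughputExecutor(
    label="example_gpu",              # placeholder label
    available_accelerators=4,         # ~ --gpus-per-node=4
    cores_per_worker=16,              # ~ --cpus-per-task=16 (per worker)
    provider=SlurmProvider(
        account="your_account",
        constraint="gpu",
        nodes_per_block=2,            # ~ -N 2
        max_blocks=1,                 # > 1 would mean multiple Slurm jobs
        walltime="06:00:00",
        launcher=SrunLauncher(),
    ),
)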
What the run JSON typically controls
Common knobs provided in the run JSON (names may vary slightly by version):
Parsl selection & accounts -
parsl_config: registry name of the site config (e.g.,"perlmutter_premium"). -cpu_account/gpu_account: Slurm accounts for CPU/GPU executors.Resource allocation & placement -
num_workers: CPU threads per worker (used by CPU-bound phases). -pre_processing_nnodes: node count for structure generation and CGCNN. -vasp_nnodes: node count for the VASP phase.
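Putting these together, a run JSON fragment might look like the following; the values are illustrative, and your version of exa-AMD may require additional fields.

{
    "parsl_config": "perlmutter_premium",
    "cpu_account": "your_cpu_account",
    "gpu_account": "your_gpu_account",
    "num_workers": 32,
    "pre_processing_nnodes": 1,
    "vasp_nnodes": 4
}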
Full working example
For a complete configuration with five labeled executors and typical Slurm settings, see the Perlmutter config in the repository:
parsl_configs/perlmutter.py
Need help?
If you are setting up a new site configuration or encountering center-specific constraints, please open a discussion or issue in the exa-AMD GitHub repository.
Further reading
Parsl configuration guide: https://parsl.readthedocs.io/en/latest