Parsl configuration guide

Note

This guide focuses on Slurm-based configurations for brevity. Parsl also supports a wide range of resource providers and launchers beyond Slurm, and can run on many different systems. See the Parsl documentation for other workload managers.

exa-AMD uses Parsl to schedule each workflow phase on your system. The Parsl configuration specifies Slurm accounts/queues, CPU/GPU placement, node counts, per-node workers/threads, walltimes, and how the software environment is initialized on compute nodes.

Parsl can technically run the workflow on a local machine. However, exa-AMD is designed for supercomputers and typically requires substantial computational resources, so local execution is not recommended.

Executor labels

exa-AMD uses labels to link tasks to executors. A task’s label selects the executor it will run on. The executor, in turn, defines the computational resources used by the task (e.g., CPUs/GPUs, node count).

First, the label is used when registering the executor:

executor = HighThroughputExecutor(
    label=EXECUTOR_LABEL,
    ...
)

Then, the same label is referenced by the task decorator to run that task on that executor:

@python_app(executors=[EXECUTOR_LABEL])
def task():
    ...

This keeps the workflow code independent of site details while precisely controlling resource placement per task.
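For reference, here is a minimal, self-contained sketch of the same pattern in plain Parsl (a generic illustration, not exa-AMD code; the label value and worker count are placeholders):

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor

EXECUTOR_LABEL = "cpu_executor"  # placeholder label

config = Config(
    executors=[
        HighThroughputExecutor(
            label=EXECUTOR_LABEL,        # executor registered under this label
            max_workers_per_node=4,      # placeholder concurrency
        ),
    ],
)

@python_app(executors=[EXECUTOR_LABEL])  # task pinned to that executor
def double(x):
    return 2 * x

if __name__ == "__main__":
    parsl.load(config)
    print(double(21).result())  # prints 42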

exa-AMD uses fixed executor labels for each of the five workflow phases described in the workflow section.

Executor labels

  Phase                   Parsl executor label
  ----------------------  -------------------------
  Structure generation    GENERATE_EXECUTOR_LABEL
  CGCNN prediction        CGCNN_EXECUTOR_LABEL
  Structure selection     SELECT_EXECUTOR_LABEL
  VASP (DFT)              VASP_EXECUTOR_LABEL
  Post-processing         POSTPROCESSING_LABEL

Selecting a config at runtime

At runtime, exa-AMD reads a JSON run configuration. The key that selects the Parsl setup is:

  • parsl_config — selects which registered Parsl config to use (e.g., "perlmutter_premium").

This value must match the registry name defined in the Python config, e.g.:

# https://github.com/ML-AMD/exa-amd/blob/main/parsl_configs/perlmutter.py#L154
register_parsl_config("perlmutter_premium", PerlmutterConfig)
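Conceptually, register_parsl_config adds the class to a name-to-config lookup table. A rough sketch of that idea (not the actual exa-AMD implementation, which lives in the parsl_configs package):

_PARSL_CONFIGS = {}

def register_parsl_config(name, config_cls):
    # Registry names must be unique (see the note further below).
    if name in _PARSL_CONFIGS:
        raise ValueError(f"Parsl config '{name}' already registered")
    _PARSL_CONFIGS[name] = config_cls

def get_parsl_config(name):
    # The run JSON's parsl_config value is resolved here,
    # e.g. get_parsl_config("perlmutter_premium").
    return _PARSL_CONFIGS[name]()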

Built-in configurations

The repository provides four registered configurations:

register_parsl_config("chicoma", ChicomaConfig)
register_parsl_config("chicoma_debug", ChicomaConfigDebug)
register_parsl_config("chicoma_debug_cpu", ChicomaConfigDebugCPU)
register_parsl_config("perlmutter_premium", PerlmutterConfig)

Select any of these by setting parsl_config accordingly in your run JSON.

Using the provided configs

For LANL Chicoma and NERSC Perlmutter:

  • Update worker_init in the config to load your site modules and activate your Conda environment on compute nodes (an example follows below).

  • Provide accounts and other runtime knobs (e.g., number of nodes) in your run JSON.

Additionally, update the Slurm-specific fields (e.g., qos, constraint, launcher) to match your site.
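As an illustration, a worker_init for a Perlmutter-like site could look like the following; the module names, environment name, and thread count are placeholders to adapt:

# Hypothetical worker_init string passed to the provider; adjust for your center.
worker_init = """
module load python
module load cudatoolkit
source activate exa-amd-env
export OMP_NUM_THREADS=8
"""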

Registering a new config

If your site differs substantially, you may want to register a new Parsl configuration:

  1. Create a file under parsl_configs/ (e.g., my_system.py).

  2. Implement a Config subclass with five executors (using the labels above); see the skeleton sketch after these steps.

  3. Register it with a unique name:

    register_parsl_config("my_system", MySystemConfig)
    
  4. Set parsl_config to "my_system" in your run JSON.

Important

The registry name must be unique across all registered configs in your environment.
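A skeleton for parsl_configs/my_system.py might look like the sketch below. Provider settings are placeholders, and the five label variables are stand-ins: in a real config, import the actual label constants and register_parsl_config the same way the existing files (e.g., perlmutter.py) do.

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Placeholder values for the five fixed labels; in exa-AMD, import the real
# constants (and register_parsl_config) from the package instead.
GENERATE_EXECUTOR_LABEL = "generate"
CGCNN_EXECUTOR_LABEL = "cgcnn"
SELECT_EXECUTOR_LABEL = "select"
VASP_EXECUTOR_LABEL = "vasp"
POSTPROCESSING_LABEL = "postprocessing"

def _slurm_executor(label, nodes, constraint, walltime, account):
    # One labeled executor backed by its own Slurm allocation (placeholder settings).
    return HighThroughputExecutor(
        label=label,
        provider=SlurmProvider(
            account=account,
            constraint=constraint,          # e.g., "cpu" or "gpu"
            nodes_per_block=nodes,
            walltime=walltime,
            launcher=SrunLauncher(),
            worker_init="module load python; source activate exa-amd-env",
        ),
    )

class MySystemConfig(Config):
    def __init__(self, account="my_account"):
        super().__init__(
            executors=[
                _slurm_executor(GENERATE_EXECUTOR_LABEL, 1, "cpu", "02:00:00", account),
                _slurm_executor(CGCNN_EXECUTOR_LABEL, 1, "gpu", "02:00:00", account),
                _slurm_executor(SELECT_EXECUTOR_LABEL, 1, "cpu", "01:00:00", account),
                _slurm_executor(VASP_EXECUTOR_LABEL, 4, "cpu", "12:00:00", account),
                _slurm_executor(POSTPROCESSING_LABEL, 1, "cpu", "01:00:00", account),
            ],
        )

register_parsl_config("my_system", MySystemConfig)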

Resource allocation & placement

Parsl’s provider/executor fields map directly to the resources you request from Slurm; the annotated sketch after the lists below shows where each field goes.

Node type
  • constraint: choose CPU vs GPU nodes (e.g., "cpu" or "gpu").

  • available_accelerators: GPUs per node (e.g., 4 on Perlmutter).

How many nodes
  • nodes_per_block: nodes in one Slurm allocation.

  • max_blocks / min_blocks / init_blocks: how many allocations Parsl may keep alive.
    - One multi-node allocation: nodes_per_block = N, max_blocks = 1.
    - Many single-node allocations: nodes_per_block = 1, max_blocks = N.

Per-node concurrency
  • cores_per_worker: CPUs per Parsl worker.

  • max_workers_per_node: limit on workers per node.

Operational
  • account and qos: identical to their Slurm equivalents.

  • walltime: job time limit.

  • worker_init: environment on compute nodes (e.g., modules).

  • scheduler_options: raw #SBATCH directives when needed.
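To make the mapping concrete, a single GPU-node executor using these fields might be assembled as follows. All values are illustrative placeholders (see perlmutter.py for real settings), and the label string stands in for the real label constant:

from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

gpu_executor = HighThroughputExecutor(
    label="VASP_EXECUTOR_LABEL",            # stand-in for the real label constant
    available_accelerators=4,               # GPUs per node (~ --gpus-per-node)
    cores_per_worker=32,                    # CPU threads per worker (~ --cpus-per-task)
    max_workers_per_node=4,                 # cap on workers per node
    provider=SlurmProvider(
        account="my_account",               # placeholder Slurm account
        qos="regular",
        constraint="gpu",                   # CPU vs GPU nodes
        nodes_per_block=2,                  # nodes per Slurm allocation (~ -N 2)
        init_blocks=1,
        min_blocks=0,
        max_blocks=1,                       # one multi-node allocation; use
                                            # nodes_per_block=1, max_blocks=N for
                                            # many single-node allocations instead
        walltime="06:00:00",                # job time limit
        worker_init="module load vasp",     # environment on compute nodes
        scheduler_options="#SBATCH --exclusive",  # raw #SBATCH directives
        launcher=SrunLauncher(),
    ),
)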

Quick mapping to Slurm

  • Nodes: nodes_per_block → roughly -N.

  • GPUs per node: available_accelerators → akin to --gpus-per-node.

  • CPU threads per worker: cores_per_worker → similar to --cpus-per-task (per worker).

  • Multiple allocations: max_blocks > 1 → multiple Slurm jobs managed by Parsl.

What the run JSON typically controls

Common knobs provided in the run JSON (names may vary slightly by version; a sample file is sketched after this list):

  • Parsl selection & accounts
    - parsl_config: registry name of the site config (e.g., "perlmutter_premium").
    - cpu_account / gpu_account: Slurm accounts for CPU/GPU executors.

  • Resource allocation & placement
    - num_workers: CPU threads per worker (used by CPU-bound phases).
    - pre_processing_nnodes: node count for structure generation and CGCNN.
    - vasp_nnodes: node count for the VASP phase.
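Putting these together, a run JSON could be produced like this. Key names follow the list above and may differ slightly between versions; accounts, node counts, and thread counts are placeholders:

# Sketch of a run JSON with the knobs above; adapt key values to your site.
import json

run_config = {
    "parsl_config": "perlmutter_premium",   # registered Parsl config name
    "cpu_account": "m0000",                 # placeholder Slurm account for CPU executors
    "gpu_account": "m0000",                 # placeholder Slurm account for GPU executors
    "num_workers": 32,                      # CPU threads per worker
    "pre_processing_nnodes": 2,             # nodes for structure generation / CGCNN
    "vasp_nnodes": 8,                       # nodes for the VASP phase
}

with open("run.json", "w") as f:
    json.dump(run_config, f, indent=2)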

Full working example

For a complete configuration with five labeled executors and typical Slurm settings, see the Perlmutter config in the repository:

  • parsl_configs/perlmutter.py

Need help?

If you are setting up a new site configuration or encountering center-specific constraints, please open a discussion or issue on the exa-AMD GitHub repository (https://github.com/ML-AMD/exa-amd).

Further reading