Tutorial

This tutorial demonstrates how to build and run exa-PD on both the GPU and CPU partitions of the Perlmutter supercomputer to construct the phase diagram for the Cu-Zr system.

1. Clone the Repository

Start by cloning the exa-PD repository:

git clone https://github.com/ML-AMD/exa-pd.git
cd exa-pd

2. Install Dependencies

If you use Conda to manage Python packages, you can create an environment containing the required packages from the exa_pd_env.yml file we provide:

conda env create -f exa_pd_env.yml
conda activate exa_pd_env  # activate by environment name, not file name; use the name set in exa_pd_env.yml if it differs
# from the repo root:
python run.py --help
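As an optional sanity check once the environment is active, the minimal sketch below reports which Python modules cannot be imported. The module list is an assumption (Parsl is the only Python dependency named in this tutorial); the authoritative list is the one in exa_pd_env.yml, and the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list for illustration; consult exa_pd_env.yml for the real dependencies.
    required = ["parsl"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If a module is reported missing, the environment was most likely not activated in the current shell.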

3. Prepare the Data and LAMMPS Setup

Ensure you have a working LAMMPS installation. If you will be using a neural network potential (NNP), make sure that the package supporting that NNP is installed in LAMMPS. For example, if you use an NNP trained by DeepMD-kit, you will need to install DeepMD-kit and have LAMMPS compiled with the USER-DP package enabled. You can also use the pre-compiled lmp executable that usually comes with the DeepMD-kit installation.
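Before going further, it can help to confirm that the LAMMPS executable is actually reachable from your shell. The sketch below is one way to check, assuming the executable is called lmp as mentioned above; the helper name find_executable is hypothetical, and your build may use a different executable name.

```python
import shutil


def find_executable(command):
    """Return the absolute path of `command` if it is on PATH, otherwise None."""
    return shutil.which(command)


if __name__ == "__main__":
    # "lmp" is the executable name used in this tutorial; adjust it for your build.
    path = find_executable("lmp")
    print(path if path else "lmp not found on PATH")
```

The path printed here is also a reasonable candidate for the gpu_exe or cpu_exe fields of the JSON input file.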

Next, copy the folder that contains the crystal structures of the Cu-Zr solid phases:

cp -r example/phases/ ./

4. Prepare the Parsl Configuration

Parsl configurations must be placed inside the parsl_configs/ directory so that they can be automatically discovered by exa-PD at runtime.

Start by copying the default Perlmutter configuration:

cp parsl_configs/perlmutter.py parsl_configs/my_perlmutter.py

Then edit my_perlmutter.py and config_registry.py:

a. Change the registration name

At the top of config_registry.py, update CONFIG_REGISTRY to register the new configuration name. This name must match the value you set in your JSON input file (the run field).

# Before:
CONFIG_REGISTRY = {
    "perlmutter": "parsl_configs.perlmutter.PerlmutterConfig",
}

# After:
CONFIG_REGISTRY = {
    "my_perlmutter": "parsl_configs.my_perlmutter.PerlmutterConfig",
}
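To see why the registry key and the dotted path both matter, here is a minimal sketch of how a registry entry of this shape could be resolved into a class at runtime. This is not exa-PD's actual loader, which may work differently; resolve_dotted_path and DEMO_REGISTRY are hypothetical names, and a standard-library class stands in for the Parsl configuration class so the sketch runs anywhere.

```python
import importlib


def resolve_dotted_path(dotted_path):
    """Import the module part of 'pkg.module.Name' and return the attribute 'Name'."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in registry: same shape as CONFIG_REGISTRY, but pointing at a stdlib class.
DEMO_REGISTRY = {"demo": "collections.OrderedDict"}

if __name__ == "__main__":
    cls = resolve_dotted_path(DEMO_REGISTRY["demo"])
    print(cls.__name__)
```

Under this assumption, a typo in the module portion of the path (for example, leaving parsl_configs.perlmutter after copying the file) would either fail at import time or silently load the old configuration, so both the key and the path should be updated together.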

b. Update each executor

The Perlmutter configuration defines two separate executors: one that runs on GPU nodes and one that runs on CPU nodes.

For each executor, update the following fields in the SlurmProvider:

max_blocks: The maximum number of Slurm jobs (blocks) that Parsl may have active for that provider at the same time. This is a concurrent cap, not a lifetime cap: when one job finishes, Parsl is free to submit another, as long as the number of active blocks never exceeds max_blocks.
walltime: The maximum run time requested for each Slurm job allocation (block), given as HH:MM:SS. It maps directly to Slurm's --time option.
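The sketch below shows how these settings could be validated and passed to Parsl's SlurmProvider, whose keyword for the wall-time limit is walltime. The helper names slurm_provider_kwargs and make_provider and the example values are assumptions for illustration; the provider defined in my_perlmutter.py will set additional options that are not shown here.

```python
import re


def slurm_provider_kwargs(nodes_per_block, max_blocks, walltime):
    """Validate the settings and return keyword arguments for parsl's SlurmProvider."""
    if not re.fullmatch(r"\d{1,3}:[0-5]\d:[0-5]\d", walltime):
        raise ValueError(f"walltime must look like HH:MM:SS, got {walltime!r}")
    if max_blocks < 1 or nodes_per_block < 1:
        raise ValueError("max_blocks and nodes_per_block must be at least 1")
    return {
        "nodes_per_block": nodes_per_block,
        "max_blocks": max_blocks,
        "walltime": walltime,
    }


def make_provider(kwargs):
    """Build the provider; parsl is imported lazily so the helper above runs without it."""
    from parsl.providers import SlurmProvider
    return SlurmProvider(**kwargs)


if __name__ == "__main__":
    # Illustrative values only: at most 2 concurrent jobs, 1 node each, 30 minutes per job.
    print(slurm_provider_kwargs(nodes_per_block=1, max_blocks=2, walltime="00:30:00"))
```

With these example values, Parsl would keep at most two Slurm jobs queued or running at once for that executor, each requesting one node for thirty minutes.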

5. Prepare the JSON Input File

Copy the default input file:

cp configs/input.json configs/my_input.json

Edit the following fields in my_input.json:

dir: Path to the root directory of the project for running the calculations. Default is the current directory.
pair_coeff: The pair_coeff arguments associated with the pair style, in LAMMPS syntax. Change the path to the potential file to an absolute path.
ngpu: The number of nodes required for each GPU Slurm job submitted by Parsl. Default is 1.
ncpu: The number of nodes required for each CPU Slurm job submitted by Parsl. Default is 1.
gpu_account: The allocation or account name to be used when submitting GPU jobs (e.g., “m1234”).
cpu_account: The allocation or account name to be used when submitting CPU jobs (e.g., “m1234”).
gpu_exe: The executable command or absolute path to run LAMMPS on GPU resources.
cpu_exe: The executable command or absolute path to run LAMMPS on CPU resources.
parsl_config: The Parsl configuration profile that specifies how jobs are launched and resources are allocated. This value must match the key registered in config_registry.py (e.g., “my_perlmutter”).
qos: The quality-of-service (QOS) flag to control scheduling priority (e.g., “regular”, “premium”).
data_in: Input data file for the liquid structure in the atom style of the LAMMPS data format.
ref_pair_style and ref_pair_coeff: The pair style and pair coefficients that define the reference system.
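Because a missing field is easier to catch before jobs are submitted, a quick pre-flight check can be useful. The sketch below searches a parsed JSON document for the field names listed above, wherever they are nested. The nesting of the sample document and the helper names find_keys and missing_fields are hypothetical; the real layout is the one in configs/input.json.

```python
import json


def find_keys(node, found=None):
    """Collect every dictionary key that appears anywhere in a nested JSON structure."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            find_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            find_keys(item, found)
    return found


def missing_fields(config, required):
    """Return the required field names that do not occur anywhere in the config."""
    present = find_keys(config)
    return [name for name in required if name not in present]


# Field names come from the list above; the nesting of SAMPLE is made up for the demo.
REQUIRED = ["dir", "pair_coeff", "ngpu", "ncpu", "gpu_account", "cpu_account",
            "gpu_exe", "cpu_exe", "parsl_config", "qos", "data_in"]
SAMPLE = '{"general": {"dir": "."}, "run": {"ngpu": 1, "ncpu": 1, "parsl_config": "my_perlmutter"}}'

if __name__ == "__main__":
    print("Missing fields:", missing_fields(json.loads(SAMPLE), REQUIRED))
```

To check your own file, replace json.loads(SAMPLE) with the result of json.load on an open handle to configs/my_input.json.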

6. Run the Workflow

Run the full exa-PD workflow from a login node of Perlmutter:

export PYTHONPATH=$(pwd):$PYTHONPATH
python run.py --config configs/my_input.json