AFIR-web Support Site

How to run GRRM17 on multiple nodes using MPI (with Gaussian 09)

GRRM17 runs on distributed-memory multiprocessor systems, such as workstation clusters, using the Message Passing Interface (MPI) for communication between the compute nodes. A parallelized ADDF, ReStruct, ReEnergy, RePath, MC-AFIR, or SC-AFIR calculation (with Gaussian 03/09/16) can also be launched via "mpirun" as

$ mpirun -n N -machine XXX.machine GRRM17p XXX -mpi -sT -wDirectoryName

where N is the total number of processes (1 master and N − 1 child processes to be spawned, N ≤ 257), XXX is the input file name (without .com), and T is the time (in seconds) until the Save&Shutdown procedure starts. The -mpi keyword is required to run GRRM17p with MPI.

Do not confuse the argument "-n N" with the total number of processors (CPU cores) to be used; N is the total number of GRRM17p processes. If you want to launch a 16-threaded GRRM calculation and assign 4 CPU cores to each Gaussian job, set N to 17 and provide the option GauProc=4 in XXX.com. In this case, 64 CPU cores are used in total (16 child processes × 4 cores each).
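
For example, assuming the input file is named job.com (so XXX is job) and the machinefile is job.machine, this scenario could be launched as shown below; the file names and the 12-hour (43200 s) Save&Shutdown timer are placeholders chosen for illustration.

$ mpirun -n 17 -machine job.machine GRRM17p job -mpi -s43200

Here job.com must contain the GauProc=4 option, and N=17 covers the 1 master plus 16 child processes.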

The machinefile must be provided via "-machine XXX.machine". It tells the GRRM17p master process how many processes to assign to each compute node. For example, if a calculation with N=17 and GauProc=4 is executed on 4 compute nodes (node1, node2, node3, and node4) with 16 cores each, the following machinefile is required.

node1
node1
node1
node1
node1
node2
node2
node2
node2
node3
node3
node3
node3
node4
node4
node4
node4

In this case, the master and 4 child processes will run on node1, but the master process occupies only a small fraction of a CPU core. If you do not provide the machinefile, all 16 child processes (16 Gaussian calculations, each requesting 4 cores) will run on a single node. This will considerably slow the calculation down.
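
A machinefile like the one above can be written by hand or generated with a small shell script. The following is a minimal sketch assuming the example layout (node1 through node4, 4 child processes per node, plus one extra slot on node1 for the master); the host names, the count, and the output file name job.machine are placeholders to adapt to your cluster.

#!/bin/bash
# Minimal sketch: write a machinefile for N=17 (1 master + 16 children),
# with 4 child processes per node. Host names and file name are placeholders.
nodes="node1 node2 node3 node4"
children_per_node=4
echo node1 > job.machine            # extra slot on node1 for the master process
for node in $nodes; do
    for i in $(seq 1 $children_per_node); do
        echo "$node" >> job.machine
    done
done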

The argument -wDirectoryName (e.g. -w/scr, -w/usr/tmp, etc.) is optional; it specifies the scratch directory (intermediate file repository) on each compute node. The scratch directory should be a local (not shared) directory. The working directory (main file repository) must be accessible to all of the compute nodes used; without the -wDirectoryName keyword, all files will be generated in the working directory (this may slow down the response of the system). If -wDirectoryName is given, all intermediate files are generated in the directory DirectoryName. When the job is completed or terminated by the Save&Shutdown procedure, all the intermediate files generated in DirectoryName are moved to the working directory.
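
For example, extending the command shown above with a local scratch directory might look as follows; job, job.machine, and /scr are placeholders, and /scr must exist as a local directory on every compute node.

$ mpirun -n 17 -machine job.machine GRRM17p job -mpi -s43200 -w/scr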
