6. Hello MPI

6.1. Memory

In parallel computing, two memory models are commonly distinguished for storing and managing data.

Shared Memory:

Shared memory is accessible by all processors or threads in a parallel system. It simplifies communication and data sharing but can lead to issues like data races and requires careful synchronization.

Distributed Memory:

In distributed memory systems, each process has its own private memory space. These memory spaces are not directly accessible by other processes, so communication between them typically involves message passing. Examples of distributed memory systems include clusters and supercomputers.

6.2. Hello World

MPI, which stands for Message Passing Interface, is commonly used in high-performance computing (HPC) and parallel computing environments to facilitate communication and data exchange between processes running on multiple computing nodes of a cluster or supercomputer. It supports point-to-point communication, collective communication, and synchronization mechanisms.

Common Steps:

The following three steps are essential when using MPI in C/C++.

  1. #include <mpi.h>: Makes MPI’s declarations visible in C/C++.

  2. MPI_Init(&argc, &argv): Initializes the MPI environment. It is a crucial first step before using other MPI functions. &argc is a pointer to the argument count and &argv is a pointer to the argument vector.

  3. MPI_Finalize(): Finalizes the MPI environment, ensuring all MPI-related resources are released properly. It should be called at the end of your program to avoid resource leaks.
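Putting these steps together, a minimal sketch of an MPI program that does nothing but start and shut down the MPI environment could look like this:

#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);  // step 2: initialize the MPI environment

  // ... your MPI calls go here ...

  MPI_Finalize();          // step 3: release all MPI resources
  return 0;
}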

Rank and Communicator:

Within a communicator such as MPI_COMM_WORLD, which contains all processes of the program, each process is identified by a unique rank. MPI_Comm_rank queries the rank of the calling process and MPI_Comm_size the total number of processes in the communicator:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int l_rank;
  int l_comm_size;

  MPI_Comm_rank(MPI_COMM_WORLD, &l_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &l_comm_size);

  std::cout << "Process " << l_rank
            << " out of " << l_comm_size << " processes says: Hello, MPI!" << std::endl;

  MPI_Finalize();

  return 0;
}
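Started with four processes, the output could look as follows; since all processes write to standard output concurrently, the order of the lines may differ from run to run:

Process 1 out of 4 processes says: Hello, MPI!
Process 0 out of 4 processes says: Hello, MPI!
Process 3 out of 4 processes says: Hello, MPI!
Process 2 out of 4 processes says: Hello, MPI!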

Compiling:

When compiling MPI programs in C/C++, you typically need to use a compiler wrapper provided by your MPI implementation.

Open MPI: Open MPI is a widely used implementation of the MPI standard and comes with its own C/C++ compiler wrappers (mpicxx or mpic++). You can use these wrappers to compile your C/C++ MPI programs. For example:

module load mpi/openmpi/<version>
mpicxx -o <output_name> <file_name>
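For example, assuming the hello-world program above was saved as hello_mpi.cpp (file and output names are arbitrary):

mpicxx -o hello_mpi hello_mpi.cpp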

mpicxx is a compiler wrapper provided by the MPI implementation. It is designed to simplify the process of compiling and linking C++ programs that use MPI for distributed computing. When you use mpicxx, it acts as a wrapper around your system’s C/C++ compiler (such as g++) and adds the include paths, flags, and libraries required for MPI support. You can use mpicxx --show to display the compiler and linker flags that mpicxx would use.

$ mpicxx --show

g++ -I/usr/local/include -pthread -Wl,-rpath -Wl,/usr/local/lib \
    -Wl,--enable-new-dtags -L/usr/local/lib -lmpi_cxx -lmpi

  • g++ is the underlying C/C++ compiler.

  • -I/usr/local/include specifies the include directories.

  • -pthread is used for multithreading support.

  • -Wl flags are related to linker options.

  • -L/usr/local/lib specifies the library directories.

  • -lmpi_cxx and -lmpi are the necessary MPI libraries for your program.

Executing:

To run your code after compiling:

mpirun -np <number_of_processes> ./<output_name>
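For example, to launch the executable built above with four processes (assuming it was named hello_mpi):

mpirun -np 4 ./hello_mpi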

Error Handling:

For error handling in C/C++ programs, one option is to use the <cassert> header, which provides the assert macro: a simple debugging tool for adding conditional checks to your code. If the checked condition evaluates to false, the program is terminated with a diagnostic message.

#include <cassert>

int main() {
  // ...
  assert(some_condition); // aborts the program if some_condition is false
  // ...
  return 0;
}

For example, when initializing MPI:

int l_ret = MPI_Init(&argc, &argv);
assert(l_ret == MPI_SUCCESS); // Ensure MPI_Init succeeded
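Keep in mind that assert is compiled out when NDEBUG is defined, which is common for optimized builds. As a sketch of explicit error handling, the return code of an MPI call can be checked manually and all processes aborted via MPI_Abort on failure. Note that return codes are only meaningful if the error handler of the communicator has been set to MPI_ERRORS_RETURN; the default handler MPI_ERRORS_ARE_FATAL already aborts the program on error:

// assumes MPI has already been initialized
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

int l_rank;
int l_err = MPI_Comm_rank(MPI_COMM_WORLD, &l_rank);
if (l_err != MPI_SUCCESS) {
  std::cerr << "MPI_Comm_rank failed with error code " << l_err << std::endl;
  MPI_Abort(MPI_COMM_WORLD, l_err);
}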

Fig. 6.2.1 Illustration of a linear arrangement of four processes. Arrows indicate communication between processes.

Task

  1. Research and provide an overview of two different MPI implementations (e.g., Intel MPI and Open MPI).

  2. Write an MPI program that prints a message for each process, indicating its rank and its neighbors.

  • Assume a linear arrangement of processes within the communicator, where each process except for the first and last has two neighbors. An example with four processes is shown in Fig. 6.2.1.

  • For process 2, the output could look like: Hello from process rank 2 to my right neighbor (rank 3) and my left neighbor (rank 1).

  3. Execute your program with 8 processes.

6.3. Time Measurement

When working with MPI programs, it is often essential to measure the elapsed time of specific code segments for performance analysis. MPI provides a convenient function MPI_Wtime() for this purpose, which returns the wall-clock time in seconds as a double.

To measure the time taken by a particular code block, follow these steps:

// Synchronize all processes before starting the timer
MPI_Barrier(MPI_COMM_WORLD);

double start_time = MPI_Wtime();

// Code you want to measure

// Synchronize again so the measurement covers the slowest process
MPI_Barrier(MPI_COMM_WORLD);

double end_time = MPI_Wtime();
double elapsed_time = end_time - start_time;

if (rank == 0) {
  std::cout << "Elapsed time: " << elapsed_time << " seconds" << std::endl;
}
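Putting the pieces together, a self-contained sketch could look as follows; the summation loop is only a stand-in for the code whose runtime you actually want to measure:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int l_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &l_rank);

  // synchronize all processes before starting the timer
  MPI_Barrier(MPI_COMM_WORLD);
  double l_start = MPI_Wtime();

  // stand-in for the code to be measured
  double l_sum = 0.0;
  for (long l_i = 0; l_i < 100000000; l_i++) {
    l_sum += 1.0 / (l_i + 1);
  }

  // synchronize again so the measurement covers the slowest process
  MPI_Barrier(MPI_COMM_WORLD);
  double l_end = MPI_Wtime();

  if (l_rank == 0) {
    std::cout << "Elapsed time: " << l_end - l_start << " seconds"
              << " (sum: " << l_sum << ")" << std::endl;
  }

  MPI_Finalize();
  return 0;
}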