12. Performance Metrics

12.1. Amdahl’s Law

Amdahl’s Law predicts the speedup of a computation when only a fraction of the computation can be parallelized. The law is named after Gene Amdahl, who introduced it in 1967:

\[S = \frac{1}{(1 - \alpha) + \frac{\alpha}{p}}\]
Where:
  • \(S\) is the speedup of the parallelized computation,

  • \(\alpha\) is the fraction of the computation that can be parallelized,

  • \(p\) is the number of processors.
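
As a quick illustration of the formula, the following minimal Python sketch evaluates Amdahl’s speedup for a chosen parallel fraction and processor count (the function name amdahl_speedup and the example values are illustrative assumptions, not part of the tasks below):

```python
def amdahl_speedup(alpha, p):
    """Speedup predicted by Amdahl's Law for parallel fraction alpha on p processors."""
    return 1.0 / ((1.0 - alpha) + alpha / p)

# Illustrative example only: a 70% parallelizable computation on 2, 4, and 8 processors
for p in (2, 4, 8):
    print(f"p = {p}: S = {amdahl_speedup(0.7, p):.2f}")
```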

Task

  1. Assume you are optimizing a complex computational workflow that consists of three stages (A, B, and C). Each stage has a different degree of parallelizability, and the goal is to determine the overall speedup of the entire workflow when executed on different numbers of processors \(p\).

  • \(\alpha_A = 0.6\) (60% parallelizable)

  • \(\alpha_B = 0.8\) (80% parallelizable)

  • \(\alpha_C = 0.5\) (50% parallelizable)

Calculate the overall speedup of the workflow when executed on 2, 4, 8, 16, and 64 processors. (No code is required).

  2. Illustrate the speedup using Amdahl’s Law in a plot with varying degrees of parallelism and a range of resources (a minimal plotting sketch follows after this task):

  • Consider parallelizing 25%, 50%, 75%, and 90% of the tasks.

  • Vary the number of resources from 1 to 1024.

Discuss your graph in at most one paragraph (approximately three lines).
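
One possible starting point for this plot is the sketch below. It assumes NumPy and Matplotlib are available; the logarithmic x-axis and the helper name amdahl_speedup are presentation choices, not requirements.

```python
import numpy as np
import matplotlib.pyplot as plt

def amdahl_speedup(alpha, p):
    # S = 1 / ((1 - alpha) + alpha / p)
    return 1.0 / ((1.0 - alpha) + alpha / p)

p = 2 ** np.arange(0, 11)                # 1, 2, 4, ..., 1024
for alpha in (0.25, 0.50, 0.75, 0.90):   # parallel fractions from the task
    plt.plot(p, amdahl_speedup(alpha, p), marker="o", label=f"alpha = {alpha}")

plt.xscale("log", base=2)
plt.xlabel("number of resources p")
plt.ylabel("speedup S")
plt.legend()
plt.title("Amdahl's Law")
plt.show()
```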

12.2. Gustafson’s Law

Gustafson’s Law emphasizes scalability: as the problem size increases, the parallelizable portion grows, allowing better scaling with more resources. While Amdahl’s Law highlights the limitations of fixed-size problems, Gustafson’s Law introduces the idea that we can scale the problem size to utilize larger parallel systems. The law is named after John Gustafson, who introduced it in 1988.

In mathematical terms, Gustafson’s Law can be expressed as:

\[T_p = (1 - \alpha) + \alpha = 1\]
\[T_1(p) = (1 - \alpha) + \alpha \cdot p\]
\[S(p) = \frac{T_1(p)}{T_p} = (1 - \alpha) + \alpha \cdot p\]

where:

  • \(T_p\) is the (normalized) execution time for parallel processing with \(p > 1\) workers,

  • \(T_1(p)\) is the execution time of the same scaled problem on one worker,

  • \(\alpha\) is the fraction of the problem that can be parallelized,

  • \(S(p)\) is the speedup with \(p\) workers.

Additionally:

\[E(p) = \frac{S(p)}{p} = \frac{1 - \alpha}{p} + \alpha\]

This expression represents the parallel efficiency \(E(p)\) and is defined in terms of speedup \(S(p)\) and the number of resources \(p\). It shows that as the number of resources approaches infinity, the efficiency goes towards the parallel fraction \(\alpha\).
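
A minimal Python sketch of these two formulas (the function names and the values of alpha and p are illustrative assumptions, not the task’s numbers) could look like this:

```python
def gustafson_speedup(alpha, p):
    """Scaled speedup S(p) = (1 - alpha) + alpha * p."""
    return (1.0 - alpha) + alpha * p

def gustafson_efficiency(alpha, p):
    """Parallel efficiency E(p) = S(p) / p."""
    return gustafson_speedup(alpha, p) / p

# Illustrative values only (not the task's numbers)
alpha, p = 0.5, 16
print(f"S({p}) = {gustafson_speedup(alpha, p):.2f}")     # 8.50
print(f"E({p}) = {gustafson_efficiency(alpha, p):.2f}")  # 0.53
```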

Task

Assume you are optimizing a large-scale simulation tool that models weather predictions. The current single-worker version of the model takes 12 seconds to simulate one day of weather for a specific geographic region. The model involves solving complex fluid dynamics equations and has a parallelizable fraction \(\alpha\) of 0.8, indicating that 80% of the computation can be parallelized.

  1. Calculate Gustafson’s speedup achieved by parallelizing the weather simulation model when running it on 64 resources. (No code is required).

  2. Calculate Gustafson’s efficiency of a parallelized model using the same fixed values. (No code is required).

  3. Given a supercomputer with an increasing number of resources \(p = 1, 2, 4, ..., 1024\), analyze the behavior of speedup and efficiency (a minimal plotting sketch follows after this list). Discuss your graph in at most one paragraph (approximately three lines).

  4. Consider the effects of the parallelizable fraction \(\alpha\) on overall performance. Discuss how changes in \(\alpha\) might impact the optimal number of resources for achieving maximum speedup and efficiency.
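
One possible starting point for the graph in item 3 is the sketch below; it assumes NumPy and Matplotlib are available, and the side-by-side layout and logarithmic x-axis are presentation choices only.

```python
import numpy as np
import matplotlib.pyplot as plt

alpha = 0.8                     # parallel fraction given in the task description
p = 2 ** np.arange(0, 11)       # 1, 2, 4, ..., 1024

speedup = (1.0 - alpha) + alpha * p   # S(p) from Gustafson's Law
efficiency = speedup / p              # E(p) = S(p) / p

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(p, speedup, marker="o")
ax1.set_xscale("log", base=2)
ax1.set_xlabel("number of resources p")
ax1.set_ylabel("speedup S(p)")

ax2.plot(p, efficiency, marker="o")
ax2.set_xscale("log", base=2)
ax2.set_xlabel("number of resources p")
ax2.set_ylabel("efficiency E(p)")

plt.tight_layout()
plt.show()
```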

12.3. Performance Model

Consider a parallel computing system where the execution time \(T_p\) on \(p\) resources is influenced by various factors, such as computations \(T^p_\text{comp}\), communication \(T^p_\text{comm}\), and idle time \(T^p_\text{idle}\). The overall execution time on \(p\) resources \(T_p\) is the sum of these three components.

The overall execution time \(T_p\) can be expressed as a function \(f\) of parameters like the problem size \(N\), the number of resources \(p\), and potentially other factors:

\[T_p \approx f(N, p, \ldots)\]

Now, let’s break down the execution time on \(p\) resources into its three components.

\[T_p = T^p_\text{comp} + T^p_\text{comm} + T^p_\text{idle}\]

Weather simulation involves solving complex fluid dynamics equations, which is computationally intensive.

\[T^p_\text{comp} = k \cdot \frac{N}{p}\]
\[T^p_\text{comm} = c \cdot \frac{N}{p}\]
\[T^p_\text{idle} = \frac{m}{p}\]

Where:

  • \(N\) is the problem size.

  • \(p\) is the number of resources.

  • \(k\) is a factor representing the efficiency of parallelization.

  • \(c\) is a constant term representing communication overhead.

  • \(m\) is a factor representing synchronization delays.
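
For concreteness, the three components could be encoded as small Python helpers; the function names and the placeholder numbers in the usage line are illustrative assumptions, not the values given in the task below.

```python
def t_comp(N, p, k):
    """Computation time: k * N / p."""
    return k * N / p

def t_comm(N, p, c):
    """Communication time: c * N / p."""
    return c * N / p

def t_idle(p, m):
    """Idle time: m / p."""
    return m / p

# Illustrative usage with placeholder numbers (not the task's values)
N, p, k, c, m = 1_000_000, 8, 1.0, 1.0, 1.0
total = t_comp(N, p, k) + t_comm(N, p, c) + t_idle(p, m)
print(f"T_p = {total:.2f}")
```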

Task

  1. Consider a performance model with the following values and plot \(T^p_\text{comp}\), \(T^p_\text{comm}\), \(T^p_\text{idle}\), and \(T_p\) as a function of an increasing number of resources \(p = 1, 2, 4, ..., 1024\) (a minimal plotting sketch follows after this task):

  • \(k\) = \(0.8\)

  • \(c\) = \(5\)

  • \(m\) = \(0.5\)

  • The problem size for the compute time is a mesh with \(N^3\) cells; for the communication time, the problem size is the six surfaces of this mesh, which amount to \(6 \times N^2\) cells, where \(N = 5000\).

  2. Report your observations in at most two paragraphs, including the following points:

  • Which time component is the most significant factor in the overall execution time?

  • What is the relation between the compute and communication time in a mesh problem, and how do these values relate to the problem size?
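
One possible starting point for the plot in item 1 is the sketch below; it assumes NumPy and Matplotlib are available, and the log-log axes are a presentation choice only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Values given in the task
k, c, m = 0.8, 5.0, 0.5
N = 5000
N_comp = N ** 3          # mesh cells used for the compute time
N_comm = 6 * N ** 2      # mesh surfaces used for the communication time

p = 2 ** np.arange(0, 11)            # 1, 2, 4, ..., 1024
t_comp = k * N_comp / p
t_comm = c * N_comm / p
t_idle = m / p
t_total = t_comp + t_comm + t_idle

for values, label in [(t_comp, "T_comp"), (t_comm, "T_comm"),
                      (t_idle, "T_idle"), (t_total, "T_p")]:
    plt.plot(p, values, marker="o", label=label)

plt.xscale("log", base=2)
plt.yscale("log")
plt.xlabel("number of resources p")
plt.ylabel("time")
plt.legend()
plt.show()
```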

Warning

For all exercises, submit your non-coding solutions separately and step by step for each section, in a readable format (either handwritten or typed), along with your plots, code, and report file.