Some loosely structured notes on the IHPC “Work and Span” lectures to help me prepare for the midterm. Explaining things in my own words helps me focus on the material and gives me a searchable artifact to find later. The contents of this one probably won’t be that useful for others though and there’s a lot of messily written non-LaTeX equations. I recommend checking out the official notes for this section.
Work Span Analysis
Formalism for analyzing parallel algorithms
- Total number of vertices (work) in a DAG (directed acyclic graph)
Span D(n) (D represents depth)
- Total number of vertices in the longest path through a DAG
The average available parallelism for an algorithm can be represented by W(n)/D(n).
- Tp(n) >= D(n)
The time to execute given p processors is greater than or equal to the span of the algorithm.
- Tp(n) >= ceil(W(n)/p)
The time to execute given p processors is greater than or equal to the work of the algorithm divided by the number of processors. Ceiling cause we’re dealing with integers.
- Tp(n) >= max(D(n), ceil(W(n)/p))
Both laws hold so can be combined. Simply take the max of the two! :)
The Work-Span Law gives us a lower bound for the time to execute a parallel algorithm. Brent’s Theorem helps us find the upper bound! 👋 😳 👍 The paper describing this theorem can be found here.
- Tp(n) <= (W(n) - D(n))/p + D(n)
The time to execute the DAG is no more than the time to execute the critical path (longest path – the span) plus the time to schedule the rest of the work across the p processors.
Putting it all together…
- max(D(n), ceil(W(n)/p)) <= Tp(n) <= (W(n) - D(n))/p + D(n)
Speedup & Work-Optimality
How much faster the parallel algorithm is compared to the best sequential time. Linear speedup means Tp scales linearly with the number of available processors. Note: the * in W*(n) denotes the work of the best sequential algorithm.
speedup = best sequential time / parallel time
Sp(n) = W(n) / Tp(n) >= p / (W(n)/W(n) + (p - 1)/(W*(n) / D(n)))
- W(n) / W*(n) should hopefully equal 1
If you get a highly parallel algorithm by dramatically increasing the work relative to the best sequential algorithm, this is bad for speedup.
I found this video helpful.