These are some of the parallel algorithm “primitives” that are discussed in the first half of the course and used to solve some parallel programming problems. Some are more generic and some are pretty specialized. There’s some ruby-esque pseudocode as well that I’ve adapted from the pseudocode in the lectures. This is all to help me prepare for the midterm exam, so if you’ve stumbled on this somehow I recommend just watching the lectures for yourself (I’ll try and include links). Those should have better explanations.
All pseudocode examples will assume the presence of a spawn function for spawning a concurrent task and a sync function for blocking and waiting until all spawned tasks have finished.
Contents
- reduce
- parallel for loop
- bitonic split
- bitonic merge
- bitonic sort
- parallel scan
- gather if
- parallel root finding
Reduce
Reduce, also known as fold, provides a means of recursively processing a data structure (e.g. an array, a list, etc.) and combining its constituents into a single result.
def reduce(A[0:n-1])
if n >= 2
a = spawn(reduce(A[0:(n/2 - 1)]))
b = spawn(reduce(A[(n/2):(n - 1)]))
sync
return a + b
else # base case is a single element array
return A[0]
- W(n): O(n)
- D(n): O(log n)
- Lecture Link: Reduce
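To make the spawn/sync pattern concrete, here’s a rough Python sketch of the same divide-and-conquer reduce using a thread pool. The futures-based spawn/sync and all the names are my own stand-ins, not from the lectures:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_reduce(A, pool):
    """Sum A by divide and conquer, running the left half as a task."""
    n = len(A)
    if n == 1:
        return A[0]
    mid = n // 2
    left = pool.submit(parallel_reduce, A[:mid], pool)  # "spawn"
    right = parallel_reduce(A[mid:], pool)
    return left.result() + right  # waiting on the future is the "sync"

with ThreadPoolExecutor(max_workers=8) as pool:
    total = parallel_reduce(list(range(1, 9)), pool)
print(total)  # 36
```

One caveat: blocking on futures from inside pool tasks can deadlock a small pool, which is why real task-parallel runtimes (Cilk, TBB, etc.) use work stealing instead of a fixed thread pool.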
Par-for
Par-for (i.e. for-any, for-all), or the parallel for loop, provides a means of executing the iterations of a loop in parallel – provided that all iterations are independent of each other. See the OpenMP parallel loop docs. You might think the span of a par-for loop would be constant, but since there is overhead around scheduling (spawning) the tasks, it actually results in an O(log n) span. See the following pseudocode.
# if you wanted to run the following loop in parallel
par_for 0..n do |i|
foo(i)
# this really becomes something like:
par_for_impl(foo, 0, n)
def par_for_impl(func, first_idx, last_idx):
n = last_idx - first_idx + 1
if n == 1
func(first_idx)
else
m = first_idx + (n / 2)
# divide and conquer to get that log(n)
spawn par_for_impl(func, first_idx, m - 1)
par_for_impl(func, m, last_idx)
sync
- W(n): O(n)
- D(n): O(log n)
- Lecture Link: par-for
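Here’s a sequential Python model of that spawn tree. The depth tracking is my addition – it shows where the O(log n) span comes from:

```python
def par_for_impl(func, first_idx, last_idx, depth=0):
    """Apply func to every index; returns the depth of the spawn tree."""
    n = last_idx - first_idx + 1
    if n == 1:
        func(first_idx)
        return depth
    m = first_idx + n // 2
    # a real runtime would spawn the first recursive call as a task;
    # here both halves just run sequentially
    d1 = par_for_impl(func, first_idx, m - 1, depth + 1)
    d2 = par_for_impl(func, m, last_idx, depth + 1)
    return max(d1, d2)

out = []
depth = par_for_impl(out.append, 0, 7)
print(out, depth)  # [0, 1, 2, 3, 4, 5, 6, 7] 3
```

For n = 8 iterations the tree is 3 = log2(8) levels deep, which is the span the scheduler pays even though each iteration is O(1).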
Bitonic Split
A bitonic split pairs the elements of a bitonic sequence so that each min is paired with a max: the smallest is paired with the largest, the second smallest with the second largest, and so on. The mins and maxes from each pair are then separated into two subsequences, each of which is itself bitonic. This helps enable the use of bitonic sequences in divide and conquer algorithms. This site has a nice explanation as well.
def bitonic_split(A[0:n-1])
# assume n is evenly divisible by 2
par_for 0..(n/2 - 1) do |i|
a = A[i]
b = A[i + n/2]
A[i] = min(a, b)
A[i + n/2] = max(a,b)
- W(n): O(n)
- D(n): O(log n)
- Lecture Link: bitonic split
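A quick runnable Python version of the split, with a plain loop standing in for the par_for:

```python
def bitonic_split(A):
    """Pair A[i] with A[i + n/2]; mins end up in the front half, maxes in the back."""
    n = len(A)
    for i in range(n // 2):  # every iteration is independent -> par_for
        a, b = A[i], A[i + n // 2]
        A[i], A[i + n // 2] = min(a, b), max(a, b)
    return A

# ascending-then-descending, i.e. bitonic
print(bitonic_split([2, 4, 7, 9, 8, 5, 3, 1]))  # [2, 4, 3, 1, 8, 5, 7, 9]
```

After the split, everything in the front half is less than or equal to everything in the back half, which is exactly the property the recursive merge relies on.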
Bitonic Merge
You can use the bitonic split defined above to divide and conquer an existing bitonic sequence to end up with a sorted sequence. See the following pseudocode:
def bitonic_merge(A[0:n-1])
# assume A is a bitonic sequence
# assume n is evenly divisible by 2
if n >= 2
bitonic_split(A[:]) # split everything in A
spawn(bitonic_merge(A[0:(n/2 - 1)]))
bitonic_merge(A[(n/2):(n - 1)])
sync
- W(n): O(n log n) (log n levels of O(n) splits)
- D(n): O(log² n) (log n levels, each with an O(log n)-span split)
- Lecture Link: bitonic merge
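Same thing in runnable (sequential) Python, with the split inlined over a subrange. In a parallel version the two recursive calls would be spawned:

```python
def bitonic_merge(A, lo=0, n=None):
    """Sort the bitonic subsequence A[lo:lo+n] in place (n a power of two)."""
    n = len(A) if n is None else n
    if n < 2:
        return
    for i in range(lo, lo + n // 2):  # the bitonic split (a par_for)
        if A[i] > A[i + n // 2]:
            A[i], A[i + n // 2] = A[i + n // 2], A[i]
    bitonic_merge(A, lo, n // 2)           # these two calls are independent,
    bitonic_merge(A, lo + n // 2, n // 2)  # so they could be spawned

A = [2, 4, 7, 9, 8, 5, 3, 1]
bitonic_merge(A)
print(A)  # [1, 2, 3, 4, 5, 7, 8, 9]
```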
Bitonic Sort
Now you can use bitonic split and bitonic merge together to generate a bitonic sequence out of an unordered sequence and sort it. This requires two modified versions of bitonic_merge – bitonic_merge_asc and bitonic_merge_desc – where one produces an ascending sequence and the other a descending one.
def gen_bitonic(A[0:n-1])
# assume n is evenly divisible by 2
if n >= 2
spawn(gen_bitonic(A[0:(n/2 - 1)]))
gen_bitonic(A[(n/2):(n - 1)])
sync
spawn(bitonic_merge_asc(A[0:(n/2 - 1)]))
bitonic_merge_desc(A[(n/2):(n - 1)])
sync
def bitonic_sort(A[0:n-1])
gen_bitonic(A[:])
bitonic_merge_asc(A[:])
- W(n): O(n log² n) (not work-optimal)
- D(n): O(log³ n)
- Lecture Link: bitonic sort
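Putting it all together in Python. Instead of the asc/desc pair of merge functions from the lectures I’m using a direction flag, which is equivalent:

```python
def bitonic_merge(A, lo, n, ascending):
    if n < 2:
        return
    for i in range(lo, lo + n // 2):  # the split, honoring the direction
        if (A[i] > A[i + n // 2]) == ascending:
            A[i], A[i + n // 2] = A[i + n // 2], A[i]
    bitonic_merge(A, lo, n // 2, ascending)
    bitonic_merge(A, lo + n // 2, n // 2, ascending)

def bitonic_sort(A, lo=0, n=None, ascending=True):
    """Sort A[lo:lo+n] in place; n must be a power of two."""
    n = len(A) if n is None else n
    if n < 2:
        return
    # sort the two halves in opposite directions so the whole
    # range becomes one bitonic sequence, then merge it
    bitonic_sort(A, lo, n // 2, True)
    bitonic_sort(A, lo + n // 2, n // 2, False)
    bitonic_merge(A, lo, n, ascending)

A = [5, 1, 8, 3, 9, 2, 7, 4]
bitonic_sort(A)
print(A)  # [1, 2, 3, 4, 5, 7, 8, 9]
```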
Parallel Scan
Scans are a generalization of the prefix-sum operation. A parallel scan lets us apply any associative operator across a list in parallel. See the following pseudocode for the add_scan primitive that implements a parallel prefix-sum.
# This pseudocode is pretty rough...
# definitely recommend watching the lectures for this one
def add_scan(A[1:n])
# assume n = 2^k (power of two)
# assume 1 indexed arrays ¯\_(ツ)_/¯
if n == 1
return A[1]
else
Iodds[1:n/2] ~= [1, 3, 5, ...] # odd indices
Ievens[1:n/2] ~= [2, 4, 6, ...] # even indices
par_for 1..(n/2) { |i| A[Ievens[i]] = A[Iodds[i]] + A[Ievens[i]] }
A[Ievens] = add_scan(A[Ievens])
# each odd position (except the first) adds the scanned even value before it
par_for 2..(n/2) { |i| A[Iodds[i]] = A[Ievens[i - 1]] + A[Iodds[i]] }
- W(n): O(n)
- D(n): O(log² n)
- Lecture Link: parallel scans
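Here’s my attempt at a runnable Python version of the odd/even recursion (0-indexed, with list comprehensions standing in for the par_fors):

```python
def add_scan(A):
    """Inclusive prefix sum; len(A) must be a power of two.
    Sequential sketch of the parallel odd/even recursion."""
    n = len(A)
    if n == 1:
        return A[:]
    odds = A[0::2]   # elements at "odd" 1-based positions
    evens = A[1::2]  # elements at "even" 1-based positions
    # pairwise sums (a par_for), then recursively scan the half-size array
    evens = add_scan([o + e for o, e in zip(odds, evens)])
    out = [0] * n
    out[1::2] = evens
    out[0] = odds[0]
    # each remaining odd position adds the scanned even before it (a par_for)
    out[2::2] = [evens[i] + odds[i + 1] for i in range(n // 2 - 1)]
    return out

print(add_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```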
Gather If
First, construct an array of flags by applying a comparison in parallel across all elements of a list. gather_if takes this array of 0/1 flags and uses an add_scan to determine the output indices of the elements that matched the condition. The following code shows how a get_smaller_equal method might work hand-in-hand with gather_if to return only the elements that are less than or equal to the pivot value. This can be useful in implementing a parallel quicksort.
# again some nice 1-indexed array pseudocode :)
def get_smaller_equal(A[1:n], pivot)
F[1:n] = [] # array of {0, 1} boolean flags
F[:] = par_for 1..n { |i| A[i] <= pivot ? 1 : 0 }
return gather_if(A[1:n], F[1:n])
def gather_if(A[1:n], F[1:n]):
K[1:n] = [] # for indices of matching elements
K[:] = add_scan(F[:])
L[1:K[n]] = [] # output array size of largest prefix-sum
par_for 1..n do |i|
# Now, in parallel, for all true elements we put their value
# in the corresponding index of the output array
if F[i] == 1
L[K[i]] = A[i]
return L[:]
- W(n): O(n)
- D(n): O(log² n)
- Lecture Link: gatherIf
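In Python, 0-indexed, with a plain sequential prefix sum standing in for the parallel add_scan:

```python
def add_scan(F):
    """Sequential stand-in for the parallel inclusive prefix sum."""
    out, total = [], 0
    for f in F:
        total += f
        out.append(total)
    return out

def gather_if(A, F):
    """Compact A down to the elements whose flag in F is 1."""
    K = add_scan(F)                # K[i] is the (1-based) output slot
    L = [None] * (K[-1] if K else 0)
    for i in range(len(A)):        # independent writes -> par_for
        if F[i] == 1:
            L[K[i] - 1] = A[i]
    return L

A = [5, 1, 8, 3, 9, 2]
F = [1 if a <= 4 else 0 for a in A]  # get_smaller_equal with pivot 4
print(gather_if(A, F))  # [1, 3, 2]
```

Note that the matched elements keep their relative order, which is what makes this usable as the partition step of a parallel quicksort.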
Parallel Root Finding
This uses the idea of jumping from tree nodes -> grandparents to find the root of a tree by starting at every node in parallel.
# again some nice 1-indexed array pseudocode :)
def has_grandparent(k, P[1:n])
# k is a node
# Where P is an array indexed by k where values point
# to the parent of k
return (k != nil) && (P[k] > 0) && (P[P[k]] > 0)
def adopt(P[1:n], G[1:n]):
# P is again an array of parent pointers
# G will be an array of grandparent pointers
par_for 1..n do |i|
if has_grandparent(i, P[:])
G[i] = P[P[i]]
else
G[i] = P[i]
def find_roots(P[1:n], R[1:n]):
# P is again an array of parent pointers
# R receives the root above each node
Pcur[1:n] = P[:]
Pnext[1:n] = [] # temp buffer for the adopted pointers
for 1..ceil(log2(n)) do |level|
adopt(Pcur[:], Pnext[:])
Pcur[:] = Pnext[:]
R[:] = Pcur[:]
- W(n): O(n log n) (not work-optimal)
- D(n): O(log² n) (log n pointer-jumping rounds, each an O(log n)-span par-for)
- Lecture Link: Parallel Root Finding
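And a compact Python sketch of the pointer jumping. I’m using 0 to mark a root and a dummy slot at index 0 to keep the arrays 1-indexed like the pseudocode:

```python
import math

def find_roots(P):
    """P[k] is the parent of node k (0 marks a root). Repeatedly replace
    each parent pointer with the grandparent pointer until every node
    points at its root."""
    n = len(P) - 1  # index 0 is a dummy slot
    cur = P[:]
    for _ in range(max(1, math.ceil(math.log2(n)))):
        nxt = cur[:]
        for k in range(1, n + 1):  # independent updates -> par_for
            if cur[k] > 0 and cur[cur[k]] > 0:
                nxt[k] = cur[cur[k]]  # adopt the grandparent
        cur = nxt
    return cur

# a path 1 <- 2 <- 3 <- 4 <- 5 (node 1 is the root)
P = [0, 0, 1, 2, 3, 4]
print(find_roots(P))  # [0, 0, 1, 1, 1, 1]
```

Each round doubles how far up the tree every pointer reaches, so ceil(log2 n) rounds are enough for any tree of n nodes.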