OpenMP: For
The for loop construct is probably one of the most widely used features of OpenMP. The construct's aim is to parallelize an explicitly written for loop.
For loop construct
The syntax of the loop construct is
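The original listing is missing here. Based on the surrounding text, the two constructs have the following general shape (a sketch; the bracketed clauses are optional):

```cpp
#pragma omp parallel [clauses]
{
    #pragma omp for [clauses]
    for (/* canonical loop */) {
        // loop body: the iterations are divided among the threads
    }
}
```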
The parallel construct specifies the region which should be executed in parallel. A program without the parallel construct is executed sequentially.
The for loop construct (or simply the loop construct) specifies that the iterations of the following for loop will be executed in parallel. The iterations are distributed across the threads that already exist.
If there is only one #pragma omp for inside a #pragma omp parallel region, we can merge both constructs into the combined #pragma omp parallel for construct.
Shared clause
The clauses are additional options which we can set on the constructs. An example of a clause for the parallel construct is the shared(...) clause. When a program encounters the parallel construct, it forks a team of threads. The variables listed in the shared(...) clause are then shared between all the threads.
Example
Let us write a first parallel loop with the OpenMP loop construct. We define a vector of ints. Each thread increments its corresponding entry in the vector. At the end, the i-th entry of the vector tells us how many iterations were executed by the i-th thread.
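The original listing is not shown here. A sketch consistent with the description might look as follows; note the != test in the loop condition, which turns out to be the problem discussed below:

```cpp
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    int max_threads = omp_get_max_threads();
    // iterations[t] counts how many loop iterations thread t executed
    std::vector<int> iterations(max_threads, 0);
    int n = 100;

    #pragma omp parallel shared(iterations, n)
    {
        #pragma omp for
        for (int i = 0; i != n; i++) {  // i != n: rejected by OpenMP
            iterations[omp_get_thread_num()]++;
        }
    }

    for (int t = 0; t != max_threads; t++) {
        printf("thread %d executed %d iterations\n", t, iterations[t]);
    }
    return 0;
}
```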
The parallel construct creates a team of threads which execute in parallel. The variables iterations and n are shared between all the threads. The loop construct specifies that the for loop should be executed in parallel.
We use a couple of OpenMP functions. The omp_get_max_threads() function returns an upper bound on the number of threads that could form a new team. This upper bound is valid only if we do not later explicitly specify the number of threads in the team. Additionally, we use omp_get_thread_num(), which returns the number of the calling thread.
Canonical loop form
When we compile the above program, the compilation fails. With GCC, for example, the error states that the loop has an invalid controlling predicate. What is wrong with our program? The error message suggests that the condition of the for loop is invalid.
Well, OpenMP is able to parallelize a loop only if it has a certain structure. The structure is
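The structure listing is missing here; from the conditions that follow, it has this shape (a sketch in the notation of the OpenMP specification):

```cpp
for (initialize; test; increment) {
    // loop body
}
```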
The initialize expression is of the form var = lb, where var is an integer or a random access iterator and lb is a loop invariant expression.
The test expression must have the form var operator b or b operator var, where b is a loop invariant expression and operator is one of the following: <, <=, >, >=.
The increment expression has to be one of the following: ++var, var++, --var, var--, var += incr, var -= incr, var = var + incr, var = incr + var, var = var - incr, where incr is a loop invariant integer expression.
A loop which satisfies these conditions has the canonical loop form (defined in the OpenMP specification). You can find the conditions and the precise definition of the canonical loop form in the OpenMP specification on page 53.
Corrections
With the knowledge of the canonical loop form, we can correct the test in the for loop. The program now looks like
The whole source code is available here.
The output of the program is
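The original output listing is missing; on a machine with eight threads it would resemble the following (the exact split of iterations between threads is implementation-defined):

```
thread 0 executed 13 iterations
thread 1 executed 13 iterations
thread 2 executed 13 iterations
thread 3 executed 13 iterations
thread 4 executed 12 iterations
thread 5 executed 12 iterations
thread 6 executed 12 iterations
thread 7 executed 12 iterations
```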
The program used eight threads to execute the loop. OpenMP uniformly divided the iterations between all the threads – each one executed 12 or 13 iterations.
Summary
In this article, we looked at the basics of the OpenMP loop construct. In order to parallelize a for loop, it must be in the canonical loop form. We examined the definition of the canonical loop form. At the end, we parallelized a for loop with the OpenMP loop construct.