The for loop construct is probably one of the most widely used features of OpenMP. Its aim is to parallelize an explicitly written for loop.
For loop construct
The loop construct is written as a #pragma omp for directive, with optional clauses, placed directly before a for loop inside a #pragma omp parallel region.
The parallel construct specifies the region which should be executed in parallel. A program without the parallel construct will be executed sequentially.
The for loop construct (or simply the loop construct) specifies that the iterations of the following for loop will be executed in parallel. The iterations are distributed across the threads that already exist.
If there is only one #pragma omp for inside #pragma omp parallel, we can simplify both constructs into the combined #pragma omp parallel for construct.
The clauses are additional options which we can set on the constructs.
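As a minimal sketch of the two spellings described above, the following C++ functions compute the same vector of squares, once with a loop construct nested inside a parallel construct and once with the combined construct. The function names squares_nested and squares_combined are illustrative only and do not come from the original source.

```cpp
#include <vector>

// Hypothetical helper: fills result[i] = i * i using a parallel region
// that contains a separate loop construct.
std::vector<long> squares_nested(int n)
{
    std::vector<long> result(n);
    #pragma omp parallel
    {
        // The loop construct distributes the iterations across the
        // threads of the team created by the enclosing parallel construct.
        #pragma omp for
        for (int i = 0; i < n; i++)
        {
            result[i] = static_cast<long>(i) * i;
        }
    }
    return result;
}

// Hypothetical helper: the same computation with the combined construct.
std::vector<long> squares_combined(int n)
{
    std::vector<long> result(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        result[i] = static_cast<long>(i) * i;
    }
    return result;
}
```

Both functions should return identical vectors. With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag; without it the pragmas are ignored and the loops simply run sequentially.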
An example of a clause for the parallel construct is the shared(...) clause. When a program encounters the parallel construct, it forks a team of threads. The variables listed in the shared(...) clause are then shared between all the threads.
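The effect of the shared(...) clause can be sketched as follows. The function name odd_numbers and the variable names are assumptions made for illustration. Every thread in the team sees the same values vector and the same n, while the loop variable i is private to each thread.

```cpp
#include <vector>

// Hypothetical helper: returns the first n odd numbers.
std::vector<int> odd_numbers(int n)
{
    std::vector<int> values(n);

    // values and n are listed explicitly as shared, so all threads
    // of the team operate on the same vector and the same bound.
    #pragma omp parallel shared(values, n)
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
        {
            // Each iteration writes a distinct element, so the
            // threads do not interfere with each other.
            values[i] = 2 * i + 1;
        }
    }
    return values;
}
```

Sharing the vector is safe in this sketch only because no two iterations write to the same element.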
Let us write a first parallel loop with the OpenMP loop construct.
We define a vector of ints. Each thread increments its corresponding entry in the vector. At the end, the i-th entry of the vector tells us how many iterations were executed by the i-th thread.
The parallel construct creates a team of threads which execute in parallel. The vector and the variable n are shared between all the threads. The loop construct specifies that the for loop should be executed in parallel.
We use a couple of OpenMP functions. The function omp_get_max_threads() returns an upper bound on the number of threads that could form a new team of threads. This upper bound is valid only if we do not later explicitly specify the number of threads in the team. Additionally, we also used omp_get_thread_num(), which returns the number of the calling thread.
Canonical loop form
When we compile the program above, the compilation fails with an error.
What is wrong with our program? The error message suggests that the condition of the for loop is invalid.
Well, OpenMP is able to parallelize a loop only if it has a certain structure: a for statement whose header consists of an initialize expression, a test expression, and an increment expression.
The initialize expression is of the form var = lb, where var is an integer or a random access iterator and lb is a loop invariant.
The test expression must have the form var operator b or b operator var, where b is the loop invariant and operator is one of the following: <, <=, >, >=.
The increment expression has to be one of the following: ++var, var++, --var, var--, var += incr, var -= incr, var = var + incr, var = incr + var, var = var - incr, where incr is a loop invariant integer expression.
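To make these rules concrete, here is a small sketch with three loops that follow the form described above. The function name canonical_examples and the constants added in each loop are purely illustrative assumptions.

```cpp
#include <vector>

// Hypothetical helper: runs three loops over the same vector,
// each written in a different variant of the canonical loop form.
std::vector<int> canonical_examples(int n)
{
    std::vector<int> values(n, 0);

    // Integer loop variable, test "var < b", increment "var += incr".
    #pragma omp parallel for
    for (int i = 0; i < n; i += 2)
    {
        values[i] += 1;
    }

    // Test "var >= b" combined with a decrement "var--".
    #pragma omp parallel for
    for (int i = n - 1; i >= 0; i--)
    {
        values[i] += 10;
    }

    // Random access iterator as the loop variable, increment "++var".
    #pragma omp parallel for
    for (std::vector<int>::iterator it = values.begin(); it < values.end(); ++it)
    {
        *it += 100;
    }

    return values;
}
```

Each loop finishes before the next one starts, and every iteration touches a distinct element, so the result does not depend on how the iterations are distributed across the threads.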
A loop which satisfies these conditions has the canonical loop form. You can find the conditions and the precise definition of the canonical loop form on page 53 of the OpenMP specification.
Knowing the canonical loop form, we can correct the test in the for loop.
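The following is a minimal sketch of what a corrected version could look like, not the original listing. The names count_iterations_per_thread and iterations are assumptions, and the serial fallbacks for the two OpenMP functions are added here only so that the sketch also builds when OpenMP is not enabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial stand-ins used only when OpenMP is disabled.
static int omp_get_max_threads() { return 1; }
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: entry i of the returned vector holds the
// number of loop iterations that were executed by thread i.
std::vector<int> count_iterations_per_thread(int n)
{
    // One counter per thread that could take part in the team.
    std::vector<int> iterations(omp_get_max_threads(), 0);

    #pragma omp parallel shared(iterations, n)
    {
        #pragma omp for
        for (int i = 0; i < n; i++) // "i < n" is a valid canonical test
        {
            // Each thread increments only its own entry.
            iterations[omp_get_thread_num()]++;
        }
    }
    return iterations;
}
```

Whatever the number of threads, the entries of the returned vector should add up to n, because every iteration is executed by exactly one thread.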
The whole source code is available here.
In our run, the program used eight threads to execute the loop. OpenMP divided the iterations evenly between all the threads: each one executed 12 or 13 iterations.
In this article, we looked at the basics of the OpenMP loop construct. In order to parallelize a for loop, it must be in the canonical loop form, so we examined the definition of that form. At the end, we parallelized a for loop with the OpenMP loop construct.