OpenMP specialty is a parallelization of loops. The loop construct enables the parallelization.
OpenMP then takes care about all the details of the parallelization. It creates a team of threads and distributes the iterations between the threads.
In this article, we explore how OpenMP schedules the iterations between the threads and how can we change this behavior. First, we examine different ways to specify a scheduling type of a loop. Then, we describe five different scheduling types.
We describe how a programmer can determine a scheduling type of a loop.
If the loop construct has explicit
then OpenMP uses
scheduling the iterations of the for loop.
scheduling-type (in the schedule clause of the loop
construct) is equal to
runtime then OpenMP determines the
scheduling by the internal control variable
can set this variable by setting the environment variable
OMP_SCHEDULE to the desired scheduling type. For example, in
bash-like terminals, we can do
Another way to specify
run-sched-var is to set it with
If the loop construct does not have an explicit
then OpenMP uses the default scheduling type. It is defined by the internal
def-sched-var and it is implementation
The following figure summarizes how OpenMP determines a scheduling type of a for loop.
The scheduling types
We can choose between five different scheduling types:
schedule(static, chunk-size) clause of the loop construct
specifies that the for loop has the static scheduling type. OpenMP divides the
iterations into chunks of size
chunk-size and it distributes
the chunks to threads in a circular order.
chunk-size is specified, OpenMP divides iterations
into chunks that are approximately equal in size and it distributes at most one
chunk to each thread.
Here are three examples of static scheduling.
Let me explain the examples. We parallelized a for loop with 64 iterations and we used four threads to parallelize the for loop. Each row of stars in the examples represents a thread. Each column represents an iteration.
The first example (
schedule(static)) has 16 stars in the
first row. This means that the first tread executes iterations 1, 2, 3, …, 15
and 16. The second row has 16 blanks and then 16 stars. This means that the
second thread executes iterations 17, 18, 19, …, 31, 32. Similar applies to
the threads three and four.
We see that for
schedule(static) OpenMP divides iterations
into four chunks of size 16 and it distributes them to four threads. For
schedule(static, 4) and
8) OpenMP divides iterations into chunks of
size 4 and 8, respectively.
The static scheduling type is appropriate when all iterations have the same computational cost.
schedule(dynamic, chunk-size) clause of the loop
construct specifies that the for loop has the dynamic scheduling type. OpenMP
divides the iterations into chunks of size
thread executes a chunk of iterations and then requests another chunk until
there are no more chunks available.
There is no particular order in which the chunks are distributed to the threads. The order changes each time when we execute the for loop.
If we do not specify
chunk-size, it defaults to one.
Here are four examples of dynamic scheduling.
We can see that for
schedule(dynamic, 1) OpenMP determines similar
scheduling. The size of chunks is equal to one in both instances. The
distribution of chunks between the threads is arbitrary.
schedule(dynamic, 4) and
8) OpenMP divides iterations into chunks of size four and eight,
respectively. The distribution of chunks to the threads has no pattern.
The dynamic scheduling type is appropriate when the iterations require different computational costs. This means that the iterations are poorly balanced between each other. The dynamic scheduling type has higher overhead then the static scheduling type because it dynamically distributes the iterations during the runtime.
The guided scheduling type is similar to the dynamic scheduling type. OpenMP again divides the iterations into chunks. Each thread executes a chunk of iterations and then requests another chunk until there are no more chunks available.
The difference with the dynamic scheduling type is in the size of chunks. The size of a chunk is proportional to the number of unassigned iterations divided by the number of the threads. Therefore the size of the chunks decreases.
The minimum size of a chunk is set by
determine it in the scheduling clause:
chunk-size). However, the chunk which contains the last iterations
may have smaller size than
If we do not specify
chunk-size, it defaults to one.
Here are four examples of the guided scheduling.
We can see that the size of the chunks is decreasing. First chunk has always 16
iterations. This is because the for loop has 64 iterations and we use 4 threads
to parallelize the for loop. If we divide
64 / 4, we get 16.
We can also see that the minimum chunk size is determined in the schedule clause. The only exception is the last chunk. Its size might be lower then the prescribed minimum size.
The guided scheduling type is appropriate when the iterations are poorly balanced between each other. The initial chunks are larger, because they reduce overhead. The smaller chunks fills the schedule towards the end of the computation and improve load balancing. This scheduling type is especially appropriate when poor load balancing occurs toward the end of the computation.
The auto scheduling type delegates the decision of the scheduling to the compiler and/or runtime system.
In the following example, the compiler/system determined the static scheduling.
The runtime scheduling type defers the decision about the scheduling until the
runtime. We already described different ways of specifying the scheduling type in
this case. One option is with the environment variable
OMP_SCHEDULE and the other option is with the function
If we do not specify the scheduling type in a for loop
OpenMP uses the default scheduling type (defined by the internal control
If I do not specify the scheduling type in my machine, I get the following result.
We recognize that the default scheduling type in my machine is the static scheduling type.
We learned different ways to specify a scheduling type of a loop. We also described five different scheduling types.
- Source code of all presented examples
- More information about the scheduling is available in the OpenMP specification on the page 60.
Jaka’s Corner OpenMP series: