OpenMP: Cancel and cancellation points
In the previous article, we explained the OpenMP cancellation model.
This article has three parts. First, we summarize the model. Afterwards, we explain the OpenMP constructs for cancellation. Finally, we demonstrate the usage of these constructs with an example.
Thread cancellation model
Let us assume that we have a parallel loop. We would like to cancel this loop early. How can we do it?
For a sequential loop, we would simply use the break statement. The parallel case is more complicated. However, OpenMP provides the cancellation model, which we can use for canceling parallel loops.
OpenMP has a cancel construct which we can put in a parallel loop. This construct imitates the “parallel break statement”.
When a thread reaches the cancel construct, the thread activates the cancellation and exits the loop.
The other threads check if the cancellation is activated at the cancellation points. If the cancellation is activated then the other threads also exit the loop. Otherwise, they continue.
Let us illustrate this model with an example.
Example
The following figure presents the outline of the example.
There are three threads: the blue, the green and the brown thread. They parallelize the loop. They may encounter CP, which stands for a cancellation point, or BR, which stands for the "parallel break statement" (the cancel construct).
The cancellation is not activated at the beginning of the loop. We illustrate this fact on the right side of the figure, with the answer to the question: "Should the thread break?".
The green thread first encounters a cancellation point. Since the cancellation is not activated, the thread continues executing the loop.
Then, the blue thread reaches the “parallel break statement”. The thread activates the cancellation and exits the loop.
Afterwards, the brown thread reaches a cancellation point. Since the cancellation is activated, the thread exits the loop.
Finally, the green thread reaches its second cancellation point. The thread exits the loop, because the cancellation is activated.
If you are interested in a more detailed description of the cancellation model, look at this article.
Cancel and cancellation points
Now, we are familiar with the OpenMP cancellation model. The next step is to use this model in our programs.
In order to use it, we have to do the following steps:
- We have to enable the cancellation features of OpenMP.
- We have to activate the cancellation.
- And we have to set up the cancellation points.
We describe each of these steps in the next sections.
Enabling the cancellation
The cancellation of the parallel loops is an expensive operation. OpenMP performs it only if the environment variable OMP_CANCELLATION is set to true. We can check the value of OMP_CANCELLATION with the omp_get_cancellation function.
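A minimal sketch of such a check (the printed messages are only illustrative):

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    // set OMP_CANCELLATION=true in the environment before running the program
    if (omp_get_cancellation()) {
        printf("Cancellation is enabled.\n");
    } else {
        printf("Cancellation is disabled; set OMP_CANCELLATION=true.\n");
    }
    return 0;
}
```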
The first time I wanted to use the cancellation features of OpenMP, I forgot to set OMP_CANCELLATION. The program was still correct, but the performance was below expectation. You should learn from my mistake and not forget to set this environment variable.
Activating the cancellation
The cancel construct activates the cancellation. In the next example, the cancel construct activates the cancellation of the for loop.
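A sketch of such a loop (the data and the condition are only illustrative; searching for a zero value stands in for any reason to stop early):

```c
#include <omp.h>

// A sketch: activate the cancellation of a parallel for loop
// as soon as a zero entry is found in the data.
void search(int n, const int *data)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] == 0) {
            // the "parallel break statement": activates the cancellation
            #pragma omp cancel for
        }
    }
}
```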
Again, OpenMP ignores the cancel constructs if OMP_CANCELLATION is not set to true.
Setting up the cancellation points
OpenMP sets the cancellation points at the following locations:
- cancel constructs,
- implicit and explicit barriers,
- and cancellation point constructs.
Since OpenMP sets a cancellation point at the cancel constructs, a thread which activates the cancellation also immediately breaks out of the loop.
OpenMP also sets the cancellation points at the implicit and explicit barriers. We explained how OpenMP sets these barriers in this article.
We can also explicitly set the cancellation points. If we activated the cancellation of the for loop with #pragma omp cancel for, we can set the cancellation point for this region with #pragma omp cancellation point for. Again, OpenMP ignores the cancellation points if OMP_CANCELLATION is not set to true.
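Continuing the sketch from above, an explicit cancellation point could be placed inside the loop body like this:

```c
#include <omp.h>

// Extends the earlier sketch with an explicit cancellation point.
void search(int n, const int *data)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] == 0) {
            #pragma omp cancel for
        }
        // check whether another thread has already activated the cancellation
        #pragma omp cancellation point for
    }
}
```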
Because a compiler sets some cancellation points automatically, we can still use the cancellation model without setting them explicitly. However, we might significantly improve the performance of a program if we do set them.
Disclaimer
In the examples above, we assumed that we are canceling a loop. Therefore, we always had the for option in the cancel construct and in the cancellation point construct.
But there are other possibilities. We could also cancel
- a parallel region,
- sections or
- a taskgroup.
If we cancel a parallel region, then the constructs would be #pragma omp cancel parallel and #pragma omp cancellation point parallel.
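A sketch of canceling a whole parallel region (the partitioning of the work and the error condition are only illustrative):

```c
#include <omp.h>

// A sketch: each thread checks its own part of the data (manual
// partitioning) and cancels the whole parallel region on an error.
void process(int n, const int *data)
{
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nthr = omp_get_num_threads();

        for (int i = tid; i < n; i += nthr) {
            if (data[i] < 0) {
                // illustrative error: cancel the whole parallel region
                #pragma omp cancel parallel
            }
        }

        // threads check here whether the region has been canceled
        #pragma omp cancellation point parallel

        // ... further work, which is skipped after a cancellation ...
    }
}
```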
Example: does a matrix have a zero entry?
In one of the previous articles, we wrote a parallel program which checks if a matrix has a zero entry. However, there was an idea to improve the program by using the break statement.
Let us implement this idea. We start by writing the sequential version of the program. Then we parallelize it.
Sequential version
The idea of this solution is to “break” out of the loops as soon as we know that a zero entry exists. The sequential solution is then:
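The sketch below assumes an n x n integer matrix; the exact listing of all versions is available in the linked source code.

```c
#include <stdbool.h>

// Sequential sketch: does an n x n matrix contain a zero entry?
bool has_zero_entry(int n, int matrix[n][n])
{
    bool has_zero = false;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (matrix[i][j] == 0) {
                has_zero = true;
                break;              // breaks out of the inner loop only
            }
        }
        if (has_zero) {
            break;                  // breaks out of the outer loop
        }
    }
    return has_zero;
}
```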
The program loops through each row and column of a matrix. If an entry of the matrix is equal to zero, the program exits the loops. Since the first break statement only breaks out of the inner loop, we must also include the second break statement, which breaks out of the outer loop.
The result is stored in the has_zero variable. If the matrix has a zero entry, then has_zero is true, otherwise has_zero is false.
OK, now let us parallelize this program.
Parallel version
The idea of this version is to parallelize the outer most loop and to use the cancel construct to break out of the loop. The code looks like this:
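The sketch below makes the same assumptions as the sequential version (n x n integer matrix); the exact listing is in the linked source code.

```c
#include <stdbool.h>
#include <omp.h>

// Parallel sketch: the outermost loop is parallelized and the cancel
// construct plays the role of the break statement.
// matrix and has_zero are shared among the threads.
bool has_zero_entry(int n, int matrix[n][n])
{
    bool has_zero = false;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (matrix[i][j] == 0) {
                // protect the shared flag against a data race
                #pragma omp critical
                has_zero = true;

                // activate the cancellation and break out of the loop
                #pragma omp cancel for
            }
        }
    }

    return has_zero;
}
```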
The parallel region has two shared variables: matrix and has_zero. The matrix is a read-only variable, therefore it does not introduce any problems. However, multiple threads might modify has_zero. In order to avoid data races, we use the critical construct.
If an entry is equal to zero, then a thread sets has_zero to true, activates the cancellation and exits the loop. This is similar to the break statement in the sequential version.
OpenMP also inserts two cancellation points. One is at the cancel construct and the other one is at the implicit barrier at the end of the for loop.
Improvement
The previous version is correct, but we might improve its performance by adding some cancellation points. However, if a thread encounters them too often, the performance might be worse.
With this in mind, we do not add a cancellation point in the inner loop, because a thread would then have to check it after accessing each entry of the matrix.
Instead, we add a cancellation point at the end of the outer loop. In this case, a thread checks the cancellation point after processing an entire row of the matrix.
The final code looks like this:
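Sketched under the same assumptions as before:

```c
#include <stdbool.h>
#include <omp.h>

// Parallel sketch with an explicit cancellation point at the end of the
// outer loop: a thread checks for a cancellation after each processed row.
bool has_zero_entry(int n, int matrix[n][n])
{
    bool has_zero = false;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (matrix[i][j] == 0) {
                #pragma omp critical
                has_zero = true;

                #pragma omp cancel for
            }
        }
        // check whether another thread has already activated the cancellation
        #pragma omp cancellation point for
    }

    return has_zero;
}
```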
The source code of all three versions is available here.
Comparisons
We compare the execution time of the sequential implementation, the implementation without explicit cancellation points and the implementation with a cancellation point.
The size of the matrix is 8000 x 8000. Each entry in the matrix is equal to one, except the entry at the position (7000, 10), which is equal to zero.
The first comparison gives us the following results:
In this experiment, we forgot to set OMP_CANCELLATION to true. The execution times are quite similar but still different enough that they might confuse us :-).
Now, we do the comparison the right way:
We can see that the parallel implementation without explicit cancellation points is faster than the sequential version. But the parallel implementation which has the additional cancellation point heavily outperforms the other two implementations.
Confession
I have a confession to make. In the previous example, I chose a matrix for which the parallel implementations outperform the sequential one.
If we choose a different matrix, we get different results.
For example, we get the following results for a matrix of the size 8000 x 8000 with only one zero entry, which is at the position (0, 100):
For this matrix, the sequential version is the fastest.
Conclusion
The performance of the parallelization heavily depends on the input.
This means that we should analyze the performance for expected sets of inputs before we choose a final implementation.
Summary
In this article, we first summarized the cancellation model. Then, we explained the two main constructs of the model: cancel and cancellation point. In the end, we demonstrated the usage of the constructs with an example. We parallelized a program which checks if a matrix has a zero entry.
The parallel implementation does not always outperform the sequential one. We should analyze how the performance behaves for the desired sets of inputs. Then we can decide if it is worth using the parallel implementation.
Links:
- Source code
- The cancel construct (OpenMP API, page 172)
- The cancellation point construct (OpenMP API, page 176)
Jaka’s Corner OpenMP series:
- OpenMP: Introduction
- OpenMP: For
- OpenMP: Sections
- OpenMP: Monte Carlo method for Pi
- OpenMP: For & Reduction
- OpenMP: For & Scheduling
- OpenMP: Data-Sharing Rules
- OpenMP: default(none) and const variables
- OpenMP: Barrier
- OpenMP: Critical construct and zero matrix entries
- OpenMP: Thread cancellation model