OpenMP: Barrier
What is a barrier?
It is a point in the execution of a program where threads wait for each other. No thread is allowed to continue until all threads in a team reach the barrier.
In other words, a barrier is a synchronization point in a program. We can visualize it as a wall.
In the figure, the red threads are waiting at the wall for the blue threads. The red threads cannot go beyond the wall. They can proceed only when all threads have reached the wall.
Pros and cons
The main reason for a barrier in a program is to avoid data races and to ensure the correctness of the program.
Of course, there are some downsides. Each synchronization is a threat to performance. When a thread waits for other threads, it does no useful work while still occupying valuable resources.
Another problem might occur if we insert barriers carelessly. As soon as one thread reaches a barrier, all other threads in the team must reach it as well. Otherwise, the threads waiting at the barrier will wait forever (except if we use a cancel construct, but this is a topic for another article).
The following figure shows how a couple of blue threads avoid the barrier. In this case, the red threads will wait forever for the blue threads.
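A minimal sketch of how to keep every thread on the path to the barrier: the conditional work is done only by some threads, but the barrier itself sits outside the if statement. The function name threads_past_barrier, the variable names, and the team size of 4 are assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: returns how many threads got past the barrier. */
int threads_past_barrier(void) {
    int even_threads = 0;  /* updated only by threads with an even id */
    int passed = 0;        /* updated by every thread after the barrier */

    #pragma omp parallel num_threads(4)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        if (id % 2 == 0) {
            /* Only some threads do this extra work. */
            #pragma omp atomic
            even_threads++;
        }

        /* The barrier is outside the if, so every thread in the team
           reaches it. Placing it inside the if would let the odd
           threads skip it, and the even threads could wait forever. */
        #pragma omp barrier

        /* After the barrier, all updates of even_threads are complete. */
        if (even_threads >= 1) {
            #pragma omp atomic
            passed++;
        }
    }
    return passed;
}
```

With GCC, this can be built with the -fopenmp flag. Without that flag the pragmas are ignored and the function runs with a single thread.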
How to add a barrier to a program?
We can explicitly insert a barrier in a program by adding the barrier construct, written as #pragma omp barrier in C and C++.
This is the explicit way of adding a barrier.
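As a small illustration of the barrier construct, the following sketch lets each thread write its own slot of an array and then read the slot of its neighbour. The barrier separates the two phases. The function name neighbour_exchange, the array size, and the team size of 4 are assumptions, not part of the original article.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64

/* Hypothetical example: returns the number of threads that read a
   wrong value from their neighbour (0 when the barrier does its job). */
int neighbour_exchange(void) {
    int slot[MAX_THREADS];
    int errors = 0;

    #pragma omp parallel num_threads(4)
    {
        int id = 0, n = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        /* Phase 1: every thread writes its own slot. */
        slot[id] = 10 * id;

        /* No thread continues until all threads have written. */
        #pragma omp barrier

        /* Phase 2: every thread reads the slot of its right neighbour. */
        int right = (id + 1) % n;
        if (slot[right] != 10 * right) {
            #pragma omp atomic
            errors++;
        }
    }
    return errors;
}
```

Without the barrier, a thread could read its neighbour's slot before the neighbour has written it, which is a data race.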
There are also many other situations where the compiler inserts a barrier for us. This happens because many OpenMP constructs imply a barrier. For example, the parallel construct implies a barrier at the end of the parallel region. The loop construct (the for directive in C/C++) implies a barrier at the end of the loop. The single construct implies a barrier at the end of the single region.
However, there are also OpenMP constructs which do not imply a barrier. The master construct is one such example. This construct is very similar to the single construct: the code inside the master construct is executed by only one thread, the master thread. The difference is that the master construct does not imply a barrier, while the single construct does.
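The practical consequence can be sketched as follows: when other threads need the result of a master region, we have to add the barrier ourselves. The function name master_then_barrier, the value 42, and the team size of 4 are assumptions chosen for illustration.

```c
/* Hypothetical example: returns the number of threads that did not
   see the value written by the master thread (0 is expected). */
int master_then_barrier(void) {
    int shared_value = 0;
    int mismatches = 0;

    #pragma omp parallel num_threads(4)
    {
        /* Executed by the master thread only. The other threads skip
           this statement and do not wait at the end of it. */
        #pragma omp master
        shared_value = 42;

        /* The master construct implies no barrier, so we add one
           explicitly before the other threads read shared_value. */
        #pragma omp barrier

        if (shared_value != 42) {
            #pragma omp atomic
            mismatches++;
        }
    }
    return mismatches;
}
```

Replacing master with single in this sketch would make the explicit barrier redundant, because the single construct already implies one.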
How can we figure out which constructs imply a barrier and which do not?
We have to look at the OpenMP specification. The description of each construct contains the information about the existence of the barrier.
Avoiding the implicit barriers
A natural question that arises is: Can we omit the implicit barriers?
This depends on the construct. Some constructs support the removal of the barrier, while others do not. Again, the OpenMP specification tells us whether a construct supports this feature.
The loop construct supports the removal of a barrier. A programmer can omit the barrier by adding the nowait clause to the loop construct.
In this case the thread that finishes early proceeds straight to the next instruction and does not wait for the other threads in the team.
Using the nowait clause can improve the performance of a program. But we must be careful, because removing a barrier might introduce a data race.
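A minimal sketch of a safe use of nowait: two loops fill two different arrays, so a thread that finishes its share of the first loop can start on the second loop immediately. The function name fill_both and the array contents are assumptions made for this example.

```c
/* Hypothetical example: fills a[i] = 2*i and b[i] = 3*i. */
void fill_both(double *a, double *b, int n) {
    #pragma omp parallel
    {
        /* nowait removes the implicit barrier at the end of this loop.
           This is safe here because the next loop never touches a. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            a[i] = 2.0 * i;
        }

        /* This loop keeps its implicit barrier. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            b[i] = 3.0 * i;
        }
    }
}
```

If the second loop read from a, the nowait clause could introduce a data race, because a thread might read an element that another thread has not written yet.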
An example
In the article about the single construct, we presented several programs which accumulate the salaries of all employees in two companies. The third version placed two for loops and a single construct inside one parallel region.
Mats Brorsson commented on LinkedIn that this version suffers from oversynchronization (read as: it has too many barriers). I agree! Therefore, we now explain the problem with the program and different solutions to the problem.
The key is to notice where the implicit barriers are. They are
- at the end of the parallel region,
- at the end of the first for loop,
- at the end of the single construct and
- at the end of the second for loop.
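The barriers listed above can be marked in a sketch of such a program. This is a reconstruction from the description in this article, not the original listing: the helper fetch_salary, the function name sum_salaries_oversync, and the salary values are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for looking up the salary of one employee. */
static long fetch_salary(int company, int employee) {
    return 1000L * company + employee;
}

/* Sums the salaries of n employees in each of two companies. */
void sum_salaries_oversync(int n, long *total1, long *total2) {
    long salaries1 = 0;
    long salaries2 = 0;

    #pragma omp parallel
    {
        #pragma omp for reduction(+: salaries1)
        for (int e = 0; e < n; e++) {
            salaries1 += fetch_salary(1, e);
        }
        /* implicit barrier at the end of the first for loop */

        #pragma omp single
        printf("Salaries1: %ld\n", salaries1);
        /* implicit barrier at the end of the single construct */

        #pragma omp for reduction(+: salaries2)
        for (int e = 0; e < n; e++) {
            salaries2 += fetch_salary(2, e);
        }
        /* implicit barrier at the end of the second for loop */
    }
    /* implicit barrier at the end of the parallel region */

    printf("Salaries2: %ld\n", salaries2);
    *total1 = salaries1;
    *total2 = salaries2;
}
```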
Let us analyze each barrier.
The first barrier is at the end of the first for loop. If we omit the barrier there, we might introduce a data race. This is because the next instruction after the for loop accesses the reduction variable salaries1. Without the barrier, one thread might read salaries1 for printing while some other thread is still updating its value. Therefore, we should not add the nowait clause to the first for loop.
The second barrier is at the end of the single construct. In the single construct, the program prints the value of salaries1. This is the last time the program reads or writes salaries1. The next instructions already compute salaries2. Because of this independence, we can safely remove the barrier at the end of the single construct. We can do this by adding the nowait clause.
There is also another option. We can replace the single construct with the master construct. As described earlier, the two constructs are very similar. The main differences are that the master construct is always executed by the master thread and that it does not imply a barrier.
Two barriers remain: one at the end of the second loop and one at the end of the parallel region. They coincide, because the second loop ends where the parallel region ends. The parallel construct does not support the nowait clause, so the only barrier we can eliminate is the one at the end of the second loop. The elimination does not introduce a data race, because the barrier of the parallel construct still synchronizes the threads. Therefore, it is safe to omit the implicit barrier at the end of the second loop. Note that a compiler might do this automatically.
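Putting the analysis together, a version with fewer barriers might look like the following sketch. As before, this is an illustration under assumptions rather than the original program: fetch_salary, sum_salaries_fewer_barriers, and the salary values are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for looking up the salary of one employee. */
static long fetch_salary(int company, int employee) {
    return 1000L * company + employee;
}

/* Same computation as before, with two implicit barriers removed. */
void sum_salaries_fewer_barriers(int n, long *total1, long *total2) {
    long salaries1 = 0;
    long salaries2 = 0;

    #pragma omp parallel
    {
        /* Barrier kept: salaries1 is read right after this loop. */
        #pragma omp for reduction(+: salaries1)
        for (int e = 0; e < n; e++) {
            salaries1 += fetch_salary(1, e);
        }

        /* Barrier removed: the code below no longer uses salaries1. */
        #pragma omp single nowait
        printf("Salaries1: %ld\n", salaries1);

        /* Barrier removed: the barrier at the end of the parallel
           region synchronizes the threads anyway. */
        #pragma omp for reduction(+: salaries2) nowait
        for (int e = 0; e < n; e++) {
            salaries2 += fetch_salary(2, e);
        }
    }
    /* The barrier of the parallel construct cannot be removed. */

    printf("Salaries2: %ld\n", salaries2);
    *total1 = salaries1;
    *total2 = salaries2;
}
```

Whether this version is actually faster depends on the workload and the OpenMP runtime, so it is worth measuring both variants.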
Summary
We studied barriers. We explained how to add a barrier to a program and how a compiler adds implicit barriers to a program. We showed how to omit an implicit barrier with the nowait clause.
Finally, we analyzed the implicit barriers of an example. Valid removals of barriers might improve the efficiency of a program. Of course, we should measure the program to check whether this really is the case.
Thanks to Mats Brorsson for giving me the idea for this article.
Links:
- The barrier construct, OpenMP specification, page 151
- The master construct, OpenMP specification, page 148