When we write parallel or concurrent programs, we usually start with a sequential version. The next step is often to restructure the program substantially by introducing special pieces of code that make it parallel. OpenMP helps us write parallel programs that require very few modifications to the original sequential version. If the compiler does not support OpenMP, the program ideally still runs sequentially.

OpenMP

OpenMP is a programming model for parallel programming with shared memory. It is a specification (an API): compiler writers implement the specification, and the resulting compilers then know how to compile programs that use OpenMP.

In order to enable OpenMP, we must add the flag -fopenmp to the other compilation flags. For the GNU C++ compiler, it looks like this:

g++ program.cpp -fopenmp -o program

Parallelization

Multithreading in OpenMP programs follows the so-called fork-join programming model. The idea is that at the beginning, there is a single initial thread. When the initial thread encounters an OpenMP parallel construct, it creates (forks) a team of threads, which run in parallel. At the end of the parallel construct, the team joins and only the initial thread continues.

(Figure: the fork-join model)

Programming in OpenMP

There are three ways to use OpenMP functionalities:

  • The OpenMP API provides a set of functions. These behave just like ordinary C/C++ functions, and they all start with omp_. An example is omp_get_thread_num(), which returns the identification number of the calling thread.

  • The second way to use OpenMP is via environment variables. They all start with OMP_. An example is OMP_NUM_THREADS, which sets the number of threads the program can use.

  • The last way is to use the so-called pragmas. They all start with #pragma omp. An example is

#pragma omp parallel
{ 
    // structured block
}
  

which starts parallel execution of the structured block.

First OpenMP program

The first program creates a parallel region.

#include <iostream>
#include <omp.h>

int main()
{
#pragma omp parallel
    {
        // every thread of the team prints its own thread number
        std::cout << "Thread = " << omp_get_thread_num()
                  << std::endl;

        // only the thread with number 0 prints the team size
        if (0 == omp_get_thread_num())
        {
            std::cout << "Number of threads = " << omp_get_num_threads()
                      << std::endl;
        }
    }

    return 0;
}

Since we use functions from the OpenMP specification, we must include the header <omp.h>. The #pragma omp parallel creates a team of threads. Each thread then prints its thread number. The thread whose number equals zero additionally prints the number of threads in the team. If we do not specify the number of threads, the OpenMP runtime chooses it for us.

The output of the program might look like:

$ ./openmpStart 
Thread = 6
Thread = 2
Thread = 4
Thread = 0
Number of threads = 8
Thread = 7
Thread = 1
Thread = 5
Thread = 3

or

$ ./openmpStart 
Thread = Thread = Thread = 1
Thread = Thread = 4Thread = 3
0
Number of threads = 8
Thread = 7
Thread = 2
6
5

The output lines got mixed up because multiple threads wrote to std::cout at the same time. OpenMP does not automatically figure out that std::cout is shared between the threads. It is the responsibility of the programmer to state in the code which parts must not be executed concurrently.

Setting the number of threads

We can manually set the number of threads via the environment variable OMP_NUM_THREADS. For example, in a bash-like shell, we can do

$ export OMP_NUM_THREADS=4

and then the output of the program might look like

$ ./openmpStart 
Thread = 0
Number of threads = 4
Thread = 3
Thread = 2
Thread = 1

Summary

In this article, we looked at the basics of the OpenMP specification. With the help of OpenMP, we wrote a simple multithreaded program.
