Data race and mutex
Data race
The problem, which we had in the last article
(arguments.cpp
), was the messy output of the program. Namely,
there were two sentences in one line and also some blank lines appeared:
You might experience different outputs when you run the program on your computer. This problem is so typical for concurrent programming that it has its own name: data race. A data race happens when two threads try to change a resource at the same time. They are racing which one will change the resource first. The end result depends on the winner of the race.
In arguments.cpp
, threads t1
and
t2
are racing for the resource
std::cout
. The threads change internal state of
std::cout
– the data which will be printed to the standard
output.
One might argue that this is not a problematic thing: the program still prints
everything on the screen, just the ordering is a bit off. But the data race can
be more dangerous than “messy printing”. Let’s look at the
dataRace.cpp
.
The function countEven
gets a vector of ints and a reference
numEven
which represents the number of odd numbers so
far. The function loops through the vector and for each odd entry increments
the numEven
. The main function constructs two
vectors of ints and initializes numEven
to zero. Then for
each vector, the countEven
function concurrently counts the number
of odd entries.
Let’s test the program. Here are two of my several experiments:
Nooo, the results are different! This happens because of a data race. Two threads
are racing for the numEven
and when both are tying to
increment it at the same time, strange stuff happens and the outcome is
unpredictable. This is much worse than the example from the previous article.
There are several solutions. We could count separately in every thread and then sum results together in the main thread. But if we would still like to count with one variable, we must use a mutex.
The solution: mutex
The mutex stands for mutual exclusion and puts a restriction on access of a resource. Before accessing the resource, you lock the mutex, and after you are done accessing, you unlock the mutex. A thread, which wants to lock a mutex, has two possibilities
- If the mutex is not locked, the thread locks it.
- If the mutex is locked, the thread waits until the mutex is unlocked and then locks it.
A resource, which is guarded by a mutex, can not be a reason for a data race. Only one thread at a time can access the resource, all the other threads must wait for the unlock of the mutex.
Let’s make dataRace.cpp
data race free with a mutex:
In order to use mutexes, we must include them into the program with
#include <mutex>
statement. We declare a global mutex, lock
it before incrementing and unlock it afterwards. The main function is
unchanged. This makes our program data race free. For an exercise, you can make
the arguments.cpp
from the previous article data race free.
Usually, we don’t use global mutexes, but we group them and the protected resource together in a class. This also makes the relation between the mutex and the resource more apparent.
A mutex looks like a very useful tool in concurrent programming. But, it is also a source of problems. In the next article, we will explore common problems associated with mutexes.
Summary
We defined and explored data races. One possible solution for data races are mutexes.
In the next article, we will look more deeply into mutexes.
Links:
-
Source code: dataRace.cpp