OpenMP Programs

Contents:
  1. First Naive Approach
  2. Choosing Threads
  3. Memory Locations
  4. Further Hints

First Naive Approach

OpenMP (Open Multi-Processing) is a standardized application programming interface (API) that supports shared memory programming in C, C++, and Fortran. Here we focus on C and start with a small example program omp_1.c. We compile it with an additional option -fopenmp (compiler specific, for icc use -qopenmp).

gcc -g -O0 -fopenmp omp_1.c -o omp_1

When looking into the code, you will find lines starting with
#ifdef _OPENMP
When compiling with a flag that enables OpenMP, in our case -fopenmp, the macro _OPENMP is defined automatically, as if -D_OPENMP had been given. The conditional preprocessor directive #ifdef above then succeeds and the code following it is included; otherwise the code after #else, if present, is used.

Encapsulating references to OpenMP runtime functions, like omp_get_thread_num(), with this preprocessor directive is strongly recommended. It allows the same code to be compiled and used without OpenMP support and without errors. When debugging parallel code, it is very helpful to be able to check whether the code also produces wrong results without parallelization.
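
A minimal sketch of such a guard, not taken from omp_1.c: the helper names current_thread_id and current_team_size are hypothetical, and only omp_get_thread_num() and omp_get_num_threads() are actual OpenMP runtime functions.

```c
#ifdef _OPENMP
#include <omp.h>   /* only available when OpenMP is enabled */
#endif

/* Hypothetical helper: id of the calling thread, 0 in a serial build. */
static int current_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical helper: size of the current team, 1 in a serial build. */
static int current_team_size(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}
```

Called outside a parallel region, both helpers return 0 and 1 whether or not the code was compiled with -fopenmp, so the rest of the program can use them without further #ifdef lines.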

After invoking TotalView we stop at the Startup Parameters and use the possibility to set environment variables. In our case, OMP_NUM_THREADS specifies the number of OpenMP threads to be used. We may come back and change the value during our experiments, or just set it to 8.

Now we set a breakpoint (click line number 17) and select Go from the toolbar.

As expected, Process 1 is at the breakpoint and the associated Thread 1 as well. So far, the program runs single threaded, that is without parallelization.

Now we define the next breakpoint at line 28 within a parallel region and select Go.

As expected, Process 1 is at the breakpoint and the root window shows 8 threads. But the status bar in the main window informs us that Thread 1 is Stopped and did not reach the breakpoint. We may select Threads in the Action Points, Threads pane and see that only thread 8 is marked with a red B, indicating that only thread 8 has reached the breakpoint. If you select Go again, you may see assembler code displayed for Thread 1; you can switch from thread to thread (T- and T+ buttons) and get back to the source code with the corresponding thread.

In my case this is Thread 2 as shown above. We may set a second breakpoint in the omp single region (see left), which should only be executed by one of the threads.

After selecting Go we end with a Segmentation violation for Thread 8 at source code beyond our breakpoint (see below).

At the indicated line we may dive into variable i, select the View tab, select Show Across, and then Threads. This lets us see the values of i across all threads. As i is a private variable, each thread owns its own copy and the values of i are all different. Each thread should work on a different range of the loop running from 0 to iter=100.

Everything seems to be as it should, except for the segmentation violation.

We had better click Restart in the toolbar and start over.

Choosing Threads

Back at the breakpoint in the parallel region at line 28, we may select in the toolbar the thread to be moved. For example, we may select Thread 1.1. In my case this thread was just Stopped and had not reached the breakpoint. After selecting it, we press Go in the toolbar and see in the Action Points, Threads pane, focused on Threads, that 2 threads are now stopped at the breakpoint: 2 are marked by a red B.

Go, Next, Step and Out in the toolbar act on the selected scope: a group, a process, or a thread.

Instead of selecting a scope in the toolbar, we may also work with the tabs Group, Process and Thread.

Just switch to the next thread with T+ in the Action Points, Threads pane. Now you may move it with the toolbar or with the Thread tab.

Back to the code. After having reached the breakpoint, we select thread 2 and press Go. The thread will keep running for quite a while, and after losing patience we may press Halt in the toolbar. The displayed assembler code is not very helpful, but we can get back to the source code by moving downwards in the Stack Trace pane.

So what happened? We ran thread 2. It executed the single region, and nowait kept it running. Then it continued to the omp for loop. At the end of a typical omp for loop there is a barrier for all threads, and in this barrier we have caught thread 2. It will never finish, as all other threads have to reach that barrier as well. But they are stopped and therefore will never reach it without interaction.

By the way: dive into i. It is equal to 100. And if we let the Group Go, the program will finish without an error in the elements of c. Why? Because thread 2 has done all the iterations of the omp for loop itself. How that loop is parallelized is determined by schedule(dynamic,chunk), which hands chunk iterations to the first thread asking for work, then to the second thread, and so on. If only thread 2 is asking, it will do everything. The other threads were stopped and could not ask.
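
The structure described here can be sketched roughly as follows. This is not the code of omp_1.c: the function name add_arrays, the input arrays a and b, the addition, and the counter single_runs are assumptions made for illustration. Only the single region with nowait, the omp for loop with schedule(dynamic,chunk), and the array c come from the text.

```c
/* Hypothetical stand-in for the parallel region discussed above.
   Returns how often the single block was executed. */
static int add_arrays(const double *a, const double *b, double *c,
                      int n, int chunk)
{
    int single_runs = 0;

#pragma omp parallel
    {
        /* Exactly one thread executes this block. nowait removes the
           barrier at its end, so no thread waits here. */
#pragma omp single nowait
        {
            single_runs = single_runs + 1;
        }

        /* Each thread that asks for work receives the next `chunk`
           iterations. A lone running thread receives all of them. */
#pragma omp for schedule(dynamic, chunk)
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
        /* Implied barrier: a thread that finishes the loop waits here
           until every other thread of the team has arrived. */
    }
    return single_runs;
}
```

With this layout a single released thread can complete the whole loop and then sits in the implied barrier, which matches what we observed for thread 2.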

But now we may reason that there is a problem depending on the execution order of the omp for loop. So we delete all breakpoints and set a new one at line 42. And back again with a Restart. Again, one thread has reached the breakpoint, while the others are stopped somewhere close to it. Setting a breakpoint at line 45 within the loop would thus not help, because the threads would not get past the breakpoint at line 42. But if we move to the Action Points, Threads pane and click in Action Points on the red breakpoint, it turns pale and is now deactivated. We observe the same change in the source code.

Now we may dive into variable i and array count. Variable i is private to each thread, so View, Show Across and Threads will show all its values. Array count is shared by all threads, so nothing needs to be done there.

Now we learn a new trick. Again, only one thread has reached the breakpoint. But if we delete it (click on line 45 again) and set it anew, all threads are now caught at the breakpoint (see Threads tab). Now we may let some or all threads go and watch changing values, marked yellow, in the open Variable Windows. The values for count in particular look a bit strange, and now we see the reason for a possible segmentation violation: array count has only MAXTIDS=4 elements, and we work with 8 threads. Enlarge MAXTIDS and everything will be fine.
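
Besides enlarging MAXTIDS, a bounds check can make this kind of overflow visible instead of letting it corrupt memory. The following is only a sketch under assumptions: the function count_iteration and its capacity parameter are hypothetical, and omp_1.c may index count differently.

```c
#define MAXTIDS 4   /* value from the text, too small for 8 threads */

/* Hypothetical helper: increments count[tid] only if tid is a valid
   index. Returns 1 on success and 0 if tid lies outside the array. */
static int count_iteration(int *count, int capacity, int tid)
{
    if (tid < 0 || tid >= capacity) {
        return 0;   /* would have been an out-of-bounds write */
    }
    count[tid]++;
    return 1;
}
```

With MAXTIDS=4, a call with thread id 3 succeeds, while a call with thread id 7 is rejected rather than written past the end of the array.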

Memory Locations

To practice further, we switch to omp_2.c. It even runs on the command line without problems, but with wrong results. The final Confirmation of Scheduling should indicate which thread worked on which parts of a loop. But instead of different entries, all values are the same, and even worse, they differ from run to run.

We start the program, define the number of threads we want to work with (set OMP_NUM_THREADS as Startup Parameter) and set a breakpoint at line 36. We now know how to dive into variable i and watch its values across threads and to dive into array a.

First we notice that, as expected, the loop length N=100 has been divided by the number of threads (=8) and that the threads work on increasing i with increasing thread number. Deleting the breakpoint, setting it again and selecting Go (the trick above) shows that all i have changed (yellow), and so have several elements of a. Each thread has changed a single value within its range. But all elements are given the same value, although it should be tid, the id of each thread.

We dive into tid across all threads. Next we click on the small picture and select Address, and see that not only the values but also the addresses are all the same. Thus tid occupies shared and not private memory, and that is indeed the error in this small program.
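
One possible way to give each thread its own tid is a private clause on the parallel region. The sketch below is not the code of omp_2.c: the function names fill_with_thread_id and max_threads and the schedule(static) clause are assumptions. Only the names tid, a and i come from the text.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: upper bound on the team size, 1 in a serial build. */
static int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Stores in a[i] the id of the thread that handled iteration i. */
static void fill_with_thread_id(int *a, int n)
{
    int tid = 0;

    /* Without private(tid), tid would be shared: every thread would
       write to the same address and the stored value would depend on
       timing. */
#pragma omp parallel private(tid)
    {
#ifdef _OPENMP
        tid = omp_get_thread_num();
#else
        tid = 0;
#endif
#pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            a[i] = tid;
        }
    }
}
```

With a private tid, Show Across for tid should list a different address for each thread, and the entries of a should reflect the thread that actually worked on them.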

Further Hints

In the root window you may double-click on a thread (left mouse button), or right-click and select it. This opens a new process window for this thread, so you may have one process window per thread.

You can insert an evaluation point instead of a breakpoint and specify a thread-specific stop condition like:

if ($tid == 4) $stop