Debugging with gdb

Let's move on to the file simple2.c to illustrate the usage of gdb.

Again, this program produces a segmentation fault. After compiling with -g, we start debugging with gdb ./simple2 followed by run 100. This reveals:

0x0000000000400637 in add_it (n=1093664768, squares=0x7fffffffdb30, 
    sum=0x7fffffffdb2c) at simple2.c:11
11                 *sum+=squares[i];

The fault happens in add_it. To see how and where this function is called, we may use

(gdb) backtrace
#0  0x0000000000400637 in add_it (n=1093664768, squares=0x7fffffffdb30, 
    sum=0x7fffffffdb2c) at simple2.c:11
#1  0x00000000004006c5 in main (argc=2, argv=0x7fffffffdc48) at simple2.c:24

add_it is called at line 24. The value of n passed to add_it looks suspicious: it should be equal to nmax=100.

To check whether nmax is correct in main, we rerun the program, stop execution on the line before add_it is called, and inspect the value of nmax. We set a breakpoint at line 23 and restart the execution:

(gdb) break 23
(gdb) run 100
Breakpoint 1, main (argc=2, argv=0x7fffffffdc48) at simple2.c:23
23         set_it(nmax,squares);

To see more of the surrounding code, we may type list. The value of nmax is indeed 100:

(gdb) print nmax
$2 = 100

To navigate further we may choose:

Command	Abbreviation	Description
step, step #lines	s	Step to the next line, entering called functions.
next, next #lines	n	Step to the next line without entering called functions.
until item	u	Continue until item, which may be a function, address, filename:function, or filename:line.
continue, continue #points	c	Continue until the next breakpoint or watchpoint.
finish	-	Run until the current function returns.

Stepping through the code and checking nmax, we find that nmax changes inside set_it, although we passed nmax by value. So set_it unintentionally overwrites the memory location of nmax. The segmentation fault reported in add_it is therefore caused by an earlier error that modified one of the variables in main, and that earlier misbehaviour was not reported.

Caveat with Segmentation faults:
A segmentation fault is reported if the memory your code wants to access is outside the memory range given to your program. There is no complaint if your code modifies an address within your memory range at an unintended place.

And this is what happens here. There is a memory violation in set_it that modifies nmax, but the violation lies within the allowed memory range and is therefore not reported. This is the really nasty side of a memory problem. It may hit a value that produces a fault, as in our case, but it may just as well modify a value and produce nothing but a wrong result. It may also hit the program code itself and modify an instruction, ending in an Illegal instruction fault (SIGILL) or, even worse, in a different valid instruction that again produces wrong results.

How do we find such errors? In our case it's simple, as we know where to look: nmax. But this variable is not visible inside set_it, so we have to watch the memory location of nmax instead.

// Do not type anything after a //
(gdb) run 100     // rerun
(gdb) p &nmax     // address of nmax
$5 = (int *) 0x7fffffffdb5c
(gdb) watch *0x7fffffffdb5c
Hardware watchpoint 2: *0x7fffffffdb5c
(gdb) break 24    // breakpoint at next line to finish set_it
(gdb) c
Continuing.
Hardware watchpoint 2: *0x7fffffffdb5c

Old value = 100
New value = 1093664768
set_it (n=100, squares=0x7fffffffdb30) at simple2.c:5
5          for(i=0;i<n;i++)

This tells us that something nasty is happening in the loop at line 5. A print i reveals i=11. The loop in question looks like this:

5          for(i=0;i<n;i++) 
6                  squares[i]=i*1.0f;

Typically, incorrectly allocated arrays cause memory problems. So we inspect the memory location of the array squares. From the gdb output we know that squares=0x7fffffffdb30, which is pretty close to the address of nmax at 0x7fffffffdb5c. Let us look at which address the element squares[i] with i=11 occupies:

(gdb) print squares[i] // this is the value
$7 = 11
(gdb) print &squares[i] // this is the address
$8 = (float *) 0x7fffffffdb5c

But this is exactly the address of nmax. We have just written (float) 11 to the address of nmax, which reads as (int) 1093664768 because floats and ints are represented differently internally.

And now we easily see that squares, declared as squares[10], has no room for an element with index 11, and that this out-of-range access causes the fault.

The next chapter of this tutorial treats the inspection of core files and attaching gdb to a running program.

Of course, gdb can do much more than discussed here. For further reading, see the gdb documentation.
