Analyzing core dumps with gdb

Program faults frequently cause the operating system to dump a complete copy of the process to disk. When debugging parallel programs, where each task (possibly thousands of them) dumps a core of several GB, this can produce a large number of very large files. On Elwetritsch we have therefore limited the default core file size to 0 bytes, arranged for all core files of all users to be written to /scratch/corefiles/, and clean this directory regularly. A user who wants to inspect a core dump can run:

ulimit -c unlimited
cp /scratch/corefiles/<my_core_file> ./core
and then inspect the memory contents of the program with gdb:

gdb ./simple2 core
Reading symbols from /home/schuele/Projekte/DOC/Debug/simple2...done.
Core was generated by `./simple2 100'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000400660 in add_it (n=1093664768, squares=0x7ffff15cb960, sum=0x7ffff15cb95c) 
at simple2.c:11
11                 *sum+=squares[i];
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64
We quickly see that the segmentation fault occurred at line 11 of add_it, inside the loop. Using list and print, we find that it happened at i=2472:
(gdb) list
6                  squares[i]=i*1.0f;
7       }
8       void add_it(int n,float *squares, float *sum) {
9       int i;
10         for(i=0;i<n;i++) 
11                 *sum+=squares[i];
12      }
13
14
15      int main(int argc, char **argv) {
(gdb) print i
$1 = 2472
To find out how add_it was called, we can enter:
(gdb) backtrace
#0  0x0000000000400660 in add_it (n=1093664768, squares=0x7ffff15cb960, sum=0x7ffff15cb95c) 
at simple2.c:11
#1  0x00000000004006ec in main (argc=2, argv=0x7ffff15cba78) at simple2.c:24
(gdb) up
#1  0x00000000004006ec in main (argc=2, argv=0x7ffff15cba78) at simple2.c:24
24         add_it(nmax,squares,&sum);
With up and down we can move through the call stack, printing the variables of the current stack frame as we go.
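The interactive steps above can also be scripted. The following is a minimal sketch, not a site-provided tool: newest_core, write_gdb_script and core_summary are hypothetical helper names, and it assumes that core files in /scratch/corefiles/ are owned by the user who produced them and that GNU find is available. It relies only on gdb's -batch and -x options.

```shell
#!/bin/sh
# Sketch only: the helper names and the file-ownership assumption are hypothetical.

# Print the most recently modified regular file in directory $1
# that is owned by the current user (requires GNU find for -printf).
newest_core() {
    find "$1" -maxdepth 1 -type f -user "$(id -u)" -printf '%T@ %p\n' \
        | sort -rn | head -n 1 | cut -d' ' -f2-
}

# Write the gdb commands used in the session above into file $1.
write_gdb_script() {
    cat > "$1" <<'EOF'
backtrace
frame 1
info locals
EOF
}

# Run gdb non-interactively on program $1 and core file $2.
# -batch makes gdb exit after the commands; -x reads commands from a file.
core_summary() {
    script=$(mktemp) || return 1
    write_gdb_script "$script"
    gdb -batch -x "$script" "$1" "$2"
    status=$?
    rm -f "$script"
    return "$status"
}
```

A possible use, assuming the newest file in the directory is the core of interest: cp "$(newest_core /scratch/corefiles)" ./core followed by core_summary ./simple2 ./core. This prints the backtrace and the local variables of main without opening an interactive session, which can be convenient when many tasks have dumped cores.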

Attaching gdb to a running program

We now turn to simple4.c, which contains a call to sleep. We start the program in its own window, look up its process ID, and attach gdb to it.

This technique is useful if a program has already been running for hours, or if it is started by another process that we cannot control.

[] gcc -g -O0 simple4.c -o simple4
[] ./simple4 100
[] ps -fu $USER |grep simple4|awk '{print $2}'
438476
[] gdb ./simple4 438476
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Reading symbols from /home/schuele/Projekte/DOC/Debug/simple4...done.
Attaching to program: /home/schuele/Projekte/DOC/Debug/./simple4, process 438476
(gdb) backtrace
#0  0x00007fdfc3635f90 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00007fdfc3635e44 in sleep () from /lib64/libc.so.6
#2  0x0000000000400710 in main (argc=2, argv=0x7fff5d1c89b8) at simple4.c:11

The first two commands are run in a separate window. gdb takes over at the point where the program is currently executing, which in our case is the sleep call. The backtrace shows that sleep itself calls nanosleep. To return from both calls we type finish twice, after which we can continue debugging as usual at line 12 of main.
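If only a quick look at where a running program currently stands is needed, the attach step can be done non-interactively. This is a hedged sketch: attach_backtrace is a hypothetical helper name, and it assumes that the system permits attaching to your own processes. It uses gdb's -p option to attach and -batch to exit again after the backtrace, at which point gdb detaches from the process.

```shell
#!/bin/sh
# Sketch only: attach_backtrace is a hypothetical helper, not a site tool.

# Print a backtrace of the running process with PID $1.
attach_backtrace() {
    pid=$1
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or not permitted): $pid" >&2
        return 1
    fi
    # -p attaches to the process, -ex runs one gdb command after attaching.
    gdb -batch -p "$pid" -ex "backtrace"
}
```

For the example above this would be attach_backtrace 438476. Where pgrep is installed, pgrep -n -u "$USER" -x simple4 is one way to obtain the PID of the most recently started simple4 process without the ps, grep and awk pipeline.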
