Oops! Debugging Kernel Panics | Linux Journal

A look into what causes kernel panics and some utilities to help gain
more information.

Working in a Linux environment, how often have you seen a kernel panic?
When it happens, your system is left in a crippled state until
you reboot it completely. And, even after you get your system back into a
functional state, you’re still left with the question: why? You may have no
idea what happened or why it happened. Those questions can be answered
though,
and the following guide will help you root out the cause of some of the conditions
that led to the original crash.


Figure 1. A Typical Kernel Panic

Let’s get started by looking at a set of utilities known as
kexec and kdump. kexec allows you to boot into
another kernel from an existing (and running) kernel, and
kdump is a
kexec-based crash-dumping mechanism for Linux.

Installing the Required Packages

First and foremost, your kernel should have the following components statically built in to its image:


You can find this in /boot/config-`uname -r`.
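
If you’d rather not eyeball the config file, a short shell function can do the checking for you. This is only a sketch: the name check_kdump_config is made up for illustration, and the option list is an assumption taken from the upstream kernel kdump documentation rather than from this article, so adjust it to your own needs.

```shell
#!/bin/sh
# Report which kdump-related options are not built in (=y) in a kernel
# config file. The option list is an assumption (upstream kdump docs).
check_kdump_config() {
    config_file=$1
    missing=""
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE \
               CONFIG_DEBUG_INFO CONFIG_MAGIC_SYSRQ CONFIG_RELOCATABLE; do
        # -x matches whole lines, so CONFIG_KEXEC=y never matches
        # a longer name such as CONFIG_KEXEC_FILE=y
        if ! grep -qx "${opt}=y" "$config_file"; then
            missing="$missing $opt"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all required options are built in"
}
```

Call it as check_kdump_config /boot/config-`uname -r`; a non-zero exit status means at least one option is not set to "y".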

Make sure that your operating system is up to date with the latest-and-greatest package versions:

$ sudo apt update && sudo apt upgrade

Install the following packages
(I’m currently using Debian, but the
same should and will apply to Ubuntu):

$ sudo apt install gcc make binutils linux-headers-`uname -r`
 ↪kdump-tools crash linux-image-`uname -r`-dbg

Note: Package names may vary
across distributions.

During the installation, you will be prompted with questions to enable
kexec to handle reboots (answer whatever you’d like, but I answered
“no”; see Figure 2).


Figure 2.
kexec Configuration Menu

And to enable kdump to run and load at system boot, answer
“yes” (Figure 3).


Figure 3.
kdump Configuration Menu

Configuring kdump

Open the /etc/default/kdump-tools file, and at the very top,
you should see the following:

USE_KDUMP=1


Eventually, you’ll write a custom module that will trigger an OOPS kernel
condition, and in order to have kdump gather and save the state of the
system for postmortem analysis, you’ll need to enable your kernel to
panic on this OOPS condition. In order to do that, uncomment the line
that starts with KDUMP_SYSCTL:

KDUMP_SYSCTL="kernel.panic_on_oops=1"


The initial testing will require that SysRq be enabled. There
are a few ways to do that, but here I provide instructions
to enable support for this feature on system reboot. Open the
/etc/sysctl.d/99-sysctl.conf file, and make sure that the
following line (closer to the bottom of the file) is uncommented:

kernel.sysrq=1
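
If you prefer to script the two uncommenting steps instead of opening an editor, something like the following could work. Treat it as a sketch under assumptions: uncomment_setting is a hypothetical helper, it relies on GNU sed’s -i option, and it assumes the settings ship commented out as #KEY=value (for example #kernel.sysrq=1), which you should confirm in your own files first.

```shell
#!/bin/sh
# Remove the leading '#' from lines that assign KEY in FILE.
# Assumes the line looks like '#KEY=value'; ordinary comments that merely
# mention KEY are left alone because they lack the 'KEY=' form.
uncomment_setting() {
    file=$1
    key=$2
    sed -i "s/^#[[:space:]]*\(${key}=\)/\1/" "$file"
}
```

For example, sudo sh -c with uncomment_setting /etc/sysctl.d/99-sysctl.conf kernel.sysrq, then the same for KDUMP_SYSCTL in /etc/default/kdump-tools. Keep a backup copy of both files before editing them in place.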


Now, open this file: /etc/default/grub.d/kdump-tools.default. You
will find a single line that looks like this:


Modify the section that reads crashkernel=384M-:128M to crashkernel=128M.

Now, update your GRUB boot configuration file:

$ sudo update-grub
[sudo] password for petros:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.9.0-8-amd64
Found initrd image: /boot/initrd.img-4.9.0-8-amd64

And, reboot the system.

Verifying Your kdump Environment

After coming back from the reboot, dmesg will log the following:

$ sudo dmesg |grep -i crash
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64
 ↪root=UUID=bd76b0fe-9d09-40a9-a0d8-a7533620f6fa ro quiet
[    0.000000] Reserving 128MB of memory at 720MB for crashkernel
 ↪(System RAM: 4095MB)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/
 ↪root=UUID=bd76b0fe-9d09-40a9-a0d8-a7533620f6fa ro
 ↪quiet crashkernel=128M
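
If all you want is the reservation the kernel was actually booted with, you can pull the crashkernel= parameter out of the command line yourself. A small sketch follows; crashkernel_value is a hypothetical helper name, and it simply splits the command line on whitespace.

```shell
#!/bin/sh
# Print the value of the crashkernel= parameter found in a kernel
# command line string; return 1 if the parameter is absent.
crashkernel_value() {
    # $1: command line text, for example the contents of /proc/cmdline
    for word in $1; do
        case $word in
            crashkernel=*)
                echo "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

On the running system you would call it as crashkernel_value "$(cat /proc/cmdline)". With the configuration from this article, it should print the 128M you set in the GRUB file.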

Your kernel also will have the following features enabled (a “1”
means enabled):

$ sudo sysctl -a|grep kernel|grep -e panic_on_oops -e sysrq
kernel.panic_on_oops = 1
kernel.sysrq = 1

Your kdump service should be running:

$ sudo systemctl status kdump-tools.service
 kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service;
    ↪enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2019-02-26 08:13:34 CST;
    ↪1h 33min ago
  Process: 371 ExecStart=/etc/init.d/kdump-tools start
   ↪(code=exited, status=0/SUCCESS)
 Main PID: 371 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/kdump-tools.service

Feb 26 08:13:34 deb-panic systemd[1]: Starting Kernel crash
 ↪dump capture service...
Feb 26 08:13:34 deb-panic kdump-tools[371]: Starting
 ↪kdump-tools: loaded kdump kernel.
Feb 26 08:13:34 deb-panic kdump-tools[505]: /sbin/kexec -p
 ↪--command-line="BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64 root=
Feb 26 08:13:34 deb-panic kdump-tools[506]: loaded kdump kernel
Feb 26 08:13:34 deb-panic systemd[1]: Started Kernel crash dump
 ↪capture service.

Your crash kernel should be loaded (into memory and in the 128M region
you defined earlier):

$ cat /sys/kernel/kexec_crash_loaded
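
The individual checks above can be rolled into one pre-flight function. This is a sketch, not part of kdump-tools: kdump_preflight is a made-up name, the optional argument is a path prefix that exists only so the function can be tried against a fake directory tree, and it accepts nothing but the value 1 for each file (kernel.sysrq can legitimately hold other bitmask values, which this sketch would flag).

```shell
#!/bin/sh
# Check that panic_on_oops and sysrq are set to 1 and that a crash
# kernel is loaded (kexec_crash_loaded reads 1 when it is).
kdump_preflight() {
    root=${1:-}   # optional path prefix, only useful for testing
    status=0
    for f in proc/sys/kernel/panic_on_oops \
             proc/sys/kernel/sysrq \
             sys/kernel/kexec_crash_loaded; do
        value=$(cat "$root/$f" 2>/dev/null) || value="unreadable"
        if [ "$value" = "1" ]; then
            echo "ok   /$f"
        else
            echo "FAIL /$f = $value"
            status=1
        fi
    done
    return $status
}
```

Run with no arguments, it reads the live /proc and /sys files and returns non-zero if any of the three is not what this article expects.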

You can verify your kdump configuration further here:

$ sudo kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2d000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet irqpoll nr_cpus=1 nousb
 ↪--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Let’s also test it without actually running it:

$ sudo kdump-config test
USE_KDUMP:         1
KDUMP_SYSCTL:      kernel.panic_on_oops=1
KDUMP_COREDIR:     /var/crash
crashkernel addr:  0x2d000000
kdump kernel addr:
kdump kernel:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to
kexec command to be used:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet irqpoll nr_cpus=1 nousb
 ↪--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

The Moment of Truth

Now that your environment is loaded to make use of kdump, you
probably should test it, and the best way to test it is by forcing a
kernel crash over SysRq. Assuming your kernel is built with SysRq support,
crashing a running kernel is as simple as typing:

$ echo "c" | sudo tee -a /proc/sysrq-trigger

What should you expect? You’ll see a kernel panic/crash similar to the
one shown in Figure 1. Following this crash, the kernel loaded over kexec will
collect the state of the system, which includes everything relevant in
memory, on the CPU, in dmesg, in loaded modules and more. It then
will save this valuable crash data somewhere in /var/crash for
further analysis. Once the collection of information completes, the system
will reboot automatically and will bring you back to a functional state.

What Now?

You now have your crash file, and again, it’s located in /var/crash:

$ cd /var/crash/
$ ls
201902261006  kexec_cmd
$ cd 201902261006/
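
Once you have crashed the machine a few times, it helps to jump straight to the newest dump. Here is a minimal sketch, assuming the dump directories keep the timestamp naming seen above (YYYYMMDDhhmm); latest_crash_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Print the most recent crash directory under a crash root.
# Timestamp-style names sort chronologically with a plain sort.
latest_crash_dir() {
    crash_root=${1:-/var/crash}
    for d in "$crash_root"/[0-9]*/; do
        # Skip the unexpanded pattern when nothing matches
        if [ -d "$d" ]; then
            echo "${d%/}"
        fi
    done | sort | tail -n 1
}
```

For example, cd "$(latest_crash_dir)" takes you to the latest dump in /var/crash. Plain files such as kexec_cmd are ignored because only directories whose names start with a digit are considered.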

Before opening the crash file, though, you probably should install the
kernel’s source package:

$ sudo apt source linux-image-`uname -r`

Earlier, you installed a debug version of your Linux kernel containing
the unstripped debug symbols required for this type of debugging
analysis. Now you need that kernel. Open the kernel crash file with the
crash utility:

$ sudo crash dump.201902261006 /usr/lib/debug/

Once everything loads, a summary of the panic will appear on the screen:

      KERNEL: /usr/lib/debug/vmlinux-4.9.0-8-amd64
    DUMPFILE: dump.201902261006  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Feb 26 10:07:21 2019
      UPTIME: 00:04:09
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 100
    NODENAME: deb-panic
     RELEASE: 4.9.0-8-amd64
     VERSION: #1 SMP Debian 4.9.144-3 (2019-02-02)
     MACHINE: x86_64  (2592 Mhz)
      MEMORY: 4 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 563
     COMMAND: "tee"
        TASK: ffff88e69628c080 [THREAD_INFO: ffff88e69628c080]
         CPU: 2

Notice the reason for the panic: sysrq: SysRq : Trigger
a crash. Also, notice the command that led to it:
tee. None of this should be a surprise since you
triggered it.

If you run a backtrace of the kernel functions that led to the
panic, you should see the following (processed by CPU core no. 2):

crash> bt
PID: 563    TASK: ffff88e69628c080  CPU: 2   COMMAND: "tee"
 #0 [ffffa67440b23ba0] machine_kexec at ffffffffa0c53f68
 #1 [ffffa67440b23bf8] __crash_kexec at ffffffffa0d086d1
 #2 [ffffa67440b23cb8] crash_kexec at ffffffffa0d08738
 #3 [ffffa67440b23cd0] oops_end at ffffffffa0c298b3
 #4 [ffffa67440b23cf0] no_context at ffffffffa0c619b1
 #5 [ffffa67440b23d50] __do_page_fault at ffffffffa0c62476
 #6 [ffffa67440b23dc0] page_fault at ffffffffa121a618
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffffa102be62  RSP: ffffa67440b23e78  RFLAGS: 00010282
    RAX: ffffffffa102be50  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88e69fd10648  RDI: 0000000000000063
    RBP: ffffffffa18bf320   R8: 0000000000000001   R9: 0000000000007eb8
    R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000004
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa67440b23e78] __handle_sysrq at ffffffffa102c597
 #8 [ffffa67440b23ea0] write_sysrq_trigger at ffffffffa102c9db
 #9 [ffffa67440b23eb0] proc_reg_write at ffffffffa0e7ac00
#10 [ffffa67440b23ec8] vfs_write at ffffffffa0e0b3b0
#11 [ffffa67440b23ef8] sys_write at ffffffffa0e0c7f2
#12 [ffffa67440b23f38] do_syscall_64 at ffffffffa0c03b7d
#13 [ffffa67440b23f50] entry_SYSCALL_64_after_swapgs at ffffffffa121924e
    RIP: 00007f3952463970  RSP: 00007ffc7f3a4e58  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f3952463970
    RDX: 0000000000000002  RSI: 00007ffc7f3a4f60  RDI: 0000000000000003
    RBP: 00007ffc7f3a4f60   R8: 00005648f508b610   R9: 00007f3952944480
    R10: 0000000000000839  R11: 0000000000000246  R12: 0000000000000002
    R13: 0000000000000001  R14: 00005648f508b530  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

In your backtrace, you should notice the symbol address stored in
the instruction pointer register (RIP) at the time of the exception: ffffffffa102be62. Let’s take a look at this symbol address:

crash> sym ffffffffa102be62
ffffffffa102be62 (t) sysrq_handle_crash+18 ./debian/build/
↪build_amd64_none_amd64/./drivers/tty/sysrq.c: 144

Wait a minute! The exception seems to have been triggered in line 144
of the drivers/tty/sysrq.c file and in the
sysrq_handle_crash function. Hmm…I wonder what’s going on
in this kernel source file. (This is why I had you install your kernel source
package moments ago.) Let’s navigate to the /usr/src
directory and untar the source package:

$ cd /usr/src
$ ls
linux_4.9.144-3.debian.tar.xz  linux_4.9.144.orig.tar.xz
linux_4.9.144-3.dsc            linux-headers-4.9.0-8-amd64
$ sudo tar xJf linux_4.9.144.orig.tar.xz
$ vim linux-4.9.144/drivers/tty/sysrq.c

Locate the sysrq_handle_crash function:

static void sysrq_handle_crash(int key)
{
    char *killer = NULL;

    /* we need to release the RCU read lock here,
     * otherwise we get an annoying
     * 'BUG: sleeping function called from invalid context'
     * complaint from the kernel before the panic.
     */
    rcu_read_unlock();
    panic_on_oops = 1;      /* force panic */
    wmb();
    *killer = 1;
}

And more specifically, look at line 144:

*killer = 1;

It was this line, a write through the NULL pointer killer, that led to the
page fault logged in line #6 of the backtrace:

#6 [ffffa67440b23dc0] page_fault at ffffffffa121a618

Okay. So, now you should have a basic understanding of how to debug bad
kernel code,
but what happens if you want to debug your very own custom kernel modules
(for instance, drivers)? I wrote a simple Linux kernel module that essentially
invokes a similar style of kernel crash when loaded. Call it
test-module.c and save it somewhere in your home directory:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/version.h>

static int test_module_init(void)
{
        int *p = 1;
        printk("%d\n", *p);
        return 0;
}

static void test_module_exit(void)
{
        return;
}

module_init(test_module_init);
module_exit(test_module_exit);

You’ll need a Makefile to compile this kernel module (save it in the
same directory):

obj-m += test-module.o

all:
	$(MAKE) -C/lib/modules/$(shell uname -r)/build M=$(PWD)

Run the make command to compile the module, and do
not delete any of the compilation artifacts; you’ll need
those later:

$ make
make -C/lib/modules/4.9.0-8-amd64/build M=/home/petros
make[1]: Entering directory '/usr/src/
  CC [M]  /home/petros/test-module.o
/home/petros/test-module.c: In function "test_module_init":
/home/petros/test-module.c:7:11: warning: initialization makes
 ↪pointer from integer without a cast [-Wint-conversion]
  int *p = 1;
  Building modules, stage 2.
  MODPOST 1 modules
  LD [M]  /home/petros/test-module.ko
make[1]: Leaving directory '/usr/src/

Note: you may see a compilation warning. Ignore it
for now. The code it flags is what will trigger your kernel crash.

Be careful now. Once you load the .ko file, the system will
crash, so make sure everything is saved and synchronized to disk:

$ sync && sudo insmod test-module.ko

Similar to before, the system will crash, the kexec
kernel/environment will help gather everything and save it somewhere in
/var/crash, followed by an automatic reboot. After you have
rebooted and are back into a functional state, locate the new crash
directory and change into it:

$ cd /var/crash/201902261035/

Also, copy the unstripped kernel object file for your test-module from
your home directory and into the current working directory:

$ sudo cp ~/test-module.o /var/crash/201902261035/

Load the crash file with your debug kernel:

$ sudo crash dump.201902261035 /usr/lib/debug/

Your summary should look something like this:

      KERNEL: /usr/lib/debug/vmlinux-4.9.0-8-amd64
    DUMPFILE: dump.201902261035  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Feb 26 10:37:47 2019
      UPTIME: 00:11:16
LOAD AVERAGE: 0.24, 0.06, 0.02
       TASKS: 102
    NODENAME: deb-panic
     RELEASE: 4.9.0-8-amd64
     VERSION: #1 SMP Debian 4.9.144-3 (2019-02-02)
     MACHINE: x86_64  (2592 Mhz)
      MEMORY: 4 GB
       PANIC: "BUG: unable to handle kernel NULL pointer
 ↪dereference at 0000000000000001"
         PID: 1493
     COMMAND: "insmod"
        TASK: ffff893c5a5a5080 [THREAD_INFO: ffff893c5a5a5080]
         CPU: 3

The reason for the kernel crash is summarized as follows:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000001. The userspace command that led to the panic
was your insmod.

A backtrace will reveal a page fault exception at address ffffffffc05ed005:

crash> bt
PID: 1493   TASK: ffff893c5a5a5080  CPU: 3  COMMAND: "insmod"
 #0 [ffff9dcd013b79f0] machine_kexec at ffffffffa3a53f68
 #1 [ffff9dcd013b7a48] __crash_kexec at ffffffffa3b086d1
 #2 [ffff9dcd013b7b08] crash_kexec at ffffffffa3b08738
 #3 [ffff9dcd013b7b20] oops_end at ffffffffa3a298b3
 #4 [ffff9dcd013b7b40] no_context at ffffffffa3a619b1
 #5 [ffff9dcd013b7ba0] __do_page_fault at ffffffffa3a62476
 #6 [ffff9dcd013b7c10] page_fault at ffffffffa401a618
    [exception RIP: init_module+5]
    RIP: ffffffffc05ed005  RSP: ffff9dcd013b7cc8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000080000000  RSI: ffff893c5a5a5ac0  RDI: ffffffffc05ed000
    RBP: ffffffffc05ed000   R8: 0000000000020098   R9: 0000000000000006
    R10: 0000000000000000  R11: ffff893c5a4d8100  R12: ffff893c5880d460
    R13: ffff893c56500e80  R14: ffffffffc05ef000  R15: ffffffffc05ef050
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9dcd013b7cc8] do_one_initcall at ffffffffa3a0218e
 #8 [ffff9dcd013b7d38] do_init_module at ffffffffa3b81531
 #9 [ffff9dcd013b7d58] load_module at ffffffffa3b04aaa
#10 [ffff9dcd013b7e90] SYSC_finit_module at ffffffffa3b051f6
#11 [ffff9dcd013b7f38] do_syscall_64 at ffffffffa3a03b7d
#12 [ffff9dcd013b7f50] entry_SYSCALL_64_after_swapgs at ffffffffa401924e
    RIP: 00007f124662c469  RSP: 00007fffc4ca04a8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000564213d111f0  RCX: 00007f124662c469
    RDX: 0000000000000000  RSI: 00005642129d3638  RDI: 0000000000000003
    RBP: 00005642129d3638   R8: 0000000000000000   R9: 00007f12468e3ea0
    R10: 0000000000000003  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000564213d10130  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000139  CS: 0033  SS: 002b

Let’s attempt to look at the symbol at the address ffffffffc05ed005:

crash> sym ffffffffc05ed005
ffffffffc05ed005 (t) init_module+5 [test-module]

Hmm. The issue occurred somewhere in the module initialization code of
the test-module kernel driver. But what happened to all of
the details shown in the earlier analysis? Well, because this code is
not part of the debug kernel image, you’ll need to find a way to load
it into your crash analysis. This is why I instructed you to copy over the
unstripped object file into your current working directory. Now it’s time to load
the module’s object file:

crash> mod -s test-module ./test-module.o
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffffc05ef000  test-module           16384  ./test-module.o

Now you can go back and look at the same symbol address:

crash> sym ffffffffc05ed005
ffffffffc05ed005 (T) init_module+5 [test-module]
 ↪/home/petros/test-module.c: 8

And, now it’s time to revisit your code and look at line 8:

$ sed -n 8p test-module.c
        printk("%d\n", *p);

There you have it. The page fault occurred when you attempted to
print the poorly defined pointer. Remember the compilation warning from
earlier? Well, it was warning you for a reason, and in this case,
the code it flagged is what triggered the kernel panic. You may not be as
fortunate in future coding cases.
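
Since the compiler saw this one coming, one cheap habit is to refuse to load a module whose build produced warnings. The following is only a sketch of that idea: build_is_clean is a hypothetical helper, and it assumes you saved the build output to a log file first (for example with make 2>&1 | tee build.log).

```shell
#!/bin/sh
# Succeed only if a saved build log contains no compiler warnings;
# otherwise print the warning lines and return 1.
build_is_clean() {
    # $1: path to the saved build log
    if grep -q 'warning:' "$1"; then
        grep 'warning:' "$1"
        return 1
    fi
    return 0
}
```

You could then chain it in front of the load step, as in build_is_clean build.log && sudo insmod test-module.ko, so a warning like the one above stops you before the crash rather than after.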

What Else Can You Do Here?

The kernel crash file will preserve many artifacts from your system at the
time of your crash. You can list a short summary of available commands with the
help command:

crash> help

*            files        mach         repeat       timer
alias        foreach      mod          runq         tree
ascii        fuser        mount        search       union
bt           gdb          net          set          vm
btop         help         p            sig          vtop
dev          ipcs         ps           struct       waitq
dis          irq          pte          swap         whatis
eval         kmem         ptob         sym          wr
exit         list         ptov         sys          q
extend       log          rd           task

For instance, if you want to see a general summary of memory utilization:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   979869       3.7 GB         ----
         FREE   835519       3.2 GB   85% of TOTAL MEM
         USED   144350     563.9 MB   14% of TOTAL MEM
       SHARED     8374      32.7 MB    0% of TOTAL MEM
      BUFFERS     3849        15 MB    0% of TOTAL MEM
       CACHED        0            0    0% of TOTAL MEM
         SLAB     5911      23.1 MB    0% of TOTAL MEM

   TOTAL SWAP  1047807         4 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  1047807         4 GB  100% of TOTAL SWAP

 COMMIT LIMIT  1537741       5.9 GB         ----
    COMMITTED    16370      63.9 MB    1% of TOTAL LIMIT

If you want to see what dmesg logged up to the point of
the failure:

crash> log

[    0.000000] Linux version 4.9.0-8-amd64
 ↪([email protected]) (gcc version 6.3.0
 ↪20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian
 ↪4.9.144-3 (2019-02-02)
[    0.000000] Command line: BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet crashkernel=128M
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001:
 ↪'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002:
 ↪'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004:
 ↪'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:

[ .... ]

Using the same crash utility, you can drill even deeper into memory
locations and their contents, what was being handled by each CPU core
at the time of the crash and so much more. If you want to learn more
about these commands, simply type help followed by the
command name:

crash> help mount

Something similar to a man page will load onto your screen.


So, there you have it: an introduction to kernel crash debugging. This
barely scratches the surface, but hopefully, it will provide
you with a proper starting point to help diagnose kernel crashes in
production, development and test environments.
