crash

mirror of https://github.com/crash-utility/crash synced 2025-02-18 22:46:50 +00:00

Author	SHA1	Message	Date
Kazuhito Hagio	dda5b2d02b	gdb: print details of unnamed struct and union Currently gdb's "ptype" command does not print the details of unnamed structure and union deeper than second level in a structure, it prints only "{...}" instead. And crash's "struct" and similar commands also inherit this behavior, so we cannot get the full information of them. To print the details of them, change the show variable when it is an unnamed one like crash-7.x. Without the patch: crash> struct -o page struct page { [0] unsigned long flags; union { struct {...}; struct {...}; ... With the patch: crash> struct -o page struct page { [0] unsigned long flags; union { struct { [8] struct list_head lru; [24] struct address_space *mapping; [32] unsigned long index; [40] unsigned long private; }; struct { [8] dma_addr_t dma_addr; }; ... Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-06-01 08:48:12 +09:00
Qi Zheng	0f162febeb	bt: arm64: add support for 'bt -n idle' The '-n idle' option of bt command can help us filter the stack of the idle process when debugging the dumpfiles captured by kdump. This patch supports this feature on ARM64. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>	2022-05-26 11:45:44 +09:00
Qi Zheng	6833262bf8	bt: x86_64: filter out idle task stack When we use crash to troubleshoot softlockup and other problems, we often use the 'bt -a' command to print the stacks of running processes on all CPUs. But now some servers have hundreds of CPUs (such as AMD machines), which causes the 'bt -a' command to output a lot of process stacks. And many of these stacks are the stacks of the idle process, which are not needed by us. Therefore, in order to reduce this part of the interference information, this patch adds the -n option to the bt command. When we specify '-n idle' (meaning no idle), the stack of the idle process will be filtered out, thus speeding up our troubleshooting. And the option works only for crash dumps captured by kdump. The command output is as follows: crash> bt -a -n idle [...] PID: 0 TASK: ffff889ff8c34380 CPU: 8 COMMAND: "swapper/8" PID: 0 TASK: ffff889ff8c32d00 CPU: 9 COMMAND: "swapper/9" PID: 0 TASK: ffff889ff8c31680 CPU: 10 COMMAND: "swapper/10" PID: 0 TASK: ffff889ff8c35a00 CPU: 11 COMMAND: "swapper/11" PID: 0 TASK: ffff889ff8c3c380 CPU: 12 COMMAND: "swapper/12" PID: 150773 TASK: ffff889fe85a1680 CPU: 13 COMMAND: "bash" #0 [ffffc9000d35bcd0] machine_kexec at ffffffff8105a407 #1 [ffffc9000d35bd28] __crash_kexec at ffffffff8113033d #2 [ffffc9000d35bdf0] panic at ffffffff81081930 #3 [ffffc9000d35be70] sysrq_handle_crash at ffffffff814e38d1 #4 [ffffc9000d35be78] __handle_sysrq.cold.12 at ffffffff814e4175 #5 [ffffc9000d35bea8] write_sysrq_trigger at ffffffff814e404b #6 [ffffc9000d35beb8] proc_reg_write at ffffffff81330d86 #7 [ffffc9000d35bed0] vfs_write at ffffffff812a72d5 #8 [ffffc9000d35bf00] ksys_write at ffffffff812a7579 #9 [ffffc9000d35bf38] do_syscall_64 at ffffffff81004259 RIP: 00007fa7abcdc274 RSP: 00007fffa731f678 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa7abcdc274 RDX: 0000000000000002 RSI: 0000563ca51ee6d0 RDI: 0000000000000001 RBP: 0000563ca51ee6d0 R8: 000000000000000a R9: 00007fa7abd6be80 R10: 000000000000000a R11: 0000000000000246 R12: 00007fa7abdad760 R13: 0000000000000002 R14: 00007fa7abda8760 R15: 0000000000000002 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b [...] Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com> Acked-by: Lianbo Jiang <lijiang@redhat.com>	2022-05-26 11:45:17 +09:00
Kazuhito Hagio	9705669a49	Makefile: add missing crash_target.o to be cleaned Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-05-26 11:37:31 +09:00
Lianbo Jiang	3750803f6a	sbitmapq: fix invalid offset for "sbitmap_word_depth" on Linux v5.18-rc1 Kernel commit 3301bc53358a ("lib/sbitmap: kill 'depth' from sbitmap_word") removed the depth member from struct sbitmap_word. Without the patch, the sbitmapq will fail: crash> sbitmapq 0xffff8e99d0dc8010 sbitmapq: invalid structure member offset: sbitmap_word_depth FILE: sbitmap.c LINE: 84 FUNCTION: __sbitmap_weight() Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-05-24 17:21:47 +09:00
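With the depth member gone from struct sbitmap_word, the depth of each word has to be derived from the parent struct sbitmap instead. A minimal sketch, assuming (as we understand kernel commit 3301bc53358a) that every word holds 1 << shift bits except the last one, which holds whatever remains of sb->depth. struct sbitmap_view, word_depth() and bitmap_weight() are hypothetical names, not code from crash's sbitmap.c, and __builtin_popcountll is a GCC/Clang builtin:

```c
#include <stdint.h>

/* Hypothetical, trimmed-down view of struct sbitmap: only the fields needed here. */
struct sbitmap_view {
    unsigned int depth;   /* total number of bits in the bitmap */
    unsigned int shift;   /* log2(bits per word) */
    unsigned int map_nr;  /* number of words */
};

/* Depth of word 'index': full words have 1 << shift bits, the last gets the remainder. */
static unsigned int word_depth(const struct sbitmap_view *sb, unsigned int index)
{
    if (index == sb->map_nr - 1)
        return sb->depth - (index << sb->shift);
    return 1U << sb->shift;
}

/* Count set bits, ignoring any bits beyond each word's depth. */
static unsigned int bitmap_weight(const struct sbitmap_view *sb, const uint64_t *words)
{
    unsigned int i, weight = 0;

    for (i = 0; i < sb->map_nr; i++) {
        unsigned int d = word_depth(sb, i);
        uint64_t mask = (d >= 64) ? ~0ULL : ((1ULL << d) - 1);

        weight += (unsigned int)__builtin_popcountll(words[i] & mask);
    }
    return weight;
}
```

For example, a 70-bit sbitmap with shift 6 has two words of depth 64 and 6, so a stray bit set above bit 5 of the second word is not counted.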
Lianbo Jiang	530fe6ad7e	sbitmapq: fix invalid offset for "sbitmap_queue_round_robin" on Linux v5.13-rc1 Kernel commit efe1f3a1d583 ("scsi: sbitmap: Maintain allocation round_robin in sbitmap") moved the round_robin member from struct sbitmap_queue to struct sbitmap. Without the patch, the sbitmapq will fail: crash> sbitmapq 0xffff8e99d0dc8010 sbitmapq: invalid structure member offset: sbitmap_queue_round_robin FILE: sbitmap.c LINE: 378 FUNCTION: sbitmap_queue_context_load() Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-05-24 17:21:47 +09:00
Lianbo Jiang	a295cb40cd	sbitmapq: fix invalid offset for "sbitmap_queue_alloc_hint" on Linux v5.13-rc1 Kernel commit c548e62bcf6a ("scsi: sbitmap: Move allocation hint into sbitmap") moved the alloc_hint member from struct sbitmap_queue to struct sbitmap. Without the patch, the sbitmapq will fail: crash> sbitmapq 0xffff8e99d0dc8010 sbitmapq: invalid structure member offset: sbitmap_queue_alloc_hint FILE: sbitmap.c LINE: 365 FUNCTION: sbitmap_queue_context_load() Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-05-24 17:21:47 +09:00
Lianbo Jiang	364b2e413c	sbitmapq: remove struct and member validation in sbitmapq_init() Let's remove the struct and member validation from sbitmapq_init(), which will help the crash to display the actual error when the sbitmapq fails. Without the patch: crash> sbitmapq ffff8e99d0dc8010 sbitmapq: command not supported or applicable on this architecture or kernel With the patch: crash> sbitmapq ffff8e99d0dc8010 sbitmapq: invalid structure member offset: sbitmap_queue_alloc_hint FILE: sbitmap.c LINE: 365 FUNCTION: sbitmap_queue_context_load() Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-05-24 17:21:47 +09:00
Sourabh Jain	ae52398a13	ppc64: update the NR_CPUS to 8192 Since the kernel commit 2d8ae638bb86 ("powerpc: Make the NR_CPUS max 8192") the NR_CPUS on Linux kernel ranges from 1-8192. So let's match NR_CPUS with the max NR_CPUS count on the Linux kernel. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>	2022-05-10 10:35:33 +09:00
Kazuhito Hagio	0ca55e4607	Mark start of 8.0.2 development phase with version 8.0.1++ Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-05-10 10:27:44 +09:00
Kazuhito Hagio	2d193468e5	crash-8.0.0 -> crash-8.0.1 Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-04-26 10:56:43 +09:00
Huang Shijie	b811a045ec	diskdump: Optimize the boot time 1.) The vmcore file may be very big. For example, I have a vmcore file which is over 23G, and the panic kernel had 767.6G memory, its max_sect_len is 4468736. Current code costs too much time to do the following loop: .............................................. for (i = 1; i < max_sect_len + 1; i++) { dd->valid_pages[i] = dd->valid_pages[i - 1]; for (j = 0; j < BITMAP_SECT_LEN; j++, pfn++) if (page_is_dumpable(pfn)) dd->valid_pages[i]++; .............................................. For my case, it costs about 56 seconds to finish the big loop. This patch moves the hweightXX macros to defs.h, and uses hweight64 to optimize the loop. For my vmcore, the loop only costs about one second now. 2.) Test results: # cat ./commands.txt quit Before: #echo 3 > /proc/sys/vm/drop_caches; #time ./crash -i ./commands.txt /root/t/vmlinux /root/t/vmcore > /dev/null 2>&1 ............................ real 1m54.259s user 1m12.494s sys 0m3.857s ............................ After this patch: #echo 3 > /proc/sys/vm/drop_caches; #time ./crash -i ./commands.txt /root/t/vmlinux /root/t/vmcore > /dev/null 2>&1 ............................ real 0m55.217s user 0m15.114s sys 0m3.560s ............................ Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-04-11 15:29:39 +08:00
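The difference between the per-pfn loop and the popcount version can be sketched as follows. This is an illustration under stated assumptions, not the code from diskdump.c: SECT_BITS is a hypothetical section length (crash's BITMAP_SECT_LEN may differ), the bitmap is treated as 64-bit words with pfn N at bit N % 64 of word N / 64, and GCC/Clang's __builtin_popcountll stands in for crash's hweight64:

```c
#include <stddef.h>
#include <stdint.h>

#define SECT_BITS 256  /* hypothetical section length in bits; must be a multiple of 64 */

static int page_is_dumpable(const uint64_t *bitmap, uint64_t pfn)
{
    return (int)((bitmap[pfn / 64] >> (pfn % 64)) & 1);
}

/* Original shape: one bit test per pfn. valid[i] = dumpable pages in sections 0..i-1. */
static void valid_pages_slow(const uint64_t *bitmap, size_t nsect, uint64_t *valid)
{
    uint64_t pfn = 0;

    valid[0] = 0;
    for (size_t i = 1; i < nsect + 1; i++) {
        valid[i] = valid[i - 1];
        for (size_t j = 0; j < SECT_BITS; j++, pfn++)
            if (page_is_dumpable(bitmap, pfn))
                valid[i]++;
    }
}

/* Optimized shape: one population count per 64-bit word (hweight64 in crash). */
static void valid_pages_fast(const uint64_t *bitmap, size_t nsect, uint64_t *valid)
{
    const size_t words_per_sect = SECT_BITS / 64;
    const uint64_t *w = bitmap;

    valid[0] = 0;
    for (size_t i = 1; i < nsect + 1; i++) {
        valid[i] = valid[i - 1];
        for (size_t j = 0; j < words_per_sect; j++)
            valid[i] += (uint64_t)__builtin_popcountll(*w++);
    }
}
```

Both functions produce the same cumulative counts; the fast one does 64 times fewer iterations, and the popcount itself usually compiles to a single instruction.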
Huang Shijie	a334423974	diskdump: use mmap/madvise to improve the start-up Sometimes, the size of bitmap in vmcore can be very large, such as over 256M. This patch uses mmap/madvise to improve the performance of reading bitmap in the non-FLAT_FORMAT code path. Without the patch: #echo 3 > /proc/sys/vm/drop_caches; #time ./crash -i ./commands.txt /root/t/vmlinux /root/t/vmcore > /dev/null 2>&1 ............................ real 0m55.217s user 0m15.114s sys 0m3.560s ............................ With the patch: #echo 3 > /proc/sys/vm/drop_caches; #time ./crash -i ./commands.txt /root/t/vmlinux /root/t/vmcore > /dev/null 2>&1 ............................ real 0m44.097s user 0m19.031s sys 0m1.620s ............................ Note: Test files: vmlinux: 272M vmcore : 23G (bitmap_len: 4575985664) #cat ./commands.txt quit Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-04-10 22:35:58 +08:00
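A minimal sketch of mapping a bitmap region with mmap() and advising the kernel about the access pattern. mmap() requires a page-aligned file offset, so the sketch rounds down and remembers the delta. MADV_SEQUENTIAL is an assumption, since the commit message does not say which advice value is used, and map_bitmap()/unmap_bitmap() are hypothetical helpers, not crash functions:

```c
#define _DEFAULT_SOURCE  /* expose madvise()/MADV_* under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct mapped_region {
    void *base;                /* page-aligned address returned by mmap() */
    size_t maplen;             /* length passed to mmap()/munmap() */
    const unsigned char *data; /* first byte of the requested range */
};

/* Map 'len' bytes starting at file offset 'offset' read-only. Returns 0 on success. */
static int map_bitmap(int fd, off_t offset, size_t len, struct mapped_region *r)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % pagesz);  /* mmap offset must be page aligned */
    size_t delta = (size_t)(offset - aligned);

    r->maplen = len + delta;
    r->base = mmap(NULL, r->maplen, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (r->base == MAP_FAILED)
        return -1;

    /* Purely advisory: the data is read front to back, so ask for aggressive readahead. */
    (void)madvise(r->base, r->maplen, MADV_SEQUENTIAL);

    r->data = (const unsigned char *)r->base + delta;
    return 0;
}

static void unmap_bitmap(struct mapped_region *r)
{
    munmap(r->base, r->maplen);
}
```

The page cache then serves the bitmap directly instead of copying it through read() into a separately allocated buffer; the commit notes this applies to the non-FLAT_FORMAT path only.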
Rongwei Wang	87369080a4	arm64: handle 1GB block for VM_L4_4K When arm64 is configured with PAGE_SIZE=4k and 4 level translation, the pagetable of all pages may be created with block mapping or contiguous mapping as much as possible, e.g. when CONFIG_RODATA_FULL_DEFAULT_ENABLED is disabled. But now, vtop command cannot handle 1GB block (PUD mapping) well, and just shows a seek error: crash> vtop ffff00184a800000 VIRTUAL PHYSICAL ffff00184a800000 188a800000 PAGE DIRECTORY: ffff8000110aa000 PGD: ffff8000110aa000 => 203fff9003 PUD: ffff001fffff9308 => 68001880000705 PMD: ffff0018400002a0 => ffff8000103b4fd0 vtop: seek error: kernel virtual address: ffff7fffd03b4000 type: "page table" This patch fixes it, and shows the following: crash> vtop ffff00184a800000 VIRTUAL PHYSICAL ffff00184a800000 188a800000 PAGE DIRECTORY: ffff8000110aa000 PGD: ffff8000110aa000 => 203fff9003 PUD: ffff001fffff9308 => 68001880000705 PAGE: 1880000000 (1GB) PTE PHYSICAL FLAGS 68001880000705 1880000000 (VALID|SHARED|AF|PXN|UXN) PAGE PHYSICAL MAPPING INDEX CNT FLAGS fffffe00610a0000 188a800000 0 0 0 77fffe0000000000 Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com> Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>	2022-04-09 19:39:06 +08:00
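The translation a 1GB block needs can be sketched as below, assuming a 4K granule, 48-bit physical addresses, and the architectural rule that a level-1 descriptor whose bits[1:0] are 0b01 is a block entry (0b11 is a table pointer). The helper names are hypothetical and not crash's arm64.c code:

```c
#include <stdint.h>

#define PUD_SHIFT      30
#define PUD_SIZE       (1ULL << PUD_SHIFT)  /* 1GB per level-1 block with 4K pages */
#define PHYS_MASK_48   ((1ULL << 48) - 1)   /* assumption: 48-bit physical addresses */
#define DESC_TYPE_MASK 0x3ULL
#define DESC_BLOCK     0x1ULL               /* bits[1:0] == 0b01: block descriptor */

static int pud_is_block(uint64_t pud)
{
    return (pud & DESC_TYPE_MASK) == DESC_BLOCK;
}

/* Output address bits [47:30] come from the descriptor, bits [29:0] from the vaddr. */
static uint64_t pud_block_to_phys(uint64_t pud, uint64_t vaddr)
{
    uint64_t base = pud & PHYS_MASK_48 & ~(PUD_SIZE - 1);

    return base | (vaddr & (PUD_SIZE - 1));
}
```

With the values in this commit message, PUD entry 68001880000705 is a block whose base is 1880000000, and ffff00184a800000 translates to 188a800000, the PHYSICAL value the fixed vtop prints. The PGD entry 203fff9003 ends in 0b11, so it is a table descriptor and the walk continues normally there.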
xiaer1921	b89f9ccf51	Fix for "kmem -s|-S" on Linux 5.17+ with CONFIG_SLAB Since the following kernel commits split slab info from struct page into struct slab, crash cannot get several slab related offsets from struct page. d122019bf061 ("mm: Split slab into its own type") 401fb12c68c2 ("mm: Differentiate struct slab fields by sl*b implementations") 07f910f9b729 ("mm: Remove slab from struct page") Without the patch, "kmem -s|-S" options cannot work correctly on kernels configured with CONFIG_SLAB with the following error: crash> kmem -s kmem: invalid structure member offset: page_active FILE: memory.c LINE: 12225 FUNCTION: verify_slab_overload_page() Resolves: https://github.com/crash-utility/crash/issues/115 Signed-off-by: xiaer1921 <xiaer1921@gmail.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-04-09 16:31:00 +08:00
Lianbo Jiang	8d49ad6662	Fix the failure of resolving ".rodata" on s390x The commit <cd8954023bd4> broke crash-utility on s390x and got the following error: crash: cannot resolve ".rodata" The reason is that all symbols containing a "." may be filtered out on s390x. To prevent the current failure, do not filter out the symbol ".rodata" on s390x. In addition, a simple way is to check whether the symbol ".rodata" exists before calculating the value of a symbol, just to be on the safe side. Fixes: `cd8954023b` ("kernel: fix start-up time degradation caused by strings command") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-03-29 16:41:00 +08:00
HATAYAMA Daisuke	cd8954023b	kernel: fix start-up time degradation caused by strings command verify_namelist() uses strings command and scans the whole vmlinux file to find linux_banner string. However, vmlinux file is quite large these days, reaching over 500MB. As a result, this degrades the start-up time of the crash command by 10 or more seconds. (Of course, this depends on machines you use for investigation, but I guess typically we cannot use such powerful machines to investigate crash dump...) To resolve this issue, let's use bfd library and read linux_banner string in vmlinux file directly. A simple benchmark shows the following result: Without the fix: # cat ./commands.txt quit # time ./crash -i ./commands.txt \ /usr/lib/debug/lib/modules/5.16.15-201.fc35.x86_64/vmlinux \ /var/crash//vmcore >/dev/null 2>&1 real 0m20.251s user 0m19.022s sys 0m1.054s With the fix: # time ./crash -i ./commands.txt \ /usr/lib/debug/lib/modules/5.16.15-201.fc35.x86_64/vmlinux \ /var/crash//vmcore >/dev/null 2>&1 real 0m6.528s user 0m6.143s sys 0m0.431s Note that this commit keeps the original logic that uses strings command for backward compatibility, just in case. Signed-off-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>	2022-03-25 18:43:00 +08:00
Huang Shijie	8827424f2b	arm64: fix the seek error of "pud page" for live debugging Crash reported an error on kernel v5.7 when live debugging with the command "crash vmlinux /proc/kcore": "crash: seek error: kernel virtual address: ffff75e9fffff000 type: "pud page"" The reason is that the PTOV() and arm64_vtop_4level_4k() do not work as expected due to incorrect physvirt_offset. To fix the above issue, need to read out the virtual address of "physvirt_offset" from the "/proc/kallsyms", and update the ms->phys_offset which is initialized with a wrong value in kernel version [5.4, 5.10). Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-03-25 16:32:19 +08:00
Huang Shijie	49df472da9	arm64: fix the wrong vmemmap_end The VMEMMAP_END did not exist before the kernel v5.7, but for now, the value of vmemmap_end may be set to -1(0xffffffffffffffffUL). According to the arch/arm64/mm/dump.c (before kernel v5.7): .................................................. { VMEMMAP_START + VMEMMAP_SIZE, "vmemmap end" } .................................................. The vmemmap_end should always be: vmemmap_end = vmemmap_vaddr + vmemmap_size; This patch fixes the above issue. Fixes: `e397e1bef2` ("arm64: update the modules/vmalloc/vmemmap ranges") Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-03-25 13:22:05 +08:00
Huang Shijie	01689f3ee2	arm64: use the vmcore info to get module/vmalloc/vmemmap ranges Since the kernel commit <2369f171d5c5> ("arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges"), crash can obtain the range of module/vmalloc/vmemmap from the vmcore info, and no need to calculate them manually. This patch adds a new hook arm64_get_range_v5_18 which could parse out all the module/vmalloc/vmemmap ranges from the vmcore info. Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-03-25 12:40:31 +08:00
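Reading a range back out of the vmcore info can be sketched as a plain text lookup. This assumes entries of the form NUMBER(MODULES_VADDR)=0x..., one per line, which is how we understand kernel commit 2369f171d5c5 exports them; vmcoreinfo_number() is a hypothetical helper, not crash's arm64_get_range_v5_18():

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "NUMBER(<key>)=<value>" at the start of a line in a NUL-terminated
 * vmcoreinfo text. Returns 1 and stores the value on success, 0 otherwise.
 */
static int vmcoreinfo_number(const char *info, const char *key, uint64_t *out)
{
    char pattern[128];
    int n = snprintf(pattern, sizeof(pattern), "NUMBER(%s)=", key);

    if (n < 0 || (size_t)n >= sizeof(pattern))
        return 0;  /* key too long for the pattern buffer */

    for (const char *p = strstr(info, pattern); p; p = strstr(p + 1, pattern)) {
        char *end;

        if (p != info && p[-1] != '\n')
            continue;  /* only accept matches at the start of a line */
        *out = strtoull(p + n, &end, 0);  /* base 0 accepts the 0x prefix */
        return end != p + n;              /* fail if no digits followed */
    }
    return 0;
}
```

A caller would look up MODULES_VADDR, MODULES_END, VMALLOC_START, VMALLOC_END, VMEMMAP_START and VMEMMAP_END this way and fall back to the version-based calculation when a key is missing, i.e. on kernels without that export.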
Huang Shijie	e397e1bef2	arm64: update the modules/vmalloc/vmemmap ranges Currently, the crash is implemented for arm64 based on kernel v4.20(and earlier), and so far the kernel has evolved to v5.17-rc4. But the ranges of MODULE/VMALLOC/VMEMMAP have not been updated since kernel v4.20. Without the patch: crash> help -m ... vmalloc_start_addr: ffff800048000000 vmalloc_end: fffffdffbffeffff modules_vaddr: ffff800040000000 modules_end: ffff800047ffffff vmemmap_vaddr: fffffdffffe00000 vmemmap_end: ffffffffffffffff ... With the patch: crash> help -m ... vmalloc_start_addr: ffff800010000000 vmalloc_end: fffffdffbffeffff modules_vaddr: ffff800008000000 modules_end: ffff80000fffffff vmemmap_vaddr: fffffdffffe00000 vmemmap_end: ffffffffffffffff ... Link: https://listman.redhat.com/archives/crash-utility/2022-March/009625.html Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-03-12 14:42:12 +08:00
Sergey Samoylenko	4cf262e237	sbitmap.c: use readmem more carefully Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>	2022-03-12 14:03:01 +08:00
Sergey Samoylenko	7c7a4eddb4	Fix memory leak in __sbitmap_for_each_set function Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>	2022-03-12 14:02:58 +08:00
Kazuhito Hagio	a92ff262d4	help.c: Fix a missing new line in "sbitmapq" help page Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-03-01 17:20:22 +09:00
Pingfan Liu	e3bdc32aab	arm64: deduce the start address of kernel code, based on kernel version After kernel commit e2a073dde921 ("arm64: omit [_text, _stext) from permanent kernel mapping"), the range [_text, _stext) is reclaimed. But the current crash code still assumes the kernel starts at "_text". This change only affects the vmalloced area on arm64 and may cause arm64_IS_VMALLOC_ADDR() to return a wrong result. Since vmcore has no extra information about this trivial change, it can only be deduced from kernel version, which means ms->kimage_text cannot be correctly initialized until kernel_init() finishes. Here on arm64, it can be done at the point machdep_init(POST_GDB). This is fine since there is no access to vmalloced area at this stage. Signed-off-by: Pingfan Liu <piliu@redhat.com>	2022-02-25 14:25:50 +08:00
Huang Shijie	8f19ddea50	Makefile: Change the behavior of target "cscope" Make the "make cscope" only generate cscope index, not call the cscope. Also fix a typo: cscope_out --> cscope.out Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-02-25 14:25:37 +08:00
Lianbo Jiang	c1f45f89dc	Fix sys command to display its help information correctly Sometimes, the sys command may be misused, but it doesn't display the expected help information, for example: Without the patch: crash> sys kmem NAME kmem - kernel memory SYNOPSIS kmem [-f|-F|-c|-C|-i|-v|-V|-n|-z|-o|-h] [-p | -m member[,member]] [[-s|-S|-S=cpu[s]|-r] [slab] [-I slab[,slab]]] [-g [flags]] [[-P] address]] ... crash> sys abc crash> With the patch: crash> sys kmem Usage: sys [-c [name|number]] [-t] [-i] config Enter "help sys" for details. crash> sys abc Usage: sys [-c [name|number]] [-t] [-i] config Enter "help sys" for details. Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-02-25 14:25:10 +08:00
Tao Liu	0260367da7	Makefile: crash multi-target and multithread compile support This patch will support making crash as follows: $ make -j8 warn lzo Without this patch, the "make -j jobs warn lzo" will output the following error during crash build: ... mv: cannot stat 'Makefile.new': No such file or directory Makefile: cannot create new Makefile please copy Makefile.new to Makefile make: *** [Makefile:321: lzo] Error 1 make: *** Waiting for unfinished jobs.... TARGET: X86_64 CRASH: 8.0.0++ GDB: 10.2 ... Signed-off-by: Tao Liu <ltao@redhat.com>	2022-02-23 19:22:52 +08:00
Tao Liu	b1fb3cdd87	x86_64_init: Refresh vmalloc region addresses in POST_RELOC instead of POST_GDB phase Previously for x86_64, when memory is randomized, the region addresses such as vmalloc_start_addr/vmemmap_vaddr/modules_vaddr are firstly set to a default value before POST_RELOC phase, then get refreshed with the actual value in POST_GDB phase. However for crash minimal mode, POST_GDB phase is not called, which leaves the region addresses unrefreshed and incorrect. As a consequence, the x86_64_IS_VMALLOC_ADDR check will give a faulty result when value_search tries to search a symbol by address. For example, in crash minimal mode we can observe the following issue: crash> dis -f panic dis: cannot resolve address: ffffffffb20e0d30 crash> sym panic ffffffffb20e0d30 (T) panic /usr/src/debug/kernel-4.18.0-290/linux-4.18.0-290/kernel/panic.c: 168 crash> sym ffffffffb20e0d30 symbol not found: ffffffffb20e0d30 In this patch, we will move the code which updates the region addresses into POST_RELOC phase, so in minimal mode the regions can get the correct addresses. Signed-off-by: Tao Liu <ltao@redhat.com>	2022-02-22 14:06:14 +08:00
Sergey Samoylenko	fb64fdd11d	sbitmapq: add '-p' option The -p option says, an associated with sbitmap_queue array contains the pointers on a structure. This allows the sbitmapq command works correctly with the array of pointers attached to the sbitmap_queue. Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>	2022-02-22 09:53:30 +08:00
Sergey Samoylenko	ac86cc3558	Introduce sbitmapq command Patch adds new 'sbitmapq' command. This command dumps the contents of the sbitmap_queue structure and the used bits in the bitmap. Also, it shows the dump of a structure array associated with the sbitmap_queue. Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>	2022-02-22 09:53:27 +08:00
Huang Shijie	6ecb8a23ca	arm64: Use CONFIG_ARM64_VA_BITS to initialize VA_BITS_ACTUAL We can get VA_BITS_ACTUAL from CONFIG_ARM64_VA_BITS by guess. Without this patch, we may need to use "--machdep vabits_actual=48" to set the VA_BITS_ACTUAL. Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>	2022-02-22 09:48:29 +08:00
Shogo Matsumoto	3ed30b5128	log: support "log -t|-m" option for output of printk safe buffers Suppress the output of safe buffer name with the "log -t" option and display the message log level with "log -m" option. Signed-off-by: Shogo Matsumoto <shogo.matsumoto@fujitsu.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-02-17 10:04:16 +09:00
Shogo Matsumoto	b0d447d78b	log: introduce "log -s" option to display printk safe buffers Introduce a new "log -s" option, which outputs unflushed logs in the printk safe buffers (safe_print_seq and nmi_print_seq) as follows: crash> log -s PRINTK_SAFE_SEQ_BUF: nmi_print_seq CPU: 0 ADDR: ffff8ca4fbc19ce0 LEN: 150 MESSAGE_LOST: 0 Uhhuh. NMI received for unknown reason 20 on CPU 0. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue ... The buffers are displayed for each CPU. For an empty buffer, '(empty)' will be printed. Also append those to the bottom of "log" command output so as not to overlook them like this: crash> log ... [nmi_print_seq] Uhhuh. NMI received for unknown reason 30 on CPU 0.", [nmi_print_seq] Do you have a strange power saving mode enabled?", [nmi_print_seq] Dazed and confused, but trying to continue", Note that the safe buffer (struct printk_safe_seq_buf) was introduced at kernel-4.11 (Merge commit 7d91de74436a69c2b78a7a72f1e7f97f8b4396fa) and removed at kernel-5.15 (93d102f094be9beab28e5afb656c188b16a3793b). Link: https://listman.redhat.com/archives/crash-utility/2022-January/msg00052.html Signed-off-by: Shogo Matsumoto <shogo.matsumoto@fujitsu.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-02-17 10:04:16 +09:00
Kazuhito Hagio	def34f57e8	Makefile: Fix build failure with "make -j jobs" option The "make -j jobs" option sometimes fails with an error like this: $ make clean ; make -j $(nproc) warn ... ar: creating crashlib.a CXXLD gdb /usr/bin/ld: ../../crashlib.a(main.o): in function `dump_build_data': /home/crash/main.c:1829: undefined reference to `build_command' /usr/bin/ld: /home/crash/main.c:1830: undefined reference to `build_data' collect2: error: ld returned 1 exit status make[4]: *** [Makefile:1872: gdb] Error 1 make[3]: *** [Makefile:10072: all-gdb] Error 2 make[2]: *** [Makefile:860: all] Error 2 crash build failed This is because build_data.c is compiled by two jobs and they write to build_data.o simultaneously and break it. Remove one of them. Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-02-16 11:51:15 +08:00
Sven Schnelle	74ac929712	Support for multiple jobs to build crash This patch saves compilation time for crash build, which did the following things: [1] add --no-print-directory to MAKEFLAGS right in the beginning to avoid repeating it in all make calls. [2] use "make -C" instead of "cd x; make" [3] replace make by $(MAKE) Link: https://listman.redhat.com/archives/crash-utility/2021-December/msg00049.html Link: https://listman.redhat.com/archives/crash-utility/2021-December/msg00048.html Link: https://listman.redhat.com/archives/crash-utility/2021-December/msg00047.html Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-02-16 11:48:06 +08:00
Lianbo Jiang	0a4434f4cb	Doc: update man page for the option "--src directory" The "--src directory" option information is missing from the man page of crash utility. Originally it was added by commit `9254c7f206` ("Added a new "--src <directory>"...), let's sync this option information to the man page. Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-02-15 10:08:17 +09:00
Lianbo Jiang	1ecb351309	Fix for "bpf -m|-M" options to appropriately display MEMLOCK and UID Kernel commit 80ee81e0403c ("bpf: Eliminate rlimit-based memory accounting infra for bpf maps") removed the struct bpf_map_memory member from struct bpf_map at Linux 5.11. Without the patch, the "bpf -m|-M" options will print "(unknown)" for MEMLOCK and UID: crash> bpf -m 1 ID BPF_MAP BPF_MAP_TYPE MAP_FLAGS 1 ffff96ba41804400 ARRAY 00000000 KEY_SIZE: 4 VALUE_SIZE: 8 MAX_ENTRIES: 64 MEMLOCK: (unknown) NAME: "dist" UID: (unknown) Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-02-15 09:48:58 +09:00
Kazuhito Hagio	5f390ed811	Fix for "kmem -s|-S" and "bt -F[F]" on Linux 5.17-rc1 Since the following kernel commits split slab info from struct page into struct slab, crash cannot get several slab related offsets from struct page. d122019bf061 ("mm: Split slab into its own type") 07f910f9b729 ("mm: Remove slab from struct page") Without the patch, "kmem -s|-S" and "bt -F[F]" options cannot work correctly with the following errors: crash> kmem -s kmem_cache CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME kmem: page_to_nid: invalid page: ffff9454afc35020 kmem: kmem_cache: cannot gather relevant slab data ffff945140042000 216 ? ? ? 8k kmem_cache crash> bt -F ... bt: invalid structure member offset: page_slab FILE: memory.c LINE: 9477 FUNCTION: vaddr_to_kmem_cache() Signed-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-02-07 11:58:23 +08:00
Kazuhito Hagio	dd35cf6fc5	arm64: Fix segfault by "bt" command with offline cpus Currently on arm64, NT_PRSTATUS notes in dumpfile are not mapped to online cpus and machine_specific->panic_task_regs correctly. As a result, the "bt" command can cause a segmentation fault. crash> bt -c 0 PID: 0 TASK: ffff8000117fa240 CPU: 0 COMMAND: "swapper/0" Segmentation fault (core dumped) To fix this, 1) make map_cpus_to_prstatus_kdump_cmprs() map the notes to dd->nt_prstatus_percpu also on arm64, and 2) move arm64_get_crash_notes() to machdep_init(POST_INIT) in order to apply the mapping to machine_specific->panic_task_regs. Resolves: https://github.com/crash-utility/crash/issues/105 Reported-by: xuchunmei000 <xuchunmei@linux.alibaba.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com> Tested-by: David Wysochanski <dwysocha@redhat.com>	2022-01-30 10:55:03 +08:00
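The mapping in step 1) can be pictured as handing out the NT_PRSTATUS notes, in order, to the online CPUs only, so that an offline CPU gets no note instead of borrowing the next CPU's registers. A hypothetical sketch assuming at most 64 CPUs and one note per online CPU in ascending CPU order; map_notes_to_cpus() is an illustration, not crash's map_cpus_to_prstatus_kdump_cmprs():

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assign the i-th note to the i-th online CPU; offline CPUs get NULL.
 * online_mask: bit N set if CPU N was online (nr_cpus <= 64 assumed).
 * Returns the number of notes consumed.
 */
static size_t map_notes_to_cpus(const void **notes, size_t nr_notes,
                                uint64_t online_mask, unsigned int nr_cpus,
                                const void **percpu)
{
    size_t next = 0;

    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
        if (((online_mask >> cpu) & 1) && next < nr_notes)
            percpu[cpu] = notes[next++];
        else
            percpu[cpu] = NULL;  /* no saved registers for this CPU */
    }
    return next;
}
```

With CPUs 0, 2 and 3 online, the three notes land on those CPUs and CPU 1 stays NULL, so a command such as "bt -c 1" can report missing registers instead of dereferencing a bogus pointer.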
Tao Liu	e389667cf6	Improve the ps performance for vmcores with large number of threads Previously, the ps command will iterate over all threads which have the same tgid, to accumulate their rss value, in order to get a thread/process's final rss value as part of the final output. For non-live systems, the rss accumulation values are identical for threads which have the same tgid, so there is no need to do the iteration and accumulation repeatedly, thus a lot of readmem calls are skipped. Otherwise it will be the performance bottleneck if the vmcores have a large number of threads. In this patch, the rss accumulation value will be stored in a cache, next time a thread with the same tgid will take it directly without the iteration. For example, we can observe the performance issue when a vmcore has ~65k processes, most of which are threads for several specific processes. Without the patch, it will take ~7h for ps command to finish. With the patch, ps command will finish in 1min. Signed-off-by: Tao Liu <ltao@redhat.com>	2022-01-28 18:16:12 +08:00
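A per-tgid cache of this kind can be sketched with a small open-addressing table. This is a hypothetical illustration, not the implementation in crash's task.c: the table size, tgid_rss(), and demo_walk() (a stand-in for the expensive walk over a thread group, which counts how often it runs) are all assumptions. As the commit notes, such a cache is only safe on a dumpfile, where the values cannot change between lookups:

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_CACHE_SIZE 1024  /* hypothetical; must be a power of two */

struct rss_cache_entry {
    uint64_t tgid;
    uint64_t rss;
    int valid;
};

static struct rss_cache_entry rss_cache[RSS_CACHE_SIZE];

/* Expensive per-thread-group walk that the cache avoids repeating. */
typedef uint64_t (*rss_walk_fn)(uint64_t tgid);

static uint64_t tgid_rss(uint64_t tgid, rss_walk_fn walk)
{
    size_t i = (size_t)(tgid & (RSS_CACHE_SIZE - 1));

    /* Linear probing: stop at the matching tgid or at the first free slot. */
    for (size_t probes = 0; probes < RSS_CACHE_SIZE;
         probes++, i = (i + 1) & (RSS_CACHE_SIZE - 1)) {
        if (!rss_cache[i].valid) {
            rss_cache[i].tgid = tgid;
            rss_cache[i].rss = walk(tgid);  /* first thread of this group pays */
            rss_cache[i].valid = 1;
            return rss_cache[i].rss;
        }
        if (rss_cache[i].tgid == tgid)
            return rss_cache[i].rss;        /* every later thread hits the cache */
    }
    return walk(tgid);  /* table full: fall back to the uncached walk */
}

/* Demo stand-in for the real walk. */
static unsigned int walk_calls;
static uint64_t demo_walk(uint64_t tgid)
{
    walk_calls++;
    return tgid * 10;
}
```

For a process with N threads, the walk runs once instead of N times, which turns the quadratic cost of "ps" on heavily threaded dumps into a roughly linear one.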
Lianbo Jiang	ce92e45850	GDB: fix completion related libstdc++ assert Currently crash built with some specific flags (-D_GLIBCXX_ASSERTIONS and etc.) may abort and print the following error when running the gdb list command or tab-completion of symbols. For example: crash> l panic /usr/include/c++/11/string_view:234: ... Aborted (core dumped) crash> p "TAB completion" crash> p /usr/include/c++/11/string_view:234: ... Aborted (core dumped) When the name string is null (the length of name is zero), there are multiple places where array access is out of bounds in the gdb/ada-lang.c (see ada_fold_name() and ada_lookup_name_info()). The patch backports these gdb patches: 6a780b676637 ("Fix completion related libstdc++ assert when using -D_GLIBCXX_DEBUG") 2ccee230f830 ("Fix off-by-one error in ada_fold_name") Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-01-27 14:54:33 +09:00
Kazuhito Hagio	2ebd8c5ecf	Remove ptype command from "ps -t" option to reduce memory and time With some vmlinux e.g. RHEL9 ones, the first execution of the gdb ptype command heavily consumes memory and time. The "ps -t" option uses it in start_time_timespec(), and it can be replaced with the crash macros. This reduces memory consumption by about 1.4 GB and run time by about 6 seconds in the following test: $ echo "ps -t" | time crash vmlinux vmcore Without the patch: 11.60user 0.43system 0:11.94elapsed 100%CPU (0avgtext+0avgdata 1837964maxresident)k 0inputs+400outputs (0major+413636minor)pagefaults 0swaps With the patch: 5.40user 0.16system 0:05.46elapsed 101%CPU (0avgtext+0avgdata 417896maxresident)k 0inputs+384outputs (0major+41528minor)pagefaults 0swaps Although the ptype command and similar ones cannot be fully removed, removing some of them will make the use of crash safer, especially for an automatic crash reporter. Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2022-01-21 16:28:33 +09:00
Lianbo Jiang	d16dc6fff0	Move the initialization of "boot_date" to task_init() The "boot_date" is initialized conditionally in the cmd_log(), which may display incorrect "boot_date" value with the following command before running the "log -T" command: crash> help -k | grep date date: Wed Dec 22 13:39:29 IST 2021 boot_date: Thu Jan 1 05:30:00 IST 1970 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The calculation of "boot_date" depends on the HZ value, and the HZ will be calculated in task_init() at the latest, so let's move it here. Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-01-19 11:21:18 +08:00
Alexander Egorenkov	14f8c46047	memory: Handle struct slab changes on Linux 5.17-rc1 and later Since kernel commit d122019bf061 ("mm: Split slab into its own type"), the struct slab is used for both SLAB and SLUB. Therefore, don't depend on the non-presence of the struct slab to decide whether SLAB implementation should be chosen and use the member variable "cpu_slab" of the struct kmem_cache instead, it should be present only in SLUB. Without the patch, crash fails to start with the error message: crash: invalid structure member offset: kmem_cache_s_num FILE: memory.c LINE: 9619 FUNCTION: kmem_cache_init() Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>	2022-01-12 09:23:17 +08:00
Lianbo Jiang	b9dc76e232	Fix for HZ calculation on Linux 5.14 and later Kernel commit 3e9a99eba058 ("block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()") renamed dd_init_queue to dd_init_sched. Without the patch, the 'help -m' may print incorrect hz value as follows: crash> help -m | grep hz hz: 1000 <---The correct hz value on ppc64le machine is 100. ^^^^ Fixes: `b93027ce5c` ("Add alternate HZ calculation using write_expire") Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-01-11 19:19:02 +08:00
Lianbo Jiang	0d3d80b47d	Fix for "bt -v" option to display the stack-end address correctly The "bt -v" command prints incorrect stack-end address when the "CONFIG_THREAD_INFO_IN_TASK=y" is enabled in kernel, the "bt -v" command output shows that the value stored at 0xffff8dee0312c198 is 0xffffffffc076400a, however, the value stored actually at 0xffff8dee0312c198 is NULL(0x0000000000000000), the stack-end address is incorrect. Without the patch: crash> bt -v PID: 28642 TASK: ffff8dee0312c180 CPU: 0 COMMAND: "insmod" possible stack overflow: ffff8dee0312c198: ffffffffc076400a != STACK_END_MAGIC ^^^^^^^^^^^^^^^^ crash> rd 0xffff8dee0312c198 ffff8dee0312c198: 0000000000000000 ........ ^^^^^^^^^^^^^^^^ With the patch: crash> bt -v PID: 28642 TASK: ffff8dee0312c180 CPU: 0 COMMAND: "insmod" possible stack overflow: ffff991340bc0000: ffffffffc076400a != STACK_END_MAGIC crash> rd 0xffff991340bc0000 ffff991340bc0000: ffffffffc076400a .@v..... Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-01-11 18:40:03 +08:00
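Where the stack-end magic word lives can be sketched as follows for a downward-growing stack: with CONFIG_THREAD_INFO_IN_TASK the kernel's end_of_stack() returns task->stack, otherwise it returns the first word after the struct thread_info that sits at the base of the stack. stack_end_addr(), stack_end_intact() and their parameters are hypothetical; STACK_END_MAGIC is the kernel's 0x57AC6E9D:

```c
#include <stdint.h>

#define STACK_END_MAGIC 0x57AC6E9DULL  /* kernel's include/uapi/linux/magic.h */

/*
 * Address of the stack-end magic word for a downward-growing kernel stack.
 * task_stack: value of task->stack (lowest address of the stack area).
 * thread_info_in_task: nonzero if CONFIG_THREAD_INFO_IN_TASK=y.
 * sizeof_thread_info: size of struct thread_info, used only when it sits
 * at the base of the stack.
 */
static uint64_t stack_end_addr(uint64_t task_stack, int thread_info_in_task,
                               uint64_t sizeof_thread_info)
{
    if (thread_info_in_task)
        return task_stack;                   /* end_of_stack() == task->stack */
    return task_stack + sizeof_thread_info;  /* first word after thread_info */
}

/* An intact stack still holds the magic value in that word. */
static int stack_end_intact(uint64_t word_at_end)
{
    return word_at_end == STACK_END_MAGIC;
}
```

In the corrected output, the stack base is ffff991340bc0000, so with CONFIG_THREAD_INFO_IN_TASK=y the magic word is read from that address; the old output instead read ffff8dee0312c198, an address inside the task_struct itself.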
Lianbo Jiang	70a27ae9f2	Fix for "timer -r" option to display all the per-CPU clocks Currently, the hrtimer_max_clock_bases is hard-coded to 3, which makes that crash only prints three clocks, and the rest of clocks are not displayed. Without the patch: crash> timer -r -C 11 CPU: 11 HRTIMER_CPU_BASE: ffff9a775f95ee00 CLOCK: 0 HRTIMER_CLOCK_BASE: ffff9a775f95ee80 [ktime_get] (empty) CLOCK: 1 HRTIMER_CLOCK_BASE: ffff9a775f95ef00 [ktime_get_real] (empty) CLOCK: 2 HRTIMER_CLOCK_BASE: ffff9a775f95ef80 [ktime_get_boottime] (empty) With the patch: crash> timer -r -C 11 CPU: 11 HRTIMER_CPU_BASE: ffff9a775f95ee00 CLOCK: 0 HRTIMER_CLOCK_BASE: ffff9a775f95ee80 [ktime_get] (empty) CLOCK: 1 HRTIMER_CLOCK_BASE: ffff9a775f95ef00 [ktime_get_real] (empty) CLOCK: 2 HRTIMER_CLOCK_BASE: ffff9a775f95ef80 [ktime_get_boottime] (empty) ... CLOCK: 7 HRTIMER_CLOCK_BASE: ffff9a775f95f200 [ktime_get_clocktai] (empty) Signed-off-by: Lianbo Jiang <lijiang@redhat.com>	2022-01-10 11:05:46 +08:00
Lianbo Jiang	98b417fc63	Handle blk_mq_ctx member changes for kernels 5.16-rc1 and later Kernel commit 9a14d6ce4135 ("block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes") removed the member rq_dispatched and rq_completed from struct blk_mq_ctx. Without the patch, "dev -d|-D" options will fail with the following error: crash> dev -d MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC dev: invalid structure member offset: blk_mq_ctx_rq_dispatched FILE: dev.c LINE: 4229 FUNCTION: get_one_mctx_diskio() Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2021-12-28 11:18:56 +08:00
Qi Zheng	7eba220e1a	Fix pvops Xen detection for arm machine Since the xen_start_info on the arm/arm64 platform points to a static variable '_xen_start_info'(see its definition as below), which makes that the address of xen_start_info will never be null. arch/arm/xen/enlighten.c:40:static struct start_info _xen_start_info; arch/arm/xen/enlighten.c:41:struct start_info *xen_start_info = &_xen_start_info; arch/arm/xen/enlighten.c:42:EXPORT_SYMBOL(xen_start_info); As a result, the is_pvops_xen() in commit `4badc6229c` ("Fix pvops Xen detection for kernels >= v4.20") always returns TRUE because it can always read out the non-null address of xen_start_info, finally the following error will be reported on arm/arm64 platform(non-Xen environment) because p2m_mid_missing and xen_p2m_addr are not defined: crash: cannot resolve "p2m_top" For the arm/arm64 platform, fix it by using xen_vcpu_info instead of xen_start_info to detect Xen dumps. In addition, also explicitly narrow the scope of the xen_start_info check to x86 with the machine_type(), there is no need to check it on other architectures. Fixes: `4badc6229c` ("Fix pvops Xen detection for kernels >= v4.20") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>	2021-12-21 16:31:34 +08:00


941 Commits