crash

mirror of https://github.com/crash-utility/crash synced 2025-02-24 17:36:50 +00:00

Author	SHA1	Message	Date
Dave Anderson	e4cc9e7faf	Fix for the X86_64 "bt" and "mach" commands when running against kernels that have the following Linux 3.18 commit, which removes the special per-cpu exception stack for handling stack segment faults: commit 6f442be2fb22be02cafa606f1769fa1e6f894441 x86_64, traps: Stop using IST for #SS Without this patch, backtraces that originate on any of the other 4 per-cpu exception stacks will be mis-labeled at the transition point back to the previous stack. For example, backtraces that originate on the NMI stack will indicate that they are coming from the "DOUBLEFAULT" stack. The patch examines all idt_table entries during initialization, looking for gate descriptors that have non-zero index values, and when found, pulls out the handler function address; from that information, the exception stack name string array is properly initialized rather than being hard-coded. This fix also properly labels the exception stack names on x86_64 CONFIG_PREEMPT_RT realtime kernels, which only utilize 3 exception stacks instead of the traditional 5 (now 4 with this kernel commit), instead of just showing "RT". Also, without the patch, the "mach" command will mis-label the stack names when it displays the base addresses of each per-cpu exception stack. (anderson@redhat.com)	2014-12-15 15:23:52 -05:00
Dave Anderson	361bdc2fc4	Added a new "vm -M <mm_struct>" option. When a task is exiting, the mm_struct address pointer in its task_struct is NULL'd out, and as a result, the "vm" command looks like this: crash> vm PID: 4563 TASK: ffff88049863f500 CPU: 8 COMMAND: "postgres" MM PGD RSS TOTAL_VM 0 0 0k 0k However, the mm_struct address can be retrieved from the task's kernel stack and entered manually with this option, which allows the "vm" command to attempt to dump the virtual memory data of the task. It may, or may not, work, depending upon how far the virtual memory deconstruction has proceeded. This option only verifies that the address entered is from the "mm_struct" slab cache, and that its mm_struct.mm_count is non-zero. (qiaonuohan@cn.fujitsu.com, anderson@redhat.com)	2014-12-11 17:01:50 -05:00
Dave Anderson	8b2cb365d7	Fix for the "kmem [-s|-S] <address>" command, and the "rd -S[S]" and "bt -F[F]" options. Without the patch, if the page structure associated with a memory address still contains a (stale) pointer to the address of a kmem_cache structure, but whose page.flags does not have the PG_slab bit set, the address is incorrectly presumed to be contained within that slab cache. As a result, the "kmem" command may display one or more messages indicating a "bad inuse counter", a "bad next pointer" or a "bad s_mem pointer", followed by an "address not found in cache" error message. The "rd -S[S]" and "bt -F[F]" commands may mislabel memory locations as belonging to slab caches. (anderson@redhat.com)	2014-12-10 15:04:40 -05:00
Dave Anderson	2562642c5f	Enhancement of the "kmem -i" option to display memory overcommit information, which will be appended to the traditional output of the command. For example: crash> kmem -i PAGES TOTAL PERCENTAGE TOTAL MEM 1965332 7.5 GB ---- FREE 78080 305 MB 3% of TOTAL MEM USED 1887252 7.2 GB 96% of TOTAL MEM SHARED 789954 3 GB 40% of TOTAL MEM BUFFERS 110606 432.1 MB 5% of TOTAL MEM CACHED 1212645 4.6 GB 61% of TOTAL MEM SLAB 146563 572.5 MB 7% of TOTAL MEM TOTAL SWAP 1970175 7.5 GB ---- SWAP USED 5 20 KB 0% of TOTAL SWAP SWAP FREE 1970170 7.5 GB 99% of TOTAL SWAP COMMIT LIMIT 2952841 11.3 GB ---- COMMITTED 1150595 4.4 GB 38% of TOTAL LIMIT The COMMIT LIMIT and COMMITTED information is similar to that displayed by the CommitLimit and Committed_AS lines in /proc/meminfo. (atomlin@redhat.com)	2014-12-09 12:35:40 -05:00
Dave Anderson	8698aedb9a	Fix for the kernel virtual address to symbol name translation for special text region delimiter symbols declared in vmlinux.lds.S with VMLINUX_SYMBOL(), such as __sched_text_start, __lock_text_start, __kprobes_text_start, __entry_text_start and __irqentry_text_start. Without the patch, if the addresses of those symbols are the same value as the first "real" symbol in those text regions, commands such as "dis" and "sym" may show the "_text_start" symbol name instead of the desired text symbol name. (qiaonuohan@cn.fujitsu.com, anderson@redhat.com)	2014-12-09 10:50:50 -05:00
Dave Anderson	c64fc95e3e	Implemented a new "net -n [pid|task]" option that displays the list of network devices with respect to the network namespace of the current context, or that of a task specified by the optional "pid" or "task" argument. The former "net -n <address>" option that translates an IPv4 address expressed as a decimal or hexadecimal value into a standard numbers-and-dots notation has been changed to "net -N". (ws@parallels.com)	2014-12-05 15:00:42 -05:00
Dave Anderson	964173c343	Fix for the "waitq" command when it is passed the address of a wait_queue_head_t structure. Without the patch, if the entries on the list are dynamically-created __wait_queue structures on kernel stacks, the tasks owning the kernel stack are not displayed. (anderson@redhat.com)	2014-12-04 16:45:28 -05:00
Dave Anderson	b4af1d9b48	Fix for finding the starting stack and instruction pointer hooks for the active tasks in x86_64 ELF or compressed dumpfiles created by the KVM "virsh dump --memory-only" facility. Without the patch, the backtraces of active tasks may show an invalid starting frame that indicates "__schedule". The fix displays the exception RIP and dumps the register contents that are stored in the dumpfile header. If the active task was operating in the kernel, the backtrace continues from there; if the task was operating in user-space, the backtrace is complete at that point. (anderson@redhat.com)	2014-12-02 17:26:40 -05:00
Dave Anderson	7e5c0cedef	Fix for a misleading fatal error message if a 32-bit crash binary built on an X86_64 host with "make target=X86" or "make target=ARM" is used on a live X86_64 system without specifying a vmlinux namelist. Without the patch, the session fails with the message "crash: cannot find booted kernel -- please enter namelist argument". With the patch, the error message will be "crash: compiled for the X86 architecture" or "crash: compiled for the ARM architecture". (anderson@redhat.com)	2014-11-21 15:19:20 -05:00
Dave Anderson	5054415b99	Fix for the "crash --log <dumpfile>" option on both of the PPC64 architectures. Without the patch, the command fails with the message "crash: seek error: physical address: <address> type: log_buf pointer", followed by "crash: cannot read log_buf value". This bug was introduced in crash-7.0.0 by a patch that added support for the PPC64 BOOK3E processor family. (anderson@redhat.com)	2014-11-20 15:34:11 -05:00
Dave Anderson	d5d022d9fb	Implemented the capability of building crash as an x86_64 binary for analyzing little-endian PPC64 dumpfiles on an x86_64 host, which can be done by entering "make target=PPC64". After the initial build is complete, subsequent builds can be done by entering "make" alone. (anderson@redhat.com)	2014-11-20 14:52:36 -05:00
Dave Anderson	ffe155026b	Fix for the handling of multiple ramdump images. Without the patch, entering more than one ramdump image on the command line may result in a segmentation violation. (oza@broadcom.com)	2014-11-18 14:26:19 -05:00
Dave Anderson	f15a48817f	Support for "irq" and "irq -u" on the S390 and S390X architectures if they are running Linux 3.12 and later kernels. Older kernels without GENERIC_HARDIRQ support will fail with the error message "irq: cannot determine number of IRQs". (sebott@linux.vnet.ibm.com)	2014-11-17 13:48:21 -05:00
Dave Anderson	8cccbed4cb	crash-7.0.8 -> crash-7.0.9	2014-11-13 15:53:08 -05:00
Dave Anderson	51e17d89d7	Fix for the support of compressed kdump clones created with the KVM "virsh dump --memory-only --format <compression-type>" command, where the compression-type is either "kdump-zlib", "kdump-lzo" or "kdump-snappy". Without the patch, if an x86_64 guest kernel was loaded with a non-zero "phys_base", the "--machdep phys_base=<offset>" command line option was required as a workaround or the crash session would fail with the warning message "WARNING: cannot read linux_banner string" followed by the fatal error message "crash: vmlinux and <dumpfile name> do not match!". (anderson@redhat.com)	2014-11-13 14:40:54 -05:00
Dave Anderson	c964753dde	Cosmetic fix for the "help -[n|D]" translation of the bitmap contents of the kdump_sub_header.dump_level flag in compressed kdump dumpfiles. (anderson@redhat.com)	2014-11-12 11:16:19 -05:00
Dave Anderson	1f7acceb50	Implementation of a new "sys -t" option that displays kernel taint information. If the "tainted_mask" symbol exists, the option will show its hexadecimal value and translate each bit set to the symbolic letter of the taint type. On kernels prior to 2.6.28 which had the "tainted" symbol, only its hexadecimal value is shown. The relevant kernel sources should be consulted for the meaning of the letter(s) or hexadecimal bit value(s). (anderson@redhat.com)	2014-11-11 14:04:48 -05:00
Dave Anderson	eb73907e70	Implemented support for this Linux 3.18 commit for kernels that are configured with CONFIG_SLAB: commit bf0dea23a9c094ae869a88bb694fbe966671bf6d mm/slab: use percpu allocator for cpu cache The commit above redesigned the kmem_cache.array_cache[] from a hardwired array to a per-cpu pointer referencing external array_cache structures. Without the patch, the crash session would fail during initialization with the message "crash: cannot resolve cache_cache". Note that it could be worked around by using the "--no_kmem_cache" command line option, with a resulting loss of functionality for commands requiring slab-related data. (anderson@redhat.com)	2014-10-31 11:48:14 -04:00
Dave Anderson	2d1aaa687c	If a kernel has been configured with CONFIG_DEBUG_INFO_REDUCED, then the crash utility will fail to initialize, typically with a message indicating "no debugging data available". However, it has been reported (on a 32-bit ARM system) that the initialization sequence continued on beyond that message point, and the session failed later on with the message "neither runqueue nor rq structures exist". As an aid to understanding why the session failed, if the target kernel is configured with CONFIG_IKCONFIG, and CONFIG_DEBUG_INFO_REDUCED has been set to "y", a relevant warning message will be displayed. (anderson@redhat.com)	2014-10-30 14:13:42 -04:00
Dave Anderson	045c00ac34	Added recognition of the new DUMP_DH_COMPRESSED_INCOMPLETE flag in the header of compressed kdumps, and the new DUMP_ELF_INCOMPLETE flag in the header of ELF kdumps. If the makedumpfile(8) facility fails to complete the creation of compressed or ELF kdump vmcore files due to ENOSPC or other error, it will mark the vmcore as incomplete. If either flag is set, the crash utility will issue a warning that the dumpfile is known to be incomplete during initialization, just prior to the system banner display. When reads are attempted on missing data, a read error will be returned. As an alternative, zero-filled data will be returned if the "--zero_excluded" command line flag is used, or the "zero_excluded" runtime variable is set to "on". In either case, the read errors or zero-filled memory may cause the crash session to fail entirely, cause commands to fail, or may result in other unpredictable runtime behavior. (anderson@redhat.com, zhouwj-fnst@cn.fujitsu.com)	2014-10-30 10:42:38 -04:00
Dave Anderson	1a5e568cc7	Correction of the copyright and authorship of ramdump.c. (oza@broadcom.com)	2014-10-30 09:55:15 -04:00
Dave Anderson	3edf484e29	Fix for data access from "split" compressed kdump dumpfiles. Without the patch, if a dumpfile read targets physical memory in the first memory page stored in the second or later sequential split dumpfile, incorrect data will be returned. (qiaonuohan@cn.fujitsu.com)	2014-10-29 11:35:17 -04:00
Dave Anderson	8095fe408b	Fix for a SIGSEGV generated by the "bt -a" or "help -r" commands if the NT_PRSTATUS notes in a compressed kdump are invalid/corrupt. If all cpus are online but the dumpfile initialization that cycles through the NT_PRSTATUS notes does not find exactly one note per cpu, then the register contents in those notes should not be used. (anderson@redhat.com)	2014-10-22 16:29:26 -04:00
Dave Anderson	187cb0c09a	Fix for a possible SIGSEGV generated during session initialization while "please wait... (determining panic task)" is being displayed. This was caused by a patch introduced in crash-7.0.8, and can only happen when analyzing dumpfiles whose header does not contain the requisite information to determine the panic task and the active tasks do not have any crash-related traces in their kernel stacks. It should be noted that the SIGSEGV can be avoided by entering "--no_panic" on the crash command line. (anderson@redhat.com)	2014-10-22 14:18:35 -04:00
Dave Anderson	3e61decc7e	Fix for the determination of the cpu count on ARM64 machines. Without the patch, if certain patterns of cpus are offline, the count may be too small, causing cpu-dependent commands to not recognize online cpus. (Jan.Karlsson@sonymobile.com, anderson@redhat.com)	2014-10-17 10:30:11 -04:00
Dave Anderson	86aa3c1cef	Fix for a missing exception frame dump by the X86_64 "bt" command when an IRQ is received while a task is running on its per-cpu interrupt stack with interrupts enabled. (anderson@redhat.com)	2014-10-16 12:10:42 -04:00
Dave Anderson	25bd7d9bf2	Fix for the determination of the cpu count on 32-bit ARM machines. Without the patch, if certain patterns of cpus are offline, the count may be too small, causing cpu-dependent commands to not recognize online cpus. (Jan.Karlsson@sonymobile.com, anderson@redhat.com)	2014-10-16 09:56:05 -04:00
Dave Anderson	0c0f2e7440	Make the "bt -E" option conform to a "-c cpu(s)" specification when the two options are used together. Without the patch, "bt -E" ignores a cpu specifier. (anderson@redhat.com)	2014-10-15 13:30:29 -04:00
Dave Anderson	00cfb79c04	Adjustment to the "offline" patch-set so that the initial system banner, the "sys" command, and the X86_64 "mach" command only show the "OFFLINE" cpu count if there are actually offline cpus. (anderson@redhat.com)	2014-10-15 10:07:29 -04:00
Dave Anderson	d5b362edf7	Implement a new "offline" internal crash variable that can be set to either "show" (the default) or "hide". When set to "hide", certain command output associated with offline cpus will be hidden from view, and the output will indicate that the cpu is "[OFFLINE]". The new variable can be set during invocation on the crash command line via the option "--offline [show|hide]". During runtime, or in a .crashrc or other crash input file, the variable can be set by entering "set offline [show|hide]". The commands or options that are affected when the variable is set to "hide" are as follows: o On X86_64 machines, the "bt -E" option will not search exception stacks associated with offline cpus. o On X86_64 machines, the "mach" command will append "[OFFLINE]" to the addresses of IRQ and exception stacks associated with offline cpus. o On X86_64 machines, the "mach -c" command will not display the cpuinfo_x86 data structure associated with offline cpus. o The "help -r" option has been fixed so as to not attempt to display register sets of offline cpus from ELF kdump vmcores, compressed kdump vmcores, and ELF kdump clones created by "virsh dump --memory-only". o The "bt -c" option will not accept an offline cpu number. o The "set -c" option will not accept an offline cpu number. o The "irq -s" option will not display statistics associated with offline cpus. o The "timer" command will not display hrtimer data associated with offline cpus. o The "timer -r" option will not display hrtimer data associated with offline cpus. o The "ptov" command will append "[OFFLINE]" when translating a per-cpu address offset to a virtual address of an offline cpu. o The "kmem -o" option will append "[OFFLINE]" to the base per-cpu virtual address of an offline cpu. o The "kmem -S" option in CONFIG_SLUB kernels will not display per-cpu data associated with offline cpus. o When a per-cpu address reference is passed to the "struct" command, the data structure will not be displayed for offline cpus. o When a per-cpu symbol and cpu reference is passed to the "p" command, the data will not be displayed for offline cpus. o When the "ps -[l|m]" option is passed the optional "-C [cpus]" option, the tasks queued on offline cpus are not shown. o The "runq" command and the "runq [-t/-m/-g/-d]" options will not display runqueue data for offline cpus. o The "ps" command will replace the ">" active task indicator with a "-" for offline cpus. The initial system information banner and the "sys" command will display the total number of cpus as before, but will append the count of offline cpus. Lastly, a fix has been made for the initialization time determination of the maximum number of per-cpu objects queued in a CONFIG_SLAB kmem_cache so as to continue checking all cpus higher than the first offline cpu. These changes in behavior are not dependent upon the setting of the crash "offline" variable. (qiaonuohan@cn.fujitsu.com)	2014-10-06 15:32:37 -04:00
Dave Anderson	a234add3c4	Fortify the protection against the use of an invalid/corrupted CONFIG_SLAB kmem_cache per-cpu array_cache.limit value during session initialization. In a recently seen vmcore, several of the array_cache.limit values were corrupted such that they were stored as negative values, which in turn caused the "kmem -[sS]" options to fail immediately with a dump of the internal memory buffer allocation statistics and the error message "kmem: cannot allocate any more memory!". (anderson@redhat.com)	2014-10-02 11:19:04 -04:00
Dave Anderson	4c0a1b34d4	Update the "ps" command's "ST" task state display to recognize the TASK_PARKED state in Linux 3.9 and later kernels. Without the patch, the command's "ST" column entry for parked tasks shows "??". The state column will now show "PA", and the foreach command will accept "PA" as a "state" argument. (anderson@redhat.com)	2014-09-30 11:07:46 -04:00
Dave Anderson	5b78ac4071	Fix the error message displayed if the vmlinux or vmcore file is not the same endian as the crash utility binary. Without the patch the filename is shown with the incorrect/opposite endian type. (hukeping@huawei.com)	2014-09-30 10:02:05 -04:00
Dave Anderson	5da8ffe605	Set the 32-bit ARM HZ value to a default value of 100 if the kernel was not configured with CONFIG_IKCONFIG. Without the patch, the initial system banner and the "sys" command show "UPTIME: (cannot calculate: unknown HZ value)", the "ps -t" option shows "RUN TIME: (cannot calculate: unknown HZ value)", and the "timer -r" option kills the crash session with a floating point exception. (hukeping@huawei.com)	2014-09-29 11:33:27 -04:00
Dave Anderson	a3a441aeab	Fix for the "ps" command performance degradation patch that was introduced in crash-7.0.8. Without this patch, it is possible that the "ps" command may fail prematurely with the error message "ps: bsearch for tgid failed: task: <address> tgid: <number>" when running on a live system or against a "live" dumpfile. (panfy.fnst@cn.fujitsu.com)	2014-09-22 16:25:16 -04:00
Dave Anderson	506b3caf29	Fix "defs.h" for building extension modules outside of the crash utility source tree on PPC and PPC64 machines. Without the patch, both PPC and PPC64 will get #define'd if the extension module build procedure does not #define one or the other, which in turn causes multiple conflicting declarations. (anderson@redhat.com)	2014-09-22 16:02:05 -04:00
Dave Anderson	8185107da8	Improve the method for determining whether a 32-bit ARM vmlinux is an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE exists in the vmcoreinfo data, and if it does not, by then checking whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes higher in value. (sdu.liu@huawei.com)	2014-09-22 14:37:17 -04:00
Dave Anderson	c6afa51af3	Update the "extensions/snap.mk" file to allow the "snap.so" extension module to be built outside of a crash source tree on a ppc64le PPC64 little-endian host. Without the patch, "make -f snap.mk" would fail to compile, indicating "gcc: error: macro name missing after '-D'" (anderson@redhat.com)	2014-09-22 14:09:43 -04:00
Dave Anderson	62b294b27c	Fix for the one-time (dumpfile), or as-required (live system), gathering of tasks from the kernel pid_hash[] in 2.6.24 and later kernels. Without the patch, if an entry in a pid_hash[] chain is not related to the "init_pid_ns" pid_namespace structure, any remaining entries in the hlist chain are skipped. (vvs@parallels.com)	2014-09-19 14:20:57 -04:00
Dave Anderson	4010619625	Addressed 3 Coverity Scan issues: (1) task.c: initialize the "curr" and "curr_my_q" variables in the dump_tasks_in_task_group_cfs_rq() function. (2) ramdump.c: make the "rd" and "len" return values from read() and write() calls in write_elf() to be ssize_t types. (3) cmdline.c: make the parsed PATH string buffer equal to the size of the PATH string + 1 to prevent a possible buffer overflow when a command line starts with a "!". (anderson@redhat.com)	2014-09-18 13:27:45 -04:00
Dave Anderson	68c3828210	Add "/lib/modules/<version>/build" to the list of directories that are searched for the currently-running kernel on live systems. This will automatically locate the vmlinux namelist for kernels that were locally installed with "make modules_install install". (lrintel@redhat.com)	2014-09-12 15:37:40 -04:00
Dave Anderson	df8d23ff21	Fix the CPU timer and clock comparator output for the "bt -a" command on S390X machines. The output of CPU timer and clock comparator has always been incorrect because: (1) we added S390X_WORD_SIZE (8) instead of 4 to get the second word, and (2) we did not left shift the clock comparator by 8. The fix gets the complete 64-bit values and shifts the clock comparator correctly. (holzheu@linux.vnet.ibm.com)	2014-09-12 15:13:25 -04:00
Dave Anderson	1aeeb2a5ae	crash-7.0.7 -> crash-7.0.8	2014-09-11 14:23:21 -04:00
Dave Anderson	f0c5229269	Address a "ps" command performance degradation that was introduced by a crash-7.0.4 patch which added per-thread task_struct.rss_stat page counts to the task's mm_struct.rss_stat page counts in order to show an accurate/synchronized RSS value. Without the patch, the "ps" command performance would degrade as the number of tasks increased, most notably when there were thousands of tasks. (panfy.fnst@cn.fujitsu.com, anderson@redhat.com)	2014-09-11 11:31:14 -04:00
Dave Anderson	fcd4a192d5	Maintain backwards-compatibility for "kvmdump" dumpfiles that were created by older development versions of KVM tools in which the cpu version id was 12, but the cpu device headers did not contain the additional XSAVE related fields. (uobergfe@redhat.com)	2014-09-09 14:27:29 -04:00
Dave Anderson	fce4684d04	Fix for SMP active task register-gathering from "kvmdump" dumpfiles that were created with a cpu version id of 12 or greater that contain additional XSAVE related fields in their cpu device headers. Without the patch, active tasks running on cpus above 0 may have truncated backtraces. (uobergfe@redhat.com)	2014-09-09 10:50:03 -04:00
Dave Anderson	f64b1a5954	Implement support for the ppc64le PPC64 little-endian architecture. Since this required a large number of patches to be applied to architecture-neutral files in the gdb-7.6 tree, the changes are only applied if the host build system is a ppc64le. (ptesarik@suse.cz, normand@linux.vnet.ibm.com)	2014-09-05 10:34:10 -04:00
Dave Anderson	dc53849af7	Fortify the validity verification of the data structures traversed by the "kmem [-sS]" options for kernels configured with CONFIG_SLUB. Without the patch, the contents of several structure members are not validated, and may generate bogus or never-ending output, typically seen when running the commands on a "live dump" where the dumpfile was taken while the kernel was still running. The patch aborts the relevant parts of per-kmem_cache output when invalid data is encountered or if an object list contains duplicate entries, and error messages have been enhanced to more accurately describe the issues encountered. (anderson@redhat.com)	2014-09-04 16:50:52 -04:00
Dave Anderson	e7fcb3a35b	On a live system during session initialization, delay the first read error message (typically when reading the "cpu_possible_mask") until it is confirmed that all of the following are true: (1) /dev/crash does not exist, and (2) /dev/mem is restricted via CONFIG_STRICT_DEVMEM, and (3) /proc/kcore cannot be read/accessed. The "kernel may be configured with CONFIG_STRICT_DEVMEM" and the "trying /proc/kcore as an alternative" messages will still be displayed when appropriate. The read error message will be displayed only if all three live memory read options fail. (anderson@redhat.com)	2014-08-12 14:57:20 -04:00
Dave Anderson	de3daee5ee	Fix to recognize that the live system "crash.ko" memory driver may be compressed and named "crash.ko.xz". Without the patch, the driver is not recognized and loaded, and as a result the /dev/mem driver and/or /proc/kcore will be tried as the live memory source. (anderson@redhat.com)	2014-08-12 11:15:49 -04:00
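
The "kmem -i" sample output in commit 2562642c5f can be checked by hand. The sketch below is not taken from the crash sources; it only reproduces the arithmetic implied by the sample. It assumes a 4 KB page size and assumes that percentages are truncated rather than rounded, both of which are inferred from the figures shown (for example, 78080 of 1965332 pages is about 3.97% but is reported as 3%). The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, consistent with the sample output


def percent_of(part_pages, total_pages):
    """Integer percentage, truncated toward zero (an inference from the sample)."""
    return part_pages * 100 // total_pages


def pages_to_gb(pages, page_size=PAGE_SIZE):
    """Convert a page count to gigabytes (2**30 bytes), one decimal place."""
    return round(pages * page_size / 2**30, 1)


# Page counts copied from the sample output in the commit message.
commit_limit = 2952841
committed = 1150595

print(pages_to_gb(commit_limit))            # COMMIT LIMIT size in GB
print(pages_to_gb(committed))               # COMMITTED size in GB
print(percent_of(committed, commit_limit))  # COMMITTED as a percentage of the limit
```

With the sample's page counts, the helpers give 11.3 GB, 4.4 GB and 38%, which match the COMMIT LIMIT and COMMITTED rows quoted in the commit message.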


141 Commits
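
The "sys -t" option added in commit 1f7acceb50 is described as translating each bit set in "tainted_mask" to a symbolic letter. A minimal sketch of that kind of bit-to-letter translation follows. It is not the crash implementation: the letter table is an illustrative assumption covering only the low-order bits, and, as the commit message itself advises, the relevant kernel sources should be consulted for the actual meaning of each bit in a given kernel version. The function name `decode_taint` is hypothetical.

```python
# Illustrative bit-to-letter table (bit 0 first). This ordering is an
# assumption for the sketch; verify it against the target kernel's sources.
TAINT_LETTERS = "PFSRMBUDAWCIOE"


def decode_taint(mask):
    """Return (letters, unknown_bits) for a taint bitmask.

    letters      -- one letter per set bit that the table covers
    unknown_bits -- mask of set bits that the table does not cover
    """
    letters = []
    unknown = 0
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            if bit < len(TAINT_LETTERS):
                letters.append(TAINT_LETTERS[bit])
            else:
                unknown |= 1 << bit
    return "".join(letters), unknown


letters, unknown = decode_taint(0x1201)  # bits 0, 9 and 12 set
print(letters, hex(unknown))
```

Returning the uncovered bits separately mirrors the commit's fallback of showing a hexadecimal value when no letter is available, without guessing at letters the table does not define.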