116 Commits

Author SHA1 Message Date
Dave Anderson
86aa3c1cef Fix for a missing exception frame dump by the X86_64 "bt" command
when an IRQ is received while a task is running on its per-cpu
interrupt stack with interrupts enabled.
(anderson@redhat.com)
2014-10-16 12:10:42 -04:00
Dave Anderson
25bd7d9bf2 Fix for the determination of the cpu count on 32-bit ARM machines.
Without the patch, if certain patterns of cpus are offline, the count
may be too small, causing cpu-dependent commands to not recognize
online cpus.
(Jan.Karlsson@sonymobile.com, anderson@redhat.com)
2014-10-16 09:56:05 -04:00
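The entry does not say how the count was derived. One plausible way for a cpu count to come out too small is to count the online cpus instead of using the highest online cpu number plus one. The sketch below illustrates that difference on a hypothetical 32-bit online mask; both function names are invented for this example and are not taken from the crash sources.

```c
#include <stdint.h>

/* Hypothetical: number of set bits, i.e. the number of online cpus. */
static int cpus_online_popcount(uint32_t online_mask)
{
    int count = 0;
    while (online_mask) {
        count += online_mask & 1u;
        online_mask >>= 1;
    }
    return count;
}

/* Hypothetical: highest online cpu number + 1, so that every online
 * cpu index is covered even when lower-numbered cpus are offline. */
static int cpus_highest_plus_one(uint32_t online_mask)
{
    int highest = -1;
    for (int cpu = 0; cpu < 32; cpu++)
        if (online_mask & (1u << cpu))
            highest = cpu;
    return highest + 1;
}
```

With cpus 0, 2 and 3 online (mask 0xd), the first function returns 3 and the second returns 4; a loop bounded by 3 would never visit cpu 3.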
Dave Anderson
0c0f2e7440 Make the "bt -E" option conform to a "-c cpu(s)" specification when
the two options are used together.  Without the patch, "bt -E"
ignores a cpu specifier.
(anderson@redhat.com)
2014-10-15 13:30:29 -04:00
Dave Anderson
00cfb79c04 Adjustment to the "offline" patch-set so that the initial system
banner, the "sys" command, and the X86_64 "mach" command only
show the "OFFLINE" cpu count if there are actually offline cpus.
(anderson@redhat.com)
2014-10-15 10:07:29 -04:00
Dave Anderson
d5b362edf7 Implement a new "offline" internal crash variable that can be set to
either "show" (the default) or "hide".  When set to "hide", certain
command output associated with offline cpus will be hidden from view,
and the output will indicate that the cpu is "[OFFLINE]".  The new
variable can be set during invocation on the crash command line via
the option "--offline [show|hide]".  During runtime, or in a .crashrc
or other crash input file, the variable can be set by entering
"set offline [show|hide]".  The commands or options that are affected
when the variable is set to "hide" are as follows:

  o  On X86_64 machines, the "bt -E" option will not search exception
     stacks associated with offline cpus.
  o  On X86_64 machines, the "mach" command will append "[OFFLINE]"
     to the addresses of IRQ and exception stacks associated with
     offline cpus.
  o  On X86_64 machines, the "mach -c" command will not display the
     cpuinfo_x86 data structure associated with offline cpus.
  o  The "help -r" option has been fixed so as to not attempt to
     display register sets of offline cpus from ELF kdump vmcores,
     compressed kdump vmcores, and ELF kdump clones created by
     "virsh dump --memory-only".
  o  The "bt -c" option will not accept an offline cpu number.
  o  The "set -c" option will not accept an offline cpu number.
  o  The "irq -s" option will not display statistics associated with
     offline cpus.
  o  The "timer" command will not display hrtimer data associated
     with offline cpus.
  o  The "timer -r" option will not display hrtimer data associated
     with offline cpus.
  o  The "ptov" command will append "[OFFLINE]" when translating a
     per-cpu address offset to a virtual address of an offline cpu.
  o  The "kmem -o" option will append "[OFFLINE]" to the base per-cpu
     virtual address of an offline cpu.
  o  The "kmem -S" option in CONFIG_SLUB kernels will not display
     per-cpu data associated with offline cpus.
  o  When a per-cpu address reference is passed to the "struct"
     command, the data structure will not be displayed for offline
     cpus.
  o  When a per-cpu symbol and cpu reference is passed to the "p"
     command, the data will not be displayed for offline cpus.
  o  When the "ps -[l|m]" option is passed the optional "-C [cpus]"
     option, the tasks queued on offline cpus are not shown.
  o  The "runq" command and the "runq [-t/-m/-g/-d]" options will not
     display runqueue data for offline cpus.
  o  The "ps" command will replace the ">" active task indicator with
     a "-" for offline cpus.

The initial system information banner and the "sys" command will
display the total number of cpus as before, but will append the count
of offline cpus.  Lastly, a fix has been made for the initialization
time determination of the maximum number of per-cpu objects queued
in a CONFIG_SLAB kmem_cache so as to continue checking all cpus
higher than the first offline cpu.  These changes in behavior are not
dependent upon the setting of the crash "offline" variable.
(qiaonuohan@cn.fujitsu.com)
2014-10-06 15:32:37 -04:00
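Most of the behaviors listed above reduce to one per-cpu decision: display the data unless the cpu is offline and the variable is set to "hide". The following sketch shows that decision together with parsing of the "show"/"hide" argument. The enum, the function names, and the 64-bit online mask are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical representation of the "offline" setting. */
enum offline_mode { OFFLINE_SHOW, OFFLINE_HIDE, OFFLINE_INVALID };

/* Parse the argument given to "--offline" or "set offline". */
static enum offline_mode parse_offline_arg(const char *arg)
{
    if (strcmp(arg, "show") == 0)
        return OFFLINE_SHOW;
    if (strcmp(arg, "hide") == 0)
        return OFFLINE_HIDE;
    return OFFLINE_INVALID;
}

/* Return true if per-cpu output for 'cpu' should be displayed.  Bit N of
 * the hypothetical 'online_mask' is set when cpu N is online (cpu < 64). */
static bool show_cpu_data(enum offline_mode mode, uint64_t online_mask, int cpu)
{
    bool online = (online_mask >> cpu) & 1u;
    return online || mode != OFFLINE_HIDE;
}
```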
Dave Anderson
a234add3c4 Fortify the protection against the use of an invalid/corrupted
CONFIG_SLAB kmem_cache per-cpu array_cache.limit value during
session initialization.  In a recently seen vmcore, several of the
array_cache.limit values were corrupted such that they were stored
as negative values, which in turn caused the "kmem -[sS]" options
to fail immediately with a dump of the internal memory buffer
allocation statistics and the error message "kmem: cannot allocate
any more memory!".
(anderson@redhat.com)
2014-10-02 11:19:04 -04:00
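A minimal sketch of the kind of check described above, assuming the limit is read into a signed int so that corrupted negative values can be detected before they are used to size a buffer. MAX_SANE_LIMIT and both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical upper bound used only for this illustration. */
#define MAX_SANE_LIMIT 4096

/* Return true if an array_cache.limit value read from the dumpfile is
 * usable.  Examining it as a signed int lets corrupted negative values
 * be rejected instead of becoming huge unsigned allocation sizes. */
static bool array_cache_limit_valid(int limit)
{
    return limit >= 0 && limit <= MAX_SANE_LIMIT;
}

/* Return the largest valid per-cpu limit, ignoring invalid entries. */
static int max_valid_limit(const int *limits, int nr)
{
    int max = 0;
    for (int i = 0; i < nr; i++)
        if (array_cache_limit_valid(limits[i]) && limits[i] > max)
            max = limits[i];
    return max;
}
```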
Dave Anderson
4c0a1b34d4 Update the "ps" command's "ST" task state display to recognize the
TASK_PARKED state in Linux 3.9 and later kernels.  Without the patch,
the command's "ST" column entry for parked tasks shows "??".  The
state column will now show "PA", and the foreach command will accept
"PA" as a "state" argument.
(anderson@redhat.com)
2014-09-30 11:07:46 -04:00
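A simplified sketch of the state-to-code mapping, assuming the Linux 3.9 values from include/linux/sched.h (TASK_PARKED is 512 there). Real task states can combine bits, which this switch ignores; only enough states are listed to show where "PA" fits.

```c
/* Assumed Linux 3.9 task state values (include/linux/sched.h). */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2
#define __TASK_STOPPED       4
#define __TASK_TRACED        8
#define TASK_PARKED          512

/* Map a single task state value to the two-letter "ST" column code. */
static const char *task_state_code(unsigned long state)
{
    switch (state) {
    case TASK_RUNNING:         return "RU";
    case TASK_INTERRUPTIBLE:   return "IN";
    case TASK_UNINTERRUPTIBLE: return "UN";
    case __TASK_STOPPED:       return "ST";
    case __TASK_TRACED:        return "TR";
    case TASK_PARKED:          return "PA";  /* previously shown as "??" */
    default:                   return "??";
    }
}
```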
Dave Anderson
5b78ac4071 Fix the error message displayed if the vmlinux or vmcore file is
not the same endianness as the crash utility binary.  Without the patch
the filename is shown with the incorrect/opposite endian type.
(hukeping@huawei.com)
2014-09-30 10:02:05 -04:00
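The byte order of an ELF file is recorded in e_ident[EI_DATA], so an error message can name the file's own endianness rather than inferring it from the host. The sketch below uses the standard <elf.h> constants; the function names are illustrative and not from the crash sources.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the ELF data encoding that matches the host byte order. */
static unsigned char host_elf_data(void)
{
    const uint16_t probe = 1;
    return (*(const unsigned char *)&probe == 1) ? ELFDATA2LSB : ELFDATA2MSB;
}

/* Name for an e_ident[EI_DATA] value, describing the file itself. */
static const char *elf_data_name(unsigned char ei_data)
{
    switch (ei_data) {
    case ELFDATA2LSB: return "little-endian";
    case ELFDATA2MSB: return "big-endian";
    default:          return "unknown-endian";
    }
}

/* True if the file described by 'ident' has the host's byte order. */
static bool elf_endian_matches_host(const unsigned char ident[EI_NIDENT])
{
    return ident[EI_DATA] == host_elf_data();
}
```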
Dave Anderson
5da8ffe605 Set the 32-bit ARM HZ value to a default value of 100 if the kernel
was not configured with CONFIG_IKCONFIG.  Without the patch, the
initial system banner and the "sys" command show "UPTIME: (cannot
calculate: unknown HZ value)", the "ps -t" option shows "RUN TIME:
(cannot calculate: unknown HZ value)", and the "timer -r" option
kills the crash session with a floating point exception.
(hukeping@huawei.com)
2014-09-29 11:33:27 -04:00
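The symptoms above follow from dividing by an unknown (zero) HZ value; integer division by zero raises SIGFPE, which is reported as a floating point exception. A minimal sketch of the fallback, in which a kernel_hz of 0 stands for "unknown" (a convention assumed here, with hypothetical function names):

```c
/* Default used when HZ cannot be determined, per the entry above. */
#define DEFAULT_ARM_HZ 100

/* Return the kernel's HZ if known (non-zero), else the default. */
static unsigned int effective_hz(unsigned int kernel_hz)
{
    return kernel_hz ? kernel_hz : DEFAULT_ARM_HZ;
}

/* Convert a jiffies count to seconds without ever dividing by zero. */
static unsigned long long jiffies_to_seconds(unsigned long long jiffies,
                                             unsigned int kernel_hz)
{
    return jiffies / effective_hz(kernel_hz);
}
```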
Dave Anderson
a3a441aeab Fix for the "ps" command performance degradation patch that was
introduced in crash-7.0.8.  Without this patch, it is possible that
the "ps" command may fail prematurely with the error message
"ps: bsearch for tgid failed: task: <address> tgid: <number>"
when running on a live system or against a "live" dumpfile.
(panfy.fnst@cn.fujitsu.com)
2014-09-22 16:25:16 -04:00
Dave Anderson
506b3caf29 Fix "defs.h" for building extension modules outside of the crash
utility source tree on PPC and PPC64 machines.  Without the patch,
both PPC and PPC64 will get #define'd if the extension module build
procedure does not #define one or the other, which in turn causes
multiple conflicting declarations.
(anderson@redhat.com)
2014-09-22 16:02:05 -04:00
Dave Anderson
8185107da8 Improve the method for determining whether a 32-bit ARM vmlinux is
an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE
exists in the vmcoreinfo data, and if it does not, by then checking
whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes
higher in value.
(sdu.liu@huawei.com)
2014-09-22 14:37:17 -04:00
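The two-step check can be sketched as follows. The parameter names are hypothetical; vmcoreinfo_has_lpae stands for finding CONFIG_ARM_LPAE in the vmcoreinfo data, and the fallback compares the address of the next higher symbol with that of swapper_pg_dir.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a 32-bit ARM kernel is LPAE-enabled: trust vmcoreinfo
 * first, then fall back to the symbol-spacing heuristic. */
static bool arm_is_lpae(bool vmcoreinfo_has_lpae,
                        uint32_t swapper_pg_dir,
                        uint32_t next_symbol)
{
    if (vmcoreinfo_has_lpae)
        return true;
    /* Heuristic: the next higher symbol lies 0x5000 bytes above swapper_pg_dir. */
    return next_symbol - swapper_pg_dir == 0x5000;
}
```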
Dave Anderson
c6afa51af3 Update the "extensions/snap.mk" file to allow the "snap.so" extension
module to be built outside of a crash source tree on a ppc64le PPC64
little-endian host.  Without the patch, "make -f snap.mk" would fail
to compile, indicating "gcc: error: macro name missing after '-D'".
(anderson@redhat.com)
2014-09-22 14:09:43 -04:00
Dave Anderson
62b294b27c Fix for the one-time (dumpfile), or as-required (live system),
gathering of tasks from the kernel pid_hash[] in 2.6.24 and later
kernels.  Without the patch, if an entry in a pid_hash[] chain is
not related to the "init_pid_ns" pid_namespace structure, any
remaining entries in the hlist chain are skipped.
(vvs@parallels.com)
2014-09-19 14:20:57 -04:00
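The difference between skipping one entry and abandoning the rest of a chain comes down to continue versus break in the traversal loop. The structure below is a hypothetical stand-in for a pid_hash[] chain entry, with the pid namespace reduced to an integer id:

```c
#include <stddef.h>

/* Hypothetical stand-in for an hlist chain entry. */
struct pid_entry {
    int ns_id;                   /* owning pid namespace */
    int nr;                      /* pid number */
    struct pid_entry *next;      /* next entry in the hash chain */
};

/* Count the entries in one chain that belong to 'init_ns_id'.  Entries
 * from other namespaces are skipped with 'continue'; a 'break' there
 * would drop every remaining entry in the chain. */
static int count_init_ns_entries(const struct pid_entry *head, int init_ns_id)
{
    int count = 0;
    for (const struct pid_entry *p = head; p != NULL; p = p->next) {
        if (p->ns_id != init_ns_id)
            continue;
        count++;
    }
    return count;
}
```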
Dave Anderson
4010619625 Addressed 3 Coverity Scan issues:
  (1) task.c: initialize the "curr" and "curr_my_q" variables in the
      dump_tasks_in_task_group_cfs_rq() function.
  (2) ramdump.c: declare the "rd" and "len" variables that hold the
      return values of the read() and write() calls in write_elf() as
      ssize_t types.
  (3) cmdline.c: make the parsed PATH string buffer equal to the size
      of the PATH string + 1 to prevent a possible buffer overflow
      when a command line starts with a "!".
(anderson@redhat.com)
2014-09-18 13:27:45 -04:00
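For item (3), the general pattern is to size a copy of a string as strlen() + 1 so that the terminating NUL byte fits. The sketch below is illustrative only; dup_path_string() and count_path_dirs() are hypothetical helpers, not the cmdline.c code.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a PATH-like string into a new buffer sized strlen() + 1, leaving
 * room for the terminating '\0'.  Returns NULL if allocation fails. */
static char *dup_path_string(const char *path)
{
    size_t len = strlen(path) + 1;   /* +1 for the terminating '\0' */
    char *buf = malloc(len);
    if (buf)
        memcpy(buf, path, len);
    return buf;
}

/* Count the ':'-separated directories in a PATH string, or -1 on error. */
static int count_path_dirs(const char *path)
{
    char *buf = dup_path_string(path);
    int n = 0;
    if (!buf)
        return -1;
    for (char *dir = strtok(buf, ":"); dir; dir = strtok(NULL, ":"))
        n++;
    free(buf);
    return n;
}
```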
Dave Anderson
68c3828210 Add "/lib/modules/<version>/build" to the list of directories that
are searched for the currently-running kernel on live systems.  This
will automatically locate the vmlinux namelist for kernels that were
locally installed with "make modules_install install".
(lrintel@redhat.com)
2014-09-12 15:37:40 -04:00
Dave Anderson
df8d23ff21 Fix the CPU timer and clock comparator output for the "bt -a" command
on S390X machines.  The output of CPU timer and clock comparator has
always been incorrect because:
  - We added S390X_WORD_SIZE (8) instead of 4 to get the second word
  - We did not left shift the clock comparator by 8
The fix reads the complete 64-bit values and shifts the clock
comparator correctly.
(holzheu@linux.vnet.ibm.com)
2014-09-12 15:13:25 -04:00
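A sketch of the corrected decoding, assuming each field is stored as a big-endian 64-bit value and that the caller passes a pointer to the start of the field. The accessor names are hypothetical.

```c
#include <stdint.h>

/* s390x is big-endian: assemble a 64-bit value from 8 bytes, high byte first. */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical accessors: each reads the complete 64-bit value at once
 * instead of combining two 32-bit words. */
static uint64_t s390x_cpu_timer(const unsigned char *field)
{
    return read_be64(field);
}

/* The clock comparator is additionally shifted left by 8 bits. */
static uint64_t s390x_clock_comparator(const unsigned char *field)
{
    return read_be64(field) << 8;
}
```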
Dave Anderson
1aeeb2a5ae crash-7.0.7 -> crash-7.0.8
2014-09-11 14:23:21 -04:00
Dave Anderson
f0c5229269 Address a "ps" command performance degradation that was introduced by
a crash-7.0.4 patch which added per-thread task_struct.rss_stat page
counts to the task's mm_struct.rss_stat page counts in order to show
an accurate/synchronized RSS value.  Without the patch, the "ps"
command performance would degrade as the number of tasks increased,
most notably when there were thousands of tasks.
(panfy.fnst@cn.fujitsu.com, anderson@redhat.com)
2014-09-11 11:31:14 -04:00
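The later "bsearch for tgid failed" message (see the a3a441aeab entry) suggests a lookup keyed by tgid. A common way to avoid rescanning every task for every thread group is to sort once by tgid and binary-search, as in the sketch below. The record layout and function names are assumptions for illustration, not the crash implementation.

```c
#include <stdlib.h>

/* Hypothetical per-thread record. */
struct thread_rss {
    unsigned long tgid;   /* thread group id */
    long rss_pages;       /* per-thread page count to add to the group */
};

/* qsort()/bsearch() comparator ordering records by tgid. */
static int cmp_tgid(const void *a, const void *b)
{
    unsigned long x = ((const struct thread_rss *)a)->tgid;
    unsigned long y = ((const struct thread_rss *)b)->tgid;
    return (x > y) - (x < y);
}

/* Sum rss_pages over every thread in 'tgid'.  't' must already be sorted
 * with qsort(t, n, sizeof(*t), cmp_tgid).  bsearch() locates one member
 * of the group; adjacent records with the same tgid are then scanned.
 * Returns -1 if no thread has that tgid. */
static long group_rss(const struct thread_rss *t, size_t n, unsigned long tgid)
{
    struct thread_rss key = { .tgid = tgid };
    const struct thread_rss *hit = bsearch(&key, t, n, sizeof(*t), cmp_tgid);
    if (!hit)
        return -1;

    const struct thread_rss *lo = hit, *hi = hit;
    while (lo > t && (lo - 1)->tgid == tgid)
        lo--;
    while (hi + 1 < t + n && (hi + 1)->tgid == tgid)
        hi++;

    long sum = 0;
    for (const struct thread_rss *p = lo; p <= hi; p++)
        sum += p->rss_pages;
    return sum;
}
```

Sorting costs O(n log n) once, after which each thread-group lookup is O(log n) plus the size of the group, instead of a full scan of all tasks per group.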
Dave Anderson
fcd4a192d5 Maintain backwards-compatibility for "kvmdump" dumpfiles that were
created by older development versions of KVM tools in which the
cpu version id was 12, but the cpu device headers did not contain
the additional XSAVE related fields.
(uobergfe@redhat.com)
2014-09-09 14:27:29 -04:00
Dave Anderson
fce4684d04 Fix for SMP active task register-gathering from "kvmdump" dumpfiles
that were created with a cpu version id of 12 or greater that contain
additional XSAVE related fields in their cpu device headers.  Without
the patch, active tasks running on cpus above 0 may have truncated
backtraces.
(uobergfe@redhat.com)
2014-09-09 10:50:03 -04:00
Dave Anderson
f64b1a5954 Implement support for the ppc64le PPC64 little-endian architecture.
Since this required a large number of patches to be applied to
architecture-neutral files in the gdb-7.6 tree, the changes are
only applied if the host build system is a ppc64le.
(ptesarik@suse.cz, normand@linux.vnet.ibm.com)
2014-09-05 10:34:10 -04:00
Dave Anderson
dc53849af7 Fortify the validity verification of the data structures traversed
by the "kmem [-sS]" options for kernels configured with CONFIG_SLUB.
Without the patch, the contents of several structure members are not
validated, and may generate bogus or never-ending output, typically
seen when running the commands on a "live dump" where the dumpfile
was taken while the kernel was still running.  The patch aborts the
relevant parts of per-kmem_cache output when invalid data is
encountered or if an object list contains duplicate entries, and
error messages have been enhanced to more accurately describe the
issues encountered.
(anderson@redhat.com)
2014-09-04 16:50:52 -04:00
Dave Anderson
e7fcb3a35b On a live system during session initialization, delay the first read
error message (typically when reading the "cpu_possible_mask") until
it is confirmed that all of the following are true:
  (1) /dev/crash does not exist, and
  (2) /dev/mem is restricted via CONFIG_STRICT_DEVMEM, and
  (3) /proc/kcore cannot be read/accessed.
The "kernel may be configured with CONFIG_STRICT_DEVMEM" and
the "trying /proc/kcore as an alternative" messages will still
be displayed when appropriate.  The read error message will be displayed
only if all three live memory read options fail.
(anderson@redhat.com)
2014-08-12 14:57:20 -04:00
Dave Anderson
de3daee5ee Fix to recognize that the live system "crash.ko" memory driver may
be compressed and named "crash.ko.xz".  Without the patch, the driver
is not recognized and loaded, and as a result the /dev/mem driver
and/or /proc/kcore will be tried as the live memory source.
(anderson@redhat.com)
2014-08-12 11:15:49 -04:00
Dave Anderson
10db83eb4e Re-run a command in the history list by entering an "!" followed by
the number identifying the command.  However, unlike the similar "r"
pseudo-command, if the number is a command name in the user's PATH,
maintain the current behavior and execute that command.
(anderson@redhat.com)
2014-08-07 15:30:28 -04:00
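The decision described above can be sketched as below. is_history_number() checks for an all-digit argument, and in_path stands for the result of a PATH lookup whose implementation is not shown; all names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

enum bang_action { BANG_RUN_SHELL, BANG_RERUN_HISTORY };

/* Return true if 'arg' (the text after "!") is a non-empty string of digits. */
static bool is_history_number(const char *arg)
{
    if (*arg == '\0')
        return false;
    for (const char *p = arg; *p; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}

/* Decide how to handle "!<arg>".  A number re-runs a history entry unless
 * it also names an executable in PATH, which keeps the shell behavior. */
static enum bang_action classify_bang(const char *arg, bool in_path)
{
    if (is_history_number(arg) && !in_path)
        return BANG_RERUN_HISTORY;
    return BANG_RUN_SHELL;
}
```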
Dave Anderson
b3f2e7d190 Fix for the "help -[nD]" ELF header translation to recognize the
EM_ARM and EM_AARCH64 values as "e_machine" types, and ELFOSABI_LINUX
as an "e_ident[EI_OSABI]" type.  Without the patch, the e_machine
translation would show "40 (unsupported)" for 32-bit ARM, or
"183 (unsupported)" on ARM64; and the ELFOSABI_LINUX type would
be translated as "3 (?)".
(anderson@redhat.com)
2014-07-31 15:57:42 -04:00
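A sketch of the lookup, using the standard <elf.h> constants (EM_ARM is 40, EM_AARCH64 is 183, and ELFOSABI_LINUX is 3). Only a few values are listed, and returning NULL to mean "print (unsupported)" or "(?)" is a convention assumed for this example.

```c
#include <elf.h>
#include <stddef.h>

/* Name for an e_machine value, or NULL if this sketch does not know it. */
static const char *e_machine_name(unsigned int machine)
{
    switch (machine) {
    case EM_386:     return "EM_386";
    case EM_X86_64:  return "EM_X86_64";
    case EM_ARM:     return "EM_ARM";      /* 40 */
    case EM_AARCH64: return "EM_AARCH64";  /* 183 */
    default:         return NULL;          /* caller prints "(unsupported)" */
    }
}

/* Name for an e_ident[EI_OSABI] value, or NULL if unknown. */
static const char *ei_osabi_name(unsigned int osabi)
{
    switch (osabi) {
    case ELFOSABI_SYSV:  return "ELFOSABI_SYSV";
    case ELFOSABI_LINUX: return "ELFOSABI_LINUX";  /* 3 */
    default:             return NULL;              /* caller prints "(?)" */
    }
}
```

A caller would then format the value as `printf("%u (%s)\n", m, name ? name : "unsupported")`.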
Dave Anderson
25b61f4a2e Implement support for ARM and ARM64 raw RAM dumpfiles. One or
more "ramdump" files may be entered on the crash command line
in an ordered pair format consisting of the RAM dump filename
and the starting physical address expressed in hexadecimal,
connected with an "@" sign:

  $ crash vmlinux ramdump@address [ramdump@address]

A temporary ELF header will be created in /var/tmp, and the
combination of the header and the ramdump file(s) will be handled
like a normal ELF vmcore.  The ELF header will only exist during
the crash session.  If desired, an optional "-o <filename>"
may be entered to create a permanent ELF vmcore file from the
ramdump file(s).
(vinayakm.list@gmail.com, paawan1982@yahoo.com, anderson@redhat.com)
2014-07-31 14:58:26 -04:00
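A minimal sketch of splitting one "ramdump@address" argument, assuming the last "@" separates the filename from the address and that the address may be written with or without a "0x" prefix (strtoull() with base 16 accepts both). The helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length of the filename part, or 0 if there is no usable '@'. */
static size_t ramdump_name_len(const char *arg)
{
    const char *at = strrchr(arg, '@');
    return (at && at != arg && at[1] != '\0') ? (size_t)(at - arg) : 0;
}

/* Hexadecimal physical address after '@', or UINT64_MAX on error. */
static uint64_t ramdump_paddr(const char *arg)
{
    if (ramdump_name_len(arg) == 0)
        return UINT64_MAX;
    const char *hex = strrchr(arg, '@') + 1;
    char *end;
    errno = 0;
    unsigned long long v = strtoull(hex, &end, 16);
    if (errno != 0 || *end != '\0')
        return UINT64_MAX;
    return (uint64_t)v;
}
```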
Dave Anderson
97a39ce0c7 If an ARM or ARM64 dumpfile does not contain the register sets of
the active tasks in the kernel's per-cpu crash_notes, there is an
initialization-time warning message indicating "could not retrieve
crash_notes".  It has been changed to a more meaningful warning
message indicating "cannot retrieve registers for active tasks".
(anderson@redhat.com)
2014-07-30 14:11:33 -04:00
Dave Anderson
a96064bec9 Enhancement of the "kmem -S" option for Linux 3.2 and later kernels
configured with CONFIG_SLUB to display the address of each per-cpu
kmem_cache_cpu structure and the contents of its per-cpu partial list.
(qiaonuohan@cn.fujitsu.com)
2014-07-24 15:03:32 -04:00
Dave Anderson
520fcee94d Determine the various ARM64 kernel virtual address ranges using the
kernel's VA_BITS value.  It currently is hardwired in the kernel to
one of two values depending upon whether 4K or 64K pages are
configured.  However, there are plans to support 16K paqes, to make
VA_BITS a configurable value, and to make the number of page-table
levels configurable.  Towards that end, the crash utility has been
changed to determine the VA_BITS value based upon known kernel
virtual addresses, and to then calculate the relevant kernel virtual
address ranges from that value instead of hardwiring them based upon
the page size.
(anderson@redhat.com)
2014-07-23 11:14:37 -04:00
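One way to derive VA_BITS from a known kernel virtual address is sketched below. It assumes the layout of these kernels, where PAGE_OFFSET is ~0UL << (VA_BITS - 1), and an address near the start of the linear mapping, such as the kernel text; under those assumptions the address has 65 - VA_BITS leading one bits. The function names are illustrative.

```c
#include <stdint.h>

/* Estimate VA_BITS by counting the leading one bits of a kernel address
 * (assumes PAGE_OFFSET == ~0UL << (VA_BITS - 1) and bit VA_BITS - 2 clear). */
static int arm64_va_bits_from_addr(uint64_t kvaddr)
{
    int ones = 0;
    for (int bit = 63; bit >= 0 && ((kvaddr >> bit) & 1); bit--)
        ones++;
    return 65 - ones;
}

/* Start of the linear mapping implied by a VA_BITS value. */
static uint64_t arm64_page_offset(int va_bits)
{
    return UINT64_MAX << (va_bits - 1);
}
```

For example, 0xffffffc000080000 yields 39 and 0xfffffe0000080000 yields 42, the two values used with 4K and 64K pages respectively.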
Dave Anderson
ee73b32996 When running against an ARM64 dumpfile created with the "snap.so"
extension module, do not attempt to read the crash_notes.  Since the
dumpfile was taken while running on a live system, the crash_notes,
if configured into the kernel, would not contain valid data.  Without
the patch, the message "WARNING: could not retrieve crash_notes" is
displayed during session initialization.
(anderson@redhat.com)
2014-07-18 16:07:42 -04:00
Dave Anderson
773a6822b9 During initialization, reject useless ARM64 "(A)" absolute symbols
that begin with "__crc_".  Without the patch, several thousand of
them may be displayed by "sym -l" prior to the first kernel virtual
address symbol.
(anderson@redhat.com)
2014-07-18 12:03:59 -04:00
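The filter amounts to a prefix test on absolute symbols. In the sketch below, the nm-style type letter ('A' or 'a' for absolute) is an assumption about how the symbol type is represented, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if an absolute symbol whose name begins with "__crc_"
 * should be rejected during initialization. */
static bool reject_crc_symbol(char type, const char *name)
{
    return (type == 'A' || type == 'a') && strncmp(name, "__crc_", 6) == 0;
}
```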
Dave Anderson
77e9ca1305 Document the reason behind the deprecation of the "mount -f" option
for Linux 3.13 and later kernels if the option is attempted, and in
the "help mount" output, similar to the deprecated "mount -d" option.
(anderson@redhat.com)
2014-07-08 11:26:52 -04:00
Dave Anderson
ad757cb474 Fix for an ARM64 compilation failure of the embedded gdb file
"aarch64-linux-nat.c" in the Fedora fc21 rawhide environment, which
uses glibc-headers-2.19.90-24.fc21.
(anderson@redhat.com)
2014-07-02 14:58:26 -04:00
Dave Anderson
1767e9d4f7 Fix for a compressed kdump that is damaged/truncated such that
the bitmap data in the dumpfile header is not contained within the
file.  Without the patch, attempts to analyze it with a vmlinux file,
or using the "crash --osrelease" or "crash --log" options with just
the vmcore, result in the crash utility spinning forever, endlessly
performing reads of 0 bytes from the file without recognizing the EOF
condition.
(dwysocha@redhat.com)
2014-06-27 14:37:15 -04:00
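The underlying rule is that read() returning 0 means end-of-file, so a read loop must stop rather than retry. A sketch of such a loop, with hypothetical helper names:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to 'len' bytes.  Returns the byte count, which is short if the
 * file is truncated, or -1 on error.  A 0 return from read() is treated
 * as EOF; retrying in that case is what produces an endless loop. */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;               /* EOF: the file ends before 'len' bytes */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Example caller: read up to 'len' bytes from the start of 'path'. */
static ssize_t read_file_prefix(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read_fully(fd, buf, len);
    close(fd);
    return n;
}
```

Reading 16 bytes from an empty file such as /dev/null returns 0 immediately instead of looping.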
Dave Anderson
5094787767 Deprecate the "mount -f" option for Linux 3.13 and later kernels
containing commit eee5cc2702929fd41cce28058dc6d6717f723f87, which
removed the super_block.s_files list_head member and the open files
list that it contained.  Without the patch, the command option fails
with the error message "mount: invalid structure member offset:
super_block_s_files".
(anderson@redhat.com)
2014-06-27 11:18:23 -04:00
Dave Anderson
683ae262f4 Fix for the "mod -S" command to find the debuginfo data for Red Hat
"kpatch" modules.  Without the patch, the command would display
"mod: cannot find or load object file for <kpatch-module> module".
(anderson@redhat.com)
2014-06-26 14:55:08 -04:00
Dave Anderson
754f073901 Fix for the "search -t" option if the system has 2064 or more tasks.
Without the patch, the command fails with a dump of the crash utility
memory allocation statistics, ending with "search: cannot allocate
any more memory!".
(anderson@redhat.com)
2014-06-20 16:16:12 -04:00
Dave Anderson
13c0faff75 Fix for file-handling errors when a compressed vmlinux.debug file
is followed by a vmlinux file on the crash command line.  When the
crash session ends, two errors will occur:
  (1) the vmlinux file will be deleted
  (2) the temporary uncompressed version of the vmlinux.debug file
      will remain in /var/tmp
This problem also occurs in the highly unlikely case where a
compressed vmlinux file is followed by a vmlinux.debug file on the
command line, and the uncompressed temporary version of the vmlinux
file is larger than the vmlinux.debug file.  In that case:
  (1) the vmlinux.debug file will be deleted
  (2) the temporary uncompressed version of the vmlinux file
      will remain in /var/tmp
(dmair@suse.com)
2014-06-16 10:11:18 -04:00
Dave Anderson
77537c1273 Fix for the handling of 32-bit ELF xendump dumpfiles if the guest
was configured with more than 4GB of memory.  Without the patch, the
crash session may fail during initialization with the error message
"crash: vmlinux and <dumpfile> do not match!".
(dslutz@verizon.com)
2014-06-12 10:42:33 -04:00
Dave Anderson
625e9d3eb8 crash-7.0.6 -> crash-7.0.7
2014-06-09 14:48:49 -04:00
Dave Anderson
ee0286b3b9 Introduce support for 32-bit ARM kernels that are configured with
CONFIG_ARM_LPAE.  The patch implements the virtual-to-physical
address translation of 64-bit PTEs used by ARM LPAE kernels.
(sdu.liu@huawei.com, weijitao@huawei.com)
2014-06-05 15:17:09 -04:00
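A sketch of the address decomposition involved, assuming the ARMv7 LPAE long-descriptor format with 4KB pages and three lookup levels. The table walk itself (reading each 64-bit descriptor from memory and checking its type bits) is omitted, and the helper names are hypothetical.

```c
#include <stdint.h>

/* LPAE long-descriptor format, 4KB pages, 32-bit virtual address:
 *   level 1 index: VA[31:30]  (4 entries)
 *   level 2 index: VA[29:21]  (512 entries)
 *   level 3 index: VA[20:12]  (512 entries)
 *   page offset:   VA[11:0]                                      */
static unsigned lpae_l1_index(uint32_t va) { return (va >> 30) & 0x3; }
static unsigned lpae_l2_index(uint32_t va) { return (va >> 21) & 0x1ff; }
static unsigned lpae_l3_index(uint32_t va) { return (va >> 12) & 0x1ff; }

/* Bits [39:12] of a 64-bit page descriptor hold the physical page address. */
#define LPAE_OUTPUT_ADDR_MASK 0x000000fffffff000ULL

/* Combine a level-3 page descriptor with the VA's page offset. */
static uint64_t lpae_page_to_phys(uint64_t pte, uint32_t va)
{
    return (pte & LPAE_OUTPUT_ADDR_MASK) | (va & 0xfff);
}
```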
Dave Anderson
843ac0e0a6 Fix for the "extend" command when running with an x86_64 crash binary
that was built with "make target=ARM64" in order to analyze ARM64
dumpfiles on an x86_64 host.  Without the patch, if the extend
command is used with an extension module built in the same manner,
it fails with the message "extend: <module>.so: not an ELF format
object file".
(Jan.Karlsson@sonymobile.com)
2014-06-05 09:19:29 -04:00
Dave Anderson
d25e4c9e7f Fix for the "runq -g" command on Linux 3.15 and later kernels, where
the cgroup_name() function now utilizes kernfs_name().  Without the
patch, the command fails with the error message "runq: invalid
structure member offset: cgroup_dentry".
(anderson@redhat.com)
2014-06-03 11:09:04 -04:00
Dave Anderson
0480d56427 Fix to prevent a possible segmentation violation generated by the
"runq -g" command when run on a very active live system due to an
active task on a cpu exiting while the command is running.
(anderson@redhat.com)
2014-06-02 16:04:47 -04:00
Dave Anderson
6200290983 Fix a harmless logic error in the last Makefile update to create an
empty gdb-<version>/gdb-<version>.patch file if it doesn't exist.
(anderson@redhat.com)
2014-06-02 10:17:21 -04:00
Dave Anderson
618066f433 If the gdb-<version>.patch file has changed and a rebuild is being
done from within a previously-existing build tree, "patch -N" the
gdb sources, and start the rebuild from the gdb-<version> directory
instead of the gdb-<version>/gdb directory.
(anderson@redhat.com)
2014-05-30 15:16:12 -04:00
Dave Anderson
3967ec84ff Fix for the "DEBUG KERNEL:" display in the initial system banner
and by the "sys" command when using a System.map file with a
Linux 3.0 and later debug kernel.  Without the patch, the kernel
version is not displayed in parentheses following the debug kernel
name.
(anderson@redhat.com)
2014-05-28 15:05:13 -04:00
Dave Anderson
c9e72263ae If the kernel (live or dumpfile) has the "kpatch" module installed,
the tag "[KPATCH]" will be displayed next to the kernel name in the
initial system banner and by the "sys" command.
(anderson@redhat.com)
2014-05-28 14:19:01 -04:00