Crash Module
============
The crash module collects information about daemon crashdumps and stores
it in the Ceph cluster for later analysis.

Enabling
--------

The *crash* module is enabled with::

  ceph mgr module enable crash

The *crash* upload key is generated with::

  ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash'

On each node, you should store this key in
``/etc/ceph/ceph.client.crash.keyring``.


Automated collection
--------------------

Daemon crashdumps are written to ``/var/lib/ceph/crash`` by default; this can
be configured with the option ``crash dir``.  Crash directories are named by
time and date and a randomly-generated UUID.  Each contains a metadata file
``meta`` and a recent log file, and the ``crash_id`` in the metadata matches
the directory name.
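
As a rough illustration of that layout, the following Python sketch walks a
crash directory and loads each ``meta`` file. It assumes ``meta`` is JSON with
a ``crash_id`` key; the helper name ``list_local_crashes`` is hypothetical and
not part of Ceph.

```python
import json
from pathlib import Path


def list_local_crashes(crash_dir="/var/lib/ceph/crash"):
    """Return (directory name, crash_id) pairs for each crash found locally.

    Hypothetical helper for illustration only; not part of Ceph.
    """
    results = []
    root = Path(crash_dir)
    if not root.is_dir():
        return results
    for entry in sorted(root.iterdir()):
        meta = entry / "meta"
        if not meta.is_file():
            continue  # skip anything that does not look like a crash directory
        with meta.open() as f:
            info = json.load(f)
        results.append((entry.name, info.get("crash_id")))
    return results
```

Comparing the two values in each pair is a quick way to spot a crash
directory whose metadata does not belong to it.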

These crashes can be automatically submitted and persisted in the monitors'
storage by using ``ceph-crash.service``.
This service watches the crashdump directory and uploads new crashes with
``ceph crash post``.

``ceph-crash`` tries several authentication names: ``client.crash.$hostname``,
``client.crash`` and ``client.admin``.
For an upload with ``ceph crash post`` to succeed, the name used needs
suitable permissions (``mon profile crash`` and ``mgr profile crash``),
and its keyring needs to be in ``/etc/ceph``.
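
The fallback between authentication names can be sketched in Python as
follows. This is not the ``ceph-crash`` source: the function names are
hypothetical, and trying the names in the listed order and stopping at the
first success are assumptions. The only Ceph interface used is the ``-n``
option of the ``ceph`` command, which selects the client name.

```python
import subprocess


def candidate_names(hostname):
    """Authentication names, in the order they are listed above."""
    return [f"client.crash.{hostname}", "client.crash", "client.admin"]


def post_with_fallback(metafile, names, run=subprocess.call):
    """Try each name until one upload succeeds; return that name or None.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    for name in names:
        cmd = ["ceph", "-n", name, "crash", "post", "-i", str(metafile)]
        if run(cmd) == 0:  # exit status 0 is treated as a successful upload
            return name
    return None
```

A return value of ``None`` means no name could upload, which usually points
to a missing keyring or missing ``profile crash`` capabilities.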


Commands
--------
::

  ceph crash post -i <metafile>

Save a crash dump.  The metadata file is a JSON blob stored in the crash
dir as ``meta``.  As usual, the ceph command can be invoked with ``-i -``,
and will read from stdin.
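
For scripted use, the stdin form can be driven from Python. The sketch below
is only an illustration: ``post_crash_from_stdin`` is a hypothetical name,
and the metadata is assumed to be available as a dictionary.

```python
import json
import subprocess


def post_crash_from_stdin(meta, run=subprocess.run):
    """Serialize `meta` to JSON and feed it to `ceph crash post -i -`.

    `run` defaults to subprocess.run and can be replaced for testing.
    """
    payload = json.dumps(meta).encode()
    return run(["ceph", "crash", "post", "-i", "-"], input=payload)
```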

::

  ceph crash rm <crashid>

Remove a specific crash dump.

::

  ceph crash ls

List the crash IDs (timestamp and UUID) of all crashes, both new and archived.

::

  ceph crash ls-new

List the crash IDs (timestamp and UUID) of all new crashes.

::

  ceph crash stat

Show a summary of saved crash info grouped by age.

::

  ceph crash info <crashid>

Show all details of a saved crash.

::

  ceph crash prune <keep>

Remove saved crashes older than 'keep' days.  <keep> must be an integer.
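
The "older than ``keep`` days" rule can be modelled with a small Python
sketch. This is not the module's implementation; ``crashes_to_prune`` is a
hypothetical name, and treating a crash exactly ``keep`` days old as retained
is an assumption.

```python
from datetime import datetime, timedelta, timezone


def crashes_to_prune(crashes, keep, now=None):
    """Return crash IDs whose timestamp is more than `keep` days before `now`.

    `crashes` maps crash ID -> timezone-aware datetime of the crash.
    """
    if not isinstance(keep, int) or isinstance(keep, bool):
        raise ValueError("keep must be an integer number of days")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep)
    return sorted(cid for cid, ts in crashes.items() if ts < cutoff)
```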

::

  ceph crash archive <crashid>

Archive a crash report so that it is no longer considered for the ``RECENT_CRASH`` health check and does not appear in the ``crash ls-new`` output (it will still appear in the ``crash ls`` output).

::

  ceph crash archive-all

Archive all new crash reports.


Options
-------

* ``mgr/crash/warn_recent_interval`` [default: 2 weeks] controls what constitutes "recent" for the purposes of raising the ``RECENT_CRASH`` health warning.
* ``mgr/crash/retain_interval`` [default: 1 year] controls how long crash reports are retained by the cluster before they are automatically purged.
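
To make the interaction of the two intervals concrete, here is a minimal
Python model. It is a sketch under stated assumptions, not the module's code:
the intervals are expressed in seconds, one year is taken as 365 days, and
``classify`` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Defaults from the list above, in seconds (assumed unit; 1 year = 365 days).
WARN_RECENT_INTERVAL = 14 * 24 * 60 * 60
RETAIN_INTERVAL = 365 * 24 * 60 * 60


def classify(crash_time, archived, now,
             warn_recent=WARN_RECENT_INTERVAL, retain=RETAIN_INTERVAL):
    """Classify a crash as 'purge', 'recent' or 'kept'.

    'recent' crashes are the ones that would count toward RECENT_CRASH:
    not archived and no older than `warn_recent` seconds.
    """
    age = (now - crash_time).total_seconds()
    if age > retain:
        return "purge"
    if not archived and age <= warn_recent:
        return "recent"
    return "kept"
```

In this model, archiving a crash moves it from ``recent`` to ``kept`` without
affecting when it is eventually purged.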