Merge pull request #53976 from phlogistonjohn/jjm-cephadm-gdb-extd-doc

cephadm: document extended gdb debugging techniques

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Zac Dover <zac.dover@proton.me>

commit 2706ecac4a
@@ -335,16 +335,22 @@ Deploy the daemon::

    cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json

Capturing Core Dumps
--------------------

A Ceph cluster that uses cephadm can be configured to capture core dumps.
Initial capture and processing of the coredump is performed by
`systemd-coredump <https://www.man7.org/linux/man-pages/man8/systemd-coredump.8.html>`_.

To enable coredump handling, run:

.. prompt:: bash #

   ulimit -c unlimited

Core dumps will be written to ``/var/lib/systemd/coredump``.
This setting persists only until the system is rebooted.
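
If the setting must survive a reboot, one common approach (not specific to
cephadm; verify against your distribution's documentation) is to raise the
default core limit in systemd's configuration. A minimal sketch:

.. prompt:: bash #

   # assumption: systemd reads DefaultLimitCORE from /etc/systemd/system.conf
   echo 'DefaultLimitCORE=infinity' >> /etc/systemd/system.conf
   # re-execute the systemd manager so the new default takes effect
   systemctl daemon-reexec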

@@ -352,16 +358,110 @@ Core dumps will now be written to ``/var/lib/systemd/coredump``.

.. note::

   Core dumps are not namespaced by the kernel, which means
   they will be written to ``/var/lib/systemd/coredump`` on
   the container host.

Now, wait for the crash to happen again. To simulate the crash of a daemon, run
e.g. ``killall -3 ceph-mon``.
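
To confirm that a dump was captured, the ``coredumpctl`` tool that ships with
``systemd-coredump`` can list and inspect recent dumps on the container host:

.. prompt:: bash #

   # list captured core dumps matching the ceph-mon executable
   coredumpctl list ceph-mon
   # show details (signal, storage path) for the most recent matching dump
   coredumpctl info ceph-mon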

Running the Debugger with cephadm
---------------------------------

Running a single debugging session
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One can initiate a debugging session using the ``cephadm shell`` command.
From within the shell container we need to install the debugger and debuginfo
packages. To debug a core file captured by systemd, run the following:

.. prompt:: bash #

   # start the shell session with the host's coredump directory mounted
   cephadm shell --mount /var/lib/systemd/coredump
   # within the shell (the mount appears at /mnt/coredump):
   dnf install ceph-debuginfo gdb zstd
   unzstd /mnt/coredump/core.ceph-*.zst
   # give gdb the decompressed core file (the name without the .zst suffix)
   gdb /usr/bin/ceph-mon /mnt/coredump/core.ceph-...

You can then run debugger commands at gdb's prompt.

.. prompt::

   (gdb) bt
   #0  0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   #1  0x00007fa910d7f8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
   #2  0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2
   #3  0x0000563085ca3d7e in main ()
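
Beyond ``bt``, a few other standard gdb commands are often useful on a first
pass over a core file: ``thread apply all bt`` prints a backtrace for every
thread, ``bt full`` includes local variables, and ``info threads`` lists the
threads so you can switch between them:

.. prompt:: bash (gdb)

   thread apply all bt
   bt full
   info threads
   thread 2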

Running repeated debugging sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using ``cephadm shell``, as in the example above, the changes made to the
container spawned by the shell command are ephemeral. Once the shell session
exits, all of the files that were downloaded and installed are no longer
available. One can simply re-run the same commands every time ``cephadm shell``
is invoked, but to save time and resources one can instead create a new
container image and use it for repeated debugging sessions.

In the following example we create a simple file for constructing the
container image. The commands below use ``podman`` but should work correctly
if ``podman`` is replaced with ``docker``.

.. prompt:: bash

   cat >Containerfile <<EOF
   ARG BASE_IMG=quay.io/ceph/ceph:v18
   FROM \${BASE_IMG}
   # install ceph debuginfo packages, gdb and other potentially useful packages
   RUN dnf install --enablerepo='*debug*' -y ceph-debuginfo gdb zstd strace python3-debuginfo
   EOF
   podman build -t ceph:debugging -f Containerfile .
   # pass --build-arg=BASE_IMG=<your image> to customize the base image

The result should be a new local image named ``ceph:debugging``. This image can
be used on the same machine that built it. Later, the image could be pushed to
a container repository, or saved and copied to a node running other Ceph
containers. Please consult the documentation for ``podman`` or ``docker`` for
more details on the general container workflow.
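
For example, one way to copy the image to another node is to stream it over
ssh. This is a sketch; the hostname ``ceph-node2`` is a placeholder:

.. prompt:: bash #

   # podman save writes the image to stdout; podman load reads it from stdin
   podman save ceph:debugging | ssh ceph-node2 podman load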

Once the image has been built, it can be used to initiate repeated debugging
sessions without having to re-install the debug tools and debuginfo packages.
To debug a core file using this image, in the same way as previously described,
run:

.. prompt:: bash #

   cephadm --image ceph:debugging shell --mount /var/lib/systemd/coredump

Debugging live processes
~~~~~~~~~~~~~~~~~~~~~~~~

The gdb debugger has the ability to attach to running processes to debug them.
For a containerized process, this can be accomplished by using the debug image
and attaching it to the same PID namespace as the process to be debugged.
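
Under the hood this uses the container engine's ``--pid`` flag. As a minimal
standalone illustration (bypassing the generated script described below, with
``<target-container>`` as a placeholder):

.. prompt:: bash #

   podman run --rm -it --pid=container:<target-container> ceph:debugging bash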

This requires running a container command with some custom arguments. We can
generate a script that can debug a process in a running container.

.. prompt:: bash #

   cephadm --image ceph:debugging shell --dry-run > /tmp/debug.sh

This creates a script containing the container command that cephadm would use
to create a shell. Now, modify the script by removing the ``--init`` argument
and replacing it with the argument that joins the PID namespace of a running
container. For example, assume we want to debug the MGR, and have determined
that the MGR is running in a container named
``ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk``. In this case,
the argument
``--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk``
should be used.
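
As an illustration, this edit can be scripted. The ``sed`` one-liner below is a
sketch that assumes ``--init`` appears exactly once in the generated script:

.. prompt:: bash #

   # swap --init for the argument joining the MGR container's PID namespace
   sed -i 's/--init/--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk/' /tmp/debug.sh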

Now, we can run our debugging container with ``sh /tmp/debug.sh``. Within the
shell, we can run commands such as ``ps`` to get the PID of the MGR process. In
the following example this will be ``2``. Running gdb, we can now attach to the
running process:

.. prompt:: bash (gdb)

   attach 2
   info threads
   bt
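
When you are done, detach so that the MGR process resumes running normally;
``detach`` and ``quit`` are standard gdb commands:

.. prompt:: bash (gdb)

   detach
   quit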

@@ -5417,6 +5417,10 @@ def command_shell(ctx):

            privileged=True)
    command = c.shell_cmd(command)

    if ctx.dry_run:
        print(' '.join(shlex.quote(arg) for arg in command))
        return 0

    return call_timeout(ctx, command, ctx.timeout)

##################################

@@ -7165,6 +7169,10 @@ def _get_parser():

        '--no-hosts',
        action='store_true',
        help='dont pass /etc/hosts through to the container')
    parser_shell.add_argument(
        '--dry-run',
        action='store_true',
        help='print, but do not execute, the container command to start the shell')

    parser_enter = subparsers.add_parser(
        'enter', help='run an interactive shell inside a running daemon container')