doc/dev/crimson: Improve crimson.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
This commit is contained in:
Anthony D'Atri 2023-05-21 08:39:56 -04:00
parent 4dbf84a34d
commit 524901d3cf

View File

@ -2,50 +2,50 @@
crimson
=======
Crimson is the code name of crimson-osd, which is the next generation ceph-osd.
It targets fast networking devices, fast storage devices by leveraging state of
the art technologies like DPDK and SPDK, for better performance. And it will
keep the support of HDDs and low-end SSDs via BlueStore. Crimson will try to
be backward compatible with classic OSD.
Crimson is the code name of ``crimson-osd``, which is the next
generation ``ceph-osd``. It improves performance when using fast network
and storage devices, employing state-of-the-art technologies including
DPDK and SPDK. BlueStore continues to support HDDs and slower SSDs.
Crimson aims to be backward compatible with the classic ``ceph-osd``.
.. highlight:: console
Building Crimson
================
Crimson is not enabled by default. To enable it::
Crimson is not enabled by default. Enable it at build time by running::
$ WITH_SEASTAR=true ./install-deps.sh
$ mkdir build && cd build
$ cmake -DWITH_SEASTAR=ON ..
Please note, `ASan`_ is enabled by default if crimson is built from a source
cloned using git.
Please note, `ASan`_ is enabled by default if Crimson is built from a source
cloned using ``git``.
.. _ASan: https://github.com/google/sanitizers/wiki/AddressSanitizer
Testing crimson with cephadm
===============================
The Ceph CI/CD pipeline includes ceph container builds with
crimson-osd subsitituted for ceph-osd.
The Ceph CI/CD pipeline builds containers with
``crimson-osd`` subsitituted for ``ceph-osd``.
Once a branch at commit <sha1> has been built and is available in
shaman, you can deploy it using the cephadm instructions outlined
``shaman``, you can deploy it using the cephadm instructions outlined
in :ref:`cephadm` with the following adaptations.
First, while performing the initial bootstrap, use the --image flag to
use a crimson build rather than a default build:
First, while performing the initial bootstrap, use the ``--image`` flag to
use a Crimson build:
.. prompt:: bash #
cephadm --image quay.ceph.io/ceph-ci/ceph:<sha1>-crimson --allow-mismatched-release bootstrap ...
You'll likely need to include the --allow-mismatched-release flag to
You'll likely need to supply the ``--allow-mismatched-release`` flag to
use a non-release branch.
Additionally, prior to deploying the osds, you'll need enable crimson
and default pools to be created as crimson pools (from cephadm shell):
Additionally, prior to deploying OSDs, you'll need enable Crimson to
direct the default pools to be created as Crimson pools. From the cephadm shell run:
.. prompt:: bash #
@ -53,39 +53,39 @@ and default pools to be created as crimson pools (from cephadm shell):
ceph osd set-allow-crimson --yes-i-really-mean-it
ceph config set mon osd_pool_default_crimson true
The first command enables the crimson experimental feature. Crimson
is highly experimental, and malfunctions up to and including crashes
The first command enables the ``crimson`` experimental feature. Crimson
is highly experimental, and malfunctions including crashes
and data loss are to be expected.
The second enables the allow_crimson OSDMap flag. The monitor will
not allow crimson-osd to boot without that flag.
The second enables the ``allow_crimson`` OSDMap flag. The monitor will
not allow ``crimson-osd`` to boot without that flag.
The last causes pools to be created by default with the crimson flag.
crimson pools are restricted to operations supported by crimson.
crimson-osd won't instantiate pgs from non-crimson pools.
The last causes pools to be created by default with the ``crimson`` flag.
Crimson pools are restricted to operations supported by Crimson.
``Crimson-osd`` won't instantiate PGs from non-Crimson pools.
Running Crimson
===============
As you might expect, crimson is not featurewise on par with its predecessor yet.
As you might expect, Crimson does not yet have as extensive a feature set as does ``ceph-osd``.
object store backend
--------------------
At the moment, ``crimson-osd`` offers both native and alienized object store
backends. The native object store backends perform IO using seastar reactor.
backends. The native object store backends perform IO using the SeaStar reactor.
They are:
.. describe:: cyanstore
CyanStore is modeled after memstore in classic OSD.
CyanStore is modeled after memstore in the classic OSD.
.. describe:: seastore
Seastore is still under active development.
While the alienized object store backends are backed by a thread pool, which
is a proxy of the alien store adaptor running in SeaStar. The proxy issues
The alienized object store backends are backed by a thread pool, which
is a proxy of the alienstore adaptor running in Seastar. The proxy issues
requests to object stores running in alien threads, i.e., worker threads not
managed by the Seastar framework. They are:
@ -95,36 +95,34 @@ managed by the Seastar framework. They are:
.. describe:: bluestore
The object store used by classic OSD by default.
The object store used by the classic ``ceph-osd``
daemonize
---------
Unlike ``ceph-osd``, ``crimson-osd`` does not daemonize itself even if the
``daemonize`` option is enabled. Because, to read this option, ``crimson-osd``
``daemonize`` option is enabled. In order to read this option, ``crimson-osd``
needs to ready its config sharded service, but this sharded service lives
in the seastar reactor. If we fork a child process and exit the parent after
in the Seastar reactor. If we fork a child process and exit the parent after
starting the Seastar engine, that will leave us with a single thread which is
the replica of the thread calls `fork()`_. This would unnecessarily complicate
the code, if we would have tackled this problem in crimson.
Since a lot of GNU/Linux distros are using systemd nowadays, which is able to
daemonize the application, there is no need to daemonize by ourselves. For
those who are using sysvinit, they can use ``start-stop-daemon`` for daemonizing
``crimson-osd``. If this is not acceptable, we can whip up a helper utility
to do the trick.
a replica of the thread that called `fork()`_. Tackling this problem in Crimson
would unnecessarily complicate the code.
Since supported GNU/Linux distributions use ``systemd``, which is able to
daemonize the application, there is no need to daemonize ourselves.
Those using sysvinit can use ``start-stop-daemon`` to daemonize ``crimson-osd``.
If this is does not work out, a helper utility may be devised.
.. _fork(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
logging
-------
Currently, ``crimson-osd`` uses the logging utility offered by Seastar. see
``src/common/dout.h`` for the mapping between different logging levels to
the severity levels in Seastar. For instance, the messages sent to ``derr``
will be printed using ``logger::error()``, and the messages with debug level
over ``20`` will be printed using ``logger::trace()``.
``Crimson-osd`` currently uses the logging utility offered by Seastar. See
``src/common/dout.h`` for the mapping between Ceph logging levels to
the severity levels in Seastar. For instance, messages sent to ``derr``
will be issued using ``logger::error()``, and the messages with a debug level
greater than ``20`` will be issued using ``logger::trace()``.
+---------+---------+
| ceph | seastar |
@ -140,18 +138,17 @@ over ``20`` will be printed using ``logger::trace()``.
| > 20 | trace |
+---------+---------+
Please note, ``crimson-osd``
does not send the logging message to specified ``log_file``. It writes
the logging messages to stdout and/or syslog. Again, this behavior can be
Note that ``crimson-osd``
does not send log messages directly to a specified ``log_file``. It writes
the logging messages to stdout and/or syslog. This behavior can be
changed using ``--log-to-stdout`` and ``--log-to-syslog`` command line
options. By default, ``log-to-stdout`` is enabled, and the latter disabled.
options. By default, ``log-to-stdout`` is enabled, and ``--log-to-syslog`` is disabled.
vstart.sh
---------
To facilitate the development of crimson, following options would be handy when
using ``vstart.sh``,
The following options aree handy when using ``vstart.sh``,
``--crimson``
Start ``crimson-osd`` instead of ``ceph-osd``.
@ -160,24 +157,23 @@ using ``vstart.sh``,
Do not daemonize the service.
``--redirect-output``
Tedirect the stdout and stderr of service to ``out/$type.$num.stdout``.
Redirect the ``stdout`` and ``stderr`` to ``out/$type.$num.stdout``.
``--osd-args``
Pass extra command line options to crimson-osd or ceph-osd. It's quite
useful for passing Seastar options to crimson-osd. For instance, you could
use ``--osd-args "--memory 2G"`` to set the memory to use. Please refer
the output of::
Pass extra command line options to ``crimson-osd`` or ``ceph-osd``.
This is useful for passing Seastar options to ``crimson-osd``. For
example, one can supply ``--osd-args "--memory 2G"`` to set the amount of
memory to use. Please refer to the output of::
crimson-osd --help-seastar
for more Seastar specific command line options.
for additional Seastar-specific command line options.
``--cyanstore``
Use CyanStore as the object store backend.
``--bluestore``
Use the alienized BlueStore as the object store backend. This is the default
setting, if not specified otherwise.
Use the alienized BlueStore as the object store backend. This is the default.
``--memstore``
Use the alienized MemStore as the object store backend.
@ -193,23 +189,23 @@ using ``vstart.sh``,
passing the block device to this option.
``--seastore-secondary-devs-type``
Optional. Specify device type of secondary devices. When the secondary
Optional. Specify the type of secondary devices. When the secondary
device is slower than main device passed to ``--seastore-devs``, the cold
data in faster device will be evicted to the slower devices over time.
Valid types include ``HDD``, ``SSD``(default), ``ZNS``, and ``RANDOM_BLOCK_SSD``
Note secondary devices should not be faster than the main device.
``--seastore``
use SeaStore as the object store backend.
Use SeaStore as the object store backend.
So, a typical command to start a single-crimson-node cluster is::
To start a cluster with a single Crimson node, run::
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
--without-dashboard --cyanstore \
--crimson --redirect-output \
--osd-args "--memory 4G"
Where we assign 4 GiB memory, a single thread running on core-0 to crimson-osd.
Here we assign 4 GiB memory and a single thread running on core-0 to ``crimson-osd``.
Another SeaStore example::
@ -220,37 +216,37 @@ Another SeaStore example::
--seastore-secondary-devs /dev/sdb \
--seastore-secondary-devs-type HDD
You could stop the vstart cluster using::
Stop this ``vstart`` cluster by running::
$ ../src/stop.sh --crimson
Metrics and Tracing
===================
Crimson offers three ways to report the stats and metrics:
Crimson offers three ways to report stats and metrics.
pg stats reported to mgr
------------------------
Crimson collects the per-pg, per-pool, and per-osd stats in a `MPGStats`
message, and send it over to mgr, so that the mgr modules can query
message which is sent to the Ceph Managers. Manager modules can query
them using the `MgrModule.get()` method.
asock command
-------------
an asock command is offered for dumping the metrics::
An admin socket command is offered for dumping metrics::
$ ceph tell osd.0 dump_metrics
$ ceph tell osd.0 dump_metrics reactor_utilization
Where `reactor_utilization` is an optional string allowing us to filter
Here `reactor_utilization` is an optional string allowing us to filter
the dumped metrics by prefix.
Prometheus text protocol
------------------------
the listening port and address can be configured using the command line options of
The listening port and address can be configured using the command line options of
`--prometheus_port`
see `Prometheus`_ for more details.
@ -263,11 +259,11 @@ fio
---
``crimson-store-nbd`` exposes configurable ``FuturizedStore`` internals as an
NBD server for use with fio.
NBD server for use with ``fio``.
To use fio to test ``crimson-store-nbd``,
In order to use ``fio`` to test ``crimson-store-nbd``, perform the below steps.
#. You will need to install ``libnbd``, and compile fio like
#. You will need to install ``libnbd``, and compile it into ``fio``
.. prompt:: bash $
@ -284,8 +280,8 @@ To use fio to test ``crimson-store-nbd``,
cd build
ninja crimson-store-nbd
#. Run the ``crimson-store-nbd`` server with a block device. Please specify
the path to the raw device, like ``/dev/nvme1n1`` in place of the created
#. Run the ``crimson-store-nbd`` server with a block device. Specify
the path to the raw device, for example ``/dev/nvme1n1``, in place of the created
file for testing with a block device.
.. prompt:: bash $
@ -301,16 +297,16 @@ To use fio to test ``crimson-store-nbd``,
--type transaction_manager \
--uds-path ${unix_socket} &
in which,
Below are descriptions of these command line arguments:
``--smp``
how many CPU cores are used
The number of CPU cores to use (Symmetric MultiProcessor)
``--mkfs``
initialize the device first
Initialize the device first.
``--type``
which backend to use. If ``transaction_manager`` is specified, SeaStore's
The back end to use. If ``transaction_manager`` is specified, SeaStore's
``TransactionManager`` and ``BlockSegmentManager`` are used to emulate a
block device. Otherwise, this option is used to choose a backend of
``FuturizedStore``, where the whole "device" is divided into multiple
@ -320,7 +316,7 @@ To use fio to test ``crimson-store-nbd``,
without the object store semantics, ``transaction_manager`` would be a
better choice.
#. Create an fio job file named ``nbd.fio``
#. Create a ``fio`` job file named ``nbd.fio``
.. code:: ini
@ -337,7 +333,7 @@ To use fio to test ``crimson-store-nbd``,
[job0]
offset=0
#. Test the crimson object store using the fio compiled just now
#. Test the Crimson object store, using the custom ``fio`` built just now
.. prompt:: bash $
@ -345,9 +341,9 @@ To use fio to test ``crimson-store-nbd``,
CBT
---
We can use `cbt`_ for performing perf tests::
We can use `cbt`_ for performance tests::
$ git checkout master
$ git checkout main
$ make crimson-osd
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml
$ git checkout yet-another-pr
@ -372,9 +368,9 @@ We can use `cbt`_ for performing perf tests::
19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted
19:48:23 - WARNING - cbt - 1 tests failed out of 16
Where we compile and run the same test against two branches. One is ``master``, another is ``yet-another-pr`` branch.
And then we compare the test results. Along with every test case, a set of rules is defined to check if we have
performance regressions when comparing two set of test results. If a possible regression is found, the rule and
Here we compile and run the same test against two branches: ``main`` and ``yet-another-pr``.
We then compare the results. Along with every test case, a set of rules is defined to check for
performance regressions when comparing the sets of test results. If a possible regression is found, the rule and
corresponding test results are highlighted.
.. _cbt: https://github.com/ceph/cbt
@ -409,7 +405,7 @@ The `tips`_ for debugging Scylla also apply to Crimson.
Human-readable backtraces with addr2line
----------------------------------------
When a seastar application crashes, it leaves us with a serial of addresses, like::
When a Seastar application crashes, it leaves us with a backtrace of addresses, like::
Segmentation fault.
Backtrace:
@ -428,10 +424,10 @@ When a seastar application crashes, it leaves us with a serial of addresses, lik
0x000000000d833ac9
Segmentation fault
``seastar-addr2line`` offered by Seastar can be used to decipher these
addresses. After running the script, it will be waiting for input from stdin,
so we need to copy and paste the above addresses, then send the EOF by inputting
``control-D`` in the terminal::
The ``seastar-addr2line`` utility provided by Seastar can be used to map these
addresses to functions. The script expects input on ``stdin``,
so we need to copy and paste the above addresses, then send EOF by inputting
``control-D`` in the terminal. One might use ``echo`` or ``cat`` instead`::
$ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd
@ -459,8 +455,8 @@ so we need to copy and paste the above addresses, then send the EOF by inputting
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5)
main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1)
Please note, ``seastar-addr2line`` is able to extract the addresses from
the input, so you can also paste the log messages like::
Note that ``seastar-addr2line`` is able to extract addresses from
its input, so you can also paste the log messages as below::
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e78dbc
@ -469,11 +465,11 @@ the input, so you can also paste the log messages like::
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e985
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: /lib64/libpthread.so.0+0x0000000000012dbf
Unlike classic OSD, crimson does not print a human-readable backtrace when it
handles fatal signals like `SIGSEGV` or `SIGABRT`. And it is more complicated
when it comes to a stripped binary. So before planting a signal handler for
those signals in crimson, we could to use `script/ceph-debug-docker.sh` to parse
the addresses in the backtrace::
Unlike the classic ``ceph-osd``, Crimson does not print a human-readable backtrace when it
handles fatal signals like `SIGSEGV` or `SIGABRT`. It is also more complicated
with a stripped binary. So instead of planting a signal handler for
those signals into Crimson, we can use `script/ceph-debug-docker.sh` to map
addresses in the backtrace::
# assuming you are under the source tree of ceph
$ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8