mirror of
https://github.com/ceph/ceph
synced 2024-12-27 14:03:25 +00:00
doc/dev/crimson: Improve crimson.rst
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
This commit is contained in:
parent
4dbf84a34d
commit
524901d3cf
@ -2,50 +2,50 @@
|
||||
crimson
|
||||
=======
|
||||
|
||||
Crimson is the code name of crimson-osd, which is the next generation ceph-osd.
|
||||
It targets fast networking devices, fast storage devices by leveraging state of
|
||||
the art technologies like DPDK and SPDK, for better performance. And it will
|
||||
keep the support of HDDs and low-end SSDs via BlueStore. Crimson will try to
|
||||
be backward compatible with classic OSD.
|
||||
Crimson is the code name of ``crimson-osd``, which is the next
|
||||
generation ``ceph-osd``. It improves performance when using fast network
|
||||
and storage devices, employing state-of-the-art technologies including
|
||||
DPDK and SPDK. BlueStore continues to support HDDs and slower SSDs.
|
||||
Crimson aims to be backward compatible with the classic ``ceph-osd``.
|
||||
|
||||
.. highlight:: console
|
||||
|
||||
Building Crimson
|
||||
================
|
||||
|
||||
Crimson is not enabled by default. To enable it::
|
||||
Crimson is not enabled by default. Enable it at build time by running::
|
||||
|
||||
$ WITH_SEASTAR=true ./install-deps.sh
|
||||
$ mkdir build && cd build
|
||||
$ cmake -DWITH_SEASTAR=ON ..
|
||||
|
||||
Please note, `ASan`_ is enabled by default if crimson is built from a source
|
||||
cloned using git.
|
||||
Please note, `ASan`_ is enabled by default if Crimson is built from a source
|
||||
cloned using ``git``.
|
||||
|
||||
.. _ASan: https://github.com/google/sanitizers/wiki/AddressSanitizer
|
||||
|
||||
Testing crimson with cephadm
|
||||
===============================
|
||||
|
||||
The Ceph CI/CD pipeline includes ceph container builds with
|
||||
crimson-osd subsitituted for ceph-osd.
|
||||
The Ceph CI/CD pipeline builds containers with
|
||||
``crimson-osd`` subsitituted for ``ceph-osd``.
|
||||
|
||||
Once a branch at commit <sha1> has been built and is available in
|
||||
shaman, you can deploy it using the cephadm instructions outlined
|
||||
``shaman``, you can deploy it using the cephadm instructions outlined
|
||||
in :ref:`cephadm` with the following adaptations.
|
||||
|
||||
First, while performing the initial bootstrap, use the --image flag to
|
||||
use a crimson build rather than a default build:
|
||||
First, while performing the initial bootstrap, use the ``--image`` flag to
|
||||
use a Crimson build:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cephadm --image quay.ceph.io/ceph-ci/ceph:<sha1>-crimson --allow-mismatched-release bootstrap ...
|
||||
|
||||
You'll likely need to include the --allow-mismatched-release flag to
|
||||
You'll likely need to supply the ``--allow-mismatched-release`` flag to
|
||||
use a non-release branch.
|
||||
|
||||
Additionally, prior to deploying the osds, you'll need enable crimson
|
||||
and default pools to be created as crimson pools (from cephadm shell):
|
||||
Additionally, prior to deploying OSDs, you'll need enable Crimson to
|
||||
direct the default pools to be created as Crimson pools. From the cephadm shell run:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -53,39 +53,39 @@ and default pools to be created as crimson pools (from cephadm shell):
|
||||
ceph osd set-allow-crimson --yes-i-really-mean-it
|
||||
ceph config set mon osd_pool_default_crimson true
|
||||
|
||||
The first command enables the crimson experimental feature. Crimson
|
||||
is highly experimental, and malfunctions up to and including crashes
|
||||
The first command enables the ``crimson`` experimental feature. Crimson
|
||||
is highly experimental, and malfunctions including crashes
|
||||
and data loss are to be expected.
|
||||
|
||||
The second enables the allow_crimson OSDMap flag. The monitor will
|
||||
not allow crimson-osd to boot without that flag.
|
||||
The second enables the ``allow_crimson`` OSDMap flag. The monitor will
|
||||
not allow ``crimson-osd`` to boot without that flag.
|
||||
|
||||
The last causes pools to be created by default with the crimson flag.
|
||||
crimson pools are restricted to operations supported by crimson.
|
||||
crimson-osd won't instantiate pgs from non-crimson pools.
|
||||
The last causes pools to be created by default with the ``crimson`` flag.
|
||||
Crimson pools are restricted to operations supported by Crimson.
|
||||
``Crimson-osd`` won't instantiate PGs from non-Crimson pools.
|
||||
|
||||
Running Crimson
|
||||
===============
|
||||
|
||||
As you might expect, crimson is not featurewise on par with its predecessor yet.
|
||||
As you might expect, Crimson does not yet have as extensive a feature set as does ``ceph-osd``.
|
||||
|
||||
object store backend
|
||||
--------------------
|
||||
|
||||
At the moment, ``crimson-osd`` offers both native and alienized object store
|
||||
backends. The native object store backends perform IO using seastar reactor.
|
||||
backends. The native object store backends perform IO using the SeaStar reactor.
|
||||
They are:
|
||||
|
||||
.. describe:: cyanstore
|
||||
|
||||
CyanStore is modeled after memstore in classic OSD.
|
||||
CyanStore is modeled after memstore in the classic OSD.
|
||||
|
||||
.. describe:: seastore
|
||||
|
||||
Seastore is still under active development.
|
||||
|
||||
While the alienized object store backends are backed by a thread pool, which
|
||||
is a proxy of the alien store adaptor running in SeaStar. The proxy issues
|
||||
The alienized object store backends are backed by a thread pool, which
|
||||
is a proxy of the alienstore adaptor running in Seastar. The proxy issues
|
||||
requests to object stores running in alien threads, i.e., worker threads not
|
||||
managed by the Seastar framework. They are:
|
||||
|
||||
@ -95,36 +95,34 @@ managed by the Seastar framework. They are:
|
||||
|
||||
.. describe:: bluestore
|
||||
|
||||
The object store used by classic OSD by default.
|
||||
The object store used by the classic ``ceph-osd``
|
||||
|
||||
daemonize
|
||||
---------
|
||||
|
||||
Unlike ``ceph-osd``, ``crimson-osd`` does not daemonize itself even if the
|
||||
``daemonize`` option is enabled. Because, to read this option, ``crimson-osd``
|
||||
``daemonize`` option is enabled. In order to read this option, ``crimson-osd``
|
||||
needs to ready its config sharded service, but this sharded service lives
|
||||
in the seastar reactor. If we fork a child process and exit the parent after
|
||||
in the Seastar reactor. If we fork a child process and exit the parent after
|
||||
starting the Seastar engine, that will leave us with a single thread which is
|
||||
the replica of the thread calls `fork()`_. This would unnecessarily complicate
|
||||
the code, if we would have tackled this problem in crimson.
|
||||
|
||||
Since a lot of GNU/Linux distros are using systemd nowadays, which is able to
|
||||
daemonize the application, there is no need to daemonize by ourselves. For
|
||||
those who are using sysvinit, they can use ``start-stop-daemon`` for daemonizing
|
||||
``crimson-osd``. If this is not acceptable, we can whip up a helper utility
|
||||
to do the trick.
|
||||
a replica of the thread that called `fork()`_. Tackling this problem in Crimson
|
||||
would unnecessarily complicate the code.
|
||||
|
||||
Since supported GNU/Linux distributions use ``systemd``, which is able to
|
||||
daemonize the application, there is no need to daemonize ourselves.
|
||||
Those using sysvinit can use ``start-stop-daemon`` to daemonize ``crimson-osd``.
|
||||
If this is does not work out, a helper utility may be devised.
|
||||
|
||||
.. _fork(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
|
||||
|
||||
logging
|
||||
-------
|
||||
|
||||
Currently, ``crimson-osd`` uses the logging utility offered by Seastar. see
|
||||
``src/common/dout.h`` for the mapping between different logging levels to
|
||||
the severity levels in Seastar. For instance, the messages sent to ``derr``
|
||||
will be printed using ``logger::error()``, and the messages with debug level
|
||||
over ``20`` will be printed using ``logger::trace()``.
|
||||
``Crimson-osd`` currently uses the logging utility offered by Seastar. See
|
||||
``src/common/dout.h`` for the mapping between Ceph logging levels to
|
||||
the severity levels in Seastar. For instance, messages sent to ``derr``
|
||||
will be issued using ``logger::error()``, and the messages with a debug level
|
||||
greater than ``20`` will be issued using ``logger::trace()``.
|
||||
|
||||
+---------+---------+
|
||||
| ceph | seastar |
|
||||
@ -140,18 +138,17 @@ over ``20`` will be printed using ``logger::trace()``.
|
||||
| > 20 | trace |
|
||||
+---------+---------+
|
||||
|
||||
Please note, ``crimson-osd``
|
||||
does not send the logging message to specified ``log_file``. It writes
|
||||
the logging messages to stdout and/or syslog. Again, this behavior can be
|
||||
Note that ``crimson-osd``
|
||||
does not send log messages directly to a specified ``log_file``. It writes
|
||||
the logging messages to stdout and/or syslog. This behavior can be
|
||||
changed using ``--log-to-stdout`` and ``--log-to-syslog`` command line
|
||||
options. By default, ``log-to-stdout`` is enabled, and the latter disabled.
|
||||
options. By default, ``log-to-stdout`` is enabled, and ``--log-to-syslog`` is disabled.
|
||||
|
||||
|
||||
vstart.sh
|
||||
---------
|
||||
|
||||
To facilitate the development of crimson, following options would be handy when
|
||||
using ``vstart.sh``,
|
||||
The following options aree handy when using ``vstart.sh``,
|
||||
|
||||
``--crimson``
|
||||
Start ``crimson-osd`` instead of ``ceph-osd``.
|
||||
@ -160,24 +157,23 @@ using ``vstart.sh``,
|
||||
Do not daemonize the service.
|
||||
|
||||
``--redirect-output``
|
||||
Tedirect the stdout and stderr of service to ``out/$type.$num.stdout``.
|
||||
Redirect the ``stdout`` and ``stderr`` to ``out/$type.$num.stdout``.
|
||||
|
||||
``--osd-args``
|
||||
Pass extra command line options to crimson-osd or ceph-osd. It's quite
|
||||
useful for passing Seastar options to crimson-osd. For instance, you could
|
||||
use ``--osd-args "--memory 2G"`` to set the memory to use. Please refer
|
||||
the output of::
|
||||
Pass extra command line options to ``crimson-osd`` or ``ceph-osd``.
|
||||
This is useful for passing Seastar options to ``crimson-osd``. For
|
||||
example, one can supply ``--osd-args "--memory 2G"`` to set the amount of
|
||||
memory to use. Please refer to the output of::
|
||||
|
||||
crimson-osd --help-seastar
|
||||
|
||||
for more Seastar specific command line options.
|
||||
for additional Seastar-specific command line options.
|
||||
|
||||
``--cyanstore``
|
||||
Use CyanStore as the object store backend.
|
||||
|
||||
``--bluestore``
|
||||
Use the alienized BlueStore as the object store backend. This is the default
|
||||
setting, if not specified otherwise.
|
||||
Use the alienized BlueStore as the object store backend. This is the default.
|
||||
|
||||
``--memstore``
|
||||
Use the alienized MemStore as the object store backend.
|
||||
@ -193,23 +189,23 @@ using ``vstart.sh``,
|
||||
passing the block device to this option.
|
||||
|
||||
``--seastore-secondary-devs-type``
|
||||
Optional. Specify device type of secondary devices. When the secondary
|
||||
Optional. Specify the type of secondary devices. When the secondary
|
||||
device is slower than main device passed to ``--seastore-devs``, the cold
|
||||
data in faster device will be evicted to the slower devices over time.
|
||||
Valid types include ``HDD``, ``SSD``(default), ``ZNS``, and ``RANDOM_BLOCK_SSD``
|
||||
Note secondary devices should not be faster than the main device.
|
||||
|
||||
``--seastore``
|
||||
use SeaStore as the object store backend.
|
||||
Use SeaStore as the object store backend.
|
||||
|
||||
So, a typical command to start a single-crimson-node cluster is::
|
||||
To start a cluster with a single Crimson node, run::
|
||||
|
||||
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
|
||||
--without-dashboard --cyanstore \
|
||||
--crimson --redirect-output \
|
||||
--osd-args "--memory 4G"
|
||||
|
||||
Where we assign 4 GiB memory, a single thread running on core-0 to crimson-osd.
|
||||
Here we assign 4 GiB memory and a single thread running on core-0 to ``crimson-osd``.
|
||||
|
||||
Another SeaStore example::
|
||||
|
||||
@ -220,37 +216,37 @@ Another SeaStore example::
|
||||
--seastore-secondary-devs /dev/sdb \
|
||||
--seastore-secondary-devs-type HDD
|
||||
|
||||
You could stop the vstart cluster using::
|
||||
Stop this ``vstart`` cluster by running::
|
||||
|
||||
$ ../src/stop.sh --crimson
|
||||
|
||||
Metrics and Tracing
|
||||
===================
|
||||
|
||||
Crimson offers three ways to report the stats and metrics:
|
||||
Crimson offers three ways to report stats and metrics.
|
||||
|
||||
pg stats reported to mgr
|
||||
------------------------
|
||||
|
||||
Crimson collects the per-pg, per-pool, and per-osd stats in a `MPGStats`
|
||||
message, and send it over to mgr, so that the mgr modules can query
|
||||
message which is sent to the Ceph Managers. Manager modules can query
|
||||
them using the `MgrModule.get()` method.
|
||||
|
||||
asock command
|
||||
-------------
|
||||
|
||||
an asock command is offered for dumping the metrics::
|
||||
An admin socket command is offered for dumping metrics::
|
||||
|
||||
$ ceph tell osd.0 dump_metrics
|
||||
$ ceph tell osd.0 dump_metrics reactor_utilization
|
||||
|
||||
Where `reactor_utilization` is an optional string allowing us to filter
|
||||
Here `reactor_utilization` is an optional string allowing us to filter
|
||||
the dumped metrics by prefix.
|
||||
|
||||
Prometheus text protocol
|
||||
------------------------
|
||||
|
||||
the listening port and address can be configured using the command line options of
|
||||
The listening port and address can be configured using the command line options of
|
||||
`--prometheus_port`
|
||||
see `Prometheus`_ for more details.
|
||||
|
||||
@ -263,11 +259,11 @@ fio
|
||||
---
|
||||
|
||||
``crimson-store-nbd`` exposes configurable ``FuturizedStore`` internals as an
|
||||
NBD server for use with fio.
|
||||
NBD server for use with ``fio``.
|
||||
|
||||
To use fio to test ``crimson-store-nbd``,
|
||||
In order to use ``fio`` to test ``crimson-store-nbd``, perform the below steps.
|
||||
|
||||
#. You will need to install ``libnbd``, and compile fio like
|
||||
#. You will need to install ``libnbd``, and compile it into ``fio``
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
@ -284,8 +280,8 @@ To use fio to test ``crimson-store-nbd``,
|
||||
cd build
|
||||
ninja crimson-store-nbd
|
||||
|
||||
#. Run the ``crimson-store-nbd`` server with a block device. Please specify
|
||||
the path to the raw device, like ``/dev/nvme1n1`` in place of the created
|
||||
#. Run the ``crimson-store-nbd`` server with a block device. Specify
|
||||
the path to the raw device, for example ``/dev/nvme1n1``, in place of the created
|
||||
file for testing with a block device.
|
||||
|
||||
.. prompt:: bash $
|
||||
@ -301,16 +297,16 @@ To use fio to test ``crimson-store-nbd``,
|
||||
--type transaction_manager \
|
||||
--uds-path ${unix_socket} &
|
||||
|
||||
in which,
|
||||
Below are descriptions of these command line arguments:
|
||||
|
||||
``--smp``
|
||||
how many CPU cores are used
|
||||
The number of CPU cores to use (Symmetric MultiProcessor)
|
||||
|
||||
``--mkfs``
|
||||
initialize the device first
|
||||
Initialize the device first.
|
||||
|
||||
``--type``
|
||||
which backend to use. If ``transaction_manager`` is specified, SeaStore's
|
||||
The back end to use. If ``transaction_manager`` is specified, SeaStore's
|
||||
``TransactionManager`` and ``BlockSegmentManager`` are used to emulate a
|
||||
block device. Otherwise, this option is used to choose a backend of
|
||||
``FuturizedStore``, where the whole "device" is divided into multiple
|
||||
@ -320,7 +316,7 @@ To use fio to test ``crimson-store-nbd``,
|
||||
without the object store semantics, ``transaction_manager`` would be a
|
||||
better choice.
|
||||
|
||||
#. Create an fio job file named ``nbd.fio``
|
||||
#. Create a ``fio`` job file named ``nbd.fio``
|
||||
|
||||
.. code:: ini
|
||||
|
||||
@ -337,7 +333,7 @@ To use fio to test ``crimson-store-nbd``,
|
||||
[job0]
|
||||
offset=0
|
||||
|
||||
#. Test the crimson object store using the fio compiled just now
|
||||
#. Test the Crimson object store, using the custom ``fio`` built just now
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
@ -345,9 +341,9 @@ To use fio to test ``crimson-store-nbd``,
|
||||
|
||||
CBT
|
||||
---
|
||||
We can use `cbt`_ for performing perf tests::
|
||||
We can use `cbt`_ for performance tests::
|
||||
|
||||
$ git checkout master
|
||||
$ git checkout main
|
||||
$ make crimson-osd
|
||||
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml
|
||||
$ git checkout yet-another-pr
|
||||
@ -372,9 +368,9 @@ We can use `cbt`_ for performing perf tests::
|
||||
19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted
|
||||
19:48:23 - WARNING - cbt - 1 tests failed out of 16
|
||||
|
||||
Where we compile and run the same test against two branches. One is ``master``, another is ``yet-another-pr`` branch.
|
||||
And then we compare the test results. Along with every test case, a set of rules is defined to check if we have
|
||||
performance regressions when comparing two set of test results. If a possible regression is found, the rule and
|
||||
Here we compile and run the same test against two branches: ``main`` and ``yet-another-pr``.
|
||||
We then compare the results. Along with every test case, a set of rules is defined to check for
|
||||
performance regressions when comparing the sets of test results. If a possible regression is found, the rule and
|
||||
corresponding test results are highlighted.
|
||||
|
||||
.. _cbt: https://github.com/ceph/cbt
|
||||
@ -409,7 +405,7 @@ The `tips`_ for debugging Scylla also apply to Crimson.
|
||||
Human-readable backtraces with addr2line
|
||||
----------------------------------------
|
||||
|
||||
When a seastar application crashes, it leaves us with a serial of addresses, like::
|
||||
When a Seastar application crashes, it leaves us with a backtrace of addresses, like::
|
||||
|
||||
Segmentation fault.
|
||||
Backtrace:
|
||||
@ -428,10 +424,10 @@ When a seastar application crashes, it leaves us with a serial of addresses, lik
|
||||
0x000000000d833ac9
|
||||
Segmentation fault
|
||||
|
||||
``seastar-addr2line`` offered by Seastar can be used to decipher these
|
||||
addresses. After running the script, it will be waiting for input from stdin,
|
||||
so we need to copy and paste the above addresses, then send the EOF by inputting
|
||||
``control-D`` in the terminal::
|
||||
The ``seastar-addr2line`` utility provided by Seastar can be used to map these
|
||||
addresses to functions. The script expects input on ``stdin``,
|
||||
so we need to copy and paste the above addresses, then send EOF by inputting
|
||||
``control-D`` in the terminal. One might use ``echo`` or ``cat`` instead`::
|
||||
|
||||
$ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd
|
||||
|
||||
@ -459,8 +455,8 @@ so we need to copy and paste the above addresses, then send the EOF by inputting
|
||||
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5)
|
||||
main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1)
|
||||
|
||||
Please note, ``seastar-addr2line`` is able to extract the addresses from
|
||||
the input, so you can also paste the log messages like::
|
||||
Note that ``seastar-addr2line`` is able to extract addresses from
|
||||
its input, so you can also paste the log messages as below::
|
||||
|
||||
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
|
||||
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e78dbc
|
||||
@ -469,11 +465,11 @@ the input, so you can also paste the log messages like::
|
||||
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e985
|
||||
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: /lib64/libpthread.so.0+0x0000000000012dbf
|
||||
|
||||
Unlike classic OSD, crimson does not print a human-readable backtrace when it
|
||||
handles fatal signals like `SIGSEGV` or `SIGABRT`. And it is more complicated
|
||||
when it comes to a stripped binary. So before planting a signal handler for
|
||||
those signals in crimson, we could to use `script/ceph-debug-docker.sh` to parse
|
||||
the addresses in the backtrace::
|
||||
Unlike the classic ``ceph-osd``, Crimson does not print a human-readable backtrace when it
|
||||
handles fatal signals like `SIGSEGV` or `SIGABRT`. It is also more complicated
|
||||
with a stripped binary. So instead of planting a signal handler for
|
||||
those signals into Crimson, we can use `script/ceph-debug-docker.sh` to map
|
||||
addresses in the backtrace::
|
||||
|
||||
# assuming you are under the source tree of ceph
|
||||
$ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8
|
||||
|
Loading…
Reference in New Issue
Block a user