when ceph.conf not set public ip & cluster ip, heartbeat will get blank ip address. when osd::_send_boot , classic osd will check if heartbeat front and back addrs are blank ip, if they are blank ip, will use public ip which is learned from mon to set into them. So implement them in crimson osd.
Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
crimson/osd: don't assume a pull must happen if there is no push.
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
crimson/osd: clean the recovery message-related header inclusion.
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
crimson/osd: fix assertion failure in InternalClientRequest.
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Default value for `--crush-device-class` is `None`.
When not passing this parameter, ceph-volume sets the
value "None" in the lv tags.
Therefore, ceph-volume will output that value with calling
`ceph-volume lvm list --format json`
For instance:
```
"1": [
{
"devices": [
"/dev/sdc"
],
"lv_name": "osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f",
"lv_path": "/dev/ceph-aeb16fc3-9ac2-4126-ab66-bf920d101ea4/osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f",
"lv_size": "49.00g",
"lv_tags": "ceph.block_device=/dev/ceph-aeb16fc3-9ac2-4126-ab66-bf920d101ea4/osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f,ceph.block_uuid=E9hZNU-80Zz-PiER-iWN3-jSIU-krEN-khwU3x,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=40fe4af5-0408-444b-843c-0926d550d1f1,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=39680838-19df-4e50-9bb6-46b093d5b52b,ceph.osd_id=1,ceph.type=block,ceph.vdo=0",
"lv_uuid": "E9hZNU-80Zz-PiER-iWN3-jSIU-krEN-khwU3x",
"name": "osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f",
"path": "/dev/ceph-aeb16fc3-9ac2-4126-ab66-bf920d101ea4/osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f",
"tags": {
"ceph.block_device": "/dev/ceph-aeb16fc3-9ac2-4126-ab66-bf920d101ea4/osd-data-5a4a34f5-5733-4c69-b439-edb48e31a45f",
"ceph.block_uuid": "E9hZNU-80Zz-PiER-iWN3-jSIU-krEN-khwU3x",
"ceph.cephx_lockbox_secret": "",
"ceph.cluster_fsid": "40fe4af5-0408-444b-843c-0926d550d1f1",
"ceph.cluster_name": "ceph",
"ceph.crush_device_class": "None",
```
ceph-volume should print `"ceph.crush_device_class": "",` instead of `"ceph.crush_device_class": "None",`
Fixes: https://tracker.ceph.com/issues/53425
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add the possibility to skip the `needs_root()` decorator.
See linked tracker for details.
Fixes: https://tracker.ceph.com/issues/53511
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Otherwise, another task may get a reference to the extent before
we've set the pin.
Fixes: https://tracker.ceph.com/issues/53267
Signed-off-by: Samuel Just <sjust@redhat.com>
crimson/os/seastore: fix compiler error for gcc > 9 and clang13
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
We were passing a grace of zero seconds to our temporary work queue, which
led to the HeartbeatMap issuing cpu_tp timeout errors to the log. By using
a non-zero grace period we can avoid these. Use the same default grace
we use for the workqueue itself when it goes to sleep.
Fixes: https://tracker.ceph.com/issues/53506
Signed-off-by: Sage Weil <sage@newdream.net>
In the classical OSD the `ReplicatedRecoveryBackend::recover_object()`
divides into two main flows: pull and push:
```cpp
int ReplicatedBackend::recover_object(
const hobject_t &hoid,
// ...
)
{
dout(10) << __func__ << ": " << hoid << dendl;
RPGHandle *h = static_cast<RPGHandle *>(_h);
if (get_parent()->get_local_missing().is_missing(hoid)) {
ceph_assert(!obc);
// pull
prepare_pull(
v,
hoid,
head,
h);
} else {
ceph_assert(obc);
int started = start_pushes(
hoid,
obc,
h);
// ...
}
return 0;
}
```
Pulls may also enter the push path (`C_ReplicatedBackend_OnPullComplete`)
but push handling doesn't draw any assumption on that. What's important,
`recover_object()` may result in no pulls and pushes.
This isn't the case of crimson as its implementation of the push path
asserts that, if no push is scheduled, `PullInfo` must be allocated.
This patch reworks this logic to reflects the classical one and to avoid
crashes like the following one:
```
DEBUG 2021-12-01 18:43:00,220 [shard 0] osd - recover_object: loaded obc: 3:4e058a2e:::smithi13839607-45:head
WARN 2021-12-01 18:43:00,220 [shard 0] none - intrusive_ptr_add_ref(p=0x6190000d7f80, use_count=3)
WARN 2021-12-01 18:43:00,220 [shard 0] none - intrusive_ptr_release(p=0x6190000d7f80, use_count=4)
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - call_with_interruption_impl clearing interrupt_cond: 0x60300012b210,N7crimson3osd20IOInterruptConditionE
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - call_with_interruption_impl: may_interrupt: false, local interrupt_condintion: 0x60300012b210, global interrupt_cond: 0x0,N7crimson3osd20IOInterruptConditionE
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - set: interrupt_cond: 0x60300012b210, ref_count: 1
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8902-g52fd47fe/rpm/el8/BUILD/ceph-17.0.
0-8902-g52fd47fe/src/crimson/osd/replicated_recovery_backend.cc:84: ReplicatedRecoveryBackend::maybe_push_shards(const hobject_t&, eversion_t)::<lambda()>: Assertion `recovery.pi' failed.
Aborting on shard 0.
```
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Fixes: https://tracker.ceph.com/issues/53185
NCB mishandles fsck DEEP in mount()/umount()/mkfs() case causing it to remove the allocation-file without destaging a new copy (which will cost us a full rebuild on startup)
There are also few confiliting calls to open_db()/close_db() passing inconsistent read-only flag
We fix both issues by storing open-db type (read-only/read-write) and using it for close-db (which won't pass read-only flag anymore)
We also move allocation-file destage to close-db so it will be refreshed after being removed by fsck and such
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
This PR updates the content on documenting-ceph,
which is, as of December 2021, in need of an
update.
This is the first of what I estimate to be three
to five PRs against this .rst file.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
After all OSDs are upgraded to a new release, generate a health warning if
the 'require-osd-release' flag doesn't match the the new release version.
This will result in the cluster showing a warning in the health state until
the flag is set properly.
Fixes: https://tracker.ceph.com/issues/51984
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Note that we don't annotate the dashboard NotificationQueue because it is
used internally by the dashboard with other events.
Signed-off-by: Sage Weil <sage@newdream.net>