Commit Graph

123608 Commits

Author SHA1 Message Date
Patrick Seidensal
037410713f
monitoring: remove instance label from ceph-cluster.json completely
The `instance` label is only useful if

- the exporter returns only data about its node or instance
- the exporter provides an instance label and then may return data about
  other nodes

In this case, it's about the Prometheus mgr module, which is a single
exporter providing data about a whole cluster, so not only data related
to the node (or instance) the mgr module is running on.  Which node the
exporter runs on is completely irrelevant; the data provided doesn't
change.  The exporter also doesn't provide `instance`
labels (which Prometheus wouldn't change due to our configuration, see
"honor_labels" setting).

(Actually there's one exception where `instance` labels are provided by
the Ceph mgr module, but that doesn't affect the Ceph Cluster
dashboard.)

Note that keeping that instance label on this particular dashboard would
enable the user to switch between the data collected from a previously
failed mgr instance and the currently running mgr instance (on which the
Prometheus mgr module runs).  That'd split the data, which I don't think
is a useful feature, but rather looks broken.

Fixes: https://tracker.ceph.com/issues/51212

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2021-06-16 09:11:30 +02:00
Patrick Seidensal
4270a13d6c
mgr/dashboard: Fix Grafana Ceph Cluster health status widget
The health status widget doesn't show any status because it requires its
query to return a single result. But if a mgr instance has failed, the
query returns more than one, provided the incident happened within the
requested time frame.

This is simply an issue of the `instant` switch being disabled for that
widget. As only one mgr instance can ever be providing data at a time,
enabling `instant` completely solves that issue.

Fixes: https://tracker.ceph.com/issues/51212

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2021-06-16 09:10:32 +02:00
Patrick Seidensal
f51cab109d
mgr/dashboard: Fix decimals in OSD Capacity Utilization widget
Fixes: https://tracker.ceph.com/issues/51212

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2021-06-16 09:10:32 +02:00
Patrick Seidensal
5527c1c54f
mgr/dashboard: Remove hard-coded timezone from Grafana dashboards
Remove hard-coded timezone from Grafana dashboards to enable the Grafana
administrator to decide which timezone should be used for dashboards.

If we hard-coded those values, changing the global settings in Grafana
wouldn't have an effect. And the administrators can't change the
automatically imported Grafana dashboards provided by us.

Fixes: https://tracker.ceph.com/issues/51212

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2021-06-16 09:10:32 +02:00
Patrick Seidensal
8218d43e5f
monitoring: convert newline character to LF
Convert newline character from CRLF in `rbd-details.json` to LF, so that
it will be consistent with all the other dashboard JSON files.

Fixes: https://tracker.ceph.com/issues/51212

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2021-06-16 09:10:32 +02:00
Kefu Chai
54d02f098a
Merge pull request #41552 from tchaikov/wip-mgr-find-roots
mgr: expose CRUSHMap.find_roots()

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
2021-05-31 09:40:50 +08:00
J. Eric Ivancich
0cebfae56b
Merge pull request #41563 from cybozu/rgw-add-the-description-of-blocking-io-during-index-resharding
rgw: add the description of blocking io during index resharding

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2021-05-29 12:18:45 -04:00
Kefu Chai
2ecb738e2d
Merge pull request #41278 from sebastian-philipp/mgr-cephadm-set-user-no-hosts
mgr/cephadm: Don't call _check_host without hosts

Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
2021-05-29 10:42:14 +08:00
Kefu Chai
2a35c562a1
Merge pull request #41520 from tchaikov/wip-osd-unique-ptr
os: let ObjectStore::create() return unique_ptr<>

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-05-29 10:37:31 +08:00
Kefu Chai
2ba0f48bd1
Merge pull request #41573 from tchaikov/wip-allocat-ctor
os/bluestore: pass string_view to ctor of Allocator

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2021-05-29 10:36:43 +08:00
Kefu Chai
0331281e8a
Merge pull request #41582 from cyx1231st/wip-seastore-swap-read-extent
crimson/seastore: introduce and adopt LBAManager::get_mapping(t, offset)

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-05-28 15:35:01 +08:00
Yingxin Cheng
88a41c3922 crimson/seastore: adopt get_mapping(t, offset) interface
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-05-28 15:05:53 +08:00
Yingxin Cheng
c165a289e6 crimson/seastore: implement and test get_mapping(t, laddr)
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-05-28 15:05:44 +08:00
Yingxin Cheng
6f4b296056 crimson/seastore: add stub to introduce get_mapping() without length
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-05-28 10:30:42 +08:00
Kefu Chai
596ae330d9
Merge pull request #41578 from rzarzynski/wip-crimson-monc-auth-req
crimson/monc: handle_auth_request() doesn't depend on active_con.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-05-28 08:09:07 +08:00
Kefu Chai
9091261749
Merge pull request #41544 from tchaikov/wip-doc-confval
doc/mgr: use confval directive to define options

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-05-28 07:59:34 +08:00
Kefu Chai
dfdcf2cf92 doc/mgr: use confval directive to define options
less repeating this way

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-28 07:44:44 +08:00
Yuri Weinstein
e1f273928d
Merge pull request #41540 from ceph/wip-15213
doc: 15.2.13 Release Notes

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-05-27 16:40:41 -07:00
Sage Weil
1f30c0114d Merge PR #41483 into master
* refs/pull/41483/head:
	cephadm: stop passing --no-hosts to podman
	mgr/nfs: use host.addr for backend IP where possible
	mgr/cephadm: convert host addr if non-IP to IP
	mgr/dashboard,prometheus: new method of getting mgr IP
	doc/cephadm: remove any reference to the use of DNS or /etc/hosts
	mgr/cephadm: use known host addr
	mgr/cephadm: resolve IP at 'orch host add' time

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2021-05-27 19:14:53 -04:00
zdover23
fe258aad49
Merge pull request #41561 from zdover23/wip-doc-cephadm-s-mgmt-service-status-improvement-2021-05-26
doc/cephadm: enrich "service status"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-05-28 07:41:40 +10:00
Sage Weil
d1bb94ba4c cephadm: stop passing --no-hosts to podman
This reverts cfc1f914ce, which is no longer
necessary because (1) we don't use socket.getfqdn(), and (2) we generally
do not rely on DNS or /etc/hosts at all anymore (with the exception of
the upgrade transition).

Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Sage Weil
7e9f4ac7a1 mgr/nfs: use host.addr for backend IP where possible
Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Sage Weil
781bfa14ff mgr/cephadm: convert host addr if non-IP to IP
Previously we allowed the host.addr to be a DNS name (short or fqdn).
This is problematic because of the inconsistent way that docker and podman
handle /etc/hosts, and undesirable because relying on external DNS is
an external source of failure for the cluster without any benefit in
return (simply updating DNS is not sufficient to make ceph behave).

So: update any non-IP to an IP as soon as we start up (presumably on
upgrade).  If we get a loopback address (127.0.0.1 or 127.0.1.1), then
wait and hope that the next instance of the manager has better luck.
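
The checks described above can be sketched roughly as follows. This is
only an illustration in C++ using the POSIX `inet_pton()` call; it is not
the cephadm code, and the helper names `is_ip_literal`,
`is_ipv4_loopback` and `accept_resolved` are hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>

// True if `addr` parses as an IPv4 or IPv6 literal. No DNS lookup happens
// here, so a short name or an FQDN yields false.
bool is_ip_literal(const std::string& addr) {
  in_addr v4{};
  in6_addr v6{};
  return inet_pton(AF_INET, addr.c_str(), &v4) == 1 ||
         inet_pton(AF_INET6, addr.c_str(), &v6) == 1;
}

// True for IPv4 loopback literals (127.0.0.0/8), which covers both
// 127.0.0.1 and 127.0.1.1.
bool is_ipv4_loopback(const std::string& addr) {
  in_addr v4{};
  if (inet_pton(AF_INET, addr.c_str(), &v4) != 1) {
    return false;
  }
  return (ntohl(v4.s_addr) >> 24) == 127;
}

// Hypothetical policy: a resolved address is only stored if it is a real
// IP and not a loopback address; otherwise the caller retries later.
bool accept_resolved(const std::string& resolved) {
  return is_ip_literal(resolved) && !is_ipv4_loopback(resolved);
}
```

With these helpers, a stored address such as `host1.example.com` would be
flagged for resolution, and a lookup that returns `127.0.1.1` would be
rejected instead of being written back.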

Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Sage Weil
157a7b4183 mgr/dashboard,prometheus: new method of getting mgr IP
- Use a centralized method get_mgr_ip()
- Look up the hostname via DNS.  This is a bit more reliable than
getfqdn() since it will work even when podman adds the container
name to /etc/hosts.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Sage Weil
872668a9b3 doc/cephadm: remove any reference to the use of DNS or /etc/hosts
Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Sage Weil
900880050a mgr/cephadm: use known host addr
If the host IP/addr is known, use that.  The addr might even be a FQDN
instead of an IP address, in which case we want to look that up instead
of the bare hostname.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-05-27 12:00:20 -04:00
Radoslaw Zarzynski
d328cbdfe2 crimson/monc: handle_auth_request() doesn't depend on active_con.
The following crash occurred at Sepia [1]:

```
INFO  2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] ProtocolV2::start_accept(): target_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] TRIGGER ACCEPTING, was NONE
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] SEND(26) banner: len_payload=16, supported=1, required=0, banner="ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(10) banner: "ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT banner: payload_len=16
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(16) banner features: supported=1 required=0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] WRITE HelloFrame: my_type=osd, peer_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT HelloFrame: my_type=client peer_addr=v2:172.21.15.119:6803/31733
INFO  2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false)
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x000055E84CF44C1F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F2BC88C0B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_conn() in ceph-osd
 5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
 6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
 7# 0x000055E84DF67669 in ceph-osd
 8# 0x000055E84DF68775 in ceph-osd
 9# 0x000055E846F47F60 in ceph-osd
10# 0x000055E85296770F in ceph-osd
11# 0x000055E85296CC50 in ceph-osd
12# 0x000055E852B1ECBB in ceph-osd
13# 0x000055E85267C73A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
Fault at location: 0x98
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136907

When `handle_auth_request()` happens, there is no guarantee that
`active_con` is available. This is reflected in the classical
implementation:

```cpp
int MonClient::handle_auth_request(
  Connection *con,
  // ...
  ceph::buffer::list *reply)
{
  // ...
  bool isvalid = ah->verify_authorizer(
    cct,
    *rotating_secrets,
    payload,
    auth_meta->get_connection_secret_length(),
    reply,
    &con->peer_name,
    &con->peer_global_id,
    &con->peer_caps_info,
    &auth_meta->session_key,
    &auth_meta->connection_secret,
    ac);
```

The patch transplants the same logic to crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-05-27 15:47:45 +00:00
Kefu Chai
258ffd289a os/bluestore: pass string_view to ctor of Allocator
just for the sake of correctness, as they don't need a full-blown
std::string; all they need is a string-like object. and they always
create a std::string instance as a member variable if they want to have
a copy of it.
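
A minimal sketch of the pattern, assuming C++17. `NamedAllocator` is a
hypothetical stand-in, not the real BlueStore `Allocator`; it only shows
a constructor that takes `std::string_view` and makes its single copy
when initializing the member.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for an allocator-like class.
class NamedAllocator {
public:
  // Accepts string literals, std::string and std::string_view alike,
  // without forcing the caller to construct a std::string first.
  explicit NamedAllocator(std::string_view name)
    : name_(name)  // the only copy is made here, into the member
  {}

  const std::string& get_name() const { return name_; }

private:
  std::string name_;
};
```

A caller can then write `NamedAllocator a("block");` or pass an existing
`std::string`; in both cases the characters are copied exactly once.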

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:37:08 +08:00
Kefu Chai
d5445b8f11 tools/ceph_objectstore_tool: destruct ObjectStore using unique_ptr<>
before this change, cot never destructs the created ObjectStore
instances.

after this change, they are destructed upon returning from main().

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:14:44 +08:00
Kefu Chai
b04b2f4d2a osd: pass unique_ptr<ObjectStore> to ctor of OSD
less error-prone, and it's simpler to manage the resource using RAII

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:07:10 +08:00
Kefu Chai
2455c901bc osd/OSD: remove unused include headers
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:07:10 +08:00
Kefu Chai
8b2c3a211a osd/OSD: use scope_guard to umount objectstore
RAII can simplify the clean up logic in OSD::mkfs().

and since `ch` is a smart pointer, it is able to take care of itself,
as long as we ensure that it is destructed before objectstore.
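
The ordering argument can be illustrated with a small sketch, assuming
C++17. `ScopeGuard`, `FakeStore`, `FakeCollection` and `mkfs_sketch` are
hypothetical names made up for this example; they are not the Ceph
classes or its scope_guard helper. Locals are destroyed in reverse order
of declaration, so a handle declared after the guard is released before
the guard unmounts the store.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs the callable when it goes out of scope.
template <typename F>
class ScopeGuard {
public:
  explicit ScopeGuard(F f) : f_(std::move(f)) {}
  ~ScopeGuard() { f_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
  F f_;
};

// Hypothetical store that records what happens to it.
struct FakeStore {
  explicit FakeStore(std::vector<std::string>& l) : log(l) {}
  void mount() { log.push_back("mount"); }
  void umount() { log.push_back("umount"); }
  std::vector<std::string>& log;
};

// Hypothetical collection handle that must be released before umount.
struct FakeCollection {
  explicit FakeCollection(std::vector<std::string>& l) : log(l) {}
  ~FakeCollection() { log.push_back("close collection"); }
  std::vector<std::string>& log;
};

int mkfs_sketch(FakeStore& store, bool fail_midway) {
  store.mount();
  // Declared first, so it is destroyed last: umount runs on every return.
  ScopeGuard umount_on_exit{[&store] { store.umount(); }};
  // Declared after the guard, so it is destroyed before the umount.
  auto ch = std::make_unique<FakeCollection>(store.log);
  if (fail_midway) {
    return -1;  // early return still closes the collection, then unmounts
  }
  return 0;
}
```

Both the early-return path and the normal path record the events in the
order mount, close collection, umount, with no explicit cleanup code.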

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:07:10 +08:00
Kefu Chai
3f659a4827 osd: pass unique_ptr<ObjectStore> to OSD::mkfs()
less error-prone this way.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:07:10 +08:00
Kefu Chai
7e8ec0c8ca os: let ObjectStore::create() return unique_ptr<>
instead of returning a raw pointer of ObjectStore, let
`ObjectStore::create()` return a `std::unique_ptr<ObjectStore>`.

less error-prone this way.
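
As a rough illustration of why this is less error-prone, here is a
sketch assuming C++17. `Store`, `MemStoreSketch`, `create_store` and
`Daemon` are hypothetical names, not the real `ObjectStore` or `OSD`
interfaces. The factory hands out ownership explicitly and the consumer
takes it over, so nobody has to remember to call `delete`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical abstract store interface.
struct Store {
  virtual ~Store() = default;
  virtual std::string type() const = 0;
};

struct MemStoreSketch : Store {
  std::string type() const override { return "memstore"; }
};

// Factory returning ownership; an unknown type yields an empty pointer.
std::unique_ptr<Store> create_store(std::string_view type) {
  if (type == "memstore") {
    return std::make_unique<MemStoreSketch>();
  }
  return nullptr;
}

// Consumer that takes ownership in its constructor. The store is
// destroyed automatically together with the Daemon.
class Daemon {
public:
  explicit Daemon(std::unique_ptr<Store> store) : store_(std::move(store)) {}
  std::string store_type() const { return store_->type(); }

private:
  std::unique_ptr<Store> store_;
};
```

A caller would write `Daemon d(create_store("memstore"));` after checking
the result for null; the transfer of ownership is visible in the types.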

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-05-27 23:07:10 +08:00
ofriedma
9b4ae60d20
Merge pull request #41495 from pleiadesian/patch-quota-cache
rgw: remove quota soft threshold
2021-05-27 17:46:41 +03:00
ofriedma
428809482b
Merge pull request #41288 from ofriedma/wip-ofriedma-segfault
rgw: crash on multipart upload to bucket with policy
2021-05-27 17:32:08 +03:00
Ilya Dryomov
6bdda825aa
Merge pull request #41529 from Yenya/rbd-deep-cp-docs
doc/rbd: document cp versus deep cp

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-05-27 15:23:42 +02:00
Jan "Yenya" Kasprzak
bf5863baf5 doc/rbd: document cp versus deep cp
I found that the difference between "rbd cp" and "rbd deep cp",
i.e. what "deep" means in this context, is documented only in
the mailing list archive and in the Mimic release notes.

Let's make the difference explicit in the manpage and in rbd --help.

Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
2021-05-27 13:47:16 +02:00
Sebastian Wagner
674d6d96da
Merge pull request #41224 from adk3798/change-mon-stack-images-docs
doc/cephadm: recommend redeploying monitoring stack daemon after changing image

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-05-27 11:54:24 +02:00
Kefu Chai
a4c961b93c
Merge pull request #41566 from anthonyeleven/anthonyeleven/update-rgw-yaml-in
src/common/options: improve spelling, capitalization, and wording in rgw.yml.in

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-05-27 17:39:30 +08:00
Sebastian Wagner
39fe1c282d
Merge pull request #41400 from liewegas/fix-50113
doc/releases/pacific: add note about rgw on upgrade

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-05-27 11:36:33 +02:00
Milind Changire
26fbbefa82
Merge pull request #40831 from vshankar/wip-cephfs-mirror-incremental-sync
cephfs-mirror: incremental sync

Reviewed-by: Milind Changire <mchangir@redhat.com>
2021-05-27 13:39:23 +05:30
Ilya Dryomov
dfa0164f71
Merge pull request #41279 from pkalever/promote-attach
rbd: promote rbd-nbd attach and detach at rbd integrated cli

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-05-27 09:58:32 +02:00
Kefu Chai
8bc2c6da10
Merge pull request #41378 from varshar16/wip-check-file-inputs-nfs
pybind/mgr: generalize CLICheckNonemptyFileInput() error msg

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
2021-05-27 15:23:44 +08:00
Kefu Chai
00349925b0
Merge pull request #41381 from AmnonHanuhov/wip-Refactor_PeeringState
crimson/osd: Refactor PeeringState

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-05-27 15:21:47 +08:00
Kefu Chai
1c8ebd6bf3
Merge pull request #41516 from tchaikov/wip-47380
mon/OSDMonitor: drop stale failure_info even if can_mark_down()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-05-27 15:19:12 +08:00
Kefu Chai
e2a050e496
Merge pull request #41546 from tchaikov/wip-crush-alignment
crush/crush: ensure alignof(crush_work_bucket) is 1

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-05-27 15:17:48 +08:00
Kefu Chai
3c6002f52f
Merge pull request #41517 from tchaikov/wip-osd-osd-types
osd/osd_type: use f->dump_unsigned() when appropriate

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-05-27 15:17:11 +08:00
Kefu Chai
3bee93335e
Merge pull request #41527 from t-msn/cleanup-peeringstate-init
osd/PeeringState: cleanup dead code in PeeringState::init

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-05-27 15:16:07 +08:00
Kefu Chai
367cf49690
Merge pull request #41565 from anthonyeleven/anthonyeleven/update-rgw-chunk
doc/radosgw: modernize reference to rgw_max_chunk_size

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-05-27 14:13:24 +08:00