Commit Graph

121180 Commits

Author SHA1 Message Date
Sage Weil
3368844d02 cephadm: keepalived needs --cap-add=NET_RAW
This makes

Mar 24 12:00:32 dael conmon[3969650]: Wed Mar 24 16:00:32 2021: cant open raw socket. errno=1

go away and allows it to enter the MASTER state.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-24 16:45:58 -05:00
Sage Weil
6a176b02b1 cephadm: fix --cap-add=NET_ADMIN
Podman wants the = sign.  This aligns us with the other --cap-add user
(SYS_PTRACE), which uses =.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-24 16:45:55 -05:00
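A minimal Python sketch of building capability flags in the `--cap-add=CAP` form that the two commits above settle on. The helper name `cap_add_args` and the assumption that one keepalived container needs both NET_ADMIN and NET_RAW are illustrative only. cephadm's actual code is organized differently.

```python
from typing import Iterable, List


def cap_add_args(caps: Iterable[str]) -> List[str]:
    """Hypothetical helper: one --cap-add flag per capability.

    Always uses the '=' form (--cap-add=NET_ADMIN), which the commit
    message says podman wants, matching the existing SYS_PTRACE usage.
    """
    return [f"--cap-add={cap}" for cap in caps]


# Assumed capability set for a keepalived container: NET_ADMIN plus the
# NET_RAW that lets it open a raw socket and enter the MASTER state.
KEEPALIVED_CAPS = ["NET_ADMIN", "NET_RAW"]

print(cap_add_args(KEEPALIVED_CAPS))
# ['--cap-add=NET_ADMIN', '--cap-add=NET_RAW']
```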
Sage Weil
40e29b9786 cephadm: fix quoting for keepalived env var
This was broken by 3ea514c552

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-24 16:45:51 -05:00
Sage Weil
401e725506 mgr/cephadm: ha-rgw: use correct port
The DaemonDescription includes the port that RGW is bound to; use that
in the haproxy configuration.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-22 14:31:31 -05:00
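The idea can be sketched in Python: each haproxy backend server takes its port from the daemon's description rather than from a fixed value. `RgwDaemon`, `haproxy_backend`, the backend name `rgw_backend` and the port-80 fallback are hypothetical. cephadm's actual haproxy template is not shown in this log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RgwDaemon:
    """Hypothetical stand-in for the DaemonDescription fields used here."""
    name: str
    ip: str
    port: Optional[int]  # port the RGW daemon is actually bound to, if known


def haproxy_backend(daemons: List[RgwDaemon], fallback_port: int = 80) -> str:
    """Render an haproxy backend section from the daemons' recorded ports."""
    lines = ["backend rgw_backend"]
    for d in daemons:
        # Use the port reported for this daemon; the fallback is an
        # assumption for daemons whose port is not recorded.
        port = d.port if d.port is not None else fallback_port
        lines.append(f"    server {d.name} {d.ip}:{port} check")
    return "\n".join(lines)


print(haproxy_backend([RgwDaemon("rgw.a", "10.0.0.1", 8000),
                       RgwDaemon("rgw.b", "10.0.0.2", None)]))
```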
Kefu Chai
009167ea66 Merge pull request #40033 from tchaikov/wip-47380
mon/OSDMonitor: drop stale failure_info after a grace period

Reviewed-by: Sage Weil <sage@redhat.com>
2021-03-22 15:11:23 +08:00
Kefu Chai
f73716402b Merge pull request #40102 from tchaikov/wip-doc-fixes
doc: theme, cmake and formatting related fixes

Reviewed-by: Zac Dover <zac.dover@gmail.com>
2021-03-21 22:04:02 +08:00
Kefu Chai
1c65e84bca Merge pull request #40272 from tchaikov/wip-install-dep-remove-existing-boost
install-deps.sh: remove existing ceph-libboost of different version

Reviewed-by: David Galloway <dgallowa@redhat.com>
2021-03-21 13:43:24 +08:00
Sage Weil
9e8253f69c Merge PR #40147 into master
* refs/pull/40147/head:
	python-common: Validate characters in service_id for container names

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-03-20 19:57:23 -04:00
Sage Weil
664b08b954 Merge PR #40244 into master
* refs/pull/40244/head:
	qa/suites/rados/cephadm/smoke-roleless: deploy additional daemon types

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-03-20 19:56:58 -04:00
David Galloway
10f02f93d2 Merge pull request #40266 from jdurgin/wip-release-notes-retry
script/ceph-release-notes: add retries to pull request fetching
2021-03-20 14:58:42 -04:00
Kefu Chai
8585f05b84 Merge pull request #40271 from liu-chunmei/seastore_fix_segment_cleaner
crimson/seastore: fix segment_cleaner bugs

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-03-20 22:34:53 +08:00
Sage Weil
bbb5490e9c Merge PR #40219 into master
* refs/pull/40219/head:
	mon/MgrStatMonitor: ignore MMgrReport from non-active mgr
	mgr: tell monc when we get new servicemap, fsmap

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2021-03-20 09:17:26 -04:00
Sage Weil
66a2f616fb Merge PR #40117 into master
* refs/pull/40117/head:
	mgr/orchestrator: DG loads properly the unmanaged attribute

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2021-03-20 09:16:41 -04:00
Sage Weil
8c2f9d2ef0 Merge PR #40103 into master
* refs/pull/40103/head:
	cephadm: fix a minor typo in logging message

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
2021-03-20 09:16:14 -04:00
Sage Weil
c9f97045a2 Merge PR #40220 into master
* refs/pull/40220/head:
	mgr/cephadm: identify rgw, cephfs-mirror in servicemap
	mgr/ServiceMap: adjust 'ceph -s' summary
	rgw: register daemons in servicemap by gid; include id
	cephadm: fix rbd-mirror auth name

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2021-03-20 09:15:58 -04:00
Sage Weil
e1cc63c116 Merge PR #40222 into master
* refs/pull/40222/head:
	mgr/orchestrator: remove image name field from 'orch ps' and 'orch ls'

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2021-03-20 09:14:31 -04:00
Sage Weil
fdfd4f8b14 Merge PR #40224 into master
* refs/pull/40224/head:
	qa/suites/rados/cephadm/dashboard: test on centos

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-03-20 09:14:21 -04:00
Sage Weil
84e6fa6c93 Merge PR #40241 into master
* refs/pull/40241/head:
	cephadm: use debug verbosity during container exec

Reviewed-by: Adam King <adking@redhat.com>
2021-03-20 09:14:10 -04:00
Kefu Chai
939b147a55 install-deps.sh: remove existing ceph-libboost of different version
We install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on Ubuntu test
nodes. For instance,

- nautilus: v1.72
- octopus, pacific: v1.73

These branches share the same set of test nodes, and their ceph-libboost
packages conflict with each other because they install files to the same
places.

To avoid the conflict, we should uninstall the existing packages before
installing a different version of the ceph-libboost packages.

ceph-libboost${version}-dev is a package providing the shared headers of
the boost library, so in this change we check whether it is installed
before deciding between returning early and removing the existing packages.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 13:06:08 +08:00
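The decision described above, restated as a Python sketch (install-deps.sh itself is a shell script). The function name, the version strings and the assumption that every conflicting package name starts with `ceph-libboost` are illustrative only.

```python
from typing import Iterable, List, Tuple


def plan_boost_packages(installed: Iterable[str],
                        version: str) -> Tuple[List[str], bool]:
    """Decide what to do before installing ceph-libboost<version>.

    Returns (packages_to_remove, need_install). Follows the commit message:
    if ceph-libboost<version>-dev is already installed, return early;
    otherwise remove any other ceph-libboost package first, because the
    different versions install files to the same places.
    """
    have = set(installed)
    if f"ceph-libboost{version}-dev" in have:
        return [], False  # wanted version already present: nothing to do
    # Assumption: all conflicting packages share the ceph-libboost prefix.
    conflicting = sorted(p for p in have if p.startswith("ceph-libboost"))
    return conflicting, True
```

Keying the early return on the `-dev` package follows the commit's reasoning: it carries the shared headers, so its presence signals that the wanted version is already installed.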
chunmei-liu
179eb156b7 crimson/seastore: fix segment_cleaner bugs
Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
2021-03-19 21:16:44 -07:00
Patrick Donnelly
b4dceb0297 Merge PR #40214 into master
* refs/pull/40214/head:
	mgr/volumes: Retain suid/guid bits in subvolume clone
	pybind/cephfs: Add lchmod python binding
	client/libcephfs: Add lchmod

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-03-19 20:11:47 -07:00
Josh Durgin
fe403a3c78 script/ceph-release-notes: add retries to pull request fetching
API rate limits are easily hit without this for major releases.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2021-03-19 21:11:29 -04:00
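One common way to add retries is exponential backoff. The sketch below assumes a generic `fetch` callable and illustrative attempt and delay values. It is not the script's actual code, which this log does not show.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying with exponential backoff on failure.

    attempts/base_delay are illustrative; a real script would likely retry
    only on rate-limit or transient HTTP errors, not on any Exception.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be >= 1")
```

Injecting `sleep` keeps the backoff testable without real delays.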
Sage Weil
9183d96e93 Merge PR #40242 into master
* refs/pull/40242/head:
	mgr/cephadm/upgrade: do not repeat crash message
	mgr/cephadm/upgrade: a little less verbose
	mgr/cephadm: don't log not-ok-to-stop at ERR level
	mgr/cephadm: is presumed -> appears
	mgr/cephadm: don't double-log ok-to-stop results
	mgr/cephadm/upgrade: include upgrade progress in ceph -s

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-03-19 16:42:14 -04:00
Sage Weil
2bd11c4ceb mgr/cephadm: identify rgw, cephfs-mirror in servicemap
Like rbd-mirror daemons, cephfs-mirror and rgw daemons register under their gid.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 13:13:52 -04:00
Sage Weil
ab0d8f2ae9 mgr/ServiceMap: adjust 'ceph -s' summary
- Do not list individual daemon ids as this won't scale for larger
  clusters
- Do not contemplate multiple daemons of the same type that register with
  different "daemon_type" -- not until we actually have any that do that.
- Present counts by various groupings: distinct hosts and rgw zones to
  start.

  services:
    mon:           1 daemons, quorum a (age 4m)
    mgr:           x(active, since 3m)
    osd:           1 osds: 1 up (since 3m), 1 in (since 3m)
    cephfs-mirror: 1 daemon active (1 hosts)
    rbd-mirror:    2 daemons active (1 hosts)
    rgw:           2 daemons active (1 hosts, 1 zones)

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 13:13:52 -04:00
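A Python sketch of how one line of the summary above could be produced by counting daemons and grouping them by distinct host (and, for rgw, distinct zone). The metadata key names `hostname` and `zone_name` are assumptions, and the column alignment of the real output is omitted.

```python
from typing import Dict, List


def summarize(service: str, daemons: List[Dict[str, str]]) -> str:
    """Hypothetical sketch of one 'services:' line grouped by host/zone.

    `daemons` holds each daemon's servicemap metadata; the key names are
    assumptions. Like the sample output, counts are not singularized.
    """
    n = len(daemons)
    hosts = {d["hostname"] for d in daemons}
    groups = [f"{len(hosts)} hosts"]
    if service == "rgw":
        zones = {d["zone_name"] for d in daemons}
        groups.append(f"{len(zones)} zones")
    noun = "daemon" if n == 1 else "daemons"
    return f"{service}: {n} {noun} active ({', '.join(groups)})"


rgw = [{"hostname": "node1", "zone_name": "default"},
       {"hostname": "node1", "zone_name": "default"}]
print(summarize("rgw", rgw))  # rgw: 2 daemons active (1 hosts, 1 zones)
```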
Kefu Chai
9212d7696a Merge pull request #40230 from tchaikov/wip-rgw-test-boost-asio
cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT for rgw tests

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2021-03-20 00:13:16 +08:00
Sage Weil
e48d80671a qa/suites/rados/cephadm/smoke-roleless: deploy additional daemon types
Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 11:07:25 -05:00
Kefu Chai
ff077fc3ea osd: drop entry in failure_pending when resetting stale peer
There is no need to keep it in the pending list anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
Kefu Chai
253cb8f411 osd: mark HeartbeatInfo::is_stale() and friends "const"
Just for better const correctness.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
Kefu Chai
a124ee85b0 mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by OSDs or peon monitors. Whenever the monitor starts to handle
an MOSDFailure message, it registers the message in its OpTracker, and
the failure report message is unregistered when the monitor acks it,
either by canceling it or by replying to the reporters with a new
osdmap marking the target OSD down. If neither happens, the failure
reports just pile up in the OpTracker, the monitor considers them slow
ops, and they are reported as a SLOW_OPS health warning.

In theory, it does not take long to mark an unresponsive OSD down if
we have enough reporters. But there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target OSD down. Then the
target OSD never gets an osdmap marking it down, so it won't send
an alive message to the monitor to fix this.

In this change, we check for stale failure info in tick() and
simply drop the stale reports, so the messages can be released and
marked "done".

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
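A Python sketch of the tick()-time cleanup described above: reports older than a grace period are dropped, and a target with no remaining reports is removed. The data layout, the single fixed `grace` value and the function name are assumptions. The real OSDMonitor derives its grace period via get_grace_time() and releases the tracked ops rather than returning a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FailureInfo:
    # Hypothetical layout: reporter name -> time its report was received.
    reports: Dict[str, float] = field(default_factory=dict)


def drop_stale_failures(failure_info: Dict[str, FailureInfo], now: float,
                        grace: float) -> List[Tuple[str, str]]:
    """Drop reports older than `grace`; drop targets left with none.

    Returns the dropped (target, reporter) pairs, standing in for the
    messages the monitor would mark "done" in its OpTracker.
    """
    dropped = []
    for target in list(failure_info):  # copy keys: entries may be deleted
        info = failure_info[target]
        for reporter, stamp in list(info.reports.items()):
            if now - stamp > grace:
                del info.reports[reporter]
                dropped.append((target, reporter))
        if not info.reports:
            del failure_info[target]
    return dropped
```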
Kefu Chai
6e512b2f1e mon/OSDMonitor: restructure OSDMonitor::check_failures() loop
We will add a call that trims failures inside the loop, and it mutates
failure_info while we are still iterating over this map, so we have to
restructure the loop a little.

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
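The commit does not show the restructured loop, but the hazard it avoids is generic: removing entries from a map while iterating over it. In C++, erasing from a std::map invalidates the iterator to the erased element, while Python raises RuntimeError instead. A minimal Python illustration follows; both function names are hypothetical.

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def erase_if_naive(d: Dict[K, V], pred: Callable[[K, V], bool]) -> None:
    # Deleting from a dict while iterating it directly: the next step of
    # the loop raises "RuntimeError: dictionary changed size during iteration".
    for k in d:
        if pred(k, d[k]):
            del d[k]


def erase_if(d: Dict[K, V], pred: Callable[[K, V], bool]) -> None:
    # Iterate over a snapshot of the keys so the body may safely delete
    # (the C++ counterpart is `it = m.erase(it)` instead of `++it`).
    for k in list(d):
        if pred(k, d[k]):
            del d[k]
```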
Kefu Chai
d42815d5e9 mon/OSDMonitor: extract get_grace_time()
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
Kefu Chai
09216c01be mon/OSDMonitor: do not return old failure report when updating it
There is no need to return the stale report, as the caller is not
interested in it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
Kefu Chai
062a3859b9 mon/OSDMonitor: do not return no_reply() again
We always return a "no_op" message to the proxy monitor at the very
beginning of `OSDMonitor::prepare_failure()`, so there is no need to reply
to the peon again when discarding the failure report.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
Kefu Chai
164ff62aa5 mon/Monitor: early return if routed request is not found
* Return early if the routed request is not found in routed_requests,
  to reduce the indent level for better readability.
* Do not look up the request twice, for better performance.
* Use unique_ptr<> to hold the request, for better readability.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-20 00:04:32 +08:00
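The single-lookup, early-return pattern from the bullets above, sketched in Python rather than C++. `dict.pop()` finds and removes the entry in one lookup, and the caller then owns the popped object, loosely like moving it into a unique_ptr<>. The function name and signature are hypothetical.

```python
from typing import Any, Callable, Dict


def handle_routed_reply(routed_requests: Dict[int, Any], tid: int,
                        send: Callable[[Any], None]) -> bool:
    """Hypothetical sketch: look the request up once, return early if absent."""
    rr = routed_requests.pop(tid, None)  # one lookup: find and remove
    if rr is None:
        return False  # unknown tid: bail out early, no extra nesting
    send(rr)
    return True
```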
Sage Weil
217ddfeb22 mgr/cephadm/upgrade: do not repeat crash message
Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:46:09 -04:00
Sage Weil
e03fffe648 mgr/cephadm/upgrade: a little less verbose
The _do_upgrade() method runs a zillion times; try to report fewer
repetitive messages on every iteration.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:44:19 -04:00
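The commit does not say how the repetition was reduced. One simple technique is to suppress a message identical to the previous one, as in this Python sketch. `RepeatSuppressingLog` is a hypothetical name, and this is not cephadm's code.

```python
from typing import Callable, Optional


class RepeatSuppressingLog:
    """Hypothetical helper: forward a message only if it differs from the last.

    Useful for a loop such as _do_upgrade() that runs many times and would
    otherwise log the same status on every pass.
    """

    def __init__(self, emit: Callable[[str], None]) -> None:
        self.emit = emit
        self.last: Optional[str] = None

    def info(self, msg: str) -> None:
        if msg != self.last:
            self.emit(msg)
            self.last = msg
```

A message that changes and later changes back is logged again, so state transitions remain visible.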
Kefu Chai
36d2f006c6 cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT for rgw tests
otherwise unittest_rgw_iam_policy does not compile with boost v1.75

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-19 22:39:20 +08:00
Sage Weil
0d5787c0d1 mgr/cephadm: don't log not-ok-to-stop at ERR level
This is normal during the upgrade; INF is fine.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:38:06 -04:00
Sage Weil
3ea3ee5c09 mgr/cephadm: is presumed -> appears
The old wording was weird.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:37:37 -04:00
Sage Weil
df7af90b89 mgr/cephadm: don't double-log ok-to-stop results
The calling upgrade code also reports this.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:37:16 -04:00
Sage Weil
efb7ab22a4 mgr/cephadm/upgrade: include upgrade progress in ceph -s
Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 10:31:24 -04:00
Sage Weil
4bcf9c3422 Merge PR #40218 into master
* refs/pull/40218/head:
	cephadm: make default image the daily master build

Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-03-19 10:21:20 -04:00
Michael Fritch
46f00a7bd7 cephadm: use debug verbosity during container exec
Avoid failures appearing on the console when exec'ing within the
container during the `ls` command.

Signed-off-by: Michael Fritch <mfritch@suse.com>
2021-03-19 08:15:24 -06:00
Kefu Chai
bb70e94dd7 Merge pull request #40232 from tchaikov/wip-rgw-drop-unused-var
rgw/rgw_zone: drop unused variable

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2021-03-19 22:05:55 +08:00
Kefu Chai
9521e38450 Merge pull request #40205 from tchaikov/wip-promtool-podman-docker
test: run promtool test without docker on focal

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
2021-03-19 22:03:50 +08:00
Sage Weil
04e89d57e7 qa/suites/rados/cephadm/dashboard: test on centos
Fixes: https://tracker.ceph.com/issues/49638
Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 08:56:19 -05:00
Sage Weil
afc33758e0 rgw: register daemons in servicemap by gid; include id
Registering by gid allows multiple radosgw instances to share an auth
key/identity.  Including the id in the metadata allows them to still be
identified by name (even if not uniquely).

Signed-off-by: Sage Weil <sage@newdream.net>
2021-03-19 09:45:49 -04:00
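A toy Python model of why keying by gid matters: entries keyed by the shared auth id would collide when several radosgw instances use one identity, whereas gids are unique. The dict layout, the gid values and the metadata key name `id` are assumptions for illustration.

```python
from typing import Dict, List, Optional


def register(service_map: Dict[int, Dict[str, str]], gid: int, daemon_id: str,
             metadata: Optional[Dict[str, str]] = None) -> None:
    """Register a daemon under its unique gid, keeping its name as metadata.

    Two radosgw instances sharing one auth identity get distinct gids, so
    neither entry overwrites the other; 'id' stays available for display.
    """
    entry = dict(metadata or {})
    entry["id"] = daemon_id
    service_map[gid] = entry


def names(service_map: Dict[int, Dict[str, str]]) -> List[str]:
    # Names may repeat: they identify daemons, but not uniquely.
    return sorted(entry["id"] for entry in service_map.values())
```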
Kefu Chai
8c28c79856 cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT for rgw tests
otherwise unittest_rbd_mirror does not compile with boost v1.75

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-19 20:35:52 +08:00
Kefu Chai
f381aa8bf0 test: run promtool test without docker on ubuntu/focal
Before this change, we used Docker to run the promtool offered by
a Docker image, but this is not efficient, and quite a few developers
do not want to use Docker for running "make check". This approach was
introduced by #39246 because, in Ceph's CI process, we use
Ubuntu/Bionic for running "make check" jobs, but the prometheus package
in Bionic does not offer the "test rules" command. So, to address the
problem, we used the "dnanexus/promtool:2.9.2" Docker image for
verifying monitoring/prometheus/alerts/test_alerts.yml.

After this change, we use the prometheus package from Debian
derivatives instead of pulling a Docker image.

* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
  53a5816ded, as we don't need to
  pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
  command, bail out if it is not.

see also: https://tracker.ceph.com/issues/49653

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-19 20:35:51 +08:00