Commit Graph

122100 Commits

Author SHA1 Message Date
胡玮文
9a864f086b cephadm: cleanup extra slash in runtime dir
%t already contains a slash, no need to add an extra one

Signed-off-by: 胡玮文 <huww98@outlook.com>
2021-04-07 21:25:10 +08:00
Kefu Chai
a866516a95
Merge pull request #40581 from weixinwei/master
os/bluestore/BlueFS: do not _flush_range deleted files

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2021-04-07 21:18:01 +08:00
Casey Bodley
4bf2ea2f70
Merge pull request #40601 from cbodley/wip-50147
qa/rgw: move ignore-pg-availability.yaml out of suites/rgw

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2021-04-07 08:49:49 -04:00
Xiubo Li
89c5113561 client: don't allow access to MDS-private inodes
Fixes: https://tracker.ceph.com/issues/50112
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-04-07 19:53:15 +08:00
Rishabh Dave
0e1d9caa0f
Merge pull request #39907 from guits/fix_get_first_lv
ceph-volume: `get_first_lv()` refactor
2021-04-07 17:02:05 +05:30
Ilya Dryomov
dc55f0bb43 packaging: require ceph-common for immutable object cache daemon
This daemon has a systemd service which starts it with --setuser ceph
--setgroup ceph.  "ceph" user and group are created by ceph-common and
won't be there unless ceph-common is installed.

Fixes: https://tracker.ceph.com/issues/50207
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-07 13:11:23 +02:00
Ernesto Puerta
caee16b814
Merge pull request #40624 from rhcs-dashboard/read-only-config-access-disable
mgr/dashboard: Revoke read-only user's access to Manager modules 

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2021-04-07 13:04:28 +02:00
Kefu Chai
91bc0e54ab mgr/selftest: add a command for querying python version
so the test driver can skip certain tests based on the version of python
runtime on the test node

Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-04-07 19:01:46 +08:00
Yuval Lifshitz
83e89dfa33 rgw/multisite: return correct error code when op fails
when trying to disable/enable sync on non-master zone

Fixes: https://tracker.ceph.com/issues/50201

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
2021-04-07 12:51:30 +03:00
Guillaume Abrioux
a5e4216b49 ceph-volume: get_first_*() refactor
As indicated by commit 17957d9beb those
functions were meant to avoid writing something like the following:

```
lvs = get_lvs()
if len(lvs) >= 1:
    lv = lvs[0]
```

Those functions should return `None` if 0 or more than 1 item is returned.
The current names of these functions are confusing and suggest that
we just want the first item, even when more than 1 item is returned.
Let's rename them to `get_single_pv()`, `get_single_vg()` and
`get_single_lv()`.
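
The intended semantics can be sketched as follows. This is a minimal illustration and not the ceph-volume code: `get_single` is a hypothetical helper, and the real `get_single_pv()`, `get_single_vg()` and `get_single_lv()` may take different arguments.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def get_single(items: Sequence[T]) -> Optional[T]:
    """Return the only item, or None when there are 0 or more than 1 items.

    Hypothetical helper illustrating the behaviour described above.
    """
    if len(items) == 1:
        return items[0]
    return None


# Exactly one match is returned as-is; anything else yields None.
assert get_single(["lv0"]) == "lv0"
assert get_single([]) is None
assert get_single(["lv0", "lv1"]) is None
```

Returning `None` for an ambiguous result forces the caller to handle it explicitly instead of silently picking the first match.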

Closes: https://tracker.ceph.com/issues/49643

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-04-07 09:13:39 +02:00
Ronen Friedman
79c5b078e6
Merge pull request #40637 from ronen-fr/wip-ronenf-rgw-logbacking
rgw: fix lambda capture of a non-variable
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
2021-04-07 10:13:01 +03:00
Kefu Chai
c76940977e
Merge pull request #40604 from liewegas/podman-registries-conf
qa/distros/podman: preserve registries.conf

Reviewed-by: David Galloway <dgallowa@redhat.com>
2021-04-07 15:12:24 +08:00
Kefu Chai
c6e8fc5933
Merge pull request #40487 from tchaikov/wip-boost-1.75
install-deps.sh: install boost 1.75

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-04-07 15:10:34 +08:00
Kefu Chai
bb45bf90a6
Merge pull request #40583 from a16bitsysop/include
tools/cephfs_mirror/PeerReplayer.cc: add missing include

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-04-07 15:07:34 +08:00
Kefu Chai
2b8551a58f
Merge pull request #40607 from tchaikov/wip-journald-if-msghdr
common: disable journald logging backend if struct msghdr is not found

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2021-04-07 15:06:25 +08:00
Kefu Chai
e0f96a6829
Merge pull request #40609 from tchaikov/wip-common-buffer-cleanup
common/buffer.cc: use shift_round_up() when appropriate

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-04-07 15:05:23 +08:00
Xiubo Li
eb89c464ae mds: do not show the default auth if it's unambiguous
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-04-07 14:04:36 +08:00
Xiubo Li
4f63998185 mds: switch to rank number instead
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-04-07 14:04:36 +08:00
Ronen Friedman
6016ae0f46 rgw: fix lambda capture of a non-variable
One cannot just capture a structured binding "non-variable".

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-04-07 08:47:36 +03:00
Ronen Friedman
c57731d3ce Revert "osd: Try other PGs when reservation failures occur"
This reverts commit 08c3ede084.

Due to https://tracker.ceph.com/issues/49868
Should be reinstated once that bug is solved. See tracker comments for analysis
and suggested fixes.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-04-07 08:38:17 +03:00
Ronen Friedman
b8045f7b18 Revert "test: Add test for scrub parallelism"
This reverts commit dd63577ab3.

As 08c3ede084 (the tested functionality) is reverted.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-04-07 08:37:03 +03:00
weixinwei
744bd5271c os/bluestore/BlueFS: do not _flush_range deleted files
Fixes: https://tracker.ceph.com/issues/49861
Signed-off-by: weixinwei <weixw3@lenovo.com>
2021-04-07 10:58:06 +08:00
Sage Weil
72c4fc75ad qa/standalone: default to disable insecure global id reclaim
Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:29:23 -04:00
Sage Weil
3e80f61efe qa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings
These will trigger on upgrade; suppress them so that our health gates
will still work.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:29:23 -04:00
Sage Weil
9f6fd4fe56 qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts
Turn these off everywhere for our tests so they don't interfere with our health checks.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:29:21 -04:00
Sage Weil
7ca7418322 cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap
If this is a fresh pacific cluster, let's assume that there won't be
legacy clients connecting.  (And if there are, let's put the burden on
the user to enable them to do so insecurely.)

This is in contrast to upgrades, where our focus is on not breaking
anything.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:28:55 -04:00
Sage Weil
18b343b06e mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]
Two new alerts:

- AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED if we are allowing clients to reclaim
global_ids in an insecure manner (for backwards compatibility until
clients are upgraded)

- AUTH_INSECURE_GLOBAL_ID_RENEWAL if there are currently clients connected that
do not know how to securely renew their global_id, as exposed by
auth_expose_insecure_global_id_reclaim=true.  The client auth names and IPs
are listed in the alert details (up to a limit, at least).

The docs recommend operators mute these alerts rather than disabling them, but
we still include options that allow the alerts to be disabled entirely.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
05772ab612 auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
When handling CEPHX_GET_AUTH_SESSION_KEY requests from nautilus+
clients, ignore CEPH_ENTITY_TYPE_AUTH in CephXAuthenticate::other_keys.
Similarly, when handling CEPHX_GET_PRINCIPAL_SESSION_KEY requests,
ignore CEPH_ENTITY_TYPE_AUTH in CephXServiceTicketRequest::keys.
These fields are intended for requesting service tickets, the auth
ticket (which is really a ticket granting ticket) must not be shared
this way.

Otherwise we end up sharing an auth ticket that a) isn't encrypted
with the old session key even if needed (should_enc_ticket == true)
and b) has the wrong validity, namely auth_service_ticket_ttl instead
of auth_mon_ticket_ttl.  In the CEPHX_GET_AUTH_SESSION_KEY case, this
undue ticket immediately supersedes the actual auth ticket already
encoded in the same reply (the reply frame ends up containing two auth
tickets).
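
A rough sketch of the filtering idea, assuming the requested keys are carried as a bitmask of entity types. The numeric constants and the name `filter_requested_keys` are assumptions made for illustration, not taken from the patch.

```python
# Entity-type bits; the numeric values are assumptions for illustration.
CEPH_ENTITY_TYPE_MON = 0x01
CEPH_ENTITY_TYPE_MDS = 0x02
CEPH_ENTITY_TYPE_OSD = 0x04
CEPH_ENTITY_TYPE_AUTH = 0x20


def filter_requested_keys(requested: int) -> int:
    """Clear the AUTH bit so the auth ticket is never handed out
    through the service-ticket path (hypothetical helper)."""
    return requested & ~CEPH_ENTITY_TYPE_AUTH


requested = CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_AUTH
granted = filter_requested_keys(requested)

# Service tickets are still granted; the auth ticket request is dropped.
assert granted == CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS
```

Masking the bit on the server side means a client cannot obtain a second auth ticket with the wrong TTL, whatever it puts in the request.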

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
522a52e6c2 auth/cephx: rotate auth tickets less often
If unauthorized global_id (re)use is disallowed, a client that has
been disconnected from the network long enough for keys to rotate
and its auth ticket to expire (i.e. become invalid/unverifiable)
would not be able to reconnect.

The default TTL is 12 hours, resulting in a 12-24 hour reconnect
window (the previous key is kept around, so the actual window can be
up to double the TTL).  The setting has stayed the same since 2009,
but it also hasn't been enforced.  Bump it to get a 72 hour reconnect
window to cover for something breaking on Friday and not getting fixed
until Monday.
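
The window arithmetic above can be written out as a small sketch. `reconnect_window_hours` is a hypothetical helper, and the 72 hour TTL in the loop is an assumption about the bumped value, which the message implies but does not state.

```python
from typing import Tuple


def reconnect_window_hours(ttl_hours: int) -> Tuple[int, int]:
    """Return (shortest, longest) reconnect window for a ticket TTL.

    The previous key is kept around, so a ticket stays verifiable
    for between one and two TTLs after it was issued.
    """
    return ttl_hours, 2 * ttl_hours


for ttl in (12, 72):  # 72 is an assumed new TTL, see the note above
    shortest, longest = reconnect_window_hours(ttl)
    print(f"TTL {ttl}h: reconnect window {shortest}-{longest}h")
```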

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
08766a17ed mon: fail fast when unauthorized global_id (re)use is disallowed
When unauthorized global_id (re)use is disallowed, we don't want to
let unpatched clients in because they wouldn't be able to reestablish
their monitor session later, resulting in subtle hangs and disrupted
user workloads.

Denying the initial connect for all legacy (CephXAuthenticate < v3)
clients is not feasible because a large subset of them never stopped
presenting their ticket on reconnects and are therefore compatible with
enforcing mode: most notably all kernel clients but also pre-luminous
userspace clients.  They don't need to be patched and excluding them
would significantly hamper the adoption of enforcing mode.

Instead, force clients that we are not sure about to reconnect shortly
after they go through authentication and obtain global_id.  This is
done in Monitor::dispatch_op() to capture both msgr1 and msgr2, most
likely instead of dispatching mon_subscribe.

We need to let mon_getmap through for "ceph ping" and "ceph tell" to
work.  This does mean that we share the monmap, which lets the client
return from MonClient::authenticate() considering authentication to be
finished and causing the potential reconnect error to not propagate to
the user -- the client would hang waiting for remaining cluster maps.
For msgr1, this is unavoidable because the monmap is sent immediately
after the final MAuthReply.  But for msgr2 this is rare: most of the
time we get to their mon_subscribe and cut the connection before they
process the monmap!

Regardless, the user doesn't get a chance to start a workload since
there is no proper higher-level session at that point.

To help with identifying clients that need patching, add global_id and
global_id_status to "sessions" output.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
abebd643cc auth/cephx: option to disallow unauthorized global_id (re)use
global_id is a cluster-wide unique id that must remain stable for the
lifetime of the client instance.  The cephx protocol has a facility to
allow clients to preserve their global_id across reconnects:

(1) the client should provide its global_id in the initial handshake
    message/frame and later include its auth ticket proving previous
    possession of that global_id in CEPHX_GET_AUTH_SESSION_KEY request

(2) the monitor should verify that the included auth ticket is valid
    and has the same global_id and, if so, allow the reclaim

(3) if the reclaim is allowed, the new auth ticket should be
    encrypted with the session key of the included auth ticket to
    ensure authenticity of the client performing reclaim.  (The
    included auth ticket could have been snooped when the monitor
    originally shared it with the client or any time the client
    provided it back to the monitor as part of requesting service
    tickets, but only the genuine client would have its session key
    and be able to decrypt.)

Unfortunately, all (1), (2) and (3) have been broken for a while:

- (1) was broken in 2016 by commit a2eb6ae3fb ("mon/monclient:
  hunt for multiple monitor in parallel") and is addressed in patch
  "mon/MonClient: preserve auth state on reconnects"

- it turns out that (2) has never been enforced.  When cephx was
  being designed and implemented in 2009, two changes to the protocol
  raced with each other pulling it in different directions: commits
  0669ca21f4 ("auth: reuse global_id when requesting tickets")
  and fec31964a1 ("auth: when renewing session, encrypt ticket")
  added the reclaim mechanism based strictly on auth tickets, while
  commit 5eeb711b6b ("auth: change server side negotiation a bit")
  allowed the client to provide global_id in the initial handshake.
  These changes didn't get reconciled and as a result a malicious
  client can assign itself any global_id of its choosing by simply
  passing something other than 0 in MAuth message or AUTH_REQUEST
  frame and not even bother supplying any ticket.  This includes
  getting a global_id that is being used by another client.

- (3) was broken in 2019 with addition of support for msgr2, where
  the new auth ticket ends up being shared unencrypted.  However the
  root cause is deeper and a malicious client can coerce msgr1 into
  the same.  This also goes back to 2009 and is addressed in patch
  "auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys".

Because (2) has never been enforced, no one noticed when (1) got
broken and we began to rely on this flaw for normal operation in
the face of reconnects due to network hiccups or otherwise.  As of
today, only pre-luminous userspace clients and kernel clients are
not exercising it on a daily basis.

Bump CephXAuthenticate version and use a dummy v3 to distinguish
between legacy clients that don't (may not) include their auth ticket
and new clients.  For new clients, unconditionally disallow claiming
global_id without a corresponding auth ticket.  For legacy clients,
introduce a choice between permissive (current behavior, default for
the foreseeable future) and enforcing mode.

If the reclaim is disallowed, return EACCES.  While MonClient does
have some provision for global_id changes and we could conceivably
implement enforcement by handing out a fresh global_id instead of
the provided one, those code paths have never been tested and there
are too many ways a sudden global_id change could go wrong.
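
The resulting decision can be sketched roughly as below. This is an illustration under stated assumptions, not the monitor code: `AuthTicket`, `check_global_id_reclaim` and its parameters are hypothetical, ticket validity is reduced to a boolean, and returning a negative errno is a convention assumed here.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthTicket:
    global_id: int
    valid: bool  # stands in for cryptographic verification of the ticket


def check_global_id_reclaim(
    claimed_global_id: int,
    old_ticket: Optional[AuthTicket],
    client_version: int,
    enforcing: bool,
) -> int:
    """Return 0 if the reclaim is allowed, or -EACCES if it is denied."""
    has_proof = (
        old_ticket is not None
        and old_ticket.valid
        and old_ticket.global_id == claimed_global_id
    )
    if has_proof:
        return 0
    if client_version >= 3:
        # New clients must always prove previous possession.
        return -errno.EACCES
    # Legacy clients: permissive mode keeps the current behavior.
    return -errno.EACCES if enforcing else 0


# A new client without a ticket is always refused.
assert check_global_id_reclaim(7, None, 3, enforcing=False) == -errno.EACCES
# A legacy client without a ticket is let in only in permissive mode.
assert check_global_id_reclaim(7, None, 2, enforcing=False) == 0
assert check_global_id_reclaim(7, None, 2, enforcing=True) == -errno.EACCES
```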

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
6b860684c6 auth/cephx: make cephx_decode_ticket() take a const ticket_blob
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
b50b6abd60 auth/AuthServiceHandler: keep track of global_id and whether it is new
AuthServiceHandler already has global_id field, but it is unused.
Revive it and let the handler know whether global_id is newly assigned
by the monitor or provided by the client.

Lift the setting of entity_name into AuthServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:55 -04:00
Ilya Dryomov
49cba02a75 auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific
Make the one in CephxServiceHandler private and drop the stub in
AuthNoneServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
c151c9659b auth/AuthServiceHandler: drop unused start_session() args
session_key, connection_secret and connection_secret_required_length
aren't material for start_session() across all three implementations.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
a71f6e90d4 mon/MonClient: drop global_id arg from _add_conn() and _add_conns()
Passing anything but MonClient instance's global_id doesn't make
sense.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
c9b022e073 mon/MonClient: reset auth state in shutdown()
Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated.  This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
236b536b28 mon/MonClient: preserve auth state on reconnects
Commit a2eb6ae3fb ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects.  The ensuing
breakage was quickly noticed and prompted a follow-on fix 8bb6193c8f
("mon/MonClient: persist global_id across re-connecting").

However, as evident from the subject, the follow-on fix only took
care of the global_id part.  AuthClientHandler is still destroyed
and all cephx tickets are discarded.  A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.

This should have resulted in a similar sort of breakage but didn't
because of a much larger bug.  The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented.  Alas, it appears that this
aspect of the cephx protocol has never been enforced.  This is dealt
with in the next patch.

To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.
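
As a minimal sketch of the cloning approach, assuming simplified stand-ins for the C++ classes: the Python `AuthClientHandler`, `MonConnection` and `add_conns` below are hypothetical models, and the real types carry far more state.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthClientHandler:
    global_id: int = 0
    tickets: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class MonConnection:
    addr: str
    auth: Optional[AuthClientHandler] = None


def add_conns(
    addrs: List[str], current_auth: Optional[AuthClientHandler]
) -> List[MonConnection]:
    """Give every new connection its own copy of the current auth state."""
    return [MonConnection(addr, copy.deepcopy(current_auth)) for addr in addrs]


auth = AuthClientHandler(global_id=4242, tickets={"auth": b"ticket"})
conns = add_conns(["mon.a", "mon.b"], auth)

# Each connection starts with the current ticket but owns a separate copy.
assert all(c.auth == auth and c.auth is not auth for c in conns)
```

Copying instead of sharing lets the parallel connection attempts proceed without stepping on each other's handler state.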

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
eec24e4d11 mon/MonClient: claim active_con's auth explicitly
Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.

The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Ilya Dryomov
6faa18e0a8 mon/MonClient: resurrect "waiting for monmap|config" timeouts
This fixes a regression introduced in commit 85157d5aae ("mon:
s/Mutex/ceph::mutex/").  Waiting for monmap and config indefinitely
is not just bad UX, it actually masks other more serious bugs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-06 17:28:54 -04:00
Sage Weil
9e96806aea Merge PR #40603 into master
* refs/pull/40603/head:
	qa/tasks/ceph.conf: shorten cephx TTL for testing

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2021-04-06 17:28:08 -04:00
Casey Bodley
fb760da0ed
Merge pull request #40190 from cbodley/wip-qa-rgw-sigv4-warnings
rgw: silence some unused variable warnings

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
2021-04-06 16:08:28 -04:00
Sage Weil
3b391cbf7a Merge PR #40602 into master
* refs/pull/40602/head:
	qa/tasks/cephadm: add apply() method/task

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2021-04-06 16:04:27 -04:00
Sage Weil
2218c9473f Merge PR #40597 into master
* refs/pull/40597/head:
	cephadm: pass '-i' to docker|podman run for shell|enter

Reviewed-by: Adam King <adking@redhat.com>
2021-04-06 16:04:17 -04:00
J. Eric Ivancich
b38cdb58dc
Merge pull request #40553 from dang/wip-dang-zipper-list
RGW Zipper - Make sure bucket list progresses

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2021-04-06 13:49:35 -04:00
Patrick Donnelly
61b014c4a6
Merge PR #39939 into master
* refs/pull/39939/head:
	cephfs: ceph-dokan - properly log the mounted root
	cephfs: Update ceph-dokan "--removable" flag
	cephfs: document using multiple fs on Windows
	cephfs: provide additional volume details on Windows
	cephfs: add ceph-dokan unmap command

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-04-06 10:46:23 -07:00
Patrick Donnelly
d0e3b7129d
Merge PR #40418 into master
* refs/pull/40418/head:
	test: unmount when finished ino_release_cb
	test: wait a time for inode release
	qa: move ino_release_cb to libcephfs sub-suite
	qa: simplify recall triggers for bug
	qa: fix name for qa task referencing tracker issue

Reviewed-by: Jeff Layton <jlayton@redhat.com>
2021-04-06 10:45:00 -07:00
Patrick Donnelly
eb38b924ff
Merge PR #40460 into master
* refs/pull/40460/head:
	client: only check pool permissions for regular files

Reviewed-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-04-06 10:43:54 -07:00
Patrick Donnelly
414b5593f0
Merge PR #40465 into master
* refs/pull/40465/head:
	test: bump up retries for `test_mirroring_init_failure_with_recovery` test
	test: fix typo
	test: disable mgr/mirroring for `test_mirroring_init_failure_with_recovery` test

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-04-06 10:43:12 -07:00
Patrick Donnelly
532532e4ce
Merge PR #40468 into master
* refs/pull/40468/head:
	mds/metrics: add one whitespace between metric type and the metainfo

Reviewed-by: Varsha Rao <varao@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
2021-04-06 10:42:42 -07:00