Commit Graph

108777 Commits

Author SHA1 Message Date
Sage Weil
f44c11d1a5 mgr/cephadm: do not remove service spec when removing a daemon
This makes our behavior similar to kube: if you kill a pod, the operator
or controller will come along and create a new one (probably somewhere
else).

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 08:10:54 -06:00
Joshua Schmid
24e05378dc mgr/cephadm: rename completion variables & cleanup
Signed-off-by: Joshua Schmid <jschmid@suse.de>
2020-03-01 08:10:54 -06:00
Joshua Schmid
09ad4d39f4 mgr/cephadm: leverage service specs
Fixes: https://tracker.ceph.com/issues/44205

This does a couple of things:

* Change the way apply_$service() works:

Instead of triggering the deployment mechanism, it now
transforms the passed ServiceSpec into a JSON representation
and saves it in a persistent mon_store section.

`mgr/cephadm/service_spec/$service|daemon_type/service_name`

These locations will be periodically checked in the serve() thread.
This works since all the apply_$service_type functions are idempotent.

* Allow saving a config-like specification in the mon_store.

`ceph orch apply -i <service_spec_file.yaml>`

will read the specified services and save them in the mon store
section mentioned above. The same serve() mechanism described above
also applies to deployment.

Signed-off-by: Joshua Schmid <jschmid@suse.de>
2020-03-01 08:10:54 -06:00
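The store-then-reconcile flow described in 09ad4d39f4 can be illustrated with a minimal sketch. This is not the cephadm code: `SpecStore`, `reconcile`, the dict standing in for the mon store, and the `count` field are all hypothetical names chosen for the example. Only the key layout is modelled on the commit message.

```python
import json


class SpecStore:
    """Hypothetical stand-in for the persistent mon store (a plain dict here)."""

    # Key prefix modelled on the commit message; the exact layout is an assumption.
    PREFIX = "mgr/cephadm/service_spec/"

    def __init__(self):
        self.kv = {}

    def save(self, service_type, service_name, spec):
        # Persist the spec as JSON instead of deploying anything right away.
        key = f"{self.PREFIX}{service_type}/{service_name}"
        self.kv[key] = json.dumps(spec, sort_keys=True)
        return key

    def all_specs(self):
        for key, raw in sorted(self.kv.items()):
            yield key, json.loads(raw)


def reconcile(store, running):
    """One pass of a serve()-style loop.

    `running` maps a spec key to the number of daemons currently running.
    The pass is idempotent: if nothing changed, it returns no actions.
    """
    actions = []
    for key, spec in store.all_specs():
        have = running.get(key, 0)
        want = spec["count"]  # hypothetical field
        for _ in range(max(0, want - have)):
            actions.append(("deploy", key))
        running[key] = max(have, want)
    return actions
```

Because the spec stays in the store, a daemon that disappears is simply redeployed on the next pass, which is the behavior f44c11d1a5 relies on when it keeps the service spec after a daemon is removed.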
Sage Weil
4c5827241a qa/suites/rados/singleton-bluestore/cephtool: whitelist MON_DOWN
cephtool/test.sh now includes a test that disallows mon from the quorum
for a short period.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 08:03:21 -06:00
Sage Weil
1400b35858 qa/suites/rados/verify/tasks/mon_recovery: whitelist SLOW_OPS
The mon can see slow ops when thrashing.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 07:58:11 -06:00
Sage Weil
a181ad533d mgr/test_orchestrator: update_foo -> apply_foo
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 07:23:54 -06:00
Sage Weil
faadaa3155 Merge PR #33639 into master
* refs/pull/33639/head:
	pybind/mgr/mgr_module: fix standby module logging options

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-03-01 07:22:45 -06:00
Sage Weil
ad0bbc8f78 msg: add get_{pid,random}_nonce() helpers
In cases where we normally use a pid for a nonce, fall back to a random
value when the pid == 1 (i.e., we're in a container).  For the cases where
we use a random value, use the helper.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 20:58:15 +08:00
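The fallback described in ad0bbc8f78 is easy to picture in a few lines. The sketch below is a Python illustration only; the real helpers are part of Ceph's C++ messenger code, and the 64-bit width and the optional `pid` parameter are assumptions made for the example.

```python
import os
import random

_rng = random.SystemRandom()


def get_random_nonce():
    """Return a random 64-bit nonce (the width is an assumption)."""
    return _rng.getrandbits(64)


def get_pid_nonce(pid=None):
    """Use the pid as the nonce, unless the pid is 1.

    Pid 1 usually means the process runs as init inside a container, so
    every containerized daemon would otherwise share the same nonce.
    """
    if pid is None:
        pid = os.getpid()
    if pid == 1:
        return get_random_nonce()
    return pid
```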
Sage Weil
a403e233ed msg/Policy: make stateless_server default to anon (again)
Midway through the octopus cycle, we made stateless server more stateless
in the sense that it would not register incoming client connections.  And,
in so doing, it would not enforce that client connections came from
unique addresses, by closing an existing connection from the same addr
when a new connection was accepted.

This turned out to cause out of order OSD ops because the OSD needed that
behavior.  See https://tracker.ceph.com/issues/42328.  We fixed that by
reverting to the old behavior for all but monitor connections, where we
needed it, in 507d213cc4.

This, in turn, breaks most OSD <-> OSD communication (and probably lots
of other things) with cephadm, because we make entity_addr_t unique with
a nonce that is populated by getpid()... and the containerized daemons
all have pid 1.  When we finally merged the follow-on fixes for the change
above, cephadm OSDs could no longer ping each other.

In my view, the 'anon' connection handling is a good idea in the general
case.  So, let's adjust our fix for #42328 so that it is only the OSD
client-side interface that registers client connections and makes them
unique.

Fixes: https://tracker.ceph.com/issues/44358
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 20:34:10 +08:00
Sage Weil
e05a07ded9 osd: drop broken 'poison pill'
This was effectively a no-op, since the default policy was *also*
stateless_server.

This line originates from v0.24 (2010) when we added the cluster msgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 20:34:10 +08:00
Mykola Golub
f863d18d64 test/run-rbd-tests: properly initialize newly created rbd pool
The scheduler tests from cli_generic expect 'rbd' pool marked as rbd
application pool.

Signed-off-by: Mykola Golub <mgolub@suse.com>
2020-03-01 08:54:44 +00:00
Kefu Chai
dcbb682763 Merge pull request #31109 from liupengs/wip-msg-async-fix-event-center-block
msg/async/rdma: unblock event center if the peer is down when connecting

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-03-01 15:47:19 +08:00
liupengs
9fbe5bc754 msg/async/rdma: move C_handle_connection to RDMAConnectionSocketImpl.cc
Signed-off-by: Peng Liu <liupeng37@baidu.com>
2020-03-01 14:17:39 +08:00
Kefu Chai
fef57508fa Merge pull request #33591 from badone/wip-install-deps-set-gpgcheck-for-reals
install-deps.sh: Actually set gpgcheck to false

Reviewed-by: Dan Mick <dan.mick@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-03-01 13:43:26 +08:00
liupengs
8b2a95011c msg/async/rdma: fix event center blocked by the connection set up to transport the ib sync msg
We construct a TCP connection to transport the ib sync msg. If the
remote node is shut down (for example, by accident), net.connect blocks until the timeout
is reached, which in turn blocks the event center.

This bug may cause mon probe timeouts, OSDs not replying, and so on.

Signed-off-by: Peng Liu <liupeng37@baidu.com>
2020-03-01 12:55:24 +08:00
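One common way to keep an event loop from stalling on a dead peer, as in the problem 8b2a95011c describes, is to start the connect in non-blocking mode and check its result later. The sketch below shows that general technique with Python sockets. It is not the Ceph fix itself, and `start_connect` and `finish_connect` are hypothetical names.

```python
import errno
import os
import select
import socket


def start_connect(host, port):
    """Begin a TCP connect without blocking the caller."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    rc = sock.connect_ex((host, port))
    # 0 means already connected; EINPROGRESS means the connect is pending.
    if rc not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    return sock


def finish_connect(sock, wait=0.0):
    """Poll a pending connect.

    Returns None while the connect is still pending, True once it has
    succeeded, and False if it has failed.
    """
    _, writable, _ = select.select([], [sock], [], wait)
    if not writable:
        return None
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    return err == 0
```

An event loop would call `finish_connect` with `wait=0.0` from its normal polling cycle, so an unreachable peer costs one cheap check per cycle rather than a full connect timeout.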
Sage Weil
531c9ab4fa mgr/test_orchestrator: add force flag to remove_daemons
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-29 19:29:38 -06:00
Sage Weil
1428f54490 qa/tasks/mgr/test_orchestrator_cli: update
Most of these were broken due to CLI changes weeks ago.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-29 17:37:18 -06:00
Sage Weil
f2f738184a Merge PR #33634 into master
* refs/pull/33634/head:
	qa/workunits/cephadm/test_cephadm.sh: dump logs on exit
	qa/workunits/cephadm/test_cephadm.sh: add `cleanup` function

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-29 16:13:07 -06:00
Sage Weil
abcee7133b pybind/mgr/mgr_module: fix standby module logging options
We need to define the module options and their default so that
_configure_logging can succeed.

Broken by 8ec3b3d3cc

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-29 10:44:43 -06:00
Sage Weil
45555c529d Merge PR #33438 into master
* refs/pull/33438/head:
	cephadm: add prometheus adopt

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-28 21:32:47 -06:00
Sage Weil
d8f8074534 Merge PR #33433 into master
* refs/pull/33433/head:
	cephadm: also return JSON decode error.

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-28 21:32:36 -06:00
Yehuda Sadeh
e28718eaa1 rgw: move frontends initial init to after global_init()
So that central config could be used

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:52 -08:00
Yehuda Sadeh
c48f944a2e rgw: ssl: don't try to init certificate if not needed
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
e71a1984ef rgw: frontend: add rgw_frontend_defaults configurable
Allow setting default configuration per frontend config option. This allows
having defaults for some of the frontend config options while setting others explicitly.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
63c1428966 rgw: beast ssl: enable use of meta variable for cert config
Add the following variables:
$realm, $realm_id, $zonegroup, $zonegroup_id, $zone, $zone_id

Variables can also be passed in with curly braces, e.g., ${realm}

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
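The `$realm` and `${realm}` forms listed in 63c1428966 happen to match the syntax of Python's `string.Template`, which makes the substitution simple to demonstrate. This is an illustration of the idea only, not the rgw implementation (which is C++). The `expand_meta` helper and the example path and values are hypothetical.

```python
from string import Template


def expand_meta(value, meta):
    """Expand $name and ${name} placeholders; unknown names are left as-is."""
    return Template(value).safe_substitute(meta)


# Hypothetical realm/zone values for illustration.
meta = {"realm": "gold", "realm_id": "r-1", "zone": "us-east", "zone_id": "z-1"}
expanded = expand_meta("config://rgw/cert/$realm/${zone}.crt", meta)
```

Here `expanded` ends up as `config://rgw/cert/gold/us-east.crt`. Placeholder names are matched greedily, so `$realm_id` resolves to the `realm_id` value rather than to `$realm` followed by `_id`.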
Yehuda Sadeh
9745b76433 rgw: beast ssl: improve output
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
0b1c81c5e9 rgw: beast frontend: handle default ssl configurables
Also, don't exit immediately if the ssl cert can't be initialized; exit only if the cert is
actually needed (if ssl_port/ssl_endpoint is configured).

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
7fd89b1b62 rgw: update docs about ssl config through config-key
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
e765fd6b13 rgw: allow beast ssl frontend cert config via mon config-key
Fixes: https://tracker.ceph.com/issues/44128

For example:

radosgw --rgw-frontends= \
  "beast ssl_port=443 \
  ssl_certificate=config://rgw/cert/domain.crt \
  ssl_private_key=config://rgw/cert/domain.key"

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
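To make the example in e765fd6b13 concrete, the sketch below splits a frontend string into its framework and options and recognizes values that use the `config://` scheme. It is a simplified illustration, not rgw's parser. The function names are hypothetical, and treating the text after `config://` as the config-key name is an assumption.

```python
CONFIG_SCHEME = "config://"


def parse_frontend(conf):
    """Split 'framework key=value ...' into (framework, options dict)."""
    framework, *pairs = conf.split()
    options = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        options[key] = value
    return framework, options


def config_key_for(value):
    """Return the config-key name for a config:// value, else None.

    Assumption: the key name is everything after the scheme.
    """
    if value.startswith(CONFIG_SCHEME):
        return value[len(CONFIG_SCHEME):]
    return None
```

With the frontend string from the commit message, `config_key_for` applied to the `ssl_certificate` option would yield `rgw/cert/domain.crt`, while a plain filesystem path yields `None`.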
Yehuda Sadeh
8ca78a166c rgw: create config-key svc
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Yehuda Sadeh
958aa48eb7 rgw: svc/rados: new mon_command call
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2020-02-28 19:16:47 -08:00
Kefu Chai
c20f4ab840 Merge pull request #33596 from badone/wip-serve-doc-python3
admin/serve-doc: Switch to python3 only

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-02-29 09:58:01 +08:00
Brad Hubbard
8ccf43bce9 admin/serve-doc: Switch to python3 only
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2020-02-29 10:29:21 +10:00
Michael Fritch
5bdcf00e87 qa/workunits/cephadm/test_cephadm.sh: dump logs on exit
dumps the last few lines from each of the surviving daemon logs

Signed-off-by: Michael Fritch <mfritch@suse.com>
2020-02-28 15:26:56 -07:00
Michael Fritch
3cc2fc9f5f qa/workunits/cephadm/test_cephadm.sh: add cleanup function
moves logic for clean-up during `trap EXIT` into a function

Signed-off-by: Michael Fritch <mfritch@suse.com>
2020-02-28 14:45:42 -07:00
Jason Dillaman
bb6ca5c2df Merge pull request #33624 from trociny/wip-dateutil-dep
rpm,deb: fix python dateutil module dependency

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
2020-02-28 15:40:01 -05:00
Casey Bodley
08e0a4289d qa/rgw: verify suite selects a random bucket sharding configuration
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-28 14:38:13 -05:00
Casey Bodley
101f41e1e9 qa/rgw: add different bucket sharding overrides
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-28 14:36:59 -05:00
Casey Bodley
481b4fcdcf rgw: move ShardTracker::next_candidate() into lambda
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-28 13:58:53 -05:00
Casey Bodley
bcc1bcf5e6 rgw: bucket_list_ordered loops until it gets a unique candidate
when we detect a duplicate common prefix, we need to loop until we get
the next unique candidate. we must add a new candidate for each shard,
or we won't visit it again and would miss later entries

Fixes: https://tracker.ceph.com/issues/44353

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-28 13:02:54 -05:00
Casey Bodley
81cfd5da3f rgw: bucket_list_ordered advances past duplicate common prefixes
we may see the same common prefix from more than one shard. when we
detect a duplicate, we need to advance past it. otherwise, we may make
the wrong decision about is_truncated because the shards with
duplicates won't be at_end()

Fixes: https://tracker.ceph.com/issues/44353

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-28 13:02:54 -05:00
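The two bucket_list_ordered fixes above (bcc1bcf5e6 and 81cfd5da3f) concern a merge across index shards in which the same common prefix can surface from more than one shard. A rough Python sketch of that kind of merge follows. It is not the rgw code: `list_ordered` and the list-of-lists input are hypothetical, and a heap stands in for whatever candidate tracking rgw uses.

```python
import heapq


def list_ordered(shards, max_entries):
    """Merge sorted per-shard listings, dropping duplicate entries.

    Returns (entries, is_truncated).
    """
    iters = [iter(s) for s in shards]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, idx))

    def refill(idx):
        # Always queue the next candidate from the shard just consumed,
        # otherwise its later entries would never be visited.
        nxt = next(iters[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))

    results = []
    while heap and len(results) < max_entries:
        name, idx = heapq.heappop(heap)
        if not results or results[-1] != name:
            results.append(name)
        refill(idx)

    # Advance past leftover duplicates of the last entry so that they do
    # not make the listing look truncated.
    while heap and results and heap[0][0] == results[-1]:
        _, idx = heapq.heappop(heap)
        refill(idx)

    return results, bool(heap)
```

The final loop mirrors the is_truncated concern in 81cfd5da3f: without it, shards that only hold duplicates of the last returned prefix would still count as having entries left.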
Ziye Yang
e14f02ca0d NVMEDevice: Remove the unnecessary aio_wait in sync read
Using aio_wait is unnecessary, since all the async read
submission and completion happen in the same thread.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2020-02-29 01:49:57 +08:00
Nathan Cutler
839fc76f99 doc/cephadm/administration: clarify log gathering
This is an attempt to bring the current state of the documentation more
into line with the current state of the cephadm code.

However, when I try to grab logs from a daemon on a host other than the
one where the daemon is running, I get an empty log...

References: https://tracker.ceph.com/issues/44354
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2020-02-28 18:16:17 +01:00
Venky Shankar
33f6263e5f doc: update clone section for mgr/volumes w/ attr synchronization changes
Fixes: http://tracker.ceph.com/issues/43965
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
Venky Shankar
db8b8a6791 test: revert to default mount state in test_cephfs:test_mount_root()
Without this, tests written after test_mount_root() in the source
were failing with

     cephfs.LibCephFSStateError: You cannot perform that operation on a
     CephFS object in state initialized.

... since the test fiddles with the default mount which is root.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
Venky Shankar
e22d546beb test: add test for verifying inode attributes sync on clone
Also, for existing tests, additionally verify inode attributes
for clones.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
Venky Shankar
aac6b5f1ce mgr/volumes: synchronize inode attributes for cloned subvolumes
Synchronize ownership, permissions and inode timestamps (access and
modification times) for all supported inode types. Note that inode
timestamps are synchronized only up to one-second granularity.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
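The attribute synchronization described in aac6b5f1ce can be pictured with ordinary POSIX calls on a local filesystem. The sketch below uses Python's `os` module rather than the CephFS bindings, so it is an analogy and not the mgr/volumes code. `sync_attrs` is a hypothetical name, and skipping the mode change for symlinks is a choice made for this example.

```python
import os
import stat


def sync_attrs(src, dst):
    """Copy ownership, permissions and timestamps from src to dst.

    Timestamps are truncated to whole seconds, matching the granularity
    noted in the commit message.
    """
    st = os.lstat(src)
    # Change ownership of the entry itself, not of a symlink's target.
    os.chown(dst, st.st_uid, st.st_gid, follow_symlinks=False)
    # Symlink permissions cannot be set portably, so skip them here.
    if not stat.S_ISLNK(st.st_mode):
        os.chmod(dst, stat.S_IMODE(st.st_mode))
    os.utime(dst, (int(st.st_atime), int(st.st_mtime)), follow_symlinks=False)
```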
Venky Shankar
fc49ffd227 pybind/cephfs: pybind calls for changing inode timestamps
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
Venky Shankar
ea97410fb7 pybind/cephfs: pybind call for changing ownership for symlinks
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-02-28 11:42:17 -05:00
Mykola Golub
62620fc7d3 rpm,deb: fix python dateutil module dependency
(needed for mgr/rbd_support)

Signed-off-by: Mykola Golub <mgolub@suse.com>
2020-02-28 16:29:26 +00:00