103555 Commits

Author SHA1 Message Date
Sage Weil
e771ece974 Merge PR #31167 into master
* refs/pull/31167/head:
	os/bluestore: do not mark per_pool_omap updated unless we fixed it

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2019-10-27 09:19:40 -05:00
Kefu Chai
482561bd55 mgr/dashboard: accept socket error 0
see
9d73226ab2

Fixes: https://tracker.ceph.com/issues/38378
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 14:34:48 +08:00
Kefu Chai
8a743a8a4e mgr/dashboard: accept exceptions from builtin SSL
see also https://github.com/cherrypy/cheroot/pull/4

the exception should be accepted so that we don't panic when a client
tries to talk to us using an unsupported protocol; the client can then
fall back to a supported protocol.

Fixes: https://tracker.ceph.com/issues/38378
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 14:34:48 +08:00
Kefu Chai
01cb398019 mgr/dashboard: extract monkey patches for cherrypy out
so they are less distracting to the dashboard developers, and probably
less scary.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 14:34:48 +08:00
Kefu Chai
8adbb86fb8 ceph.spec.in: add missing python-yaml dependency for mgr-k8sevents
otherwise we might have:

```
ceph/src/pybind/mgr/k8sevents/__init__.py", line 1, in <module>
    from .module import Module
  File "/home/kchai/ceph/src/pybind/mgr/k8sevents/module.py", line 28, in <module>
    import yaml
ImportError: No module named yaml
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 12:46:15 +08:00
Kefu Chai
3f56b1f3ef mgr/diskprediction_cloud: refactor timeout() decorator
* timeout() is never passed any parameter when being called, so let's
  remove the parameters list of "seconds" and "error_message"
* use `getattr()` instead of `hasattr()` for retrieving the
  member variable of `self`
* pass `self` to wrapper function explicitly.
* return `func()` right away.
* hardwire the error message of `TimeoutError` to "Timer expired",
  because
  - neither errno.ETIME nor errno.ETIMEOUT is portable
  - the only caller of `TimeoutError` is `timeout()`, so there is
    no need to have the flexibility to pass a different error message
* use `wraps()` as a decorator, simpler this way.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 10:15:09 +08:00
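
The refactoring listed in 3f56b1f3ef can be illustrated with a minimal Python sketch. It is not the code from the commit: the `SIGALRM`-based mechanism, the `_timeout` attribute name, the `DEFAULT_TIMEOUT` value, and the `TimerExpired` and `Task` classes are all assumptions made for illustration. It only shows the shape the bullet points describe: a decorator without parameters, `getattr()` with a default, `self` passed explicitly, `func()` returned directly, a hardwired message, and `wraps()` used as a decorator.

```python
import signal
import time
from functools import wraps

DEFAULT_TIMEOUT = 10  # seconds; assumed default, not taken from the commit


class TimerExpired(TimeoutError):
    """Hypothetical exception with a hardwired message."""

    def __init__(self):
        super().__init__("Timer expired")


def timeout(func):
    """Decorator taking no parameters of its own (Unix, main thread only)."""

    def _handle_timeout(signum, frame):
        raise TimerExpired()

    @wraps(func)  # wraps() used as a decorator
    def wrapper(self):
        # getattr() with a default replaces a hasattr() check.
        seconds = getattr(self, "_timeout", DEFAULT_TIMEOUT)
        signal.signal(signal.SIGALRM, _handle_timeout)
        signal.alarm(seconds)
        try:
            # self is passed explicitly and the result is returned right away.
            return func(self)
        finally:
            signal.alarm(0)  # always cancel the pending alarm

    return wrapper


class Task:
    _timeout = 1  # per-object override picked up by getattr()

    @timeout
    def quick(self):
        return "done"

    @timeout
    def slow(self):
        time.sleep(5)
        return "never reached"
```

With these assumptions, `Task().quick()` returns normally, while `Task().slow()` is interrupted after one second by `TimerExpired`.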
Kefu Chai
48bf1c9bf3 mon/OSDMonitor: use initializer_list<> for {si,iec}_options
* use initializer_list<> for {si,iec}_options, no need to use set<>
* remove the comments, the variable names are self-documenting.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-27 09:43:43 +08:00
Kefu Chai
3e2f997061
Merge pull request #31010 from pdvian/wip-fix-target-max
mon/OSDMonitor.cc: Allow pool set target_max_(objects/bytes) with SI/IEC units

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-27 09:38:23 +08:00
Kefu Chai
674bd8a9e6
Merge pull request #30434 from smithfarm/wip-41820
qa: enable dashboard tests to be run with "--suite rados/dashboard"

Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-27 09:18:16 +08:00
Kefu Chai
c121234b79
Merge pull request #31005 from tchaikov/wip/qa/tasks/ceph/cleanup
qa/tasks/ceph.py: remove unused variables

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-10-27 09:16:37 +08:00
Sage Weil
56e99ba5f0 qa/tasks/cbt: run stop-all.sh when finishing up
stop-all.sh will work if the right deps are there (currently we lack 'nc')

also killall -9 java to be sure.

Fixes: https://tracker.ceph.com/issues/42496
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-26 13:25:39 -05:00
Sage Weil
fa414c7667 Merge PR #31086 into master
* refs/pull/31086/head:
	doc/bootstrap.rst: fix github's url

Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-26 11:42:31 -05:00
Sage Weil
7fc78b6342 Merge PR #31169 into master
* refs/pull/31169/head:
	mgr/ssh: implement (synchronous) describe_service
	ceph-daemon: ls: replace 'active' bool with 'state' enum
	ceph-daemon: include container_id and version in 'ls' output
	ceph-daemon: tolerate systemctl is-* exit codes
	mgr/ssh: modernize timeout config option

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-26 10:55:19 -05:00
Alexandre Bruyelles
333895a2ca
doc/bootstrap.rst: fix github's url
Signed-off-by: Alexandre Bruyelles <jack@jack.fr.eu.org>
2019-10-26 17:20:27 +02:00
Kefu Chai
23e39ed9e5
Merge pull request #31126 from wjwithagen/wjw-fix-std-bitset
rgw: Select the std::bitset to resolve ambiguity

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-26 09:30:54 +08:00
Dan Mick
baec02e6b2 src/telemetry: remove, now lives in ceph-telemetry.git
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2019-10-25 16:44:05 -07:00
Sage Weil
8bbb13f0eb mgr/ssh: implement (synchronous) describe_service
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 18:43:44 -05:00
Sage Weil
eca63205cc ceph-daemon: ls: replace 'active' bool with 'state' enum
('running', 'inactive', 'error', 'unknown')

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 18:43:44 -05:00
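
As a rough illustration of eca63205cc, the four state names can be modelled as an enum instead of a boolean. The sketch below is an assumption, not the ceph-daemon code: the `State` class, the `state_from_is_active()` helper, and the mapping from the words printed by `systemctl is-active` are hypothetical.

```python
from enum import Enum


class State(Enum):
    """The four state names from the commit message."""
    RUNNING = "running"
    INACTIVE = "inactive"
    ERROR = "error"
    UNKNOWN = "unknown"


def state_from_is_active(output):
    """Map the word printed by `systemctl is-active` to a State.

    The mapping is an assumption made for illustration; only the
    printed word is inspected, not the command's exit code.
    """
    word = output.strip()
    if word == "active":
        return State.RUNNING
    if word == "inactive":
        return State.INACTIVE
    if word == "failed":
        return State.ERROR
    return State.UNKNOWN
```

Compared with a single `active` flag, the enum keeps a failed unit distinct from one that is merely stopped.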
Sage Weil
2039d3aa4a ceph-daemon: include container_id and version in 'ls' output
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 17:58:27 -05:00
Sage Weil
24f55ba832 ceph-daemon: tolerate systemctl is-* exit codes
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 17:58:19 -05:00
Sage Weil
776c0b09fe mgr/ssh: modernize timeout config option
(and rename it)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 16:37:13 -05:00
Ilya Dryomov
b7a0e2adcb qa: add script to stress udev_enumerate_scan_devices()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-25 22:05:38 +02:00
Ilya Dryomov
bd37a72e0e krbd: retry on an empty list from udev_enumerate_scan_devices()
systemd 219 doesn't have the issue that is worked around in the
previous commit, but has a different one: udev_enumerate_scan_devices()
always succeeds, but sometimes returns an empty list when the device is
actually there.  This happens rarely and at random so I haven't been
able to get to the bottom of it yet, but it looks like another similar
race condition in libudev.

Since an empty list is expected if the device isn't there, retry just
twice with a small sleep in-between.  This appears to be enough: I got
7 occurrences per 600000 "rbd unmap" invocations, all of which needed
a single retry:

  rbd: udev enumerate missed a device, tries = 1

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-25 22:05:32 +02:00
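
The bounded retry described in bd37a72e0e can be sketched in Python as follows. The actual change is in krbd (C++); here `scan` is a hypothetical callable standing in for the libudev enumeration, and the `delay` value and `log` parameter are assumptions. Only the retry limit of two and the log format come from the commit message.

```python
import time


def enumerate_with_retry(scan, max_retries=2, delay=0.01, log=print):
    """Call scan() again, at most max_retries times, while it returns nothing.

    An empty list is a legitimate answer when the device is absent, so the
    number of retries is bounded and a short sleep separates the attempts.
    """
    tries = 0
    while True:
        devices = scan()
        if devices or tries >= max_retries:
            if devices and tries:
                # Same wording as the message quoted in the commit.
                log(f"rbd: udev enumerate missed a device, tries = {tries}")
            return devices
        tries += 1
        time.sleep(delay)
```

If the device really is gone, the caller gets an empty list after three scans in total and pays only two short sleeps.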
Sage Weil
0bc2ed6191 Merge PR #31141 into master
* refs/pull/31141/head:
	ceph-daemon: fix warning message
	ceph-daemon: fix enable with conflicting ceph.target files
	test_ceph_daemon.sh: use same fsid as qa/standalone/test_ceph_daemon.sh
	test_ceph_daemon.sh: use other latest-master image
	ceph-daemon: fix pathify (fix shell --config/--keyring)
	ceph-daemon: /var/run/ceph -> /var/run/ceph/$fsid

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-25 14:30:33 -05:00
Sage Weil
3ee7d4154b ceph-daemon: fix warning message
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 12:46:52 -05:00
Josh Durgin
8c7eeab703
Merge pull request #31164 from tchaikov/wip/doc/rados/fix-typo
doc/rados/deployment/ceph-deploy-mon: fix typo

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-10-25 10:41:42 -07:00
Sage Weil
efd3497d03 ceph-daemon: fix enable with conflicting ceph.target files
+ make description better

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 11:26:32 -05:00
Sage Weil
4408911ec4 os/bluestore: do not mark per_pool_omap updated unless we fixed it
We only fix the per-pool-omap issues if we do a non-shallow fsck.

Fixes: https://tracker.ceph.com/issues/42490
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-25 11:02:28 -05:00
Kefu Chai
659456d36a doc/rados/deployment/ceph-deploy-mon: fix typo
s/comingling/commingling/

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-25 22:38:44 +08:00
Kefu Chai
0f30a49953
Merge pull request #31113 from sungjunyoung/do_cmake-warning
do_cmake: Warn user about slow debug performance only for not set

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-25 22:24:42 +08:00
Sage Weil
b830870bbb Merge PR #31130 into master
* refs/pull/31130/head:
	ceph-daemon: only set up crash dir mount if it exists

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-25 09:18:20 -05:00
Sage Weil
23d22a2781 Merge PR #31138 into master
* refs/pull/31138/head:
	mon: fix tell to hybrid octopus/pre-octopus mons

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-10-25 09:11:36 -05:00
Boris Ranto
a325f28d93 restful: Use node_id for _gather_leaf_ids
The _gather_leaf_ids function doesn't need the node structure, it only
needs the id.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-10-25 15:51:15 +02:00
Boris Ranto
4f17cbc865 restful: Query nodes_by_id for items
The node dict that is passed to the _gather_leaf_ids function from the
_gather_osds function does not have 'items' in it. We also can't use
buckets at this point since those only exist for leaf nodes, not all
nodes.

We need to query the nodes_by_id dict to get 'items' for a node inside
the _gather_leaf_ids function instead.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-10-25 15:51:15 +02:00
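
The lookup described in 4f17cbc865 and a325f28d93 can be sketched as a recursive walk that takes only a node id and reads `items` from `nodes_by_id`. This is a standalone approximation, not the restful module's code: the function is written as a free function, and the layout of `EXAMPLE_NODES` is hypothetical. It relies on the CRUSH convention that buckets have negative ids and OSDs have non-negative ids.

```python
def gather_leaf_ids(node_id, nodes_by_id):
    """Return the set of leaf (OSD) ids underneath node_id."""
    if node_id >= 0:
        # Non-negative ids are leaves (OSDs) by CRUSH convention.
        return {node_id}
    leaves = set()
    # Look the node up by id to get its 'items', as the commit describes.
    for item in nodes_by_id[node_id].get("items", []):
        leaves |= gather_leaf_ids(item["id"], nodes_by_id)
    return leaves


# Hypothetical two-level hierarchy: a root with two hosts and three OSDs.
EXAMPLE_NODES = {
    -1: {"id": -1, "items": [{"id": -2}, {"id": -3}]},
    -2: {"id": -2, "items": [{"id": 0}, {"id": 1}]},
    -3: {"id": -3, "items": [{"id": 2}]},
}
```

Because the function receives only an id, callers no longer need to pass a node structure that may lack `items`.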
Ilya Dryomov
e5921ef4a8 krbd: retry on transient errors from udev_enumerate_scan_devices()
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered operating on directory entries in
/sys that no longer exist.  Some of these errors are suppressed, but
this isn't reliable and varies across versions.  In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum().  In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).

Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices().  Don't limit the number, but log
each retry.

Fixes: https://tracker.ceph.com/issues/41036
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-25 15:28:43 +02:00
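
The policy in e5921ef4a8 (treat ENOENT and ENODEV as transient, retry without a limit, log each retry) can be sketched in Python. The real code is C++ and checks error codes returned by libudev; modelling the failure as an `OSError` raised by a hypothetical `scan` callable is an assumption, as are the function name and the log wording.

```python
import errno

# Error codes the commit message says to treat as transient.
TRANSIENT_ERRNOS = {errno.ENOENT, errno.ENODEV}


def scan_until_stable(scan, log=print):
    """Retry scan() for as long as it fails with ENOENT or ENODEV."""
    retries = 0
    while True:
        try:
            return scan()
        except OSError as e:
            if e.errno not in TRANSIENT_ERRNOS:
                raise  # any other error is not assumed to be transient
            retries += 1
            name = errno.errorcode[e.errno]
            log(f"udev enumerate failed with {name}, retrying (retries = {retries})")
```

The loop is unbounded by design, so the per-retry log line is the only signal that the scan is spinning.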
Sebastian Wagner
9ba33ac11c python-common: Add mypy testing
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2019-10-25 14:58:41 +02:00
Jason Dillaman
190b0bf433
Merge pull request #31006 from zy751713126/fix_features
librbd: fix rbd_features_to_string output

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-25 08:34:25 -04:00
Jason Dillaman
b0c876b48a
Merge pull request #30955 from runsisi/wip-fix-dup-lock
librbd: force reacquire lock if blacklist is disabled

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-25 08:33:51 -04:00
Jason Dillaman
f80b7dd18f
Merge pull request #30954 from mxdInspur/thickimg_create_progress
rbd: creating thick-provision image progress percent info exceeds 100%

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-25 08:33:12 -04:00
Kefu Chai
a90a52c428
Merge pull request #31143 from badone/wip-tracker-38466-enable-librabbitmq-devel
ceph.spec.in: enable amqp_endpoint on RHEL8 by default

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-25 19:15:43 +08:00
Kefu Chai
6046d4b4c0
Merge pull request #31110 from tchaikov/wip-cmake/enable-seastar-with-dpdk
cmake: support `Seastar_DPDK=ON` option

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-10-25 18:00:03 +08:00
Kefu Chai
770b443e74 auth/cephx: always initialize local variables
to silence GCC warnings like:

rc/auth/cephx/CephxProtocol.h:309:5: warning: 'type' may be used uninitialized in this function [-Wmaybe-uninitialized]
     if (i != tickets_map.end())
     ^~

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-10-25 16:44:53 +08:00
Brad Hubbard
c44c140dfa ceph.spec.in: Enable amqp_endpoint on RHEL8 by default
RHEL/CentOS 8 now provide librabbitmq-devel so we can enable it as a
build requirement.

Fixes: https://tracker.ceph.com/issues/38466

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2019-10-25 18:18:23 +10:00
Sage Weil
12c08ec319 test_ceph_daemon.sh: use same fsid as qa/standalone/test_ceph_daemon.sh
That way these scripts both clean up for each other (and are obviously
bogus clusters).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 21:27:36 -05:00
Sage Weil
4005eddca0 test_ceph_daemon.sh: use other latest-master image
This one is newer?

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 21:26:21 -05:00
Sage Weil
75016caba5 ceph-daemon: fix pathify (fix shell --config/--keyring)
The relative path works on one of my machines but not my laptop; the
full CWD works everywhere.  This is presumably better anyway!

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 21:25:59 -05:00
Sage Weil
d23cf0113d ceph-daemon: /var/run/ceph -> /var/run/ceph/$fsid
This is better than having a single /var/run/ceph on the host with a
weird naming scheme.  Among other things, it means that we can access
the asok for any daemon for a given fsid from any container on the same
host with the same fsid (notably, a shell).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 21:25:17 -05:00
Sage Weil
7bbf9037ca mon: fix tell to hybrid octopus/pre-octopus mons
We can't decide whether to use the new tell command style based on the
monmap.min_mon_release alone because some mons may be octopus even though
that hasn't updated yet.  The same goes for if we look at the combined
features for the cluster--the underlying problem is the monmap doesn't
tell us which mons are octopus and which ones aren't, so we don't know
how to behave.

Instead, allow octopus+ mons to advertise the converted tell commands
going forward, for compatibility with pre-octopus clients (who do the old
style of tell) and for octopus+ clients talking to a min_mon_release <
octopus cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 20:09:05 -05:00
Sage Weil
f56a8db34d ceph-daemon: only set up crash dir mount if it exists
Sometimes we run containers on a host that doesn't have a crash dir set
up (because no daemon has been deployed).  Examples include shell and
ceph-volume.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 20:06:23 -05:00
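
The guard described in f56a8db34d amounts to checking for the directory before adding it to the container's mounts. The sketch below is hypothetical: the `crash_mounts()` helper, the host-to-container mapping format, and the default `container_path` are assumptions, not ceph-daemon's actual names or paths.

```python
import os


def crash_mounts(crash_dir, container_path="/var/lib/ceph/crash"):
    """Return a {host_path: container_path} mapping for the crash dir.

    The mapping is empty when the host has no crash dir, for example
    because no daemon has been deployed there yet.
    """
    mounts = {}
    if os.path.exists(crash_dir):
        mounts[crash_dir] = container_path
    return mounts
```

A shell or ceph-volume container started on a fresh host then simply runs without the crash mount.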
David Zafman
4ea43f7342
Merge pull request #31133 from dzafman/wip-42476
ceph-objectstore-tool: call collection_bits() crashes on the meta col…

Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-24 17:23:48 -07:00