Commit Graph

97170 Commits

Author SHA1 Message Date
Adam C. Emerson
4771a4cb1e Add function2 header
From https://github.com/Naios/function2/

This provides unique_function, a move-only, type-erased function
wrapper.

I copied the file in rather than using a submodule because it's only
one file, and a submodule seems rather like killing a mosquito with an
anti-tank gun.

Chosen over the slightly more capable cxx_function because that
version fails its basic contract.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2019-04-08 12:45:41 -04:00
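To make "move-only, type-erased function wrapper" from 4771a4cb1e concrete, here is a minimal sketch of the idea. It is not function2's implementation: `MoveOnlyTask` is a hypothetical name, it only supports `void()` callables, and it always heap-allocates.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical minimal move-only, type-erased wrapper for void() callables.
// Illustrates the concept only; function2's unique_function is far more general.
class MoveOnlyTask {
  // Type erasure: callers see only this interface, never the concrete callable.
  struct Base {
    virtual ~Base() = default;
    virtual void call() = 0;
  };
  template <typename F>
  struct Model final : Base {
    F f;
    explicit Model(F&& fn) : f(std::move(fn)) {}
    void call() override { f(); }
  };
  std::unique_ptr<Base> impl_;

public:
  MoveOnlyTask() = default;
  template <typename F>
  MoveOnlyTask(F f) : impl_(std::make_unique<Model<F>>(std::move(f))) {}

  // Movable but not copyable, so it can hold callables that own resources.
  MoveOnlyTask(MoveOnlyTask&&) noexcept = default;
  MoveOnlyTask& operator=(MoveOnlyTask&&) noexcept = default;
  MoveOnlyTask(const MoveOnlyTask&) = delete;
  MoveOnlyTask& operator=(const MoveOnlyTask&) = delete;

  explicit operator bool() const { return static_cast<bool>(impl_); }

  void operator()() {
    assert(impl_ && "call on empty or moved-from MoveOnlyTask");
    impl_->call();
  }
};
```

A lambda that captures a `std::unique_ptr` by move cannot be stored in `std::function`, which requires a copyable callable, but it can be stored in a wrapper like this one.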
Adam C. Emerson
b2fea9ada4 librados: Make ObjectOperation move constructible
Since it just holds a pointer to a heap-allocated ObjectOperationImpl,
there's no reason not to. This allows us to move them around so the
call to aio_operate can happen after the stack frame where the
ObjectOperation was created has exited.

Would have just switched to a unique_ptr but the test harness relies
on being able to slip its own stuff in there and it was less work to
write my own move constructors than edit all the casts there to use
the get method.

Since there is no move construction now, no code will be broken if we
assert on use of moved-from objects and feedback suggests that people
think this is likelier to catch erroneous use than lazy
reinitialization.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2019-04-08 12:43:48 -04:00
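The pattern described in b2fea9ada4, a handle that owns a raw pointer to heap-allocated state, becomes movable, and asserts on use after a move, might look roughly like the sketch below. `Op` and `OpImpl` are hypothetical stand-ins rather than the librados types, and the deleted copy operations and the move assignment operator are assumptions of this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the heap-allocated implementation object.
struct OpImpl {
  int num_ops = 0;
};

class Op {
  OpImpl* impl;  // raw owning pointer, kept raw as the commit message describes

public:
  Op() : impl(new OpImpl) {}
  ~Op() { delete impl; }

  // Assumption: copying stays disabled.
  Op(const Op&) = delete;
  Op& operator=(const Op&) = delete;

  // Moving steals the pointer and leaves the source empty.
  Op(Op&& rhs) noexcept : impl(rhs.impl) { rhs.impl = nullptr; }
  Op& operator=(Op&& rhs) noexcept {
    if (this != &rhs) {
      delete impl;
      impl = rhs.impl;
      rhs.impl = nullptr;
    }
    return *this;
  }

  bool valid() const { return impl != nullptr; }

  // Every operation asserts instead of lazily reinitializing a moved-from object.
  void add_op() {
    assert(impl && "use of moved-from Op");
    ++impl->num_ops;
  }
  int size() const {
    assert(impl && "use of moved-from Op");
    return impl->num_ops;
  }
};
```

Asserting makes accidental reuse of a moved-from object fail loudly in debug builds, which is the trade-off the commit message argues for over lazy reinitialization.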
Patrick Donnelly
7b03130e2a
Merge PR #27281 into master
* refs/pull/27281/head:
	script/ceph-release-notes: alternate merge commit format

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2019-04-08 09:35:43 -07:00
Kefu Chai
8c553e3522
Merge pull request #26782 from iotcg/master
vstart.sh: fix CEPH_PORT check and cleanups

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-09 00:32:46 +08:00
Patrick Donnelly
b90d566d34
Merge PR #27437 into master
* refs/pull/27437/head:
	vstart: add an alias for cephfs-shell to vstart_environment.sh

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-08 09:26:01 -07:00
Sage Weil
1aabb186b7 Merge PR #27419 into master
* refs/pull/27419/head:
	common: add --log-early command line option

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-08 10:17:50 -05:00
Yuval Lifshitz
9641f27eff
Merge pull request #27091 from yuvalif/s2_pubsub_api_new
S3 compatible pubsub API
2019-04-08 17:31:40 +03:00
Sage Weil
441a3160b6 Merge PR #27374 into master
* refs/pull/27374/head:
	mgr/volume: set cephfs metadata bias at 4x
	mgr/volume: default to 16 PGs (min) for metadata pool

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-08 09:25:14 -05:00
Sage Weil
5e6f43ed03 Merge PR #27349 into master
* refs/pull/27349/head:
	qa/distros/supported/ubuntu_latest: 16.04 -> 18.04
	qa/distros/supported/centos_latest: 7.5 -> 7.6
	qa/distros: add centos 7.6

Reviewed-by: David Galloway <dgallowa@redhat.com>
2019-04-08 09:24:26 -05:00
Sage Weil
8420e3f32b Merge PR #27386 into master
* refs/pull/27386/head:
	os/filestore/FileJournal: note EIO events
	os/filestore: make note of EIO errors when we see them
	os/filestore: note devname for later use
	global/signal_handler: avoid core dump on EIO
	os/bluestore/KernelDevice: note EIO metadata on aio EIO
	global: add hook to annotate crash report with EIO information

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-08 09:24:09 -05:00
Sage Weil
1695764436 Merge PR #27337 into master
* refs/pull/27337/head:
	msg/async: add timeout for connections which are not yet ready
	msg: rename ms_tcp_read_timeout to ms_connection_idle_timeout

Reviewed-by: Ricardo Dias <rdias@suse.com>
2019-04-08 09:13:37 -05:00
Jeff Layton
91d3d96691 vstart: add an alias for cephfs-shell to vstart_environment.sh
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2019-04-08 10:00:03 -04:00
Kefu Chai
fbc71dcf27
Merge pull request #27341 from liewegas/wip-learn-addr-from-peer
msg/async/ProtocolV[12]: add ms_learn_addr_from_peer

Reviewed-by: Ricardo Dias <rdias@suse.com>
2019-04-08 19:53:09 +08:00
Kefu Chai
2cecb87855
Merge pull request #27340 from tchaikov/wip-cmt-more-chatty
ceph-monstore-tool: print out caps when rebuilding monstore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-08 19:50:01 +08:00
Kefu Chai
fcca032bed
Merge pull request #27352 from liewegas/wip-deferred-log-start
common: start logging for non-global_init users

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-08 19:49:11 +08:00
Kefu Chai
0565e35d19
Merge pull request #27324 from batrick/async-msgr-clear
msg/async: use faster clear method to delete containers

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-08 19:48:08 +08:00
Kefu Chai
3805935ae0
Merge pull request #26806 from xiexingguo/wip-repair-eio-rep
osd: automatically repair replicated replica on pulling error

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-04-08 19:46:36 +08:00
Kefu Chai
1cb3847a89
Merge pull request #27417 from tchaikov/wip-rpm-python36-el7
rpm: use python 3.6 as the default python3

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2019-04-08 15:32:05 +08:00
Changcheng Liu
a69d611269 vstart.sh: add space between option and arg in cut command
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
2019-04-08 12:52:10 +08:00
Changcheng Liu
aa0444cbda vstart.sh: improve readability of ceph-conf for default op:lookup
The default operation of ceph-conf is --lookup.
Pass it explicitly to improve readability.

Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
2019-04-08 12:52:10 +08:00
Changcheng Liu
ab60180551 vstart.sh: set init-ceph under CEPH_BIN directly
$CEPH_BUILD_DIR/bin/init-ceph is equal to
$CEPH_BIN/init-ceph

Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
2019-04-08 12:52:10 +08:00
Changcheng Liu
bf213a3c64 vstart.sh: align add/minus shell var usage
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
2019-04-08 12:52:10 +08:00
Changcheng Liu
a7caf1482b vstart.sh: fix typo when getting ipv4 address
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
2019-04-08 12:52:10 +08:00
Changcheng Liu
d19db0af84 vstart.sh: fix ceph random port check
The regular expression should check the port first, then
check "LISTEN" item. "LISTEN" is not after the port item.

Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-08 12:51:49 +08:00
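The ordering rule from d19db0af84 (match the port first, then "LISTEN" later on the same line) can be illustrated with a small regular expression. This is only a sketch: vstart.sh is a shell script, and the function name and the netstat-style input layout here are assumptions.

```cpp
#include <regex>
#include <string>

// Returns true if some line of `socket_table` shows `port` in LISTEN state.
// Assumes netstat-style lines in which the address:port column comes before
// the state column.
bool port_is_listening(const std::string& socket_table, int port) {
  // Port first, then LISTEN later on the same line. In the default
  // ECMAScript grammar '.' does not match a newline, so the match cannot
  // spill over into the next line.
  const std::regex pattern(":" + std::to_string(port) + "[ \\t]+.*LISTEN");
  return std::regex_search(socket_table, pattern);
}
```

Requiring whitespace right after the port keeps a search for 678 from matching 6789.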
Kefu Chai
658ca2b0d7 rpm: use python 3.6 as the default python3
some of our centos7 jenkins builders are failing to build the ceph master
and nautilus branches, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/

one of our BuildRequires, cmake3, is offered by EPEL7, and it was rebuilt
against python3.6 as part of that switch-over. as a result,
cmake3-data-3.13.4-2.el7 started to depend on /usr/bin/python3.6, which is
in turn offered by the python36 package, so python36 gets installed as a
dependency of the updated cmake3. but in cmake, we originally check for the
latest python3 interpreter if WITH_PYTHON3 is enabled. that's why the
builders which happened to install these updated packages started to fail
when detecting the python3.6 related build dependencies.

as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.

but as the following analysis shows, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`, because `yum-builddep` can change how the
`python3_pkgversion` and `python3_version` macros are expanded:

- none of the packages installed by `yum-builddep` installs the python3
  related rpm macros, so the system stays with whatever python3 it was
  using. in this case, `rpmbuild` won't complain, as the
  `python3_pkgversion` and `python3_version` are consistent before and
  after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`. but
  `yum-builddep` installed python3.6 and also the updated
  `python-rpm-macros` packages, which points `python3_version` and
  `python3_pkgversion` to 3.6 and 36 respectively. in this case,
  `rpmbuild` will complain, because when we run `yum-builddep`,
  `python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`. so
  it was using python34 for preparing the "BuildRequires". but some
  of the packages installed by `yum-builddep` installs python36, and
  also the updated `python-rpm-macros` packages, which points
  `python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
  in this case, `rpmbuild` will complain, because the python36 related
  dependencies are missing. what the system has is python34
  dependencies.
- system does not have python3 installed before `yum-builddep`. so
  it was using python34 for preparing the "BuildRequires". but some
  of the packages installed by `yum-builddep` installs python34, and
  also the updated `python-rpm-macros` packages, which points
  `python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
  in this case, `rpmbuild` won't complain, as the
  `python3_pkgversion` and `python3_version` are also consistent before and
  after `yum-builddep`.

we cannot tell whether the system has python3, or which python3 version it
has, before `yum-builddep` runs. what we can do is ensure that `rpmbuild`
has what it needs to build Ceph. so let's just stick with python3.6.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-08 10:12:32 +08:00
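The four cases analysed in 658ca2b0d7 reduce to one comparison: `rpmbuild` complains when the package-version suffix used while `yum-builddep` resolved the BuildRequires differs from the suffix `rpmbuild` sees afterwards. The sketch below encodes that reading. The function name, the empty-string convention and the "34" fallback are assumptions drawn from the commit text, not part of any rpm tooling.

```cpp
#include <string>

// Suffix assumed when no python3 rpm macros are installed yet.
// "34" follows the commit text; treat it as an assumption of this sketch.
const std::string kDefaultPkgVersion = "34";

// before: python3_pkgversion in effect while yum-builddep resolved the
//         BuildRequires (empty string if the system had no python3 yet).
// after:  python3_pkgversion in effect when rpmbuild runs.
// Returns true when the installed build dependencies do not match what
// rpmbuild will ask for.
bool rpmbuild_complains(const std::string& before, const std::string& after) {
  const std::string resolved_with =
      before.empty() ? kDefaultPkgVersion : before;
  return resolved_with != after;
}
```

Under this model the first and fourth cases are consistent (no complaint), while the second and third are not.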
xie xingguo
7209cc6aa3 msg/async: add timeout for connections which are not yet ready
There are various corner cases that can leave an async connection
stuck in the connecting stage (e.g., by manually creating some
loop back connections on the switches of our test cluster, we can
reproduce http://tracker.ceph.com/issues/37499 almost 100% of the time).

In 61b9432ef9 I tried to use the existing keep_alive mechanism to get
those stuck connections out of the trap, but that does not work if the
corresponding connection is not yet ready, since we always require the
underlying connection to be **ready** in order to send out a keep_alive
message.

Fix by adopting a more general connecting timeout strategy.
If a connecting process cannot finish within a specific interval,
we simply cut it off and retry.

Fixes: http://tracker.ceph.com/issues/37499
Fixes: http://tracker.ceph.com/issues/38493
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-08 09:19:59 +08:00
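The connecting timeout described in 7209cc6aa3 can be sketched as a deadline check that applies only while a connection is not yet ready. All names below are hypothetical and the real messenger state machine is considerably more involved.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical connection states; the real messenger has many more.
enum class State { Connecting, Ready };

struct Connection {
  State state = State::Connecting;
  Clock::time_point connect_started;
  int retries = 0;

  explicit Connection(Clock::time_point now) : connect_started(now) {}

  void mark_ready() { state = State::Ready; }

  // Called periodically. If the connection is still not ready once `timeout`
  // has elapsed, abandon the attempt and start over.
  // Returns true when a retry was triggered.
  bool tick(Clock::time_point now, Clock::duration timeout) {
    if (state == State::Ready) return false;            // nothing to time out
    if (now - connect_started < timeout) return false;  // still within budget
    connect_started = now;  // cut the attempt off and retry
    ++retries;
    return true;
  }
};
```

Unlike a keep_alive message, this check does not need the connection to be ready, which is the gap the commit message describes.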
Sage Weil
614fb4631a os/filestore/FileJournal: note EIO events
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
Sage Weil
8827250d5a os/filestore: make note of EIO errors when we see them
This is imprecise, since we can't (easily) map an EIO back to a specific
part of the device, or even (easily) tell whether it was a read or write
error.  It's enough to mark a crash dump as an EIO event, though, and to
include the name of the (primary) filestore device.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
Sage Weil
cef94b72b2 os/filestore: note devname for later use
This will generally happen early, before we see an EIO error and need it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
Sage Weil
e3464df47b global/signal_handler: avoid core dump on EIO
Generating a core dump is overkill if we hit an EIO error from the
hardware.  Exit with an error code instead.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
Sage Weil
145333576c os/bluestore/KernelDevice: note EIO metadata on aio EIO
Note that we only do this if we're about to induce a crash.  If we can
pass EIO up the stack, it's up to the upper layer to handle it or trigger
its own crash if it can't.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
Sage Weil
7f9df6158d global: add hook to annotate crash report with EIO information
If the global g_eio* fields are populated, include them in the crash
report, similar to how we populate assertion metadata.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 16:01:38 -05:00
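The EIO commits above follow one flow: record the device name early, note when an EIO is seen, and add those details to the crash report only if they were recorded. A rough sketch of the last step follows. `EioInfo` stands in for the global g_eio* fields, and the report keys are invented for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the global g_eio* fields named in the commit.
struct EioInfo {
  bool seen = false;    // set when an EIO error is observed
  std::string devname;  // recorded early, before any error occurs
};

// Copy EIO details into the crash-report metadata, but only when an EIO was
// actually recorded. The key names are made up for this sketch.
void annotate_crash_report(const EioInfo& eio,
                           std::map<std::string, std::string>& report) {
  if (!eio.seen) return;
  report["io_error"] = "true";
  if (!eio.devname.empty()) report["io_error_devname"] = eio.devname;
}
```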
Sage Weil
933d5084cb common: add --log-early command line option
Sometimes it is important and useful to see the logs from the bootstrap
phase where we are getting the initial configs from the monitors.  Add
a command-line option --log-early to do that.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-07 13:58:43 -05:00
Yuri Weinstein
dff2bf78be
Merge pull request #27407 from tchaikov/wip-rpm-python3-el7
rpm: use python3.4 on RHEL7 by default
2019-04-06 08:57:57 -07:00
Neha Ojha
91e4926b09
Merge pull request #27403 from cyberang3l/many_objects_per_pg_docs
doc: update documentation for the MANY_OBJECTS_PER_PG warning

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-05 16:03:01 -07:00
Casey Bodley
97f8a4aa1c
Merge pull request #27409 from cbodley/wip-vstart-rgw-debug
vstart: only add --debug-ms=1 in RGWDEBUG

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-05 16:41:25 -04:00
Casey Bodley
7e6d53233d vstart: only add --debug-ms=1 in RGWDEBUG
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-04-05 14:32:06 -04:00
Kefu Chai
7b15b682b1 cmake: should PYTHON3_VERSION_STRING of libpython3
and make sure the version matches exactly the requested one

in future, we should use FindPython.cmake

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-06 01:59:36 +08:00
Vangelis Tasoulas
24131fc59a
doc: Update documentation for the MANY_OBJECTS_PER_PG warning
The current documentation for the MANY_OBJECTS_PER_PG warning
states that the threshold can be raised to silence the health
warning by adjusting the mon_pg_warn_max_object_skew config
option on the monitors. This appears to be untrue since at least
Luminous: the option must be adjusted on the managers.

I ran into this problem and spent quite some time injecting
mon_pg_warn_max_object_skew into the monitors, adding the option to
ceph.conf, and restarting the monitors several times, but the warning
did not go away. I had to download the code to see what was
happening, and found this:

$ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc
src/common/options.cc:1480:    Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
src/common/options.cc-1481-    .set_default(10.0)
src/common/options.cc-1482-    .set_description("max skew few average in objects per pg")
src/common/options.cc-1483-    .add_service("mgr"),

After I restarted the ceph-mgr service, the warning went away.

Signed-off-by: Vangelis Tasoulas <vangelis@tasoulas.net>
2019-04-05 19:53:35 +02:00
Patrick Donnelly
9473d99b61
Merge PR #27396 into master
* refs/pull/27396/head:
	doc: fixed typo in leadership names

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-05 10:49:25 -07:00
Patrick Donnelly
50206f33d9
Merge PR #27397 into master
* refs/pull/27397/head:
	doc: fixed caps

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-05 10:48:22 -07:00
Kefu Chai
cea9d18ced rpm: use python3.4 on RHEL7 by default
python3.4 is the native python3 before 7.6

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-06 00:28:39 +08:00
Casey Bodley
532458d9c1
Merge pull request #27400 from cbodley/wip-39118
rgw: limit entries in remove_olh_pending_entries()

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2019-04-05 12:02:02 -04:00
Sage Weil
8a5590b6aa Merge PR #26874 into master
* refs/pull/26874/head:
	OSD: OSDMapRef access by multiple threads is unsafe

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-05 08:57:51 -05:00
Sage Weil
cbcfe6e45f Merge PR #27132 into master
* refs/pull/27132/head:
	os/bluestore: new bluestore_debug_enforce_settings option.

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-05 08:57:31 -05:00
Sage Weil
c957b5a195 Merge PR #27317 into master
* refs/pull/27317/head:
	kv: make delete range optional on number of keys

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-05 08:57:14 -05:00
Sage Weil
819d484b37 Merge PR #27327 into master
* refs/pull/27327/head:
	mon/MonmapMonitor: clean up empty created stamp in monmap
	common/buffer: fix warnings

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-04-05 08:39:48 -05:00
Casey Bodley
3805ea635a rgw: limit entries in remove_olh_pending_entries()
If there are too many entries to send in a single osd op, the osd rejects
the request with EINVAL. This error happens in follow_olh(), which means
that requests against the object logical head (requests with no version
id) can't be resolved to the current object version. In multisite, this
also causes data sync to get stuck in retries.

Fixes: http://tracker.ceph.com/issues/39118

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-04-05 09:26:07 -04:00
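One general way to stay under a per-request entry limit like the one described in 3805ea635a is to split the work into bounded batches. The sketch below shows that technique only. It is not necessarily how the commit implements the limit, and the function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split `entries` into batches of at most `max_per_op` items so that each
// batch can be sent as its own request.
std::vector<std::vector<std::string>> split_into_batches(
    const std::vector<std::string>& entries, std::size_t max_per_op) {
  std::vector<std::vector<std::string>> batches;
  if (max_per_op == 0) return batches;  // no sensible batch size
  for (std::size_t i = 0; i < entries.size(); i += max_per_op) {
    const std::size_t end = std::min(entries.size(), i + max_per_op);
    batches.emplace_back(entries.begin() + i, entries.begin() + end);
  }
  return batches;
}
```

With five entries and a limit of two, this yields batches of two, two and one entries.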
Servesha Dudhgaonkar
324426c966 doc: fixed caps
Signed-off-by: Servesha Dudhgaonkar <sdudhgao@redhat.com>
2019-04-05 17:29:05 +05:30
Servesha Dudhgaonkar
b373123c29 doc: fixed typo in leadership names
Signed-off-by: Servesha Dudhgaonkar <sdudhgao@redhat.com>
2019-04-05 17:16:50 +05:30