From https://github.com/Naios/function2/
This provides unique_function, a move-only, type-erased function
wrapper.
I copied the file in rather than using a submodule because it's only
one file, and a submodule seems rather like killing a mosquito with an
anti-tank gun.
Chosen over the slightly more capable cxx_function because that
version fails its basic contract.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Since it just holds a pointer to a heap-allocated ObjectOperationImpl,
there's no reason not to. This allows us to move them around so the
call to aio_operate can happen after the stack frame where the
ObjectOperation was created has exited.
Would have just switched to a unique_ptr but the test harness relies
on being able to slip its own stuff in there and it was less work to
write my own move constructors than edit all the casts there to use
the get method.
Since there is no move construction now, no code will be broken if we
assert on use of moved-from objects and feedback suggests that people
think this is likelier to catch erroneous use than lazy
reinitialization.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
* refs/pull/27374/head:
mgr/volume: set cephfs metadata bias at 4x
mgr/volume: default to 16 PGs (min) for metadata pool
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/27386/head:
os/filestore/FileJournal: note EIO events
os/filestore: make note of EIO errors when we see them
os/filestore: note devname for later use
global/signal_handler: avoid core dump on EIO
os/bluestore/KernelDevice: note EIO metadata on aio EIO
global: add hook to annotate crash report with EIO information
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/27337/head:
msg/async: add timeout for connections which are not yet ready
msg: rename ms_tcp_read_timeout to ms_connection_idle_timeout
Reviewed-by: Ricardo Dias <rdias@suse.com>
The default operation os ceph-conf is --lookup option.
Add it obviously to enhance its readiness.
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
The regular expression should check the port first, then
check "LISTEN" item. "LISTEN" is not after the port item.
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
some of our centos7 jenkins builders are failing to build ceph master and
nautilus branches. because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
and one of our BuildRequires, cmake3,
was offered by EPEL7. it also followed the python3.6 switch-over to
rebuild against python3.6. as a result, the cmake3-data-3.13.4-2.el7
started to depend on /usr/bin/python3.6, which is in turn offered by
python36 package. after installing python36 as a dependency of the
updated cmake3. but in cmake, we originally checks for the latest
python3 interpreter if WITH_PYTHON3 is enabled, that's why these
builders which happen to install these updated packages started to fail
when detecting the existence of python3.6 related build dependencies.
as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.
but as following analysis puts, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`. as `yum-builddep` changes the how
`python3_pkgversion` and `python3_version` macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python_version` are consistent before and
after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`. but
`yum-builddep` installed python3.6 and also the updated
`python-rpm-macros` packages, which points `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we run `yum-builddep`,
`python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` installs python36, and
also the updated `python-rpm-macros` packages, which points
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing. what the system has is python34
dependencies.
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` installs python34, and
also the updated `python-rpm-macros` packages, which points
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python_version` are also consistent before and
after `yum-builddep`.
as we cannot tell if the system has python3 or what the python3 version
the system has before `yum-builddep`, so what we can do is to ensure
`rpmbuild` has what it needs to build Ceph. so let's just stick with
python3.6.
Signed-off-by: Kefu Chai <kchai@redhat.com>
There could be various corner cases that may cause an async
connection stuck in the connecting stage (e.g., by manually
creating some loop back connections on the switches of our test cluster,
we can almost 100% reproduce http://tracker.ceph.com/issues/37499).
In 61b9432ef9 I try to employ the
existing keep_alive mechanism to get those stuck connections out of the
trap but it does not work if the corresponding connection
is not yet ready, since we always require the underlying connection to be
**ready** in order to send out a keep_alive message.
Fix by making a more general connecting timeout strategy.
If a connecting process can not be finished within a specific interval,
then we simply cut it off and retry.
Fixes: http://tracker.ceph.com/issues/37499
Fixes: http://tracker.ceph.com/issues/38493
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
This is imprecise, since we can't (easily) map an EIO back to a specific
part of the device, or even (easily) tell whether it was a read or write
error. It's enough to mark a crash dump as an EIO event, though, and to
include the name of the (primary) filestore device.
Signed-off-by: Sage Weil <sage@redhat.com>
Generating a core dump is overkill if we hit an EIO error from the
hardware. Exit with an error code instead.
Signed-off-by: Sage Weil <sage@redhat.com>
Note that we only do this if we're about to induce a crash. If we can
pass EIO up the stack, it's up to the upper layer to handle it or trigger
its own crash if it can't.
Signed-off-by: Sage Weil <sage@redhat.com>
If the global g_eio* fields are populated, include them in the crash
report, similar to how we populate assertion metadata.
Signed-off-by: Sage Weil <sage@redhat.com>
Sometime it is important and useful to see the logs from the bootstrap
phase where we are getting the initial configs from the monitors. Add
a command-line option --log-early to do that.
Signed-off-by: Sage Weil <sage@redhat.com>
doc: update documentation for the MANY_OBJECTS_PER_PG warning
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
The current documentation for the MANY_OBJECTS_PER_PG warning
states that The threshold can be raised to silence the health
warning by adjusting the mon_pg_warn_max_object_skew config
option on the monitors. It seems that this is not true (at least)
since the luminous times, and this option should be adjusted on
the managers.
I encountered this problem and I spend quite sometime injecting
the mon_pg_warn_max_object_skew to the monitors, added the option
ceph.conf and restarted the monitors several times but the warning
was not going away. I had to download the code to see what's
happening and I found out this:
$ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc
src/common/options.cc:1480: Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
src/common/options.cc-1481- .set_default(10.0)
src/common/options.cc-1482- .set_description("max skew few average in objects per pg")
src/common/options.cc-1483- .add_service("mgr"),
After I restarted the ceph-mgr service, the warning went away.
Signed-off-by: Vangelis Tasoulas <vangelis@tasoulas.net>
If there are too many entries to send in a single osd op, the osd rejects
the request with EINVAL. This error happens in follow_olh(), which means
that requests against the object logical head (requests with no version
id) can't be resolved to the current object version. In multisite, this
also causes data sync to get stuck in retries
Fixes: http://tracker.ceph.com/issues/39118
Signed-off-by: Casey Bodley <cbodley@redhat.com>