The rbd binary will load rbd.ko itself, with the appropriate options.
Loading it by hand with default options is undesirable.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Thanks to libkrbd, 'rbd map' now outputs the device node it mapped to
to stdout:
$ sudo rbd map foo
/dev/rbd0
This will allow us to get rid of a lot of ad-hoc poll/sleep code in our
qa scripts.
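
As an illustration of how a qa script might consume this output instead
of polling, here is a minimal Python sketch. The helper name map_image
and its injectable run parameter are hypothetical, and the sketch assumes
that the device node is the only thing 'rbd map' prints to stdout.

```python
import subprocess


def map_image(image, run=subprocess.run):
    """Map an rbd image and return the device node printed by 'rbd map'.

    Hypothetical helper: 'run' is injectable so the function can be
    exercised without a real cluster.
    """
    result = run(
        ["sudo", "rbd", "map", image],
        check=True,
        capture_output=True,
        text=True,
    )
    # 'rbd map' prints the device node (e.g. /dev/rbd0) to stdout.
    device = result.stdout.strip()
    if not device.startswith("/dev/"):
        raise RuntimeError("unexpected 'rbd map' output: %r" % device)
    return device
```

A script can then use the returned path directly rather than sleeping
and checking for the device to appear.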
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Add libkrbd libtool convenience library to provide an interface for
mapping and unmapping rbd images programmatically. This will be used
by the rbd binary itself and the librbd_fsx testing tool.
libkrbd takes care of the kernel module stuff (common/module.h) and
makes use of libudev to be able to properly wait for block device
creation and deletion and tell which block device got assigned by the
kernel to the newly created mapping.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Add two kernel module helpers: module_{module,has_parameter}(). They
are going to live in common/module.[ch].
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Turn common/secret.c into a libtool convenience library, libsecret.la.
Currently it is built directly, twice: for mount.ceph and rbd binaries.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
A 'status' or 'health' request will return a HEALTH_WARN whenever the
monitor handling the request has the option set to zero.
Fixes: #7784
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
We were waiting for the election to finish, but we need to *also* wait for
paxos to recover. Being a peon or leader is not sufficient and we may
return a map that is still old.
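
The combined condition can be sketched as follows. This is a simplified
Python model, not the monitor code: the MonState class and its field
names are hypothetical stand-ins for the real monitor state.

```python
from dataclasses import dataclass


@dataclass
class MonState:
    # Hypothetical stand-in for monitor state; names are illustrative.
    in_quorum: bool          # election finished: we are leader or peon
    paxos_recovering: bool   # paxos has not finished recovering yet


def can_serve_map(state):
    """Return True only when a returned map would not be stale."""
    # Quorum membership alone is not enough: paxos may still be
    # recovering after the election, so the map could be old.
    return state.in_quorum and not state.paxos_recovering
```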
Fixes: #7997
Signed-off-by: Sage Weil <sage@inktank.com>
We check whether the head is degraded, and we check whether a clone is
unreadable, but in the case where we have a cache op on a degraded object,
we don't check. That leads to an assert when the repop hits the replica
and the object is in the peer's missing set.
Fix this by adding a check on the clone when write_ordered is true. Note
that checking write_ordered is better than whether it is a cache op because
we want to preserve write ordering even for reads that are flagged by the
client.
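
A minimal Python sketch of the intended decision, under stated
assumptions: the Op class, its fields, and both function names are
hypothetical, and write_ordered is assumed to be true for writes, cache
ops, and reads that the client flagged as ordered.

```python
from dataclasses import dataclass


@dataclass
class Op:
    # Hypothetical op descriptor; field names are illustrative only.
    is_write: bool = False
    is_cache_op: bool = False
    ordered_read: bool = False  # read flagged by the client as ordered


def is_write_ordered(op):
    """Assumed definition: anything that must respect write ordering."""
    return op.is_write or op.is_cache_op or op.ordered_read


def must_wait_for_clone(op, clone_degraded):
    """Delay the op while the clone is degraded and ordering matters."""
    # Keying on write ordering rather than on "is a cache op" also
    # covers reads that the client asked to be ordered with writes.
    return clone_degraded and is_write_ordered(op)
```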
Fixes: #8048
Signed-off-by: Sage Weil <sage@inktank.com>
If we recalculate the mapping and find that there is no primary, we need
to set the 'osd' field to -1. Otherwise, the caller will try to resend
to a dead session with bad results.
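
The rule can be illustrated with a small Python sketch. The Target class
and recalc_target function are hypothetical; only the 'osd' field and
the -1 sentinel come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

NO_OSD = -1  # sentinel meaning "no primary to send to"


@dataclass
class Target:
    # Hypothetical stand-in for the op target; only 'osd' matters here.
    osd: int = NO_OSD


def recalc_target(target, primary: Optional[int]):
    """Update target.osd from a freshly calculated mapping.

    'primary' is None when the new mapping has no primary.
    """
    # Always overwrite the field: leaving the previous value in place
    # would make the caller resend to a session that is already dead.
    target.osd = NO_OSD if primary is None else primary
    return target
```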
This was introduced in the refactor 860d72770c.
Fixes: #8130
Signed-off-by: Sage Weil <sage@inktank.com>
If we have just started and receive a command, we currently will reply with
EINVAL because the leader commands are empty. Note that this race is very
difficult to reach because the (old) peon needs to forward a command to
the mon while it still thinks it has quorum, and the message needs to get
sent after the leader mon has restarted and reset its connection but before
it has declared a new election.
To fix this, we should assume at startup time that our commands are
valid. If it is an internal command that does not require quorum, that
is fine. If it does require quorum, we will retry the command after the
election completes and we will revalidate the command then.
Fixes: #8132
Signed-off-by: Sage Weil <sage@inktank.com>
In 69321bf, the handling of EAGAIN changed to block indefinitely
rather than returning to the user. Change the return for
`osd pool set` operations that are blocked by creating PGs
to return EBUSY instead of EAGAIN, so that they are exempt
from this blocking behaviour.
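
To show why the choice of error code matters, here is a minimal Python
sketch of a caller that retries on EAGAIN. The function send_with_retry
and its send callback are hypothetical, and the retry count is bounded
here only to keep the sketch testable, whereas the behaviour described
above blocks indefinitely.

```python
import errno


def send_with_retry(send, max_attempts=5):
    """Call send() until it returns something other than -EAGAIN.

    'send' is a hypothetical callback returning 0 or a negative errno.
    """
    for _ in range(max_attempts):
        r = send()
        if r != -errno.EAGAIN:
            # Success, or an error such as -EBUSY that is reported to
            # the caller immediately instead of being retried.
            return r
    return -errno.EAGAIN
```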
Signed-off-by: John Spray <john.spray@inktank.com>
Adjust the priority of committing dirfrags according to the number of
expiring log segments. The more expiring log segments there are, the
higher the priority, because a large number means the MDS is not
trimming log segments quickly enough.
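
One possible shape for such a mapping, as a purely illustrative Python
sketch: the function name, the linear formula, and all parameter values
are assumptions and are not taken from the MDS code. Larger values are
assumed to mean more urgent.

```python
def commit_priority(num_expiring, base=0, step=1, max_priority=10):
    """Map a count of expiring log segments to a commit priority.

    Hypothetical formula: grows linearly with the backlog and is
    capped at max_priority.
    """
    if num_expiring < 0:
        raise ValueError("num_expiring must be non-negative")
    # More expiring segments means trimming is falling behind, so
    # the dirfrag commits are given a higher priority.
    return min(base + step * num_expiring, max_priority)
```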
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
When the _revokes list is emptied, it doesn't mean that the client has
released the revoking caps. It's possible that the client was flushing
dirty metadata.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>