Ext4 as a default is a bad choice, as we don't perform enough QA with
it. To use XFS as the default for ceph-disk-prepare, we need to depend
on xfsprogs.
btrfs-tools is already recommended, so no change there. If you set
osd_fs_type=btrfs, and don't have the package installed, you'll just
get an error message.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Earlier testing never saw this, but now a mount of a disk triggers a
udev blockdev-added event, causing ceph-disk-activate to run even
before ceph-disk-prepare has had a chance to write the files and
unmount the disk.
Avoid this by using a temporary partition type uuid ("ceph 2 be"), and
only setting it to the permanent one ("ceph osd") once preparation is
complete. Until then, the hotplug event won't match the type uuid, and
thus won't trigger ceph-disk-activate.
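The ordering can be illustrated with a small simulation. This is only a
sketch: the tag strings stand in for the real GPT partition type UUIDs,
and on_hotplug() and prepare() are hypothetical stand-ins for the udev
rule and ceph-disk-prepare.

```python
# Illustrative simulation only; the tags below are placeholders,
# not the real partition type UUIDs.
TYPE_TOBE = "ceph 2 be"   # temporary: disk is still being prepared
TYPE_OSD = "ceph osd"     # permanent: disk is ready for activation

activated = []  # (partition name, files_written) at activation time


def on_hotplug(partition):
    """Stand-in for the udev rule: act only on fully prepared partitions."""
    if partition["type"] == TYPE_OSD:
        activated.append((partition["name"], partition["files_written"]))


def prepare(partition):
    """Stand-in for ceph-disk-prepare."""
    partition["type"] = TYPE_TOBE
    on_hotplug(partition)           # event fired by the mount; type does not match
    partition["files_written"] = True
    partition["type"] = TYPE_OSD    # flip the type only once everything is in place
    on_hotplug(partition)           # now activation may proceed


part = {"name": "sdb1", "type": None, "files_written": False}
prepare(part)
```

Activation is recorded exactly once, and only after the files exist.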
Signed-off-by: Tommi Virtanen <tv@inktank.com>
This cleans up the error handling to not leave disks mounted
in /var/lib/ceph/tmp/mnt.* when something fails, e.g. when
the ceph command line tool can't talk to mons.
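A minimal sketch of the cleanup pattern, assuming the mount and unmount
steps are passed in as callables (the fake_* helpers and the mount path
are hypothetical and touch no real disks):

```python
import contextlib


@contextlib.contextmanager
def mounted(dev, mount, unmount):
    """Mount dev, yield the mount path, and always unmount afterwards."""
    path = mount(dev)
    try:
        yield path
    finally:
        # Runs on success and on failure, so nothing is left mounted.
        unmount(path)


mounts = set()


def fake_mount(dev):
    path = "/var/lib/ceph/tmp/mnt.example"  # hypothetical path
    mounts.add(path)
    return path


def fake_unmount(path):
    mounts.discard(path)


def activate(dev):
    with mounted(dev, fake_mount, fake_unmount):
        # Simulate the ceph command line tool failing to reach the mons.
        raise RuntimeError("cannot talk to mons")


try:
    activate("/dev/sdb1")
    failure = None
except RuntimeError as e:
    failure = str(e)
```

The error still propagates to the caller, but the temporary mount is gone.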
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tested with meaningless but easy-to-verify values:
[global]
osd_fs_type = xfs
osd_fs_mkfs_arguments_xfs = -i size=512
osd_fs_mount_options_xfs = noikeep
ceph-disk-activate does not respect the mount options yet.
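How such settings could map onto command lines, as a rough sketch. It
assumes the file can be read with Python's configparser and that the
per-filesystem option names are formed by appending the fs type;
fs_settings() and build_commands() are hypothetical helpers, and the
commands are only built, never run.

```python
import configparser
import shlex

CONF = """
[global]
osd_fs_type = xfs
osd_fs_mkfs_arguments_xfs = -i size=512
osd_fs_mount_options_xfs = noikeep
"""


def fs_settings(conf_text):
    """Return (fstype, mkfs argument list, mount option string)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["global"]
    fstype = section["osd_fs_type"]
    mkfs_args = shlex.split(
        section.get("osd_fs_mkfs_arguments_" + fstype, fallback=""))
    mount_opts = section.get("osd_fs_mount_options_" + fstype, fallback="")
    return fstype, mkfs_args, mount_opts


def build_commands(conf_text, dev, mountpoint):
    """Build (but do not run) the mkfs and mount argument vectors."""
    fstype, mkfs_args, mount_opts = fs_settings(conf_text)
    mkfs = ["mkfs", "-t", fstype] + mkfs_args + ["--", dev]
    mount = ["mount", "-t", fstype]
    if mount_opts:
        mount += ["-o", mount_opts]
    mount += ["--", dev, mountpoint]
    return mkfs, mount


mkfs_cmd, mount_cmd = build_commands(CONF, "/dev/sdb1", "/mnt/example")
```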
Closes: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Either use ceph.conf variable osd_fs_type or command line option
--fs-type=
Default is still ext4, as currently nothing guarantees xfsprogs
or btrfs-tools are installed.
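One way the lookup could be resolved, as a sketch. The precedence
(command line over ceph.conf over the built-in default) is an
assumption, resolve_fs_type() is a hypothetical helper, and the conf
argument is a plain dict standing in for the parsed configuration.

```python
import argparse

DEFAULT_FS_TYPE = "ext4"  # the default as of this commit


def resolve_fs_type(argv, conf):
    """Pick the fs type: --fs-type, then osd_fs_type, then the default."""
    parser = argparse.ArgumentParser(prog="ceph-disk-prepare")
    parser.add_argument("--fs-type", default=None)
    # Ignore any other arguments; only --fs-type matters for this sketch.
    args, _rest = parser.parse_known_args(argv)
    if args.fs_type is not None:
        return args.fs_type
    return conf.get("osd_fs_type", DEFAULT_FS_TYPE)
```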
Currently both btrfs and xfs seem to trigger a disk hotplug event at
mount time, thus triggering a useless and unwanted ceph-disk-activate
run. This will be worked around in a later commit.
Currently mkfs and mount options cannot be configured.
Bug: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Add PROT and LOCK columns, for protection status and presence of any
locks of type "excl" or "shr" (see "lock list" for the gory details)
Shrink FORMAT to FMT
Remove TYPE column; one can infer type from presence of @ in name (snap)
or presence of parent (clone)
Dump prettybyte_t in favor of new si_t for compactness
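The idea of a compact size column can be sketched as follows. This is
not the si_t implementation; compact_size() is a hypothetical helper,
and the 1024-based steps, truncation, and one-letter suffixes are
assumptions made for illustration.

```python
def compact_size(n):
    """Format a non-negative integer with a one-letter suffix."""
    suffixes = ["", "k", "M", "G", "T", "P", "E"]
    idx = 0
    # Divide by 1024 until the value is small or suffixes run out.
    while n >= 1024 and idx < len(suffixes) - 1:
        n //= 1024
        idx += 1
    return "%d%s" % (n, suffixes[idx])
```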
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
1) comment set_conf_param and the loop that uses it
2) put back error checking for "called with full param list" in macro
3) make all the loop calls consistent
4) add a third arg placeholder to handle lock remove
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This avoids an error if the daemon was running already, and is
already being done with the other services.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Insufficient understanding of fragile algorithm. This needs more
thought and I don't want the parsing broken as it is now.
This reverts commit 0d48879320.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Instead of looping across all args, with increments inside the loop,
which can run off the end of the vector, demand that the final
argument parsing have exactly the right number of args, or complain
about the extras and die.
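The stricter check, sketched in Python rather than the tool's own
language. take_final_args() and its messages are hypothetical; the
point is that the count is validated once instead of indexing past the
end inside a loop.

```python
def take_final_args(args, expected):
    """Return exactly `expected` args, or raise on too few or too many."""
    if len(args) < expected:
        raise ValueError(
            "expected %d argument(s), got %d" % (expected, len(args)))
    if len(args) > expected:
        # Complain about the extras instead of silently ignoring them.
        raise ValueError(
            "extraneous parameter(s): " + " ".join(args[expected:]))
    return tuple(args)
```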
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This test still verifies that the race is handled correctly if it
occurs, but will no longer clutter test results with spurious failures
when the race is not reproduced.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This is to handle TextTable output, which doesn't use tabs
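A sketch of the difference, assuming the columns are padded with
spaces and no field contains embedded whitespace (columns() is a
hypothetical helper and the sample line is made up):

```python
def columns(line):
    """Split a table row on runs of whitespace rather than on tabs."""
    return line.split()


row = "foo      1024M        1"
by_tab = row.split("\t")     # one field: the row contains no tabs
by_space = columns(row)      # three fields
```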
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Return ENOENT if no parent.
Return error if pool reverse lookup fails.
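The two error paths, sketched with negative errno return values. The
function names, the dict-based image and pool table, and the error
chosen for a failed reverse lookup are all hypothetical.

```python
import errno


def pool_reverse_lookup(pools, pool_id):
    """Return (0, name), or (negative errno, None) if the id is unknown."""
    if pool_id not in pools:
        return -errno.ENOENT, None  # hypothetical choice of error
    return 0, pools[pool_id]


def get_parent_info(image, pools):
    """Return (0, (pool name, parent name)) or (negative errno, None)."""
    parent = image.get("parent")
    if parent is None:
        return -errno.ENOENT, None       # no parent
    ret, pool_name = pool_reverse_lookup(pools, parent["pool_id"])
    if ret < 0:
        return ret, None                 # propagate the lookup error
    return 0, (pool_name, parent["name"])
```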
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The Sphinx role for referring to Python class methods is :meth:, see
http://sphinx.pocoo.org/domains.html#python-roles
The links to format() and features() are currently
dead because those methods don't have docstrings.
They'll start working once docstrings are added.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
This avoids the delay of installing Sphinx inside the virtualenv;
especially, compiling lxml is slow.
If Sphinx is not installed system-wide (or it's too old), this will
still install a copy inside the virtualenv, to keep working.
Thanks to Sean for the push to make this happen, and testing the
various scenarios; I (Tv) took the liberty of changing the commit to
use venv-python for the manpage build too, avoid the nonstandard
"which" command, be more careful about quoting, and explain more fully
what's going on in the comment.
Closes: https://github.com/ceph/ceph/pull/24
Signed-off-by: Sean Channel <pentabular@gmail.com>
Signed-off-by: Tommi Virtanen <tv@inktank.com>
We should never consider old 'acks' from monitors on a new election. We
usually discard them, but we did not if an election expired, because this
code didn't foresee the possibility of monitors changing ranks in-between
elections -- which doesn't happen if we specify the monmap during the
monitor's mkfs, but may happen when relying on 'mon initial peers'.
Failing to do so triggered an assertion after fixing bug #3252.
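The intent can be sketched with a toy elector. This is not the monitor
code: the class, its epoch handling, and the method names are
simplified assumptions that only show acks being forgotten whenever a
new election starts, including after an expiry.

```python
class Elector:
    """Toy model: tracks which ranks acked us in the current election."""

    def __init__(self, rank):
        self.rank = rank
        self.epoch = 0
        self.acked_me = set()

    def start(self):
        """Begin a new election and forget acks from earlier ones."""
        self.epoch += 1
        self.acked_me = {self.rank}

    def handle_ack(self, from_rank, epoch):
        """Accept an ack only if it belongs to the current election."""
        if epoch != self.epoch:
            return False
        self.acked_me.add(from_rank)
        return True

    def expire(self):
        """Election timed out: restart without carrying old acks over."""
        self.start()


e = Elector(rank=0)
e.start()
e.handle_ack(2, e.epoch)
old_epoch = e.epoch
e.expire()
stale_accepted = e.handle_ack(2, old_epoch)
```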
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Whenever we update the monmap we should bootstrap, in order to reset the
monitor's on-going activities and re-probe.
Not doing so contributed to bug #3252, during which we entered an infinite
election cycle. This can only happen, though, when we rely on 'mon initial
peers'. Specifying a monmap during the monitor's mkfs should not trigger
this bug.
Fixes: #3252
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
We cannot propose until they all recover.
Fixes: #3260
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
The client currently deadlocks with kernel buffer cache invalidation
enabled: the invalidate callback is called with the client lock held, and
it in turn sends upcalls back to the userspace process, which try to take
the same client lock. The fix is to invoke the invalidate callback in
a separate thread, allowing _release, _flushed, etc. to complete and
unlock the client lock, so that the invalidate callback avoids deadlock
when the upcall is made.
We construct a separate work queue (Finisher) that allows scheduling
the invalidate callbacks in a separate thread. The thread only starts
when the invalidate callback is set. If no callback is set, the cache
capability reference is decremented inline as before.
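The pattern, sketched in Python rather than the client's C++. The
Finisher class here is a simplified stand-in, not the real one, and
submit(), invalidate_callback(), and the inode number are hypothetical.

```python
import queue
import threading


class Finisher:
    """Runs queued callbacks on a separate worker thread."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn):
        self._queue.put(fn)

    def _run(self):
        while True:
            fn = self._queue.get()
            if fn is None:      # sentinel: stop the worker
                break
            fn()

    def stop(self):
        self._queue.put(None)
        self._thread.join()


client_lock = threading.Lock()  # non-reentrant, like the client lock
invalidated = []


def invalidate_callback(ino):
    # The upcall re-enters the client and takes the client lock.
    with client_lock:
        invalidated.append(ino)


finisher = Finisher()
finisher.start()

with client_lock:
    # Calling invalidate_callback(42) inline here would deadlock.
    # Queue it instead; it runs once the lock has been released.
    finisher.submit(lambda: invalidate_callback(42))

finisher.stop()
```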
Some callers of invalidate_inode_cache (flush and update_inode_file_bits)
don't expect the cache capability to be decremented. Pass a keep_caps flag to
only decrement the capability ref in the _release case.
Also, we need to make sure the mds is aware that the client has dropped
the cache capability, so we add a call to check_caps in put_cap_ref for the
CEPH_CAP_FILE_CACHE capability.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
The logic in put_cap_ref doesn't do anything but inode->put_cap_ref
if cap is set to CEPH_CAP_FILE_CACHE, so checkafter isn't needed.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
The handle_client_rename() check expects a full path rooted in the MDSDIR.
Build such a path in migrate_stray().
Also, use the committed (not projected) dn linkage; this was a carry-over
from the original switch to this API forever ago, but the current callers
don't need to migrate an uncommitted stray. This also aligns us with
reintegrate_stray().
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
The stray reintegration generates a source path that will be rooted in a
(possibly remote) MDS's MDSDIR; adjust this check accordingly. This is a
holdover from way back when the straydir was the base of the tree instead
of mdsdir.
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Cast to (unsigned long) when checking for magic values, so
real ptrs don't get sign-extended. Avoids triggering
assert(inq == &local_queue) failure.
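The arithmetic behind this, as a sketch. It assumes the magic-value
check is a comparison against a small threshold; NUM_MAGIC_CODES, the
32-bit width, and the sample pointer value are hypothetical.

```python
def as_signed(value, bits):
    """Interpret the low `bits` of value as two's-complement signed."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def as_unsigned(value, bits):
    """Interpret the low `bits` of value as unsigned."""
    return value & ((1 << bits) - 1)


NUM_MAGIC_CODES = 4  # hypothetical number of small magic values


def looks_like_magic_signed(ptr, bits=32):
    # A pointer with the high bit set becomes negative here.
    return as_signed(ptr, bits) < NUM_MAGIC_CODES


def looks_like_magic_unsigned(ptr, bits=32):
    return as_unsigned(ptr, bits) < NUM_MAGIC_CODES


high_ptr = 0xB7001000  # made-up address with the top bit set
```

With the signed interpretation the real pointer is mistaken for a magic
value; with the unsigned one it is not.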
Fixes: #3251
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
When moving a directory between snaprealms, we can avoid creating a
snaprealm if the directory doesn't have its own snaprealm and the directory
was created after both realms' newest snapshot.
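The condition, written out as a predicate. This is only a sketch:
can_skip_snaprealm() is a hypothetical helper, and it assumes creation
time and snapshots can be compared as plain increasing sequence numbers.

```python
def can_skip_snaprealm(has_own_realm, created, src_newest, dst_newest):
    """True if no snaprealm needs to be created for the moved directory."""
    if has_own_realm:
        return False  # the shortcut applies only without an own realm
    # The directory must be newer than the newest snapshot in both realms.
    return created > src_newest and created > dst_newest
```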
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>