The kernel does not let you mount --move when the parent mount is
shared (see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=917008
for another person this also confused). We can't use --bind either
since that (on RHEL at least) screws up /etc/mtab so that the final
result looks like
/var/lib/ceph/tmp/mnt.HNHoXU /var/lib/ceph/osd/ceph-0 none rw,bind 0 0
Instead, mount the original dev in the final location and then umount
from the old location.
Signed-off-by: Sage Weil <sage@inktank.com>
parted on RHEL/Centos prefixes the *machine readable output* with
1b 5b 3f 31 30 33 34 68
Note that the same thing happens when you 'import readline' in python.
Work around it!
Signed-off-by: Sage Weil <sage@inktank.com>
apply_incremental() may return -EINVAL. Don't ignore it.
[1] UfP = Update from Paxos
Fixes: #5343
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Starting when only one network interface has started breaks machines with
multiple nics in very problematic ways.
There may be an earlier trigger that we can use for cases where other
services on the local machine depend on ceph, but for now this is better
than the existing behavior.
See #5248
Signed-off-by: Sage Weil <sage@inktank.com>
Maximum object size is 100GB configurable with osd_max_object_size
Error EFBIG if attempt to WRITE/WRITEFULL/TRUNCATE beyond osd_max_object_size
Error EINVAL if length < 1 for WRITE/WRITEFULL/ZERO
Make ZERO beyond existing size a no-op
Fixes: #5252Fixes: #5340
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Activate an osd via its journal device. udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal. That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.
Similarly, it may be that they are on different disks that are hotplugged
with the journal second.
This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.
Include the udev rules to trigger this.
Signed-off-by: Sage Weil <sage@inktank.com>
After we change the final partition type, sgdisk may or may not trigger a
udev event, depending on how well udev is behaving (it varies between
distros, it seems). The old code would often settle and wait for udev to
activate the device, and then partprobe would uselessly fail because it
was already mounted.
Call partprobe only at the very end, after prepare is done. This ensures
that if partprobe calls udevadm settle (which is sometimes does) we do not
get stuck.
Drop the udevadm settle. I'm not sure what this accomplishes; take it out,
at least until we determine we need it.
Signed-off-by: Sage Weil <sage@inktank.com>
librados/librados.cc: In function 'int rados_mon_command_target(void*, const char*, const char**, size_t, const char*, size_t, char**, size_t*, char**, size_t*)':
error: librados/librados.cc:1877: 'LONG_MAX' was not declared in this scope
error: librados/librados.cc:1877: 'LONG_MIN' was not declared in this scope
Signed-off-by: Sage Weil <sage@inktank.com>
In commit 7e1cf87b51 we stopped waiting for
the osdmap on start because the Objecter will normally wait, but for some
commands we assume the osdmap is recent(ish).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Options -v, --verbose, --concise didn't have helpstrings
Option --completion doesn't quite work yet, and should be hidden anyway
Signed-off-by: Dan Mick <dan.mick@inktank.com>
If we add an item that already exists in particular position, we should
update instead of inserting it; the CrushWrapper methods are not
idempotent.
Signed-off-by: Sage Weil <sage@inktank.com>
If we abort while waiting, we incorrect clean up (we switch the state value
incorrectly, and also fail to clean up the initialized objecter).
Intead, skip this wait.. it's useless!
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
mon/MonmapMonitor.cc: In member function 'bool MonmapMonitor::preprocess_command(MMonCommand*)':
mon/MonmapMonitor.cc:273:2: warning: label 'out' defined but not used [-Wunused-label]
Signed-off-by: Sage Weil <sage@inktank.com>
We explicitly mark_down() and clear cur_con when shutting down; do the same
for get_monmap_privately() to ensure that the reset event doesn't make us
do something silly (like, in this case, call _reopen_session() again).
Signed-off-by: Sage Weil <sage@inktank.com>
When the shutdown/stop flag is set, continue to work through the queue.
Process events, but discard messages. This avoids the loss of reset events
on shutdown that are necessary to clean up ref cycles.
Signed-off-by: Sage Weil <sage@inktank.com>
Use the atomic pipe link removal as a signal that we are the one failing
the con and use that to queue the reset event.
This fixes the case where we have an open, the session gets set up via the
handle_accept callback, and then race with another connection and go into
wait + close, or just close. In that case, fault() needs to queue a reset
event to match the accept.
Signed-off-by: Sage Weil <sage@inktank.com>
This gives the ms_handle_reset call a chance to clean up (for example, by
breaking a con->priv <-> session reference cycle).
Signed-off-by: Sage Weil <sage@inktank.com>
Make RefCountedObject a private parent of Connection so that users are
forced to use ConnectionRef whenever references are taken.
Many methods can still take a raw Connection* when they are using the
caller's reference but not taking their own; this is cheaper than
twiddling the reference count, and the lifetime is still well defined.
Local variables generally use ConnectionRef, though.
Signed-off-by: Sage Weil <sage@inktank.com>