With the ceph-mon vs ceph-osd split packaging, users are expected to
have the ceph-mon package installed and not ceph-osd (and vice versa).
However, the init script (/etc/init.d/ceph) has a call to `ceph-disk`,
which may not be present on the machine.
Because our packaging is not yet split upstream, this bug does not
manifest itself today: ceph-mon and ceph-disk both ship in the same
"ceph" package. Once we split the packaging, though, this will become
an issue.
http://tracker.ceph.com/issues/10587 Refs: #10587
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
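One way to guard such a call is to check for the helper before invoking it. This is a minimal sketch, not the change itself: `run_if_present` is a hypothetical name, and the `ceph-disk activate-all` usage shown in the comment is an assumption about which call the init script makes.

```shell
#!/bin/sh
# Run a helper only if it is installed; otherwise skip it without failing.
# run_if_present is a hypothetical helper, not a function from init-ceph.
run_if_present() {
    helper="$1"
    shift
    if command -v "$helper" >/dev/null 2>&1; then
        "$helper" "$@"
    else
        # On a mon-only machine the OSD tooling may be absent; that is fine.
        echo "$helper not found, skipping" >&2
        return 0
    fi
}

# Assumed usage in the start path (the exact arguments are an assumption):
# run_if_present ceph-disk activate-all
```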
If the specified mount point is in use, umount it instead
of skipping mounting the fs.
In the previous code, if we forgot that something unrelated to
*this ceph osd* was already mounted at the mount point, we would skip the
mount and eventually get an error complaining that the superblock does not
match the OSD ID.
Umount and remount is better because:
1. If the wrong FS is not in use, we get the FS we want and ceph boots smoothly.
2. If the wrong FS is in use, umount fails with EBUSY, which explains the
situation more clearly than a superblock mismatch.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
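The umount-then-remount behaviour described above can be sketched roughly as follows. `remount_osd_fs` is a hypothetical name, and using `mountpoint -q` to detect an occupied mount point is an assumption; the init script may detect this differently.

```shell
#!/bin/sh
# Mount dev on mnt, first unmounting whatever is already mounted there.
# remount_osd_fs is a hypothetical helper for illustration only.
remount_osd_fs() {
    dev="$1"
    mnt="$2"
    fstype="$3"
    if mountpoint -q "$mnt"; then
        # If the existing filesystem is busy, umount fails (EBUSY) and we
        # stop here instead of hitting a superblock mismatch later.
        if ! umount "$mnt"; then
            echo "cannot unmount $mnt (still in use?)" >&2
            return 1
        fi
    fi
    mount -t "$fstype" "$dev" "$mnt"
}
```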
This allows members of the ceph group to make librados clients (like the
ceph cli and qemu) create sockets in the default /var/run/ceph/* location.
Signed-off-by: Sage Weil <sage@redhat.com>
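As a rough illustration, making a run directory group-writable could look like the sketch below. `make_run_dir` is a hypothetical name, and the use of `chgrp` with a group named `ceph` is an assumption about the setup, not something taken from the change.

```shell
#!/bin/sh
# Create a run directory and make it writable by its group, so group
# members can create sockets in it. make_run_dir is hypothetical.
make_run_dir() {
    dir="$1"
    group="${2:-}"
    mkdir -p "$dir" || return 1
    if [ -n "$group" ]; then
        # Changing the group normally requires suitable privileges.
        chgrp "$group" "$dir" || return 1
    fi
    chmod g+rwx "$dir"
}

# Assumed usage (run as root): make_run_dir /var/run/ceph ceph
```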
Scripts expect the generated init-ceph script
to be +x, and CMake does that if the file you
feed into it is +x. This matches what we already
do with ceph.in.
Signed-off-by: John Spray <john.spray@redhat.com>
The existence of the pidfile must be checked outside of the loop to send
a signal to the daemon. Otherwise the daemon will remove the pidfile and
stop can return before the process is dead because it only checks
/proc/$pid if the pidfile exists.
http://tracker.ceph.com/issues/10389 Fixes: #10389
Signed-off-by: Loic Dachary <ldachary@redhat.com>
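A simplified sketch of the stop logic described above: the pidfile is read once, before the wait loop, and the loop then polls /proc/$pid directly, so the daemon removing its own pidfile cannot end the wait early. `stop_by_pidfile` and the 30-try limit are hypothetical; the real script's loop differs in detail.

```shell
#!/bin/sh
# Signal the daemon named in a pidfile and wait until the process is gone.
# stop_by_pidfile is a hypothetical helper for illustration only.
stop_by_pidfile() {
    pidfile="$1"
    signal="${2:-TERM}"
    # Check the pidfile once, outside the loop, and remember the pid.
    [ -e "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 0
    kill -s "$signal" "$pid" 2>/dev/null
    # Poll the process itself; the pidfile may already have been removed.
    tries=0
    while [ -e "/proc/$pid" ]; do
        tries=$((tries + 1))
        # Give up after an arbitrary number of tries rather than hang.
        [ "$tries" -le 30 ] || return 1
        sleep 1
    done
    return 0
}
```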
We want to make sure the daemon runs in its own systemd environment. Check
for systemd as pid 1 and, when present, use systemd-run -r <cmd> to do
this.
Probably fixes #7627
Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Tested-by: Dan Mick <dan.mick@inktank.com>
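The check for systemd as pid 1 might look something like this sketch. `wrap_for_systemd` and the `PID1_COMM_FILE` override are hypothetical (the override exists only to make the function easy to try out), and reading /proc/1/comm is one assumed way of detecting systemd.

```shell
#!/bin/sh
# Print the command line to execute: prefixed with "systemd-run -r" when
# systemd is pid 1 and systemd-run is available, unchanged otherwise.
# wrap_for_systemd and PID1_COMM_FILE are hypothetical names.
wrap_for_systemd() {
    comm_file="${PID1_COMM_FILE:-/proc/1/comm}"
    if grep -qs systemd "$comm_file" && command -v systemd-run >/dev/null 2>&1; then
        echo "systemd-run -r $*"
    else
        echo "$*"
    fi
}

# Example: print how a daemon command would be launched on this machine.
wrap_for_systemd ceph-osd -i 0
```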
If we are starting many daemons and hit an error, we normally note it and
move on. Do the same when doing the pre-mount step.
Fixes: #8554
Signed-off-by: Sage Weil <sage@inktank.com>
If we fail to set the CRUSH position for one OSD, continue on to try
starting others, just as we do when we fail to start the daemon.
Fixes: #8342
Signed-off-by: Sage Weil <sage@inktank.com>
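The "note the error and move on" pattern used for the pre-mount and CRUSH steps can be sketched as below. All function names are hypothetical stand-ins, and `pre_mount` is a stub that deliberately fails for osd.1 to show the behaviour.

```shell
#!/bin/sh
# Hypothetical per-daemon steps; each returns non-zero on failure.
pre_mount()     { [ "$1" != "osd.1" ]; }   # stub: simulate a failure for osd.1
set_crush_pos() { true; }                  # stub: always succeeds
start_daemon()  { echo "started $1"; }

# Try every daemon; a failure in any step is recorded and skipped past.
start_all() {
    errors=0
    for name in "$@"; do
        if ! pre_mount "$name"; then
            echo "pre-mount failed for $name, continuing" >&2
            errors=$((errors + 1))
            continue
        fi
        if ! set_crush_pos "$name"; then
            echo "CRUSH update failed for $name, continuing" >&2
            errors=$((errors + 1))
            continue
        fi
        start_daemon "$name" || errors=$((errors + 1))
    done
    # Report overall failure if any daemon could not be started.
    [ "$errors" -eq 0 ]
}
```

Calling `start_all osd.0 osd.1 osd.2` with these stubs starts osd.0 and osd.2 and returns a non-zero status because osd.1 was skipped.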
On machines hosting both a MON and OSDs, the OSDs are started at boot shortly
after the MON, but the MON needs time to become operational. With a short
timeout the OSDs fail to start because they do not have enough time to
establish communication with the cluster. This is even more likely when other
monitors are down, which is not unusual when servers are rebooting after a
power failure.
Increasing the timeout significantly improves the chances of a successful OSD start.
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
This avoids parsing out the wrong value when a long device name makes
df wrap over two lines.
Fixes: #6699
Reported-by: Jan Harkes <jaharkes@cs.cmu.edu>
Reviewed-by: Noah Watkins <noah.watkins@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
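One common way to keep df output on a single line per filesystem is the POSIX output format (`df -P`). The sketch below assumes that approach; `fs_size_kb` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the total size, in 1024-byte blocks, of the filesystem holding a path.
# -P selects the POSIX format, which does not wrap long device names.
# fs_size_kb is a hypothetical helper for illustration only.
fs_size_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $2 }'
}

# Example: size of the filesystem containing the current directory.
fs_size_kb .
```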
Instead of hard-coding a check in ceph.conf and some reasonable
defaults, defer this work to ceph-crush-location, and allow users to
specify their own hook with alternative logic.
This can be helpful in a number of cases, like:
- rack (or other) information included in hostname and easily parsed
out by a hook
- multiple types of devices in each host, resulting in 'parallel'
crush trees (e.g., one for hdd, one for ssd)
Signed-off-by: Sage Weil <sage@inktank.com>
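As an illustration of the first case, a custom hook could derive the rack from the hostname. This sketch assumes a hypothetical naming scheme of the form `<rack>-<node>` (for example `rack12-node03`) and assumes the hook reports its location as space-separated `key=value` pairs; `crush_location_from_hostname` is a made-up name.

```shell
#!/bin/sh
# Derive a CRUSH location from a hostname of the assumed form <rack>-<node>.
# Falls back to host and root only when the hostname has no rack prefix.
crush_location_from_hostname() {
    host="$1"
    rack="${host%%-*}"    # text before the first '-'
    if [ "$rack" = "$host" ]; then
        echo "host=$host root=default"
    else
        echo "rack=$rack host=$host root=default"
    fi
}

# Assumed usage inside a hook script:
# crush_location_from_hostname "$(hostname -s)"
```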
If the monitor is not currently available, this crush update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up. Instead, make it time out after 10 seconds and then
abort startup. This prevents startup of an OSD if we failed to update
the CRUSH position for some reason.
In fact, do not start up the OSD if the CRUSH update fails for any
reason--not just a timeout!
Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
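A minimal sketch of the time-out-and-abort behaviour, using the coreutils `timeout` command. `crush_update_or_abort` is a hypothetical name, and the command passed to it stands in for the real CRUSH update call, which is not reproduced here.

```shell
#!/bin/sh
# Run a command with a time limit; report failure if it times out or
# exits non-zero, so the caller can refuse to start the OSD.
# crush_update_or_abort is a hypothetical helper for illustration only.
crush_update_or_abort() {
    limit="$1"
    shift
    if ! timeout "$limit" "$@"; then
        echo "CRUSH update failed or timed out; not starting the OSD" >&2
        return 1
    fi
    return 0
}

# Assumed usage, with update_crush_position standing in for the real call:
# crush_update_or_abort 10 update_crush_position && start_the_osd
```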
We need to be able to condrestart all the ceph services on a
machine, so that we don't restart daemons that are supposed to be
stopped (e.g. broken disks).
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
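The idea behind condrestart, restarting only what is already running, can be sketched like this. Both function names are hypothetical, and using a pidfile plus /proc to decide whether a daemon is running is an assumption about how the check is done.

```shell
#!/bin/sh
# Succeed only if the pidfile names a process that currently exists.
daemon_is_running() {
    pidfile="$1"
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -d "/proc/$pid" ]
}

# Run the given restart command only when the daemon is already running;
# daemons that were deliberately stopped are left alone.
condrestart() {
    pidfile="$1"
    shift
    if daemon_is_running "$pidfile"; then
        "$@"
    else
        echo "not running, leaving it stopped" >&2
    fi
}
```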