There needs to be a timeout to prevent ceph-disk from hanging
forever. But there is no good reason to set it to a value that is less
than a few hours.
Each OSD activation needs to happen in sequence and not in parallel,
which is why there is a global activation lock.
It would be possible, when an OSD is using a device that is not
otherwise used by another OSD (i.e. they do not share an SSD journal
device etc.), to run all activations in parallel. It would however
require a more extensive modification of ceph-disk to avoid any chance
of races.
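For illustration, a minimal Python sketch of the idea (not the ceph-disk
implementation; the lock path and timeout value below are assumptions):
acquire a single global flock before activating, and give up only after
several hours.

    import errno
    import fcntl
    import time

    LOCK_PATH = '/var/lock/ceph-disk.activate'  # hypothetical lock path
    TIMEOUT = 3 * 60 * 60                       # a few hours, in seconds

    def activate_with_global_lock(activate):
        deadline = time.time() + TIMEOUT
        with open(LOCK_PATH, 'w') as lock_file:
            # poll with a non-blocking flock so the timeout can be enforced
            while True:
                try:
                    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    break
                except IOError as e:
                    if e.errno not in (errno.EAGAIN, errno.EACCES):
                        raise
                    if time.time() >= deadline:
                        raise RuntimeError('timed out waiting for the activation lock')
                    time.sleep(1)
            try:
                activate()  # OSD activations run strictly one at a time
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)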
Fixes: http://tracker.ceph.com/issues/20229
Signed-off-by: Loic Dachary <loic@dachary.org>
When booting a server with 20+ HDDs udev has to process a *lot* of
events (especially if dm-crypt is used), and 2 minutes might not be
enough for that. Make it possible to override the timeout (via systemd
drop-in files), and use a longer timeout (5 minutes) by default.
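For example, assuming the unit consumes its timeout through an
environment variable (the variable name and value below are
illustrative), a drop-in could look like:

    # /etc/systemd/system/ceph-disk@.service.d/timeout.conf
    [Service]
    Environment=CEPH_DISK_TIMEOUT=900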
Fixes: http://tracker.ceph.com/issues/18740
Signed-off-by: Alexey Sheplyakov <asheplyakov@mirantis.com>
"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.
Take an flock on a device specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent osds. This greatly reduces lock contention and consequently
the chance of service timeout. Per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.
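A minimal Python sketch of the per-device locking; the lock file naming
and helper names are illustrative, not the actual ceph-disk code:

    import fcntl
    import os

    def trigger_lock_path(dev):
        # /dev/sdb1 -> /var/lock/ceph-disk-sdb1: triggers for different
        # devices no longer contend on a single global lock file
        return '/var/lock/ceph-disk-' + os.path.basename(dev)

    def locked_trigger(dev, run_trigger):
        with open(trigger_lock_path(dev), 'w') as lock_file:
            # blocks only if another trigger is running for the same device
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                run_trigger(dev)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)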
Fixes: http://tracker.ceph.com/issues/18049
Signed-off-by: David Disseldorp <ddiss@suse.de>
A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can run systemctl reset-failed
at a later time, start the service again, and it will succeed.
Add a dependency to ceph-disk@.service so that it waits until the local
file systems are mounted:
After=local-fs.target
Since local-fs.target depends on lvm, it will wait until the lvm
partition (as well as any dm devices) is ready and mounted before
attempting to activate the OSD. It may still fail because the
corresponding journal/data partition is not ready yet (which is
expected) but it will no longer fail because the lvm/filesystems/dm are
not ready.
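In unit-file terms, the dependency is a single ordering directive in the
[Unit] section of ceph-disk@.service (fragment, other directives
omitted):

    [Unit]
    After=local-fs.target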
Fixes: http://tracker.ceph.com/issues/17889
Signed-off-by: Loic Dachary <loic@dachary.org>
When ceph-disk runs from udev or an init script, it is in the background
and, should it block for any reason, it may keep a lock forever. All
calls to ceph-disk in these contexts are changed to time out.
The TimeoutStartSec= and TimeoutStopSec= settings, which are both set
via TimeoutSec=, do not apply to Type=oneshot services.
https://www.freedesktop.org/software/systemd/man/systemd.service.html
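One way to express this is to wrap the call with timeout(1) inside
ExecStart= (illustrative fragment; the 120 second value and exact
command line are assumptions, not a quote of the shipped unit):

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c 'timeout 120 /usr/sbin/ceph-disk trigger --sync %f'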
Fixes: http://tracker.ceph.com/issues/16580
Signed-off-by: Loic Dachary <loic@dachary.org>
The flock command may be installed elsewhere, depending on the
system. Let the PATH search figure that out.
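As a sketch in Python terms (illustrative only), the PATH resolution can
be made explicit:

    import shutil

    # Searches PATH instead of hard-coding /usr/bin/flock; may return
    # e.g. /usr/bin/flock or /bin/flock depending on the system,
    # or None if the command is missing.
    flock_path = shutil.which('flock')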
Fixes: http://tracker.ceph.com/issues/13975
Signed-off-by: Loic Dachary <loic@dachary.org>
When activating a device, ceph-disk trigger restarts the ceph-disk
systemd service. Two consecutive udev add events on the same device will
restart the ceph-disk systemd service, and the second one may kill the
first one, leaving the device half activated.
The ceph-disk systemd service is instructed to not kill an existing
process when restarting. The second run waits (via flock) for the first
one to complete before running so that they do not overlap.
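An illustrative unit fragment combining the two measures (the directive
values, lock path and command line are assumptions about how this could
be expressed, not a quote of the shipped unit):

    [Service]
    Type=oneshot
    KillMode=none
    ExecStart=/bin/sh -c 'flock /var/lock/ceph-disk /usr/sbin/ceph-disk trigger --sync %f'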
Fixes: http://tracker.ceph.com/issues/13160
Signed-off-by: Loic Dachary <ldachary@redhat.com>