Commit Graph

8 Commits

Author SHA1 Message Date
Loic Dachary
a9eb52e0a4 ceph-disk: set the default systemd unit timeout to 3h
There needs to be a timeout to prevent ceph-disk from hanging
forever. But there is no good reason to set it to a value that is less
than a few hours.

Each OSD activation needs to happen in sequence and not in parallel,
reason why there is a global activation lock.

It would be possible, when an OSD is using a device that is not
otherwise used by another OSD (i.e. they do not share an SSD journal
device etc.), to run all activations in parallel. It would however
require a more extensive modification of ceph-disk to avoid any chances
of races.

Fixes: http://tracker.ceph.com/issues/20229

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-06-08 22:29:48 +02:00
Alexey Sheplyakov
22332f6bae systemd/ceph-disk: make it possible to customize timeout
When booting a server with 20+ HDDs udev has to process a *lot* of
events (especially if dm-crypt is used), and 2 minutes might be not
enough for that. Make it possible to override the timeout (via systemd
drop-in files), and use a longer timeout (5 minutes) by default.

Fixes: http://tracker.ceph.com/issues/18740
Signed-off-by: Alexey Sheplyakov <asheplyakov@mirantis.com>
2017-02-06 14:17:20 +04:00
David Disseldorp
8a62cbc074 systemd/ceph-disk: reduce ceph-disk flock contention
"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.

Take an flock on a device specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent osds. This greatly reduces lock contention and consequently
the chance of service timeout. Per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.

Fixes: http://tracker.ceph.com/issues/18049

Signed-off-by: David Disseldorp <ddiss@suse.de>
2016-11-28 17:55:39 +01:00
Loic Dachary
d954de5546 ceph-disk: systemd unit must run after local-fs.target
A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can systemctl reset-fail at a
later time and it will succeed.

Add a dependency to ceph-disk@.service so that it waits until the local
file systems are mounted:

After=local-fs.target

Since local-fs.target depends on lvm, it will wait until the lvm
partition (as well as any dm devices) is ready and mounted before
attempting to activate the OSD. It may still fail because the
corresponding journal/data partition is not ready yet (which is
expected) but it will no longer fail because the lvm/filesystems/dm are
not ready.

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-22 15:23:47 +01:00
Loic Dachary
bed1a5cc05 ceph-disk: timeout ceph-disk to avoid blocking forever
When ceph-disk runs from udev or init script, it is in the background
and should it block for any reason, it may keep a lock forever. All
calls to ceph-disk in these context are changed to timeout.

The TimeoutStartSec= and TimeoutStopSec= which are both set via
TimeoutSec= do not apply to Type=oneshot services.

https://www.freedesktop.org/software/systemd/man/systemd.service.html

Fixes: http://tracker.ceph.com/issues/16580

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-07-18 08:53:11 +02:00
Loic Dachary
c8f7d44c93 build/ops: systemd ceph-disk unit must not assume /bin/flock
The flock command may be installed elsewhere, depending on the
system. Let the PATH search figure that out.

http://tracker.ceph.com/issues/13975 Fixes: #13975

Signed-off-by: Loic Dachary <loic@dachary.org>
2015-12-04 21:11:09 +01:00
Loic Dachary
f0a47578c7 ceph-disk: systemd must not kill a running ceph-disk
When activating a device, ceph-disk trigger restarts the ceph-disk
systemd service. Two consecutive udev add on the same device will
restart the ceph-disk systemd service and the second one may kill the
first one, leaving the device half activated.

The ceph-disk systemd service is instructed to not kill an existing
process when restarting. The second run waits (via flock) for the second
one to complete before running so that they do not overlap.

http://tracker.ceph.com/issues/13160 Fixes: #13160

Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-09-22 08:46:56 +02:00
Sage Weil
f1b80e99b0 systemd: consolidate into a single ceph-disk@.service
This simple service will 'ceph-disk trigger DEV --sync'.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:22:25 -04:00