Commit Graph

6 Commits

David Disseldorp
8a62cbc074 systemd/ceph-disk: reduce ceph-disk flock contention
"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.

Take a flock on a device-specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent OSDs. This greatly reduces lock contention and, consequently,
the chance of a service timeout. The per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.
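
As a rough sketch, the per-device lock could be taken in the unit's
ExecStart as below; the lock-file naming and the sh -c wrapper are
illustrative assumptions, not necessarily the committed lines:

    [Service]
    Type=oneshot
    # Lock a per-device file (%f expands to the instance's device path) instead
    # of the single global /var/lock/ceph-disk, so triggers for independent
    # OSDs can run concurrently.
    ExecStart=/bin/sh -c 'flock /var/lock/ceph-disk-$(basename %f) ceph-disk trigger %f --sync'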

Fixes: http://tracker.ceph.com/issues/18049

Signed-off-by: David Disseldorp <ddiss@suse.de>
2016-11-28 17:55:39 +01:00
Loic Dachary
d954de5546 ceph-disk: systemd unit must run after local-fs.target
A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service, which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can run systemctl reset-failed
at a later time and the unit will then succeed.

Add an ordering dependency to ceph-disk@.service so that it waits until
the local file systems are mounted:

    After=local-fs.target

Since local-fs.target depends on LVM, the unit will wait until the LVM
partition (as well as any device-mapper devices) is ready and mounted
before attempting to activate the OSD. Activation may still fail because
the corresponding journal/data partition is not ready yet (which is
expected), but it will no longer fail because LVM, device-mapper, or the
file systems are not ready.
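
In the unit file this ordering lives in the [Unit] section; a minimal
sketch (the Description line is illustrative):

    [Unit]
    Description=Ceph disk activation: %f
    # Do not start before local file systems (including /var/lib/ceph) are mounted.
    After=local-fs.target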

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-22 15:23:47 +01:00
Loic Dachary
bed1a5cc05 ceph-disk: timeout ceph-disk to avoid blocking forever
When ceph-disk runs from udev or from an init script, it runs in the
background and, should it block for any reason, it may hold a lock
forever. All calls to ceph-disk in these contexts are now wrapped in a
timeout.

The TimeoutStartSec= and TimeoutStopSec= settings, which are both set
via TimeoutSec=, do not apply to Type=oneshot services.

https://www.freedesktop.org/software/systemd/man/systemd.service.html
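
Since the unit's own timeout settings do not help for Type=oneshot, the
bound has to be placed on the command itself. A sketch, assuming a shell
wrapper and an illustrative 120-second limit:

    [Service]
    Type=oneshot
    # TimeoutSec= is ignored for Type=oneshot, so bound the run explicitly
    # with timeout(1); a hung ceph-disk is then killed and its flock released
    # instead of being held forever.
    ExecStart=/bin/sh -c 'timeout 120 flock /var/lock/ceph-disk ceph-disk trigger %f --sync'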

Fixes: http://tracker.ceph.com/issues/16580

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-07-18 08:53:11 +02:00
Loic Dachary
c8f7d44c93 build/ops: systemd ceph-disk unit must not assume /bin/flock
The flock command may be installed elsewhere, depending on the
system. Let the PATH search figure that out.
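
One way to do that is to let a shell resolve flock through PATH; a
sketch under that assumption, not necessarily the committed change:

    [Service]
    # 'flock' rather than '/bin/flock': the binary may live in /usr/bin on
    # some systems, and the shell's PATH lookup finds it either way.
    ExecStart=/bin/sh -c 'flock /var/lock/ceph-disk ceph-disk trigger %f --sync'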

Fixes: #13975 (http://tracker.ceph.com/issues/13975)

Signed-off-by: Loic Dachary <loic@dachary.org>
2015-12-04 21:11:09 +01:00
Loic Dachary
f0a47578c7 ceph-disk: systemd must not kill a running ceph-disk
When activating a device, ceph-disk trigger restarts the ceph-disk
systemd service. Two consecutive udev add events on the same device will
restart the ceph-disk systemd service, and the second run may kill the
first one, leaving the device half activated.

The ceph-disk systemd service is instructed not to kill an existing
process when restarting. The second run waits (via flock) for the first
one to complete before proceeding, so that they do not overlap.
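
KillMode=none is the usual way to tell systemd not to kill a unit's
remaining processes; the sketch below assumes that is the mechanism used
here, combined with flock for serialization:

    [Service]
    Type=oneshot
    # A restart triggered by a second udev add event must not kill the
    # activation already in progress for the same device.
    KillMode=none
    # flock serializes the two runs so they never overlap.
    ExecStart=/bin/sh -c 'flock /var/lock/ceph-disk ceph-disk trigger %f --sync'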

Fixes: #13160 (http://tracker.ceph.com/issues/13160)

Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-09-22 08:46:56 +02:00
Sage Weil
f1b80e99b0 systemd: consolidate into a single ceph-disk@.service
This simple service will run 'ceph-disk trigger DEV --sync'.
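
A minimal sketch of such a templated unit; the exact paths and options
are illustrative:

    [Unit]
    Description=Ceph disk activation: %f

    [Service]
    Type=oneshot
    # %f expands to the device path encoded in the instance name,
    # e.g. /dev/sdb1 for ceph-disk@dev-sdb1.service.
    ExecStart=/usr/sbin/ceph-disk trigger %f --sync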

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:22:25 -04:00