66 Commits

Author SHA1 Message Date
Loic Dachary
b3887379d6 build/ops: restart ceph-osd@.service after 20s instead of 100ms
Instead of the default 100ms pause before trying to restart an OSD, wait
20 seconds instead and retry 30 times instead of 3. There is no scenario
in which restarting an OSD almost immediately after it failed would get
a better result.

It is possible that a failure to start is due to a race with another
systemd unit at boot time. For instance if ceph-disk@.service is
delayed, it may start after the OSD that needs it. A long pause may give
the racing service enough time to complete and the next attempt to start
the OSD may succeed.

This is not a sound way to resolve a race; it only makes the OSD boot
process less sensitive to one. In the example above, the proper fix is to
enable --runtime ceph-osd@.service so that it cannot race at boot time.

To preserve the current runtime behavior, the wait delay should not be
minutes long. For instance, if an OSD is killed or fails and restarts after
10 minutes, it will be marked down by the ceph cluster.  This is not a
change that could break things but it is significant and should be
avoided.
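
The arithmetic behind the new values can be sketched as follows. This is
only an illustration: the function name is hypothetical, and the model
counts just the pauses between attempts, ignoring how long each start
attempt itself takes.

```python
def retry_window_seconds(restart_delay_s, max_restarts):
    """Total time spent pausing between restart attempts.

    Assumption: the time each start attempt takes is ignored, so the
    real window is somewhat longer than this figure.
    """
    return restart_delay_s * max_restarts


# Values quoted in the commit message: 100ms x 3 before, 20s x 30 after.
old_window = retry_window_seconds(0.1, 3)
new_window = retry_window_seconds(20, 30)

print(f"old: {old_window:.1f}s, new: {new_window / 60:.0f} min")
```

Under these assumptions the retries now span minutes rather than a
fraction of a second, which is what gives a racing unit time to finish.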

Refs: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-12-01 08:28:20 +01:00
David Disseldorp
8a62cbc074 systemd/ceph-disk: reduce ceph-disk flock contention
"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.

Take an flock on a device specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent osds. This greatly reduces lock contention and consequently
the chance of service timeout. Per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.
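
A minimal sketch of per-device locking, assuming a lock file named after
the device. The helper names and the naming scheme are hypothetical and
are not taken from the actual unit file or from ceph-disk.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


def lock_path_for(device, lock_dir):
    """Map a device path to its own lock file (hypothetical naming scheme)."""
    name = device.strip("/").replace("/", "-")
    return os.path.join(lock_dir, "ceph-disk-" + name)


@contextmanager
def device_lock(device, lock_dir):
    """Hold an exclusive flock on the lock file of a single device."""
    path = lock_path_for(device, lock_dir)
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks only on the same device
        try:
            yield path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Two independent devices can be locked at the same time.
with tempfile.TemporaryDirectory() as lock_dir:
    with device_lock("/dev/sdb1", lock_dir) as first:
        with device_lock("/dev/sdc1", lock_dir) as second:
            print(os.path.basename(first), os.path.basename(second))
```

With a single shared lock file, the inner `with` would block until the
outer one released it. With one file per device, only invocations for the
same device serialize.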

Fixes: http://tracker.ceph.com/issues/18049

Signed-off-by: David Disseldorp <ddiss@suse.de>
2016-11-28 17:55:39 +01:00
Loic Dachary
d954de5546 ceph-disk: systemd unit must run after local-fs.target
A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can run systemctl reset-failed at a
later time and it will succeed.

Add a dependency to ceph-disk@.service so that it waits until the local
file systems are mounted:

After=local-fs.target

Since local-fs.target depends on lvm, it will wait until the lvm
partition (as well as any dm devices) is ready and mounted before
attempting to activate the OSD. It may still fail because the
corresponding journal/data partition is not ready yet (which is
expected) but it will no longer fail because the lvm/filesystems/dm are
not ready.

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-22 15:23:47 +01:00
Owen Synge
639385a7f4 systemd/CMakeLists.txt:Remove ceph-create-keys cmake
ceph-create-keys should not be started on boot of mons with systemd so should
not exist as 'After' or 'Wants' for the ceph-mon.service

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:44 +01:00
Owen Synge
dc5fe8d415 systemd/ceph-mon@.service:Remove ceph-create-keys for mon in systemd
ceph-create-keys should not be started on boot of mons with systemd so should
not exist as 'After' or 'Wants' for the ceph-mon.service

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:26 +01:00
Owen Synge
8bcb4646b6 systemd/ceph-create-keys@.service:Remove ceph-create-keys for systemd
ceph-create-keys should not be started on boot of mons with systemd so should
not exist in the systemd files

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:17 +01:00
Tim Serong
082199f69d systemd: autogenerate ceph-mgr key during daemon startup
This is a hack to inject a key for the mgr daemon, using whatever
key already exists on the mon on this node to gain sufficient
permissions to create the mgr key.  Failure is ignored at every
step (the '-' prefix) in case someone has already used some other
trick to set everything up manually.

Signed-off-by: Tim Serong <tserong@suse.com>
2016-09-29 17:27:08 +01:00
Tim Serong
61d779345e systemd: encourage ceph-mgr to start in sync with ceph-mon
This change introduces the following behaviour:

- When ceph-mon starts, it will try to start ceph-mgr with the same
  instance id (Wants=), but will *not* fail to start if ceph-mgr
  doesn't start (i.e. the mon still works as it always did).
- ceph-mgr will start After= ceph-mon, and will stop and start when
  ceph-mon stops and starts, because it's PartOf= ceph-mon.

If you don't want ceph-mgr to run on the mons, you need to mask the
service, i.e. `systemctl mask ceph-mgr@INSTANCE`.  Hostnames are
typically instance names, so `systemctl mask ceph-mgr@$(hostname)`
should suffice if you wish to disable ceph-mgr on the mons.

Signed-off-by: Tim Serong <tserong@suse.com>
2016-09-29 17:27:08 +01:00
Tim Serong
d8ded57a87 systemd: add ceph-mgr service and target files
Signed-off-by: Tim Serong <tserong@suse.com>
2016-09-29 17:27:08 +01:00
Jason Dillaman
b1ce837a46 Merge pull request #10942 from JellevdK/master
systemd: add install section to rbdmap.service file

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2016-09-20 16:31:00 -04:00
Sage Weil
fba798dcad remove autotools
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:50:14 -04:00
Jelle vd Kooij
57b6f656e1 Add Install section to systemd rbdmap.service file
Signed-off-by: Jelle vd Kooij <vdkooij.jelle@gmail.com>
2016-09-01 00:42:34 +02:00
Yuri Weinstein
8175ce07b8 Merge pull request #10262 from dachary/wip-16580-ceph-disk-timeout
ceph-disk: timeout ceph-disk to avoid blocking forever

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Ken Dreyer (Red Hat) <kdreyer@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.cz>
2016-08-05 08:22:39 -07:00
Loic Dachary
bed1a5cc05 ceph-disk: timeout ceph-disk to avoid blocking forever
When ceph-disk runs from udev or init script, it is in the background
and should it block for any reason, it may keep a lock forever. All
calls to ceph-disk in these contexts are changed to time out.

The TimeoutStartSec= and TimeoutStopSec= which are both set via
TimeoutSec= do not apply to Type=oneshot services.

https://www.freedesktop.org/software/systemd/man/systemd.service.html
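
The message does not show how the timeout is applied. The general idea of
bounding a background command can be sketched as below, assuming a
hypothetical helper `run_with_timeout` and a sleeping child process as a
stand-in for a hung ceph-disk call.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv; return its exit code, or None if it was killed on timeout."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing it
        # held (such as a lock) stays held forever.
        return None


# Stand-in for a blocked background job: a child that sleeps for 30 s.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, timeout_s=1))
```

The design point is that the bound is enforced around the command itself,
because the unit-level timeout settings are stated above not to apply to
Type=oneshot services.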

Fixes: http://tracker.ceph.com/issues/16580

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-07-18 08:53:11 +02:00
Ruben Kerkhof
4179aa8d44 systemd: add osd id to service description
So, instead of logging this:

Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.
Jul 01 13:51:04 localhost systemd[1]: Failed to start Ceph object storage daemon.

We see this, which is a lot more useful:

Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.27.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.32.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.29.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.31.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.23.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.24.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.25.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.30.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.28.
Jul 01 13:59:32 localhost systemd[1]: Failed to start Ceph object storage daemon osd.22.
2016-07-01 14:02:36 +02:00
Kefu Chai
41061ce769 cmake: install systemd files
add an option "WITH_SYSTEMD", off by default

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-06-30 19:27:43 +08:00
Nathan Cutler
80be4a8cbf systemd: fix typo in preset file
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2016-04-30 16:21:13 +02:00
Nathan Cutler
53b1a6799c systemd: enable all the ceph .target services by default
Some distros, like Fedora and openSUSE, have a policy that all services are
disabled by default.

This patch changes that default for the ceph.target and
ceph-{mds,mon,osd,radosgw}.target services.

Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Boris Ranto <branto@redhat.com>
2016-04-27 14:20:29 +02:00
Nathan Cutler
df893f395e systemd: make Ceph daemon units "want" time-sync.target
Fixes: http://tracker.ceph.com/issues/15419

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2016-04-23 17:48:08 +02:00
Sage Weil
dcd211cdd1 Merge pull request #8449 from javacruft/ceph-osd-prestart
ceph-osd-prestart.sh: drop --setuser/--setgroup

Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-22 17:06:44 -04:00
Boris Ranto
62084375fa systemd: Use the same restart limits as upstart
Currently, the systemd daemons are not restarted on failure. This patch
adds this functionality and sets the defaults to those defined in
upstart. This resolves to 3 fails per 30 minutes for osd, mon and mds
and 5 fails per 30 seconds for radosgw.
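
The "N fails per interval" limits can be pictured with a simplified
sliding-window model. The class below is hypothetical and does not
reproduce systemd's exact rate-limiting algorithm; only the burst and
interval figures come from the message.

```python
from collections import deque


class StartLimiter:
    """Allow at most `burst` restarts within any `interval_s` window."""

    def __init__(self, burst, interval_s):
        self.burst = burst
        self.interval_s = interval_s
        self._starts = deque()

    def allow_restart(self, now_s):
        # Forget restarts that have aged out of the window.
        while self._starts and now_s - self._starts[0] >= self.interval_s:
            self._starts.popleft()
        if len(self._starts) >= self.burst:
            return False  # limit hit: the unit would stay failed
        self._starts.append(now_s)
        return True


# osd, mon and mds: 3 fails per 30 minutes.
osd = StartLimiter(burst=3, interval_s=30 * 60)
print([osd.allow_restart(t) for t in (0, 60, 120, 180)])
# -> [True, True, True, False]
```

A radosgw-style limiter would be `StartLimiter(burst=5, interval_s=30)`
in this model.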

Signed-off-by: Boris Ranto <branto@redhat.com>
2016-04-13 21:26:31 +02:00
James Page
05cafcf19f Drop any systemd imposed process/thread limits
If systemd has task accounting enabled, a default of 512 tasks
will be applied to all systemd units.

For ceph, this is way too low even for a modest cluster, so stop
this restriction being applied and allow administrators to apply
limits using sysctl.

Signed-off-by: James Page <james.page@ubuntu.com>
2016-04-05 17:33:57 +01:00
James Page
74977f7884 Drop --setuser/--setgroup from osd prestart
These are not supported by /usr/lib/ceph/ceph-osd-prestart.sh,
resulting in warnings:

 ceph-osd-prestart.sh[23367]: getopt: unrecognized option '--setuser'
 ceph-osd-prestart.sh[23367]: getopt: unrecognized option '--setgroup'

--setuser and --setgroup are only needed for the ceph-osd process.

Signed-off-by: James Page <james.page@ubuntu.com>
2016-04-05 16:59:38 +01:00
Sage Weil
df6570c2bd Merge pull request #8222 from SUSE/wip-14984
systemd: set up environment in rbdmap unit file

Reviewed-by: Boris Ranto <branto@redhat.com>
2016-03-23 12:33:39 -04:00
Nathan Cutler
a7a36581ff systemd: set up environment in rbdmap unit file
http://tracker.ceph.com/issues/14984 Fixes: #14984

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2016-03-19 06:34:07 +01:00
Jason Dillaman
8a0e47281f systemd: new ceph-rbd-mirror scripts
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2016-03-18 17:51:23 -04:00
Nathan Cutler
69291f872e packaging: move ceph_common.sh and ceph-osd-prestart.sh to /usr/lib/ceph
First, it makes sense for both ceph_common.sh and ceph-osd-prestart.sh to
reside in the same directory: make it so.

Second, /usr/lib exists on both RHEL/Fedora and SLE/openSUSE, whereas
the latter lacks /usr/libexec. To make this less painful, package
ceph_common.sh and ceph-osd-prestart.sh in /usr/lib/ceph.

Third, allow e.g. FreeBSD to do its own thing by using the $(libexecdir)
Autoconf variable (but set it to /usr/lib in the spec file).

http://tracker.ceph.com/issues/14687 Fixes: #14687

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2016-02-18 12:19:14 +01:00
Sage Weil
9da41fee1a systemd/ceph-radosgw-prestart.sh: remove
This is unpackaged and unused.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-02-04 17:48:16 -05:00
Patrick Donnelly
b65d9c5457 systemd: Add systemd sandboxing to services.
This change makes it so the mon/osd/mds/radosgw daemons:
    o Cannot write to /usr, /etc, and /boot.
    o Cannot access /home, /root, or /run/user.
    o Each daemon gets its own private /tmp and /var/tmp.
    o All daemons get a private /dev without physical devices (exception: osd)

I'm not sure if the osd daemon needs access to a full /dev so I left
PrivateDevices out for ceph-osd@.service.

Signed-off-by: Patrick Donnelly <batrick@batbytes.com>
2016-01-28 10:50:00 -05:00
Loic Dachary
c8f7d44c93 build/ops: systemd ceph-disk unit must not assume /bin/flock
The flock command may be installed elsewhere, depending on the
system. Let the PATH search figure that out.

http://tracker.ceph.com/issues/13975 Fixes: #13975

Signed-off-by: Loic Dachary <loic@dachary.org>
2015-12-04 21:11:09 +01:00
Sage Weil
a12efa204e Merge pull request #6276 from david-z/wip-systemd-finegrain-ceph-service
systemd: start/stop/restart ceph services by daemon type

Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Sage Weil  <sage@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
2015-11-28 08:25:40 -05:00
suckowbiz
5972a44106 doc: fix message typos in systemd
Signed-off-by: Tobias Suckow <tobias@suckow.biz>
2015-11-23 16:50:07 +01:00
Boris Ranto
9224ac2ad2 rbdmap: systemd support
Fixes: #13374
Signed-off-by: Boris Ranto <branto@redhat.com>
2015-11-06 10:26:22 +01:00
Zhi Zhang
cfa2d0a08a fine-grained control systemd to start/stop/restart ceph services at once
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
2015-10-26 15:13:19 +08:00
Sage Weil
fb5f058a92 Merge remote-tracking branch 'gh/infernalis'
2015-09-22 14:04:44 -04:00
Loic Dachary
f0a47578c7 ceph-disk: systemd must not kill a running ceph-disk
When activating a device, ceph-disk trigger restarts the ceph-disk
systemd service. Two consecutive udev add on the same device will
restart the ceph-disk systemd service and the second one may kill the
first one, leaving the device half activated.

The ceph-disk systemd service is instructed to not kill an existing
process when restarting. The second run waits (via flock) for the first
one to complete before running so that they do not overlap.

http://tracker.ceph.com/issues/13160 Fixes: #13160

Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-09-22 08:46:56 +02:00
Sage Weil
ea977611c4 systemd: increase nproc ulimit
We were observed to be hitting the limit on centos7
(triggering pthread_create failures) on a ~2000 OSD cluster.

Increasing this resolves it!

Reported-by: Dan van der Ster <daniel.vanderster@cern.ch>
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-21 14:35:15 -04:00
Sage Weil
8e13d89f0f systemd: eliminate ceph-rgw tmpfiles.d file
This is for storing the rgw socket files for fastcgi, which we do not
want to enable by default.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-14 14:00:26 -04:00
Sage Weil
367c794cb1 systemd: no need to preprocess ceph-osd@service
This used to be necessary but now is not.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-14 14:00:26 -04:00
Sage Weil
8453a89cb2 systemd: set nofile limit in unit files
Make it big so hopefully nobody has to change it.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-14 14:00:26 -04:00
Sage Weil
ea91c4ef85 systemd: tmpfiles.d in /run, not /var/run
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-11 11:38:47 -04:00
Sage Weil
3aa38bc07f make /var/run/ceph 770 ceph:ceph
This allows members of the ceph group to make librados clients (like the
ceph cli and qemu) create sockets in the default /var/run/ceph/* location.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-11 11:26:59 -04:00
Sage Weil
f1b80e99b0 systemd: consolidate into a single ceph-disk@.service
This simple service will 'ceph-disk trigger DEV --sync'.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:22:25 -04:00
Sage Weil
8f3185bade systemd: use --setuser and --setgroup for all daemons
Allow all daemons to drop privileges themselves, instead of letting
systemd do it.

Among other things, this means that admins can conditionally not
drop privileges by setting

  setuser match path = /var/lib/ceph/$type/$cluster-$id

in their ceph.conf to ease the pain of upgrade.

Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
2015-08-26 20:34:15 -04:00
Sage Weil
c7ee798a0f set nofile ulimit in /etc/security/limits.d/ceph only
Specify the nofile ulimit in one standard place, where everyone expects it
to be.  Drop it from the ceph-osd unit file.

Leave upstart and sysvinit untouched for the time being to avoid compat
issues.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:15 -04:00
Sage Weil
7c9fdf44f2 systemd: make ceph-osd setuid/gid to ceph:ceph
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:15 -04:00
Sage Weil
b8893f6b8a systemd: chown ceph:ceph /var/run/ceph
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:14 -04:00
Sage Weil
ec1ee5e901 systemd: run mon and mds as ceph:ceph
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:14 -04:00
Sage Weil
4d10dc134b systemd: fix ceph-radosgw@ service
There's no prestart.  Fix the instance name.  Cleanup.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-01 09:58:34 -04:00
David Disseldorp
85a894697e systemd: activate disks via systemd service instead of udev
The udev(7) man page states:
  RUN
  ...
  This can only be used for very short-running foreground tasks. Running
  an event process for a long period of time may block all further
  events for this or a dependent device.

  Starting daemons or other long-running processes is not appropriate
  for udev; the forked processes, detached or not, will be
  unconditionally killed after the event handling has finished.

ceph-disk activate is far from a short-running task:
- check whether path is a block dev, for dirs call through to
  activate_dir()
- call blkid to obtain the filesystem type for the block dev
- pull mount options from hard-coded ceph.conf file
- mount the OSD dev at a temporary path
- check the ceph magic for mounted filesystem
- read cluster uuid and locate corresponding /etc/ceph/{cluster}.conf
  path
- read or generate (if missing) the OSD uuid
- create a file indicating init system usage (systemd)
- mount the device at a second (final) location
- umount (lazy) the temporary mount path
- enable the systemd ceph-osd@{osd_id} service
- start the systemd ceph-osd@{osd_id} service

This logic is therefore best left in a systemd service for execution, as
it is less limited in terms of execution time and also allows for
improved event handling in future (fsck, dmcrypt mapping etc.).

This change sees 95-ceph-osd.rules.systemd trigger ceph-disk activate or
ceph-disk activate-journal via new ceph-disk-activate-journal@.service,
ceph-disk-activate@.service and ceph-disk-dmcrypt-activate@.service
systemd service files.

ceph-disk-dmcrypt-activate@.service makes use of the newly added
--dmcrypt parameter for ceph-disk activate.

Signed-off-by: David Disseldorp <ddiss@suse.de>
2015-08-01 09:58:34 -04:00