109 Commits

Author SHA1 Message Date
Venky Shankar
1227752983 systemd: cephfs-mirror systemd unit files
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-12 05:57:32 -05:00
Wong Hoi Sing Edison
d88c834ea4
systemd: Support Graceful Reboot for AIO Node
Ceph AIO (all-in-one) installations on a single node or on multiple nodes
do not cope well with loopback mounts, and in particular tend to deadlock
during a graceful system reboot.

We already have an `rbdmap.service` that is friendly to graceful system
reboot, as below:

    [Unit]
    After=network-online.target
    Before=remote-fs-pre.target
    Wants=network-online.target remote-fs-pre.target

    [Service]
    ExecStart=/usr/bin/rbdmap map
    ExecReload=/usr/bin/rbdmap map
    ExecStop=/usr/bin/rbdmap unmap-all

This PR introduces:

  - `ceph-mon.target`: Ensure startup after `network-online.target` and
    before `remote-fs-pre.target`
  - `ceph-*.target`: Ensure startup after `ceph-mon.target` and before
    `remote-fs-pre.target`
  - `rbdmap.service`: Once all `_netdev` mounts have been unmounted by
    `remote-fs.target`, ensure all RBD images are unmapped BEFORE any Ceph
    components under `ceph.target` are stopped during shutdown
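
As an illustration only (this is not the unit file from the PR), the ordering in the list above could be expressed roughly as follows. The directives are standard systemd ones and mirror the `rbdmap.service` snippet; the exact contents of `ceph-mon.target` shown here are an assumption.

```ini
# Hypothetical sketch of the ordering for ceph-mon.target (contents assumed)
[Unit]
# Start only after the network is online ...
After=network-online.target
# ... and before remote (_netdev) filesystems are mounted. systemd stops
# units in the reverse of their start order, so on shutdown the remote
# filesystems are unmounted before the mons are stopped.
Before=remote-fs-pre.target
Wants=network-online.target remote-fs-pre.target
```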

The logic has been proven as a proof of concept in
<https://github.com/alvistack/ansible-role-ceph_common/tree/develop>;
it also works as expected with a Ceph + Kubernetes deployment in
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlocks occurred during graceful system reboot, for both
single-node and multi-node AIO setups with loopback mounts.

Also see:

  - <https://github.com/ceph/ceph/pull/36776>
  - <https://github.com/etcd-io/etcd/pull/12259>
  - <https://github.com/cri-o/cri-o/pull/4128>
  - <https://github.com/kubernetes/release/pull/1504>

Fixes: https://tracker.ceph.com/issues/47528
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
2020-09-18 11:02:26 +08:00
Jan Fajerski
bd8b8540f6 systemd/ceph-osd: ceph-osd-prestart.sh now lives in /usr/libexec
Fixes: https://tracker.ceph.com/issues/45984
Fixes: ed6552d506

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2020-06-12 14:59:07 +02:00
Patrick Donnelly
9a84d5a09b
systemd: lock down more privileges
Including:

        ProtectClock=true
        ProtectHostname=true
        ProtectKernelLogs=true
        RestrictSUIDSGID=true

Also, alphabetize [Service] settings.

Finally, add to systemd/ceph-immutable-object-cache@.service.in some
protections that are present in our other service files but were missing
from this one.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-05-09 14:53:05 -07:00
Wido den Hollander
8246ca02ea
systemd: Wait 5 seconds before attempting a restart of an OSD
In commit 92f8ec the RestartSec parameter was removed, which now
causes systemd to restart a failed OSD immediately.

After a reboot, while the network is still coming online, this can
cause problems.

Although network-online.target should guarantee that the network
is online, it doesn't guarantee that DNS resolution works.

If mon_host points to a DNS entry, that entry might not be resolvable
yet, in which case the OSDs fail to start on boot.
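
As a rough illustration of the kind of setting described here, a five-second restart delay might look like the fragment below. `Restart=` and `RestartSec=` are standard systemd directives; the drop-in path in the comment and the `Restart=on-failure` policy are assumptions and are not taken from the commit itself.

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/ceph-osd@.service.d/restart.conf
[Service]
Restart=on-failure
# Wait 5 seconds between a failure and the next start attempt,
# instead of the systemd default of 100ms.
RestartSec=5
```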

Fixes: https://tracker.ceph.com/issues/42761

Signed-off-by: Wido den Hollander <wido@42on.com>
2019-11-12 10:21:45 +01:00
Ricardo Dias
1d7506fdce
systemd: ceph-mgr: set MemoryDenyWriteExecute to false
Fixes: http://tracker.ceph.com/issues/39628

Signed-off-by: Ricardo Dias <rdias@suse.com>
2019-05-09 07:36:43 +01:00
Yuan Zhou
9466d70985 build/ops: adding build spec for immutable object cache daemon
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2019-03-22 00:16:26 +08:00
Yuan Zhou
9a7e5e0866 tools: adding ceph level immutable obj cache daemon
The daemon is built for future integration with both RBD and RGW cache.
The key components are:
- domain socket based simple IPC
- simple LRU policy based promotion/demotion for the cache
- simple file based caching store for RADOS objs with sync IO interface
- systemd service/target files for the daemon

Signed-off-by: Dehao Shang <dehao.shang@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2019-03-22 00:16:25 +08:00
Patrick Donnelly
517670926a
systemd: lock down privileges more
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-02-07 08:45:00 -08:00
Sébastien Han
9bfc490e85 systemd: enable ceph-rbd-mirror.target
Without this, the rbd-mirror units will never start after a system
reboot. The rbd-mirror unit requires ceph-rbd-mirror.target to start;
since that target currently does not get enabled, the daemon won't start
after a reboot.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-05 18:58:43 +01:00
Gregory Farnum
c61acc5b31
Merge pull request #22349 from gregsfortytwo/wip-24368-osd-restarts
systemd: only restart 3 times in 30 minutes, as fast as possible

Reviewed-by:  Sage Weil <sage@redhat.com>
2018-10-19 13:00:07 -07:00
Dan Mick
da20184a16 add ceph-crash service
ceph-crash runs from systemd and watches /var/lib/ceph/crash
for crashdumps, posting them to the mgrs using the mgr's
crash plugin

Signed-off-by: Dan Mick <dan.mick@redhat.com>
2018-08-08 18:37:43 -07:00
Ilya Dryomov
37da5d8af9 systemd/rbdmap.service: order us before remote-fs-pre.target
If "/usr/bin/rbdmap unmap-all" notices a file system mounted on top of
an rbd device, it will call umount, interfering with systemd shutdown
logic.  Make sure we aren't invoked until all _netdev mounts are dealt
with by systemd.

Fixes: http://tracker.ceph.com/issues/24713
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-29 16:36:05 +02:00
Ilya Dryomov
ae61cf680b systemd/rbdmap.service: remove a dependency on local-fs.target
We don't require anything outside of rootfs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-28 17:09:00 +02:00
Alfredo Deza
b2962e5d57 systemd: remove ceph-disk from CMakeLists
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-06-13 15:16:22 -04:00
Alfredo Deza
22c9707cd8 systemd: remove ceph-disk service
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-06-13 15:16:22 -04:00
Greg Farnum
92f8ec5c0e systemd: only restart 3 times in 30 minutes, as fast as possible
Once upon a time, we configured our init systems to only restart an OSD 3 times
in a 30 minute period. This made sure a permanently-slow OSD would stay dead,
and that an OSD which was dying on boot (but only after a long boot process)
would not insist on rejoining the cluster for *too* long.

In 62084375fa, Boris applied these same rules to
systemd in a great bid for init system consistency. Hurray!

Sadly, Loic discovered that the great dragons udev and ceph-disk were
susceptible to races under systemd (that we apparently didn't see with
the other init systems?), and our 3x start limit was preventing the
system from sorting them out. In b3887379d6
he configured the system to allow *30* restarts in 30 minutes, but no more
frequently than every 20 seconds.

So that resolved the race issue, which was far more immediately annoying
than any concern about OSDs sometimes taking too long to die. But I've started
hearing in-person reports about OSDs not failing hard and fast when they go bad,
and I attribute some of those reports to these init system differences.

Happily, we no longer rely on udev and ceph-disk, and ceph-volume shouldn't
be susceptible to the same race, so I think we can just go back to the old way.
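
A minimal sketch of the "3 restarts in 30 minutes" policy described above, assuming it is expressed with systemd's start rate limiting. The fragment is illustrative and not copied from the commit; newer systemd releases document the equivalent settings under `[Unit]` as `StartLimitIntervalSec=` and `StartLimitBurst=`.

```ini
# Illustrative fragment for ceph-osd@.service (other settings omitted)
[Service]
Restart=on-failure
# Allow at most 3 starts within a 30 minute window. Once the limit is hit,
# the unit stays failed until an administrator intervenes.
StartLimitInterval=30min
StartLimitBurst=3
```

With no `RestartSec=` set, systemd falls back to its short default delay, which is the "as fast as possible" part of the subject line.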

Partly-reverts: b3887379d6
Partly-fixes: http://tracker.ceph.com/issues/24368

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2018-05-31 15:55:51 -07:00
Kefu Chai
930f5dd23f cmake: s/sysconf/sysconfig/
it's a regression caused by 638aadf

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 14:49:44 +08:00
Kefu Chai
4865831b91 cmake,deb: set EnvironmentFile using cmake
this change also fixes the EnvironmentFile specified in rbdmap.service.
without this change EnvironmentFile in rbdmap.service is always
/etc/sysconfig/ceph even on debian derived distros. after this change,
this variable is /etc/default/ceph in rbdmap.service shipped by the deb
packages.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 00:23:48 +08:00
Kefu Chai
50707e7d62 debian: install system units using cmake
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 00:23:22 +08:00
Wido den Hollander
c628cd083e
systemd: Wait 10 seconds before restarting ceph-mgr
We do this for the MON and OSD as well: wait a few seconds
before attempting a restart.

On boot in IPv6 networks it might take a few seconds longer
before an IP address is usable, and this does not allow the mgr
to start right away.

Fixes: http://tracker.ceph.com/issues/23083

Signed-off-by: Wido den Hollander <wido@42on.com>
2018-02-22 11:53:41 +01:00
Nathan Cutler
a24afcdcd3 build/ops: rpm: rip out rcceph script
"rcceph" is a SysVinit-style command-line interface for stopping, starting,
enabling, etc. all ceph-osd and ceph-mon systemd units on a machine, in one go.

Since the same functionality is provided by ceph-{osd,mon}.target, the script
is obsolete. It is also unmaintained. Judging from the absence of recent
mentions of the script online, I guess it is no longer used.

Leaving dead code in the tree can cause confusion, especially when the code is
packaged and shipped to customers. Therefore I propose to rip it out.

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-01-15 11:22:07 +01:00
Sébastien Han
e6cd9570ba rbd-mirror: does not start on reboot
The current systemd unit file misses 'PartOf=ceph-rbd-mirror.target',
which results in the unit not starting after reboot.
If you have ceph-rbd-mirror@rbd-mirror.ceph-rbd-mirror0, it won't start
after reboot even if enabled.
Adding 'PartOf=ceph-rbd-mirror.target' will enable
ceph-rbd-mirror.target when ceph-rbd-mirror@rbd-mirror.ceph-rbd-mirror0
gets enabled.
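
For illustration, the relevant part of the unit file could look roughly like this. Only the `PartOf=` line comes from the description above; the `[Install]` section is an assumption about how the unit is tied to its target.

```ini
# Sketch of ceph-rbd-mirror@.service (other settings omitted)
[Unit]
# Stop and restart requests on the target propagate to this unit
PartOf=ceph-rbd-mirror.target

[Install]
# Assumed: enabling an instance hooks it into the target
WantedBy=ceph-rbd-mirror.target
```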

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-26 14:05:37 +02:00
Kefu Chai
30b5b4627c Merge pull request #16494 from asomers/bin_bash
misc: Fix bash path in shebangs

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-27 10:14:14 +08:00
Alfredo Deza
b20fdb8e84 systemd: include the ceph-volume service
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-08-04 10:25:57 -04:00
Alfredo Deza
d6954111dc systemd: create a service file for ceph-volume
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-08-04 10:25:57 -04:00
Alan Somers
3aae5ca6fd scripts: fix bash path in shebangs
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-27 13:24:26 -06:00
Sage Weil
a96ebf344f Merge pull request #15835 from SUSE/wip-flatten-systemd-target-hierarchy-master
systemd: Add explicit Before=ceph.target

Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
2017-07-03 08:34:04 -05:00
Sage Weil
70a990797f Merge pull request #15585 from dachary/wip-20229-ceph-disk-timeout
ceph-disk: set the default systemd unit timeout to 3h

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-01 09:10:59 -05:00
Tim Serong
357dfa5954 systemd: Add explicit Before=ceph.target
The PartOf= and WantedBy= directives in the various systemd
unit files and targets create the following logical hierarchy:

- ceph.target
  - ceph-fuse.target
    - ceph-fuse@.service
  - ceph-mds.target
    - ceph-mds@.service
  - ceph-mgr.target
    - ceph-mgr@.service
  - ceph-mon.target
    - ceph-mon@.service
  - ceph-osd.target
    - ceph-osd@.service
  - ceph-radosgw.target
    - ceph-radosgw@.service
  - ceph-rbd-mirror.target
    - ceph-rbd-mirror@.service

Additionally, the ceph-{fuse,mds,mon,osd,radosgw,rbd-mirror}
targets have WantedBy=multi-user.target.  This gives the
following behaviour:

- `systemctl {start,stop,restart}` of any target will restart
  all dependent services (e.g.: `systemctl restart ceph.target`
  will restart all services; `systemctl restart ceph-mon.target`
  will restart all the mons, and so forth).
- `systemctl {enable,disable}` for the second level targets
  (ceph-mon.target etc.) will cause dependent services to come
  up on boot, or not (of course the individual services can
  be enabled or disabled as well - for a service to start
  on boot, both the service and its target must be enabled;
  disabling either will cause the service to be disabled).
- `systemctl {enable,disable} ceph.target` has no effect on
  whether or not services come up at boot; if the second level
  targets and services are enabled, they'll start regardless of
  whether ceph.target is enabled.  This is due to the second
  level targets all having WantedBy=multi-user.target.
- The OSDs will always start regardless of ceph-osd.target
  (unless they are explicitly masked), thanks to udev magic.

So far, so good.  Except, several users have encountered
services not starting with the following error:

  Failed to start ceph-osd@5.service: Transaction order is
  cyclic. See system logs for details.

I've not been able to reproduce this myself in such a way as to
cause OSDs to fail to start, but I *have* managed to get systemd
into that same confused state, as follows:

- Disable ceph.target, ceph-mon.target, ceph-osd.target,
  ceph-mon@$(hostname).service and all ceph-osd instances.
- Re-enable all of the above.

At this point, everything is fine, but if I then subsequently
disable ceph.target, *then* try `systemctl restart ceph.target`,
I get "Failed to restart ceph.target: Transaction order is cyclic.
See system logs for details."

Explicitly adding Before=ceph.target to each second level target
prevents systemd from becoming confused in this situation.
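
A sketch of what a second-level target might look like with the explicit ordering added. `PartOf=`, `Before=` and `WantedBy=` are the directives discussed above; the rest of the file, including the exact `WantedBy=` list, is assumed for illustration.

```ini
# Sketch of a second-level target such as ceph-mon.target (contents assumed)
[Unit]
PartOf=ceph.target
# Explicit ordering: this target is started before ceph.target
Before=ceph.target

[Install]
WantedBy=multi-user.target ceph.target
```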

Signed-off-by: Tim Serong <tserong@suse.com>
2017-06-30 17:28:29 +10:00
Sage Weil
7a3a979f3c systemd/ceph-mgr: remove automagic mgr creation hack
For kraken we auto-created mgr daemons next to mon daemons with some
systemd hackery.  This is awkward (you can't not get a new mgr daemon when
you deploy a mon), systemd-specific (not implemented for upstart on
trusty), and mostly unexpected.  Since ceph-mgr daemons are now first-class
citizens and required for every cluster, make their deployment explicit
and transparent to the administrator.  Major upgrades are a rare
opportunity to have the administrator's full attention so take advantage
of it.

This effectively reverts 61d779345e and
082199f69d (and follow-on fixes).

Fixes/avoids: http://tracker.ceph.com/issues/19994
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-29 13:39:28 -04:00
Loic Dachary
a9eb52e0a4 ceph-disk: set the default systemd unit timeout to 3h
There needs to be a timeout to prevent ceph-disk from hanging
forever. But there is no good reason to set it to a value that is less
than a few hours.

Each OSD activation needs to happen in sequence and not in parallel,
which is why there is a global activation lock.

It would be possible, when an OSD is using a device that is not
otherwise used by another OSD (i.e. they do not share an SSD journal
device etc.), to run all activations in parallel. It would however
require a more extensive modification of ceph-disk to avoid any chances
of races.

Fixes: http://tracker.ceph.com/issues/20229

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-06-08 22:29:48 +02:00
John Spray
43d26b9147 systemd: update mgr auth caps
Granting it 'allow *' on mon and osd so that
it can use MCommand to remote control daemons.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-03 13:37:52 +08:00
Sage Weil
6625fcd8fd systemd/ceph-mgr@.service: fix mgr mon cap
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Kefu Chai
c52431b390 Merge pull request #13197 from asheplyakov/master-18740
systemd/ceph-disk: make it possible to customize timeout

Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-24 15:53:17 +08:00
David Disseldorp
e58413abf4 rbdmap: unmap RBDMAPFILE images unless called with unmap-all
When called with a "map" parameter, the rbdmap script iterates the list
of images present in RBDMAPFILE (/etc/ceph/rbdmap), and maps each entry.
When called with "unmap", rbdmap currently iterates *all* mapped RBD
images and unmaps each one, regardless of whether it's listed in the
RBDMAPFILE or not.

This commit adds functionality such that only RBD images listed in the
configuration file are unmapped. This behaviour is the new default for
"rbdmap unmap". A new "unmap-all" parameter is added to offer the old
unmap-all-rbd-images behaviour, which is used by the systemd service.

Fixes: http://tracker.ceph.com/issues/18884

Signed-off-by: David Disseldorp <ddiss@suse.de>
2017-02-16 20:31:45 +01:00
Nathan Cutler
fb7dabffa3 Merge pull request #13097 from ceph/wip-osd-after-mon
systemd: Start OSDs after MONs

Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
2017-02-09 00:31:23 +01:00
Alexey Sheplyakov
22332f6bae systemd/ceph-disk: make it possible to customize timeout
When booting a server with 20+ HDDs udev has to process a *lot* of
events (especially if dm-crypt is used), and 2 minutes might not be
enough for that. Make it possible to override the timeout (via systemd
drop-in files), and use a longer timeout (5 minutes) by default.
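
An override via a drop-in file might look like the sketch below. The drop-in path and the `CEPH_DISK_TIMEOUT` variable name are assumptions made for illustration; check the shipped `ceph-disk@.service` for the name it actually reads.

```ini
# Hypothetical drop-in: /etc/systemd/system/ceph-disk@.service.d/timeout.conf
[Service]
# Assumed variable name; value in seconds (here: 10 minutes)
Environment=CEPH_DISK_TIMEOUT=600
```

After adding or changing a drop-in, `systemctl daemon-reload` makes systemd pick it up.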

Fixes: http://tracker.ceph.com/issues/18740
Signed-off-by: Alexey Sheplyakov <asheplyakov@mirantis.com>
2017-02-06 14:17:20 +04:00
Boris Ranto
7f4acf45dd systemd: Start OSDs after MONs
Currently, we start/stop OSDs and MONs simultaneously. This may cause
problems especially when we are shutting down the system. Once the mon
goes down it causes a re-election and the MONs can miss the message
from the OSD that is going down.
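
The ordering could be sketched as follows. Whether the real unit orders itself against `ceph-mon.target` or something else is an assumption here; the fragment only illustrates the mechanism.

```ini
# Sketch of the ordering in ceph-osd@.service (other settings omitted)
[Unit]
# Start OSDs after the mons. systemd reverses start ordering on shutdown,
# so OSDs are stopped before the mons.
After=ceph-mon.target
```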

Resolves: http://tracker.ceph.com/issues/18516

Signed-off-by: Boris Ranto <branto@redhat.com>
2017-01-25 12:39:46 +01:00
Wido den Hollander
e73eb8cc1e
systemd: Restart Mon after 10s in case of failure
In some situations the IP address the Monitor wants to bind to
might not be available yet.

This might for example be an IPv6 address which is still performing
DAD or waiting for a Router Advertisement to be sent by the router(s).

Have systemd wait for 10s before starting the Mon and increase the number
of times it does so to 5.

This allows the system to bring up IP addresses in the meantime while
systemd waits before restarting the Mon.

Fixes: #18635

Signed-off-by: Wido den Hollander <wido@42on.com>
2017-01-23 08:50:08 +01:00
Mark Korenberg
2ccd02a838 Fix startup of Ceph cluster manager daemon on Debian 8
Signed-off-by: Mark Korenberg <socketpair@gmail.com>
2016-12-18 18:07:21 +05:00
John Spray
63ae8579bf Merge pull request #11542 from batrick/systemd-ceph-fuse
systemd: add ceph-fuse service file

Reviewed-by: John Spray <john.spray@redhat.com>
2016-12-14 13:55:33 +00:00
Patrick Donnelly
d32d70b783
systemd: add ceph-fuse service file
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2016-12-01 19:51:37 -05:00
Loic Dachary
b3887379d6 build/ops: restart ceph-osd@.service after 20s instead of 100ms
Instead of the default 100ms pause before trying to restart an OSD, wait
20 seconds instead and retry 30 times instead of 3. There is no scenario
in which restarting an OSD almost immediately after it failed would get
a better result.

It is possible that a failure to start is due to a race with another
systemd unit at boot time. For instance if ceph-disk@.service is
delayed, it may start after the OSD that needs it. A long pause may give
the racing service enough time to complete and the next attempt to start
the OSD may succeed.

This is not a sound way to resolve a race; it only makes the OSD
boot process less sensitive to it. In the example above, the proper fix is to
enable --runtime ceph-osd@.service so that it cannot race at boot time.

The wait delay should not be minutes to preserve the current runtime
behavior. For instance, if an OSD is killed or fails and restarts after
10 minutes, it will be marked down by the ceph cluster.  This is not a
change that could break things but it is significant and should be
avoided.
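
Put together, the policy described in this message (a 20 second pause, up to 30 attempts) could be sketched as below. The 30 minute window is taken from the later commit that partly reverts this one; the fragment is illustrative and not the literal diff.

```ini
# Illustrative fragment for ceph-osd@.service (other settings omitted)
[Service]
Restart=on-failure
# Pause 20 seconds before each restart instead of the default 100ms
RestartSec=20s
# Allow up to 30 starts within 30 minutes before giving up
StartLimitInterval=30min
StartLimitBurst=30
```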

Refs: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-12-01 08:28:20 +01:00
David Disseldorp
8a62cbc074 systemd/ceph-disk: reduce ceph-disk flock contention
"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.

Take an flock on a device specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent osds. This greatly reduces lock contention and consequently
the chance of service timeout. Per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.

Fixes: http://tracker.ceph.com/issues/18049

Signed-off-by: David Disseldorp <ddiss@suse.de>
2016-11-28 17:55:39 +01:00
Loic Dachary
d954de5546 ceph-disk: systemd unit must run after local-fs.target
A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can run systemctl reset-failed at a
later time and it will succeed.

Add a dependency to ceph-disk@.service so that it waits until the local
file systems are mounted:

After=local-fs.target

Since local-fs.target depends on lvm, it will wait until the lvm
partition (as well as any dm devices) is ready and mounted before
attempting to activate the OSD. It may still fail because the
corresponding journal/data partition is not ready yet (which is
expected) but it will no longer fail because the lvm/filesystems/dm are
not ready.

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-22 15:23:47 +01:00
Owen Synge
639385a7f4 systemd/CMakeLists.txt:Remove ceph-create-keys cmake
ceph-create-keys should not be started on boot of mons with systemd so should
not exist as 'After' or 'Wants' for the ceph-mon.service

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:44 +01:00
Owen Synge
dc5fe8d415 systemd/ceph-mon@.service:Remove ceph-create-keys for mon in systemd
ceph-create-keys should not be started on boot of mons with systemd so should
not exist as 'After' or 'Wants' for the ceph-mon.service

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:26 +01:00
Owen Synge
8bcb4646b6 systemd/ceph-create-keys@.service:Remove ceph-create-keys for systemd
ceph-create-keys should not be started on boot of mons with systemd so should
not exist in the systemd files

Signed-off-by: Owen Synge <osynge@suse.com>
2016-11-04 23:05:17 +01:00
Tim Serong
082199f69d systemd: autogenerate ceph-mgr key during daemon startup
This is a hack to inject a key for the mgr daemon, using whatever
key already exists on the mon on this node to gain sufficient
permissions to create the mgr key.  Failure is ignored at every
step (the '-' prefix) in case someone has already used some other
trick to set everything up manually.

Signed-off-by: Tim Serong <tserong@suse.com>
2016-09-29 17:27:08 +01:00