Commit Graph

15 Commits

Author SHA1 Message Date
Wong Hoi Sing Edison
85bc551b17
systemd: remove ProtectClock=true for ceph-osd@.service
Ceph 16.2.0 Pacific, via https://github.com/ceph/ceph/commit/9a84d5a, introduces the following new systemd restrictions:

    ProtectClock=true
    ProtectHostname=true
    ProtectKernelLogs=true
    RestrictSUIDSGID=true

However, `ceph-osd@.service` fails unexpectedly with `ProtectClock=true`; see also:

  - <https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TNBGGNN6STGDKARAQTQCIPTU4KLIVJQV/>
  - <https://serverfault.com/questions/1059317/bluestore-var-lib-ceph-osd-ceph-2-block-read-bdev-label-failed-to-open-var-l>

This PR introduces:

  - Remove `ProtectClock=true` from our systemd service templates
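A host that still runs a unit shipping `ProtectClock=true` can get the same effect locally with a systemd drop-in override. The following is a minimal sketch. It assumes the standard drop-in directory for the template, `/etc/systemd/system/ceph-osd@.service.d/`, and uses a hypothetical file name, `override.conf`:

```ini
# /etc/systemd/system/ceph-osd@.service.d/override.conf (hypothetical file name)
# A drop-in only overrides the settings it lists; every other setting in
# ceph-osd@.service stays as shipped.
[Service]
# Lift the clock restriction that makes ceph-osd fail on affected systems.
ProtectClock=false
```

After creating the file, run `systemctl daemon-reload` and restart the affected `ceph-osd@<id>` instances to pick up the change.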

Fixes: https://tracker.ceph.com/issues/50347
Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>
2021-04-14 22:19:49 +08:00
Wong Hoi Sing Edison
d88c834ea4
systemd: Support Graceful Reboot for AIO Node
Ceph AIO (all-in-one) installations, whether single- or multiple-node,
do not work well with loopback mounts; in particular, they consistently
deadlock during a graceful system reboot.

We already have an `rbdmap.service` that handles graceful system reboot
correctly:

    [Unit]
    After=network-online.target
    Before=remote-fs-pre.target
    Wants=network-online.target remote-fs-pre.target

    [Service]
    ExecStart=/usr/bin/rbdmap map
    ExecReload=/usr/bin/rbdmap map
    ExecStop=/usr/bin/rbdmap unmap-all

This PR introduces:

  - `ceph-mon.target`: Ensure startup after `network-online.target` and
    before `remote-fs-pre.target`
  - `ceph-*.target`: Ensure startup after `ceph-mon.target` and before
    `remote-fs-pre.target`
  - `rbdmap.service`: Once all `_netdev` mounts have been unmounted by
    `remote-fs.target`, ensure all RBD devices are unmapped BEFORE any
    Ceph component under `ceph.target` is stopped during shutdown
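The ordering rules above can be sketched for one non-monitor target, using `ceph-osd.target` as the example. This is an illustrative reconstruction from the bullet points, not the shipped file. It relies on the fact that systemd stops units in the reverse of their start order:

```ini
# Illustrative [Unit] ordering for ceph-osd.target (sketch, not the shipped file).
[Unit]
# Start only after the monitors' target has been reached.
After=ceph-mon.target
# Start before remote filesystems (e.g. _netdev mounts) are mounted.
# Because shutdown reverses this order, the target is stopped only after
# those filesystems have been unmounted.
Before=remote-fs-pre.target
# remote-fs-pre.target is passive, so a unit that orders itself before
# it also has to pull it in.
Wants=remote-fs-pre.target
```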

This logic has been proven in
<https://github.com/alvistack/ansible-role-ceph_common/tree/develop>,
and it also works as expected for a Ceph + Kubernetes deployment with
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlocks occur during graceful system reboot, for both
single- and multiple-node AIO setups with loopback mounts.

Also see:

  - <https://github.com/ceph/ceph/pull/36776>
  - <https://github.com/etcd-io/etcd/pull/12259>
  - <https://github.com/cri-o/cri-o/pull/4128>
  - <https://github.com/kubernetes/release/pull/1504>

Fixes: https://tracker.ceph.com/issues/47528
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
2020-09-18 11:02:26 +08:00
Jan Fajerski
bd8b8540f6
systemd/ceph-osd: ceph-osd-prestart.sh now lives in /usr/libexec
Fixes: https://tracker.ceph.com/issues/45984
Fixes: ed6552d506

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2020-06-12 14:59:07 +02:00
Patrick Donnelly
9a84d5a09b
systemd: lock down more privileges
Including:

        ProtectClock=true
        ProtectHostname=true
        ProtectKernelLogs=true
        RestrictSUIDSGID=true

Also, alphabetize [Service] settings.

Finally, add to systemd/ceph-immutable-object-cache@.service.in some
protections that were present in our other service files but missing
from this one.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-05-09 14:53:05 -07:00
Wido den Hollander
8246ca02ea
systemd: Wait 5 seconds before attempting a restart of an OSD
In commit 92f8ec the RestartSec parameter was removed, which causes
systemd to restart a failed OSD immediately.

After a reboot, while the network is still coming online, this can
cause problems.

Although network-online.target should guarantee that the network
is online, it does not guarantee that DNS resolution works.

If mon_host points to a DNS entry, that entry may not be resolvable
yet, and the OSDs then fail to start on boot.
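The restart delay can be sketched as a `[Service]` fragment. The 5-second value follows the commit title. `Restart=on-failure` is included only as an assumed companion setting, and the shipped unit may contain more:

```ini
# Sketch of the restart delay described above (not the full shipped unit).
[Service]
# Assumed policy: restart the OSD only when it exits with an error.
Restart=on-failure
# Wait 5 seconds before each restart attempt, giving DNS time to become
# usable after boot.
RestartSec=5s
```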

Fixes: https://tracker.ceph.com/issues/42761

Signed-off-by: Wido den Hollander <wido@42on.com>
2019-11-12 10:21:45 +01:00
Patrick Donnelly
517670926a
systemd: lock down privileges more
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-02-07 08:45:00 -08:00
Greg Farnum
92f8ec5c0e
systemd: only restart 3 times in 30 minutes, as fast as possible
Once upon a time, we configured our init systems to only restart an OSD 3 times
in a 30 minute period. This made sure a permanently-slow OSD would stay dead,
and that an OSD which was dying on boot (but only after a long boot process)
would not insist on rejoining the cluster for *too* long.

In 62084375fa, Boris applied these same rules to
systemd in a great bid for init system consistency. Hurray!

Sadly, Loic discovered that the great dragons udev and ceph-disk were
susceptible to races under systemd (that we apparently didn't see with
the other init systems?), and our 3x start limit was preventing the
system from sorting them out. In b3887379d6
he configured the system to allow *30* restarts in 30 minutes, but no more
frequently than every 20 seconds.

So that resolved the race issue, which was far more immediately annoying
than any concern about OSDs sometimes taking too long to die. But I've started
hearing in-person reports about OSDs not failing hard and fast when they go bad,
and I attribute some of those reports to these init system differences.

Happily, we no longer rely on udev and ceph-disk, and ceph-volume shouldn't
be susceptible to the same race, so I think we can just go back to the old way.
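The "3 restarts in 30 minutes, as fast as possible" policy can be sketched with systemd's start rate limit. This is a sketch under stated assumptions. It uses the current `[Unit]` spelling (`StartLimitIntervalSec=`), while older unit files used `StartLimitInterval=` under `[Service]`. The `Restart=on-failure` line is also assumed:

```ini
# Sketch of the start-limit policy (not copied from the shipped unit).
[Unit]
# If the unit is started more than 3 times within 30 minutes, systemd
# refuses further starts and the OSD stays failed.
StartLimitIntervalSec=30min
StartLimitBurst=3

[Service]
Restart=on-failure
# No RestartSec= here, so systemd's short default delay applies and the
# restarts happen "as fast as possible".
```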

Partly-reverts: b3887379d6
Partly-fixes: http://tracker.ceph.com/issues/24368

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2018-05-31 15:55:51 -07:00
Kefu Chai
4865831b91
cmake,deb: set EnvironmentFile using cmake
This change also fixes the EnvironmentFile specified in rbdmap.service.
Without this change, EnvironmentFile in rbdmap.service is always
/etc/sysconfig/ceph, even on Debian-derived distros. After this change,
it is /etc/default/ceph in the rbdmap.service shipped by the deb
packages.
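A templated unit line of this kind might look as follows. The placeholder name `@SYSTEMD_ENV_FILE@` is an assumption for illustration. The leading `-` is standard systemd syntax, meaning a missing file is not an error:

```ini
# Sketch of a CMake-templated line in rbdmap.service.in
# (@SYSTEMD_ENV_FILE@ is an assumed placeholder name).
[Service]
# CMake substitutes the path at build time: /etc/default/ceph for the deb
# packages, /etc/sysconfig/ceph otherwise. The "-" prefix tells systemd
# to continue if the file does not exist.
EnvironmentFile=-@SYSTEMD_ENV_FILE@
```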

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 00:23:48 +08:00
Sage Weil
367c794cb1
systemd: no need to preprocess ceph-osd@service
This used to be necessary but now is not.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-14 14:00:26 -04:00
Sage Weil
8453a89cb2
systemd: set nofile limit in unit files
Make it big so hopefully nobody has to change it.
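Setting the limit in a unit file uses systemd's `LimitNOFILE=` directive. In this minimal sketch, the number is an illustrative "big" value and is not necessarily the one Ceph ships:

```ini
[Service]
# Per-process open-file limit for the daemon started by this unit
# (illustrative value).
LimitNOFILE=1048576
```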

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-14 14:00:26 -04:00
Sage Weil
8f3185bade
systemd: use --setuser and --setgroup for all daemons
Allow all daemons to drop privileges themselves, instead of letting
systemd do it.

Among other things, this means that admins can conditionally skip
dropping privileges by setting

  setuser match path = /var/lib/ceph/$type/$cluster-$id

in their ceph.conf to ease the pain of upgrade.
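With this approach the unit itself carries no `User=`/`Group=` settings, and the daemon is asked to drop privileges on its own. The following is a sketch of such an `ExecStart=` line. It assumes `%i` is the OSD id and `${CLUSTER}` comes from an environment file. If `setuser match path` is set and the data path is not owned by `ceph`, the daemon keeps running as root:

```ini
[Service]
# The daemon starts as root and switches to ceph:ceph itself.
# %i is the template instance (the OSD id); ${CLUSTER} is assumed to be
# defined by an EnvironmentFile.
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
```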

Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
2015-08-26 20:34:15 -04:00
Sage Weil
c7ee798a0f
set nofile ulimit in /etc/security/limits.d/ceph only
Specify the nofile ulimit in one standard place, where everyone expects it
to be. Drop it from the ceph-osd unit file.

Leave upstart and sysvinit untouched for the time being to avoid compat
issues.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:15 -04:00
Sage Weil
7c9fdf44f2
systemd: make ceph-osd setuid/gid to ceph:ceph
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-26 20:34:15 -04:00
Nathan Cutler
05424a803b
logrotate.conf: fixes for systemd
Before this patch, the command 'logrotate -f /etc/logrotate.d/ceph'
was generating an error "Failed to reload ceph.target: Job type reload is not
applicable for unit ceph.target".

Before we issue systemctl reload, check that there is at least
one active ceph-* service. (The hyphen is significant.)

Since we use grep, make the grep package a dependency.

Fixes: #12173 (http://tracker.ceph.com/issues/12173)

Signed-off-by: Tim Serong <tserong@suse.com>
Signed-off-by: Lars Marowsky-Bree <lmb@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2015-06-26 19:43:44 +02:00
Owen Synge
ac347dc340
Template systemd/ceph-osd@.service with autotools,
The systemd libexec path differs between distributions. This path is
defined by a new configure variable.

This variable can be set with the environment variable SYSTEMD_LIBEXEC_DIR.
The parameter --with-systemd-libexec-dir overrides the environment
variable.

Appropriate conditionals are set for SUSE and RHEL derivatives.

This is then used to template out systemd/ceph-osd@.service
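The templated line in `ceph-osd@.service.in` could look like the following sketch. `@systemd_libexec_dir@` is a hypothetical placeholder name, and the script arguments are assumptions:

```ini
[Service]
# configure replaces the hypothetical @systemd_libexec_dir@ placeholder
# with the value of --with-systemd-libexec-dir (or SYSTEMD_LIBEXEC_DIR)
# when it generates ceph-osd@.service.
ExecStartPre=@systemd_libexec_dir@/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
```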

Signed-off-by: Owen Synge <osynge@suse.com>
2015-05-26 19:04:22 +02:00