RepoMirrors/ceph - ceph

Commit Graph


Author

SHA1

Message

Date

Wong Hoi Sing Edison

d88c834ea4

systemd: Support Graceful Reboot for AIO Node

A Ceph AIO installation, whether on a single node or on multiple nodes,
does not work well with loopback mounts, and in particular always
deadlocks during graceful system reboot.

`rbdmap.service` is already friendly to graceful system reboot, as shown
below:

    [Unit]
    After=network-online.target
    Before=remote-fs-pre.target
    Wants=network-online.target remote-fs-pre.target

    [Service]
    ExecStart=/usr/bin/rbdmap map
    ExecReload=/usr/bin/rbdmap map
    ExecStop=/usr/bin/rbdmap unmap-all

This PR introduces:

  - `ceph-mon.target`: Ensure startup after `network-online.target` and
    before `remote-fs-pre.target`
  - `ceph-*.target`: Ensure startup after `ceph-mon.target` and before
    `remote-fs-pre.target`
  - `rbdmap.service`: Once all `_netdev` mounts have been unmounted by
    `remote-fs.target`, ensure all RBD images are unmapped BEFORE any
    Ceph components under `ceph.target` are stopped during shutdown
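
The ordering described above can be sketched as a small dependency graph. This is only an illustrative model, not part of the PR: it relies on the documented systemd rule that units are stopped in the reverse of their start-up order, uses `ceph-osd.target` and `ceph-mds.target` as stand-ins for `ceph-*.target`, and assumes that `rbdmap.service` is ordered after the Ceph targets (inferred from the shutdown behaviour stated in the list, not confirmed by the unit files).

```python
from graphlib import TopologicalSorter

# Each key starts after every unit in its set (the key has After=<unit>).
AFTER = {
    "ceph-mon.target": {"network-online.target"},
    "ceph-osd.target": {"ceph-mon.target"},
    "ceph-mds.target": {"ceph-mon.target"},
    # Assumption: rbdmap.service is ordered after the Ceph targets,
    # so that it is stopped before them on shutdown.
    "rbdmap.service": {
        "network-online.target",
        "ceph-osd.target",
        "ceph-mds.target",
    },
    "remote-fs-pre.target": {
        "ceph-mon.target",
        "ceph-osd.target",
        "ceph-mds.target",
        "rbdmap.service",
    },
}


def startup_order(after):
    """Return one valid start-up order (predecessors come first)."""
    return list(TopologicalSorter(after).static_order())


def shutdown_order(after):
    """Return the stop order: the reverse of the start-up order."""
    return list(reversed(startup_order(after)))


if __name__ == "__main__":
    print("start:", " -> ".join(startup_order(AFTER)))
    print("stop: ", " -> ".join(shutdown_order(AFTER)))
```

In this model `rbdmap.service` always appears before the Ceph targets in the stop order, which corresponds to unmapping RBD images while the cluster daemons are still running.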

The logic is proven in concept by
<https://github.com/alvistack/ansible-role-ceph_common/tree/develop>;
it also works as expected with a Ceph + Kubernetes deployment by
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlocks occur during graceful system reboot, for both
single-node and multiple-node AIO setups with loopback mounts.

Also see:

  - <https://github.com/ceph/ceph/pull/36776>
  - <https://github.com/etcd-io/etcd/pull/12259>
  - <https://github.com/cri-o/cri-o/pull/4128>
  - <https://github.com/kubernetes/release/pull/1504>

Fixes: https://tracker.ceph.com/issues/47528
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>

2020-09-18 11:02:26 +08:00

Tim Serong

357dfa5954

systemd: Add explicit Before=ceph.target

The PartOf= and WantedBy= directives in the various systemd
unit files and targets create the following logical hierarchy:

- ceph.target
  - ceph-fuse.target
    - ceph-fuse@.service
  - ceph-mds.target
    - ceph-mds@.service
  - ceph-mgr.target
    - ceph-mgr@.service
  - ceph-mon.target
    - ceph-mon@.service
  - ceph-osd.target
    - ceph-osd@.service
  - ceph-radosgw.target
    - ceph-radosgw@.service
  - ceph-rbd-mirror.target
    - ceph-rbd-mirror@.service

Additionally, the ceph-{fuse,mds,mon,osd,radosgw,rbd-mirror}
targets have WantedBy=multi-user.target.  This gives the
following behaviour:

- `systemctl {start,stop,restart}` of any target will start, stop or
  restart all dependent services (e.g.: `systemctl restart ceph.target`
  will restart all services; `systemctl restart ceph-mon.target`
  will restart all the mons, and so forth).
- `systemctl {enable,disable}` for the second level targets
  (ceph-mon.target etc.) will cause dependent services to come
  up on boot, or not (of course the individual services can
  be enabled or disabled as well - for a service to start
  on boot, both the service and its target must be enabled;
  disabling either will cause the service to be disabled).
- `systemctl {enable,disable} ceph.target` has no effect on
  whether or not services come up at boot; if the second level
  targets and services are enabled, they'll start regardless of
  whether ceph.target is enabled.  This is due to the second
  level targets all having WantedBy=multi-user.target.
- The OSDs will always start regardless of ceph-osd.target
  (unless they are explicitly masked), thanks to udev magic.
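
The restart propagation in the first bullet can be illustrated with a minimal sketch of the hierarchy listed earlier. The names `HIERARCHY` and `restarted_with` are hypothetical helpers for this example; the model only walks the parent/child tree and does not reproduce systemd's actual job logic.

```python
DAEMONS = ["fuse", "mds", "mgr", "mon", "osd", "radosgw", "rbd-mirror"]

# Parent -> direct children, as given by the hierarchy in the commit message.
HIERARCHY = {"ceph.target": [f"ceph-{d}.target" for d in DAEMONS]}
for daemon in DAEMONS:
    HIERARCHY[f"ceph-{daemon}.target"] = [f"ceph-{daemon}@.service"]


def restarted_with(unit, hierarchy=HIERARCHY):
    """Return every unit below `unit` in the hierarchy, depth first."""
    result = []
    for child in hierarchy.get(unit, []):
        result.append(child)
        result.extend(restarted_with(child, hierarchy))
    return result


if __name__ == "__main__":
    # Restarting a second-level target only reaches its own service template.
    print(restarted_with("ceph-mon.target"))
    # Restarting ceph.target reaches every target and service below it.
    print(len(restarted_with("ceph.target")), "units below ceph.target")
```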

So far, so good.  Except, several users have encountered
services not starting with the following error:

  Failed to start ceph-osd@5.service: Transaction order is
  cyclic. See system logs for details.

I've not been able to reproduce this myself in such a way as to
cause OSDs to fail to start, but I *have* managed to get systemd
into that same confused state, as follows:

- Disable ceph.target, ceph-mon.target, ceph-osd.target,
  ceph-mon@$(hostname).service and all ceph-osd instances.
- Re-enable all of the above.

At this point, everything is fine, but if I then subsequently
disable ceph.target, *then* try `systemctl restart ceph.target`,
I get "Failed to restart ceph.target: Transaction order is cyclic.
See system logs for details."

Explicitly adding Before=ceph.target to each second level target
prevents systemd from becoming confused in this situation.
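
As a rough illustration of what "Transaction order is cyclic" means, the sketch below treats ordering directives as a graph and checks it for cycles. The first graph encodes the explicit `Before=ceph.target` on each second-level target. The second adds a purely hypothetical contradictory edge; the commit message does not identify the real cycle, so this is not a reconstruction of the bug. `has_ordering_cycle` is a helper name invented for this example.

```python
from graphlib import CycleError, TopologicalSorter

SECOND_LEVEL = [
    f"ceph-{d}.target"
    for d in ("fuse", "mds", "mgr", "mon", "osd", "radosgw", "rbd-mirror")
]


def has_ordering_cycle(after):
    """Return True if the 'key starts after values' graph contains a cycle."""
    try:
        TopologicalSorter(after).prepare()
    except CycleError:
        return True
    return False


# Before=ceph.target on every second-level target means that
# ceph.target is ordered after each of them.
explicit = {"ceph.target": set(SECOND_LEVEL)}

# Hypothetical contradiction: one target is also ordered after ceph.target.
contradictory = {
    "ceph.target": set(SECOND_LEVEL),
    "ceph-osd.target": {"ceph.target"},
}

if __name__ == "__main__":
    print("explicit ordering cyclic:", has_ordering_cycle(explicit))
    print("contradictory ordering cyclic:", has_ordering_cycle(contradictory))
```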

Signed-off-by: Tim Serong <tserong@suse.com>

2017-06-30 17:28:29 +10:00

Jason Dillaman

8a0e47281f

systemd: new ceph-rbd-mirror scripts

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

2016-03-18 17:51:23 -04:00

3 Commits