mon/OSDMonitor: share new maps with even non-active osds

OSDs may not be aware that they have been marked down and can get
stuck at an obsolete map in which they are still marked up:

```
host        osd     down_at     stuck_at
ceph-03     9       e712        e711
ceph-03     13      e700        e699
ceph-03     28      e697        e696
ceph-03     48      e697        e696
ceph-03     52      e707        e704
ceph-03     61      e710        e708
ceph-03     73      e712        e710
ceph-03     77      e708        e707

ceph-05     12      e711        e710
ceph-05     21      e703        e702
ceph-05     24      e700        e699
ceph-05     29      e703        e699
ceph-05     41      e711        e710
ceph-05     53      e711        e710
ceph-05     72      e712        e711

```

Since https://github.com/ceph/ceph/pull/23958 an OSD pings the monitor
periodically if it is stuck at __wait_for_healthy__. But in the
above case the OSDs still consider themselves __active__ and
hence are not covered by that fix.

Since these OSDs may still be able to contact the monitors
(otherwise there would be no way for them to be marked up again) and
keep sending beacons, we can simply get them out of the trap by
sharing some new maps with them.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: runsisi <runsisi@zte.com.cn>
xie xingguo 2018-09-10 15:15:17 +08:00
parent 5a3344f0e5
commit 79f480442f

@@ -3572,6 +3572,11 @@ bool OSDMonitor::prepare_beacon(MonOpRequestRef op)
   if (!src.is_osd() ||
       !osdmap.is_up(from) ||
       beacon->get_orig_source_addrs() != osdmap.get_addrs(from)) {
+    if (src.is_osd() && !osdmap.is_up(from)) {
+      // share some new maps with this guy in case it may not be
+      // aware of its own deadness...
+      send_latest(op, beacon->version+1);
+    }
     dout(1) << " ignoring beacon from non-active osd." << from << dendl;
     return false;
   }
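
The decision made by this hunk can be modeled in isolation. The following is only a minimal sketch, not Ceph code: `Beacon`, `BeaconDecision` and `handle_beacon` are hypothetical stand-ins, the up/down state is reduced to a plain map, and it assumes the beacon's `version` field carries the newest map epoch the OSD has. It shows that a beacon from an OSD marked down is still ignored, but now also yields the first epoch to share back.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-ins for illustration; these are not Ceph's real types.
using epoch_t = std::uint32_t;

struct Beacon {
  bool from_osd;     // sender is an OSD entity
  int osd_id;        // sender's OSD id
  bool addrs_match;  // sender address equals the address recorded in the map
  epoch_t version;   // assumed: newest map epoch the sender has
};

struct BeaconDecision {
  bool accepted;                      // beacon treated as coming from an active OSD
  std::optional<epoch_t> share_from;  // first epoch to send back, if any
};

// up_state maps an OSD id to whether the monitor's current map marks it up.
BeaconDecision handle_beacon(const std::map<int, bool>& up_state,
                             const Beacon& b) {
  const auto it = up_state.find(b.osd_id);
  const bool is_up = (it != up_state.end()) && it->second;

  if (!b.from_osd || !is_up || !b.addrs_match) {
    BeaconDecision d{false, std::nullopt};
    if (b.from_osd && !is_up) {
      // The OSD may not know it was marked down: share newer maps,
      // starting right after the epoch it reported.
      d.share_from = b.version + 1;
    }
    return d;  // the beacon itself is still ignored
  }
  return {true, std::nullopt};
}
```

With osd 9 from the table above (stuck at e711, marked down at e712), a beacon reporting version 711 would be rejected with `share_from` set to 712, which is the epoch in which the OSD would learn it was marked down.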