Mirror of https://github.com/ceph/ceph (synced 2025-01-02 09:02:34 +00:00)
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug 11199.

1) osd 4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) Interval change happens
3) osd 0 now finds itself backfilling to 4 (lb=foo) and osd.5 (lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both 4 and 5 to scan forward, so 4 has an interval starting at foo, 5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the last_backfill_started, but 4's interval starts after that, so foo remains in osd 4's interval (this is the bug)
7) We serve a copyfrom on foo (sent to 4 as well).
8) We eventually get to foo in the backfilling. Normally, they would have the same version, but of course we don't update osd.4's interval from the log since it should not have received writes in that interval. Thus, we end up trying to recover foo on osd.4 anyway.
9) But, an interval change happens between removing foo from osd.4 and completing the recovery, leaving osd.4 without foo, but with lb >= foo

Fixes: #11199
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
parent 4dbb9c872e
commit 1388d6bd94
@@ -11192,7 +11192,8 @@ int ReplicatedPG::recover_backfill(
   for (set<pg_shard_t>::iterator i = backfill_targets.begin();
        i != backfill_targets.end();
        ++i) {
-    peer_backfill_info[*i].trim_to(last_backfill_started);
+    peer_backfill_info[*i].trim_to(
+      MAX(peer_info[*i].last_backfill, last_backfill_started));
   }
   backfill_info.trim_to(last_backfill_started);
 
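The effect of the change can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the Ceph implementation: `Interval` is a hypothetical stand-in for the per-peer backfill interval, object names are replaced by integers so that ordering is obvious (bar = 10 sorts before foo = 20), and `trim_to` is assumed to drop every entry less than or equal to its bound. The helper names `scan_from`, `survives_old_trim` and `survives_new_trim` are invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical stand-in for a peer's backfill interval (not the Ceph type).
// Keys are object positions in sort order, values are object versions.
struct Interval {
  std::map<int, int> objects;

  // Assumed semantics: drop every entry <= bound.
  void trim_to(int bound) {
    while (!objects.empty() && objects.begin()->first <= bound)
      objects.erase(objects.begin());
  }

  bool contains(int obj) const { return objects.count(obj) != 0; }
};

// Model of a peer scanning forward from its own last_backfill:
// the returned interval starts at (and includes) that object.
Interval scan_from(int peer_last_backfill) {
  Interval iv;
  iv.objects = {{peer_last_backfill, 1},
                {peer_last_backfill + 5, 1},
                {peer_last_backfill + 10, 1}};
  return iv;
}

// Old behaviour: trim only to last_backfill_started.
bool survives_old_trim(int peer_last_backfill, int last_backfill_started) {
  Interval iv = scan_from(peer_last_backfill);
  iv.trim_to(last_backfill_started);
  return iv.contains(peer_last_backfill);
}

// New behaviour: trim to the larger of the peer's last_backfill and
// last_backfill_started, as in the patched loop.
bool survives_new_trim(int peer_last_backfill, int last_backfill_started) {
  Interval iv = scan_from(peer_last_backfill);
  iv.trim_to(std::max(peer_last_backfill, last_backfill_started));
  return iv.contains(peer_last_backfill);
}
```

With foo = 20 as osd.4's last_backfill and bar = 10 as last_backfill_started, `survives_old_trim(20, 10)` reports that foo is still in osd.4's interval (step 5 above), while `survives_new_trim(20, 10)` reports that it has been trimmed. For osd.5, whose last_backfill equals last_backfill_started, both variants trim bar.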