This bug was about filtering missing and divergent objects when doing a
partial PG import. We don't support partial PG imports any more, so this
can go away!
Signed-off-by: Sage Weil <sage@redhat.com>
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.
- In mimic, as of 2347ecb961, we brought the
PG up to date while updating past_intervals. (At the same time we removed
the OSD's parallel past_intervals regeneration.)
The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges. Splits are
somewhat easier (until now we have allowed partial import of a PG into a
split child), but merges are not so easy.
This patch changes it so we import the PG and leave the pg_epoch matching
the import file. The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.
We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.
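A minimal, self-contained sketch of the shape of that safety check. This is
illustrative only: PG ids are modeled here as bit-prefixes (splitting appends
bits, merging drops them), whereas the real tool works with spg_t and the
pg_num in the live OSDMap; collides() and the prefix model are inventions for
the example.
```cpp
#include <iostream>
#include <string>
#include <vector>

// True if importing `candidate` would collide with an existing PG:
// either an existing PG descends from it (a child we eventually split
// into) or contains it (a parent we eventually merge into).
bool collides(const std::string& candidate,
              const std::vector<std::string>& existing) {
  for (const auto& pg : existing) {
    if (pg.rfind(candidate, 0) == 0 ||   // pg is candidate or its child
        candidate.rfind(pg, 0) == 0) {   // pg is candidate's ancestor
      return true;
    }
  }
  return false;
}

int main() {
  std::vector<std::string> existing = {"010", "0111"};
  std::cout << collides("01", existing) << "\n";    // 1: "010" descends from it
  std::cout << collides("0110", existing) << "\n";  // 0: unrelated
  std::cout << collides("011", existing) << "\n";   // 1: "0111" descends from it
}
```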
Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
In the auth_none case, we were exiting the get_monmap_and_config() loop
early, before we got a monmap, because the default-constructed monmap
did not have the mimic feature. Make sure we wait for both the monmap
and config.
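A rough sketch of the fixed wait condition. MonSession and its flags are
hypothetical stand-ins for MonClient state; the real logic lives in
MonClient::get_monmap_and_config().
```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

struct MonSession {
  std::mutex lock;
  std::condition_variable cond;
  bool got_monmap = false;  // set when a real monmap arrives
  bool got_config = false;  // set when the config payload arrives

  // Before the fix, an auth_none session could leave the wait as soon as
  // any monmap (even a default-constructed one) was present; now we
  // require both a monmap and the config before returning.
  bool wait_for_monmap_and_config(std::chrono::seconds timeout) {
    std::unique_lock l(lock);
    return cond.wait_for(l, timeout,
                         [this] { return got_monmap && got_config; });
  }
};

int main() {
  MonSession s;
  std::thread t([&] {
    std::lock_guard l(s.lock);
    s.got_monmap = s.got_config = true;
    s.cond.notify_all();
  });
  bool ok = s.wait_for_monmap_and_config(std::chrono::seconds(5));
  t.join();
  return ok ? 0 : 1;
}
```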
Signed-off-by: Sage Weil <sage@redhat.com>
Remove the "ceph_assert" statements and instead bubble any potential
error code up to the caller. The object map state machines should
attempt to return a 0 upon failure unless it was unable to flag the
object map as invalid.
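A sketch of the resulting error-handling shape, with invented names
(ObjectMapUpdater, handle_update_complete(), invalidate()); the real state
machines are in librbd's object map code.
```cpp
#include <cerrno>
#include <iostream>

// Invented stand-in for an object map update state machine.
struct ObjectMapUpdater {
  // Flag the on-disk object map as invalid; returns false on failure.
  bool invalidate() { return true; }

  // Before: ceph_assert(r == 0) aborted the process on any error.
  // After: propagate the error, but swallow it (return 0) once the
  // object map has been successfully flagged as invalid.
  int handle_update_complete(int r) {
    if (r < 0) {
      std::cerr << "object map update failed: " << r << "\n";
      if (invalidate()) {
        return 0;  // degraded but safe: the map is marked invalid
      }
      return r;    // could not even invalidate; bubble the error up
    }
    return 0;
  }
};

int main() {
  ObjectMapUpdater u;
  std::cout << u.handle_update_complete(-EIO) << "\n";  // 0: invalidated
}
```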
Fixes: http://tracker.ceph.com/issues/36074
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This was similar to (but different from) the logic in PG::merge_from(). Do not
do any initialization here, and instead rely on merge_from() to do the
right thing.
Signed-off-by: Sage Weil <sage@redhat.com>
Having an accurate(ish) same_interval_since is important for making sure
any subsequent PastIntervals we add are consistent with the
last_epoch_clean value that the bounds are tested against. Otherwise we
might have lec 100 and merge in 150; an interval change gives us a pi of
[150,something) and we fail the bounds check.
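A toy illustration of the bounds check with the numbers above; bounds_ok() is
a made-up distillation of the real past-interval bounds test.
```cpp
#include <iostream>
#include <utility>

// Made-up distillation of the real bounds test: the recorded past
// intervals must reach back at least to last_epoch_clean.
bool bounds_ok(unsigned last_epoch_clean,
               std::pair<unsigned, unsigned> pi_bounds) {
  return pi_bounds.first <= last_epoch_clean;
}

int main() {
  // lec = 100, but the merged-in interval only starts at epoch 150:
  // epochs [100, 150) are unaccounted for, so the check fails.
  std::cout << bounds_ok(100, {150, 160}) << "\n";  // 0
  // With same_interval_since carried over, lec lines up with the bounds.
  std::cout << bounds_ok(150, {150, 160}) << "\n";  // 1
}
```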
Signed-off-by: Sage Weil <sage@redhat.com>
If we move into the premerge period (pg_num_pending < pg_num), that is a
new interval. Moving out again (canceling the merge) is also a new
interval.
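A simplified sketch of that interval-change test; PoolState and
is_new_interval() are stand-ins for the per-pool OSDMap fields and the real
check.
```cpp
#include <iostream>

// Simplified stand-ins for the per-pool values consulted when deciding
// whether two consecutive OSDMap epochs start a new interval.
struct PoolState {
  unsigned pg_num;
  unsigned pg_num_pending;
};

// A premerge period is in effect while pg_num_pending < pg_num; crossing
// that boundary in either direction is an interval change.
bool is_new_interval(const PoolState& oldp, const PoolState& newp) {
  return (oldp.pg_num_pending < oldp.pg_num) !=
         (newp.pg_num_pending < newp.pg_num);
}

int main() {
  PoolState steady{16, 16}, premerge{16, 8};
  std::cout << is_new_interval(steady, premerge) << "\n";  // 1: entering
  std::cout << is_new_interval(premerge, steady) << "\n";  // 1: canceling
  std::cout << is_new_interval(steady, steady) << "\n";    // 0: no change
}
```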
Signed-off-by: Sage Weil <sage@redhat.com>
If the PG is undersized, cancel the PG merge attempt early. Being
undersized is bad because it makes the merge more dangerous.
It's also bad because the PG won't be fully clean when it finishes
peering, which means last_epoch_clean can be something far in the past,
and past_intervals won't be empty. Since we also take the past_intervals
from the source PG, we want to be confident that it is valid. It *should*
match up with the target PG since they should have mapped to the same
OSDs since they were both clean at the ReadyToMerge point--in fact, they
should both be empty. If a PG mapping change snuck in such that they did
map somewhere else, though, the same set of mapping changes will have
applied to both the source and target, so it should be safe.
(It would be better if the mon rejected the ReadyToMerge when the
mapping with the latest OSDMap has changed since the message was sent.
If we do that the situation is even better, but this change is still
appropriate.)
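A minimal distillation of the early-out, with invented names (PGState,
should_send_ready_to_merge()).
```cpp
#include <iostream>

// Invented distillation of the early-out: a PG that cannot reach its
// full replica count should not volunteer for a merge.
struct PGState {
  unsigned acting_size;
  unsigned pool_size;  // desired replica count
  bool is_undersized() const { return acting_size < pool_size; }
};

bool should_send_ready_to_merge(const PGState& pg) {
  if (pg.is_undersized()) {
    return false;  // cancel the merge attempt early (see rationale above)
  }
  return true;
}

int main() {
  std::cout << should_send_ready_to_merge({2, 3}) << "\n";  // 0: undersized
  std::cout << should_send_ready_to_merge({3, 3}) << "\n";  // 1: ok to merge
}
```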
Signed-off-by: Sage Weil <sage@redhat.com>
Keep it all in one place (try_mark_clean()). The key behavioral change
is that we update last_epoch_clean and last_epoch_started when we are
peered too, not only when we are active.
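A toy sketch of the consolidated behavior; the State enum and PGInfo fields
are simplified stand-ins for the real PG state machine.
```cpp
#include <iostream>

// Simplified stand-ins for the real PG state machine and info fields.
enum class State { Peered, Active, Other };

struct PGInfo {
  unsigned last_epoch_started = 0;
  unsigned last_epoch_clean = 0;
};

// The consolidated logic: the epochs advance whenever the PG is peered,
// not only when it is fully active.
void try_mark_clean(PGInfo& info, State state, unsigned current_epoch) {
  if (state == State::Active || state == State::Peered) {
    info.last_epoch_started = current_epoch;
    info.last_epoch_clean = current_epoch;
  }
}

int main() {
  PGInfo info;
  try_mark_clean(info, State::Peered, 42);
  std::cout << info.last_epoch_clean << "\n";  // 42: updated while peered
}
```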
Signed-off-by: Sage Weil <sage@redhat.com>
If we are fabricating the pg history values, we need something that is
reasonably valid, but that won't screw up peering of the PG by indicating
that the PG has peered at some point later than when it really has.
Otherwise we can end up in a situation where everyone thinks there is a
newer pg info out there that doesn't actually exist, and the PG will end
up as incomplete.
Signed-off-by: Sage Weil <sage@redhat.com>
Change the content of this field to be the last_epoch_clean for the PG
when it tells the mon it is ready to be merged.
This will later be used to populate the last_epoch_clean and
last_epoch_started fields in PG::merge_from() in certain corner cases.
Signed-off-by: Sage Weil <sage@redhat.com>
mgr/dashboard: Catch LookupError when checking the RGW status
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
In https://github.com/ceph/ceph/pull/21580 I set a trap to catch some weird
and random segmentation faults, and in a recent QA run I was able to observe
it being successfully triggered by one of the test cases, see:
```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```
The root cause is that there might be holes in the log versions, so the
approx_size() method will (almost) always overestimate the actual number
of log entries. As a result, we are at risk of overtrimming log entries.
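To make the hazard concrete, a worked example (all numbers invented): when
the size estimate is derived from version numbers but the span contains
holes, trimming by that estimate removes more real entries than intended.
```cpp
#include <iostream>

int main() {
  // Log spans versions (100, 150], but 8 versions in the span are holes.
  unsigned head = 150, tail = 100;
  unsigned approx_size = head - tail;  // 50: derived from version numbers
  unsigned actual_entries = 42;        // real number of log entries

  unsigned target = 30;                // entries we intend to keep
  unsigned to_trim = approx_size - target;   // 20, assuming no holes
  unsigned kept = actual_entries - to_trim;  // 22 < 30: overtrimmed
  std::cout << "kept " << kept << ", wanted " << target << "\n";
}
```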
https://github.com/ceph/ceph/pull/18338 reveals a probably simpler way
to fix the above problem, but unfortunately it can also cause a big
performance regression, hence this PR.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Currently, BlueStore oncommit callbacks are executed by the OSD op
threads. If there are multiple threads per shard, the completions can run
out of order. For example, with threads_per_shard=2:

  Thread1                          Thread2
  swap_oncommits(op1_oncommit)
                                   swap_oncommits(op2_oncommit)
                                   OpQueueItem.run(Op3)
                                   op2_oncommit.complete();
  op1_oncommit.complete()

This makes the oncommits complete out of order.
To avoid this, we choose a fixed thread, the one with the smallest
thread_index in the shard, to run the oncommit callback functions.
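A self-contained sketch of the chosen approach; Shard, worker_pass(), and the
queue shape are invented simplifications of the sharded op queue.
```cpp
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>

// Each shard accumulates oncommit callbacks; only the worker with the
// smallest thread_index for the shard swaps them out and runs them, so
// completions for a shard are drained by one thread, in order.
struct Shard {
  std::mutex lock;
  std::deque<std::function<void()>> oncommits;
};

void worker_pass(Shard& shard, unsigned thread_index) {
  if (thread_index != 0) {
    return;  // other threads in the shard never touch the oncommits
  }
  std::deque<std::function<void()>> todo;
  {
    std::lock_guard l(shard.lock);
    todo.swap(shard.oncommits);  // analogue of swap_oncommits()
  }
  for (auto& cb : todo) {
    cb();  // completed in submission order by a single thread
  }
}

int main() {
  Shard shard;
  shard.oncommits.push_back([] { std::cout << "op1 commit\n"; });
  shard.oncommits.push_back([] { std::cout << "op2 commit\n"; });
  worker_pass(shard, 1);  // no-op: not the designated thread
  worker_pass(shard, 0);  // runs op1 then op2, in order
}
```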
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
osd/PG: async-recovery should respect historical missing objects
Reviewed-by: Yan Jun <yan.jun8@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>