otherwise ceph_test_rados_api_stat: LibRadosStat.PoolStat will always
time out once the cluster is switched to luminous.
Signed-off-by: Kefu Chai <kchai@redhat.com>
otherwise ceph_test_rados_api_stat: LibRadosStat.ClusterStat will always
time out once the cluster is switched to luminous.
Signed-off-by: Kefu Chai <kchai@redhat.com>
We cannot apply pending_inc twice and expect the result to be the same. In
other words, pg_map.apply_incremental(pending_inc) is not an idempotent
operation.
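A minimal sketch of the problem, using made-up stand-in types rather than
the real PGMap/Incremental, and assuming the incremental carries deltas
rather than absolute values:

    #include <cassert>
    #include <cstdint>

    // Hypothetical, stripped-down stand-ins for PGMap/Incremental.
    struct Incremental {
      int64_t bytes_used_delta = 0;   // a delta, not an absolute value
    };

    struct Map {
      int64_t bytes_used = 0;
      void apply_incremental(const Incremental& inc) {
        bytes_used += inc.bytes_used_delta;  // accumulates on every apply
      }
    };

    int main() {
      Map a, b;
      Incremental inc;
      inc.bytes_used_delta = 100;
      a.apply_incremental(inc);
      b.apply_incremental(inc);
      b.apply_incremental(inc);  // second apply double-counts the delta
      assert(a.bytes_used == 100 && b.bytes_used == 200);  // results diverge
    }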
Signed-off-by: Kefu Chai <kchai@redhat.com>
Use a flat_map with pointers into a buffer holding the actual data. For a
decoded mapping, we have just two allocations (one for the flat_map and one
for the encoded buffer).
This can get slow if you make lots of incremental changes after the fact,
since flat_map is not efficient for modifications at large sizes. :/
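Roughly the shape of it (a simplified sketch with invented names, assuming
boost::container::flat_map; not the actual Ceph types):

    #include <boost/container/flat_map.hpp>
    #include <cstring>
    #include <iostream>
    #include <memory>
    #include <string_view>

    int main() {
      // One allocation holds the decoded payload...
      const char raw[] = "osd.0\0osd.1";
      auto buf = std::make_unique<char[]>(sizeof(raw));
      std::memcpy(buf.get(), raw, sizeof(raw));

      // ...and the flat_map's contiguous storage is the second. The mapped
      // values are just views pointing into the buffer, not copies.
      boost::container::flat_map<int, std::string_view> m;
      m.emplace(0, std::string_view(buf.get(), 5));
      m.emplace(1, std::string_view(buf.get() + 6, 5));

      std::cout << m[1] << "\n";  // prints "osd.1"
    }

The slowness is the flip side of the same layout: storage is one contiguous
vector, so each insert or erase in a large map has to shift everything after
it.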
Signed-off-by: Sage Weil <sage@redhat.com>
Also, count "not active" (inactive) pgs instead of active so that we
list "bad" things consistently, and so that 'inactive' is a separate
bucket of pgs than the 'unknown' ones.
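Something like this, with illustrative state bits rather than the real ones:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Illustrative state bits only; not the real Ceph values.
    constexpr uint64_t PG_STATE_ACTIVE  = 1 << 0;
    constexpr uint64_t PG_STATE_PEERING = 1 << 1;

    int main() {
      // 0 == "nothing reported yet", which goes in the 'unknown' bucket.
      std::vector<uint64_t> pg_states = {PG_STATE_ACTIVE, 0, PG_STATE_PEERING,
                                         PG_STATE_ACTIVE};
      int unknown = 0, inactive = 0;
      for (auto s : pg_states) {
        if (s == 0)
          ++unknown;                        // never reported: unknown
        else if (!(s & PG_STATE_ACTIVE))
          ++inactive;                       // reported, but not active
      }
      std::cout << unknown << " unknown, " << inactive << " inactive\n";
    }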
Signed-off-by: Sage Weil <sage@redhat.com>
This is a goofy workaround that we're also doing in Mgr::init(). Someday
we should come up with a more elegant solution. In the meantime, this
works just fine!
Signed-off-by: Sage Weil <sage@redhat.com>
We want to drop updates for pgs in pools that don't exist. Keep an
up-to-date set of the pools that do exist instead of relying on the previous
PGMap having them instantiated. (The previous map may drift due to bugs.)
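Conceptually (hypothetical, simplified types):

    #include <cstdint>
    #include <set>

    // Sketch: keep the set of existing pool ids refreshed from each new
    // OSDMap and drop pg stat updates for pools that are gone.
    struct pg_id { int64_t pool; int32_t seed; };

    struct Filter {
      std::set<int64_t> existing_pools;   // refreshed from the OSDMap

      bool accept(const pg_id& pgid) const {
        return existing_pools.count(pgid.pool);  // drop deleted pools
      }
    };

    int main() {
      Filter f;
      f.existing_pools = {1, 2};
      pg_id ok{1, 0}, stale{5, 3};
      return f.accept(ok) && !f.accept(stale) ? 0 : 1;
    }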
Signed-off-by: Sage Weil <sage@redhat.com>
We were applying an incremental per OSD stat report; this screws up the
delta stats updates when there are more than a handful of OSDs. Instead,
do it on the same period as the mgr->mon reports.
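The shape of the change, with made-up names: buffer incoming per-OSD stat
reports and fold them into the map once per report tick, so the deltas are
computed over one consistent window.

    #include <cstdint>
    #include <map>

    struct osd_stat { int64_t kb_used = 0; };

    struct PendingStats {
      std::map<int, osd_stat> pending;     // latest report per OSD, buffered

      void note_osd_report(int osd, const osd_stat& s) {
        pending[osd] = s;                  // just overwrite; no per-report apply
      }

      // called on the same period as the mgr->mon report
      template <typename ApplyFn>
      void flush(ApplyFn&& apply) {
        for (auto& [osd, s] : pending)
          apply(osd, s);
        pending.clear();
      }
    };

    int main() {
      PendingStats ps;
      ps.note_osd_report(0, {100});
      ps.note_osd_report(0, {120});        // supersedes the earlier report
      int64_t total = 0;
      ps.flush([&](int, const osd_stat& s) { total += s.kb_used; });
      return total == 120 ? 0 : 1;
    }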
Signed-off-by: Sage Weil <sage@redhat.com>
If we have a huge pool it may take a while for the PGs to get out of the
queue and be created. If we use the epoch the pool was created in, the OSD
may have to process a lot of old OSDMaps. If we use the current epoch (the
first epoch in which any OSD learned that this PG should exist), we limit
PastIntervals as much as possible.
It is still possible that we start trying to create a PG but the cluster
is unhealthy for a long time, resulting in a long PastIntervals that
needs to be generated by a primary OSD when it eventually comes up. So
this only partially fixes the problem.
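In sketch form (hypothetical names, not the real OSDMonitor code):

    // When queuing a pg for creation, record the current osdmap epoch (the
    // first epoch any OSD can learn the pg exists) rather than the epoch the
    // pool was created in, so the OSD does not have to walk old maps and the
    // initial PastIntervals stays as short as possible.
    struct creating_pg {
      unsigned created_epoch;
    };

    creating_pg queue_create(unsigned current_osdmap_epoch,
                             unsigned pool_created_epoch) {
      (void)pool_created_epoch;                 // intentionally not used
      return creating_pg{current_osdmap_epoch}; // limit PastIntervals
    }

    int main() {
      return queue_create(2000, 5).created_epoch == 2000 ? 0 : 1;
    }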
Partially-fixes: http://tracker.ceph.com/issues/20050
Signed-off-by: Sage Weil <sage@redhat.com>
I don't see any noticeable load on the bigbang cluster, so let's bump this up
a bit. Not being super aggressive here, though, since pool creation is so
rare, and who really cares if ginormous clusters take a few minutes to
create all the PGs; better to make sure the mon is happy and responsive
during setup.
Signed-off-by: Sage Weil <sage@redhat.com>
I'm not sure why this didn't bite us earlier, but there is an assert
in apply_incremental (not used in the preluminous mon) and an implicit
dereference in PGMonitor::encode_pending (which maybe didn't cause a crash?)
that will trigger if we have an osd_stat_updates record without a matching
osd_epochs update. Maybe there is some subtle reason why the osd_epochs
update happens elsewhere in master (it doesn't on the mgr), but my guess
is we were silently dereferencing an invalid iterator and not noticing.
Anyway, it's easy to fix: use the epoch from the previous PGMap.
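Roughly (simplified, hypothetical types):

    #include <cstdint>
    #include <map>

    // When adding an osd_stat update to the incremental, make sure there is
    // a matching osd_epochs entry; if the report did not carry one, reuse the
    // epoch already recorded in the previous PGMap so apply_incremental never
    // sees an unmatched stat record.
    struct Inc {
      std::map<int, int64_t> osd_stat_updates;
      std::map<int, unsigned> osd_epochs;
    };

    void add_stat(Inc& inc, const std::map<int, unsigned>& prev_osd_epochs,
                  int osd, int64_t stat, unsigned reported_epoch /* 0 = none */) {
      inc.osd_stat_updates[osd] = stat;
      if (reported_epoch)
        inc.osd_epochs[osd] = reported_epoch;
      else if (auto p = prev_osd_epochs.find(osd); p != prev_osd_epochs.end())
        inc.osd_epochs[osd] = p->second;    // fall back to the previous PGMap
    }

    int main() {
      Inc inc;
      std::map<int, unsigned> prev = {{3, 42}};
      add_stat(inc, prev, 3, 1000, 0);
      return inc.osd_epochs.at(3) == 42 ? 0 : 1;
    }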
Signed-off-by: Sage Weil <sage@redhat.com>
Instantiate barebones pg records (creating+stale) in our PGMap when pgs
are created. These will switch to 'creating' once the pg is actually in
the process of being created, peering, etc. The 'stale' flag is an indicator
that the mon may not have even asked the OSD to create them yet.
All of the old meticulous tracking in PGMap of mappings for creating
pgs is useless to us; OSDMonitor has new code to handle it. This is
fast and simple.
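Something like this (illustrative state bits and types, not the real ones):

    #include <cstdint>
    #include <map>

    constexpr uint64_t PG_STATE_CREATING = 1 << 0;
    constexpr uint64_t PG_STATE_STALE    = 1 << 1;

    struct pg_stat { uint64_t state = 0; };

    // When the OSDMap says a pg should exist but nothing has reported on it
    // yet, insert a placeholder record marked creating+stale.
    void instantiate_new_pg(std::map<uint64_t, pg_stat>& pg_stats,
                            uint64_t pgid) {
      auto& st = pg_stats[pgid];            // barebones record
      st.state = PG_STATE_CREATING | PG_STATE_STALE;
    }

    int main() {
      std::map<uint64_t, pg_stat> pg_stats;
      instantiate_new_pg(pg_stats, 7);
      return pg_stats[7].state == (PG_STATE_CREATING | PG_STATE_STALE) ? 0 : 1;
    }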
Signed-off-by: Sage Weil <sage@redhat.com>
The previous version takes an Incremental and requires that we see
every single consecutive map in the history. This version is mgr-friendly
and just takes the latest OSDMap. It's a bit simpler, too, because it
ignores the full/nearfull tracking (legacy, preluminous) and last_osd_report.
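Conceptually (made-up simplified types): the mgr-friendly variant rebuilds
the derived fields from whatever the newest OSDMap is, instead of replaying
every consecutive Incremental.

    #include <cstdint>
    #include <set>

    struct OSDMap {
      unsigned epoch = 0;
      std::set<int> up_osds;
    };

    struct Digest {
      unsigned last_osdmap_epoch = 0;
      size_t num_osd_up = 0;

      // No Incremental required; any (possibly much newer) OSDMap will do.
      void apply_osdmap(const OSDMap& map) {
        last_osdmap_epoch = map.epoch;
        num_osd_up = map.up_osds.size();
      }
    };

    int main() {
      Digest d;
      OSDMap m{120, {0, 1, 2}};
      d.apply_osdmap(m);           // jumped straight from epoch 0 to 120
      return d.num_osd_up == 3 ? 0 : 1;
    }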
Signed-off-by: Sage Weil <sage@redhat.com>
There are two cases where we spew health detail warnings for potentially
every pg. Cap those detail messages at 50 and, if we exceed that, include
a message saying how many more there are. This avoids huge lists of
detail messages going from the mgr to the mon and also makes life better for
users of the health detail API.
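The capping logic is roughly (hypothetical names, not the real health-check
code):

    #include <iostream>
    #include <string>
    #include <vector>

    // Emit at most `max` per-pg detail lines, plus one summary line for the
    // rest.
    void add_pg_details(std::vector<std::string>& detail,
                        const std::vector<std::string>& per_pg,
                        size_t max = 50) {
      size_t n = 0;
      for (const auto& msg : per_pg) {
        if (n++ >= max) {
          detail.push_back(std::to_string(per_pg.size() - max) +
                           " more pgs omitted");
          break;
        }
        detail.push_back(msg);
      }
    }

    int main() {
      std::vector<std::string> detail, per_pg(120, "pg 1.x is stuck inactive");
      add_pg_details(detail, per_pg);
      std::cout << detail.size() << "\n";  // 51: 50 details + 1 summary
    }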
Signed-off-by: Sage Weil <sage@redhat.com>
We don't actually need any of these older states at all, so I hard-coded
a constant (oh no!). In reality it doesn't matter what it is anyway,
since PaxosService waits for paxos_service_trim_min (=250) to accumulate
before removing anything.
Signed-off-by: Sage Weil <sage@redhat.com>