I don't see any noticeable load on the bigbang cluster, so let's bump this up
a bit. Not being super aggressive here, though, since pool creation is so
rare and who really cares if ginormous clusters take a few minutes to
create all the PGs; better to make sure the mon is happy and responsive
during setup.
Signed-off-by: Sage Weil <sage@redhat.com>
I'm not sure why this didn't bite us earlier, but there is an assert
in apply_incremental (not used in preluminous mon) and an implicit
dereference in PGMonitor::encode_pending (maybe didn't cause crash?)
that will trigger if we have an osd_stat_updates record without a matching
osd_epochs update. Maybe there is some subtle reason why the osd_epochs
update happens elsewhere in master (it doesn't on the mgr), but my guess
is we were silently dereferencing the invalid iterator and not noticing.
Anyway, it's easy to fix. We use the epoch from the previous PGMap.
Signed-off-by: Sage Weil <sage@redhat.com>
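As an illustration of the fix described above, here is a minimal sketch of falling back to the previous map's epoch when a stat update arrives without a matching epoch update. The function name and the dict-based maps are hypothetical stand-ins, not the actual PGMap code.

```python
from typing import Dict, Optional


def epoch_for_osd(osd: int,
                  pending_epochs: Dict[int, int],
                  previous_epochs: Dict[int, int]) -> Optional[int]:
    """Pick the epoch to record alongside a stat update for `osd`.

    Hypothetical helper: prefer the epoch carried by the pending update,
    otherwise reuse the one from the previous map instead of
    dereferencing a lookup that found nothing.
    """
    if osd in pending_epochs:
        return pending_epochs[osd]
    # No matching epoch update: fall back to the previous map's value.
    return previous_epochs.get(osd)
```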
Instantiate barebones pg records (creating+stale) in our PGMap when pgs
are created. These will switch to plain 'creating' once a pg is actually
being created, and then on to peering etc. The 'stale' flag is an indicator
that the mon may not have even asked for the pg to be created yet.
All of the old meticulous tracking in PGMap for mappings for creating
pgs is useless to us; OSDMonitor has new code to handle it. This is
fast and simple.
Signed-off-by: Sage Weil <sage@redhat.com>
The previous version takes an Incremental and requires that we see
every single consecutive map in the history. This version is mgr-friendly
and just takes the latest OSDMap. It's a bit simpler too because it
ignores the full/nearfull (legacy preluminous) and last_osd_report.
Signed-off-by: Sage Weil <sage@redhat.com>
There are two cases where we spew health detail warnings for potentially
every pg. Cap those detail messages at 50 and, if we exceed that, include
a message saying how many more there are. This avoids huge lists of
detail messages going from the mgr to mon and also makes life better for
users of the health detail api.
Signed-off-by: Sage Weil <sage@redhat.com>
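The capping behavior described above can be sketched roughly as follows. The limit of 50 comes from the commit message; the function name and the wording of the summary line are assumptions made for illustration.

```python
from typing import List

MAX_DETAIL = 50  # cap taken from the commit message


def cap_detail(messages: List[str], limit: int = MAX_DETAIL) -> List[str]:
    """Return at most `limit` messages, plus one summary line if truncated."""
    if len(messages) <= limit:
        return list(messages)
    capped = messages[:limit]
    # Hypothetical wording; the real summary text may differ.
    capped.append(f"{len(messages) - limit} more pgs are also affected")
    return capped
```

With 60 input messages this yields 51 lines: the first 50 details and one line reporting the 10 that were dropped.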
We don't actually need any of these older states at all so I hard-coded
a constant (oh no!). In reality it doesn't matter what it is anyway
since PaxosService waits for paxos_service_trim_min (=250) to accumulate
before removing anything.
Signed-off-by: Sage Weil <sage@redhat.com>
* extract send_report() out of tick() so it can be reused.
* add a command "mgr report-mon" for mgr, so we are able to flush the
mgr stats to the mon actively without waiting for the tick. this
could help with the tests.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.
This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!
Signed-off-by: Sage Weil <sage@redhat.com>
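A minimal sketch of the polling half of such a helper, assuming the target sequence number has already been obtained from the OSD. `get_mon_seq` is a hypothetical callable standing in for however the test queries the mon; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_seq(get_mon_seq: Callable[[], int],
                 target_seq: int,
                 timeout: float = 60.0,
                 interval: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the mon reports a seq >= target_seq, or give up.

    Returns True if the target was reached before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_mon_seq() >= target_seq:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Comparing with `>=` rather than `==` means a later stats report that overtakes the target still counts as success.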
This is, strictly speaking, redundant, since the osd_stat is also in the
digest, but we plan to remove that.
Signed-off-by: Sage Weil <sage@redhat.com>
Report a sequence number when we flush_pg_stats. Combine the up_from and
a per-boot seq number to get a monotonically increasing value across OSD
restarts (we assume fewer than 4 billion stats reports in a single epoch).
Signed-off-by: Sage Weil <sage@redhat.com>
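One way to combine the two values is sketched below. The 32/32 bit split (up_from in the high half, per-boot counter in the low half) is an assumption that is consistent with the "4 billion" limit mentioned above but is not confirmed by the message; the function names are hypothetical.

```python
from typing import Tuple

SEQ_BITS = 32                    # assumed width of the per-boot counter
SEQ_MASK = (1 << SEQ_BITS) - 1   # 4294967295, i.e. roughly 4 billion


def make_stat_seq(up_from: int, boot_seq: int) -> int:
    """Pack the up_from epoch and a per-boot counter into one integer."""
    if not 0 <= boot_seq <= SEQ_MASK:
        raise ValueError("per-boot seq does not fit in the low bits")
    return (up_from << SEQ_BITS) | boot_seq


def split_stat_seq(seq: int) -> Tuple[int, int]:
    """Recover (up_from, boot_seq) from a packed sequence number."""
    return seq >> SEQ_BITS, seq & SEQ_MASK
```

Because up_from increases each time the OSD comes back up, any value produced after a restart compares greater than every value produced before it, even though the per-boot counter starts over.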
Once the OSDMap flag is set there is no going back. Zero out the on-disk
PGMap data, and clear the in-memory PGMap to free up memory and make
bugs easier to spot.
Signed-off-by: Sage Weil <sage@redhat.com>
There is overhead for PGs we are creating because the mon has to track
which OSD each one currently maps to. This can be problematic on a very
large cluster. Limit the overhead by setting a cap on the number of PGs
we are creating at once; leave the rest in a persistent queue.
Signed-off-by: Sage Weil <sage@redhat.com>
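The capped-creation idea can be sketched with an in-memory model like the one below. The class and method names are hypothetical, and persistence of the queue (which the commit message calls for) is not modeled here.

```python
from collections import deque
from typing import Deque, Iterable, Set


class CreatingPGs:
    """Track at most `max_creating` PGs at once; queue the rest in order."""

    def __init__(self, max_creating: int) -> None:
        self.max_creating = max_creating
        self.queue: Deque[str] = deque()  # waiting; cheap to hold
        self.creating: Set[str] = set()   # actively tracked; costly per entry

    def enqueue(self, pgids: Iterable[str]) -> None:
        self.queue.extend(pgids)
        self._fill()

    def created(self, pgid: str) -> None:
        """Mark a PG as done and promote queued PGs into the free slot."""
        self.creating.discard(pgid)
        self._fill()

    def _fill(self) -> None:
        while self.queue and len(self.creating) < self.max_creating:
            self.creating.add(self.queue.popleft())
```

Only the entries in `creating` would need per-PG OSD mappings, so the tracking overhead stays bounded by the cap regardless of how many PGs the pool asks for.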
We are not persisting the updated creating_pgs here; this is bad! I'm
not sure why it was there to begin with.
Signed-off-by: Sage Weil <sage@redhat.com>
I'm not rewriting this to use range iterator syntax because I'm in a
hurry. This just lets me change the types without touching all this code
again.
Signed-off-by: Sage Weil <sage@redhat.com>