Commit Graph

73431 Commits

Author SHA1 Message Date
Kefu Chai
d1eca5f669 mon/OSDMonitor: print pgid before looking it up in mapping
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:48 -04:00
Sage Weil
f42182c5e0 mon: handle MGetPoolStats using PGStatService
otherwise ceph_test_rados_api_stat: LibRadosStat.PoolStat will always
timeout once the cluster is switched to luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:47 -04:00
Sage Weil
2de0e07c40 mon: handle MStatfs using PGStatService
otherwise ceph_test_rados_api_stat: LibRadosStat.ClusterStat will always
timeout once the cluster is switched to luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:47 -04:00
Sage Weil
6a68877f59 mon/PGMap: strip out PGMapDigest compat cruft
This was needed for bigbang testing, but not for the final version.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:06:47 -04:00
Kefu Chai
aad1afcb96 mgr: reset pending_inc after applying it
we cannot apply pending_inc twice and expect the result to be the same. in
other words, pg_map.apply_incremental(pending_inc) is not an idempotent
operation.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:47 -04:00
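
A minimal sketch of why the reset in the commit above matters, assuming an incremental that carries additive deltas. StatMap, Incremental, used_delta and apply_and_reset are hypothetical stand-ins for illustration, not Ceph's PGMap types or API.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a pending incremental (not PGMap::Incremental).
struct Incremental {
  std::map<int, std::int64_t> used_delta;  // per-OSD change in bytes used
};

// Hypothetical stand-in for the map the incremental is applied to.
struct StatMap {
  std::map<int, std::int64_t> used;  // per-OSD bytes used

  // Adds each delta to the running total. Applying the same Incremental
  // twice double-counts it, so this operation is not idempotent.
  void apply_incremental(const Incremental& inc) {
    for (const auto& kv : inc.used_delta) {
      used[kv.first] += kv.second;
    }
  }
};

// Apply the pending changes, then reset them so the next call starts
// from an empty Incremental and nothing is applied twice.
inline void apply_and_reset(StatMap& m, Incremental& pending) {
  m.apply_incremental(pending);
  pending = Incremental{};
}
```

With the reset in place, calling apply_and_reset a second time without new updates leaves the totals unchanged.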
Sage Weil
af2994dcae osd: do_shutdown takes precedence over fetching more maps
This is making my osd-markdown.sh test fail reliably.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:06:47 -04:00
Sage Weil
ebc496d7f4 osd/OSDMap: more efficient PGMapTemp
Use a flat_map with pointers into a buffer with the actual data.  For a
decoded mapping, we have just two allocations (one for flat_map and one
for the encoded buffer).

This can get slow if you make lots of incremental changes after the fact
since flat_map is not efficient for modifications at large sizes.  :/

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:06:46 -04:00
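
The layout described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OSDMap code: it swaps in a sorted std::vector as the index instead of a flat_map, and stores offsets into one contiguous buffer instead of raw pointers. PackedMapping and Slot are hypothetical names, and keys are assumed to be unique.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Read-mostly mapping from a key to a list of ints, stored in two
// allocations: a sorted index and one contiguous data buffer.
class PackedMapping {
 public:
  using Entry = std::pair<std::uint64_t, std::vector<std::int32_t>>;

  explicit PackedMapping(const std::vector<Entry>& entries) {
    std::size_t total = 0;
    for (const auto& e : entries) total += e.second.size();
    index_.reserve(entries.size());
    data_.reserve(total);
    for (const auto& e : entries) {
      index_.push_back(Slot{e.first, data_.size(), e.second.size()});
      data_.insert(data_.end(), e.second.begin(), e.second.end());
    }
    // Sort the index once so lookups can use binary search.
    std::sort(index_.begin(), index_.end(),
              [](const Slot& a, const Slot& b) { return a.key < b.key; });
  }

  // Returns a copy of the values for key, or an empty vector if absent.
  std::vector<std::int32_t> get(std::uint64_t key) const {
    auto it = std::lower_bound(
        index_.begin(), index_.end(), key,
        [](const Slot& s, std::uint64_t k) { return s.key < k; });
    if (it == index_.end() || it->key != key) return {};
    auto first = data_.begin() + static_cast<std::ptrdiff_t>(it->offset);
    return std::vector<std::int32_t>(
        first, first + static_cast<std::ptrdiff_t>(it->count));
  }

 private:
  struct Slot {
    std::uint64_t key;
    std::size_t offset;  // start of this key's values in data_
    std::size_t count;   // number of values
  };
  std::vector<Slot> index_;         // allocation 1: sorted index
  std::vector<std::int32_t> data_;  // allocation 2: packed values
};
```

The trade-off matches the one noted in the commit message: lookups are cheap and memory is compact, but inserting into the middle of a large sorted index after the fact is slow.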
Kefu Chai
60a30ff9d9 mon: print log for the creating_pgs changes
print more log messages when updating creating_pgs.

see-also: http://tracker.ceph.com/issues/20067
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:46 -04:00
Sage Weil
9332bb9be4 mgr: mgr_tick_period = 2
5 seconds is driving me nuts.  We cap the health message size so the
digest is now small and lightweight.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:06:46 -04:00
Sage Weil
2f881dbf2c mon/PGMap: count 'unknown' pgs
Also, count "not active" (inactive) pgs instead of active so that we
list "bad" things consistently, and so that 'inactive' is a separate
bucket of pgs from the 'unknown' ones.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:06:46 -04:00
Sage Weil
15f823b43e mgr/MgrStandby: reset subscriptions when we become non-active
This is a goofy workaround that we're also doing in Mgr::init().  Someday
we should come up with a more elegant solution.  In the meantime, this
works just fine!

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:54 -04:00
Sage Weil
6afca3beb2 mgr/ClusterState: make pg stat filtering less fragile
We want to drop updates for pgs for pools that don't exist.  Keep an
updated set of those pools instead of relying on the previous PGMap
having them instantiated.  (The previous map may drift due to bugs.)

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:54 -04:00
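
As a rough illustration of the filtering idea in the commit above, the sketch below drops pg stat updates whose pool is not in an explicitly maintained set of existing pools. PoolId, PgId and filter_pg_updates are hypothetical names, and the int payload stands in for a real pg stat record; none of this is the ClusterState API.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using PoolId = std::int64_t;
using PgId = std::pair<PoolId, std::uint32_t>;  // (pool, pg seed)

// Keep only updates for pgs whose pool is known to exist. The set of
// existing pools is tracked separately (for example, refreshed from the
// latest OSDMap) rather than inferred from the previous stats map.
inline std::map<PgId, int> filter_pg_updates(
    const std::map<PgId, int>& updates,
    const std::set<PoolId>& existing_pools) {
  std::map<PgId, int> kept;
  for (const auto& kv : updates) {
    if (existing_pools.count(kv.first.first) != 0) {
      kept.insert(kv);
    }
  }
  return kept;
}
```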
Sage Weil
dcc4c52ee8 mgr/DaemonServer: log pgmap usage to cluster log
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:54 -04:00
Sage Weil
d6d1db62ed mgr: apply PGMap incremental at same interval as reports
We were doing an incremental per osd stat report; this screws up the
delta stats updates when there are more than a handful of OSDs.  Instead,
do it with the same period as the mgr->mon reports.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:53 -04:00
Sage Weil
2e1f8fdb85 mon/PGMap: encode delta info in digest
It was already in PGMapDigest, but not encoded.

One field we didn't need; move that back to PGMap.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:53 -04:00
Sage Weil
3b049712ff mon/OSDMonitor: use newest creation epoch for pgs that we can
If we have a huge pool it may take a while for the PGs to get out of the
queue and be created.  If we use the epoch in which the pool was created
it may mean a lot of old OSDMaps the OSD has to process.  If we use the current
epoch (the first epoch in which any OSD learned that this PG should
exist) we limit PastIntervals as much as possible.

It is still possible that we start trying to create a PG but the cluster
is unhealthy for a long time, resulting in a long PastIntervals that
needs to be generated by a primary OSD when it eventually comes up. So
this only partially

Partially-fixes: http://tracker.ceph.com/issues/20050
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:53 -04:00
Sage Weil
50c617a447 mon/OSDMonitor: clean up no-beacon message
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:53 -04:00
Sage Weil
578cd59dd8 mgr: drop useless __func__ prints
This is part of the default prefix.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:53 -04:00
Sage Weil
85cc2595c7 osd: work around bluestore fragmented buffers in get_map_bl
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
46bf019cbe test: switch from xmlstarlet to jq
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
62a478cb5f tools/ceph-monstore-update-crush.sh: switch from xmlstarlet to jq
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
4c5064e7de debian,rpm: add jq dependency to ceph-test
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Sage Weil
b4e17fbfe5 mon: speed up pg creates a bit
I don't see any noticeable load on bigbang cluster, so let's bump this up
a bit.  Not being super aggressive here, though, since pool creation is so
rare and who really cares if ginormous clusters take a few minutes to
create all the PGs; better to make sure the mon is happy and responsive
during setup.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
f0eff9223a mon/PGMap: update osd_epoch in synchrony with osd_stat_updates
I'm not sure why this didn't bite us earlier, but there is an assert
in apply_incremental (not used in preluminous mon) and an implicit
dereference in PGMonitor::encode_pending (maybe didn't cause crash?)
that will trigger if we have an osd_stat_updates record without a matching
osd_epochs update.  Maybe there is some subtle reason why the osd_epochs
update happens elsewhere in master (it doesn't on the mgr), but my guess
is we were silently dereferencing the invalid iterator and not noticing.

Anyway, it's easy to fix.  We use the epoch from the previous PGMap.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Kefu Chai
1c8c3d7049 mon,mgr: update inc.osd_epochs when resetting osd_stat_updates
otherwise we could get the following backtrace in ceph-mgr:

 ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x557c46f9d822]
 2: (PGMap::apply_incremental(CephContext*, PGMap::Incremental const&)+0x72f) [0x557c46cb0e7b]
 3: (ClusterState::notify_osdmap(OSDMap const&)+0x15c) [0x557c46d7fb90]
 4: (()+0xf9f761) [0x557c46dc0761]
 5: (()+0xfa1592) [0x557c46dc2592]
 6: (Mgr::handle_osd_map()+0x82) [0x557c46dc0952]
 7: (Mgr::ms_dispatch(Message*)+0x37d) [0x557c46dc0df9]
 8: (MgrStandby::ms_dispatch(Message*)+0x237) [0x557c46daa12b]
 9: (Messenger::ms_deliver_dispatch(Message*)+0xbf) [0x557c47379703]
 10: (DispatchQueue::entry()+0x623) [0x557c47378821]
 11: (DispatchQueue::DispatchThread::entry()+0x1c) [0x557c470f0ca8]
 12: (Thread::entry_wrapper()+0xc1) [0x557c47225e79]
 13: (Thread::_entry_func(void*)+0x18) [0x557c47225dae]
 14: (()+0x7494) [0x7f2ef43a4494]
 15: (clone()+0x3f) [0x7f2ef321993f]

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
02fd29606c osd/osd_types: make 0 pg state 'unknown' instead of 'inactive'
The OSD never reports PGs this way; we'll only see it from a mgr
PGMap that hasn't filled in the state.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
0612b71627 mgr: simplify handling of new pgs/pools
Instantiate barebones pg records (creating+stale) in our PGMap when pgs
are created.  These will switch to 'creating' when the pg is in the
process of creating, and peering etc.  The 'stale' is an indicator that
the mon may not have even asked the pg to create them yet.

All of the old meticulous tracking in PGMap for mappings for creating
pgs is useless to us; OSDMonitor has new code to handle it.  This is
fast and simple.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
057e4aaa84 vstart: debug mon on mgr
PGMap dout is tied to mon still; we want to see it on the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
818018a606 mon/PGMap: fix deleted_pool cleanup
- make sure num_pg_by_pool is cleared
- update_pool_deltas can repopulate pg_pool_sum; clear after that

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
681d37c055 mgr/ClusterState: apply latest osdmap to pgmap
In particular, clear out deleted pools and clear osd stats for
deleted/down/out osds.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
90d45c1dea mon/PGMap: new check_osd_map that takes a const OSDMap&
The previous version takes an Incremental and requires that we see
every single consecutive map in the history.  This version is mgr-friendly
and just takes the latest OSDMap.  It's a bit simpler too because it
ignores the full/nearfull (legacy preluminous) and last_osd_report.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
87a993a93f ceph-kvstore-tool: implement 'rm' and 'rm-prefix' commands
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
7394b96d1a mon/PGMap: use int32_t, not int
These are encoded! Be explicit.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
Sage Weil
d019d13763 mon/PGMap: explicitly count pg per pool
This will more reliably remove empty pools from pg_pool_sum.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
Sage Weil
73ddcc3743 mon/MgrStatMonitor: fix version across restarts
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
Sage Weil
229e56c7bc mon/PGMap: cap health detail messages at 50 (configurable)
There are two cases where we spew health detail warnings for potentially
every pg.  Cap those detail messages at 50 and, if we exceed that, include
a message saying how many more there are.  This avoids huge lists of
detail messages going from the mgr to mon and also makes life better for
users of the health detail api.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
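
A minimal sketch of the capping behaviour described in the commit above: keep at most a fixed number of detail messages and append one summary line saying how many were left out. cap_details, its parameter names and the wording of the summary line are assumptions for illustration, not the PGMap implementation or its config option.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Return at most max_items detail messages. If any were dropped, append
// one extra line reporting how many more there are.
inline std::vector<std::string> cap_details(
    const std::vector<std::string>& all, std::size_t max_items = 50) {
  const std::size_t shown = std::min(all.size(), max_items);
  std::vector<std::string> out(
      all.begin(), all.begin() + static_cast<std::ptrdiff_t>(shown));
  if (all.size() > shown) {
    out.push_back(std::to_string(all.size() - shown) + " more (not shown)");
  }
  return out;
}
```

For three input messages and a cap of two, the result holds the first two messages followed by a single line reporting one more.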
Sage Weil
f0d417366f mon/MgrStatMonitor: trim mgrstat states
We don't actually need any of these older states at all so I hard-coded
a constant (oh no!).  In reality it doesn't matter what it is anyway
since PaxosService waits for paxos_service_trim_min (=250) to accumulate
before removing anything.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
7559f1c045 mon/MgrStatMonitor: wrap digest encoding in bufferlist
This is just for bigbang's benefit.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
acfebf27d1 mgr,mon: move 'osd {scrub,deep-scrub,repair}' handling to mgr
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
b01c0d36b6 mgr/DaemonServer: use registered osd session to send scrub messages
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
9fa1b382ef mgr/DaemonServer: keep registry of osd sessions
Occasionally we send them messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
c62e5dac91 mon: move parse_* helpers into cmdparse
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:47 -04:00
Sage Weil
39b7dcb0ea test/cephtool-test-mon: make ec test less sensitive to crush
With only 3 we can get into a situation where one slot is CRUSH_ITEM_NONE.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
b883d7a994 test/osd/osd-dup.sh: flush_pg_stats before "ceph -s"
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
30f0ae0496 qa/workunits/ceph-helpers.sh: move flush_pg_stats() here
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
11d1a4d3ef mon/PGMonitor: do not create/encode pending if luminous
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
7af652eb28 mon/PGMonitor: reset PGMonitor member vars when luminous
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
3c12465f96 qa/workunits/cephtool/test.sh: use jq instead of awk to select the required element
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
44708d8365 qa/workunits/cephtool/test.sh: use flush_pg_stats to sync mon with osd
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
74752505bf mgr: add a command "mgr report"
* extract send_report() out of tick() so it can be reused.
* add a command "mgr report-mon" for mgr, so we are able to flush
  the mgr stats to mon actively without waiting for the tick. this
  could help with the tests.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00