Commit Graph

73293 Commits

Author SHA1 Message Date
Kefu Chai
46bf019cbe test: switch from xmlstarlet to jq
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
62a478cb5f tools/ceph-monstore-update-crush.sh: switch from xmlstarlet to jq
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
4c5064e7de debian,rpm: add jq dependency to ceph-test
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Sage Weil
b4e17fbfe5 mon: speed up pg creates a bit
I don't see any noticeable load on bigbang cluster, so let's bump this up
a bit.  Not being super aggressive here, though, since pool creation is so
rare and who really cares if ginormous clusters take a few minutes to
create all the PGs; better to make sure the mon is happy and responsive
during setup.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
f0eff9223a mon/PGMap: update osd_epoch in synchrony with osd_stat_updates
I'm not sure why this didn't bite us earlier, but there is an assert
in apply_incremental (not used in preluminous mon) and an implicit
dereference in PGMonitor::encode_pending (maybe didn't cause crash?)
that will trigger if we have an osd_stat_updates record without a matching
osd_epochs update.  Maybe there is some subtle reason why the osd_epochs
update happens elsewhere in master (it doesn't on the mgr), but my guess
is we were silently dereferencing the invalid iterator and not noticing.

Anyway, it's easy to fix.  We use the epoch from the previous PGMap.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Kefu Chai
1c8c3d7049 mon,mgr: update inc.osd_epochs when resetting osd_stat_updates
otherwise we could see the following backtrace in ceph-mgr:

 ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x557c46f9d822]
 2: (PGMap::apply_incremental(CephContext*, PGMap::Incremental const&)+0x72f) [0x557c46cb0e7b]
 3: (ClusterState::notify_osdmap(OSDMap const&)+0x15c) [0x557c46d7fb90]
 4: (()+0xf9f761) [0x557c46dc0761]
 5: (()+0xfa1592) [0x557c46dc2592]
 6: (Mgr::handle_osd_map()+0x82) [0x557c46dc0952]
 7: (Mgr::ms_dispatch(Message*)+0x37d) [0x557c46dc0df9]
 8: (MgrStandby::ms_dispatch(Message*)+0x237) [0x557c46daa12b]
 9: (Messenger::ms_deliver_dispatch(Message*)+0xbf) [0x557c47379703]
 10: (DispatchQueue::entry()+0x623) [0x557c47378821]
 11: (DispatchQueue::DispatchThread::entry()+0x1c) [0x557c470f0ca8]
 12: (Thread::entry_wrapper()+0xc1) [0x557c47225e79]
 13: (Thread::_entry_func(void*)+0x18) [0x557c47225dae]
 14: (()+0x7494) [0x7f2ef43a4494]
 15: (clone()+0x3f) [0x7f2ef321993f]

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
02fd29606c osd/osd_types: make 0 pg state 'unknown' instead of 'inactive'
The OSD never reports PGs this way; we'll only see it from a mgr
PGMap that hasn't filled in the state.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
0612b71627 mgr: simplify handling of new pgs/pools
Instantiate barebones pg records (creating+stale) in our PGMap when pgs
are created.  These will switch to 'creating' when the pg is in the
process of creating, peering, etc.  The 'stale' is an indicator that
the mon may not have even asked for the pgs to be created yet.

All of the old meticulous tracking in PGMap for mappings for creating
pgs is useless to us; OSDMonitor has new code to handle it.  This is
fast and simple.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:51 -04:00
Sage Weil
057e4aaa84 vstart: debug mon on mgr
PGMap dout is tied to mon still; we want to see it on the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
818018a606 mon/PGMap: fix deleted_pool cleanup
- make sure num_pg_by_pool is cleared
- update_pool_deltas can repopulate pg_pool_sum; clear after that

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
681d37c055 mgr/ClusterState: apply latest osdmap to pgmap
In particular, clear out deleted pools and clear osd stats for
deleted/down/out osds.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
90d45c1dea mon/PGMap: new check_osd_map that takes a const OSDMap&
The previous version takes an Incremental and requires that we see
every single consecutive map in the history.  This version is mgr-friendly
and just takes the latest OSDMap.  It's a bit simpler too because it
ignores the full/nearfull (legacy preluminous) and last_osd_report.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
87a993a93f ceph-kvstore-tool: implement 'rm' and 'rm-prefix' commands
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:50 -04:00
Sage Weil
7394b96d1a mon/PGMap: use int32_t, not int
These are encoded! Be explicit.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
Sage Weil
d019d13763 mon/PGMap: explicitly count pg per pool
This will more reliably remove empty pools from pg_pool_sum.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
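
One plausible reading of d019d13763, sketched in Python rather than Ceph's C++: keep an explicit PG counter per pool next to the per-pool sum, and drop the pool's entries only when the counter reaches zero. The class name `PoolStats` and the use of a plain integer as the "sum" are assumptions made for illustration; only `num_pg_by_pool` and `pg_pool_sum` appear in the commit messages.

```python
class PoolStats:
    """Toy model of per-pool accounting (hypothetical, not Ceph code)."""

    def __init__(self):
        self.num_pg_by_pool = {}  # pool id -> number of PGs tracked
        self.pg_pool_sum = {}     # pool id -> summed stat (an int here)

    def add_pg(self, pool, num_objects):
        self.num_pg_by_pool[pool] = self.num_pg_by_pool.get(pool, 0) + 1
        self.pg_pool_sum[pool] = self.pg_pool_sum.get(pool, 0) + num_objects

    def remove_pg(self, pool, num_objects):
        self.pg_pool_sum[pool] -= num_objects
        self.num_pg_by_pool[pool] -= 1
        # Decide on the explicit count, not on whether the sum is zero:
        # a pool may still hold PGs whose stats happen to add up to zero.
        if self.num_pg_by_pool[pool] == 0:
            del self.num_pg_by_pool[pool]
            del self.pg_pool_sum[pool]
```

In this sketch a pool whose remaining PGs are all empty stays in `pg_pool_sum`, and it disappears only once its last PG is removed.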
Sage Weil
73ddcc3743 mon/MgrStatMonitor: fix version across restarts
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
Sage Weil
229e56c7bc mon/PGMap: cap health detail messages at 50 (configurable)
There are two cases where we spew health detail warnings for potentially
every pg.  Cap those detail messages at 50 and, if we exceed that, include
a message saying how many more there are.  This avoids huge lists of
detail messages going from the mgr to mon and also makes life better for
users of the health detail api.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:49 -04:00
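
The capping behaviour described in 229e56c7bc can be sketched as follows. This is a minimal Python illustration, not the Ceph implementation: the function name `cap_details`, the parameter name `max_detail`, and the wording of the summary line are all assumptions; only the default of 50 and the "how many more" message come from the commit text.

```python
def cap_details(details, max_detail=50):
    """Return at most max_detail messages plus a summary of the remainder."""
    details = list(details)
    if len(details) <= max_detail:
        return details
    shown = details[:max_detail]
    # Hypothetical wording; the commit only says that a message reports
    # how many more entries there are.
    shown.append(f"{len(details) - max_detail} more pgs are also affected")
    return shown
```

With 60 input messages and the default limit, the result holds the first 50 messages followed by one summary line.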
Sage Weil
f0d417366f mon/MgrStatMonitor: trim mgrstat states
We don't actually need any of these older states at all so I hard-coded
a constant (oh no!).  In reality it doesn't matter what it is anyway
since PaxosService waits for paxos_service_trim_min (=250) to accumulate
before removing anything.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
7559f1c045 mon/MgrStatMonitor: wrap digest encoding in bufferlist
This is just for bigbang's benefit.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
acfebf27d1 mgr,mon: move 'osd {scrub,deep-scrub,repair}' handling to mgr
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
b01c0d36b6 mgr/DaemonServer: use registered osd session to send scrub messages
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
9fa1b382ef mgr/DaemonServer: keep registry of osd sessions
Occasionally we send them messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:48 -04:00
Sage Weil
c62e5dac91 mon: move parse_* helpers into cmdparse
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:47 -04:00
Sage Weil
39b7dcb0ea test/cephtool-test-mon: make ec test less sensitive to crush
With only 3 we can get into a situation where one slot is CRUSH_ITEM_NONE.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
b883d7a994 test/osd/osd-dup.sh: flush_pg_stats before "ceph -s"
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
30f0ae0496 qa/workunits/ceph-helpers.sh: move flush_pg_stats() here
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
11d1a4d3ef mon/PGMonitor: do not create/encode pending if luminous
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
7af652eb28 mon/PGMonitor: reset PGMonitor member vars when luminous
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
3c12465f96 qa/workunits/cephtool/test.sh: use jq instead of awk to select the required element
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
44708d8365 qa/workunits/cephtool/test.sh: use flush_pg_stats to sync mon with osd
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Kefu Chai
74752505bf mgr: add a command "mgr report"
* extract send_report() out of tick() so it can be reused.
* add a command "mgr report-mon" for mgr, so we are able to flush the
  mgr stats to mon actively without waiting for the tick. this
  could help with the tests.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:46 -04:00
Sage Weil
cd00aae1c3 qa/workunits/cephtool/test.sh: fix flush_pg_stats usage
Use a helper.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
Sage Weil
ab1b78ae00 qa/tasks: use new reliable flush_pg_stats helper
The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.

This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
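
The helper described in ab1b78ae00 (fetch a sequence number from each osd, then poll the mon until it reports at least that value) can be sketched roughly as below. This is a Python illustration under stated assumptions: `flush` and `last_stat_seq` are hypothetical callables standing in for the `flush_pg_stats` and `osd last-stat-seq <osd>` commands named in the neighbouring commits, and the timeout handling is invented for the example.

```python
import time


def wait_for_stats(flush, last_stat_seq, osds,
                   timeout=60.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Flush stats on every osd, then wait until the mon has caught up."""
    # Step 1: ask each osd to flush and remember the seq it reports.
    need = {osd: flush(osd) for osd in osds}
    deadline = clock() + timeout
    # Step 2: poll the mon until its recorded seq reaches each target.
    for osd, seq in need.items():
        while last_stat_seq(osd) < seq:
            if clock() >= deadline:
                raise TimeoutError(f"osd.{osd} did not reach seq {seq}")
            sleep(interval)
```

Passing `sleep` and `clock` as parameters is a design choice of this sketch so that it can be exercised without real delays.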
Sage Weil
aac073dc2a mon: add 'osd last-stat-seq <osd>' command
Return the latest seq for the osd reflected in the mon's digest stats.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
Sage Weil
fa70d0e81c mon/PGMap: coalesce last osd stat seq in the PGMapDigest
This is, strictly speaking, redundant, since the osd_stat is also in the
digest, but we plan to remove that.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
Sage Weil
85b17ba18b osd: report a seq from flush_pg_stats command
Report a sequence number when we flush_pg_stats.  Combine the up_from and
a per-boot seq number to get a monotonically increasing value across OSD
restarts (we assume less than 4 billion stats reports in a single epoch).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
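
A small sketch of how the sequence number in 85b17ba18b might be composed. The 32/32 bit split is an assumption inferred from the "less than 4 billion" remark, not taken from the Ceph source, and the function names are hypothetical; the example is in Python for brevity.

```python
from typing import Tuple

SEQ_BITS = 32                    # assumed width of the per-boot counter
SEQ_MASK = (1 << SEQ_BITS) - 1


def make_stat_seq(up_from: int, boot_seq: int) -> int:
    """Pack the up_from epoch (high bits) and per-boot seq (low bits)."""
    if not 0 <= boot_seq <= SEQ_MASK:
        raise ValueError("per-boot seq must fit in 32 bits")
    return (up_from << SEQ_BITS) | boot_seq


def split_stat_seq(value: int) -> Tuple[int, int]:
    """Recover (up_from, boot_seq) from a packed value."""
    return value >> SEQ_BITS, value & SEQ_MASK
```

Because `up_from` occupies the high bits, a value produced after a restart (larger `up_from`, counter reset to zero) still compares greater than any value from the previous boot, as long as the counter never overflows its 32 bits.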
Sage Weil
38ddb686b6 osd: include up_from, seq in osd_stat_t
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:44 -04:00
Sage Weil
3c96f2f7ce mon/PGMonitor: clear PGMap data when require_luminous is set
Once the OSDMap flag is set there is no going back. Zero out the on-disk
PGMap data, and clear the in-memory PGMap to free up memory and make
bugs easier to spot.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:44 -04:00
Sage Weil
1e3f1fcc82 kv/RocksDBStore: make rmkeys_by_prefix efficient
This matches what rm_range_keys does.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:44 -04:00
Sage Weil
1e9c95ac97 mon/OSDMonitor: limit number of concurrently creating pgs
There is overhead for PGs we are creating because the mon has to track
which OSD each one currently maps to.  This can be problematic on a very
large cluster.  Limit the overhead by setting a cap on the number of PGs
we are creating at once; leave the rest in a persistent queue.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:44 -04:00
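
The cap-plus-queue idea from 1e9c95ac97 can be illustrated with a toy model. This Python sketch is not the OSDMonitor code: the class name `CreatingPGs`, its methods, and the `max_creating` parameter are hypothetical, and the queue here lives only in memory, whereas the commit describes a persistent one.

```python
from collections import deque


class CreatingPGs:
    """Track at most max_creating PGs at once; park the rest in a queue."""

    def __init__(self, max_creating):
        self.max_creating = max_creating
        self.queue = deque()   # PGs waiting for their turn (FIFO)
        self.creating = set()  # PGs currently being created and tracked

    def enqueue(self, pgids):
        self.queue.extend(pgids)
        self._fill()

    def created(self, pgid):
        # A finished PG frees a slot for the next queued one.
        self.creating.discard(pgid)
        self._fill()

    def _fill(self):
        while self.queue and len(self.creating) < self.max_creating:
            self.creating.add(self.queue.popleft())
```

The tracking overhead is then bounded by `max_creating` regardless of how many PGs a new pool requests.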
Sage Weil
960a8d6f6d mon/MgrStatMonitor: fix digest vs pending digest
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:43 -04:00
Sage Weil
5a700a4699 mon/OSDMonitor: fix bad update_pending_pgs call
We are not persisting the updated creating_pgs here; this is bad!  I'm
not sure why it was there to begin with?

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:43 -04:00
Sage Weil
58bf4a88fa mon/PGMonitor: disable when REQUIRE_LUMINOUS is set
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:43 -04:00
Kefu Chai
62d1960cb9 test: pass mon_pg_warn_min_per_osd=3 to mgr also
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:43 -04:00
Kefu Chai
3109da269f mon: move dump_info() to PGStatService
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:43 -04:00
Kefu Chai
063ad5aa2d mon: more constness to PGStatService
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:42 -04:00
Kefu Chai
53c96f2260 mon/OSDMonitor: do not reference pgservice if quorum not formed
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:42 -04:00
Sage Weil
a20c5e3cfe test/cephtool-test-*: enable mgr
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:42 -04:00
Sage Weil
a2fdede8ff mon/PGMap: add to mempool
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:42 -04:00
Sage Weil
66bf355811 mon/PGMap: use auto
I'm not rewriting this to use range iterator syntax because I'm in a
hurry. This just lets me change the types without touching all this code
again.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:39 -04:00