Commit Graph

84652 Commits

Author SHA1 Message Date
Sage Weil
0766f5b40c osd: do not release recovery_ops_reserved on requeue
This doesn't make sense, although it's the same behavior as
luminous.

The point of the releases here is that if we drop something that is in
the queue we drop the recovery_ops_reserved counter by that much.  However,
if something is in the queue and waiting, and we wake it back up, there
is no net change to _reserved... which is only decremented when we
actually dequeue something.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
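A minimal sketch of the accounting described in 0766f5b40c, assuming a toy queue rather than the OSD's real work queue. `RecoveryQueueSketch`, `Item`, and every method name are hypothetical, and the increment in `enqueue` is an assumption the commit message does not state. The point it illustrates: dropping or dequeueing a queued item releases its reservation, while waking an item that is already queued leaves the counter alone.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the recovery queue; not Ceph's actual types.
struct RecoveryQueueSketch {
  struct Item {
    int pg_id;
    std::size_t reserved_pushes;
  };

  std::deque<Item> queue;
  std::size_t recovery_ops_reserved = 0;

  // Assumption: queueing new work reserves ops for it.
  void enqueue(int pg_id, std::size_t reserved_pushes) {
    queue.push_back({pg_id, reserved_pushes});
    recovery_ops_reserved += reserved_pushes;
  }

  // Dropping queued work releases whatever it had reserved.
  void drop(int pg_id) {
    for (auto it = queue.begin(); it != queue.end();) {
      if (it->pg_id == pg_id) {
        assert(recovery_ops_reserved >= it->reserved_pushes);
        recovery_ops_reserved -= it->reserved_pushes;
        it = queue.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Waking an item that is still queued only moves it: no net change.
  void requeue_front(int pg_id) {
    for (auto it = queue.begin(); it != queue.end(); ++it) {
      if (it->pg_id == pg_id) {
        Item item = *it;
        queue.erase(it);
        queue.push_front(item);
        return;
      }
    }
  }

  // Dequeueing is the normal path that decrements the counter.
  bool dequeue(Item* out) {
    if (queue.empty()) return false;
    *out = queue.front();
    queue.pop_front();
    assert(recovery_ops_reserved >= out->reserved_pushes);
    recovery_ops_reserved -= out->reserved_pushes;
    return true;
  }
};
```

Releasing the reservation in `requeue_front` as well would count the same item twice once it is finally dequeued, which is the bug class the commit message describes.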
Sage Weil
987490db3d osd: debug recovery_ops_reserved
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
765e16e04e osd: move PG peering waiters into op wq
This resolves problems with a peering event being delivered triggering
advance_pg which triggers a requeue of waiting events that are requeued
*behind* the event we are processing.  It also reduces the number of
wait lists by one, yay!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
da47654c70 osd: store ec profile with final pool
We need this to reinstantiate semi-deleted ec backends.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
c5b3913919 osd/PG: ignore RecoveryDone in ReplicaActive too
This can be missed on a RepRecovering -> RepNotRecovering ->
RepWaitBackfillReserved transition.  Catch any straggler events in
ReplicaActive.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
68d89616cb osd/osd_types: include epoch_sent in pg_query_t operator<<
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
cea30e9e9e osd: restructure pg waiting more
Wait by epoch.  This is less kludgey than before!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
f1f0d30c35 osd: restructure pg waiting
Rethink the way we wait for PGs.  We need to order peering events relative to
each other; keep them in a separate queue in the pg_slot.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
c20251b949 osd: normal command uses slow dispatch (it can send messages)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
560956572e osd/OSD,PG: get_osdmap()->get_epoch() -> get_osdmap_epoch()
Avoid wrangling shared_ptr!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
065829dc11 osd: misc fixes
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
3a331c8be2 osd: kill disk_tp, recovery_gen_wq
Progress!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
e5c336851c osd: move recovery contexts to normal wq
We have a specific PGRecoveryContext type/event--even though we are just
calling a GenContext--so that we can distinguish the event type properly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
26938d54d4 osd: remove _lookup_lock_pg_with_map_lock_held()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
60ea5e87b6 osd: new MOSDScrub2 message with spg_t, fast dispatch
Send new message to mimic+ OSDs.  Fast dispatch it at the OSD.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
62f79cae1b osd/PG: request scrub via a state machine event
Continuing effort to make PG interactions event based.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
fe5e361467 osd: use peering events for forced recovery
The mgr code is updated to send spg_t's instead of pg_t's (and is slightly
refactored/cleaned).

The PG events are added to the Primary state, unless we're also in the
Clean substate, in which case they are ignored.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
bd8d198c07 osd/OSDMap: get_primary_shard() variant that returns primary *and* shard
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
ae210722b4 osd: prime pg_slots for to-be-split children
Once we know which PGs are about to be created, we instantiate their
pg_slot and mark them waiting_pg, which blocks all incoming events until
the split completes, the PG is installed, and we call wake_pg_waiters().

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
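The priming step in ae210722b4 could be pictured roughly as below. This is a sketch under stated assumptions, not the OSD code: `PgSlotSketch`, `ShardSketch`, and the method names are hypothetical, PGs are keyed by a plain `int`, and events are strings. Only `waiting_pg` and the idea of waking waiters after the PG is installed come from the commit message.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical slot: holds events that arrive before the PG exists.
struct PgSlotSketch {
  bool waiting_pg = false;
  bool has_pg = false;
  std::deque<std::string> waiting;
};

struct ShardSketch {
  std::map<int, PgSlotSketch> slots;
  std::vector<std::string> delivered;

  // Called once we know a split will create this child PG.
  void prime_split_child(int child) { slots[child].waiting_pg = true; }

  // Events for a primed slot are parked instead of being processed.
  // Simplification: anything else is delivered immediately.
  void queue_event(int pg, const std::string& ev) {
    PgSlotSketch& slot = slots[pg];
    if (slot.waiting_pg) {
      slot.waiting.push_back(ev);
      return;
    }
    delivered.push_back(ev);
  }

  // Split finished: install the PG, then release parked events in order.
  void install_and_wake(int child) {
    PgSlotSketch& slot = slots[child];
    slot.has_pg = true;
    slot.waiting_pg = false;
    while (!slot.waiting.empty()) {
      delivered.push_back(slot.waiting.front());
      slot.waiting.pop_front();
    }
  }
};
```

Parking events in the slot keeps their arrival order intact, so nothing reaches the child PG before the split has completed.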
Sage Weil
718b6baa3c osd: remove obsolete slow dispatch path for most messages
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
025ca7e1e3 osd: fast dispatch M[Mon]Command
These just get dumped onto a work queue.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
f325b7dbc7 osd: fast dispatch ping
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
6710d34872 mon/OSDMonitor: send MOSDPGCreate2 to mimic+ osds
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
974896a819 osd: handle MOSDPGCreate2 messages (fast dispatch!)
Add a new MOSDPGCreate2 message that sends the spg_t (not just pg_t) and
includes only the info we need.  Fast dispatch it.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
b4fcd6d702 osd/OSDMapMapping: a getter that returns a spg_t
Note whether a pool is erasure so that we can generate an appropriate
spg_t for a mapping.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
39945d299e osd: send pg creations through normal pg queue
Queue a null event tagged with create_info, eliminating the special
legacy path.

These are still not fast dispatch because we need an spg (not pg) to queue
an event, and we need a current osdmap in order to calculate that.  That
isn't possible/a good idea in fast dispatch.  In a subsequent patch we'll
create a new pg create message that includes the correct information and
can be fast dispatched, allowing this path to die off post-nautilus.

Also, improve things so that we ack the pg creation only after the PG has
gone active, meaning it is fully replicated (by at least min_size PGs).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
294bc5d631 osd: fix max pg check for peer events
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
9ab2400109 osd: use atomic for pg_map_size
This avoids the need for pg_map_lock in the max pg check.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
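A minimal sketch of the idea in 9ab2400109, assuming the count is mirrored into a `std::atomic` whenever the map changes. `PgCountSketch`, `max_pgs`, and the method names are hypothetical; only `pg_map_size` is named in the commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical holder for the PG count; not the OSD class itself.
struct PgCountSketch {
  std::atomic<std::size_t> pg_map_size{0};
  std::size_t max_pgs;  // hypothetical limit, fixed at construction

  explicit PgCountSketch(std::size_t max) : max_pgs(max) {}

  // Assumed to be called by whoever mutates the map (under its own lock).
  void on_pg_added() { ++pg_map_size; }
  void on_pg_removed() {
    assert(pg_map_size.load() > 0);
    --pg_map_size;
  }

  // The max-PG check reads the counter without taking any map lock.
  bool at_max() const { return pg_map_size.load() >= max_pgs; }
};
```

The check can race with a concurrent add or remove, so in this sketch it is a best-effort limit rather than an exact one.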
Sage Weil
828060749a osd/PGPeeringEvent: note mon- vs peer-initiated pg creates
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
Sage Weil
9dc71e653a osd: fast dispatch peering events (part 2)
This actually puts the remaining peering events into fast dispatch.  The
only remaining event is the pg create from the mon.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
Sage Weil
2284e133af osd: fast dispatch peering events (part 1)
This is a big commit that lays out the infrastructure changes to fast
dispatch the remaining peering events.  It's hard to separate it all out
so this probably doesn't quite build; it's just easier to review as a
separate patch.

- lock ordering for pg_map has changed:
  before:
    OSD::pg_map_lock
      PG::lock
        ShardData::lock

  after:
    PG::lock
      ShardData::lock
        OSD::pg_map_lock

- queue items are now annotated with whether they can proceed without a
pg at all (e.g., query) or can instantiate a pg (e.g., notify log etc).

- There is some wonkiness around getting the initial Initialize event to
a newly-created PG.  I don't love it but it gets the job done for now.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
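The new lock ordering listed in 2284e133af can be sketched with three plain mutexes. `LockOrderSketch` and `touch_pg_map` are hypothetical names, and the real locks are not necessarily `std::mutex`; the sketch only shows the acquisition order PG::lock, then ShardData::lock, then OSD::pg_map_lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for the three locks named in the commit message.
struct LockOrderSketch {
  std::mutex pg_lock;      // stands in for PG::lock
  std::mutex shard_lock;   // stands in for ShardData::lock
  std::mutex pg_map_lock;  // stands in for OSD::pg_map_lock
  std::vector<std::string> acquired;

  // Takes the locks outermost-first in the "after" order.
  void touch_pg_map() {
    acquired.clear();
    std::lock_guard<std::mutex> pg(pg_lock);
    acquired.push_back("PG::lock");
    std::lock_guard<std::mutex> shard(shard_lock);
    acquired.push_back("ShardData::lock");
    std::lock_guard<std::mutex> map(pg_map_lock);
    acquired.push_back("OSD::pg_map_lock");
    assert(acquired.size() == 3);
    // Guards release in reverse order when they go out of scope.
  }
};
```

Any code path that took `pg_map_lock` first and then a PG lock would invert this order and could deadlock against a path like `touch_pg_map`.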
Sage Weil
ac142c3cc0 osd: queue null events without PG lock
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
f9aea5da93 osd: move part of wake_pg_waiters into helper
We'll need this shortly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
8e8c7cce1f osd: use MTrim peering event for trimming
This is simpler and cleaner than handling log trimming as a special case.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
cf5cd222ce osd: fast dispatch backfill and recovery reservation events
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
25da186ab8 osd: move M{Backfill,Recovery}Reserve event logic into message
Better encapsulation!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
3b904547fb messages/MOSDPeeringOp: add
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
aea80d9afb osd/PG: move peering event type out of PG class
We will create these directly from peering Messages shortly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
beb8dd5b1a osd/PG: keep epoch, not map ref, of last osdmap for last persisted epoch
No need to pin the map in memory!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
a6fef5a61b osd/PG: remove old update_store_on_load()
This isn't needed post-luminous.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
643253c326 Merge tag 'v13.0.2'
v13.0.2
2018-04-03 10:08:22 -05:00
Patrick Donnelly
0186795496 Merge PR #21180 into master
* refs/pull/21180/head:
	vstart_runner: examine check_status before error

Reviewed-by: John Spray <john.spray@redhat.com>
2018-04-03 06:51:18 -07:00
Jason Dillaman
3b08c0609c Merge pull request #20460 from colletj/v1_image_creation_disallow
librbd: disallow creation of v1 image format

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-03 09:18:39 -04:00
Jason Dillaman
98bae81f17 Merge pull request #21202 from tchaikov/wip-rbd-replay
rbd-replay: remove boost dependency

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-03 09:14:39 -04:00
Jason Dillaman
ccb2646a03 Merge pull request #21142 from dragonylffly/wip-fix-ebusy
rbd-nbd: fix ebusy when do map

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-03 07:39:15 -04:00
Jason Dillaman
de03571aa7 Merge pull request #21157 from trociny/wip-23526
journal: limit number of appends sent in one librados op 

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-03 07:38:40 -04:00
Kefu Chai
440c597df3 rbd-replay: remove boost dependency
Quite a few facilities are available in the standard library now that we
have switched to C++17.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-04-03 16:30:47 +08:00
Patrick Donnelly
8b7892f6c9 Merge PR #20855 into master
* refs/pull/20855/head:
	client: add the fuse parameter max_write

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-02 21:15:20 -07:00
Patrick Donnelly
9d8037f8da Merge PR #21096 into master
* refs/pull/21096/head:
	pybind/cephfs: added comments to cephfs.pyx

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-02 21:10:56 -07:00
Yuri Weinstein
9b7b5a7673 Merge pull request #21183 from neha-ojha/wip-minor-fix-perf-suite
qa/suites/perf-basic: add desc regarding test machines

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2018-04-02 13:56:08 -07:00