the strategy for stop relies on the fact that process_request() is
completely synchronous, so that io_context.stop() would still complete
each request and clean up properly
to tolerate an asynchronous process_request(), we instead need to drain
all outstanding work on the io_context so that io_context.run() can
return control naturally to all of the worker threads. that would allow
us to suspend our coroutine in the middle of process_request(), and
still guarantee that process_request() will resume and run to completion
before the worker threads exit
each connected socket also counts as outstanding work, and needs to be
closed in order to drain the io_context. each connection now adds itself
to a connection list so that stop() can close its socket
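a minimal sketch of the connection-list idea (hypothetical names; the real code tracks boost::asio sockets, and close() stands in for cancelling the socket's pending i/o): stop() closes every registered connection, which drains the outstanding work so run() can return

```cpp
#include <cassert>
#include <list>
#include <mutex>

struct Connection {
  bool open = true;
  void close() { open = false; }  // stands in for socket.close()
};

class ConnectionList {
  std::mutex mtx;
  std::list<Connection*> conns;
public:
  // called by each accepted connection before handling requests
  std::list<Connection*>::iterator add(Connection* c) {
    std::lock_guard<std::mutex> l(mtx);
    return conns.insert(conns.end(), c);
  }
  // called when a connection finishes on its own
  void remove(std::list<Connection*>::iterator i) {
    std::lock_guard<std::mutex> l(mtx);
    conns.erase(i);
  }
  // called by stop(): closing every socket cancels its pending i/o,
  // so the io_context runs out of work and run() returns naturally
  void close_all() {
    std::lock_guard<std::mutex> l(mtx);
    for (auto* c : conns)
      c->close();
  }
};
```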
Signed-off-by: Casey Bodley <cbodley@redhat.com>
the strategy for pause relied on stopping the io_context and waiting for
io_context.run() to return control to all of the worker threads. this
relied on the fact that process_request() is completely synchronous (so
it was considered a single unit of work in the io_context) - otherwise, pause
could complete in the middle of a call to process_request(), and destroy
the RGWRados instance while it's still in use
calling io_context.stop() to pause the worker threads also assumes that
no other work will be scheduled on these threads
to decouple pause from worker threads, handle_connection() now uses an
async shared mutex to synchronize with pause/unpause
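a simplified, synchronous sketch of the idea using std::shared_mutex (hypothetical names; the actual change uses an asio-based async shared mutex so coroutines suspend instead of blocking a thread): each request runs under a shared lock, and pause() takes the lock exclusively, so it waits for in-flight requests and holds off new ones until unpause()

```cpp
#include <cassert>
#include <shared_mutex>

struct PauseGuard {
  std::shared_mutex mtx;

  // each request runs under a shared lock, so any number of
  // requests can proceed concurrently while unpaused
  template <typename Fn>
  void run_request(Fn&& fn) {
    std::shared_lock lock(mtx);
    fn();  // stands in for process_request()
  }
  // pause() takes the lock exclusively: it waits for in-flight
  // requests to finish, and new requests wait until unpause()
  void pause() { mtx.lock(); }
  void unpause() { mtx.unlock(); }
};
```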
Signed-off-by: Casey Bodley <cbodley@redhat.com>
This was there just to confirm that this path was exercised by the
rados suite (it is, several hits per rados run of 1/666).
Signed-off-by: Sage Weil <sage@redhat.com>
The class's osdmap may be updated while we are in our loop. Pass it in
explicitly instead.
Fixes: http://tracker.ceph.com/issues/26970
Signed-off-by: Sage Weil <sage@redhat.com>
Deletion involves an awkward dance between the pg lock and shard locks,
while the merge prep and tracking is "shard down". If the delete has
finished its work we may find that a merge has since been prepped.
Unwinding the merge tracking is nontrivial, especially because it might
involve a second PG, possibly even a fabricated placeholder one. Instead,
if we delete and find that a merge is coming, undo our deletion and let
things play out in the future map epoch.
Signed-off-by: Sage Weil <sage@redhat.com>
The point of premerge is to ensure that the constituent parts of the
target PG are fully clean. If there is an intervening PG migration and
one of the halves finishes migrating before the other, one half could
get removed and the final merge could result in an incomplete PG. In the
worst case, the two halves (let's call them A and B) could have started
out together on say [0,1,2], A moves to [3,4,5] and gets deleted from
[0,1,2], and then the final merge happens such that *all* copies of the PG
are incomplete.
We could construct a clever check that does allow removal of strays when
the sibling PG is also ready to go, but it would be complicated. Do the
simple thing. In reality, this would be an extremely hard case to hit
because the premerge window is generally very short.
Signed-off-by: Sage Weil <sage@redhat.com>
We can't hold osd_lock while blocking because other objectstore completions
need to take osd_lock (e.g., _committed_osd_maps), and those objectstore
completions need to complete in order to finish_splits. Move the blocking
to the top before we establish any local state in this stack frame since
both the public and cluster dispatchers may race in handle_osd_map and
we are dropping and retaking osd_lock.
Signed-off-by: Sage Weil <sage@redhat.com>
We can't safely block in _committed_osd_maps because we are being run
by the store's finisher threads, and we may have to wait for a PG to split
and then merge via that same queue and deadlock.
Do not hold osd_lock while waiting as this can interfere with *other*
objectstore completions that take osd_lock.
Signed-off-by: Sage Weil <sage@redhat.com>
Ensure that we bring split children up to date to the latest map even in
the absence of new OSDMaps feeding in NullEvts. This is important when
the handle_osd_map (or boot) thread is blocked waiting for pgs to catch
up, but we also need a newly-split child to catch up (perhaps so that it
can merge).
Signed-off-by: Sage Weil <sage@redhat.com>
This probably used to protect the pg registration; there is no need for
it now.
More importantly, having it here can cause a deadlock when we are holding
osd_lock and blocking on wait_min_pg_epoch(), because a PG may need to
finish splitting to advance and then merge with a peer. (The wait won't
block on *this* PG since it isn't registered in the shard yet, but it
will block on the merge peer.)
Signed-off-by: Sage Weil <sage@redhat.com>
The problem is:
- osd is at epoch 80
- import pg 1.a as of e57
- 1.a and 1.1a merged in epoch 60something
- we set up a merge now, but in should_restart_peering via advance_pg we
hit the is_split assert that the ps is < old_pg_num
We can meaningfully return false (this is not a split) for a pg that is
beyond pg_num.
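A simplified model of the fix (hypothetical standalone function; the real check lives on pg_t and handles non-power-of-two pg_num): a pg whose seed is already at or beyond the old pg_num cannot be the parent of a split, so return false instead of asserting.

```cpp
#include <cassert>
#include <cstdint>

// Simplified split check: a parent pg at seed `ps` splits going from
// old_pg_num to new_pg_num if it gains at least one child.
bool is_split(uint32_t ps, uint32_t old_pg_num, uint32_t new_pg_num) {
  if (ps >= old_pg_num)
    return false;  // pg is beyond pg_num (e.g. imported pre-merge): not a split
  // in this simplified model, the first child of ps would be ps + old_pg_num
  return ps + old_pg_num < new_pg_num;
}
```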
Signed-off-by: Sage Weil <sage@redhat.com>
We currently import a portion of the PG if it has split. Merge is more
complicated, though, mainly because COT is operating in a mode where it
fast-forwards the PG to the latest OSDMap epoch, which means it has to
implement any transformations to the PG (split/merge) independently.
Avoid doing this for merge.
Signed-off-by: Sage Weil <sage@redhat.com>
Optionally bounce pg_num back up right after we decrease it. This triggers
conditions in the OSD where the merge and split logic may conflict.
Signed-off-by: Sage Weil <sage@redhat.com>
- Revamps the split tracking infrastructure, and adds new tracking for
upcoming merges in consume_map. These are now unified into the same
identify_ method; they consume the new pg_num change tracking
infrastructure we just added in the prior commit.
- PGs that are about to merge have a new wait infrastructure, since all
sources and the target have to reach the target epoch before the merge
can happen.
- If one of the sources for a merge does not exist, we create an empty
dummy PG to merge with. This implies that the resulting merged PG will
be incomplete (and mostly useless), but it unifies the code paths.
- The actual merge (PG::merge_from) happens in advance_pg().
Fixes: http://tracker.ceph.com/issues/85
Signed-off-by: Sage Weil <sage@redhat.com>
This is the building block that smooshes multiple PGs back into one. The
resulting combination PG will have no PG log. That means the sources
need to be clean and quiesced or else the result will end up being
marked incomplete.
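A sketch of the source-to-target mapping behind this (hypothetical helper, assuming pg_num is halved from a power of two; the general mapping also covers non-power-of-two pg_num): sources s and s + new_pg_num combine into target s, e.g. 1.a and 1.1a become 1.a when pg_num drops from 32 to 16.

```cpp
#include <cassert>
#include <cstdint>

// For a power-of-two halving, a pg seed at or beyond the new pg_num
// folds back onto its sibling; seeds below it are merge targets.
uint32_t merge_target(uint32_t ps, uint32_t new_pg_num) {
  return ps >= new_pg_num ? ps - new_pg_num : ps;
}
```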
Signed-off-by: Sage Weil <sage@redhat.com>
If the user asks to reduce pg_num, reduce pg_num_target too at the same
time.
Don't completely hide pgp_num yet (by increasing it when pg_num_target
increases).
Signed-off-by: Sage Weil <sage@redhat.com>
Configure how many initial PGs we create a pool with. If the user wants
more than this then we do subsequent splits.
Default to 1024, so that pool creation works in the usual way for most users,
but does some splitting for very large pools/clusters.
Signed-off-by: Sage Weil <sage@redhat.com>
In order to recreate a lost PG, we need to set the CREATING flag for the
pool. This prevents pg_num from changing in future OSDMap epochs until
*after* the PG has successfully been instantiated.
Note that a pg_num change in *this* epoch is fine; the recreated PG will
instantiate in *this* epoch, which is /after/ the split that a pg_num
change in this epoch would describe.
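A minimal sketch of the guard this flag provides (hypothetical names; the real flag lives on pg_pool_t and is checked by the mon): while CREATING is set, proposed pg_num changes are rejected, so the recreated PG cannot race with a split in a later epoch.

```cpp
#include <cassert>
#include <cstdint>

struct Pool {
  uint32_t pg_num;
  bool creating;  // stands in for the pool's CREATING flag
};

// Reject pg_num changes until the recreated PG has instantiated.
bool apply_pg_num_change(Pool& p, uint32_t new_pg_num) {
  if (p.creating)
    return false;  // deferred to a future epoch
  p.pg_num = new_pg_num;
  return true;
}
```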
Signed-off-by: Sage Weil <sage@redhat.com>
The new sharded wq implementation cannot handle a resent mon create
message and a split child already existing. This is a side effect of the
new pg create path instantiating the PG at the pool create epoch osdmap
and letting it roll forward through splits; the mon may be resending a
create for a pg that was already created elsewhere and split elsewhere,
such that one of those split children has peered back onto this same OSD.
When we roll forward our re-created empty parent it may split and find the
child already exists, crashing.
This is no longer a concern because the mgr-based controller for pg_num
will not split PGs until after the initial PGs are all created. (We
know this because the pool has the CREATED flag set.)
The old-style path had its own problem,
http://tracker.ceph.com/issues/22165. We would build the history and
instantiate the pg in the latest osdmap epoch, ignoring any split children
that should have been created between the pool create epoch and the
current epoch. Since we're now taking the new path, that is no longer
a problem.
Fixes: http://tracker.ceph.com/issues/22165
Signed-off-by: Sage Weil <sage@redhat.com>
This will force pre-nautilus clients to resend ops when we are adjusting
pg_num_pending. This is a big hammer: for nautilus+ clients, we only have
an interval change for the affected PGs (the two PGs that are about to
merge), whereas this compat hack will do an op resend for the whole pool.
However, it is better than requiring all clients be upgraded to nautilus in
order to do PG merges.
Note that we already do the same thing for pre-luminous clients for
splits, so we've already inflicted similar pain in the past (and, to my
knowledge, have not seen any negative feedback or fallout from that).
Signed-off-by: Sage Weil <sage@redhat.com>
We only process mon-initiated PG creates while the pool is in CREATING
mode. This ensures that we will not have any racing split or merge
operations.
Signed-off-by: Sage Weil <sage@redhat.com>
This is more reliable than looking at PG states because the PG may have
gone active and sent a notification to the mon (pg created!) and mgr
(new state!) but the mon may not have persisted that information yet.
Signed-off-by: Sage Weil <sage@redhat.com>
Set the flag when the pool is created, and clear it when the initial set
of PGs have been created by the mon. Move the update_creating_pgs()
block so that we can process the pgid removal from the creating list and
the pool flag removal in the same epoch; otherwise we might remove the
pgid but have no cluster activity to roll over another osdmap epoch to
allow the pool flag to be removed.
Signed-off-by: Sage Weil <sage@redhat.com>
Previously, we renamed the old last_force_resend to
last_force_resend_preluminous and created a new last_force_resend for
luminous+. This allowed us to force preluminous clients to resend ops
(because they didn't understand the new pg split => new interval rule)
without affecting luminous clients.
Do the same rename again, adding a last_force_resend_prenautilus (luminous
or mimic).
Adjust the OSD code accordingly so it matches the behavior we'll see from
a luminous client.
Signed-off-by: Sage Weil <sage@redhat.com>