If the user asks to reduce pg_num, reduce pg_num_target too at the same
time.
Don't completely hide pgp_num yet (by increasing it when pg_num_target
increases).
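A minimal sketch of the first point, with a stand-in struct (field names
follow pg_pool_t; the real change lives in the monitor's pool-set path):

    // Stand-in for pg_pool_t; only the two fields of interest.
    struct Pool {
      unsigned pg_num = 0;
      unsigned pg_num_target = 0;
    };

    void user_reduce_pg_num(Pool& p, unsigned n) {
      p.pg_num = n;
      // Pull pg_num_target down with it so the target can never sit
      // above the raw value.
      if (p.pg_num_target > n)
        p.pg_num_target = n;
    }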
Signed-off-by: Sage Weil <sage@redhat.com>
Configure how many initial PGs we create a pool with. If the user wants
more than this then we do subsequent splits.
Default to 1024, so that pool creation works in the usual way for most users,
but does some splitting for very large pools/clusters.
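Assuming the option is spelled mon_osd_max_initial_pgs (treat the exact
name as illustrative), the limit can be tuned from ceph.conf:

    [mon]
        # Pools are created with at most this many PGs (default 1024);
        # a larger requested pg_num is reached via subsequent splits.
        mon osd max initial pgs = 1024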
Signed-off-by: Sage Weil <sage@redhat.com>
In order to recreate a lost PG, we need to set the CREATING flag for the
pool. This prevents pg_num from changing in future OSDMap epochs until
*after* the PG has successfully been instantiated.
Note that a pg_num change in *this* epoch is fine; the recreated PG will
instantiate in *this* epoch, which is /after/ any split that a pg_num
change in this epoch would describe.
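A sketch of the ordering, with stand-in names (the real code is the
mon's force-create-pg path):

    #include <cstdint>

    struct Pool {
      uint64_t flags = 0;
      static constexpr uint64_t FLAG_CREATING = 1ull << 0;  // assumed bit
    };

    void force_create_pg(Pool& pool /*, pgid, pending map, ... */) {
      // Freeze pg_num for *future* epochs until the PG instantiates.
      pool.flags |= Pool::FLAG_CREATING;
      // ... queue the PG for creation in the current epoch.  A pg_num
      // change in this same epoch remains safe: the PG instantiates in
      // this epoch, after any split that change would describe.
    }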
Signed-off-by: Sage Weil <sage@redhat.com>
The new sharded wq implementation cannot handle a resent mon create
message and a split child already existing. This is a side effect of the
new pg create path instantiating the PG at the pool create epoch osdmap
and letting it roll forward through splits; the mon may be resending a
create for a pg that was already created elsewhere and split elsewhere,
such that one of those split children has peered back onto this same OSD.
When we roll forward our re-created empty parent it may split and find the
child already exists, crashing.
This is no longer a concern because the mgr-based controller for pg_num
will not split PGs until after the initial PGs are all created. (We
know this because the pool's CREATING flag is cleared only once the
initial PGs have all been created.)
The old-style path had its own problem
http://tracker.ceph.com/issues/22165. We would build the history and
instantiate the pg in the latest osdmap epoch, ignoring any split children
that should have been created between the pool create epoch and the
current epoch. Since we're now taking the new path, that is no longer
a problem.
Fixes: http://tracker.ceph.com/issues/22165
Signed-off-by: Sage Weil <sage@redhat.com>
This will force pre-nautilus clients to resend ops when we are adjusting
pg_num_pending. This is a big hammer: for nautilus+ clients, we only have
an interval change for the affected PGs (the two PGs that are about to
merge), whereas this compat hack will do an op resend for the whole pool.
However, it is better than requiring all clients be upgraded to nautilus in
order to do PG merges.
Note that we already do the same thing for pre-luminous clients for
splits, so we've already inflicted similar pain in the past (and, to my
knowledge, have not seen any negative feedback or fallout from that).
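Roughly, on the monitor side (stand-in types; the epoch field is the one
introduced by the rename described in a later message below):

    #include <cstdint>

    using epoch_t = uint32_t;
    struct Pool {
      unsigned pg_num_pending = 0;
      epoch_t last_force_resend_prenautilus = 0;
    };

    // Any pg_num_pending change bumps the epoch that pre-nautilus
    // clients compare to decide whether to resend: a whole-pool resend
    // for them, while nautilus+ clients just see an interval change on
    // the two merging PGs.
    void set_pg_num_pending(Pool& p, unsigned n, epoch_t pending_epoch) {
      if (p.pg_num_pending != n) {
        p.pg_num_pending = n;
        p.last_force_resend_prenautilus = pending_epoch;
      }
    }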
Signed-off-by: Sage Weil <sage@redhat.com>
We only process mon-initiated PG creates while the pool is in CREATING
mode. This ensures that we will not have any racing split or merge
operations.
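In sketch form (stand-in names):

    #include <cstdint>

    struct Pool {
      uint64_t flags = 0;
      static constexpr uint64_t FLAG_CREATING = 1ull << 0;  // assumed bit
    };

    // Honor mon-initiated creates only while the pool is in CREATING
    // mode; afterwards any PG we need can only come from a split, so
    // there is no create-vs-split/merge race to handle.
    bool should_process_mon_create(const Pool& pool) {
      return pool.flags & Pool::FLAG_CREATING;
    }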
Signed-off-by: Sage Weil <sage@redhat.com>
This is more reliable than looking at PG states because the PG may have
gone active and sent a notification to the mon (pg created!) and mgr
(new state!) but the mon may not have persisted that information yet.
Signed-off-by: Sage Weil <sage@redhat.com>
Set the flag when the pool is created, and clear it when the initial set
of PGs has been created by the mon. Move the update_creating_pgs()
block so that we can process the pgid removal from the creating list and
the pool flag removal in the same epoch; otherwise we might remove the
pgid but have no cluster activity to roll over another osdmap epoch to
allow the pool flag to be removed.
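The intended ordering within one proposal, sketched with stand-in types:

    #include <cstdint>
    #include <vector>

    struct Pool { uint64_t flags = 0; int creating_pgs = 0; };
    constexpr uint64_t FLAG_CREATING = 1ull << 0;  // assumed bit

    // Process creates first (stand-in for update_creating_pgs()), which
    // may retire the last creating pgid for a pool; then clear that
    // pool's flag in the *same* pending map, i.e. the same epoch.
    void propose(std::vector<Pool>& pools) {
      for (auto& p : pools) {
        if (p.creating_pgs > 0)
          --p.creating_pgs;
        if (p.creating_pgs == 0)
          p.flags &= ~FLAG_CREATING;  // same epoch as the last removal
      }
    }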
Signed-off-by: Sage Weil <sage@redhat.com>
Previously, we renamed the old last_force_resend to
last_force_resend_preluminous and created a new last_force_resend for
luminous+. This allowed us to force preluminous clients to resend ops
(because they didn't understand the new pg split => new interval rule)
without affecting luminous clients.
Do the same rename again, adding a last_force_resend_prenautilus (luminous
or mimic).
Adjust the OSD code accordingly so it matches the behavior we'll see from
a luminous client.
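The resulting per-pool fields (names as used in this series, shown as a
fragment with a stand-in epoch type):

    #include <cstdint>

    using epoch_t = uint32_t;

    struct PoolResendEpochs {  // illustrative fragment of the pool type
      epoch_t last_force_resend;              // honored by nautilus+ clients
      epoch_t last_force_resend_prenautilus;  // luminous/mimic clients
      epoch_t last_force_resend_preluminous;  // pre-luminous clients
    };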
Signed-off-by: Sage Weil <sage@redhat.com>
When merging two logs, we throw out all of the actual log entries.
However, we need to convert them to dup ops as appropriate, and merge
those together. Reuse the trim code to do this.
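Conceptually, with stand-in types (the real code routes this through the
existing trim machinery so entry-to-dup conversion lives in one place):

    #include <deque>

    struct Entry { /* reqid, version, ... */ };
    struct Dup   { /* reduced record of a completed op */ };
    Dup to_dup(const Entry&) { return {}; }  // what trim already produces

    void merge_log_into(std::deque<Entry>& from, std::deque<Dup>& dups) {
      for (auto& e : from)
        dups.push_back(to_dup(e));  // keep dup ops, not the entries
      from.clear();                 // the actual log entries are dropped
      // ... then trim the combined dup list to its usual bounds
    }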
Signed-off-by: Sage Weil <sage@redhat.com>
When a PG is in the pending merge state, its ps is >= pg_num_pending and
< pg_num. When this happens, we quiesce IO, peer, wait for activate to
commit, and then notify the mon that we are idle and safe to merge.
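The membership test itself is simple (sketch):

    // A PG whose ps (its id within the pool) lands in
    // [pg_num_pending, pg_num) is a pending merge source.
    bool is_pending_merge(unsigned ps, unsigned pg_num_pending,
                          unsigned pg_num) {
      return ps >= pg_num_pending && ps < pg_num;
    }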
Signed-off-by: Sage Weil <sage@redhat.com>
This is a pretty trivial controller. It adds some constraints that were
obviously not there before when the user could set these values to anything
they wanted, but does not implement all of the "nice" stepping that we'll
eventually want. That can come later.
Splits:
- throttle pg_num increases, currently using the same config option
(mon_osd_max_creating_pgs) that we used to throttle pg creation
- do not increase pg_num until the initial pg creation has completed.
Merges:
- wait until the source and target pgs for merge are active and clean
before doing a merge.
Adjust pgp_num all at once for now.
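A sketch of the decision logic under those constraints (stand-in fields
and helpers; mon_osd_max_creating_pgs is the existing option named
above):

    #include <algorithm>

    struct Pool {
      unsigned pg_num = 0, pg_num_target = 0;
      bool creating = false;                 // initial creates in flight
      bool merge_pgs_active_clean = false;   // sources + target ready
    };

    unsigned next_pg_num(const Pool& p, unsigned creating_pgs,
                         unsigned max_creating) {
      if (p.pg_num_target > p.pg_num) {      // split wanted
        if (p.creating)
          return p.pg_num;                   // wait for initial creates
        unsigned room =
          max_creating > creating_pgs ? max_creating - creating_pgs : 0;
        return std::min(p.pg_num_target, p.pg_num + room);
      }
      if (p.pg_num_target < p.pg_num &&      // merge wanted
          p.merge_pgs_active_clean)
        return p.pg_num - 1;                 // no "nice" stepping yet
      return p.pg_num;
    }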
Signed-off-by: Sage Weil <sage@redhat.com>
The CLI now sets the *_target values, imposing only the subset of constraints that
the user needs to be concerned with.
new "pg_num_actual" and "pgp_num_actual" properties/commands are added that allow
the underlying raw values to be adjusted. For the merge case, this sets
pg_num_pending instead of pg_num so that the OSDs can go through the
merge prep process.
A controller (in a future commit) will make pg[p]_num converge to pg[p]_num_target.
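For example (pool name illustrative):

    ceph osd pool set foo pg_num 64         # sets pg_num_target
    ceph osd pool set foo pg_num_actual 64  # raw value; a decrease goes
                                            # through pg_num_pending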
Signed-off-by: Sage Weil <sage@redhat.com>
We need to make sure the deferred writes on the source collection finish
before the merge so that ops ordered via the final target sequencer will
occur after those writes.
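In outline (hypothetical helper names; the real code uses BlueStore's
existing deferred-write plumbing):

    struct Sequencer { void drain_deferred() {} };  // stand-in

    void merge_collections(Sequencer& source_osr /*, source, target */) {
      source_osr.drain_deferred();  // hypothetical: wait out deferred IO
      // ... perform the merge; ops ordered via the final target
      // sequencer now occur after those writes ...
    }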
Signed-off-by: Sage Weil <sage@redhat.com>
We try to attach an old osr at prepare_new_collection time, but that
happens before a transaction is submitted, and we might have a
transaction that removes and then recreates a collection.
Move the logic to _osr_attach and extend it to include reusing an osr
in use by a collection already in coll_map. Also adjust the
_osr_register_zombie method to behave correctly if the osr is already
registered, which can happen with a remove, create, remove+create
transaction sequence.
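The reuse order, sketched with stand-in types (coll_map keyed by cid,
plus a registry of zombie osrs):

    #include <map>
    #include <memory>
    #include <string>

    struct OpSequencer {};
    using OsrRef = std::shared_ptr<OpSequencer>;
    struct Collection { OsrRef osr; };

    OsrRef osr_attach(const std::string& cid,
                      std::map<std::string, Collection>& coll_map,
                      std::map<std::string, OsrRef>& zombies) {
      if (auto c = coll_map.find(cid); c != coll_map.end())
        return c->second.osr;            // reuse: collection still live
      if (auto z = zombies.find(cid); z != zombies.end()) {
        OsrRef osr = z->second;          // reuse: removed, not yet reaped
        zombies.erase(z);
        return osr;
      }
      return std::make_shared<OpSequencer>();  // otherwise a fresh osr
    }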
Fixes: https://tracker.ceph.com/issues/25180
Signed-off-by: Sage Weil <sage@redhat.com>
Merging is a bit different than splitting, because the two collections
may already be hashed at different levels. Since lookup etc. rely on the
idea that the object is always at the deepest level of hashing, if you
merge collections with different levels that share some common bit prefix
then some objects will end up higher up the hierarchy even though deeper
hashed directories exist.
Signed-off-by: Sage Weil <sage@redhat.com>
Don't restart processing the error_repo until error_retry_time. When
data sync is otherwise idle, don't sleep past error_retry_time.
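A sketch of the resulting wakeup computation (stand-in names):

    #include <algorithm>
    #include <chrono>

    using Clock = std::chrono::steady_clock;

    // When sync is otherwise idle, sleep no further than
    // error_retry_time; the error_repo is not reprocessed before then.
    Clock::duration next_wait(Clock::time_point now,
                              Clock::time_point error_retry_time,
                              Clock::duration idle_wait) {
      if (error_retry_time <= now)
        return Clock::duration::zero();  // retries are due now
      return std::min(idle_wait, error_retry_time - now);
    }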
Fixes: http://tracker.ceph.com/issues/26938
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Each of these errors has already been logged at a lower level with a
more detailed error message. By logging them as ERRORs at level 0 here,
the messages could easily be mistaken for separate failures.
Fixes: http://tracker.ceph.com/issues/35830
Signed-off-by: Casey Bodley <cbodley@redhat.com>