Commit Graph

347 Commits

Author SHA1 Message Date
Xie Xingguo
f28f35f79b Merge pull request #30591 from xiexingguo/wip-balancer-throttler
mgr/balancer: upmap_max_iterations -> upmap_max_optimizations; behave as if it is per pool

Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-01 07:42:25 +08:00
xie xingguo
d7ea56f3df mgr/balancer: upmap_max_iterations -> upmap_max_optimizations
With osd_calc_pg_upmaps_aggressively enabled, we might have to iterate
hundreds or thousands of times to find an optimization,
but upmap_max_optimizations remains a hard limit on the
total number of optimizations that can be returned.
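To illustrate the difference between an iteration budget and an optimization cap, here is a minimal Python sketch. The function `plan_upmaps` and its `try_optimize` callback are hypothetical stand-ins, not the balancer's actual code; the only point shown is that the search loop may run many times while the number of returned changes is bounded by `max_optimizations`.

```python
from typing import Callable, List, Optional


def plan_upmaps(
    try_optimize: Callable[[int], Optional[str]],
    max_iterations: int,
    max_optimizations: int,
) -> List[str]:
    """Collect at most max_optimizations changes, trying up to max_iterations times.

    try_optimize(i) is a hypothetical callback that returns a description of
    one optimization, or None if iteration i found nothing useful.
    """
    found: List[str] = []
    for i in range(max_iterations):
        # Hard limit on what is returned, regardless of remaining iterations.
        if len(found) >= max_optimizations:
            break
        change = try_optimize(i)
        if change is not None:
            found.append(change)
    return found
```

For example, with a callback that only succeeds on every 100th attempt, `plan_upmaps(lambda i: f"move-{i}" if i % 100 == 0 else None, 10000, 10)` stops after ten results even though thousands of iterations were allowed.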

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-09-28 08:31:45 +08:00
Sage Weil
dff5697464 Merge PR #30525 into master
* refs/pull/30525/head:
	qa/tasks/ceph.conf.template: disable power-of-2 warning
	qa/standalone/mon/health-mute: use power of 2 for pg_num
	osd/OSDMap: remove remaining g_conf() usage
	PendingReleaseNotes: add note for 14.2.5 so we can backport this
	osd/OSDMap: health alert for non-power-of-two pg_num

Reviewed-by: Kai Wagner <kwagner@suse.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-09-27 12:08:37 -05:00
Sage Weil
f1f8e9a520 PendingReleaseNotes: add note for 14.2.5 so we can backport this
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-24 09:26:45 -05:00
Sage Weil
6e46b1c0e5 osd/OSDMap: health alert for non-power-of-two pg_num
Fixes: https://tracker.ceph.com/issues/41647
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-24 09:26:33 -05:00
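The commit subject names the check without describing it. As a minimal sketch (not the OSDMap implementation), a pool's pg_num can be tested for being a power of two with a single bit trick; the helper names and the dict mapping pool name to pg_num are assumptions for illustration only.

```python
def is_power_of_two(n: int) -> bool:
    # A positive power of two has exactly one bit set, so clearing the
    # lowest set bit with n & (n - 1) leaves zero.
    return n > 0 and (n & (n - 1)) == 0


def pools_with_non_power_of_two_pg_num(pools: dict) -> list:
    # pools maps a pool name to its pg_num (hypothetical input shape).
    return sorted(name for name, pg_num in pools.items() if not is_power_of_two(pg_num))
```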
Patrick Donnelly
30909f5a6a Merge PR #29824 into master
* refs/pull/29824/head:
	qa: whitelist new FS_INLINE_DATA_DEPRECATED health warning
	mds: add a HEALTH_WARN message when inline_data is enabled
	mds: log a warning message when mds is started on an fs with inline_data
	mon: deprecate CephFS inline_data support

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
2019-09-24 04:32:28 -07:00
Sage Weil
e1569a18c3 common: default pg_autoscale_mode=on for new pools
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-22 16:58:33 -05:00
Jeff Layton
9c406d0ab3 mon: deprecate CephFS inline_data support
The plan is to start deprecating this feature now so that we can remove
it in a future release. Change it to require the
--yes-i-really-really-mean-it flag, and to emit a custom
warning when that isn't specified.

For now, we leave the testing in place since we do want to be notified
if something breaks before we're ready to rip it out completely.

Fixes: https://tracker.ceph.com/issues/41311
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2019-09-19 09:15:13 -04:00
David Zafman
dd2782c15a Revert "common: default pg_autoscale_mode=on for new pools"
This reverts commit 91e4fc24e7.

Fixes: https://tracker.ceph.com/issues/41900

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-17 14:51:08 -07:00
Kefu Chai
7c1b53dc49 Merge pull request #30350 from liewegas/wip-bluestore-poolstats-configs
os/bluestore: simplify per-pool-stat config options

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2019-09-16 19:37:27 +08:00
Sage Weil
7e96024254 os/bluestore: simplify per-pool-stat config options
The previous bluestore_no_per_pool_stats_tolerance had a lot of possible
values, not all of which make sense to users.  Replace with a single new
option, bluestore_fsck_error_on_no_per_pool_stats, which controls whether
the lack of per-pool stats is an error or a warning.  On repair, we will
unconditionally convert to per-pool stats.

This brings us in sync with the newer
bluestore_fsck_error_on_no_per_pool_omap.

Note that one part of the ceph_test_objectstore test is dropped since it
is no longer possible to create a store with legacy stats.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-12 13:25:46 -05:00
Sage Weil
91e4fc24e7 common: default pg_autoscale_mode=on for new pools
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-12 10:42:36 -05:00
David Zafman
b3e1c58b0e osd: Replace active/pending scrub tracking for local/remote
This is similar to how recovery reservations are split between
local and remote.

Previously, scrubs_pending was used for reservations at the replicas
as well as at the primary while it requested reservations from the
replicas.  There was no need for scrubs_pending to turn into
scrubs_active at the primary, as nothing treated that value as
special; scrubber.active = true when scrubbing is actually in
progress.

Now scrubber.local_reserved indicates that scrubs_local was incremented.
Now scrubber.remote_reserved indicates that scrubs_remote was incremented.
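The local/remote split can be modelled with two counters on the OSD and two flags per PG. This is a hypothetical Python sketch, not the OSD's C++ code; the class and method names are invented, and it assumes a single limit (`max_scrubs`) shared by local and remote reservations.

```python
class ScrubReservations:
    """Hypothetical per-OSD scrub reservation counters.

    Assumes one limit, max_scrubs, shared by local and remote reservations.
    """

    def __init__(self, max_scrubs: int) -> None:
        self.max_scrubs = max_scrubs
        self.scrubs_local = 0   # held by this OSD as primary
        self.scrubs_remote = 0  # granted to other primaries (we are a replica)

    def _has_room(self) -> bool:
        return self.scrubs_local + self.scrubs_remote < self.max_scrubs

    def inc_local(self) -> bool:
        if not self._has_room():
            return False
        self.scrubs_local += 1
        return True

    def inc_remote(self) -> bool:
        if not self._has_room():
            return False
        self.scrubs_remote += 1
        return True

    def dec_local(self) -> None:
        assert self.scrubs_local > 0
        self.scrubs_local -= 1

    def dec_remote(self) -> None:
        assert self.scrubs_remote > 0
        self.scrubs_remote -= 1


class PgScrubber:
    """Per-PG flags recording which counter this PG incremented."""

    def __init__(self, osd: ScrubReservations) -> None:
        self.osd = osd
        self.local_reserved = False
        self.remote_reserved = False

    def reserve_local(self) -> bool:
        if not self.local_reserved:
            self.local_reserved = self.osd.inc_local()
        return self.local_reserved

    def reserve_remote(self) -> bool:
        if not self.remote_reserved:
            self.remote_reserved = self.osd.inc_remote()
        return self.remote_reserved

    def release(self) -> None:
        # Undo only the increments this PG recorded, and nothing else.
        if self.local_reserved:
            self.osd.dec_local()
            self.local_reserved = False
        if self.remote_reserved:
            self.osd.dec_remote()
            self.remote_reserved = False
```

Because each PG records which counter it incremented, `release()` decrements exactly that counter and cannot underflow the other one.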

Fixes: https://tracker.ceph.com/issues/41669

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:33:27 -07:00
David Zafman
8dffd68365 osd: Add dump_scrub_reservations to show scrub reserve tracking
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:32:29 -07:00
David Zafman
b98950e707 osd: Rename dump_reservations to dump_recovery_reservations
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:32:29 -07:00
Yan Jun
aed94cf203 PendingReleaseNotes: note about changes in ceph osd erasure-code-profile set
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2019-09-09 20:02:08 +08:00
David Zafman
5f83a6158b osd doc mon mgr: To milliseconds for config value, user input and threshold out
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-04 17:13:32 +00:00
David Zafman
f4a0be2e87 doc: Add documentation and release notes
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-26 15:25:34 +00:00
Casey Bodley
f0575a7144 Merge pull request #26787 from soumyakoduri/bucket_name_validation
[rgw]: Validate bucket names as per revised s3 spec

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2019-08-16 10:53:54 -04:00
Soumya Koduri
eb6eddbe8d Validate bucket names as per revised s3 spec
As per the Amazon S3 spec:
https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html

* S3 bucket names must not contain uppercase letters or underscores.
* Names cannot end with a dash, contain consecutive periods, or have
  dashes adjacent to periods.
* Each label in the bucket name must start and end with a lowercase
  letter or a number.
* Names cannot exceed 63 characters.

This change enforces these rules when rgw_relaxed_s3_bucket_names is set to
'false', which is the default.
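The rules listed above can be checked with one regular expression per period-separated label. This is a minimal Python sketch, not RGW's C++ implementation; it covers only the rules in this commit message and deliberately leaves out other AWS constraints, such as the 3-character minimum.

```python
import re

# Each label: lowercase letters, digits and dashes, starting and ending
# with a lowercase letter or digit.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_bucket_name(name: str) -> bool:
    if not name or len(name) > 63:
        return False
    # Splitting on "." also rejects consecutive periods (empty label),
    # dashes adjacent to periods, and a trailing dash.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

Treating the name as a sequence of labels lets a single pattern enforce the start/end rule, which in turn covers the dash and period restrictions without separate checks.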

Fixes: https://tracker.ceph.com/issues/36293

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2019-08-08 16:54:12 +05:30
Sage Weil
3e7c185bd4 mon: make mon summary more concise in 'ceph -s'
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-05 22:02:31 -05:00
Sage Weil
9e0916160d mon/MgrMap: make print_summary (used by 'ceph -s') more concise
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-05 09:20:12 -05:00
Joao Eduardo Luis
ff080391c1 PendingReleaseNotes: add ceph osd info
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2019-07-15 17:13:37 +00:00
Sage Weil
35c0d75888 osd: add hdd and ssd variants for osd_recovery_max_active
Semi-arbitrarily set the SSD max to 10 (instead of 3).  This should be
tuned based on some real data.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-06-20 16:24:51 -05:00
Sage Weil
1f73631e75 PendingReleaseNotes: note about change to ISO 8601 throughout
Signed-off-by: Sage Weil <sage@redhat.com>
2019-05-29 14:12:15 -05:00
Dmitriy Rabotjagov
c336882cf6 mgr/zabbix: Fix raw_bytes_used key name
This patch fixes the raw_bytes_used key, which was renamed to stored_raw.
It also adds the percent_used key and fixes the Zabbix template to be fully
compatible with Zabbix 3.0.

Fixes: https://tracker.ceph.com/issues/39644
Signed-off-by: Dmitriy Rabotjagov <noonedeadpunk@ya.ru>
2019-05-28 13:10:49 +03:00
Wido den Hollander
0ec7bc491d mgr/zabbix: Fix typo in key name for PGs in backfill_wait state
Fixes: http://tracker.ceph.com/issues/39666

Signed-off-by: Wido den Hollander <wido@42on.com>
2019-05-14 07:45:41 +02:00
J. Eric Ivancich
ce1df3ceef Merge pull request #27870 from theanalyst/rgw-objexp-fixes-cli
rgw: object expirer fixes

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2019-05-11 09:10:54 -04:00
Abhishek Lekshmanan
ef8329f203 doc: add a note on reshard object expirer fixes in PendingReleaseNotes
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
2019-05-09 20:51:50 +02:00
Sage Weil
3ec66c2f74 PendingReleaseNotes: 14.2.1 note on crush required version
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-17 08:51:29 -05:00
Jason Dillaman
136848d27f librbd: enable the simple IO scheduler by default
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2019-04-11 12:47:00 -04:00
Jason Dillaman
ec99eeeb41 librbd: switch to write-around cache policy by default
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2019-04-11 12:47:00 -04:00
Kefu Chai
77c5ee0631 PendingReleaseNotes: note on python3.6 changes
Fixes: http://tracker.ceph.com/issues/39164
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-10 18:11:00 +08:00
Adam C. Emerson
55180511e5 rgw: Remove rgw_num_rados_handles option
This has been deprecated for some time and underlies much of the
complexity of the RADOS service.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2019-03-21 15:13:56 -04:00
Sage Weil
fdf75b2d22 doc/releases/nautilus: draft notes
Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-25 08:39:23 -06:00
Patrick Donnelly
c0b3a11484 mds: simplify recall warnings
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.

Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.
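A decaying counter replaces the timeout with a value that shrinks on its own. The following is a minimal Python sketch under stated assumptions: it is not Ceph's C++ DecayCounter, the half-life and threshold values are arbitrary, the time is passed in explicitly, and `RecallTracker` is a hypothetical name for the warning logic described above.

```python
import math


class DecayCounter:
    """Value that decays exponentially with a fixed half-life (seconds)."""

    def __init__(self, half_life: float) -> None:
        self.rate = math.log(2) / half_life
        self.value = 0.0
        self.last = 0.0

    def _decay(self, now: float) -> None:
        # Apply the decay accumulated since the last update.
        self.value *= math.exp(-self.rate * (now - self.last))
        self.last = now

    def adjust(self, now: float, delta: float) -> None:
        # Add (or subtract, with a negative delta) after decaying; never below zero.
        self._decay(now)
        self.value = max(0.0, self.value + delta)

    def get(self, now: float) -> float:
        self._decay(now)
        return self.value


class RecallTracker:
    """Warn when recalled-but-unreleased caps stay above a threshold."""

    def __init__(self, half_life: float, threshold: float) -> None:
        self.recalled = DecayCounter(half_life)
        self.threshold = threshold

    def on_recall(self, now: float, caps: int) -> None:
        self.recalled.adjust(now, caps)

    def on_release(self, now: float, caps: int) -> None:
        # Released caps count against outstanding recalls.
        self.recalled.adjust(now, -caps)

    def should_warn(self, now: float) -> bool:
        return self.recalled.get(now) > self.threshold
```

With a 60 s half-life, 5000 recalled caps decay to 625 after 180 s, so the warning clears without any explicit timeout; releasing caps clears it sooner.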

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-31 12:07:54 -08:00
Patrick Donnelly
48ca097a9f mds: limit maximum number of caps held by session
This is to prevent unsustainable situations where a client has so many
outstanding caps that a linear traversal/operation on the session's caps takes
unacceptable amounts of time.

Fixes: http://tracker.ceph.com/issues/38022
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-29 15:16:31 -08:00
Patrick Donnelly
ef46216d8d mds: recall caps incrementally
As with trimming, use DecayCounters to throttle the number of caps we recall,
both globally and per-session.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-29 15:16:30 -08:00
Patrick Donnelly
7bf2f31abf mds: add throttle for trimming MDCache
This is necessary when the MDS cache size decreases by a significant amount,
for example when stopping a large MDS or when the operator makes a large cache
size reduction.

Fixes: http://tracker.ceph.com/issues/37723

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-29 15:16:30 -08:00
David Zafman
3e6ff119e2 Merge pull request #25112 from dzafman/wip-scrub-warning
scrub warning check incorrectly uses mon scrub interval

Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-01-28 10:46:18 -08:00
David Zafman
6a9895b97a mon: Fix scrub health warning handling and change config to a ratio
Make this mon_warn code clearer, since it involves two values
The code used the mon scrub interval instead of the pg scrub interval
Rename config values to include _pg_ and ratio to make them clearer
Fix scrub warning handling to use per-pool intervals when specified

Fixes: http://tracker.ceph.com/issues/37264

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-01-23 16:49:33 -08:00
Patrick Donnelly
7fa1e3c37f mds: remove cache drop asok command
`cache drop` is a long running command that will block the asok interface
(while the tell version does not). Attempting to abort the command with ^C or
equivalents will simply cause the `ceph` command to exit but won't stop the
asok command handler from waiting for the cache drop operation to complete.

Instead, just allow the tell version.

Fixes: http://tracker.ceph.com/issues/38020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-23 06:48:58 -08:00
Patrick Donnelly
4c49f165ec MDSMonitor: add fs fail command
This command sets the fs as not joinable and fails all ranks. This is a simpler
command than the typical sequence: (a) set fs not joinable; (b) iterate through
and fail ranks. It also does this in a single FSMap update.

Fixes: http://tracker.ceph.com/issues/37085

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-15 14:22:48 -08:00
Yan, Zheng
b593e5a881 tools/cephfs: make 'cephfs-data-scan scan_links' update snaptable
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2019-01-07 16:50:04 +08:00
Yan, Zheng
01089652d3 tools/cephfs: make 'cephfs-data-scan scan_links' update inotable
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2019-01-07 16:49:31 +08:00
Sage Weil
251f667ef8 Merge PR #25009 into master
* refs/pull/25009/head:
	librbd: stringify locker name with get_legacy_str()
	osdc/Objecter: fix list_watchers addr rendering to match legacy
	test/crimson: disable unittest_seastar_messenger test
	msg/msg_types: encode entity_addr_t TYPE_ANY as TYPE_LEGACY for pre-nautilus
	client: make blacklist detection handle TYPE_ANY entries
	mon/OSDMonitor: maintain compat output for 'blacklist ls'
	client: maintain compat for {inst,addr}_str in status dump
	qa/tasks/ceph_manager: compare osd flush seq #'s as ints
	qa/suites/fs: make use of simple.yaml where appropriate
	qa/msgr: move msgr facet into generic re-usable dir
	crimson: fix monmap build for seastar
	doc/start/ceph.conf: trim the sample ceph.conf file
	doc/rados/operations: only describe --public-{addr,network} method for adding mons
	PendingReleaseNotes: deprecate 'mon addr'
	doc: fix some 'mon addr' references
	doc/rados/configuration: fix some 'mon addr' references
	doc/rados/configuration/network-config-ref: revise network docs somewhat
	doc/rados/configuration/network-config-ref: remove totally obsolete section
	qa/suites/rados: replace mon_seesaw.py task with a small bash script
	qa/suites/fs/upgrade: don't bind to v2 addrs
	qa/tasks/mon_thrash: avoid 'mon addr' in mon section
	mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set
	osd/OSDMap: maintain compat addr fields
	msg/msg_types: add get_legacy_str()
	mds/MDSMap.h: maintain compat addr field
	mon/MgrMap: maintain compat active_addr field
	mon/MonClient: reconnect to mon if its addrvec appears to have changed
	qa/tasks/ceph.conf.template: increase mon_mgr_mkfs_grace
	msg/async/ProtocolV2: fill in IP for all peer_addrs
	msg/async: print all addrs on debug lines
	mon/MonMap: no noname- mon name prefix when for_mkfs
	ceph-monstore-tool: print initial monmap
	msg/async/ProtocolV2: advertise ourselves as a v2 addr when using v2 protocol
	msg/async: assert existing protocol matches current protocol
	msg/async: add missing modelines
	mon/MonMap: add missing modeline
	vstart.sh: put mon addrs in mon_host, not 'mon addr'
	msg/async: better debug around conn map lookups and updates
	mon/MonClient: dump initial monmap at debug level 10
	qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
	qa/tasks/ceph: set initial monmap features when using addrvec addrs
	monmaptool: add --enable-all-features option
	qa/tasks/ceph: only use monmaptool --addv if addr has [,:v]
	qa/tasks/ceph_manager: make get_mon_status use mon addr
	qa/tasks/ceph: keep mon addrs in ctx namespace
	mon/OSDMonitor: log all osd addrs on boot
	msg/simple: behave when v2 and v1 addrs are present at target
	mon/MonClient: warn if global_id changes
	msg/Connection: add warning/note on get_peer_global_id
	mds/MDSDaemon: clean up handle_mds_map debug output a bit
	qa/suites/rados/upgrade: debug mds
	mds/MDSRank: improve is_stale_message to handle addrvecs
	msg/async: make loopback detect when sending to one of our many addrs
	qa/suites/rados/upgrade: no aggressive pg num changes
	mon/OSDMonitor: require nautilus mons for require_osd_release=nautilus
	mon/OSDMonitor: require mimic mons for require_osd_release=mimic
	qa/suites/rados/thrash-old-clients: use legacy addr syntax in ceph.conf
	msg/async: preserve peer features when replacing a connection
	qa/tasks/ceph.py: move methods from teuthology.git into ceph.py directly; support mon bind * options
	mon/MonMap: adjust build_initial behavior for mkfs vs probe
	mon/MonMap: improve ambiguous addr behavior
	qa/suites/rados/upgrade: spread mons a bit
	qa/rados/thrash-old-clients: keep mons on separate hosts
	qa/standalone/mon/misc.sh: tweak test to be more robust
	qa/tasks/mon_seesaw: expect v1/v2 prefix in addr
	osd/OSDMap: fix is_blacklisted() check to assume type ANY
	mon/OSDMonitor: use ANY addr type for blacklisting
	mon/msg_types: TYPE_V1ORV2 -> TYPE_ANY
	qa/workunits/cephtool: fix blacklist test
	qa/suites/upgrade: install old version with only v1 addrs
	common/options: by default, bind to both msgr v1 and v2 addresses
	vstart.sh: add --msgr1, --msgr2, --msgr21 options
	msg/async/ProtocolV2: be flexible with server identity check
	msg/msg_types: fix entity_addrvec_t::parse() with null end arg
	qa/suites/rados/basic/msgr: no msgr2 addrs in initial monmaps
	qa/tasks/ceph: add 'mon_bind_addrvec' and 'mon_bind_msgr2' options
	monmaptool: add --addv argument to pass in addrvec directly
	qa/suites/rados/basic/msgr: do not use msgr2 with simplemessenger
	qa/suites/rados/basic/msgr: async is not experimental
	messages/MOSDBoot: fix compat with pre-nautilus
	mon/MonMap: allow v1 or v2 to be explicitly specified along with part
	msg/msg_types: allow parsing of IPs without assuming v1 vs v2
	msg/msg_types: default parse to v2 addrs
	msg: standardize on v1: and v2: prefixes for *all* entity_addr_t's
	vstart.sh: use msgr2 by default
	mon/MonMap: remove get_addr() methods
	ceph-mon: adjust startup/bind/join sequence to use addrs
	mon: use MonMap::get_addrs() (instead of get_addr())
	mon/MonClient: change pending_cons to addrvec-based map
	mon/MonMap: fix set_addr() caller, kill wrapper
	mon/MonMap: remove addr-based add()
	monmaptool: fix --add to do either legacy or msgr2+legacy
	monmaptool: clean up iterator use a bit
	mon/MonMap: handle ambiguous mon addrs by trying both legacy and msgr
	mon/MonMap: take addrvec for set_initial_members
	mon/MonMap: use addrvecs for test instances
	mon: pass addrvec via MMonJoin
	mon/MonmapMonitor: fix 'mon add' to populate addrvec
	mon/MonMap: addr -> addrvec
	msg/async/ProtocolV2: only update socket_addr if we learned our addr
	osd: go active even if mon only accepted our v1 addr
	test/msgr: add test for msgr2 protocol
	msg/async/ProtocolV2: share socket_addr and all addrs during handshake
	msg/async: print socket_addr for the connection
	msg/async: msgr2 protocol placeholder
	msg/async: move ProtocolV1 class to its own source file
	msg/async: keep listen addr in ServerSocket, pass to new connections
	msg/async/AsyncMessenger: fix set_addr_unknowns

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-01-04 13:42:09 -06:00
Sage Weil
ad39c086a0 PendingReleaseNotes: deprecate 'mon addr'
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Kefu Chai
0c643f8cea pybind/rgw: pass the flags to callback function
Before this change, the `flags` parameter passed to `LibRGWFS.readdir()`
was dropped on the floor and ignored.
After this change, it is passed to the specified callback function.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-01-03 19:00:28 +08:00
Sage Weil
31dd620883 mon: mon_osd_pool_ec_fast_read -> osd_pool_default_ec_fast_read
More consistent name!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-12-13 07:05:08 -06:00
Sage Weil
4ba456484d ceph-create-keys: deprecate, print warning
We'll remove this post-nautilus or post-octopus, I guess?

Signed-off-by: Sage Weil <sage@redhat.com>
2018-12-13 04:23:53 -06:00