Add the option `mon_allow_pool_size_one`, which is disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they must set
`mon_allow_pool_size_one` to true and then pass the
`--yes-i-really-mean-it` flag to the CLI command.
Example (for a pool named `test`):
`ceph osd pool set test size 1 --yes-i-really-mean-it`
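The two-step guard described above can be modelled roughly as follows. This is a minimal sketch, not the monitor's code: the function name `check_pool_size` and the reason strings are hypothetical, and only the option name and the flag come from the text.

```python
def check_pool_size(size, mon_allow_pool_size_one=False,
                    yes_i_really_mean_it=False):
    """Return (allowed, reason) for a requested pool size.

    Hypothetical model of the guard: size 1 is accepted only when the
    config option is enabled AND the confirmation flag was passed.
    """
    if size != 1:
        return True, ""
    if not mon_allow_pool_size_one:
        return False, "mon_allow_pool_size_one is disabled"
    if not yes_i_really_mean_it:
        return False, "pass --yes-i-really-mean-it to confirm"
    return True, ""
```

With the defaults, `check_pool_size(1)` is rejected; both the option and the flag are needed before a size of 1 goes through.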
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
Use APIs instead of apis to be consistent throughout.
Fixes: https://tracker.ceph.com/issues/44374
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
Before this, "mds_join_fs" config enforced a preference for the standby
the monitors would select. Now the monitors actively enforce this
by purposefully removing an MDS with lower "affinity". An MDS standby
has highest affinity if its mds_join_fs is the file system in question
or a vanilla standby (no mds_join_fs).
Fixes: https://tracker.ceph.com/issues/43392
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
1GB is too low as a default and usually results in cache size warnings
at that size; the MDS will struggle to maintain such a small cache size
for most workloads.
Fixes: https://tracker.ceph.com/issues/43182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Remove last bits of support for 'mds_cache_size'.
'mds_cache_memory_limit' is preferred.
Fixes: https://tracker.ceph.com/issues/41951
Signed-off-by: Ramana Raja <rraja@redhat.com>
osd/OSDMap: Show health warning if a pool is configured with size 1
Reviewed-by: Sage Weil <sweil@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
0b369e1aff masked the original behaviour of '-o', which was to indicate
'outfile' as documented in the man page. Changing the object-size option
to a capital 'O' restores the original behaviour.
Fixes: https://tracker.ceph.com/issues/42477
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Introduce a config option called 'mon_warn_on_pool_no_redundancy' that is
used to show a health warning if any pool in the ceph cluster is
configured with a size of 1. The user can mute/unmute the warning using
'ceph health mute/unmute POOL_NO_REDUNDANCY'.
Add standalone test to verify warning on setting pool size=1. Set the
associated warning to 'false' in ceph.conf.template under qa/tasks so
that existing tests do not break.
Fixes: https://tracker.ceph.com/issues/41666
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Currently, if a client requests the 1000 next entries from a bucket,
each bucket index shard will receive a request for the 1000 next
entries. When there are hundreds, thousands, or tens of thousands of
bucket index shards, this results in a huge amplification of the
request, even though only 1000 entries will be returned.
These changes reduce the per-bucket index shard requests. These also
allow re-requests in edge cases where all of one shard's returned
entries are consumed. Finally these changes improve the determination
of whether the resulting list is truncated.
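The idea can be sketched as follows. This is an illustration under stated assumptions, not the RGW implementation: shards are modelled as sorted in-memory lists, each key lives in exactly one shard, and the per-shard request size (`ceil(n / shards) + 1`) is a hypothetical heuristic chosen for the example.

```python
import bisect
import heapq
import math


def list_bucket(shards, num_entries, marker=""):
    """Return (entries, truncated) for the next `num_entries` keys after
    `marker`, asking each shard for only a fraction of the total."""
    # Hypothetical heuristic: an even share plus a little slack.
    per_shard = max(1, math.ceil(num_entries / len(shards)) + 1)

    def fetch(shard, after):
        # Simulates one request to a bucket index shard.
        start = bisect.bisect_right(shard, after)
        chunk = shard[start:start + per_shard]
        more = start + per_shard < len(shard)
        return chunk, more

    state = []  # per shard: [chunk, position in chunk, shard has more]
    heap = []   # (next key, shard index) for shards with unread entries
    for i, shard in enumerate(shards):
        chunk, more = fetch(shard, marker)
        state.append([chunk, 0, more])
        if chunk:
            heapq.heappush(heap, (chunk[0], i))

    result = []
    while heap and len(result) < num_entries:
        key, i = heapq.heappop(heap)
        result.append(key)
        chunk, pos, more = state[i]
        pos += 1
        if pos == len(chunk) and more:
            # Edge case: this shard's chunk is used up, so re-request.
            chunk, more = fetch(shards[i], key)
            pos = 0
        state[i] = [chunk, pos, more]
        if pos < len(chunk):
            heapq.heappush(heap, (chunk[pos], i))

    # Every shard with remaining entries is on the heap, so a non-empty
    # heap means the listing is truncated.
    return result, bool(heap)
```

Because a shard is refetched as soon as its chunk is exhausted, the heap always holds one candidate for every shard that still has entries, which is what makes the truncation check exact in this model.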
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
For an entity a.b.c.d, search all dot-delineated prefix sections. This
enables you to establish a hierarchical set of options for clients, such
as radosgw daemons.
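A minimal sketch of the prefix search, assuming sections are held in a plain dictionary keyed by section name and that the most specific (longest) prefix wins. The helper names and the precedence rule are assumptions for illustration, not the config code itself.

```python
from __future__ import annotations


def entity_prefixes(entity: str) -> list[str]:
    """Return every dot-delineated prefix of `entity`, shortest first."""
    parts = entity.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def lookup_option(sections: dict[str, dict[str, str]],
                  entity: str, key: str) -> str | None:
    """Return the value from the most specific matching section, or None.

    Assumption: a longer prefix overrides a shorter one.
    """
    value = None
    for prefix in entity_prefixes(entity):
        if key in sections.get(prefix, {}):
            value = sections[prefix][key]
    return value
```

For the entity a.b.c.d this walks the sections a, a.b, a.b.c, and a.b.c.d in that order.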
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/30859/head:
auth: EACCES, not EPERM
mon: shunt old tell commands from cli interface to asok
mon: allow mgr to tell mon.foo smart
mon: include quorum features in quorum_status
qa/workunits/mon/caps.sh: fix test
ceph_test_rados_api_cmd: fix MonDescribe test
Merge branch 'vstart-fs-auth' of git://github.com/batrick/ceph into wip-cleanup-mon-asok
test/pybind/test_ceph_argparse: fix tests
vstart: add volume client keys to keyring
vstart: use fs authorize to create master client key
vstart: redirect some output to stderr
vstart: output command strings to stderr
qa/workunits/cephtool/test.sh: fix 'quorum enter' caller
qa: change mon_status calls to quorum_status or tell commands
mon: fix 'heap ...' command
mon: consolidate 'sync force' commands
mon: allow asok commands to return an error code
mon: move 'quorum enter|exit' and 'mon_status' to asok
mon: fix 'smart' asok command
mon: remove old 'config set' and 'injectargs'
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Replace the 'ceph [mon] sync force' commands and just use the asok
sync_force command instead. This is a low-level command that nobody should
reasonably be using except in an emergency, so do not bother with trying to
maintain compatibility; it's a bit ridiculous that we had 3 variations of
this to begin with!
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/30724/head:
mgr/telemetry: bump content revision and add a release note
telemetry/server: add device report endpoint
mgr/telemetry: include device telemetry
mgr/devicehealth: factor _get_device_metrics out of show_device_metrics
mgr/devicehealth: pull out MAX_SAMPLES
Reviewed-by: Dan Mick <dmick@redhat.com>
* refs/pull/30217/head:
crimson: common/admin_socket kludge so that it builds
mon/MonClient: fix sending mon command to a specific rank
src/.gitignore: ignore .tox
mon/MonClient: interpret numeric mon target name as rank
mgr,mgr/MgrClient: use fsid to signal mon-mgr vs cli MCommands
qa/workunits/cephtool: fix error checks for 'ceph daemon' commands
common/ceph_context: make 'config unset' idempotent
qa/tasks/dump_stuck: mon.a, not mon.0
qa/suites/rados/singleton/all/admin-socket: fix test
common/config: EPERM setting config option after startup
qa/workunits/cephtool/test.sh: fix tell output error check
common/admin_socket: pass Formatter from generic infrastructure
common/admin_socket: pass ostream to call() for error output
os/bluestore: fix asok hook return value
rgw: fix asok return value
common/ceph_context: return error code from asok commands
test/pybind/test_rados: fix accidental mon tell test
mon: print entity_name along with caps to debug log
PendingReleaseNotes: notes about asok changes
mgr/MgrClient: empty target string for 'tell' means active mgr
common/admin_socket: report error code as part of output string
osd: change trigger_[deep_]scrub commands to a pg tell command
osd: remove old command workqueue, threadpool
osd: drop MMonCommand handling
osdc/Objecter: resend OSD tell commands on EAGAIN
osd: route tell commands to asok; migrate commands
osd: use unique_ptr<Formatter> for asok_command
common/ceph_context: add generic asok 'injectargs'
common/admin_socket: allow dup prefixes
common/admin_socket: refactor with sync and async execute_command variants
common/admin_socket: pass input bufferlist
osd: transition to call_async() for asok
common/admin_socket: support alternative call_async()
mon/MonClient: send tell commands out of band via MCommand
mon: accept tell commands via MCommand and send them to asok handler
common/admin_socket: return int from hook call()
mgr/DaemonServer: route MCommand (for octopus+) to asok commands
do not use 'ceph tell mgr'
pybind/ceph_argparse: disambiguate mgr tell and CLI commands
ceph: make 'ceph tell mgr.*' send to the active mgr
ceph: send 'ceph tell mgr.X' to the right mgr
librados: add rados_mgr_command_target
mgr/MgrClient: add start_command variant that takes a target
common/admin_socket: drop unregister_command(); use per-hook variant
common/admin_socket: drop explicit prefix arg to register_command
common/admin_socket: simplify command routing
common/admin_socket: add ability to process MCommand via asok queue
common/admin_socket: pass cmdvec to execute_command
common/admin_socket: use pipe for general wakeup
include/compat: add flags arg to pipe_cloexec
common/admin_socket: drop unused args
Reviewed-by: Neha Ojha <nojha@redhat.com>
With osd_calc_pg_upmaps_aggressively on we might have to iterate
hundreds or thousands of times to figure out an optimization,
but upmap_max_optimizations will remain a hard limit for the
total optimizations that can be returned.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/30525/head:
qa/tasks/ceph.conf.template: disable power-of-2 warning
qa/standalone/mon/health-mute: use power of 2 for pg_num
osd/OSDMap: remove remaining g_conf() usage
PendingReleaseNotes: add note for 14.2.5 so we can backport this
osd/OSDMap: health alert for non-power-of-two pg_num
Reviewed-by: Kai Wagner <kwagner@suse.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/29824/head:
qa: whitelist new FS_INLINE_DATA_DEPRECATED health warning
mds: add a HEALTH_WARN message when inline_data is enabled
mds: log a warning message when mds is started on an fs with inline_data
mon: deprecate CephFS inline_data support
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
The plan is to start deprecating this feature now so that we can remove
it in a future release. Change it to require the
--yes-i-really-really-mean-it flag, and to emit a custom
warning when that isn't specified.
For now, we leave the testing in place since we do want to be notified
if something breaks before we're ready to rip it out completely.
Fixes: https://tracker.ceph.com/issues/41311
Signed-off-by: Jeff Layton <jlayton@redhat.com>
The previous bluestore_no_per_pool_stats_tolerance had a lot of possible
values, not all of which make sense to users. Replace with a single new
option, bluestore_fsck_error_on_no_per_pool_stats, which controls whether
the lack of per-pool stats is an error or a warning. On repair, we will
unconditionally convert to per-pool stats.
This brings us in sync with the newer
bluestore_fsck_error_on_no_per_pool_omap.
Note that one part of the ceph_test_objectstore test is dropped since it
is no longer possible to create a store with legacy stats.
Signed-off-by: Sage Weil <sage@redhat.com>
This is similar to how recovery reservations are split between
local and remote.
It was the case that scrubs_pending was used for reservations at
the replicas as well as at the primary while requesting reservations
from the replicas. There was no need for scrubs_pending to turn
into scrubs_active at the primary as nothing treated that value
as special. scrubber.active = true when scrubbing is
actually going.
Now scrubber.local_reserved indicates scrubs_local incremented.
Now scrubber.remote_reserved indicates scrubs_remote incremented.
Fixes: https://tracker.ceph.com/issues/41669
Signed-off-by: David Zafman <dzafman@redhat.com>
As per amazon s3 spec -
https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
* The s3 bucket names should not contain upper case letters or underscore.
* Name cannot end with dash or have consecutive periods, or dashes adjacent
to periods.
* Each label in the bucket name must start and end with a lowercase
letter or a number.
* Name cannot exceed 63 characters.
This change enforces these rules if rgw_relaxed_s3_bucket_names is set to
'false', which is the default.
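The rules listed above can be checked with a short validator. This is a sketch, not the RGW code: the function name is hypothetical, and it additionally assumes that only lowercase letters, digits, and dashes are allowed inside a label, which the list above does not state.

```python
import re

# One label: starts and ends with a lowercase letter or digit.
# Assumption: only [a-z0-9-] may appear inside a label.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_strict_bucket_name(name: str) -> bool:
    """Check a bucket name against the strict rules listed above."""
    if not name or len(name) > 63:
        return False
    # Splitting on '.' yields an empty label for consecutive periods, and a
    # label with a leading or trailing dash for a dash next to a period or
    # at the end of the name; all of these fail the per-label pattern.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

For example, `my-bucket.logs` passes, while `My_Bucket`, `bucket-`, `a..b`, and `a-.b` are rejected.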
Fixes: https://tracker.ceph.com/issues/36293
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
This patch fixes the raw_bytes_used key, which was renamed to stored_raw.
It also adds the percent_used key and fixes the zabbix template to be fully
compatible with zabbix 3.0.
Fixes: https://tracker.ceph.com/issues/39644
Signed-off-by: Dmitriy Rabotjagov <noonedeadpunk@ya.ru>
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.
Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.
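To illustrate the mechanism, here is a minimal sketch of an exponentially decaying counter. It is a simplified stand-in for the MDS DecayCounter, not its actual interface: the class layout, the explicit `now` argument, and the half-life and cap counts used below are all assumptions chosen for the example.

```python
import math


class DecayCounter:
    """Counter whose value halves every `half_life` seconds."""

    def __init__(self, half_life: float, now: float = 0.0):
        self.rate = math.log(0.5) / half_life  # negative decay rate
        self.value = 0.0
        self.last = now

    def _decay(self, now: float) -> None:
        elapsed = now - self.last
        if elapsed > 0:
            self.value *= math.exp(elapsed * self.rate)
            self.last = now

    def hit(self, now: float, amount: float = 1.0) -> float:
        """Decay to `now`, then add `amount` (negative to decrement)."""
        self._decay(now)
        self.value = max(0.0, self.value + amount)
        return self.value

    def get(self, now: float) -> float:
        self._decay(now)
        return self.value
```

In this model the recall path would call `hit(now, n)` for `n` recalled caps, the release path would call `hit(now, -n)`, and the warning would be raised when `get(now)` exceeds a threshold.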
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is to prevent unsustainable situations where a client has so many
outstanding caps that a linear traversal/operation on the session's caps takes
unacceptable amounts of time.
Fixes: http://tracker.ceph.com/issues/38022
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
As with trimming, use DecayCounters to throttle the number of caps we recall,
both globally and per-session.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is necessary when the MDS cache size decreases by a significant amount.
For example, when stopping a large MDS or when the operator makes a large cache
size reduction.
Fixes: http://tracker.ceph.com/issues/37723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Make this mon_warn code clearer since it involves 2 values.
The code used the mon scrub interval instead of the pg scrub interval.
Rename config values to include _pg_ and ratio to make it more clear.
Fix scrub warning handling to use per-pool intervals when specified.
Fixes: http://tracker.ceph.com/issues/37264
Signed-off-by: David Zafman <dzafman@redhat.com>
`cache drop` is a long running command that will block the asok interface
(while the tell version does not). Attempting to abort the command with ^C or
equivalents will simply cause the `ceph` command to exit but won't stop the
asok command handler from waiting for the cache drop operation to complete.
Instead, just allow the tell version.
Fixes: http://tracker.ceph.com/issues/38020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This command sets the fs as not joinable and fails all ranks. This is a simpler
command than the typical sequence: (a) set fs not joinable; (b) iterate through
and fail ranks. It also does this in a single FSMap update.
Fixes: http://tracker.ceph.com/issues/37085
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/25009/head:
librbd: stringify locker name with get_legacy_str()
osdc/Objecter: fix list_watchers addr rendering to match legacy
test/crimson: disable unittest_seastar_messenger test
msg/msg_types: encode entity_addr_t TYPE_ANY as TYPE_LEGACY for pre-nautilus
client: make blacklist detection handle TYPE_ANY entries
mon/OSDMonitor: maintain compat output for 'blacklist ls'
client: maintain compat for {inst,addr}_str in status dump
qa/tasks/ceph_manager: compare osd flush seq #'s as ints
qa/suites/fs: make use of simple.yaml where appropriate
qa/msgr: move msgr facet into generic re-usable dir
crimson: fix monmap build for seastar
doc/start/ceph.conf: trim the sample ceph.conf file
doc/rados/operations: only describe --public-{addr,network} method for adding mons
PendingReleaseNotes: deprecate 'mon addr'
doc: fix some 'mon addr' references
doc/rados/configuration: fix some 'mon addr' references
doc/rados/configuration/network-config-ref: revise network docs somewhat
doc/rados/configuration/network-config-ref: remove totally obsolete section
qa/suites/rados: replace mon_seesaw.py task with a small bash script
qa/suites/fs/upgrade: don't bind to v2 addrs
qa/tasks/mon_thrash: avoid 'mon addr' in mon section
mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set
osd/OSDMap: maintain compat addr fields
msg/msg_types: add get_legacy_str()
mds/MDSMap.h: maintain compat addr field
mon/MgrMap: maintain compat active_addr field
mon/MonClient: reconnect to mon if its addrvec appears to have changed
qa/tasks/ceph.conf.template: increase mon_mgr_mkfs_grace
msg/async/ProtocolV2: fill in IP for all peer_addrs
msg/async: print all addrs on debug lines
mon/MonMap: no noname- mon name prefix when for_mkfs
ceph-monstore-tool: print initial monmap
msg/async/ProtocolV2: advertise ourselves as a v2 addr when using v2 protocol
msg/async: assert existing protocol matches current protocol
msg/async: add missing modelines
mon/MonMap: add missing modeline
vstart.sh: put mon addrs in mon_host, not 'mon addr'
msg/async: better debug around conn map lookups and updates
mon/MonClient: dump initial monmap at debug level 10
qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
qa/tasks/ceph: set initial monmap features with using addrvec addrs
monmaptool: add --enable-all-features option
qa/tasks/ceph: only use monmaptool --addv if addr has [,:v]
qa/tasks/ceph_manager: make get_mon_status use mon addr
qa/tasks/ceph: keep mon addrs in ctx namespace
mon/OSDMonitor: log all osd addrs on boot
msg/simple: behave when v2 and v1 addrs are present at target
mon/MonClient: warn if global_id changes
msg/Connection: add warning/note on get_peer_global_id
mds/MDSDaemon: clean up handle_mds_map debug output a bit
qa/suites/rados/upgrade: debug mds
mds/MDSRank: improve is_stale_message to handle addrvecs
msg/async: make loopback detect when sending to one of our many addrs
qa/suites/rados/upgrade: no aggressive pg num changes
mon/OSDMonitor: require nautilus mons for require_osd_release=nautilus
mon/OSDMonitor: require mimic mons for require_osd_release=mimic
qa/suites/rados/thrash-old-clients: use legacy addr syntax in ceph.conf
msg/async: preserve peer features when replacing a connection
qa/tasks/ceph.py: move methods from teuthology.git into ceph.py directly; support mon bind * options
mon/MonMap: adjust build_initial behavior for mkfs vs probe
mon/MonMap: improve ambiguous addr behavior
qa/suites/rados/upgrade: spread mons a bit
qa/rados/thrash-old-clients: keep mons on separate hosts
qa/standalone/mon/misc.sh: tweak test to be more robust
qa/tasks/mon_seesaw: expect v1/v2 prefix in addr
osd/OSDMap: fix is_blacklisted() check to assume type ANY
mon/OSDMonitor: use ANY addr type for blacklisting
mon/msg_types: TYPE_V1ORV2 -> TYPE_ANY
qa/workunits/cephtool: fix blacklist test
qa/suites/upgrade: install old version with only v1 addrs
common/options: by default, bind to both msgr v1 and v2 addresses
vstart.sh: add --msgr1, --msgr2, --msgr21 options
msg/async/ProtocolV2: be flexible with server identity check
msg/msg_types: fix entity_addrvec_t::parse() with null end arg
qa/suites/rados/basic/msgr: no msgr2 addrs in initial monmaps
qa/tasks/ceph: add 'mon_bind_addrvec' and 'mon_bind_msgr2' options
monmaptool: add --addv argument to pass in addrvec directly
qa/suites/rados/basic/msgr: do not use msgr2 with simplemessenger
qa/suites/rados/basic/msgr: async is not experimental
messages/MOSDBoot: fix compat with pre-nautilus
mon/MonMap: allow v1 or v2 to be explicitly specified along with part
msg/msg_types: allow parsing of IPs without assuming v1 vs v2
msg/msg_types: default parse to v2 addrs
msg: standardize on v1: and v2: prefixes for *all* entity_addr_t's
vstart.sh: use msgr2 by default
mon/MonMap: remove get_addr() methods
ceph-mon: adjust startup/bind/join sequence to use addrs
mon: use MonMap::get_addrs() (instead of get_addr())
mon/MonClient: change pending_cons to addrvec-based map
mon/MonMap: fix set_addr() caller, kill wrapper
mon/MonMap: remove addr-based add()
monmaptool: fix --add to do either legacy or msgr2+legacy
monmaptool: clean up iterator use a bit
mon/MonMap: handle ambiguous mon addrs by trying both legacy and msgr
mon/MonMap: take addrvec for set_initial_members
mon/MonMap: use addrvecs for test instances
mon: pass addrvec via MMonJoin
mon/MonmapMonitor: fix 'mon add' to populate addrvec
mon/MonMap: addr -> addrvec
msg/async/ProtocolV2: only update socket_addr if we learned our addr
osd: go active even if mon only accepted our v1 addr
test/msgr: add test for msgr2 protocol
msg/async/ProtocolV2: share socket_addr and all addrs during handshake
msg/async: print socket_addr for the connection
msg/async: msgr2 protocol placeholder
msg/async: move ProtocolV1 class to its own source file
msg/async: keep listen addr in ServerSocket, pass to new connections
msg/async/AsyncMessenger: fix set_addr_unknowns
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Before this change, the `flags` parameter passed to `LibRGWFS.readdir()`
was dropped on the floor and ignored.
After this change, it is passed to the specified callback function.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Misplaced objects are not something that puts the health or safety of
data in jeopardy. Don't warn about it by default.
Since this is a change in behavior, add a release note.
Signed-off-by: Sage Weil <sage@redhat.com>
This is a simple implementation that treats anything
that matches the "--X=Y" pattern as separate from
positional arguments.
This works well for optional arguments. Mandatory
arguments still need to be specified positionally,
or the parsing code will think the command's
argument description has not been satisfied.
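The separation step can be sketched as below. This is an illustration of the "--X=Y" rule only, not the actual argument parser; the function name and the sample argument list in the usage note are hypothetical.

```python
import re

# Matches "--name=value"; the value may be empty.
_NAMED = re.compile(r"--([^=\s]+)=(.*)", re.DOTALL)


def split_args(argv):
    """Split `argv` into (positional, named).

    Anything of the form --X=Y goes into `named`; everything else,
    including bare flags such as --force, is left as positional.
    """
    positional, named = [], {}
    for arg in argv:
        match = _NAMED.fullmatch(arg)
        if match:
            named[match.group(1)] = match.group(2)
        else:
            positional.append(arg)
    return positional, named
```

For a made-up argument list such as `["pool", "create", "mypool", "--pg_num=64"]`, the first three items stay positional and `pg_num` ends up in the named dictionary.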
Signed-off-by: John Spray <john.spray@redhat.com>
This is shown to corrupt otherwise healthy rocksdb databases. Rename to
make it clear that it is generally not safe to run and should only be used
as a last resort.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/23530/head:
qa/vstart_runner: fix daemons list
PendingReleaseNotes: note multifs support in libcephfs
test/cephfs: add pybind test for mount_root
pybind/cephfs: enable passing filesystem name to mount
libcephfs: add ceph_select_filesystem
common: add doc strings to client_mds_namespace
client: allow passing fs name to mount()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Conflicts:
PendingReleaseNotes
As of nautilus, this will be more than two versions old:
external tooling should have been updated by now.
Signed-off-by: John Spray <john.spray@redhat.com>
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
It is no longer necessary to fetch a monmap pre-authentication, something
we previously did for get_monmap_privately(). New code has replaced this
with get_monmap_and_config(), and it authenticates in order to get that
same information (plus configs).
That change was made in mimic, but we must support upgrades from N-2,
which means that luminous daemons still need to function. The only caller
for get_monmap_privately() in luminous is from ceph-osd during mkfs.
Disabling this here means that new OSDs cannot be created using nautilus
mons and a luminous ceph-osd. Include a note for the (future) nautilus
upgrade notes.
Reported-by: Christopher Ryan Harrell <harrellcr@email.arizona.edu>
Signed-off-by: Sage Weil <sage@redhat.com>
Users should use 'osd destroy' instead. It does more, has a scary
force flag, and suggests that CLI users check 'osd safe-to-destroy'
first.
Signed-off-by: Sage Weil <sage@redhat.com>
For controlling whether a client is allowed to create or delete
snapshots
Fixes: http://tracker.ceph.com/issues/24284
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
We want to switch to an addrvec. This requires multiple parts:
- switch the Key type to use just the rank
- separate entity_name_t rank
- compat encoding
- graylog field naming has changed (includes name)
- syslog output formatting has changed (includes name)
- LogEntry operator<< modified a bit
Signed-off-by: Sage Weil <sage@redhat.com>
Adding mimic.rst and dropping related changes from PendingReleaseNotes. Also
added a few ref. labels from the major changes section
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Telegraf is an agent for collecting and reporting metrics.
It has multiple inputs and can send data to various outputs, such as
InfluxDB or ElasticSearch.
This module works by using the socket_listener of Telegraf and can
send data over UDP, TCP and a local Unix Socket.
Signed-off-by: Wido den Hollander <wido@42on.com>
* refs/pull/21374/head:
qa: add test for snap format upgrade
mds: initialize SnapServer::snaprealm_v2_since after journal replay
mds: properly distinguish cap update from snap flush
mds: update dev document of cephfs snapshot
doc: add release notes for cephfs snapshot
mds: allow snapshot by default for new filesystem
mds: close past parents after snaprealm format gets converted
mds: automatically allow multi-active MDS after scrubbing all inodes
mds: don't mark primary dentry damaged if inode has been repaired
mds: upgrade snaprealm format during scrub
mds: allow scrubbing mdsdir
mds: cleanup scrub code
mds: show health warning if multimds with old format snapshots
mds: automatically allow multi-active MDS after removing all old snapshots
mds: disallow multi-active MDS if snapshot was ever created by pre-mimic mds
mds: validate SnapInfo::long_name before using it
mds: don't bump snaptable last_snap when renaming snapshot
mds: properly save snaptable after upgrading version
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>