The MgrMap stores a list of RADOS clients' addresses registered by the
mgr modules. During failover of ceph-mgr, the list is used to blocklist
clients belonging to the failed ceph-mgr.
Store the names of the mgr modules that registered the RADOS clients
along with the clients' addresses in the MgrMap. During debugging, this
allows easy identification of the mgr module that registered a
particular RADOS client by just dumping the MgrMap (`ceph mgr dump`).
Following is the MgrMap output with a module's client name displayed
along with its client addrvec:
$ ceph mgr dump | jq '.active_clients[0]'
{
  "name": "devicehealth",
  "addrvec": [
    {
      "type": "v2",
      "addr": "10.0.0.148:0",
      "nonce": 612376578
    }
  ]
}
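With the module name stored alongside each address, every registered
client can be attributed to its module in one pass; for instance,
using the same fields as above:
  # list each registered RADOS client with the mgr module that owns it
  ceph mgr dump | jq -r '.active_clients[] | "\(.name): \(.addrvec[0].addr)"'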
Fixes: https://tracker.ceph.com/issues/58691
Signed-off-by: Ramana Raja <rraja@redhat.com>
This is based on two commits:
* 7bbc92eda3 and
* 6b22d47863, which seems to be
  a fixup to the former one.
In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when `mon_debug_no_require_reef`
and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
1. If a data or metadata pool is already in use by a filesystem,
   it is not allowed to reuse the same pool for another
   filesystem.
2. The test is failing because the above (1) restriction/check comes
   before the erasure-code pool check, so the test does not find the
   expected error string in the output.
3. The proposed fix checks for the newly added error string instead of
   'erasure-code'.
4. Also add new tests that verify the string 'erasure-code'
   by passing the --force option, so that the pool reuse check (1)
   is skipped and the 'erasure-code' check is hit; see the sketch below.
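A sketch of the two cases (using the qa helper `expect_false`;
filesystem and pool names are illustrative):
  # pools already in use by another filesystem: fails with the new error string
  expect_false ceph fs new newfs cephfs.meta cephfs.data
  # --force skips the reuse check, so the 'erasure-code' check is reached
  expect_false ceph fs new newfs ec_meta_pool cephfs.data --force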
Fixes: https://tracker.ceph.com/issues/56384
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
When we set the cache mode to "proxy" in order to remove a writeback
cache, as described in the official Ceph documentation, an error
occurred:
[root@controller-1 root]# ceph osd tier cache-mode cachepool proxy
Invalid command: proxy not in writeback|readproxy|readonly|none
osd tier cache-mode writeback|readproxy|readonly|none [--yes-i-really-mean-it]:
specify the caching mode for cache tier
According to the official documentation: since a writeback cache may
have modified data, you must take steps to ensure that you do not lose
any recent changes to objects in the cache before you disable and
remove it. Change the cache mode to proxy so that new and modified
objects will flush to the backing storage pool.
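With "proxy" accepted again, the documented removal flow works;
roughly (pool name as in the session above):
  ceph osd tier cache-mode cachepool proxy
  # flush and evict the remaining objects, then disable the tier
  rados -p cachepool cache-flush-evict-all
  ceph osd tier cache-mode cachepool none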
Fixes: https://tracker.ceph.com/issues/54576
Signed-off-by: tan changzhi <544463199@qq.com>
These tests are supposed to validate that we don't accept invalid IPs,
but they left out the "add" subcommand, so they're all failing on that
instead!
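The shape of the fix, assuming these are the `ceph osd blocklist`
checks (the address is illustrative):
  # before: parsing fails on the missing subcommand, not the bad IP
  expect_false ceph osd blocklist 1234.56.78.90/0
  # after: "add" is present, so the invalid IP itself is what gets rejected
  expect_false ceph osd blocklist add 1234.56.78.90/0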
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Test the commands:
`osd pool create <pool> --target_size_ratio <float>`
`osd pool set <pool> target_size_ratio <float>`
`osd pool get <pool> target_size_ratio`
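For instance (pool name and values illustrative):
  ceph osd pool create foo 64 --target_size_ratio 0.2
  ceph osd pool set foo target_size_ratio 0.3
  ceph osd pool get foo target_size_ratio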
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Currently,
# ceph orch ls -h
...
orch ls [<service_type>] [<service_name>] [--export] [-- List services known to orchestrator
format {plain|json|json-pretty|yaml}] [--refresh]
# ceph orch ls osd -h
... nothing ...
because the CLI is being passed more arguments than just the command
prefix. Make -h drop right-hand args until we get at least one prefix
match. This means we can have a partial command written with some args
and add -h to get the usage for that command.
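For example, after this change:
  # prints the usage for "orch ls" even though "osd" is an extra argument
  ceph orch ls osd -h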
Signed-off-by: Sage Weil <sage@newdream.net>
If there is application metadata on the base pool, it should be mirrored
to any other tiers in the set. This aligns with the fact that the
'ceph osd pool application ...' commands refuse to operate on a non-base
pool.
This fixes problems with accessing tiers (e.g., cache tiers) when the
cephx cap is written in terms of application metadata.
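A sketch of the scenario (pool, application, and cap values
illustrative):
  ceph osd pool application enable base myapp
  ceph osd pool application set base myapp owner alice
  ceph osd tier add base cache
  # with the metadata mirrored to the tier, a tag-based cap matches both pools
  ceph auth get-or-create client.alice osd 'allow rwx tag myapp owner=alice'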
Fixes: https://tracker.ceph.com/issues/49788
Signed-off-by: Sage Weil <sage@newdream.net>
I haven't seen it be an issue, but I'm worried a slight difference in
ping report timing might result in flapping leaders even with the new
ignore-out-of-quorum code.
Imagine DCs A, B, C where A and B are netsplit: C might first elect A, then
get a propose from B immediately following a successful ping reply that gives
it a better score than A and thus gets an election win; then A could do
the same, etc.
With the default 12-hour halflife and 2-second ping interval, the most
a single ping can change the score is 0.00002314814 (= 2/86400).
Therefore a code default of .0001 and a config default of .0005 should
be plenty of room to prevent that in sane monitor configurations, while
still responding quickly if connections are restored.
Plus, of course, this only applies to pings from out-of-quorum monitors
to peons, so if a monitor manages to contact the leader it will be
allowed to join instantly.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
The "proxy" and "forward" cache-tier modes have been completely removed,
so it's sufficient to test once that they cannot be set.
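e.g. a single negative check per mode (using the qa helper
`expect_false`; pool name illustrative):
  expect_false ceph osd tier cache-mode cachepool proxy --yes-i-really-mean-it
  expect_false ceph osd tier cache-mode cachepool forward --yes-i-really-mean-it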
Fixes: a0a3ed324a
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Add the option `mon_allow_pool_size_one`, which is disabled by default,
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change
the value of `mon_allow_pool_size_one` to true and then pass the flag
`--yes-i-really-mean-it` to the CLI command:
Example:
`ceph osd pool set test size 1 --yes-i-really-mean-it`
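The full sequence, sketched:
  # cluster-wide opt-in (disabled by default)
  ceph config set global mon_allow_pool_size_one true
  # the per-pool change still requires explicit confirmation
  ceph osd pool set test size 1 --yes-i-really-mean-it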
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
Other parts of this script leave OSDs reweighted, which can make this test
fail to go fully clean.
0 ssd 0.08789 osd.0 up 0.63213 1.00000
1 ssd 0.08789 osd.1 up 0.63213 1.00000
2 ssd 0.08789 osd.2 up 1.00000 1.00000
35.0 raw ([2,1,2147483647], p2) up ([2,1,2147483647], p2) acting ([2,1,2], p2)
Fix by just deleting this pool when we're done.
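e.g. (pool name illustrative):
  ceph osd pool delete testpool testpool --yes-i-really-really-mean-it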
Fixes: https://tracker.ceph.com/issues/44067
Signed-off-by: Sage Weil <sage@redhat.com>
- add "Tell $type commands" heading
- 'ceph tell mon.a -h' now works
- 'ceph tell mon.a prefix -h' also works
Signed-off-by: Sage Weil <sage@redhat.com>
Allow the autoscale mode to be set atomically with pool creation.
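For example (pool name and pg count illustrative):
  ceph osd pool create foo 64 --autoscale-mode on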
Fixes: https://tracker.ceph.com/issues/42638
Signed-off-by: Sage Weil <sage@redhat.com>
quote from Sage's reply:
> This is a mon-specific command--it doesn't make sense as a CLI command
> for the entire cluster--it only makes sense as a command to tell a
> specific monitor. Like ``ceph tell mon.a compact``. Back when Joao
> did #4595 these commands were all mixed together and putting it under
> 'ceph mon ...' made sense, but now you're specifically sending it to a
> mon, so the 'mon' part of the command is redundant.
so let's drop "mon compact" in favor of the "compact" command
Signed-off-by: Kefu Chai <kchai@redhat.com>
"scrub" was marked deprecated in
1814d7441b. this commit
was in turn included by v10.0.0. so it's long enough for its
retirement.
the test is updated accordingly
Signed-off-by: Kefu Chai <kchai@redhat.com>
otherwise wait_for_health() fails like:
wait_for_health: ceph health detail
HEALTH_WARN 1 pool(s) have non-power-of-two pg_num
[WRN] POOL_PG_NUM_NOT_POWER_OF_TWO: 1 pool(s) have non-power-of-two pg_num
pool 'rbd' pg_num 10 is not a power of two
../qa/workunits/cephtool/../../standalone/ceph-helpers.sh:1613: wait_for_health: return 1
the failure was found when testing test_mon_pg().
this behavior was introduced by 6e46b1c0e5
Signed-off-by: Kefu Chai <kchai@redhat.com>
"version" is not an asok command anymore in the sense that it's served
by registered asock hook. so in this change, we replace "version" with
"sessions", so we can verify that audit channel does not the dispatched
"sessions" command sent from ceph cli.
also, restructure the test as a loop for better readability
Signed-off-by: Kefu Chai <kchai@redhat.com>
tests all IEC and SI units related tests with a tier pool, as
`target_max_objects` and `target_size_bytes` only apply to tier
pools. so, for the sake of simplicity, tests all of them using
a tier pool.
introduced by 9095f67e
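a sketch of the kind of check involved (pool name and values
illustrative; the suffix spellings assume the SI/IEC unit support
being tested here):
  ceph osd pool set cachepool target_max_objects 1K    # SI suffix
  ceph osd pool set cachepool target_size_bytes 1Gi    # IEC suffix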
Signed-off-by: Kefu Chai <kchai@redhat.com>
mgr/ActivePyModules: behave if a module queries a devid that does not exist
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
The GIL reacquire was being handled in the lambda, but the lambda was
not getting called if the device didn't exist, so the GIL was never
reacquired, leading to a crash.
Add a trivial CLI test.
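For instance (device id illustrative; `expect_false` is the qa helper):
  # must fail cleanly instead of crashing ceph-mgr
  expect_false ceph device info not-a-real-devid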
Fixes: https://tracker.ceph.com/issues/42578
Signed-off-by: Sage Weil <sage@redhat.com>