* refs/pull/26468/head:
qa: config recall settings to test cache drop
qa: check cache dump works without timeout
mds: add 2nd order recall throttle
mds: drive log flush and cache trim during recall
mds: avoid gather assertion when subs exist
mds: output full details for recall threshold
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Currently each time NFS page is opened and NFS Ganesha is not configured
an error notification is thrown and no extra information is given.
Now the user will be redirected to an information page.
Removed the orchestrator information since it no longer applies.
Fixes: http://tracker.ceph.com/issues/38399
Signed-off-by: Tiago Melo <tmelo@suse.com>
mgr/dashboard: Add support for managing RBD QoS
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
If we use the defaults, the MDS/client will recall/release everything quickly.
We want it to take time to see things like the timeout get hit.
Fixes: http://tracker.ceph.com/issues/38348
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mgr/dashboard: Add UI to configure the telemetry mgr plugin
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
allow a user of the orchestrator interface to express that the inventory
query should not read from any cached inventory state.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
* refs/pull/26336/head:
qa/tasks/keystone.py: no need for notcmalloc in example
qa/suites/rgw/tempest/tasks/rgw_tempest: no need for notcmalloc
Reviewed-by: Alfredo Deza <adeza@redhat.com>
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
mgr/orchestrator: Unify `osd create` and `osd add`
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Also:
* Added some more tests
* Better validation of drive Groups
* Simplified `TestWriteCompletion`
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
* refs/pull/26038/head:
mds: simplify recall warnings
mds: add extra details for cache drop output
qa: test mds_max_caps_per_client conf
mds: limit maximum number of caps held by session
mds: adapt drop cache for incremental recall
mds: recall caps incrementally
mds: adapt drop cache for incremental trim
mds: add throttle for trimming MDCache
mds: cleanup SessionMap init
mds: cleanup Session init
Reviewed-by: Zheng Yan <zyan@redhat.com>
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.
Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
-filter out mons from other clusters
-fix parsing of mon name from role
Fixes: http://tracker.ceph.com/issues/38115
Signed-off-by: Casey Bodley <cbodley@redhat.com>
* When the creation of the cluster is delegated to vstart_runner.py
(--create or --create-target-only) the amount of MGRs required
is calculated by the script so there is no more skipped tests
due to insufficient amount of MGRs.
* Additionally, this issue is not reproducible anymore:
Fixes: https://tracker.ceph.com/issues/37964
* Fixed typo: TEUTHOLOFY_PY_REQS
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
As with trimming, use DecayCounters to throttle the number of caps we recall,
both globally and per-session.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
`cache drop` is a long running command that will block the asok interface
(while the tell version does not). Attempting to abort the command with ^C or
equivalents will simply cause the `ceph` command to exit but won't stop the
asok command handler from waiting for the cache drop operation to complete.
Instead, just allow the tell version.
Fixes: http://tracker.ceph.com/issues/38020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/26012/head:
qa: add test that down fs does not ERR
mon/MDSMonitor: skip offline ERR for down fs
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
* refs/pull/25973/head:
qa: use simpler fs fail to bring fs down
MDSMonitor: add fs fail command
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
If there are leftover merges at the end of the run they can take a long
time to get through, blowing our timeout for (waiting for pgs to become
active and to stop splitting/merge) and scrubbing pgs. Stop all of that
at the end of the run so that we don't have to wait so long.
Signed-off-by: Sage Weil <sage@redhat.com>
We used to rely on the monmap bootstrap code to magically create a valid
monmap with named mons because our old-style ceph.conf had mon_addr
values in each mon.foo section. Instead, just feed it a real monmap
from pre-destruction.
In practice, a user can manually generate this monmap, or rename the
mons after the fact with --inject-monmap, or whatever. Out of scope
for this test, so we just do the simplest thing to make the rebuild test
work.
Signed-off-by: Sage Weil <sage@redhat.com>
Use the new config option type names (given by the cluster) in the
dashboard.
Fixes: http://tracker.ceph.com/issues/37843
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
- if force-branch, use that
- otherwise:
- read default-branch from client config
- use suite branch or ceph branch if suite branch is not defined
- if this branch is one of official releases (or master), prefix
it with 'ceph-'
try to clone branch specified above, if failed (branch doesn't exist probably)
and not force-branch, use default-branch.
Also add an option to override ragweed repo.
Switched all force-branch from ragweed qa suite to default-branch.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Otherwise the Mutation for Truncate is done on obj_id of the last iteration of the previous loop.
Fixes: http://tracker.ceph.com/issues/37836
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/25621/head:
mds: allow boot on read-only
mds: setup readonly mode for PurgeQueue
mds: return string_view for type str
mds: add missing locks for PurgeQueue methods
mds: delete on_error context on des
Reviewed-by: Zheng Yan <zyan@redhat.com>
* refs/pull/25009/head:
librbd: stringify locker name with get_legacy_str()
osdc/Objecter: fix list_watchers addr rendering to match legacy
test/crimson: disable unittest_seastar_messenger test
msg/msg_types: encode entity_addr_t TYPE_ANY as TYPE_LEGACY for pre-nautilus
client: make blacklist detection handle TYPE_ANY entries
mon/OSDMonitor: maintain compat output for 'blacklist ls'
client: maintain compat for {inst,addr}_str in status dump
qa/tasks/ceph_manager: compare osd flush seq #'s as ints
qa/suites/fs: make use of simple.yaml where appropriate
qa/msgr: move msgr factet into generic re-usable dir
crimson: fix monmap build for seastar
doc/start/ceph.conf: trim the sample ceph.conf file
doc/rados/operations: only describe --public-{addr,network} method for adding mons
PendingReleaseNotes: deprecate 'mon addr'
doc: fix some 'mon addr' references
doc/rados/configuration: fix some 'mon addr' references
doc/rados/configuration/network-config-ref: revise network docs somewhat
doc/rados/configuration/network-config-ref: remove totally obsolete section
qa/suites/rados: replace mon_seesaw.py task with a small bash script
qa/suites/fs/upgrade: don't bind to v2 addrs
qa/tasks/mon_thrash: avoid 'mon addr' in mon section
mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set
osd/OSDMap: maintain compat addr fields
msg/msg_types: add get_legacy_str()
mds/MDSMap.h: maintain compat addr field
mon/MgrMap: maintain compat active_addr field
mon/MonClient: reconnect to mon if it's addrvec appears to have changed
qa/tasks/ceph.conf.template: increase mon_mgr_mkfs_grace
msg/async/ProtocolV2: fill in IP for all peer_addrs
msg/async: print all addrs on debug lines
mon/MonMap: no noname- mon name prefix when for_mkfs
ceph-monstore-tool: print initial monmap
msg/async/ProtocolV2: advertise ourselves as a v2 addr when using v2 protocol
msg/async: assert existing protocol matches current protocol
msg/async: add missing modelines
mon/MonMap: add missing modeline
vstart.sh: put mon addrs in mon_host, not 'mon addr'
msg/async: better debug around conn map lookups and updates
mon/MonClient: dump initial monmap at debug level 10
qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
qa/tasks/ceph: set initial monmap features with using addrvec addrs
monmaptool: add --enable-all-features option
qa/tasks/ceph: only use monmaptool --addv if addr has [,:v]
qa/tasks/ceph_manager: make get_mon_status use mon addr
qa/tasks/ceph: keep mon addrs in ctx namespace
mon/OSDMonitor: log all osd addrs on boot
msg/simple: behave when v2 and v1 addrs are present at target
mon/MonClient: warn if global_id changes
msg/Connection: add warning/note on get_peer_global_id
mds/MDSDaemon: clean up handle_mds_map debug output a bit
qa/suites/rados/upgrade: debug mds
mds/MDSRank: improve is_stale_message to handle addrvecs
msg/async: make loopback detect when sending to one of our many addrs
qa/suites/rados/upgrade: no aggressive pg num changes
mon/OSDMonitor: require nautilus mons for require_osd_release=nautilus
mon/OSDMonitor: require mimic mons for require_osd_release=mimic
qa/suites/rados/thrash-old-clients: use legacy addr syntax in ceph.conf
msg/async: preserve peer features when replacing a connection
qa/tasks/ceph.py: move methods from teuthology.git into ceph.py directly; support mon bind * options
mon/MonMap: adjust build_initial behavior for mkfs vs probe
mon/MonMap: improve ambiguous addr behavior
qa/suites/rados/upgrade: spread mons a bit
qa/rados/thrash-old-clients: keep mons on separate hosts
qa/standalone/mon/misc.sh: tweak test to be more robust
qa/tasks/mon_seesaw: expect v1/v2 prefix in addr
osd/OSDMap: fix is_blacklisted() check to assume type ANY
mon/OSDMonitor: use ANY addr type for blacklisting
mon/msg_types: TYPE_V1ORV2 -> TYPE_ANY
qa/workunits/cephtool: fix blacklist test
qa/suites/upgrade: install old version with only v1 addrs
common/options: by default, bind to both msgr v1 and v2 addresses
vstart.sh: add --msgr1, --msgr2, --msgr21 options
msg/async/ProtocolV2: be flexible with server identity check
msg/msg_types: fix entity_addrvec_t::parse() with null end arg
qa/suites/rados/basic/msgr: no msgr2 addrs in initial monmaps
qa/tasks/ceph: add 'mon_bind_addrvec' and 'mon_bind_msgr2' options
monmaptool: add --addv argument to pass in addrvec directly
qa/suites/rados/basic/msgr: do not use msgr2 with simplemessenger
qa/suites/rados/basic/msgr: async is not experimental
messages/MOSDBoot: fix compat with pre-nautilus
mon/MonMap: allow v1 or v2 to be explicitly specified along with part
msg/msg_types: allow parsing of IPs without assuming v1 vs v2
msg/msg_types: default parse to v2 addrs
msg: standarize on v1: and v2: prefixes for *all* entity_addr_t's
vstart.sh: use msgr2 by default
mon/MonMap: remove get_addr() methods
ceph-mon: adjust startup/bind/join sequence to use addrs
mon: use MonMap::get_addrs() (instead of get_addr())
mon/MonClient: change pending_cons to addrvec-based map
mon/MonMap: fix set_addr() caller, kill wrapper
mon/MonMap: remove addr-based add()
monmaptool: fix --add to do either legacy or msgr2+legacy
monmaptool: clean up iterator use a bit
mon/MonMap: handle ambiguous mon addrs by trying both legacy and msgr
mon/MonMap: take addrvec for set_initial_members
mon/MonMap: use addrvecs for test instances
mon: pass addrvec via MMonJoin
mon/MonmapMonitor: fix 'mon add' to populate addrvec
mon/MonMap: addr -> addrvec
msg/async/ProtocolV2: only update socket_addr if we learned our addr
osd: go active even if mon only accepted our v1 addr
test/msgr: add test for msgr2 protocol
msg/async/ProtocolV2: share socket_addr and all addrs during handshake
msg/async: print socket_addr for the connection
msg/async: msgr2 protocol placeholder
msg/async: move ProtocolV1 class to its own source file
msg/async: keep listen addr in ServerSocket, pass to new connections
msg/async/AsyncMessenger: fix set_addr_unknowns
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
The teuthology test did not like the change to remove 'mon addr' from
ceph.conf. The standalone script is easier to test.
Note that it avoids mon names 'a', 'b', 'c' since the MonMap::build_initial
uses those.
Signed-off-by: Sage Weil <sage@redhat.com>
The grace starts with the monmap creation stamp, and ceph.py does a lot
of work between creating that map and actually starting daemons (e.g.,
preparing all of the osd devices), leading to occasional MGR_DOWN errors.
Double the grace period.
Signed-off-by: Sage Weil <sage@redhat.com>
The --add option will only infer a bare IP to include a v2 addr if the
NAUTILUS feature is there, and that isn't normally present on a freshly
generate monmap. Add it if we are doing addrvecs!
Signed-off-by: Sage Weil <sage@redhat.com>
Having these live in teuthology.git is silly, since they are only consumed
by the ceph task, and it is hard to revise the behavior.
Revise the behavior by adding mon_bind_* options.
Signed-off-by: Sage Weil <sage@redhat.com>
Updated health controller & test to reflect changes introduced
in 'df' payload.
Return 'total_used_raw_bytes' instead of 'total_used_bytes'
to match CLI 'bin/rados df' used/avail summary in
Landing Page (frontend component).
Do not return 'stats_by_class' to save bandwidth as they are
not needed (right now) in the dashboard.
Fixes: https://tracker.ceph.com/issues/37717
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
* All pool controller methods with same default value
for stats flag.
* Stats requested explicitly by frontend service.
* Updated API tests accordingly.
Fixes: https://tracker.ceph.com/issues/36740
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
1. To be able to run the cli without an external orchestrator.
2. Run the CLI in Teuthology.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Move it up into CephTestCase so that mgr tests can
use it too, and pick it up in vstart_runner.py so
that these tests will work neatly there.
Signed-off-by: John Spray <john.spray@redhat.com>
The current solution fails on our CI-system as some outputs can have
more values and some parameters like 'w' can vary in different
environments.
As this was only tested before in a vstart cluster environment it
worked.
Through this commit only the given attributes we know to be there,
will be tested.
Fixes: https://tracker.ceph.com/issues/37275
Signed-off-by: Stephan Müller <smueller@suse.com>
* refs/pull/24940/head:
qa: add test for getfattr ceph.dir.pin
client: support getfattr ceph.dir.pin extended attribute
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
The change in b3e69a9609 broke the test's assumption that the endpoint
wouldn't be readable by block-manager. It doesn't looks as though that's
actually problematic for the ECP controller, so just update the test to
use rgw-manager instead.
Signed-off-by: Zack Cerza <zack@redhat.com>
Some classes should still be imported directly from collections;
only OrderedDict, Iterable and Callable (in the context of the
ceph codebase) are found in collections.abc.
The current code works due to the fallback support for Python 2.
Signed-off-by: James Page <james.page@ubuntu.com>
This splits out the collection of health and log data from the
/api/dashboard/health controller into /api/health/{full,minimal} and
/api/logs/all.
/health/full contains all the data (minus logs) that /dashboard/health
did, whereas /health/minimal contains only what is needed for the health
component to function. /logs/all contains exactly what the logs portion
of /dashboard/health did.
By using /health/minimal, on a vstart cluster we pull ~1.4KB of data
every 5s, where we used to pull ~6KB; those numbers would get larger
with larger clusters. Once we split out log data, that will drop to
~0.4KB.
Fixes: http://tracker.ceph.com/issues/36675
Signed-off-by: Zack Cerza <zack@redhat.com>
* refs/pull/17526/head:
qa/tasks/ceph_manager: avoid test_map_discontinuity stall with too few up osds
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Some tests have m=2,k=2 and this will break them. Sometimes even if we
have 5 up osds, we end up with 4 and CRUSH gets picky, so build in a
buffer and only do this if we have 6 up.
We don't have an easy way from here to see what the min up osds for healthy
is... basically this map discontinuity test just sucks.
Signed-off-by: Sage Weil <sage@redhat.com>
The behavior of `safe-to-destroy` has changed in
432f194355 (PR#24799) and the backend
needs to be adapted accordingly.
Fixes: http://tracker.ceph.com/issues/37290
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
Enabled ctx.managers to take cluster name from config in restart() method instead of default 'ceph'.
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Separate diskprediction local cloud from the diskprediction plugin.
Devicehealth invoke device prediction function related on the global
configuration "device_failure_prediction_mode".
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
The new info endpoint will provide the frontend with the necessary
information it needs to create new profiles.
Fixes: https://tracker.ceph.com/issues/25156
Signed-off-by: Stephan Müller <smueller@suse.com>
- Fix bug in Dashboard QA unit test framework. Don't set the application type header manually, this is done by the requests library if required.
- Enhance QA unit test helper: Print the response of the API request if it fails. This should help to identify the problem more easily.
- Fix bug in the OSD controller. A parameter needs to be converted to integer.
- Take care that the params of the request object are not modified.
The issue was introduced by PR https://github.com/ceph/ceph/pull/24475. The CherryPy json_in plugin disclosed the errorneous unit test helper implementation.
Fixes: https://tracker.ceph.com/issues/36708
Signed-off-by: Volker Theile <vtheile@suse.com>
Python 3.7 now shows a warning as below.
/usr/bin/ceph:128: DeprecationWarning: Using or importing the ABCs from
'collections' instead of from 'collections.abc' is deprecated, and in
3.8 it will stop working
import rados
This patch addresses the that particular issue.
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
This is related to http://tracker.ceph.com/issues/36453. It is far from
a complete solution, but seems like a positive move.
I tested this change by first disabling my browser cache, and then used
the /docs endpoint to query /api/dashboard/health. Before compression:
Content-Length: 60748
Time: 615ms
After:
Content-Length: 7505
Time: 92ms
Then, I logged into the dashboard as normal and reloaded the page once I
was in. Some values for the reload operation before compression:
Total page load time: 58.48s
vendor.js Content-Length: 6486025
vendor.js time: 48.09s
After:
Total page load time: 14.55s
vendor.js Content-Length: 1143178
vendor.js time: 4.50s
Signed-off-by: Zack Cerza <zack@redhat.com>
This fixes "TypeError: admin_socket() got an unexpected keyword argument
'timeout'". The value is never used.
Signed-off-by: Zack Cerza <zack@redhat.com>
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if it's parallel tasks, the two clones will clobber each
other.
Fixes: http://tracker.ceph.com/issues/36542
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* refs/pull/24292/head:
qa: add test for rctime on root inode
mds: set rctime on new system inode
mds: small refactor
Reviewed-by: Zheng Yan <zyan@redhat.com>
This makes it easier to re-run tests against a suite branch without
requiring a full ceph-ci build and repo.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Apparently 15m is not long enough for some workunits like fsstress.
Fixes: http://tracker.ceph.com/issues/36365
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It is now commented out like it was before,
but I've added a comment what happened during this test with the QA
system. The problem was that even with only a increase of 1 PG the QA
cluster went into a cluster warning state and did not recover in time.
The QA coverage timeout is 2 minutes.
I could not reproduce this behavior with a local cluster, but I've
added a loop to wait until pgp and pg number are equal and the cluster
is in a healthy state again. This can take locally about 5 seconds.
The internal loop has a timeout of 3 minutes.
Fixes: https://tracker.ceph.com/issues/36362
Signed-off-by: Stephan Müller <smueller@suse.com>
The dashboard backend can now unset all set compression arguments if the
compression mode is switched to 'unset'. In the case of 'unset' Ceph
itself will only delete the 'compression_mode' argument, not all other
set arguments. The other arguments that should be removed, too, are
added to the update arguments in order to delete all set arguments.
Fixes: https://tracker.ceph.com/issues/36355
Signed-off-by: Stephan Müller <smueller@suse.com>
Refactor '_get_mon_allow_pool_delete_config' method to be a little bit
more general. The method can now be used to get the value of every
config option known to the cluster.
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Otherwise a bug preventing an asok operation from completing will cause the
entire job to fail.
Fixes: http://tracker.ceph.com/issues/36335
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
when enabling a module attempt to determine if it is an always on
module, and if it is, then return without waiting on the active manager
daemon to restart---which it won't if it is an always on module.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
* refs/pull/21566/head:
test: add test for mds drop cache command
mds: command to trim mds cache and client caps
mds: implement journal flush as asynchronous context execution
mds: cleanup some asok commands
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
If there is a bug preventing rm from completing, the workunit will get stuck.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise QA sits forever waiting for the kclient to umount when there is a
problem.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise the command will hang if the mount is broken.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/23187/head:
test: make rank argument mandatory when running journal_tool
cephfs-journal-tool: make "--rank" argument mandatory
cephfs-journal-tool: pass local arg vector for Journal actions
cephfs-journal-tool: dump to per rank output file wherever necessary
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/23530/head:
qa/vstart_runner: fix daemons list
PendingReleaseNotes: note multifs support in libcephfs
test/cephfs: add pybind test for mount_root
pybind/cephfs: enable passing filesystem name to mount
libcephfs: add ceph_select_filesystem
common: add doc strings to client_mds_namespace
client: allow passing fs name to mount()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Conflicts:
PendingReleaseNotes