* refs/pull/31502/head:
qa/tasks/ceph2: get ceph-daemon from same place as ceph
qa/tasks/ceph2: use safe_while
qa/tasks/ceph2: pull image using sha1
qa/tasks/ceph2: docker needs quay.io/ prefix for image name
qa/workunits/rados/test_python: make sure rbd pool exists
qa/suites/rados/ssh: new tests!
qa/tasks/ceph2: pull ceph-ci/ceph:$branch
qa/tasks/ceph2: register_daemons after pods start
qa/tasks/ceph2: fix conf
qa/tasks/ceph2: add restart
qa/tasks/ceph2: pass ceph-daemon path to DaemonState
qa/tasks/ceph2: tolerate no mdss or 1 mgr
qa/tasks/ceph: replace wait_for_osds_up with manager.wait_for_all_osds_up
qa/tasks/ceph: wait-until-healthy
qa/tasks/ceph2: set up managers
qa/tasks/ceph2: use seed ceph.conf
qa/tasks/ceph: healthy: use manager helpers (instead of teuthology/misc ones)
qa/tasks/ceph2: name mds daemons
qa/tasks/ceph2: fix osd ordering
qa/tasks/ceph2: start up mdss
qa/tasks/ceph2: set up daemon handles and use them to stop
qa/tasks/ceph2: make it multicluster-aware
qa/tasks/ceph2: can bring up mon, mgr, osds!
qa/tasks/ceph2: basic task to bring up cluster with ceph-daemon and ssh
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
The balancer triggers peering, which may make PGs briefly go inactive,
possibly before they have ever been active. E.g.,
    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
            "count": 6
        },
        "detail": [
            {
                "message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"
            },
            {
                "message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"
            },
            {
                "message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"
            }
        ]
    }
Signed-off-by: Sage Weil <sage@redhat.com>
In cephtool/test.sh, we run
    ceph fs set cephfs inline_data {1,0}
so the health check fails when the test ends, like
    "mon.a (mon.0) 3498 : cluster [WRN] Health check failed: 1 filesystem
    with deprecated feature inline_data (FS_INLINE_DATA_DEPRECATED)" in
    cluster log
So, until we remove the test, we need to whitelist this warning.
Signed-off-by: Kefu Chai <kchai@redhat.com>
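To illustrate the whitelisting idea in the message above, here is a minimal sketch of filtering cluster log lines against a list of patterns. It assumes whitelist entries are treated as regular expressions; the `WHITELIST` list and the `unexpected_warnings` helper are hypothetical names for illustration, not the QA framework's actual implementation.

```python
import re

# Hypothetical whitelist entry matching the warning quoted above.
# Assumption: entries are regular expressions searched within each log line.
WHITELIST = [r"\(FS_INLINE_DATA_DEPRECATED\)"]


def unexpected_warnings(log_lines, whitelist=WHITELIST):
    """Return the [WRN]/[ERR] lines that match no whitelist pattern."""
    patterns = [re.compile(p) for p in whitelist]
    return [
        line
        for line in log_lines
        if re.search(r"\[(WRN|ERR)\]", line)
        and not any(p.search(line) for p in patterns)
    ]
```

With the warning whitelisted, a run whose only warning is the inline_data deprecation would report no unexpected lines, while any other warning would still be flagged.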
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately. This has a few important benefits:
- We immediately stop responding (binding) to any sockets, which means
other OSDs will immediately decide we are down (and dead!). This
minimizes IO interruption.
- We avoid the complex "clean" shutdown process, which is historically a
source of bugs.
In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind. Set
this option to false for valgrind QA runs so we can still do that.
Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out. This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on. If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.
Signed-off-by: Sage Weil <sage@redhat.com>
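The fast-versus-clean shutdown choice described above can be sketched roughly as follows. This is an illustration in Python, not the OSD's C++ code; the `fast_shutdown` flag stands in for the option the message refers to (its real name is not given here), and `make_shutdown_handler`, `install`, and the `clean_shutdown` callback are hypothetical.

```python
import os
import signal
import sys


def make_shutdown_handler(fast_shutdown, exit_now=os._exit, clean_shutdown=None):
    """Return a signal handler that exits at once or tears down cleanly.

    fast_shutdown is a hypothetical stand-in for the config option.
    """
    def handler(signum, frame):
        if fast_shutdown:
            # Fast path: exit immediately. The kernel closes our sockets,
            # so peers get connection refused instead of waiting on a timeout.
            exit_now(0)
        else:
            # Slow path: free in-memory state so a leak checker
            # (e.g. valgrind) sees a clean exit.
            if clean_shutdown is not None:
                clean_shutdown()
            sys.exit(0)
    return handler


def install(fast_shutdown, clean_shutdown=None):
    """Register the handler for SIGINT and SIGTERM."""
    handler = make_shutdown_handler(fast_shutdown, clean_shutdown=clean_shutdown)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

A valgrind-style QA run would call `install(False, clean_shutdown=...)`, while a normal deployment would call `install(True)`.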
* refs/pull/31168/head:
ceph-daemon: try py2 import before py3
qa/suites/rados/singleton-nomsgr/ceph-daemon: make sure python3 is installed
qa/standalone/test_ceph_daemon.sh: test with python2 and python3
mgr/ssh: python, not python3
ceph-daemon: python, not python3
ceph-daemon: os.makedirs
ceph-daemon: configparser is ConfigParser on py2
ceph-daemon: avoid py3-isms
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
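The py2/py3 compatibility changes listed in this pull request follow a familiar pattern, sketched below. The import fallback mirrors the "try py2 import before py3" and "configparser is ConfigParser on py2" subjects. The `makedirs_compat` helper is hypothetical: it assumes the os.makedirs change concerns `exist_ok`, which Python 2 lacks, and the commit subject alone does not confirm that.

```python
import errno
import os

try:
    # Python 2 module name, tried first
    from ConfigParser import ConfigParser
except ImportError:
    # Python 3 module name
    from configparser import ConfigParser


def makedirs_compat(path, mode=0o755):
    """Create path and its parents; do nothing if the directory exists.

    Hypothetical helper: avoids os.makedirs(..., exist_ok=True), which is
    only available on Python 3.
    """
    try:
        os.makedirs(path, mode)
    except OSError as e:
        # Re-raise anything other than "directory already exists".
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

The same source file then runs unchanged under either interpreter, which is what lets the script be invoked as plain `python`.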
qa: enable dashboard tests to be run with "--suite rados/dashboard"
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
This moves dashboard.yaml from rados/mgr into a new, separate rados/dashboard
suite. The common elements it uses are moved from rados/mgr into qa/ and
replaced with symlinks.
Fixes: https://tracker.ceph.com/issues/41820
Signed-off-by: Nathan Cutler <ncutler@suse.com>
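The move-and-symlink step described above can be sketched as follows. The helper name `share_via_symlink` and any paths passed to it are hypothetical; the actual change was presumably made with ordinary git operations rather than a script.

```python
import os
import shutil


def share_via_symlink(src, shared_dir):
    """Move src into shared_dir and leave a relative symlink at the old path."""
    os.makedirs(shared_dir, exist_ok=True)
    dest = os.path.join(shared_dir, os.path.basename(src))
    shutil.move(src, dest)
    # A relative link keeps working wherever the tree is checked out.
    os.symlink(os.path.relpath(dest, os.path.dirname(src)), src)
    return dest
```

Each suite that needs the shared element can then carry its own symlink to the single copy, so the two suites cannot drift apart.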
We're currently facing some issues with our integration
tests. Because of that, we agreed to comment out the questionable
suites so that all other suites can run on open pull
requests.
'test_health' and 'test_perf_counters' are commented out
because they led to issues related to
https://tracker.ceph.com/issues/41538
As soon as the issue has been fixed, we need to re-enable
these two suites.
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
I did the original addition by grepping for ceph-mgr-ssh, but that's
included in nautilus so I missed this one!
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/30603/head:
ceph-daemon: -n type.id instead of -i id
ceph-daemon: drop unused VERSION
ceph-daemon: clean up dir helpers, tighten up permissions
ceph-daemon: fchmod before writing to keyring file
test_ceph_daemon.sh: skip ssh until container image has remoto
ceph-daemon: decode utf-8 in run() helper
mgr/ssh: clean up debug cruft
mgr/ssh: clean up bare except: block
ceph-daemon: clean up bare except: blocks
ceph-daemon: all imports to top
ceph-volume: no_tmpfs -> tmpfs
doc/bootstrap: add new bootstrap documentation
ceph-daemon: add --output-pub-ssh-key for bootstrap
ceph-daemon: make 'shell' easier to use
ceph-daemon: support docker; prefer podman
qa: add ceph-daemon
debian: ceph-daemon package, required by ceph-mgr-ssh
ceph.spec.in: ceph-daemon package, required by ceph-mgr
common/options: cleanup whitespace
mgr/ssh: simplify getting the cluster fsid
mgr/ssh: pipe ceph-daemon script to stdin of python3
ceph-daemon: add support for args and/or stdin from top of script
ceph-daemon: make ceph-volume use get_config_and_keyring
ceph-daemon: ls: behave if /var/log/ceph doesn't exist
ceph-daemon: implement 'adopt' for legacy style daemons
ceph-daemon: fix fsid detection for legacy osds
ceph-daemon: make rm-cluster clean up system-ceph*.slice too
ceph-daemon: configure ssh orchestrator
ceph-daemon: be more restrictive with file permissions
mgr/ssh: create osd with ceph-daemon
mgr/ssh: pass daemon id separately to _create_daemon
ceph-daemon: add --config-and-keyring to ceph-volume command
ceph-daemon: create log path for shell (if needed)
mgr/ssh: use _run_ceph_daemon for _create_daemon
mgr/ssh: factor _run_ceph_daemon out of _get_device_inventory
mon/ConfigMonitor: allow entity type only for 'config get'
ceph-daemon: add ceph-volume subcommand
ceph-daemon: remove unused CephContainer dname property
ceph-daemon: drop useless uid/gid checks
mgr/ssh: deploy new mgrs with ceph-daemon
mgr/ssh: factor _create_daemon out of create_mon
mon/MonCap: allow mgr to create new auth keys
mgr/ssh: run c-v with podman when getting inventory
mgr/ssh: simplify ssh connection management
mgr/ssh: use ceph-daemon for deploying mon
ceph-daemon: allow --mon-network for deploying new mon (vs specifying IP)
ceph-daemon: --config-and-keyring (not key)
common/options: add 'image' config option
test_ceph_daemon: specify image name
vstart.sh: add --ssh to enable+configure ssh orchestrator
mgr/ssh: use ssh identity from config-key, if present
mgr/ssh: hardcode default ssh_config
ceph-daemon: store ssh identity in mon config-key store
ceph-daemon: --privileged arg for 'exec'
ceph-daemon: make deploy work for osd (do a c-v prepare)
ceph-daemon: make shell privileged
ceph-daemon: move get_container_mounts to a helper
ceph-daemon: pass full path for entrypoint
ceph-daemon: make id portion of 'shell' optional
ceph-volume: accept --no-tmpfs argument for bluestore
ceph-daemon: 'unit' command
ceph-daemon: fix run command to use call(), not check_output()
src/ceph-daemon: whitespace
ceph-daemon: add 'enter', 'exec' commands
ceph-daemon: bind config to default location
test_ceph_daemon.sh: test deploy mds too
ceph-daemon: generate ssh keys
ceph-daemon: --config, not --conf
ceph-daemon: long lines
ceph-daemon: add --config to bootstrap
ceph-daemon: add 'shell' command
ceph-daemon: do not import subprocess symbols directly
ceph-daemon: add mons with 'deploy mon.x ...'
ceph-daemon: add 'ls'
ceph-daemon: simplify uid/gid a bit
ceph-daemon: fix libudev
ceph-daemon: autodetect uid/gid from container image
ceph-daemon: default to empty log files, log to stderr (systemd journal)
ceph-daemon: rm-{daemon,cluster}
ceph-daemon: fix bootstrap config
ceph-daemon: fix args.fsid usage
ceph-daemon: be careful overwriting live files
ceph-daemon: slurp some options over from the standard systemd unit
ceph-daemon: add ceph.target and ceph-$fsid.target units
test_ceph_daemon.sh: stupid test script
ceph-daemon: bootstrap and deploy (mgr) work
ceph-daemon: initial checkin
ceph-mon: fix debug print of public_addr
* refs/pull/30217/head:
crimson: common/admin_socket kludge so that it builds
mon/MonClient: fix sending mon command to a specific rank
src/.gitignore: ignore .tox
mon/MonClient: interpret numeric mon target name as rank
mgr,mgr/MgrClient: use fsid to signal mon-mgr vs cli MCommands
qa/workunits/cephtool: fix error checks for 'ceph daemon' commands
common/ceph_context: make 'config unset' idempotent
qa/tasks/dump_stuck: mon.a, not mon.0
qa/suites/rados/singleton/all/admin-socket: fix test
common/config: EPERM setting config option after startup
qa/workunits/cephtool/test.sh: fix tell output error check
common/admin_socket: pass Formatter from generic infrastructure
common/admin_socket: pass ostream to call() for error output
os/bluestore: fix asok hook return value
rgw: fix asok return value
common/ceph_context: return error code from asok commands
test/pybind/test_rados: fix accidental mon tell test
mon: print entity_name along with caps to debug log
PendingReleaseNotes: notes about asok changes
mgr/MgrClient: empty target string for 'tell' means active mgr
common/admin_socket: report error code as part of output string
osd: change trigger_[deep_]scrub commands to a pg tell command
osd: remove old command workqueue, threadpool
osd: drop MMonCommand handling
osdc/Objecter: resend OSD tell commands on EAGAIN
osd: route tell commands to asok; migrate commands
osd: use unique_ptr<Formatter> for asok_command
common/ceph_context: add generic asok 'injectargs'
common/admin_socket: allow dup prefixes
common/admin_socket: refactor with sync and async execute_command variants
common/admin_socket: pass input bufferlist
osd: transition to call_async() for asok
common/admin_socket: support alternative call_async()
mon/MonClient: send tell commands out of band via MCommand
mon: accept tell commands via MCommand and send them to asok handler
common/admin_socket: return int from hook call()
mgr/DaemonServer: route MCommand (for octopus+) to asok commands
do not use 'ceph tell mgr'
pybind/ceph_argparse: disambiguate mgr tell and CLI commands
ceph: make 'ceph tell mgr.*' send to the active mgr
ceph: send 'ceph tell mgr.X' to the right mgr
librados: add rados_mgr_command_target
mgr/MgrClient: add start_command variant that takes a target
common/admin_socket: drop unregister_command(); use per-hook variant
common/admin_socket: drop explicit prefix arg to register_command
common/admin_socket: simplify command routing
common/admin_socket: add ability to process MCommand via asok queue
common/admin_socket: pass cmdvec to execute_command
common/admin_socket: use pipe for general wakeup
include/compat: add flags arg to pipe_cloexec
common/admin_socket: drop unused args
Reviewed-by: Neha Ojha <nojha@redhat.com>
We can't set the filestore setting because filestore isn't active and so
the option isn't observed, so it isn't changeable.
Signed-off-by: Sage Weil <sage@redhat.com>
Increasing osd_object_clean_region_max_num_intervals to track more
clean regions, resulting in more partial recovery.
Signed-off-by: Neha Ojha <nojha@redhat.com>