a0b453ad335671bd92f165115d6ee984d2412448 added the wait state, which can
make PGs stay in active+clean+wait for a while instead of going into
active+clean directly. As far as TEST_auto_repair_bluestore_failed is
concerned, we only care about the repair state being cleared.
Fixes: https://tracker.ceph.com/issues/45075
Signed-off-by: Neha Ojha <nojha@redhat.com>
v2 was introduced in nautilus, and we don't support mimic -> pacific
upgrades (only mimic -> octopus). This test can be removed!
Signed-off-by: Sage Weil <sage@redhat.com>
to address the test failures like
```
2020-04-07T15:44:58.693 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed: ceph pg dump
pgs
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed: pgid
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh: line 498: pgid: command not found
```
Signed-off-by: Kefu Chai <kchai@redhat.com>
It is possible for the pg dump to not be the latest when we check for newprimary
in _common_test(). This is because mgr_stats_period is 5 seconds, and we may not
have fetched the latest stats just yet. This causes the test to look at the same
stats before and after wait_for_clean.
Fixes: https://tracker.ceph.com/issues/43807 (2)
Signed-off-by: Neha Ojha <nojha@redhat.com>
Mon might fail to share the newest map with any of up osds, e.g.,
due to an injected broken pipe. Since we don't have any client
activities during the osd-markdown tests, osds might be unaware of
the map changes made through CLI. Make sure osds have pulled the
newest map down before we can test its reaction correctly.
Fixes: https://tracker.ceph.com/issues/44662
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/33885/head:
Merge pull request #33848 from mchangir/octopus-tests-remove-suprious-whitespace
Merge PR #33746 into octopus
Merge PR #33830 into octopus
Merge PR #33732 into octopus
Merge PR #33620 into octopus
Merge pull request #33876 from tchaikov/octopus-cephadm-mypy
cephadm: add "assert foo is not None" for mypy check
Merge pull request #33067 from tspmelo/wip-rbd-delete-with-snapshot
cephadm: add grafana adopt
Merge PR #33771 into octopus
Merge PR #33850 into octopus
Merge PR #33853 into octopus
Merge PR #33857 into octopus
Merge PR #32990 into octopus
Merge PR #33713 into octopus
Merge PR #33838 into octopus
qa/tasks/cephadm: no default mon|mgr|crash service specs
qa/suites/rados/cephadm/upgrade: upgrade start point that supports the no-spec option
Merge PR #33832 into octopus
cephadm: bootstrap: wait for mgr to restart after enabling a module
mgr: add 'mgr_status' tell command
Merge pull request #33839 from rhcs-dashboard/44538-fix-rgw-grafana-get-put-latencies
Merge pull request #33743 from votdev/issue_43869_fix_qa_test
cephadm: create initial mon and mgr service specs too
cephadm: no need to pregenerate a crash key for the bootstrap host
mgr/cephadm: do not complain when we don't have enough hosts
mgr/cephadm: remove orphan daemons
mgr/cephadm: report size=0 for fabricated ServiceDescription
mgr/cephadm: safety check to prevent removing all mon|mgr daemons
mgr/cephadm: prevent scaling mon|mgr below count=1
mgr/cephadm: do not remove daemons from remove_service
Merge pull request #33805 from tchaikov/wip-44500
spec: Podman (temporarily) requires apparmor-abstractions on suse
mgr/cephadm: Make sure we don't co-locate the same daemon
monitoring: fix RGW grafana chart 'Average GET/PUT Latencies'
tests: remove spurious whitespace
mgr/cephadm: fix service list filtering
Merge PR #33825 into octopus
Merge PR #33811 into octopus
Revert "Merge pull request #33673 from cbodley/wip-denc-enum"
mgr/cephadm: fix upgrade order
Merge PR #33801 into octopus
Merge PR #33822 into octopus
cephadm: bootstrap: tolerate error return from -h
Merge PR #33809 into octopus
Merge PR #32678 into octopus
cephadm: use `sh` instead of `bash` during enter
ceph.in: only shut down rados on clean exit
common/ceph_timer: Pass reference to waited time on stack
common/ceph_timer: Add test
common/ceph_timer: Use unique_function, allowing noncopyable events
common/ceph_timer: Couple cleanups
common/ceph_timer: Fix namespaces
common/ceph_timer: Add missing includes
common/ceph_timer.h: Don't indent contents of a namespace
mgr/dashboard: Crush rule modal
mgr/dashboard: Preserve rule selection on pool type change
mgr/dashboard: Crush rule is only send during replicated pool creation
mgr/dashboard: Explicit returns in pool form
mgr/dashboard: Removes fork join in pool form
mgr/dashboard: Hide ECP actions during ec pool edit
mgr/dashboard: Pool form erasure/replicated boolean
mgr/dashboard: Change pool info API endpoint
mgr/dashboard: Moves ECP info endpoint to UI-API
mgr/cephadm: add _remove_osds_bg back to main loop
mgr/cephadm/osd: update removal report immediately
qa/tasks/ceph_manager: use StringIO for capturing COT output
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
qa/standalone/scrub/osd-scrub-test: wait longer for update
qa/tasks/ceph_manager: capture stderr for COT
qa/suites/rados/ceph: drop opensuse for now
mon/MonClient: send logs to mon on separate schedule than pings
mgr/dashboard: Fix missing ImageSpec usage
mgr/dashboard: Allow removing RBD with snapshots
mgr/dashboard: Refactor and cleanup tasks.mgr.dashboard.test_user
mgr/dashboard: support multiple DriveGroups when creating OSDs
mon/MonClient: send logs to mon even if we have no keelalive2
cephadm: flag dashboard user to change password
Reviewed-by: Sebastian Wagner <swagner@suse.com>
* refs/pull/33809/head:
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
qa/standalone/scrub/osd-scrub-test: wait longer for update
Reviewed-by: David Zafman <dzafman@redhat.com>
Adds option `mon_allow_pool_size_one` which will be disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then have to pass flag
`--yes-i-really-mean-it` to cli command:
Example:
`ceph osd pool test set size 1 --yes-i-really-mean-it`
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
test: Expect being off by up to 2 and make sure all PGs are active+clean
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
The -N option to vstart.sh was removed, use -k
Old hinfo_key binary happen to be utf-8 decodable, now it
throws an exception trying to decode it. Use new
option to ceph-objectstore-tool to treat stdout as a terminal
and convert binary data to base64.
Signed-off-by: David Zafman <dzafman@redhat.com>
One of our customers wants to verify the data safety of Ceph during scaling
the cluster up, and the test case looks like:
- keep checking the status of a speficied pg, who's up is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2, acting will stay unchanged as 2 belongs to neither current up nor acting set,
hence leaving the corresponding pg pinning undersized for a long time until all backfill
targets completes
It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long live DEGRADED warnings keep bothering our customer who cares about data
safety more than any thing else.
The right way to achieve the above goal is for:
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
Make sure PGs peer (simply flushing state to mon isn't enough).
Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
To avoid confusion fix function names in osd-backfill-space.sh for how
they actually work.
Fixes: https://tracker.ceph.com/issues/43592
Signed-off-by: David Zafman <dzafman@redhat.com>
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>