Commit Graph

265 Commits

Author SHA1 Message Date
Sage Weil
2762955576 qa/standalone/mon/mon-handle-forward: fix grep path and check return results
This makes the test more strict and less confusing.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-10 17:18:38 -06:00
Sage Weil
251f667ef8 Merge PR #25009 into master
* refs/pull/25009/head:
	librbd: stringify locker name with get_legacy_str()
	osdc/Objecter: fix list_watchers addr rendering to match legacy
	test/crimson: disable unittest_seastar_messenger test
	msg/msg_types: encode entity_addr_t TYPE_ANY as TYPE_LEGACY for pre-nautilus
	client: make blacklist detection handle TYPE_ANY entries
	mon/OSDMonitor: maintain compat output for 'blacklist ls'
	client: maintain compat for {inst,addr}_str in status dump
	qa/tasks/ceph_manager: compare osd flush seq #'s as ints
	qa/suites/fs: make use of simple.yaml where appropriate
	qa/msgr: move msgr facet into generic re-usable dir
	crimson: fix monmap build for seastar
	doc/start/ceph.conf: trim the sample ceph.conf file
	doc/rados/operations: only describe --public-{addr,network} method for adding mons
	PendingReleaseNotes: deprecate 'mon addr'
	doc: fix some 'mon addr' references
	doc/rados/configuration: fix some 'mon addr' references
	doc/rados/configuration/network-config-ref: revise network docs somewhat
	doc/rados/configuration/network-config-ref: remove totally obsolete section
	qa/suites/rados: replace mon_seesaw.py task with a small bash script
	qa/suites/fs/upgrade: don't bind to v2 addrs
	qa/tasks/mon_thrash: avoid 'mon addr' in mon section
	mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set
	osd/OSDMap: maintain compat addr fields
	msg/msg_types: add get_legacy_str()
	mds/MDSMap.h: maintain compat addr field
	mon/MgrMap: maintain compat active_addr field
	mon/MonClient: reconnect to mon if its addrvec appears to have changed
	qa/tasks/ceph.conf.template: increase mon_mgr_mkfs_grace
	msg/async/ProtocolV2: fill in IP for all peer_addrs
	msg/async: print all addrs on debug lines
	mon/MonMap: no noname- mon name prefix when for_mkfs
	ceph-monstore-tool: print initial monmap
	msg/async/ProtocolV2: advertise ourselves as a v2 addr when using v2 protocol
	msg/async: assert existing protocol matches current protocol
	msg/async: add missing modelines
	mon/MonMap: add missing modeline
	vstart.sh: put mon addrs in mon_host, not 'mon addr'
	msg/async: better debug around conn map lookups and updates
	mon/MonClient: dump initial monmap at debug level 10
	qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
	qa/tasks/ceph: set initial monmap features with using addrvec addrs
	monmaptool: add --enable-all-features option
	qa/tasks/ceph: only use monmaptool --addv if addr has [,:v]
	qa/tasks/ceph_manager: make get_mon_status use mon addr
	qa/tasks/ceph: keep mon addrs in ctx namespace
	mon/OSDMonitor: log all osd addrs on boot
	msg/simple: behave when v2 and v1 addrs are present at target
	mon/MonClient: warn if global_id changes
	msg/Connection: add warning/note on get_peer_global_id
	mds/MDSDaemon: clean up handle_mds_map debug output a bit
	qa/suites/rados/upgrade: debug mds
	mds/MDSRank: improve is_stale_message to handle addrvecs
	msg/async: make loopback detect when sending to one of our many addrs
	qa/suites/rados/upgrade: no aggressive pg num changes
	mon/OSDMonitor: require nautilus mons for require_osd_release=nautilus
	mon/OSDMonitor: require mimic mons for require_osd_release=mimic
	qa/suites/rados/thrash-old-clients: use legacy addr syntax in ceph.conf
	msg/async: preserve peer features when replacing a connection
	qa/tasks/ceph.py: move methods from teuthology.git into ceph.py directly; support mon bind * options
	mon/MonMap: adjust build_initial behavior for mkfs vs probe
	mon/MonMap: improve ambiguous addr behavior
	qa/suites/rados/upgrade: spread mons a bit
	qa/rados/thrash-old-clients: keep mons on separate hosts
	qa/standalone/mon/misc.sh: tweak test to be more robust
	qa/tasks/mon_seesaw: expect v1/v2 prefix in addr
	osd/OSDMap: fix is_blacklisted() check to assume type ANY
	mon/OSDMonitor: use ANY addr type for blacklisting
	mon/msg_types: TYPE_V1ORV2 -> TYPE_ANY
	qa/workunits/cephtool: fix blacklist test
	qa/suites/upgrade: install old version with only v1 addrs
	common/options: by default, bind to both msgr v1 and v2 addresses
	vstart.sh: add --msgr1, --msgr2, --msgr21 options
	msg/async/ProtocolV2: be flexible with server identity check
	msg/msg_types: fix entity_addrvec_t::parse() with null end arg
	qa/suites/rados/basic/msgr: no msgr2 addrs in initial monmaps
	qa/tasks/ceph: add 'mon_bind_addrvec' and 'mon_bind_msgr2' options
	monmaptool: add --addv argument to pass in addrvec directly
	qa/suites/rados/basic/msgr: do not use msgr2 with simplemessenger
	qa/suites/rados/basic/msgr: async is not experimental
	messages/MOSDBoot: fix compat with pre-nautilus
	mon/MonMap: allow v1 or v2 to be explicitly specified along with part
	msg/msg_types: allow parsing of IPs without assuming v1 vs v2
	msg/msg_types: default parse to v2 addrs
	msg: standardize on v1: and v2: prefixes for *all* entity_addr_t's
	vstart.sh: use msgr2 by default
	mon/MonMap: remove get_addr() methods
	ceph-mon: adjust startup/bind/join sequence to use addrs
	mon: use MonMap::get_addrs() (instead of get_addr())
	mon/MonClient: change pending_cons to addrvec-based map
	mon/MonMap: fix set_addr() caller, kill wrapper
	mon/MonMap: remove addr-based add()
	monmaptool: fix --add to do either legacy or msgr2+legacy
	monmaptool: clean up iterator use a bit
	mon/MonMap: handle ambiguous mon addrs by trying both legacy and msgr
	mon/MonMap: take addrvec for set_initial_members
	mon/MonMap: use addrvecs for test instances
	mon: pass addrvec via MMonJoin
	mon/MonmapMonitor: fix 'mon add' to populate addrvec
	mon/MonMap: addr -> addrvec
	msg/async/ProtocolV2: only update socket_addr if we learned our addr
	osd: go active even if mon only accepted our v1 addr
	test/msgr: add test for msgr2 protocol
	msg/async/ProtocolV2: share socket_addr and all addrs during handshake
	msg/async: print socket_addr for the connection
	msg/async: msgr2 protocol placeholder
	msg/async: move ProtocolV1 class to its own source file
	msg/async: keep listen addr in ServerSocket, pass to new connections
	msg/async/AsyncMessenger: fix set_addr_unknowns

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-01-04 13:42:09 -06:00
Sage Weil
16980bd12f qa/suites/rados: replace mon_seesaw.py task with a small bash script
The teuthology test did not like the change to remove 'mon addr' from
ceph.conf.  The standalone script is easier to test.

Note that it avoids mon names 'a', 'b', 'c' since the MonMap::build_initial
uses those.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Sage Weil
b92be2ca9b qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Sage Weil
7559a47f5b qa/standalone/mon/misc.sh: tweak test to be more robust
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
David Zafman
554ea73cb5 test: Disable duplicate request command test during scrub testing
Scrub testing requires orderly control of scrubbing.  Most of the time,
but not always, the duplicate scrub request is ignored because the first
request hasn't finished.  Teuthology enables this environment variable
in the workunit handling.

Fixes: https://tracker.ceph.com/issues/36525

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-21 18:28:23 -08:00
David Zafman
975dbc5841 test: Minor improvement to create_ec_pool()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-10 20:16:01 -08:00
Igor Fedotov
79fd227639 qa: replace raw_bytes_used field access in QA test cases
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-12-06 18:54:21 +03:00
Igor Fedotov
d07c10dfc0 os/bluestore: add main device expand capability.
One can do that via ceph-bluestore-tool's bluefs-bdev-expand command

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-11-29 12:48:20 +03:00
David Zafman
1841928e28 test: Add test for requested scrub priority
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-14 23:57:20 -08:00
Josh Durgin
fd2a4c5733 Merge pull request #22476 from dzafman/wip-23875
Removal of snapshot with corrupt replica crashes osd

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-11-09 15:15:01 -08:00
David Zafman
a159f162c5 test: osd-scrub-snaps.sh: After snapshot removal wait for snaptrim to complete
Due to deliberate corruptions snaptrim_error means snaptrim is done

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:20 -08:00
David Zafman
e37f95ac27 test: osd-scrub-snaps.sh: Testing with new --rmtype in ceph-objectstore-tool
Use --rmtype snapmap with new obj16 to remove snapmap only, check for repair message
Use --rmtype nosnapmap to remove obj5 while leaving snapmap behind

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:20 -08:00
David Zafman
f43faf4ad7 test: cleanup: Remove redundant cat of log and handle errors in create_scenario()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:19 -08:00
Sage Weil
c8a8dc21fd Merge PR #24828 into master
* refs/pull/24828/head:
	qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
	qa/osd-bluefs-volume-ops: reduce space usage for the test case

Reviewed-by: David Zafman <dzafman@redhat.com>
2018-11-08 16:26:52 -06:00
Sage Weil
5b9be42bf5 Merge PR #15047 into master
* refs/pull/15047/head:
	tool/ceph_objectstore_tool: add new op that reset last_complete to last_update

Reviewed-by: Sage Weil <sage@redhat.com>
2018-11-06 10:47:18 -06:00
Sage Weil
9ab9dcfc0d Merge PR #24809 into master
* refs/pull/24809/head:
	os/bluestore: omit redundant '/' in OSD path for ceph-bluestore-tool if
	os/bluestore: improve error handling for migrate ops in
	qa/standtalone/osd-bluefs-volume-ops: remove redundant code.

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-30 15:09:45 -05:00
Igor Fedotov
f5520ea304 qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:16 +03:00
Igor Fedotov
80e67abdfd qa/osd-bluefs-volume-ops: reduce space usage for the test case
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:15 +03:00
Sage Weil
c40685ebdd Merge PR #24787 into master
* refs/pull/24787/head:
	Merge PR #24796 into nautilus
	osd: fix heartbeat_reset unlock
	Merge PR #24780 into nautilus
	Merge PR #24761 into nautilus
	Merge PR #24651 into nautilus
	osd: fix race between op_wq and context_queue
	test: Make sure kill_daemons failure will be easy to find
	test: Add flush_pg_stats to make test more deterministic
2018-10-29 08:36:34 -05:00
Igor Fedotov
5d38f8b49b qa/standtalone/osd-bluefs-volume-ops: remove redundant code.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-29 16:30:36 +03:00
Xie Xingguo
e6f9241aeb Merge pull request #24657 from xiexingguo/wip-rm-device-class-fix
mon/OSDMonitor: two "ceph osd crush class rm" fixes

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-27 09:49:57 +08:00
xie xingguo
5bcac35213 mon/OSDMonitor: do not remove device class still referenced by ec-profiles
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-10-23 21:17:56 +08:00
xie xingguo
4bc54587a1 mon/OSDMonitor: make "ceph osd crush class rm" idempotent
Removing a non-existent device class should be generally okay.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-10-23 21:17:56 +08:00
Yan Jun
1e98c72dfc mon: drop repeated 'goodchars' and add osd crush ls testcase
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2018-10-23 16:32:45 +08:00
Kefu Chai
4af71e7c00 Merge pull request #23103 from ifed01/wip-ifed-bluefs-migrate
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-22 22:33:08 +08:00
liuchang0812
7c008d279e tool/ceph_objectstore_tool: add new op that reset last_complete to last_update
Fixes: http://tracker.ceph.com/issues/19382

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
2018-10-22 11:03:06 +08:00
David Zafman
da3c556aa2 test: Make sure kill_daemons failure will be easy to find
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
David Zafman
b33edbc4f6 test: Add flush_pg_stats to make test more deterministic
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
Igor Fedotov
02b5768a4f tests: add qa test case for bluefs volume coalescence
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-17 22:39:27 +03:00
Sage Weil
54d539d79a Merge PR #24603 into master
* refs/pull/24603/head:
	crush: get "ceph osd crush class create/rm" back

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-17 10:06:26 -05:00
xie xingguo
d7ff33e9fd crush: get "ceph osd crush class create/rm" back
This reverts a27fd9d25c and
b863883ca7.

Quote from Sébastien Han:
> IIRC at some point, we were able to create a device class from the CLI.
> Now it seems that the device class gets created when at least one OSD
> of a particular class starts.
> In ceph-ansible, we create pools after the initial monitors are up and
> we want to assign a device crush class on some of them.
> That's not possible at the moment since there no device class available yet.
> Also, someone might want to create its own device class.
> Something as crazy as running Filestore with a tmpfs osd store and
> might want to isolate them.
> I know it's a very limited use case, but still, it could be desired.

See also https://www.spinics.net/lists/ceph-devel/msg41152.html

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-10-16 08:45:49 +08:00
huanwen ren
f1219d716d qa/osd: fixup osd-rep-recov-eio.sh fails to parse pg dump
Fixes: http://tracker.ceph.com/issues/36418
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-10-16 02:18:22 +08:00
John Spray
67d147c00d Merge pull request #23622 from renhwztetecs/renhw-wip-25103
mgr: fixup pgs show in unknown state

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
2018-10-10 13:28:33 +01:00
huanwen ren
ed442447c0 qa: modify the format for add pgmap_ready.
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-09-27 23:22:50 +08:00
Sage Weil
9bf7c810a7 Merge PR #23985 into master
* refs/pull/23985/head:
	ceph-objectstore-tool: add back pool dne check
	qa/suites/rados/singleton/reg11184: remove old test
	ceph-objectstore-tool: import pg at original epoch
	osd: handle null pg slot on startup
	ceph-objectstore-tool: drop support for ancient export files
	osd: avoid dropping osd_lock when pg osdmaps are not laggy
	qa/standalone/osd/pg-merge.sh: add merge vs pg import test

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-09-21 08:21:53 -05:00
Kefu Chai
4b0e2c8ed4 qa: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:41:42 +08:00
Sage Weil
26cb966cab ceph-objectstore-tool: import pg at original epoch
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.

- In mimic, as of 2347ecb961, we brought the
PG up to date while updating past_intervals.  (At the same time we removed
the OSD's parallel past_intervals regeneration.)

The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges.  Splits are
somewhat easier (until now we enable partial import of a PG into a split
child), but merges are not so easy.

This patch changes it so we import the PG and leave the pg_epoch matching
the import file.  The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.

We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.

Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-20 12:58:00 -05:00
Sage Weil
da887c82ce qa/standalone/osd/pg-merge.sh: add merge vs pg import test
- You can't import the source half of a PG that has since merged.  Sorry!  We
could implement this later.
- You can import the target half, but the result will then be incomplete,
and you rely on backfill to clean it up.
- Map gaps don't affect this behavior.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-17 12:52:46 -05:00
Kefu Chai
338612ad88 Merge pull request #24088 from dzafman/wip-35982
qa/standalone: Standalone test corrections

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-17 22:35:43 +08:00
Kefu Chai
f46523e464 Merge pull request #23955 from wjwithagen/wjw-fix-ceph-helpers.sh
test: Start using GNU awk and fix archiving directory

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-17 15:44:06 +08:00
David Zafman
ef6940fbb6 test: osd-backfill-stats.sh: Fix subtests to get primary which can change
Fixes: http://tracker.ceph.com/issues/35982

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-13 13:19:23 -07:00
David Zafman
6d53e2c380 test: Fix for error message changed in ceph-objectstore-tool
Fixes: http://tracker.ceph.com/issues/35982

Caused by: 6bd682f53d

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-13 13:19:11 -07:00
David Zafman
7f83a24553 Merge pull request #24018 from dzafman/wip-35912
qa/standalone: Minor test improvements

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-12 13:15:44 -07:00
Kefu Chai
1578875194 Merge pull request #24013 from dzafman/wip-35845
test: Use a grep pattern that works across releases

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-12 23:00:39 +08:00
Kefu Chai
510d9e1345 Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
David Zafman
6e3f04365f test: Trap termination so we can capture logs on teuthology timeout
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-10 12:23:07 -07:00
David Zafman
dc80f8585a test: Use a grep pattern that works across releases
Fixes: http://tracker.ceph.com/issues/35845

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-10 08:21:36 -07:00
Sage Weil
4d2a73c7f1 Merge PR #23845 into master
* refs/pull/23845/head:
	osd/OSDMap: include age in up and in counts for ceph status
	mon/OSDMonitor: set new_last_{up,in}_change
	osd/OSDMap: store last_up_change and last_in_change
	mgr/MgrMap: include mgr age in map printer
	mon/MgrMap: track active_changed timestamp
	mon: include mon quorum age in status
	include/utime: add utimespan_str helper

Reviewed-by: John Spray <john.spray@redhat.com>
2018-09-10 07:45:58 -05:00
Sage Weil
f47921f293 qa/standalone/osd/osd-backfill-stats: fixes
Grep from the primary's log, not every osd's log.

For the backfill_remapped task in particular, after the pg_temp change it
just so happens that the primary changes across the pool size change and
thus two different primaries do (some) backfill.  Fix that test to pass
the correct primary.

Other tests are unaffected as they do not (happen to) trigger a primary
change and already satisfied the (removed) check that only one OSD does
backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 17:11:18 -05:00