RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-02-22 18:47:18 +00:00

Author	SHA1	Message	Date
chunmei-liu	b127fa3cdd	crimson/seastore: fix assert in read_extent The lba btree root leaf is empty after an osd reboot because SegmentStateTracker's states are wrong, which happens when seastore is closed before tracker->do_write has finished. read_extent in the transaction manager then can't read the extent and hits ceph_assert(0 == "Should be impossible"); Signed-off-by: chunmei-liu <chunmei.liu@intel.com>	2021-05-31 22:59:31 -07:00
Aashish Sharma	479d738be1	test,cmake: remove run-promtool-unittests.sh script This PR removes the run-promtool-unittests.sh script, as CMakeLists.txt handles the promtool execution. It also adds a description of how to run these tests to Readme.md. Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-06-01 11:15:27 +05:30
Aashish Sharma	dc4becfde8	mgr/dashboard: API Version changes do not apply to pre-defined methods (list, create etc.) Methods like list(), create(), get() etc. don't get the version applied. Also, for the endpoints whose version is changed, the docs and the request header still have the version v1.0+ in them. So with the version reduced, making the request gives a 415 error. This PR fixes this issue. Fixes: https://tracker.ceph.com/issues/50855 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-06-01 10:39:24 +05:30
Kefu Chai	2f1dd0ce9f	pybind/mgr/selftest: add "mgr self-test eval" command and a simple REPL client allowing developers to peek and poke the selftest module. if this turns out to be useful, we can promote this method into a dedicated mix-in class, so other modules can use it if a developer wants to test them manually. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-06-01 11:03:10 +08:00
Mykola Golub	109f0b3c05	Merge pull request #41514 from ideepika/wip-49592-upgrade qa/upgrade: conditionally disable update_features tests Reviewed-by: Kefu Chai <kchai@redhat.com> Reviewed-by: Mykola Golub <mgolub@suse.com>	2021-05-31 19:34:53 +03:00
Sage Weil	f4585775ca	doc/foundation: remove amihan Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-31 11:26:01 -05:00
Ernesto Puerta	957c9c304b	mgr/dashboard: pass Grafana datasource in URL PR https://github.com/ceph/ceph/pull/24314 added support for specifying the Grafana datasource via $datasource template variable, but this hadn't been used from the Dashboard side so far. As per https://grafana.com/docs/grafana/latest/variables/#templates, by adding `var-datasource=Dashboard1`, Dashboard can specify the datasource. Fixes: https://tracker.ceph.com/issues/51026 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>	2021-05-31 16:19:44 +02:00
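As a rough illustration of the `var-datasource=Dashboard1` query parameter described in the commit above, the sketch below appends a template variable to a URL. The helper name `add_query_param` and the base URL used in the usage note are assumptions made for this example; they are not taken from the Dashboard code, which is not written in C++.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends `key=value` to `url`, using '?' for the first
// query parameter and '&' for any later one. No percent-encoding is done,
// so both inputs are assumed to be URL-safe already.
std::string add_query_param(std::string url,
                            const std::string& key,
                            const std::string& value) {
  url += (url.find('?') == std::string::npos) ? '?' : '&';
  url += key + "=" + value;
  return url;
}
```

For example, `add_query_param("https://grafana.example/d/abc", "var-datasource", "Dashboard1")` yields a URL ending in `?var-datasource=Dashboard1`.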
Kefu Chai	1df55c2378	Merge pull request #41589 from tchaikov/wip-crimson-start-up-error crimson: handle startup failures properly Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>	2021-05-31 20:07:33 +08:00
Kefu Chai	703545c595	crimson/os/alienstore: do not cleanup if not started there is a chance that the stop() and umount() methods get called even if start() is not called in the error handling path. in that case, just make these methods no-ops, to ensure that the OSD behaves. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-31 20:06:40 +08:00
Kefu Chai	cc59c82483	crimson/os/alienstore: create tp in AlienStore::start() the thread pool is not needed until AlienStore::start(). with this change, we are able to tell if the AlienStore is actually started or not in AlienStore::stop(). seastar::sharded<Service> starts a service in two phases: 1. construct the shard instances 2. actually start them, and it stops a service in a single shot, which both stops the services and destructs the service instance(s). so we have to implement a proper stop() method for services whose start() might not be called after their instances are created by seastar::sharded<Service>::start(), in case of error handling or if we just don't want to call start(). to ensure we can skip the steps to clean up the stuff created by start(), we need to have a flag in the sharded service, because AlienStore is a member variable of OSD, and when we do mkfs, AlienStore is not start()'ed, and as explained above, we have to call OSD::stop() to ensure the OSD instance is destructed properly. but OSD::stop() calls store->umount() and store->stop() unconditionally. these methods in AlienStore rely on a functional thread pool. fortunately, we don't need to call these methods if the store is never mounted or started. in the case of a failed "mkfs", the store is not mounted at all but the store and osd instances are created. so, in this change, the thread pool is created in AlienStore::start(), and we will use it to tell if AlienStore is started or not in the following change, which makes the related methods no-ops if AlienStore is not started yet. also, postpone the creation of `store` until AlienStore::start(), so we don't need to destroy it in the dtor of AlienStore. otherwise, BlueStore::~BlueStore() would need to reference resources which are only available in alien threads, but when OSD::~OSD() is called, we are in seastar's reactor. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-31 20:06:40 +08:00
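The idea in the two alienstore commits above (create the thread pool only in start(), and treat its presence as the "was started" flag so that umount() and stop() become no-ops otherwise) can be sketched in plain C++. This is a simplified illustration, not the AlienStore code: `Store`, `ThreadPool` and the `log` member are hypothetical names, and no Seastar types are involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the thread pool a real store would own.
struct ThreadPool {
  bool running = true;
  void stop() { running = false; }
};

// Hypothetical service that is constructed in one phase and started in another.
class Store {
public:
  // Phase 1: construction creates nothing that needs an explicit teardown.
  Store() = default;

  // Phase 2: the thread pool is created only here.
  void start() { tp = std::make_unique<ThreadPool>(); }

  bool started() const { return tp != nullptr; }

  // Safe to call even if start() was never reached: no-op without a pool.
  void umount() {
    if (!tp) return;
    log.push_back("umount");
  }

  // Same guard: only tear down what start() actually created.
  void stop() {
    if (!tp) return;
    tp->stop();
    tp.reset();
    log.push_back("stop");
  }

  std::vector<std::string> log;  // records the cleanup steps actually run

private:
  std::unique_ptr<ThreadPool> tp;  // doubles as the "was started" flag
};
```

With this shape, an owner can call umount() and stop() unconditionally in its own stop path, and a store that never got past construction simply does nothing.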
Kefu Chai	d4671c2ff9	crimson/osd/main: always stop osd as long as it started otherwise the sharded_service's dtor complains if we destruct it without stopping it first, like: FATAL: startup failed: std::system_error (error crimson::net:3, negotiation failure) crimson-osd: ../src/seastar/include/seastar/core/sharded.hh:523: seastar::sharded<T>::~sharded() [with Service = crimson::osd::OSD]: Assertion `_instances.empty()' failed. Aborting on shard 0. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-31 20:06:40 +08:00
Kefu Chai	37b83f4ed7	crimson/osd/main: do cleanup using defer() since we do the startup in a seastar thread, we have the luxury of doing cleanup using the RAII machinery. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-31 20:06:22 +08:00
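The RAII-style cleanup mentioned in the commit above can be illustrated with a small scope guard. The sketch below uses only the standard library; `Deferred` and `run_startup` are hypothetical names for this example and are a stand-in for, not a copy of, Seastar's defer() or the crimson-osd startup code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs the stored callable when it goes out of scope,
// whether the scope is left normally or by an exception.
class Deferred {
public:
  explicit Deferred(std::function<void()> f) : func(std::move(f)) {}
  Deferred(const Deferred&) = delete;
  Deferred& operator=(const Deferred&) = delete;
  ~Deferred() { if (func) func(); }
private:
  std::function<void()> func;
};

// Hypothetical startup routine: each step that starts something registers
// its own cleanup right away. Returns the events in the order they happened.
std::vector<std::string> run_startup(bool fail_midway) {
  std::vector<std::string> events;
  try {
    events.push_back("start messenger");
    Deferred stop_messenger([&] { events.push_back("stop messenger"); });
    events.push_back("start osd");
    Deferred stop_osd([&] { events.push_back("stop osd"); });
    if (fail_midway) {
      throw std::runtime_error("startup failed");
    }
    events.push_back("running");
  } catch (const std::exception& e) {
    // Guards have already run in reverse order by the time we get here.
    events.push_back(std::string("error: ") + e.what());
  }
  return events;
}
```

Because the guards are destroyed in reverse order of construction, the cleanup order mirrors the startup order without any explicit bookkeeping in the error path.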
Kefu Chai	a6314f1542	crimson/osd/main: catch exception thrown in the async() call * use seastar::app_template::run() instead of seastar::app_template::run_deprecated(), so that it returns int instead of `void` and the application can return int explicitly in the continuation passed to run(). more readable this way. * wrap the whole block in run() in a giant try-catch block, so the exceptions thrown by the startup code can be captured and handled. * do not catch the exceptions individually in the try-catch block anymore. the outer catch block takes care of them. this change improves the error handling when crimson-osd launches. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-31 20:05:52 +08:00
Misono Tomohiro	571a9e6d53	vstart: update podman detection Since it is possible there is no podman process running when launching vstart, use 'command -v' instead of 'pgrep -f'. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>	2021-05-31 21:00:10 +09:00
Deepika	9c0b239d70	qa/upgrade: conditionally disable update_features tests with the recent support for async rbd operations from pacific+, when an older client (without async support) is being upgraded and simultaneously interacts with a newer client which expects the requests to be async, a hang can occur: the return code for request completion is taken as the acknowledgement of the async request, followed by a wait for another acknowledgement of request completion. this should be rare, happening only when the lock owner is an old client, and should be deferred if compatibility issues arise. see also: 541230475d3b25ab18c4eb9bc5011060462594a6(octopus) Signed-off-by: Deepika <dupadhya@redhat.com>	2021-05-31 16:46:31 +05:30
Ilya Dryomov	16d9a68a3e	librbd: don't stop at the first unremovable image when purging As there is no inherent ordering, there may be multiple removable images past the unremovable image. On top of that, removing a clone may make its parent removable so perform an additional pass if any image gets removed. Fixes: https://tracker.ceph.com/issues/51021 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-05-31 11:44:47 +02:00
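The purge behaviour described in the commit above (skip unremovable images instead of stopping, and make another pass whenever something was removed, since removing a clone may free its parent) can be sketched as follows. This is an illustration under simplifying assumptions, not the librbd implementation: `Image`, its `pinned` flag, `has_children` and `purge` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical image record: an image cannot be removed while it has clones.
struct Image {
  std::string parent;   // empty if the image is not a clone
  bool pinned = false;  // hypothetical flag: never removable
};

// True if any remaining image names `name` as its parent.
static bool has_children(const std::map<std::string, Image>& images,
                         const std::string& name) {
  for (const auto& kv : images) {
    if (kv.second.parent == name) return true;
  }
  return false;
}

// Removes every removable image, skipping unremovable ones instead of
// stopping at the first, and repeats while the previous pass removed
// something. Returns the names that could not be removed.
std::vector<std::string> purge(std::map<std::string, Image>& images) {
  bool removed_any = true;
  while (removed_any) {
    removed_any = false;
    for (auto it = images.begin(); it != images.end();) {
      if (!it->second.pinned && !has_children(images, it->first)) {
        it = images.erase(it);
        removed_any = true;
      } else {
        ++it;
      }
    }
  }
  std::vector<std::string> left;
  for (const auto& kv : images) left.push_back(kv.first);
  return left;
}
```

In the test scenario, a parent that sorts before its clone is skipped on the first pass and removed on the second, while a pinned image is reported as left over.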
Ilya Dryomov	0bcb910217	rbd: combined error message for expected Trash::purge() errors Output to stderr instead of the log where regular users wouldn't see it given the elevated log level. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-05-31 11:44:47 +02:00
Zac Dover	c02fb2b25b	doc/cephadm: enriching "Service Specification" This PR adds parallel construction to the "Service Specification" section of the "Service Management" chapter of the cephadm documentation. Signed-off-by: Zac Dover <zac.dover@gmail.com>	2021-05-31 14:15:56 +10:00
Zac Dover	80dcbc8019	doc/cephadm: enriching "daemon status" This PR creates parallel structure for the text in the "Daemon Status" section of the cephadm Service Management chapter. Signed-off-by: Zac Dover <zac.dover@gmail.com>	2021-05-31 13:55:20 +10:00
Kefu Chai	54d02f098a	Merge pull request #41552 from tchaikov/wip-mgr-find-roots mgr: expose CRUSHMap.find_roots() Reviewed-by: Avan Thakkar <athakkar@redhat.com>	2021-05-31 09:40:50 +08:00
J. Eric Ivancich	0cebfae56b	Merge pull request #41563 from cybozu/rgw-add-the-description-of-blocking-io-during-index-resharding rgw: add the description of blocking io during index resharding Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>	2021-05-29 12:18:45 -04:00
Kefu Chai	6eba570dbc	crimson/osd/main: handle and rethrow exception in fetch_config() print a more verbose error message when monc fails to connect to the monitor, for a better user experience. also, unregister all dispatchers by calling msgr->stop() before calling monc.stop() to ensure the messenger can be shut down gracefully. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-29 16:48:44 +08:00
Kefu Chai	3681c6c716	test/crimson/test_messenger: add editor variables in header to help emacs and vim to format the code better. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-29 16:48:44 +08:00
Kefu Chai	f7d18aa835	crimson/osd/main: do cleanup using defer() in fetch_config() so we can stop the started services even if some of the step(s) throw or fail. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-29 16:48:44 +08:00
Kefu Chai	57c1277c64	vstart.sh: remove unused variable osdmap_fn is not used after being initialized, so drop it. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-29 16:48:44 +08:00
Igor Fedotov	f4d1ef9a95	test/allocator_replay_test: make allocator type configurable Signed-off-by: Igor Fedotov <ifedotov@suse.com>	2021-05-29 08:33:02 +03:00
Kefu Chai	2ecb738e2d	Merge pull request #41278 from sebastian-philipp/mgr-cephadm-set-user-no-hosts mgr/cephadm: Don't call _check_host without hosts Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com> Reviewed-by: Adam King <adking@redhat.com>	2021-05-29 10:42:14 +08:00
Kefu Chai	2a35c562a1	Merge pull request #41520 from tchaikov/wip-osd-unique-ptr os: let ObjectStore::create() return unique_ptr<> Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>	2021-05-29 10:37:31 +08:00
Kefu Chai	2ba0f48bd1	Merge pull request #41573 from tchaikov/wip-allocat-ctor os/bluestore: pass string_view to ctor of Allocator Reviewed-by: Igor Fedotov <ifedotov@suse.com>	2021-05-29 10:36:43 +08:00
Ilya Dryomov	d0dd4b75d3	rbd: propagate Trash::purge() result Exit with respective status like other commands do. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-05-28 22:19:54 +02:00
Kefu Chai	0331281e8a	Merge pull request #41582 from cyx1231st/wip-seastore-swap-read-extent crimson/seastore: introduce and adopt LBAManager::get_mapping(t, offset) Reviewed-by: Kefu Chai <kchai@redhat.com>	2021-05-28 15:35:01 +08:00
Yingxin Cheng	88a41c3922	crimson/seastore: adopt get_mapping(t, offset) interface Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>	2021-05-28 15:05:53 +08:00
Yingxin Cheng	c165a289e6	crimson/seastore: implement and test get_mapping(t, laddr) Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>	2021-05-28 15:05:44 +08:00
Yingxin Cheng	6f4b296056	crimson/seastore: add stub to introduce get_mapping() without length Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>	2021-05-28 10:30:42 +08:00
Kefu Chai	596ae330d9	Merge pull request #41578 from rzarzynski/wip-crimson-monc-auth-req crimson/monc: handle_auth_request() doesn't depend on active_con. Reviewed-by: Kefu Chai <kchai@redhat.com>	2021-05-28 08:09:07 +08:00
Kefu Chai	9091261749	Merge pull request #41544 from tchaikov/wip-doc-confval doc/mgr: use confval directive to define options Reviewed-by: Neha Ojha <nojha@redhat.com>	2021-05-28 07:59:34 +08:00
Kefu Chai	dfdcf2cf92	doc/mgr: use confval directive to define options less repeating this way Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-28 07:44:44 +08:00
Yuri Weinstein	e1f273928d	Merge pull request #41540 from ceph/wip-15213 doc: 15.2.13 Release Notes Reviewed-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Casey Bodley <cbodley@redhat.com> Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com> Reviewed-by: Ramana Raja <rraja@redhat.com> Reviewed-by: Neha Ojha <nojha@redhat.com>	2021-05-27 16:40:41 -07:00
Sage Weil	1f30c0114d	Merge PR #41483 into master * refs/pull/41483/head: cephadm: stop passing --no-hosts to podman mgr/nfs: use host.addr for backend IP where possible mgr/cephadm: convert host addr if non-IP to IP mgr/dashboard,prometheus: new method of getting mgr IP doc/cephadm: remove any reference to the use of DNS or /etc/hosts mgr/cephadm: use known host addr mgr/cephadm: resolve IP at 'orch host add' time Reviewed-by: Sebastian Wagner <swagner@suse.com>	2021-05-27 19:14:53 -04:00
zdover23	fe258aad49	Merge pull request #41561 from zdover23/wip-doc-cephadm-s-mgmt-service-status-improvement-2021-05-26 doc/cephadm: enrich "service status" Reviewed-by: Sebastian Wagner <sewagner@redhat.com>	2021-05-28 07:41:40 +10:00
Sage Weil	d1bb94ba4c	cephadm: stop passing --no-hosts to podman This reverts `cfc1f914ce`, which is no longer necessary because (1) we don't use socket.getfqdn(), and (2) we generally do not rely on DNS or /etc/hosts at all anymore (with the exception of the upgrade transition). Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
Sage Weil	7e9f4ac7a1	mgr/nfs: use host.addr for backend IP where possible Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
Sage Weil	781bfa14ff	mgr/cephadm: convert host addr if non-IP to IP Previously we allowed the host.addr to be a DNS name (short or fqdn). This is problematic because of the inconsistent way that docker and podman handle /etc/hosts, and undesirable because relying on external DNS is an external source of failure for the cluster without any benefit in return (simply updating DNS is not sufficient to make ceph behave). So: update any non-IP to an IP as soon as we start up (presumably on upgrade). If we get a loopback address (127.0.0.1 or 127.0.1.1), then wait and hope that the next instance of the manager has better luck. Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
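The rule in the commit above (keep an address that is already an IP, replace a name with its resolved IP, and hold off when the lookup only yields a loopback address such as 127.0.0.1 or 127.0.1.1) can be sketched as below. The actual change lives in the cephadm manager module, which is not C++; this sketch only mirrors the decision logic. `is_ip_literal`, `is_ipv4_loopback` and `normalize_addr` are hypothetical names, the name lookup itself is assumed to happen elsewhere, and the POSIX `inet_pton` call is the only library dependency.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <string>

// True if `addr` parses as an IPv4 or IPv6 literal. No DNS lookup is done.
bool is_ip_literal(const std::string& addr) {
  in_addr v4;
  in6_addr v6;
  return inet_pton(AF_INET, addr.c_str(), &v4) == 1 ||
         inet_pton(AF_INET6, addr.c_str(), &v6) == 1;
}

// True if `addr` is an IPv4 literal in 127.0.0.0/8, which covers both
// 127.0.0.1 and 127.0.1.1.
bool is_ipv4_loopback(const std::string& addr) {
  in_addr v4;
  if (inet_pton(AF_INET, addr.c_str(), &v4) != 1) return false;
  return (ntohl(v4.s_addr) >> 24) == 127;
}

// Decides what to store for a host address. `resolved` stands in for the
// result of a name lookup performed elsewhere. An empty return value means
// "no usable IP yet, try again later".
std::string normalize_addr(const std::string& addr,
                           const std::string& resolved) {
  if (is_ip_literal(addr)) return addr;  // already an IP: keep it
  if (!is_ip_literal(resolved) || is_ipv4_loopback(resolved)) return "";
  return resolved;
}
```

For instance, `normalize_addr("host1", "127.0.1.1")` returns an empty string, which a caller could treat as "retry on the next start".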
Sage Weil	157a7b4183	mgr/dashboard,prometheus: new method of getting mgr IP - Use a centralized method get_mgr_ip() - Look up the hostname via DNS. This is a bit more reliable than getfqdn() since it will work even when podman adds the container name to /etc/hosts. Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
Sage Weil	872668a9b3	doc/cephadm: remove any reference to the use of DNS or /etc/hosts Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
Sage Weil	900880050a	mgr/cephadm: use known host addr If the host IP/addr is known, use that. The addr might even be a FQDN instead of an IP address, in which case we want to look that up instead of the bare hostname. Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-27 12:00:20 -04:00
Radoslaw Zarzynski	d328cbdfe2	crimson/monc: handle_auth_request() doesn't depend on active_con. The following crash occurred at Sepia [1]: ``` INFO 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] ProtocolV2::start_accept(): target_addr=172.21.15.119:55220/0 DEBUG 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] TRIGGER ACCEPTING, was NONE DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] SEND(26) banner: len_payload=16, supported=1, required=0, banner="ceph v2 " DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(10) banner: "ceph v2 " DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT banner: payload_len=16 DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(16) banner features: supported=1 required=0 DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] WRITE HelloFrame: my_type=osd, peer_addr=172.21.15.119:55220/0 DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT HelloFrame: my_type=client peer_addr=v2:172.21.15.119:6803/31733 INFO 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false) DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? 
-@55220] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection' Segmentation fault on shard 0. Backtrace: 0# 0x000055E84CF44C1F in ceph-osd 1# FatalSignal::signaled(int, siginfo_t const) in ceph-osd 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t, void)#1}::_FUN(int, siginfo_t, void) in ceph-osd 3# 0x00007F2BC88C0B20 in /lib64/libpthread.so.0 4# crimson::mon::Connection::get_conn() in ceph-osd 5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list) in ceph-osd 6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd 7# 0x000055E84DF67669 in ceph-osd 8# 0x000055E84DF68775 in ceph-osd 9# 0x000055E846F47F60 in ceph-osd 10# 0x000055E85296770F in ceph-osd 11# 0x000055E85296CC50 in ceph-osd 12# 0x000055E852B1ECBB in ceph-osd 13# 0x000055E85267C73A in ceph-osd 14# main in ceph-osd 15# __libc_start_main in /lib64/libc.so.6 16# _start in ceph-osd Fault at location: 0x98 ``` [1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136907 When the `handle_auth_request()` happens, there is no guarantee `active_con` is being available. This is reflected in the classical implementation: ```cpp int MonClient::handle_auth_request( Connection con, // ... ceph::buffer::list reply) { // ... 
bool isvalid = ah->verify_authorizer( cct, *rotating_secrets, payload, auth_meta->get_connection_secret_length(), reply, &con->peer_name, &con->peer_global_id, &con->peer_caps_info, &auth_meta->session_key, &auth_meta->connection_secret, ac); ``` The patch transplants the same logic to crimson. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>	2021-05-27 15:47:45 +00:00
Kefu Chai	258ffd289a	os/bluestore: pass string_view to ctor of Allocator just for the sake of correctness, as they don't need a full-blown std::string; all they need is a string-like object. and they always create a std::string instance as a member variable if they want to have a copy of it. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-27 23:37:08 +08:00
Kefu Chai	d5445b8f11	tools/ceph_objectstore_tool: destruct ObjectStore using unique_ptr<> before this change, cot never destructs the created ObjectStore instances. after this change, they are destructed upon returning from main(). Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-27 23:14:44 +08:00
Kefu Chai	b04b2f4d2a	osd: pass unique_ptr<ObjectStore> to ctor of OSD less error-prone, and it's simpler to manage the resource using RAII Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-05-27 23:07:10 +08:00
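The ownership pattern behind the unique_ptr commits above (a factory hands out a `std::unique_ptr`, the consumer takes it in its constructor, and the object is destroyed when the owner goes away) can be sketched as follows. `Store`, `create_store` and `Daemon` are hypothetical stand-ins for this example, not the ObjectStore or OSD classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical store that counts destructions so the effect is observable.
struct Store {
  explicit Store(int* counter) : destroyed(counter) {}
  ~Store() { ++*destroyed; }
  int* destroyed;
};

// Factory returning ownership to the caller instead of a raw pointer.
std::unique_ptr<Store> create_store(int* counter) {
  return std::unique_ptr<Store>(new Store(counter));
}

// Hypothetical consumer: takes ownership in its constructor, so the store
// is released automatically when the Daemon is destroyed.
class Daemon {
public:
  explicit Daemon(std::unique_ptr<Store> s) : store(std::move(s)) {}
  bool has_store() const { return store != nullptr; }
private:
  std::unique_ptr<Store> store;
};
```

Constructing `Daemon d(create_store(&n));` in a scope and leaving that scope destroys the store exactly once, with no explicit delete anywhere.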


123682 Commits