Commit Graph

124397 Commits

Author SHA1 Message Date
Ramana Raja
e031a59691 doc/cephfs/nfs: update recommendation for versions
... of Ceph and NFS-Ganesha packages.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2021-06-21 22:59:58 -04:00
Radoslaw Zarzynski
c862be649a crimson/osd: introduce more asserts to the Watch timeout handling.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-21 23:58:36 +00:00
Patrick Donnelly
f27ec02a61
Merge PR #41860 into master
* refs/pull/41860/head:
	qa: log messages when falling back to force/lazy umount

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2021-06-21 16:54:59 -07:00
Radoslaw Zarzynski
9e002f7d05 crimson/osd: fix construction of InternalClientRequest in DEBUG builds.
The assert in the ctor of `InternalClientRequest` actually operates on
the ctor's argument we `std::moved` from, not on the class' member.
When a debug build is used, this translates into failures like the one
below:

```
2021-06-16T22:53:03.410 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:02 smithi170 conmon[43770]: ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4987-gec8844b6/rpm/el8/BUILD/ceph-17.0.0-4987-gec8844b6/src/crimson/osd/osd_operations/internal_client_request.cc:19: crimson::osd::InternalClientRequest::InternalClientRequest(Ref<crimson::osd::PG>): Assertion `bool(pg)' failed.
2021-06-16T22:53:05.363 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  0# 0x0000558BE7BBF68F in /usr/bin/ceph-osd
2021-06-16T22:53:05.363 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  1# FatalSignal::signaled(int, siginfo_t const*) in /usr/bin/ceph-osd
2021-06-16T22:53:05.363 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in /usr/bin/ceph-osd
2021-06-16T22:53:05.364 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  3# 0x00007F8AD7535B20 in /lib64/libpthread.so.0
2021-06-16T22:53:05.364 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  4# gsignal in /lib64/libc.so.6
2021-06-16T22:53:05.364 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  5# abort in /lib64/libc.so.6
2021-06-16T22:53:05.364 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  6# 0x00007F8AD5B2EC89 in /lib64/libc.so.6
2021-06-16T22:53:05.365 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  7# 0x00007F8AD5B3CA76 in /lib64/libc.so.6
2021-06-16T22:53:05.365 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  8# crimson::osd::InternalClientRequest::InternalClientRequest(boost::intrusive_ptr<crimson::osd::PG>) in /usr/bin/ceph-osd
2021-06-16T22:53:05.365 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]:  9# crimson::osd::Watch::do_watch_timeout(boost::intrusive_ptr<crimson::osd::PG>) in /usr/bin/ceph-osd
2021-06-16T22:53:05.365 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 10# seastar::noncopyable_function<void ()>::direct_vtable_for<crimson::osd::Watch::Watch(crimson::osd::Watch::private_ctag_t, boost::intrusive_ptr<crimson::osd::ObjectContext>, watch_info_t const&, entity_name_t const&, boost::intrusive_ptr<crimson::osd::PG>)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) in /usr/bin/ceph-osd
2021-06-16T22:53:05.366 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 11# 0x0000558BED653759 in /usr/bin/ceph-osd
2021-06-16T22:53:05.366 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 12# 0x0000558BED61B148 in /usr/bin/ceph-osd
2021-06-16T22:53:05.366 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 13# 0x0000558BED61B576 in /usr/bin/ceph-osd
2021-06-16T22:53:05.366 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 14# 0x0000558BED7C93C9 in /usr/bin/ceph-osd
2021-06-16T22:53:05.367 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 15# 0x0000558BED326D5A in /usr/bin/ceph-osd
2021-06-16T22:53:05.367 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 16# 0x0000558BED330E7E in /usr/bin/ceph-osd
2021-06-16T22:53:05.367 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 17# main in /usr/bin/ceph-osd
2021-06-16T22:53:05.367 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 18# __libc_start_main in /lib64/libc.so.6
2021-06-16T22:53:05.368 INFO:journalctl@ceph.osd.6.smithi170.stdout:Jun 16 22:53:05 smithi170 conmon[43770]: 19# _start in /usr/bin/ceph-osd
```
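
The pitfall described above can be reproduced in isolation. The following is a minimal sketch, not the actual crimson code: it assumes the constructor parameter shares its name with the member, so the unqualified `pg` in the constructor body names the moved-from parameter. `PG`, `PGRef` (a `std::shared_ptr` standing in for `Ref<PG>`) and `RequestSketch` are hypothetical stand-ins, and the check results are recorded in flags instead of being asserted so the sketch does not abort.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for crimson::osd::PG; not the real class.
struct PG {
  int id = 0;
};

// Assumption: std::shared_ptr stands in for Ref<PG> (an intrusive_ptr in the trace).
using PGRef = std::shared_ptr<PG>;

// Hypothetical request type that records what each check would have seen.
struct RequestSketch {
  PGRef pg;
  bool param_nonempty = false;
  bool member_nonempty = false;

  explicit RequestSketch(PGRef pg) : pg(std::move(pg)) {
    // Inside the body, the unqualified name `pg` is the parameter, which
    // shadows the member and has already been moved from.
    param_nonempty = bool(pg);         // what `assert(bool(pg))` would test
    member_nonempty = bool(this->pg);  // what the check was meant to test
  }
};
```

With a non-empty pointer passed in, `param_nonempty` ends up `false` and `member_nonempty` ends up `true`, which is why an assert on the parameter only fires in builds where asserts are enabled.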

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-21 23:54:53 +00:00
Radoslaw Zarzynski
809c5d10a3 crimson/os: synchronize producers with consumers in AlienStore's queues.
Some time ago we replaced the single, `boost::lockfree`-based queue
in `ThreadPool` with the in-house, lockish `ShardedWorkQueue` vector.
Unfortunately, pushing into such a queue isn't synchronized with
consuming from it -- the former happens without locking the `mutex`.
As the underlying primitive behind `ShardedWorkQueue::pending` is a
plain `std::deque`, it's unsafe to operate that way in a multi-threaded
environment. Indeed, weird-looking crashes have been spotted at Sepia:

```
(virtualenv) rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-06-21_14:49:36-rados-master-distro-basic-smithi/6182668$ less ./remote/smithi196/log/ceph-osd.7.log.gz
...
 0# 0x000055862FD67ADF in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
 4# 0x00005586357540E4 in ceph-osd
 5# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
 6# pthread_cond_timedwait in /lib64/libpthread.so.0
 7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
 8# 0x00005586313E303B in ceph-osd
 9# 0x00007FB22CC51BA3 in /lib64/libstdc++.so.6
10# 0x00007FB22CF2C14A in /lib64/libpthread.so.0
11# clone in /lib64/libc.so.6
Fault at location: 0x18
daemon-helper: command crashed with signal 11
```

This fix adds the missing synchronization to the `push_back()` method of
`ShardedWorkQueue`. The side effect is that it may stall the reactor.
Therefore, a follow-up change that switches to e.g. `boost::lockfree`
is expected.
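
As an illustration of the kind of synchronization described above, here is a minimal sketch of a queue whose producer and consumer both take the same mutex. It is not the real `ShardedWorkQueue`: the class name `WorkQueueSketch`, the `int` payload and the `pop_front()` signature are assumptions made for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical, simplified work queue; only the member names `mutex`,
// `pending` and `push_back()` echo the commit message.
class WorkQueueSketch {
  std::mutex mutex;
  std::condition_variable cond;
  std::deque<int> pending;  // std::deque is not safe for unsynchronized access

public:
  // Producer side: take the same mutex the consumer uses before touching
  // the deque. The caller may block here while a consumer holds the lock.
  void push_back(int item) {
    {
      std::lock_guard<std::mutex> lock(mutex);
      pending.push_back(item);
    }
    cond.notify_one();
  }

  // Consumer side: wait up to `timeout` for an item to become available.
  std::optional<int> pop_front(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex);
    if (!cond.wait_for(lock, timeout, [this] { return !pending.empty(); })) {
      return std::nullopt;  // timed out with nothing queued
    }
    int item = pending.front();
    pending.pop_front();
    return item;
  }
};
```

Holding the lock only around the `std::deque` operation keeps the critical section short, but the producer can still block while a consumer owns the mutex, which matches the reactor-stall concern mentioned in the message.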

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-21 23:51:38 +00:00
Sage Weil
324a5ff589 vstart.sh: fix docker url
Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-21 15:01:29 -04:00
Yuval Lifshitz
201942f2d4 rgw/notification: make notifications agnostic of bucket reshard
Fixes: https://tracker.ceph.com/issues/51293

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
2021-06-21 20:36:47 +03:00
Yuval Lifshitz
17cc2a4afc rgw/notification: send correct size in COPY events
Fixes: https://tracker.ceph.com/issues/51305

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
2021-06-21 19:36:57 +03:00
Kefu Chai
5a26875049 os/bluestore/AvlAllocator: introduce bluestore_avl_alloc_ff_max_search_bytes
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in a
highly fragmented system, iterating the whole tree could hurt the
performance of a fast storage system a lot.

the idea comes from openzfs's metaslab allocator.
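
The bounded search described above can be sketched as follows. This is not the AvlAllocator implementation: `AllocatorSketch`, `Found`, the two standard-library trees and the exact way the limits are counted are assumptions made for illustration. The two constructor arguments stand in for a search-count limit and a search-bytes limit.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <utility>

// Hypothetical result type: where the extent starts and which mode found it.
struct Found {
  uint64_t offset;
  bool best_fit;
};

// Hypothetical, simplified free-space index; not the real AvlAllocator.
class AllocatorSketch {
  std::map<uint64_t, uint64_t> by_offset;           // offset -> length
  std::set<std::pair<uint64_t, uint64_t>> by_size;  // (length, offset)
  uint64_t max_search_count;  // assumed: max extents visited in first-fit
  uint64_t max_search_bytes;  // assumed: max address range walked in first-fit

public:
  AllocatorSketch(uint64_t max_count, uint64_t max_bytes)
    : max_search_count(max_count), max_search_bytes(max_bytes) {}

  void add_free(uint64_t offset, uint64_t length) {
    by_offset[offset] = length;
    by_size.emplace(length, offset);
  }

  // Look up (without allocating) a free extent of at least `want` bytes.
  std::optional<Found> find(uint64_t want) const {
    if (by_offset.empty()) {
      return std::nullopt;
    }
    const uint64_t start = by_offset.begin()->first;
    uint64_t visited = 0;
    // First-fit: walk extents in offset order, but only up to the limits.
    for (const auto& [offset, length] : by_offset) {
      if (length >= want) {
        return Found{offset, false};
      }
      ++visited;
      const uint64_t searched = offset + length - start;
      if (visited >= max_search_count || searched >= max_search_bytes) {
        break;  // give up on first-fit instead of walking the whole tree
      }
    }
    // Best-fit: the smallest extent that is large enough, in one lookup.
    auto it = by_size.lower_bound(std::make_pair(want, uint64_t{0}));
    if (it == by_size.end()) {
      return std::nullopt;
    }
    return Found{it->second, true};
  }
};
```

When many small extents precede the first large one, the limits cap the cost of the linear walk and the size-ordered tree answers the request with a single lookup.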

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-21 22:10:27 +08:00
Kefu Chai
40f05b971f os/bluestore/AvlAllocator: introduce bluestore_avl_alloc_ff_max_search_count
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in a
highly fragmented system, iterating the whole tree could hurt the
performance of a fast storage system a lot.

the idea comes from openzfs's metaslab allocator.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-21 22:10:26 +08:00
Kefu Chai
87ab4d7afa
Merge pull request #41941 from tchaikov/wip-crimson-errorator-loop
crimson/common: extract parallel_for_each_state out

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-21 20:46:41 +08:00
Kefu Chai
596e1ca370
Merge pull request #41949 from tchaikov/wip-crimson-prometheus
crimson/osd: expose metrics using http server

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-06-21 16:36:20 +08:00
Kefu Chai
80961c27d1 crimson/osd: expose metrics using http server
so we can query the metrics using the HTTP API, like

http://localhost:9180/metrics?name=io*

or

http://192.168.2.8:9180/metrics?name=io_queue_delay

or

http://localhost:9180/metrics

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-21 15:41:04 +08:00
Kefu Chai
175bb4d724
Merge pull request #41934 from cyx1231st/wip-seastore-onode-logs
crimson/onode-staged-tree: improve logs to understand inconsistent load from seastore

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-21 12:34:02 +08:00
Yingxin Cheng
6673bf88a4 crimson/onode-staged-tree: print NodeExtent with the header
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-06-21 10:08:46 +08:00
Yingxin Cheng
860ddba0f0 crimson/onode-staged-tree: validate node header when load
Add logs to detect corruption when loading nodes. assert() is not
informative enough to understand the context.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-06-21 10:08:44 +08:00
Yingxin Cheng
366efc403e crimson/onode-staged-tree: delete copy constructor of DummyNodeExtent
Dummy backend is used for unit tests without transactions, so there
should be no copy-on-write behavior.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-06-21 10:05:28 +08:00
Yingxin Cheng
512dac2c6e crimson/onode-staged-tree: add trace logs when start to load nodes
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-06-21 10:05:20 +08:00
Amnon Hanuhov
ac7ab31ef6
Merge pull request #41861 from AmnonHanuhov/wip-Refactor_crimson_internals
crimson/net: Complete the refactor to std::unique_ptr inside Messenger

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-20 21:30:36 +03:00
Chen Fan
0ffadad3a8 osd/OSD: mkfs needs to wait for the transaction to finish completely
when running ceph-osd mkfs, the block data is sometimes written
incompletely by the time the ceph-osd process exits. we need to
wait for the write to complete.

Signed-off-by: Chen Fan <fan.chen@easystack.cn>
2021-06-21 00:11:20 +08:00
Aaryan Porwal
ad5b3f2005 mgr/dashboard: telemetry activate: show ident fields when checked
Signed-off-by: Aaryan Porwal <aaryanporwal2233@gmail.com>
2021-06-20 12:17:03 +05:30
Kefu Chai
230a1c4113
Merge pull request #41921 from gregsfortytwo/wip-mon-stretch-crush-rule
mon: Sanely set the default CRUSH rule when creating pools in stretch…

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-06-19 22:57:07 +08:00
Amnon Hanuhov
bb71ebbb19 tools/crimson: Use crimson::make_message() in perf_crimson_msgr
Instead of ceph::make_message() because conn::send() in crimson expects
a std::unique_ptr and not boost::intrusive_ptr

Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:56:13 +03:00
Kefu Chai
6733d767dd
Merge pull request #41845 from agayev/zoned-revise-per-zone-naming-scheme
os/bluestore: Revise the naming scheme for per-zone cleaning informat…

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2021-06-19 22:54:25 +08:00
Amnon Hanuhov
84265a695b test/crimson: Use crimson::make_message() in test_alien_echo
Instead of ceph::make_message() because conn::send() in crimson expects
a std::unique_ptr and not boost::intrusive_ptr

Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:52:54 +03:00
Kefu Chai
e05f186abc
Merge pull request #41830 from tchaikov/wip-ceph-argparse-cleanup
pybind/ceph_argparse: cleanups preparing for type annotations

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-19 22:51:59 +08:00
Amnon Hanuhov
c9a96891a9 crimson/net: Use MessageURef in messenger internals
Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:03:08 +03:00
Amnon Hanuhov
85563a617e crimson/osd: Get rid of send_to_osd() overloading
Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:03:08 +03:00
Amnon Hanuhov
a6d7fefde3 osd: Overload send_osd_message() in PeeringState
To allow passing MessageURef from crimson-osd and MessageRef from
ceph-osd

Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:01:43 +03:00
Amnon Hanuhov
cf6de63e89 crimson/osd: Move message to send_to_osd() in ShardServices
To avoid refcounting the underlying RefCountedObject
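
A minimal sketch of the ownership hand-off this describes, assuming `MessageURef` is a `std::unique_ptr` alias (the alias definition, `Message`, `ConnectionSketch` and the free function `send_to_osd()` are hypothetical simplifications, not the crimson types):

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical message type; the real one derives from RefCountedObject.
struct Message {
  int type;
};

// Assumption: a unique-ownership alias, mirroring the name used in this log.
using MessageURef = std::unique_ptr<Message>;

// Hypothetical connection that takes ownership of whatever it sends.
struct ConnectionSketch {
  std::vector<MessageURef> sent;

  void send(MessageURef msg) {
    sent.push_back(std::move(msg));
  }
};

// Ownership is moved through each layer, so no reference count is touched.
inline void send_to_osd(ConnectionSketch& conn, MessageURef msg) {
  conn.send(std::move(msg));
}
```

After `send_to_osd(conn, std::move(m))` the caller's `m` is null and the connection is the sole owner, so there is no shared count to increment or decrement along the way.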

Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
2021-06-19 17:01:43 +03:00
Kefu Chai
fe17b2b058
Merge pull request #41923 from liewegas/fix-51234
ceph_test_librados_service: wait longer for servicemap to update

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-19 21:11:32 +08:00
Kefu Chai
94ab1c8fbe
Merge pull request #41914 from lxbsz/wip-51092
os/memstore: make the used_bytes to atomic

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-19 21:03:04 +08:00
Kefu Chai
c5ff2450a5
Merge pull request #41896 from ifed01/wip-ifed-verbose-kernel-read
blk/KernelDevice: be more verbose on read errors.

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-19 21:02:18 +08:00
Kefu Chai
6654d63393 crimson/common: specialize errorator<> for future<>
otherwise it always needs a return value.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-19 19:13:19 +08:00
Kefu Chai
108eaf795f crimson/common: extract parallel_for_each_state out
if `parallel_for_each_state` is defined as a nested class in errorator,
clang fails to compile it:

../src/crimson/common/errorator.h:716:47: error: no class named 'parallel_for_each_state' in 'errorator<AllowedErrors...>'
    friend class errorator<AllowedErrors...>::parallel_for_each_state;
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

and the forward declaration does not help. so we have to extract it
out of the errorator. to speed up the compilation, it is moved into
errorator-loop.h. its name mirrors `include/seastar/core/loop.h`.

we could extract the `errorator<>::parallel_for_each()` out as well,
as its return type can be deduced from the type of Iterator and Func.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-19 19:12:38 +08:00
Kefu Chai
a234acb1d2
Merge pull request #41920 from ljflores/patch-1
doc: fixed a small typo in Perf Counters documentation

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-19 16:51:58 +08:00
Kefu Chai
87a1384b1f
Merge pull request #41925 from tchaikov/wip-fmtlib
fmt: pickup fix of link failure with clang

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
2021-06-19 16:49:31 +08:00
Patrick Donnelly
a7af60243f
Merge PR #41900 into master
* refs/pull/41900/head:
	qa: use centos latest for fs:upgrade

Reviewed-by: Rishabh Dave <ridave@redhat.com>
2021-06-18 19:54:09 -07:00
Patrick Donnelly
71cca1e9c3
Merge PR #41899 into master
* refs/pull/41899/head:
	mon/MDSMonitor: check fscid exists for legacy case

Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-06-18 19:52:54 -07:00
Patrick Donnelly
af9b123bb7
Merge PR #41898 into master
* refs/pull/41898/head:
	mon/MDSMonitor: fix whitespace in debug message

Reviewed-by: Rishabh Dave <ridave@redhat.com>
2021-06-18 19:52:24 -07:00
Patrick Donnelly
0fe649f237
Merge PR #41892 into master
* refs/pull/41892/head:
	client: remove unused include from barrier.cc

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-18 19:51:55 -07:00
Patrick Donnelly
a49db812d4
Merge PR #41833 into master
* refs/pull/41833/head:
	cephfs-mirror: silence warnings when connecting via mon host

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-18 19:51:05 -07:00
Patrick Donnelly
a8974febd2
Merge PR #41723 into master
* refs/pull/41723/head:
	mds: to print the unknown type value

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2021-06-18 19:50:22 -07:00
Patrick Donnelly
efea9ecac7
Merge PR #40997 into master
* refs/pull/40997/head:
	test: add test to verify adding an active peer back to source
	pybind/mirroring: disallow adding an active peer back to source
	pybind/cephfs: interface to fetch file system id

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-18 19:49:15 -07:00
Patrick Donnelly
c4de4b3df8
Merge PR #36823 into master
* refs/pull/36823/head:
	qa : add a test for the cmd, dump cache
	mds : add timeout to the command, dump cache, to prevent it from running too long and affecting the service

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-18 19:47:53 -07:00
Laura Flores
2ae0734dba doc: fixed a small typo in Perf Counters documentation
There is a small typo in the Perf Counters documentation. Gauge was spelled incorrectly.

Signed-off-by: Laura Flores <lflores@redhat.com>
2021-06-18 18:14:45 +00:00
Ernesto Puerta
05d7f883a0
Merge pull request #40506 from p-se/pse-update-grafana-deprecated-variables
mgr/dashboard: deprecated variable usage in Grafana dashboards

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: p-se <NOT@FOUND>
2021-06-18 20:08:11 +02:00
Ernesto Puerta
94a6c7120d
Merge pull request #41808 from rhcs-dashboard/51164-show-only-days-in-bucket-details
mgr/dashboard: bucket details: show lock retention period only in days

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2021-06-18 20:07:11 +02:00
Ernesto Puerta
6676352414
Merge pull request #41758 from rhcs-dashboard/support-multiple-crush-trees
mgr/dashboard: crushmap tree doesn't display crush type other than root

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-06-18 20:04:16 +02:00
Kefu Chai
8538644bc3
Merge pull request #35903 from agayev/fix-deployment-guide
doc: Add a missing instruction to manual deployment guide.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-18 16:26:09 +08:00