Commit Graph

91694 Commits

Author SHA1 Message Date
alfonsomthd
85570639cb mgr/dashboard: Add TSLint rule
- TSLint no-unused-variable rule added.
- Cleanup: unused imports and variables.

Signed-off-by: Alfonso Martínez <almartin@redhat.com>
2018-10-22 19:38:38 +02:00
Lenz Grimmer
9d79acdaac
Merge pull request #24591 from tspmelo/wip-testing-module
mgr/dashboard: Unit Tests cleanup

Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
2018-10-22 19:33:53 +02:00
Sage Weil
1a01cf4872 ceph_test_msgr: fix authorizer behavior
Fixes breakage introduced by 2152d8ffb7.

Fixes: http://tracker.ceph.com/issues/36495
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-22 10:06:06 -05:00
Kefu Chai
3d0d24a228 include/ceph_assert.h: do not pack assert params if WITH_ASAN
We pack the assert() params for smaller code size, but this creates an
inlined `assert_data_ctx` instance in every compilation unit that calls
ceph_assert() from a header file.

__PRETTY_FUNCTION__ is likely to be referenced by `assert_data_ctx`
sections that are included in different object files. If ceph_assert()
is called from a header file, there will be multiple `assert_data_ctx`
sections sharing the same identifier. These sections are defined as
"COMDAT" group sections, i.e. common data sections. When the linker sees
multiple COMDAT sections with the same identifier, it simply discards the
duplicates and keeps only a single copy. Without ASan, GCC handles this
problem just fine, but the dedup feature does not work well with ASan: if
ASan is enabled and the objects are linked in the wrong order, some
references will point to the discarded sections.

To address this issue, we could audit the link command line and inspect
all .o files to make sure they are properly ordered, but this is
non-trivial. As a workaround, with this change the assert params are not
packed when WITH_ASAN is set; they are passed to __ceph_assert_fail()
overloads that accept unpacked params directly, so the COMDAT section is
not created.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 23:01:36 +08:00
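For the ceph_assert change above, a minimal C++ sketch of the two macro shapes, assuming WITH_ASAN is visible as a preprocessor macro. `SKETCH_ASSERT`, `assert_data`, `assert_fail_packed()`, `assert_fail_unpacked()` and `last_failure()` are hypothetical names, not the definitions in include/ceph_assert.h, and failures are recorded in a string instead of aborting so the sketch can be exercised. The packed form defines a static object per call site; the unpacked form passes the same four values as arguments, so no per-call-site static object is emitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical packed parameter block: one static instance per call site.
struct assert_data {
  const char* assertion;
  const char* file;
  int line;
  const char* function;
};

// Stand-in for aborting: remember the last failure message.
inline std::string& last_failure() {
  static std::string msg;
  return msg;
}

inline void assert_fail_unpacked(const char* assertion, const char* file,
                                 int line, const char* function) {
  last_failure() = std::string(file) + ":" + std::to_string(line) + ": " +
                   function + ": assertion '" + assertion + "' failed";
}

inline void assert_fail_packed(const assert_data& ctx) {
  assert_fail_unpacked(ctx.assertion, ctx.file, ctx.line, ctx.function);
}

#ifndef WITH_ASAN
// Packed: each expansion defines a static assert_data object. Inside an
// inline function in a header, every object file using it may emit its
// own copy, which the linker then has to deduplicate.
#define SKETCH_ASSERT(expr)                                           \
  do {                                                                \
    static const assert_data assert_data_ctx = {#expr, __FILE__,      \
                                                __LINE__,             \
                                                __PRETTY_FUNCTION__}; \
    if (!(expr)) assert_fail_packed(assert_data_ctx);                 \
  } while (false)
#else
// Unpacked: pass the values directly; no per-call-site static object.
#define SKETCH_ASSERT(expr)                                           \
  do {                                                                \
    if (!(expr))                                                      \
      assert_fail_unpacked(#expr, __FILE__, __LINE__,                 \
                           __PRETTY_FUNCTION__);                      \
  } while (false)
#endif
```

Compiling with -DWITH_ASAN switches to the unpacked form without touching any call site.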
Jason Dillaman
5d56014c61 qa/tasks/qemu: use unique clone directory to avoid race with workunit
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if they run as parallel tasks, the two clones will clobber each
other.

Fixes: http://tracker.ceph.com/issues/36542
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-10-22 10:44:40 -04:00
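The general idea behind the qemu fix above, giving each task its own clone directory, can be illustrated with POSIX mkdtemp(3), which atomically creates a directory with a unique name. This is only a C++ illustration: the actual change lives in the Python qa/tasks/qemu task and may name its directory differently, and `make_unique_clone_dir()` and the `/tmp/qemu-clone-` prefix are hypothetical.

```cpp
#include <cassert>
#include <stdlib.h>  // mkdtemp (POSIX)
#include <string>
#include <unistd.h>  // rmdir, for cleanup
#include <vector>

// Create a fresh directory named <prefix>XXXXXX, with the X's replaced by a
// unique suffix. Returns an empty string on failure.
std::string make_unique_clone_dir(const std::string& prefix) {
  std::string tmpl = prefix + "XXXXXX";
  std::vector<char> buf(tmpl.begin(), tmpl.end());
  buf.push_back('\0');  // mkdtemp() needs a writable, NUL-terminated template
  if (mkdtemp(buf.data()) == nullptr) {
    return std::string();
  }
  return std::string(buf.data());
}
```

Two tasks calling this with the same prefix get different directories, so their clones cannot clobber each other.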
Josh Durgin
36ca230776
Merge pull request #24667 from liewegas/wip-ec-thrash-full
qa/suites/rados/thrash-erasure-code*/thrashers/*: less likely resv rejection injection

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-10-22 07:39:26 -07:00
Kefu Chai
4af71e7c00
Merge pull request #23103 from ifed01/wip-ifed-bluefs-migrate
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-22 22:33:08 +08:00
Kefu Chai
09c31bbea3
Merge pull request #23090 from mingshuaiwang/master
OSD: ceph-osd parent process needs to restart log service after fork

Reviewed-by: Neha Ojha <nojha@redhat.com>
2018-10-22 22:30:41 +08:00
Sage Weil
883fc4d122 Merge PR #24689 into nautilus
* refs/pull/24689/head:
	qa/tasks/ceph_manager: fix get_stuck_pgs from pg dump change

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-10-22 09:20:50 -05:00
Sage Weil
ae583f5dde Merge PR #24689 into master
* refs/pull/24689/head:
	qa/tasks/ceph_manager: fix get_stuck_pgs from pg dump change
	Merge PR #24625 into nautilus
	qa/suites/rados/mgr/tasks/module_selftest: whitelist 'foo bar security'

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-10-22 09:19:46 -05:00
Sage Weil
90c3c2032c os/bluestore: show compress and buffered from WriteContext
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-22 08:29:27 -05:00
Sage Weil
6e14a50e20 os/bluestore: fix rename race with trim on replacement onode at old name
- rename from foo to bar
 - foo onode is moved to bar in onode_map
 - keys removed at position foo as part of txc
 - new onode for foo is installed at foo in map
...
- cache trims foo
...
- new txn B does get_onode on foo, reads old foo (now bar) onode into foo ***
- txn A commits
-> onode cache has foo with stale bar content

Fix by holding a ref to the replacement foo onode so that get_onode cannot
read stale metadata out of kvdb before txn A commits.

Fixes: http://tracker.ceph.com/issues/36541
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-22 08:29:26 -05:00
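A toy C++ model of the rename race above and its fix, assuming a much-simplified cache: `Store`, `Txn`, `Onode`, `trim()` and the `pin_replacement` flag are hypothetical and do not mirror BlueStore's actual classes. Trimming evicts any onode referenced only by the cache; holding a reference to the replacement onode in the transaction keeps it cached, so a later get_onode() cannot fall back to the not-yet-updated key/value store.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Onode {
  std::string oid;
  std::string data;
  bool exists;
};
using OnodeRef = std::shared_ptr<Onode>;

struct Txn {
  std::vector<std::string> rm_keys;
  std::map<std::string, std::string> set_keys;
  std::vector<OnodeRef> refs;  // onodes pinned until commit
};

struct Store {
  std::map<std::string, std::string> kv;      // committed metadata
  std::map<std::string, OnodeRef> onode_map;  // in-memory cache

  OnodeRef get_onode(const std::string& oid) {
    auto it = onode_map.find(oid);
    if (it != onode_map.end()) return it->second;
    auto k = kv.find(oid);  // cache miss: read committed metadata
    bool found = k != kv.end();
    auto o = std::make_shared<Onode>(
        Onode{oid, found ? k->second : std::string(), found});
    onode_map[oid] = o;
    return o;
  }

  // Evict entries that nobody but the cache references.
  void trim() {
    for (auto it = onode_map.begin(); it != onode_map.end();) {
      if (it->second.use_count() == 1) {
        it = onode_map.erase(it);
      } else {
        ++it;
      }
    }
  }

  void rename(Txn& t, const std::string& from, const std::string& to,
              bool pin_replacement) {
    OnodeRef o = get_onode(from);
    o->oid = to;
    onode_map[to] = o;  // the foo onode moves to bar
    t.refs.push_back(o);
    t.rm_keys.push_back(from);  // keys at foo are removed by the txn
    t.set_keys[to] = o->data;
    auto replacement = std::make_shared<Onode>(Onode{from, "", false});
    onode_map[from] = replacement;  // new, empty onode installed at foo
    if (pin_replacement) t.refs.push_back(replacement);  // the fix
  }

  void commit(Txn& t) {
    for (const auto& k : t.rm_keys) kv.erase(k);
    for (const auto& kvp : t.set_keys) kv[kvp.first] = kvp.second;
    t.refs.clear();
  }
};
```

Without the pin, trim() drops the replacement at "foo" and the next get_onode("foo") re-reads the old, still-committed metadata; with the pin, the empty replacement survives until the transaction commits.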
Ricardo Marques
48eb0a336b mgr/dashboard: Fix RBD actions disable
Fixes: https://tracker.ceph.com/issues/36403

Signed-off-by: Ricardo Marques <rimarques@suse.com>
2018-10-22 14:02:37 +01:00
Volker Theile
7a44726645 mgr/dashboard: Confirmation modal doesn't close
Fixes: https://tracker.ceph.com/issues/24729

Signed-off-by: Volker Theile <vtheile@suse.com>
2018-10-22 13:29:38 +02:00
Tiago Melo
f775e9844c mgr/dashboard: Fix problem with ErasureCodeProfileService
ErasureCodeProfileService was being provided twice and that was causing
problems in production mode.

Fixes: https://tracker.ceph.com/issues/36544

Signed-off-by: Tiago Melo <tmelo@suse.com>
2018-10-22 11:57:01 +01:00
Erwan Velu
ef0ceef7c7 ceph_volume: Checking device validity at init time
When initializing the Device structure, we have to run is_valid() to
ensure the data structures (_is_valid & rejected_reasons) are
populated according to the device state.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:19 +02:00
Erwan Velu
d5de9583ee ceph_volume: Rejecting locked devices
If we cannot open a block device with O_RDWR in exclusive mode, it means
someone is already using it, e.g. as a raw database device or similar.

In that case, the device should be considered unusable, as OSDs will
not be able to use it.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:19 +02:00
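A C++ sketch of the exclusive-open check described above (ceph-volume itself is Python). The Linux open(2) man page documents that O_EXCL without O_CREAT is honoured for block devices: if the device is in use by the system (for example, mounted), open() fails with EBUSY. `can_open_exclusively()` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Returns true if `path` could be opened read-write and exclusively.
// Only meaningful for block devices: POSIX leaves O_EXCL without O_CREAT
// undefined for other files. `err` receives errno on failure, 0 on success.
bool can_open_exclusively(const char* path, int& err) {
  int fd = open(path, O_RDWR | O_EXCL);
  if (fd < 0) {
    err = errno;  // e.g. EBUSY for a block device that is in use
    return false;
  }
  close(fd);
  err = 0;
  return true;
}
```

A caller could treat an EBUSY failure as "device is locked" and report the device as unusable.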
Erwan Velu
e0ea3d475a ceph_volume: Reporting nr_requests
We are already reporting the rotational & scheduler of a disk device.
Reporting nr_requests could be useful to know how many concurrent IOs
the device supports/reports.

That could help detect badly detected or misconfigured devices.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:19 +02:00
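On Linux, the value reported by the commit above is exposed per block device in sysfs as /sys/block/<dev>/queue/nr_requests, next to queue/rotational and queue/scheduler. The C++ sketch below, with hypothetical helper names, reads and parses such an attribute; keeping parsing separate from file access lets it be exercised without a real device.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse a single-integer sysfs attribute such as "128\n".
// Returns -1 if the content is not an integer.
long parse_sysfs_long(std::istream& in) {
  long value = 0;
  if (!(in >> value)) return -1;
  return value;
}

// Read <sysfs_root>/block/<dev>/queue/nr_requests; -1 if missing or bad.
long read_nr_requests(const std::string& dev,
                      const std::string& sysfs_root = "/sys") {
  std::ifstream f(sysfs_root + "/block/" + dev + "/queue/nr_requests");
  if (!f) return -1;
  return parse_sysfs_long(f);
}
```

For example, read_nr_requests("sda") reads /sys/block/sda/queue/nr_requests on a host that has an sda device.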
Erwan Velu
5972079407 ceph_volume: Reporting firmware revision
We already report the model & vendor of a given disk; let's also
report the firmware revision. That is useful to filter out some
known-broken revisions.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:19 +02:00
Erwan Velu
f1a9435006 ceph_volume: Rejecting Read-only devices
If a device is reported as read-only, there is no chance we can actually
use it, so let's report it as unusable.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:19 +02:00
Erwan Velu
d8fdf0b753 ceph_volume: Adding Device.is_valid()
A block device can be filtered out/ignored because it has features that
don't match Ceph's expectations.

Until now, the code rejected removable devices, but this was pretty
hidden from the user and implicit in the get_devices() function.

This patch creates a new is_valid() function that performs all the
rejection tests and returns whether this device can be used in the Ceph
context.

If is_valid() returns False, the 'rejected_reasons' list reports all
the reasons why the device was rejected.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-10-22 11:56:14 +02:00
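A C++ sketch of the validation pattern described in the ceph_volume series above (ceph-volume itself is Python). The constructor runs is_valid(), which collects every rejection reason instead of stopping at the first one. `DeviceAttrs`, `Device`, `valid()` and the reason strings are hypothetical. The removable and read-only flags are modeled on the Linux sysfs files /sys/block/<dev>/removable and /sys/block/<dev>/ro.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of device attributes.
struct DeviceAttrs {
  bool removable = false;
  bool read_only = false;
  bool locked = false;  // could not be opened O_RDWR | O_EXCL
};

class Device {
 public:
  explicit Device(const DeviceAttrs& attrs) : attrs_(attrs) {
    // Validate at init time so is_valid_ and rejected_reasons are populated.
    is_valid();
  }

  // Runs every rejection test and records all failures.
  bool is_valid() {
    rejected_reasons.clear();
    if (attrs_.removable) rejected_reasons.push_back("removable");
    if (attrs_.read_only) rejected_reasons.push_back("read-only");
    if (attrs_.locked) rejected_reasons.push_back("locked");
    is_valid_ = rejected_reasons.empty();
    return is_valid_;
  }

  bool valid() const { return is_valid_; }

  std::vector<std::string> rejected_reasons;

 private:
  DeviceAttrs attrs_;
  bool is_valid_ = false;
};
```

Collecting all reasons, rather than returning on the first failed check, is what lets the user see every problem with a rejected device at once.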
Kefu Chai
399923c71a
Merge pull request #20004 from mogeb/steady-clock-tools-rados
librados: use steady clock for rados_mon_op_timeout

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-10-22 17:46:44 +08:00
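The point of the librados change above is that std::chrono::steady_clock is monotonic. A timeout measured with it is therefore not stretched or cut short when the wall clock is adjusted (for example by NTP), unlike one measured with system_clock. A minimal sketch with a hypothetical `OpDeadline` class, not librados code:

```cpp
#include <cassert>
#include <chrono>

// steady_clock never goes backwards, so deadlines based on it are immune
// to wall-clock adjustments.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must be monotonic");

class OpDeadline {
 public:
  using clock = std::chrono::steady_clock;

  explicit OpDeadline(clock::duration timeout)
      : expires_(clock::now() + timeout) {}

  bool expired() const { return clock::now() >= expires_; }

  // Time left before the deadline, or zero once it has passed.
  clock::duration remaining() const {
    auto now = clock::now();
    return now >= expires_ ? clock::duration::zero() : expires_ - now;
  }

 private:
  clock::time_point expires_;
};
```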
Kefu Chai
298da11351
Merge pull request #24658 from tchaikov/wip-18202-rebased
blkdev: Rework API and add FreeBSD support

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
2018-10-22 17:44:07 +08:00
Kefu Chai
5e12cef930 include/ceph_assert: always use __PRETTY_FUNCTION__ for C++
We've moved to GCC 7, so there is no need to check for ancient compiler versions.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 16:53:05 +08:00
Kefu Chai
fd58e5d4ad cmake,ceph.in: preload libasan if WITH_ASAN
We need to preload libasan.so, as the python executable is not likely to
be compiled with ASan enabled.
See:
https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso#asan-and-ld_preload

This is just to ease the use of ASan; for fine-tuned behaviour, use
`ASAN_OPTIONS`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
Kefu Chai
e851462977 ceph.in: extract get_cmake_variables()
so it can be reused

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
Kefu Chai
669853e018 cmake: should compile libzstd with -fPIC
Otherwise we get:

/usr/bin/ld: libzstd/lib/libzstd.a(error_private.c.o): relocation
R_X86_64_32S against `.rodata' can not be used when making a shared
object; recompile with -fPIC

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
Kefu Chai
b605210b97 cmake: pass Sanitizers flags to linker for linking .so
see
https://github.com/google/sanitizers/wiki/AddressSanitizer#using-addresssanitizer

to be specific,

> In order to use AddressSanitizer you will need to compile and link your
> program using `clang` with the `-fsanitize=address` switch.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
Kefu Chai
38e7686a37 cmake: pass cflags to distutils using CC instead of CFLAGS
In Python's distutils.ccompiler, linker_exe is composed using CC instead
of LDFLAGS; the latter only affects how it builds (shared) libraries.

Also put CMAKE_C_FLAGS into the cflags for the compiler used to build
Python C extensions; it's more consistent this way. More importantly,
if we build with ASan enabled, the canary program, a.k.a. rados_dummy.c,
won't link without proper CFLAGS.

Without this change, rados.so fails to build with errors like:

/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_stack_free_10'
/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_report_exp_store8'
...
...

clang: error: linker command failed with exit code 1 (use -v to see
invocation)

Link Error: RADOS library not found
make[3]: ***
[src/pybind/rados/CMakeFiles/cython_rados.dir/build.make:57:
src/pybind/rados/CMakeFiles/cython_rados] Error 1

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
Kefu Chai
3ac8c8dca7 common/TextTable: define endrow
Otherwise, "cmake -DWITH_ASAN=ON -DCMAKE_BUILD_TYPE=Debug" fails to
build with:

/usr/bin/ld: //var/ssd/ceph/build/lib/libceph-common.so.0: undefined
reference to `TextTable::endrow'

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-10-22 14:40:03 +08:00
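The TextTable commit above does not show how endrow is declared. Assuming it is a static data member of class type declared inside the class, the rule involved is a general C++ one: a static data member that is odr-used (for example, bound to a `const&` parameter of operator<<) needs exactly one out-of-class definition. Unoptimized Debug builds are more likely to emit the reference that exposes a missing definition. Since C++17, `inline` and `static constexpr` data members are defined in the class and need no separate definition. `Table` and `endrow_t` below are hypothetical stand-ins, not TextTable's code.

```cpp
#include <cassert>

class Table {
 public:
  struct endrow_t {};
  static const endrow_t endrow;  // declaration only

  // Binding endrow to a reference parameter odr-uses it.
  Table& operator<<(const endrow_t&) {
    ++rows_;
    return *this;
  }
  int rows() const { return rows_; }

 private:
  int rows_ = 0;
};

// The out-of-class definition, normally placed in the .cc file. Without it,
// `t << Table::endrow` can fail at link time with
// "undefined reference to `Table::endrow'".
const Table::endrow_t Table::endrow{};
```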
Shiyang Ruan
99ce7cf48b common: fix typos in BackoffThrottle
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
2018-10-22 13:09:20 +08:00
Yan, Zheng
19d2cecd97 PendingReleaseNotes: note about cephfs client state reclaim
Fixes: http://tracker.ceph.com/issues/36394
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2018-10-22 11:57:08 +08:00
Sage Weil
b678356594 qa/tasks/ceph_manager: fix get_stuck_pgs from pg dump change
Fixes 95b7d2340c

Fixes: http://tracker.ceph.com/issues/36485
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-21 10:52:38 -05:00
myoungwon oh
10993596d9 src/test: fix unordered manifest-unset op
- manifest unset op to foo-chunk object
 - remove manifest flag
 - commit
 - send an ack to a client
 - send decrement messages ("chunk_put") to old chunks (bar-chunk)

The current unit test (ManifestUnset) sends a "chunk_read" command (to
bar-chunk) in order to see whether the chunk's reference count has been
decreased. But, as described above, the "chunk_put" messages are sent
after the ack to the client (test application), which then sends
"chunk_read". Therefore, there is a corner case where bar-chunk (in the
chunk pool) receives "chunk_read" before "chunk_put".

The reference count model of dedup/tiering is based on false positives
(#24230), so decreasing the reference count is not guaranteed. If a
reference mismatch occurs, chunk-scrub (still WIP) will fix it.
The one guaranteed thing is that the existing manifest flag is removed.

So the solution in this commit is simply to re-send the unset op and
check that the return value is -EOPNOTSUPP (which means the manifest flag
has been removed).

Fixes: http://tracker.ceph.com/issues/24485
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
2018-10-21 18:05:40 +09:00
Mykola Golub
4402848911 common: make ceph_abort store same crash info as ceph_assert
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-10-21 10:44:41 +03:00
Mykola Golub
626c93f7b6 global: store assert msg in global and dump to crash meta
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-10-21 10:44:41 +03:00
Mykola Golub
82abf02370 pybind/mgr: make 'ceph crash ls' output sorted list
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-10-21 10:44:41 +03:00
Mykola Golub
ce01d99c9a log: don't clear ring when dump_recent is called
so the recent entries are still available if dump_recent is called
again. This is the case e.g. when the signal handler wants to dump the
recent entries both to the regular log and to the crash log.

Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-10-21 10:44:41 +03:00
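A minimal C++ sketch of the dump_recent behaviour change above, using a hypothetical `RecentLog` ring rather than Ceph's log classes. dump_recent() copies the retained entries out and leaves the ring untouched, so a second dump, such as one to the crash log after one to the regular log, sees the same entries.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

class RecentLog {
 public:
  explicit RecentLog(std::size_t capacity) : capacity_(capacity) {}

  // Keep only the newest `capacity_` entries.
  void push(std::string entry) {
    if (capacity_ == 0) return;
    if (ring_.size() == capacity_) ring_.pop_front();
    ring_.push_back(std::move(entry));
  }

  // Copy the entries out without clearing the ring.
  std::vector<std::string> dump_recent() const {
    return std::vector<std::string>(ring_.begin(), ring_.end());
  }

 private:
  std::size_t capacity_;
  std::deque<std::string> ring_;
};
```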
Mykola Golub
759432ac54 ceph-crash: make clear to the user that 'posted' should be a directory
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-10-21 10:44:41 +03:00
Sage Weil
98fc7ebc99 Merge PR #24184 into master
* refs/pull/24184/head:
	mgr/DaemonServer: remove any upmaps on merging PGs
	mgr/DaemonServer: prevent merge if either pg is remapped|upmap
	mgr/DaemonServer: move pending merge check for more consistent code
	qa/suites/rados/thrash*/thrashers/careful.yaml: thrash with mgr controller
	mgr/DaemonServer: add option to bypass careful throttling for thrasher
	PendingReleaseNotes: note about mgr/balancer/max_misplaced change
	mgr/DaemonServer: remove stale/misleading check
	mgr/DaemonServer: throttle pgp_num changes based on misplaced %
	mgr/DaemonServer: block pg_num decrease(merge) until pgp_num is reduced
	mgr/DaemonServer: adjust_pgs(): cosmetic change to debug output
	mon/PGMap: add get_recovery_stats()
	mgr/balancer: mgr/balancer/max_misplaced -> pg_max_misplaced
	pybind/mgr/mgr_module: add get_option()
	mgr/DaemonServer: allow pg_num increases that abort pending merges
	mon/OSDMonitor: resent pre-nautilus client ops on aborted merge
	mon/OSDMonitor: make pgp_num track pg_num more consistently

Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-10-20 16:40:22 -05:00
Sage Weil
0db59b6c09 Merge PR #24654 into master
* refs/pull/24654/head:
	osd: remove unused parameter 'dev' in OSD::mkfs function

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-20 15:39:56 -05:00
Sage Weil
ba3679c60b mgr/DaemonServer: remove any upmaps on merging PGs
Remove any pg_upmap[_items] on pgs that are merging to ensure that they
land on the same OSDs.

This is a bit sloppy: we *could* set the source upmap to match the target
upmap (vs potentially moving both PGs to a third location, and/or then
having the balancer move the resulting PG somewhere else again), but for
now assume upmaps are not a common case and Keep It Simple.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-20 15:21:58 -05:00
Sage Weil
550dcd53bb mgr/DaemonServer: prevent merge if either pg is remapped|upmap
Remapping means they could be on different OSDs.

Fixes: http://tracker.ceph.com/issues/36166
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-20 15:21:58 -05:00
Sage Weil
5bb9820b2d mgr/DaemonServer: move pending merge check for more consistent code
No functional change, but this makes the code simpler to read.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-20 15:21:58 -05:00
Sage Weil
86ae8fb6b8 qa/suites/rados/thrash*/thrashers/careful.yaml: thrash with mgr controller
Thrash such that we still exercise the careful throttling in the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-20 15:21:58 -05:00
Sage Weil
70ec5bda23 mgr/DaemonServer: add option to bypass careful throttling for thrasher
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-20 15:21:58 -05:00
Josh Durgin
ba3252544c
Merge pull request #20581 from chrone81/patch-1
doc: Fix EC k=3 m=2 profile overhead calculation example.

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-10-19 15:23:55 -07:00
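For reference, here is the arithmetic behind the EC overhead fix above, for a k=3, m=2 profile. Each object is stored as k data chunks plus m coding chunks. Two different ratios are both commonly called "overhead", so it helps to say which one is meant; this log does not state which one the documentation uses.

```latex
\begin{aligned}
\frac{\text{raw}}{\text{usable}} &= \frac{k+m}{k} = \frac{3+2}{3} = \frac{5}{3} \approx 1.67 \\
\frac{m}{k} &= \frac{2}{3} \approx 66.7\% \quad \text{(extra raw space relative to data)} \\
\frac{m}{k+m} &= \frac{2}{5} = 40\% \quad \text{(share of raw space holding coding chunks)}
\end{aligned}
```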
Gregory Farnum
c61acc5b31
Merge pull request #22349 from gregsfortytwo/wip-24368-osd-restarts
systemd: only restart 3 times in 30 minutes, as fast as possible

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-19 13:00:07 -07:00
Ilya Dryomov
b7a62742fc msg/async: fix is_queued() semantics
Before AsyncConnection was split into two classes as part of the
multi-protocol refactor, we only had AsyncConnection::is_queued().
It checked both out_q and outcoming_bl because out_q was part of
AsyncConnection.

out_q is now part of ProtocolV1.  AsyncConnection should no longer be
concerned with out_q, only with outcoming_bl.  Checking whether out_q
is empty in _try_send() is particularly wrong because if the write is
finished (i.e. outcoming_bl is empty) but out_q has something in it,
the write callback isn't invoked.

Although probably not strictly necessary, this commit preserves the
semantics of connection->is_queued() in Protocol.cc.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-10-19 20:25:12 +02:00
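A toy C++ model of the is_queued() split described above. `Connection`, `ProtocolV1`, `try_send()` and `write_callback` here are hypothetical stand-ins that only reuse the field names from the message (out_q, outcoming_bl). The connection's check looks only at its own output buffer, so the write callback fires once outcoming_bl drains even while the protocol still has messages in out_q. The protocol-level is_queued() keeps the old combined meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

struct Connection {
  std::string outcoming_bl;              // encoded bytes not yet written
  std::function<void()> write_callback;  // fired when a write finishes

  // Connection-level check: only the connection's own buffer matters.
  bool is_queued() const { return !outcoming_bl.empty(); }

  // Pretend the socket accepted up to `n` bytes. If this also checked
  // out_q (the old behaviour), the callback below would be skipped
  // whenever the protocol still had pending messages.
  void try_send(std::size_t n) {
    outcoming_bl.erase(0, n);
    if (!is_queued() && write_callback) write_callback();
  }
};

struct ProtocolV1 {
  Connection* conn;
  std::deque<std::string> out_q;  // messages not yet encoded

  // Protocol-level check keeps the combined semantics.
  bool is_queued() const { return !out_q.empty() || conn->is_queued(); }
};
```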
Casey Bodley
077ceb17d9 rgw: fix vector index out of range in RGWReadDataSyncRecoveringShardsCR
Fixes: http://tracker.ceph.com/issues/36537

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2018-10-19 12:54:48 -04:00