90590 Commits

Author SHA1 Message Date
Jason Dillaman
5c317aef31 qa/tasks/rbd_mirror_thrash: let daemon gracefully shut down if possible
Otherwise, try to capture a core dump to discover what was blocking the
shutdown process.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-25 16:08:59 -04:00
Jason Dillaman
dca9e3e252 qa/workunits/rbd: wait max 2 hrs for all stress images to sync
Sporadically, the rbd-mirror fsx stress test would fail because overloaded
clusters made sync times very slow. Attempt to wait for all
images to be replicated before proceeding with the comparison.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-25 16:08:59 -04:00
Jason Dillaman
d04a7679c0 qa/workunits/rbd: exclude rbd-mirror sync-point snaps from comparison
This is a temporary workaround for tracker issue #36185.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-25 16:08:25 -04:00
Jason Dillaman
27832e2781 qa/workunits/rbd: image compare should print byte offset of any deltas
This will assist in debugging any mirroring issues.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-25 08:45:08 -04:00
Jason Dillaman
3e8f16b484 qa/suites/rbd: increase librbd debug level for mirror-thrash
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:12:26 -04:00
Jason Dillaman
157a03f8fd librbd: improve debug logging for create/clone state machines
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:12:26 -04:00
Jason Dillaman
9edcd77a83 rbd-mirror: always send image RPC messages via cluster
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:11:02 -04:00
Jason Dillaman
4bafa00c98 rbd-mirror: permit better tracking between image replayer and bootstrap
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:11:02 -04:00
Jason Dillaman
f80ba2e6a4 rbd-mirror: re-generate new image id upon collision
Fixes: http://tracker.ceph.com/issues/24139
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:11:02 -04:00
Jason Dillaman
5148164af8 librbd: create image should return unique error code on id collision
The image id is composed of the librados global instance id and a random
number. For long-lived clients that create multiple images (basically
only rbd-mirror daemon), it's more likely to hit a collision.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 15:11:02 -04:00
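The two commits above (a distinct error code on id collision, and regenerating the id in rbd-mirror) suggest a retry loop. The following is only a minimal sketch of that idea, not the librbd code: `kIdCollision`, `generate_image_id`, `create_image`, and `create_with_retry` are hypothetical names, and the error value is a placeholder because the log does not state the real one.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the "unique error code" the commit mentions;
// the real value is not given in the log.
constexpr int kIdCollision = -1000;

// Build an id from a client instance id followed by a random number (hex),
// mirroring the composition described in the commit message.
inline std::string generate_image_id(uint64_t instance_id,
                                     std::mt19937_64& rng) {
  std::ostringstream ss;
  ss << std::hex << instance_id << rng();
  return ss.str();
}

// Toy stand-in for image creation: reports kIdCollision if the id is taken.
inline int create_image(std::set<std::string>& existing,
                        const std::string& id) {
  if (!existing.insert(id).second) {
    return kIdCollision;
  }
  return 0;
}

// Retry with a freshly generated id whenever creation reports a collision.
// Any other error is returned to the caller without retrying.
inline int create_with_retry(std::set<std::string>& existing,
                             const std::function<std::string()>& next_id,
                             int max_attempts, std::string* out_id) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const std::string id = next_id();
    const int r = create_image(existing, id);
    if (r == 0) {
      *out_id = id;
      return 0;
    }
    if (r != kIdCollision) {
      return r;
    }
  }
  return kIdCollision;
}
```

A dedicated error code matters here because the caller can then tell "this id is taken, try another" apart from failures that a new id would not fix.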
Sage Weil
9054ed81da Merge PR #24225 into master
* refs/pull/24225/head:
	osd/ECBackend: suppress 'Error -2 reading object' if EC fast reads

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2018-09-24 11:01:22 -05:00
Jason Dillaman
4b782613f7
Merge pull request #24078 from Songweibin/wip-rbd-trash-state
rbd: not allowed to restore an image when it is being deleted

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 10:19:13 -04:00
Jason Dillaman
762a7ae386
Merge pull request #23743 from trociny/wip-rbd-config-pool
librbd: pool and image level config overrides

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-09-24 10:19:01 -04:00
Lenz Grimmer
4d3f896c89
Merge pull request #23568 from rhcs-dashboard/wip-24573-landing-page
mgr/dashboard: New Landing Page

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kanika Murarka <kmurarka@redhat.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tiago Melo <tmelo@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2018-09-24 15:32:48 +02:00
John Spray
194c17c04a
Merge pull request #23570 from jcsp/wip-rook-api
mgr/rook: update for v1beta1 API

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2018-09-24 12:12:10 +01:00
Sage Weil
a9816f31dd Merge PR #24133 into master
* refs/pull/24133/head:
	common/Finisher: convert to ceph::mutex etc
	common/ceph_mutex: ceph::{mutex,condition_variable,lock_guard}
	common/mutex_debug: take const char * to ctor, and require a name
	common/mutex_debug: add lockdep support for recursive_mutex_debug
	common/mutex_debug: fix whitespace
	common/mutex_debug: refactor to remove intermediate class
	common/lockdep: add recursive flag for _will_lock
	do_cmake.sh: default to Debug build
	.gitignore: ignore build.*/

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-23 11:17:03 -05:00
Sage Weil
9ef5a956cb osd/ECBackend: suppress 'Error -2 reading object' if EC fast reads
When fast reads are enabled, it's possible for the ordering of a shard
read to not be enforced with respect to writes that come after because
the read completes on the primary before all shards reply.  This can lead
to an ENOENT on the non-primary, and an ERR message in the cluster log,
even though everything is fine.  (The reply will go back to the primary
with the error but it will be ignored since the read has completed.)

Suppress the error message so we don't see these ERR messages in the
cluster log during the normal course of events.

Fixes: http://tracker.ceph.com/issues/26972
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-22 10:42:20 -05:00
Sage Weil
f90edeb110 Merge PR #24202 into master
* refs/pull/24202/head:
	mon/MonClient: fix wait for monmap+config in non-cephx case

Reviewed-by: Mark Nelson <mnelson@redhat.com>
2018-09-22 10:28:39 -05:00
Sage Weil
47acd45476 Merge PR #24217 into master
* refs/pull/24217/head:
	osd/PG.cc: silence "-Wsign-compare" warnings

Reviewed-by: Erwan Velu <erwan@redhat.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-22 10:27:41 -05:00
Sage Weil
1ca7535a51 Merge PR #24220 into master
* refs/pull/24220/head:
	test/objectstore: set pool for fsck test

Reviewed-by: Jianpeng Ma <jianpeng.ma@intel.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-22 10:26:53 -05:00
Patrick Donnelly
9e9f3ead36
Merge PR #24157 into master
* refs/pull/24157/head:
	qa: cleanup parallel execution of fsstress

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-09-21 13:01:14 -07:00
Patrick Donnelly
1b7cabc732
githubmap: update contributors
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-09-21 13:01:13 -07:00
Patrick Donnelly
a5c572b13a
Merge PR #24207 into master
* refs/pull/24207/head:
	script/ptl-tool.py: fix BASE_PATH

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-09-21 11:55:42 -07:00
Patrick Donnelly
de824f74dd
qa: cleanup parallel execution of fsstress
Two instances of fsstress clobber each other. Just build it in the local sandbox.

Fixes: http://tracker.ceph.com/issues/24177

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-09-21 11:34:20 -07:00
Sage Weil
8535965ba4 common/Finisher: convert to ceph::mutex etc
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:55:58 -05:00
Sage Weil
c3f70dc4a8 common/ceph_mutex: ceph::{mutex,condition_variable,lock_guard}
If CEPH_DEBUG_MUTEX is defined, use the [recursive_]mutex_debug classes
that implement lockdep and a bunch of other random debug checks.  Also
typedef ceph::condition_variable to std::condition_variable_debug, which
adds additional assertions and debug checks.

If CEPH_DEBUG_MUTEX is not defined, then use the bare-bones C++ std::mutex
primitives... or as close as we can get to them.

Since the [recursive_]mutex_debug classes take a string argument for the
lockdep piece, define factory functions ceph::make_[recursive_]mutex that
either pass arguments to the debug implementations or toss them out.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:55:56 -05:00
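The compile-time switch and factory function described in the commit above can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the Ceph header: the `sketch` namespace, the `SKETCH_DEBUG_MUTEX` macro, and this toy `mutex_debug` class (which only records a name and a held flag, with no lockdep) are all hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical debug mutex: keeps a name and tracks whether it is held.
// A real debug mutex would add lockdep and further checks.
class mutex_debug {
 public:
  mutex_debug(const char* name) : name_(name) {}
  void lock() {
    m_.lock();
    locked_ = true;
  }
  void unlock() {
    locked_ = false;
    m_.unlock();
  }
  bool is_locked() const { return locked_; }
  const std::string& name() const { return name_; }

 private:
  std::mutex m_;
  std::string name_;
  bool locked_ = false;
};

#ifdef SKETCH_DEBUG_MUTEX
// Debug build: forward the name to the debug implementation.
using mutex = mutex_debug;
template <typename... Args>
mutex make_mutex(Args&&... args) {
  return {std::forward<Args>(args)...};
}
#else
// Release build: plain std::mutex; the name argument is discarded.
using mutex = std::mutex;
template <typename... Args>
mutex make_mutex(Args&&...) {
  return {};
}
#endif

}  // namespace sketch
```

With C++17 copy elision, a caller could write `sketch::mutex m = sketch::make_mutex("my_lock");` in either build mode and use it with `std::lock_guard<sketch::mutex>`. The factory exists so that call sites do not need their own `#ifdef`.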
Sage Weil
77acda7219 common/mutex_debug: take const char * to ctor, and require a name
Require a name, like Mutex.

Most callers are passing a C string.  This may avoid a std::string
copy?

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:52:56 -05:00
Sage Weil
27c6d283e9 common/mutex_debug: add lockdep support for recursive_mutex_debug
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:52:08 -05:00
Sage Weil
5900c60623 common/mutex_debug: fix whitespace
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:51:09 -05:00
Sage Weil
3ae9e7ad06 common/mutex_debug: refactor to remove intermediate class
I don't see any purpose for this, and it prevents us from knowing whether
the mutex is recursive when _will_lock() is called.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:51:09 -05:00
Sage Weil
a02ae950b7 common/lockdep: add recursive flag for _will_lock
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 11:51:09 -05:00
Sage Weil
7e6a57bd26 do_cmake.sh: default to Debug build
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 10:27:04 -05:00
Sage Weil
98d25409a2 .gitignore: ignore build.*/
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-21 10:27:04 -05:00
Laura Paduano
3db50d5aca
Merge pull request #24219 from p-na/fix-test-scrub
mgr/dashboard: Possible fix for some dashboard timing issues

Reviewed-by: Volker Theile <vtheile@suse.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
2018-09-21 17:24:32 +02:00
Mykola Golub
b4e3935eb8
Merge pull request #24179 from dillaman/wip-36074
librbd: properly handle potential object map failures

Reviewed-by: Mykola Golub <mgolub@suse.com>
2018-09-21 18:15:09 +03:00
Neha Ojha
57e006b7cc script/ptl-tool.py: fix BASE_PATH
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-09-21 07:07:38 -07:00
Patrick Nawracay
8daffe86a6 mgr/dashboard: Fix for some dashboard timing issues
Specifically fixes the recurring `test_osd.py` error in the
`test_scrub` method, but this change should also prevent other issues of
the same kind: issues that occur because
tests do not immediately result in a clean cluster status and
are not programmed to wait for it.

Fixes: http://tracker.ceph.com/issues/36107

Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
2018-09-21 16:01:24 +02:00
Sage Weil
9bf7c810a7 Merge PR #23985 into master
* refs/pull/23985/head:
	ceph-objectstore-tool: add back pool dne check
	qa/suites/rados/singleton/reg11184: remove old test
	ceph-objectstore-tool: import pg at original epoch
	osd: handle null pg slot on startup
	ceph-objectstore-tool: drop support for ancient export files
	osd: avoid dropping osd_lock when pg osdmaps are not laggy
	qa/standalone/osd/pg-merge.sh: add merge vs pg import test

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-09-21 08:21:53 -05:00
Sage Weil
e496446034 Merge PR #24064 into master
* refs/pull/24064/head:
	osd: simplify init of fabricated pg
	osd/PG: inherit pg history from merge source, if necessary
	osd/osd_types: increasing pg_num_pending is also an interval change
	osd: cancel pg merge if PGs are undersized
	mon/OSDMonitor: handle ready_to_merge message that cancels the merge
	osd/PG: only signal ready_to_merge if we have all replicas
	osd/PG: move all mark_clean-ish activity into try_mark_clean()
	osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
	osd: send last_epoch_clean when indicating PG is ready to merge
	osd/osd_types: rename pg_num_pending_dec_epoch -> pg_num_dec_last_epoch_clean
	osd,mon: stop setting pg_num_pending_dec_epoch

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-21 08:21:33 -05:00
Kefu Chai
67c05144bb osd/PG.cc: silence "-Wsign-compare" warnings
/ceph/src/osd/PG.cc: In member function 'void
PG::choose_async_recovery_ec(const std::map<pg_shard_t, pg_info_t>&,
const pg_info_t&, std::vector<int>*, std::set<pg_shard_t>*) const':
/ceph/src/osd/PG.cc:1572:32: warning: comparison of integer expressions
of different signedness: 'long int' and 'long unsigned int'
[-Wsign-compare]
     if (approx_missing_objects > cct->_conf.get_val<uint64_t>(
         ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         "osd_async_recovery_min_cost")) {
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/ceph/src/osd/PG.cc: In member function 'void
PG::choose_async_recovery_replicated(const std::map<pg_shard_t,
pg_info_t>&, const pg_info_t&, std::vector<int>*, std::set<pg_shard_t>*)
const':
/ceph/src/osd/PG.cc:1625:33: warning: comparison of integer expressions
of different signedness: 'long int' and 'long unsigned int'
[-Wsign-compare]
     if (approx_missing_objects  > cct->_conf.get_val<uint64_t>(
         ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         "osd_async_recovery_min_cost")) {
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 21:04:52 +08:00
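The warnings quoted above come from comparing a signed count with an unsigned configuration value. The log does not show how the commit resolved them, so the following is only one common way to write such a comparison without the implicit conversion; `exceeds_threshold` is a hypothetical helper, not a function from PG.cc.

```cpp
#include <cassert>
#include <cstdint>

// Compare a signed count against an unsigned threshold explicitly.
// Relying on implicit conversion triggers -Wsign-compare and would turn a
// negative count into a very large unsigned value.
inline bool exceeds_threshold(int64_t count, uint64_t threshold) {
  if (count < 0) {
    // A negative count can never exceed an unsigned threshold.
    return false;
  }
  return static_cast<uint64_t>(count) > threshold;
}
```

Handling the negative case first makes the cast safe, so the result matches the mathematical comparison for every input.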
Kefu Chai
8a814b2eb6 test/objectstore: set pool for fsck test
Since 0bd2546eac, we check the pool id
of each object when performing fsck to ensure we are looking at the right
collection, but the test still uses the pool id set by the
constructor of hobject_t. So all objects created in that test belong
to POOL_META, while the collection is created with the pool id
`555`. Hence the test fails.

Fixes: http://tracker.ceph.com/issues/36099
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 19:59:39 +08:00
Kefu Chai
f3bc838894
Merge pull request #24139 from tchaikov/wip-fix-typos
*/: fix typos in docs,messages,logs,comments

Reviewed-by: Lenz Grimmer <lgrimmer@suse.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Ricardo Dias <rdias@suse.com>
2018-09-21 16:56:31 +08:00
Xie Xingguo
3c7c8c991d
Merge pull request #23317 from xiexingguo/wip-fix-polog-overtrim
osd/PrimaryLogPG: fix potential pg-log overtrimming

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Yan Jun <yan.jun8@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-09-21 14:44:35 +08:00
Kefu Chai
fda8befc53 tools: fix typos in user-visible message and comments
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
98b7e6b896 tools,test: fix typos in comments and usage message
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
a2eff2fc32 script: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
3b062b4278 rgw: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
1544ef05bf pybind/rados: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
67157ec7fd pybind/mgr: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00
Kefu Chai
c33ce07fb8 mount,osdc: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:43:33 +08:00