Commit Graph

101267 Commits

Author SHA1 Message Date
xie xingguo
e70f8ddb5a osd/osd_types: drop 'new_object' from missing.add
because below here we know we'll always mark object as fully dirty.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-08-26 13:13:27 +08:00
xie xingguo
679f0777a7 osd/osd_types: always call mark_fully_dirty for missing.add
In general we shall build missing set (and hence clean_regions)
based on pg log. However, currently there are still 5 cases we might call
missing.add to add a new pg_missing_item into the missing set
explicitly (or replace an existing pg_missing_item entirely):

1. we explicitly build missing set on startup, in which case
we know we are trying to be compatible with pre-kraken versions,
so it should be ok for us to disable inc-recovery.

2. we are currently processing authoritative log, and there are
some divergent objects detected. For simplicity (and correctness),
we should disable inc-recovery entirely for these objects.

3. we are re-building missing set, e.g., due to the global
CEPH_OSDMAP_RECOVERY_DELETES policy changing.
In this case we know we are at the end of upgrading from a
previous version that lacks CEPH_OSDMAP_RECOVERY_DELETES support.
Hence the recommended option is to disable inc-recovery
as well, since these objects lack inc-recovery support
too.

4. we are adding or re-adding missing object into primary's missing_loc.
It doesn't matter whether we have a correct clean_regions there
since we never actually refer to that field from missing_loc
when we actually start to perform object recovery later.

5. we are auto-repairing a corrupted object and hence need to
add it to the corresponding missing set first, e.g., by leveraging
the existing recovery procedure. In this case, we always disable
inc-recovery to make sure this object can be fully (and correctly)
recovered later.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-08-26 13:12:55 +08:00
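
The rule described in this commit message (every explicit missing.add marks the item fully dirty, which disables inc-recovery for that object) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ceph implementation: `CleanRegions`, `MissingItem`, `MissingSet` and `can_recover_incrementally` are hypothetical stand-ins for the real types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the per-object clean_regions bookkeeping.
struct CleanRegions {
  bool fully_dirty = false;
  void mark_fully_dirty() { fully_dirty = true; }
};

// Hypothetical stand-in for pg_missing_item.
struct MissingItem {
  std::uint64_t need = 0;
  std::uint64_t have = 0;
  CleanRegions clean_regions;
};

// Hypothetical missing set: an explicit add() always marks the item fully
// dirty, so the object can only be recovered in full.
class MissingSet {
 public:
  void add(const std::string& oid, std::uint64_t need, std::uint64_t have) {
    MissingItem item;
    item.need = need;
    item.have = have;
    item.clean_regions.mark_fully_dirty();
    items_[oid] = item;  // replaces any existing entry entirely
  }

  bool can_recover_incrementally(const std::string& oid) const {
    auto it = items_.find(oid);
    return it != items_.end() && !it->second.clean_regions.fully_dirty;
  }

 private:
  std::map<std::string, MissingItem> items_;
};
```

With this sketch, `add("obj1", 7, 3)` followed by `can_recover_incrementally("obj1")` yields false, matching the intent that explicitly added items are always fully recovered.
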
Xie Xingguo
dce9704c28
Merge pull request #29754 from xiexingguo/wip-inc-recovery-3
osd: misc inc-recovery compat fixes 

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-08-24 13:51:21 +08:00
Kefu Chai
79c0fcf823
Merge pull request #28344 from iotcg/rdma
check rdma configuration and fix some logic problem

Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-24 11:37:03 +08:00
Patrick Donnelly
f13b3483e7
Merge PR #28855 into master
* refs/pull/28855/head:
	doc: document scrub summary in ceph status output
	test: extend scrub control test to validate mds task status
	mds: send scrub state changes to cluster log.
	mds: periodically sent mds scrub status to ceph manager
	mgr, mon: allow normal ceph services to register with manager

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-23 16:16:16 -07:00
Patrick Donnelly
7ac9f243cd
Merge PR #29167 into master
* refs/pull/29167/head:
	client: return -eio when sync file which unsafe reqs has been dropped

Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-23 16:10:40 -07:00
Patrick Donnelly
f68c087e37
Merge PR #29572 into master
* refs/pull/29572/head:
	mds: Reorganize class members in FSMap header

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-23 16:06:51 -07:00
Jason Dillaman
5bb807d1ae
Merge pull request #29796 from trociny/wip-journal-player-handle_cache_rebalanced2
journal: fix race between player shut down and cache rebalance

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-23 13:40:15 -04:00
Jason Dillaman
4e8a825777
Merge pull request #29775 from trociny/wip-41229
librbd: always try to acquire exclusive lock when removing image

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-23 13:39:37 -04:00
Jason Dillaman
648b649848
Merge pull request #29459 from zy751713126/config_set
pybind/rbd: add config_set/get/remove api in rbd.pyx

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-23 13:39:07 -04:00
Patrick Donnelly
d30af45a54
Merge PR #29715 into master
* refs/pull/29715/head:
	qa: fix broken ceph.restart marking of OSDs down
	qa: add debugging failed osd-release setting

Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-23 10:09:17 -07:00
Patrick Donnelly
b82e87bda4
Merge PR #29821 into master
* refs/pull/29821/head:
	qa: stop DaemonWatchdog for each cluster in daemon roles

Reviewed-by: Jos Collin <jcollin@redhat.com>
2019-08-23 10:00:52 -07:00
Sage Weil
af4500be86 Merge PR #29575 into master
* refs/pull/29575/head:
	objclass, osd: improve const-correctness of PGLSFilter.
	common: add bl::contents_equal() override for void* + size_t.
	osd: refactor manufacturing of PGLSFilter.
	osd: don't carry PGLSFilter between multiple ops in MOSDOp.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-23 11:26:28 -05:00
Sage Weil
1f39b36b8e Merge PR #28727 into master
* refs/pull/28727/head:
	test/crimson: resolve name collision
	test: switch to ldout; let users specify mon debug level
	test: add new ElectionLogic unit test framework
	elector: const-ify a bunch of functions
	elector: swap order of parameters in ElectionLogic::receive_propose
	elector: Update Elector and ElectionLogic function documentation
	elector: persist the epoch in bump_epoch()
	elector: make some more ElectionLogic members private
	elector: fix privacy and restore dout in Elector
	elector: don't clear peer_info in bump_epoch()
	elector: split ElectionLogic into its own compilation unit
	elector: move all the elector callouts into the Elector
	elector: make ElectionLogic private to Elector; undo most public shenanigans
	elector: create declare_standlone_victory in Elector/Logic for Monitor
	elector: make ElectionLogic::declare_victory private
	elector: route _bump_epoch through the interface-to-be
	elector: rename handle_propose_logic -> receive_propose
	elector: hoist handle_victory into ElectionLogic
	elector: hoist handle_ack into ElectionLogic
	elector: hoist victory into ElectionLogic
	elector: hoist expire into ElectionLogic
	elector: hoist start into ElectionLogic
	elector: hoist participating into ElectionLogic
	elector: hoist init into ElectionLogic
	elector: hoist defer into ElectionLogic
	elector: split handle_propose in two and hoist into ElectionLogic
	elector: hoist bump_epoch into ElectionLogic
	elector: store accessors for ElectionLogic
	elector: hoist Elector data bits out into a new ElectionLogic class
	mon: Rearrange Paxos::dispatch to be a little cleaner

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-23 11:25:28 -05:00
Sage Weil
8876cf36c2 Merge PR #15183 into master
* refs/pull/15183/head:
	kv/rocksdb: support rmrange unconditionally
	cls/rgw: rgw_bi_log_trim() uses cls_cxx_map_remove_range()
	cls/log: cls_log_trim() uses cls_cxx_map_remove_range()
	test/cls: add cls_log.trim_by_marker test
	test/cls: test_cls_log doesn't allocate ObjectOperations
	test/cls: test_cls_log uses fixture for temporary pool
	test/cls: add cls_rgw.bi_log_trim test
	cls/rgw: expose cls_rgw_bilog_list/trim() for single shard
	test/cls: test_cls_rgw uses cls_rgw_obj_key
	test/cls: test_cls_rgw doesn't allocate ObjectOperations
	test/cls: test_cls_rgw uses fixture for temporary pool
	objclass: add cls_cxx_map_remove_range()
	librados: add rados_write_op_omap_rm_range2()
	osdc: add Objecter omap_rm_range()
	osd: add CEPH_OSD_OP_OMAPRMKEYRANGE to do_osd_ops()
	osd: add omap_rmkeyrange() to PGTransaction
	os: add bufferlist overload for omap_rmkeyrange()
	tracing: add do_osd_op_pre_omaprmkeyrange
	rados: add CEPH_OSD_OP_OMAPRMKEYRANGE

Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-23 10:46:33 -05:00
Casey Bodley
6b0f3ce4cb
Merge pull request #29778 from cbodley/wip-41212
vstart: move [client.rgw] config into [client]

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
2019-08-23 10:24:41 -04:00
Lenz Grimmer
862876d900
mgr/dashboard: User Management E2E tests (#29641)

Reviewed-by: Tiago Melo <tmelo@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
2019-08-23 14:00:16 +00:00
Lenz Grimmer
d373178c43
mgr/dashboard: run-backend-api-tests.sh CI improvements (#29504)

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-08-23 09:11:39 +00:00
Kefu Chai
e909031d3c
Merge pull request #29590 from Aran85/fix_proc_replica_log
osd: merge replica log on primary need according to replica log's crt

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-08-23 14:59:02 +08:00
Changcheng Liu
940a5b5ae3 msg/async/Stack: rename variable to improve readability
1. rename var i to be worker_id when creating Worker
"i" is assigned to be Worker::id, it means worker's id

2. rename EventCenter::idx to EventCenter::center_id
"idx" is EventCenter's index in global_centers obj.
rename it to be center_id.

3. rename EventCenter::init API's parameter n to be nevent
"n" is actually assigned to EventCenter::nevent. rename it
to be "nevent".

4. rename EventCenter::init API's parameter t to be type
"t" is corresponding to Epoll Driver's implementation's type.

5. rename EpollDriver::size to be EpollDriver::nevent
"size" is actually epoll events number, rename it to be "nevent"

6. use event_id as index name to get event instead of "j"

7. rename "nw" to be "nowait"

8. Processor::start unify variable name with Processor::accept & Processor::stop
==> auto &l to be auto &listen_socket

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
Changcheng Liu
fdd4053d12 msg/async/rdma: remove stack from RDMAWorker
There's no need to cache stack since RDMAWorker already has
Infiniband obj ib & RDMADispatcher obj dispatcher.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
Changcheng Liu
44a1820da8 msg/async/rdma: use shared_ptr to manage RDMADispatcher obj
1. Don't use bare pointer to manage RDMADispatcher obj.

2. access RDMADispatcher obj directly instead of accessing it
from RDMAStack. This could avoid caching RDMAStack obj in
RDMAWorker & RDMADispatcher.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
Changcheng Liu
923b30f57e msg/async/rdma: remove stack from RDMADispatcher
There's no need to cache stack since RDMADispatcher already has
Infiniband obj ib.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
Changcheng Liu
297452c2c6 msg/async/rdma: use shared_ptr to manage Infiniband obj
1. Don't use bare pointer to manage Infiniband obj.

2. access Infiniband obj directly instead of accessing it from
RDMAStack. This could avoid caching RDMAStack obj in RDMAWorker
& RDMADispatcher.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
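
The ownership change described in the two shared_ptr commits above can be illustrated roughly as follows. This is a sketch under assumptions: `Device`, `Dispatcher` and `Worker` are hypothetical stand-ins for the Infiniband, RDMADispatcher and RDMAWorker classes, and the real constructors may look different.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the shared Infiniband object.
struct Device {
  int port = 1;
};

// Each user holds its own shared_ptr instead of reaching the device
// through a cached pointer to an owning stack object.
class Dispatcher {
 public:
  explicit Dispatcher(std::shared_ptr<Device> dev) : dev_(std::move(dev)) {}
  int port() const { return dev_->port; }

 private:
  std::shared_ptr<Device> dev_;
};

class Worker {
 public:
  explicit Worker(std::shared_ptr<Device> dev) : dev_(std::move(dev)) {}
  int port() const { return dev_->port; }

 private:
  std::shared_ptr<Device> dev_;
};
```

Because both users share ownership, the device stays alive until the last holder is destroyed, and neither class needs a back-pointer to the object that created it.
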
Changcheng Liu
2754d60f66 msg/async/rdma: implement function to prefetch buffers
The original RDMAConnectedSocketImpl::read reads data from buffers and
prefetches data into buffers for the next round of reading. This makes the
logic somewhat complex and the code hard to read.
In this patch:
1) A private API, RDMAConnectedSocketImpl::buffer_prefetch, is added to
prefetch data into buffers at the head of read_buffers.
2) One call to notify() is removed to reduce context switches.
There is no need to notify the upper layer to read data since the current
read operation hasn't finished yet.
3) The RDMAConnectedSocketImpl::read implementation is simplified.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
Changcheng Liu
9b63845ad1 msg/async/rdma: remove redundant code
1. Below three bits are meaningless in pollfd::events field:
   POLLERR, POLLHUP, or POLLNVAL.
2. QueuePair::pd is initialized in the initializer list.
   There's no need to assign the same value to it again.
3. Remove the never used function Chunk::set_bound
4. Remove the never used function Chunk::set_offset
5. Remove the never used function QueuePair::is_error
6. Remove SimplePolicyMessenger used vars
7. remove socket_fd() interface since it's never used.
   All data write/read is based on ConnectedSocketImpl::fd.
   So, there's no need to expose socket_fd since it's never used.
8. Remove RDMAServerSocketImpl::get_fd which is not used.
   BTW, RDMAServerSocketImpl::fd has the same function as get_fd.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
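
Item 1 of the commit above relies on POSIX poll() behaviour: POLLERR, POLLHUP and POLLNVAL are ignored in the `events` field and are only reported through `revents`. The following sketch demonstrates this on a pipe; the function name is hypothetical and the code is not taken from Ceph.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Returns true if poll() reports POLLHUP on a pipe's read end even though
// only POLLIN was requested in `events`.
bool hangup_reported_without_request() {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  close(fds[1]);  // closing the write end hangs up the read end

  struct pollfd pfd;
  pfd.fd = fds[0];
  pfd.events = POLLIN;  // POLLERR | POLLHUP | POLLNVAL deliberately omitted
  pfd.revents = 0;

  const int ready = poll(&pfd, 1, 0);  // timeout 0: do not block
  close(fds[0]);
  return ready == 1 && (pfd.revents & POLLHUP) != 0;
}
```
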
Changcheng Liu
c86e927888 msg/async/rdma: show port state with string
Showing the port state as a string is easier to read than the raw
value.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
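
A rough sketch of printing a port state by name rather than by number. The enum and helper names here are hypothetical; libibverbs has its own `ibv_port_state` enum and an `ibv_port_state_str()` helper, which real code would more likely use.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration; not the libibverbs type.
enum class PortState { Nop, Down, Init, Armed, Active, ActiveDefer };

// Map each state to a readable name for log output.
const char* port_state_name(PortState state) {
  switch (state) {
    case PortState::Nop:         return "NOP";
    case PortState::Down:        return "DOWN";
    case PortState::Init:        return "INIT";
    case PortState::Armed:       return "ARMED";
    case PortState::Active:      return "ACTIVE";
    case PortState::ActiveDefer: return "ACTIVE_DEFER";
  }
  return "UNKNOWN";
}

// Build a log-style description such as "port 1 state ACTIVE".
std::string describe_port(int port_id, PortState state) {
  return "port " + std::to_string(port_id) + " state " + port_state_name(state);
}
```
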
Changcheng Liu
b61a48c197 msg/async/rdma: convert port_id from type uint8_t to int for output
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:35:52 +08:00
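
The conversion matters because `uint8_t` is typically an alias for `unsigned char`, so stream insertion prints it as a character rather than a number. A minimal sketch, where `format_port_id` is a hypothetical helper and not a function from the Ceph source:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper. Casting to int makes operator<< print the numeric
// value; without the cast a uint8_t is usually streamed as one character.
std::string format_port_id(std::uint8_t port_id) {
  std::ostringstream os;
  os << "port " << static_cast<int>(port_id);
  return os.str();
}
```
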
Kefu Chai
1116362118
Merge pull request #29747 from liewegas/wip-39546
osd/PeeringState: do not complain about past_intervals constrained by oldest epoch

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-08-23 13:28:52 +08:00
Kefu Chai
25ed83e14a
Merge pull request #29624 from NancySu05/osdmonitor_markmedown
mon:C_AckMarkedDown has not handled the Callback Arguments

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-23 13:23:46 +08:00
Kefu Chai
5e5eca2834
Merge pull request #29738 from ifed01/wip-ifed-alloc-cleanup
os/bluestore: minor improvements/cleanup around allocator

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
2019-08-23 13:22:52 +08:00
Kefu Chai
ed8a0fb3c6
Merge pull request #29614 from votdev/issue_41205
mgr/dashboard: Access control database does not restore disabled users correctly

Reviewed-by: Patrick Seidensal <pnawracay@suse.com>
2019-08-23 13:20:52 +08:00
Kefu Chai
4deb2b90b5
Merge pull request #29146 from badone/wip-tracker-40835-OSDCap.PoolClassRNS-abort
osd/OSDCap: Check for empty namespace

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-23 13:16:22 +08:00
Kefu Chai
263a78c3dd
Merge pull request #25697 from Aran85/fix-onode-trim
os/bluestore: more aggressive deferred submit when onode trim skipping

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2019-08-23 13:15:27 +08:00
Kefu Chai
b3c1c4c1cd
Merge pull request #28488 from liuchang0812/show-pool-id-in-pool-ls-cmd
mon: show pool id in pool ls command

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-23 13:13:32 +08:00
Kefu Chai
0bbaa185a5
Merge pull request #24636 from rzarzynski/wip-denc-container_base
denc: slightly optimize container_base::bound_encode

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-23 13:12:04 +08:00
Kefu Chai
bfce110511
Merge pull request #29756 from Aran85/fix-repair-object
osd: clear PG_STATE_CLEAN when repair object

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-08-23 13:08:49 +08:00
Changcheng Liu
ae6759aa52 msg/async/rdma: rename variable to improve readability
Device::binding_port
1. port_id is more meaningful compared to i as variable name.
2. start port_id from 1 instead of 0.

PoolAllocator::malloc
1. make clear relationship among buffer/chunk/block/memory_region with new
variable name.
2. define the variable when it's first being used.

RDMAConnectedSocketImpl::submit
1. use "wait_copy_len" to replace "need_reserve_bytes" which stands for the memory
that is waiting to be copied into chunk.
2. use "copy_start" to replace "copy_it" which stands for the start iterator to be copied.
3. use "total_copied" to replace "total" which stands for the memory that has been copied.

allocate huge page
1. use "HUGE_PAGE_SIZE_2MB" for 2MB page alignment.
2. use "ALIGN_TO_PAGE_2MB" to align the request size to 2MB.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
a62ce898f3 msg/async/rdma: make clear to get mem_info address
The parameter "block" points to the mem_info::chunks space. The purpose of
"reinterpret_cast<mem_info *>(block) - 1;" is not very clear.
Instead, take the mem_info::chunks address and subtract the member's offset
from the start of the struct to get the mem_info address.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
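
The address computation described above is the familiar "container of" idiom. A sketch under assumptions: `MemInfo` and its fields are hypothetical stand-ins for the real mem_info layout, which is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the allocator's bookkeeping header. The chunk
// storage begins at the `chunks` member.
struct MemInfo {
  std::uint32_t nbufs;
  void* ctx;
  char chunks[1];
};

// Recover the enclosing MemInfo from a pointer to its `chunks` member by
// subtracting the member's offset from the start of the struct.
MemInfo* mem_info_from_chunks(char* block) {
  return reinterpret_cast<MemInfo*>(block - offsetof(MemInfo, chunks));
}
```

Unlike casting the block pointer and subtracting one whole struct, this form states directly that the header address is derived from the position of the `chunks` member.
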
Changcheng Liu
32da5f1d03 msg/async/rdma: use different strategy to reset read/write chunk
When releasing read chunk to pool, the chunk::offset & chunk::bound
should be reset to zero. For write chunk, it's better to reset
chunk::offset to zero and chunk::bound to chunk length which means that
[offset, bound) is writable.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
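
The two reset strategies can be sketched as below. The struct and function names are hypothetical; only the offset/bound semantics come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a chunk: [offset, bound) is the active window
// inside a buffer of `capacity` bytes.
struct ChunkWindow {
  std::uint32_t offset;
  std::uint32_t bound;
  std::uint32_t capacity;
};

// A read chunk returned to the pool holds no data yet.
void reset_read_chunk(ChunkWindow& c) {
  c.offset = 0;
  c.bound = 0;
}

// A write chunk returned to the pool is writable over its whole length.
void reset_write_chunk(ChunkWindow& c) {
  c.offset = 0;
  c.bound = c.capacity;
}

// Size of the active window in bytes.
std::uint32_t window_bytes(const ChunkWindow& c) {
  return c.bound - c.offset;
}
```
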
Changcheng Liu
60a87c9db9 msg/async/rdma: cosmetics initialize ibv_send_wr* var
API usage:
int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr)
Input Parameters:
   qp struct ibv_qp from ibv_create_qp
   wr first work request (WR)
Output Parameters:
   bad_wr pointer to first rejected WR
Return Value:
   0 on success, -1 on error.
   If the call fails, errno will be set to indicate the reason for the failure.
To avoid checking an uninitialized value after the call, it's better to
initialize the ibv_send_wr* variable to nullptr.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
7078506107 msg/async/rdma: cosmetics RDMAWorker listen & connect & get_reged_mem
1. There's no need to get stack & dispatcher from RDMAStack again
since RDMAWorker has stored the value.
2. cache the Infiniband object to be used in local scope.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
5e53dad5ad msg/async/rdma: cosmetics RDMAConnectedSocketImpl::read_buffers
After refactoring, the check below is no longer needed:
    -  if (c != buffers.end() && (*c)->over())
    -    ++c;

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
282499b77f msg/async/rdma: cosmetics post_chunks_to_rq implementation
1. It's not proper to allocate a large amount of space on the stack, e.g. rx_queue_len is 4096.
The patch changes to allocate rx_work_request and isge on the heap.

2. Zero the whole rx_work_request and isge arrays at once, which avoids
zeroing the entries one by one in the while loop.

3. Change parameter name "num" to "rq_wr_num" to improve readability;
rq_wr_num stands for receive-queue work-request number.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:26 +08:00
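
Points 1 and 2 above (heap allocation and zeroing the arrays in one step) can be approximated with std::vector, whose sized constructor value-initializes every element. This is a sketch with hypothetical struct names; the actual patch may allocate differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the receive work request and scatter/gather
// entry types.
struct WorkRequest {
  std::uint64_t wr_id;
  WorkRequest* next;
  int num_sge;
};

struct ScatterEntry {
  std::uint64_t addr;
  std::uint32_t length;
  std::uint32_t lkey;
};

// Allocate rq_wr_num requests on the heap. vector(n) value-initializes its
// elements, so every field starts at zero without a per-element loop.
std::vector<WorkRequest> make_zeroed_requests(std::size_t rq_wr_num) {
  return std::vector<WorkRequest>(rq_wr_num);
}

// Same approach for the scatter/gather entries.
std::vector<ScatterEntry> make_zeroed_entries(std::size_t rq_wr_num) {
  return std::vector<ScatterEntry>(rq_wr_num);
}
```
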
Changcheng Liu
6a0d3df90a msg/async/rdma: refine Chunk construction function
1. all values are initialized in construction function
   In this way, it's easy to construct Chunk object in
   PoolAllocator::malloc function.
2. For read chunk, member bound is initialized to be 0.
3. For send chunk, member bound is initialized to be full space size.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:46:16 +08:00
Changcheng Liu
6823d2d8cd msg/async/rdma: avoid long lambda function for readability
Extract the long lambda function to improve readability.
The lambda offers no advantage, since the "this" pointer is also needed
in the original lambda function.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:46:16 +08:00
Changcheng Liu
060c5c8e3a msg/async/rdma: define handle_rx_event to handle recv-comple-queue
1. define handle_rx_event to let dispatch handle the
receive-completion-queue
2. simplify RDMADispatcher::polling implementation

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
1c76c13207 msg/async/rdma: deal with all RDMA device async event
1. List all asynchronous event of the RDMA device
2. Output the fatal error events to check RDMA device status

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
0b31f416fa msg/async/Event: simplify EventCenter::process_events implementation
The original implementation makes it hard to understand:
1) Whether timer event should be executed.
2) How long should epoll wait for timeout.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
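
The two questions named in this commit message (should a timer event run, and how long should the poll wait) can be separated into one small decision function. The sketch below is an illustration only; `WaitPlan` and `plan_wait` are hypothetical names and this is not the logic of EventCenter::process_events.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// Result of the decision: how long to block in the poll call, and whether
// a timer is expected to be due once that wait ends.
struct WaitPlan {
  Micros wait;
  bool timer_due_after_wait;
};

// Hypothetical planner. With no pending timer, wait the full max_wait.
// With a timer, never sleep past its deadline.
WaitPlan plan_wait(Clock::time_point now, bool has_timer,
                   Clock::time_point next_timer, Micros max_wait) {
  if (!has_timer) {
    return WaitPlan{max_wait, false};
  }
  if (next_timer <= now) {
    return WaitPlan{Micros(0), true};  // already due: do not block
  }
  const Micros until_timer =
      std::chrono::duration_cast<Micros>(next_timer - now);
  if (until_timer <= max_wait) {
    return WaitPlan{until_timer, true};
  }
  return WaitPlan{max_wait, false};
}
```
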
Changcheng Liu
c7f87c3ff0 msg/async/Event: simplify logical implementation
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00