Commit Graph

101143 Commits

Author SHA1 Message Date
Changcheng Liu
c86e927888 msg/async/rdma: show port state with string
Showing the port state as a string is easier to read than the raw
numeric value.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:36:05 +08:00
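The idea in c86e927888 can be sketched roughly as follows. This is only an illustration: `PortState` and `port_state_name` are hypothetical stand-ins rather than the code from the commit, and the enumerator values are assumed to mirror `enum ibv_port_state`. libibverbs also ships its own helper, `ibv_port_state_str()`, which real code would more likely use.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for enum ibv_port_state (values assumed to match).
enum class PortState { Nop = 0, Down = 1, Init = 2, Armed = 3, Active = 4, ActiveDefer = 5 };

// Map a port state to a human-readable name for log output.
inline const char* port_state_name(PortState s) {
  switch (s) {
    case PortState::Nop:         return "NOP";
    case PortState::Down:        return "DOWN";
    case PortState::Init:        return "INIT";
    case PortState::Armed:       return "ARMED";
    case PortState::Active:      return "ACTIVE";
    case PortState::ActiveDefer: return "ACTIVE_DEFER";
  }
  return "UNKNOWN";  // value outside the known range
}
```

A log line can then print `port_state_name(state)` next to the numeric value instead of the number alone.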
Changcheng Liu
b61a48c197 msg/async/rdma: convert port_id from type uint8_t to int for output
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 14:35:52 +08:00
Changcheng Liu
ae6759aa52 msg/async/rdma: rename variable to improve readability
Device::binding_port
1. port_id is more meaningful compared to i as variable name.
2. start port_id from 1 instead of 0.

PoolAllocator::malloc
1. clarify the relationship among buffer/chunk/block/memory_region with new
variable names.
2. define each variable where it is first used.

RDMAConnectedSocketImpl::submit
1. use "wait_copy_len" instead of "need_reserve_bytes"; it stands for the memory
that is waiting to be copied into the chunk.
2. use "copy_start" instead of "copy_it"; it stands for the start iterator to be copied.
3. use "total_copied" instead of "total"; it stands for the memory that has been copied.

allocate huge page
1. use "HUGE_PAGE_SIZE_2MB" for 2MB page alignment.
2. use "ALIGN_TO_PAGE_2MB" to align the request size to 2MB.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
a62ce898f3 msg/async/rdma: make clear to get mem_info address
The parameter "block" points to the mem_info::chunks space. The purpose
of "reinterpret_cast<mem_info *>(block) - 1;" is not clear. Instead, take
the mem_info::chunks address and subtract the member's offset from the
start of the struct to get the mem_info address.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
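The address arithmetic described in a62ce898f3 is the familiar "container of" pattern. A minimal sketch, assuming a simplified layout: `MemInfo`, `ChunkSlot` and `mem_info_of` are hypothetical names, not the real Ceph types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for Chunk and mem_info.
struct ChunkSlot { unsigned offset; unsigned bound; };

struct MemInfo {
  void*     mr;         // memory region handle (placeholder)
  unsigned  nbufs;      // number of chunks that follow
  ChunkSlot chunks[1];  // first element of the chunk storage
};

// Recover the enclosing MemInfo from a pointer to its `chunks` member by
// subtracting the member's offset from the start of the struct.
inline MemInfo* mem_info_of(void* block) {
  char* base = static_cast<char*>(block) - offsetof(MemInfo, chunks);
  return reinterpret_cast<MemInfo*>(base);
}
```

Compared with `reinterpret_cast<MemInfo*>(block) - 1`, this makes the dependence on the position of `chunks` explicit and does not rely on `chunks` starting exactly `sizeof(MemInfo)` bytes after the header.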
Changcheng Liu
32da5f1d03 msg/async/rdma: use different strategy to reset read/write chunk
When releasing a read chunk to the pool, chunk::offset and chunk::bound
should be reset to zero. For a write chunk, it's better to reset
chunk::offset to zero and chunk::bound to the chunk length, which means
that [offset, bound) is writable.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
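The two reset strategies from 32da5f1d03 might look roughly like this. The struct below is a simplified, hypothetical stand-in for the real Chunk class, and the method names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified chunk: `bytes` is the total capacity.
struct PoolChunk {
  std::uint32_t bytes  = 0;
  std::uint32_t offset = 0;
  std::uint32_t bound  = 0;

  // Read chunk: no valid data after release, so [offset, bound) is empty.
  void reset_read_chunk() { offset = 0; bound = 0; }

  // Write chunk: the whole buffer is writable, so [offset, bound) spans it.
  void reset_write_chunk() { offset = 0; bound = bytes; }
};
```

With this convention, `bound - offset` is the readable size of a read chunk and the writable room of a write chunk.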
Changcheng Liu
60a87c9db9 msg/async/rdma: cosmetics initialize ibv_send_wr* var
API usage:
int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr)
Input Parameters:
   qp struct ibv_qp from ibv_create_qp
   wr first work request (WR)
Output Parameters:
   bad_wr pointer to first rejected WR
Return Value:
   0 on success, -1 on error.
   If the call fails, errno will be set to indicate the reason for the failure.
To avoid checking an uninitialized bad_wr after the call, it's better to
initialize the pointer to nullptr.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
7078506107 msg/async/rdma: cosmetics RDMAWorker listen & connect & get_reged_mem
1. There's no need to get stack & dispatcher from RDMAStack again
since RDMAWorker has stored the value.
2. cache the Infiniband object to be used in local scope.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
5e53dad5ad msg/async/rdma: cosmetics RDMAConnectedSocketImpl::read_buffers
After refactoring, the check below is no longer needed:
    -  if (c != buffers.end() && (*c)->over())
    -    ++c;

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:55 +08:00
Changcheng Liu
282499b77f msg/async/rdma: cosmetics post_chunks_to_rq implementation
1. It's not appropriate to allocate a large space on the stack, e.g. when
rx_queue_len is 4096. The patch allocates rx_work_request and isge on the heap.

2. Zero the whole rx_work_request and isge arrays at once, which avoids
zeroing the elements one by one in the while loop.

3. Rename the parameter "num" to "rq_wr_num" (i.e.
receive-queue_work-request_number) to improve readability.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 11:35:26 +08:00
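One way to get the heap allocation and bulk zeroing described in 282499b77f is to let `std::vector` value-initialize the arrays. This is a sketch only: `RecvSge`, `RecvWr`, `RecvBatch` and `make_recv_batch` are hypothetical stand-ins for `ibv_sge`, `ibv_recv_wr` and the real function, and the actual patch may well use a different allocation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced versions of ibv_sge and ibv_recv_wr.
struct RecvSge { std::uint64_t addr; std::uint32_t length; std::uint32_t lkey; };
struct RecvWr  { std::uint64_t wr_id; RecvWr* next; RecvSge* sg_list; int num_sge; };

struct RecvBatch {
  std::vector<RecvWr>  wrs;
  std::vector<RecvSge> sges;
};

// Allocate rq_wr_num work requests on the heap, zeroed in one step, and
// link them into the singly linked list a post-receive call expects.
inline RecvBatch make_recv_batch(std::size_t rq_wr_num) {
  RecvBatch batch;
  batch.wrs.resize(rq_wr_num);   // value-initialized, i.e. all fields zero
  batch.sges.resize(rq_wr_num);  // likewise
  for (std::size_t i = 0; i < rq_wr_num; ++i) {
    batch.wrs[i].sg_list = &batch.sges[i];
    batch.wrs[i].num_sge = 1;
    batch.wrs[i].next = (i + 1 < rq_wr_num) ? &batch.wrs[i + 1] : nullptr;
  }
  return batch;
}
```

The per-chunk fields (`addr`, `length`, `lkey`, `wr_id`) would still be filled in inside the loop; only the zeroing moves out of it.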
Changcheng Liu
6a0d3df90a msg/async/rdma: refine Chunk construction function
1. all values are initialized in the constructor.
   In this way, it's easy to construct a Chunk object in the
   PoolAllocator::malloc function.
2. For a read chunk, member bound is initialized to 0.
3. For a send chunk, member bound is initialized to the full space size.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:46:16 +08:00
Changcheng Liu
6823d2d8cd msg/async/rdma: avoid long lambda function for readability
Extract the long lambda function to improve readability.
The lambda offers no advantage, since the "this" pointer is also
needed in the original lambda function.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:46:16 +08:00
Changcheng Liu
060c5c8e3a msg/async/rdma: define handle_rx_event to handle recv-comple-queue
1. define handle_rx_event to let the dispatcher handle the
receive-completion-queue
2. simplify the RDMADispatcher::polling implementation

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
1c76c13207 msg/async/rdma: deal with all RDMA device async event
1. List all asynchronous events of the RDMA device
2. Output the fatal error events to check the RDMA device status

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
0b31f416fa msg/async/Event: simplify EventCenter::process_events implementation
The original implementation makes it hard to understand:
1) Whether timer event should be executed.
2) How long should epoll wait for timeout.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
c7f87c3ff0 msg/async/Event: simplfy logical implementation
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
5ec51a31dd msg/async/rdma: simplify RDMAConnectedSocketImpl::read implementation
After reading one chunk, the chunk can be pushed into the buffer list if its
effective content size is not zero. In this case, it also means that the
caller has got the required read length. Then all the following chunks will
be pushed into the buffer list, since their effective content size is not zero.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
de471d003b msg/async/rdma: simplify Cluster::get_buffers implementation
Keep same logic:
1. If the parameter block_size is zero, then allocate all the free chunks
to the parameter std::vector<Chunk*> &chunks, i.e.
   chunk_buffer_number = free_chunks.size()
2. If the parameter block_size is not zero, then allocate the requested number
of chunks, or all the free chunks, to the parameter std::vector<Chunk*> &chunks.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
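The two rules kept by de471d003b can be expressed as a small helper. This is a sketch under assumptions: `chunks_to_take` is a hypothetical name, and it assumes block_size is given in bytes and rounded up to whole chunks, which the commit message does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// How many chunks to hand out for a request of block_size bytes.
// Precondition (assumed): chunk_size > 0.
inline std::size_t chunks_to_take(std::size_t block_size,
                                  std::size_t chunk_size,
                                  std::size_t free_chunks) {
  if (block_size == 0) {
    return free_chunks;  // rule 1: take every free chunk
  }
  // rule 2: round the request up to whole chunks, capped by what is free
  std::size_t wanted = (block_size + chunk_size - 1) / chunk_size;
  return std::min(wanted, free_chunks);
}
```

For example, with 4096-byte chunks and 10 free chunks, a request of 8193 bytes would take 3 chunks, and a request of 1 MiB would take all 10.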
Changcheng Liu
c946349871 msg/async/rdma: simplify chunk::write implementation
Keep same logic to improve readability

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
947520c336 msg/async/rdma: simplify chunk::read implementation
1. offload chunk::read without managing bound.
2. reset chunk::offset & chunk::bound before releasing to pool.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
f1668da1ce msg/async/rdma: use Chunk::get_size to get chunk size
remove Chunk::over interface and add Chunk::get_size interface
1) It's not clear when reading "over" function name.
2) Some places need to know the current chunk block's effective content size.
3) "Chunk::over()" could be replaced by "Chunk::get_size() == 0"

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
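The replacement of `over()` by `get_size()` in f1668da1ce can be pictured as below. The struct is a hypothetical reduction of Chunk, and it assumes the effective content is the range [offset, bound), which the commit message implies but does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced chunk holding only the read window.
struct RxChunk {
  std::uint32_t offset = 0;
  std::uint32_t bound  = 0;

  // Effective content size: bytes still available in [offset, bound).
  std::uint32_t get_size() const { return bound - offset; }
};

// Equivalent of the removed over(): nothing left to read.
inline bool is_drained(const RxChunk& c) { return c.get_size() == 0; }
```

Callers that only asked "is it over?" compare against zero, while callers that need the remaining length get it from the same accessor.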
Changcheng Liu
2d4890580f msg/async/rdma: seperate Device construction if rdma_cm is used
If ms_async_rdma_cm is false, there's no need to call the api
rdma_get_device. If rdma_get_device is called, the devices remain
opened while librdmacm is loaded. This is not what we want when
ms_async_rdma_cm is false.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:22 +08:00
Changcheng Liu
0e3db04d04 msg/async/rdma: operate event fd with event_{read,write}
1. use wrapper function event_read & event_write to access
event file descriptor.
2. change event fd access value name to be event_val.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
49b8ef0746 msg/async/rdma: fix error argument to get right qp state
1. It's wrong to use "-1" as argument to query queue state.
In rdma library, ibv_query_qp will call ibv_cmd_query_qp to query
queue state. If "-1" is used as attr_mask, ibv_cmd_query_qp will
return error EOPNOTSUPP which means query failed.

2. In class QueuePair, is_error() could use member function get_state()
to get the queue pair state.

3. It's better to use qp_state as queue pair state according to
ibv_query_qp manual guide.
   struct ibv_qp_attr {
      enum ibv_qp_state       qp_state;            /* Current QP state */
      enum ibv_qp_state       cur_qp_state;        /* Current QP state - irrelevant for ibv_query_qp */
      ...

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
b2d3f5e097 msg/async/rdma: export RDMAV_HUGEPAGES_SAFE before ibv_fork_init
In rdma-core library, ibv_fork_init will check environment variable
RDMAV_HUGEPAGES_SAFE to decide whether huge page is usable in system.
It doesn't make sense to export RDMAV_HUGEPAGES_SAFE env after
calling ibv_fork_init.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
4810e40d44 msg/async/rdma: use ibv_port_attr object type in Port class
1. Avoid manual memory management by not using a pointer to operate
on the allocated space. Otherwise, it could leak memory.
2. Since the member type has been changed in class Device, the
member access operator "." is needed to access the sub-members of the
object.
3. There's no need to consider the experimental API of ibv_query_port.
So, merge ibv_query_port in the prolog.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
b4596011f5 msg/async/rdma: cosmetics by set member value in initialize list
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
cdfcc6b59c msg/async/rdma: define package sequence numbers macro
Refer to Doc: InfiniBand(TM) Architecture Specification Volume 1 Ver1.2.1
Section: 9.2 BASE TRANSPORT HEADER

bits  |31---------24 | 23-----------16 | 15----------8 | 7---------0 |
bytes |______________________________________________________________|
0 - 3 |____OpCode____|__|SE|M|Pad|Tver_|_________ Partition Key______|
4 - 7 |___Reserved___|______________Destination QP___________________|
8 -11 |A|Reserved 7__|________ PSN - Packet Sequence Number _________|

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
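The header layout quoted in cdfcc6b59c shows that the PSN occupies the low 24 bits of bytes 8 to 11. A macro or constant for that field might look like the sketch below; the names `kPsnBits`, `kPsnMask` and `to_psn` are hypothetical and not taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// The Packet Sequence Number field is 24 bits wide.
constexpr unsigned      kPsnBits = 24;
constexpr std::uint32_t kPsnMask = (1u << kPsnBits) - 1;  // 0x00FFFFFF

// Reduce an arbitrary 32-bit value (e.g. a random number) to a valid PSN.
inline std::uint32_t to_psn(std::uint32_t raw) { return raw & kPsnMask; }
```

Masking a randomly generated initial sequence number this way keeps it inside the range the header can carry.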
Changcheng Liu
4f5a31ab9d msg/async/rdma: limit buffer size under rdma max memory region size
The allocated buffer size should be under the hardware's max_mr_size.
Otherwise, it'll trigger an out-of-bounds access problem when calling ibv_reg_mr.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
9c7ba67cc7 msg/async/rdma: check device_attr->max_srq is not zero
Some rdma devices don't support srq(shared receive queue).
Check hardware attribute if ceph is configured to use srq.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Changcheng Liu
e59a764dd1 msg/async/rdma: check memory region size before tx buffer allocation
It'll trigger an out-of-bounds access problem in the kernel if the required
memory region size is bigger than ibv_device_attr.max_mr_size

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
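The size check described in e59a764dd1 (and in 4f5a31ab9d above) can be sketched as a guard that runs before the buffer is allocated and registered. `fits_in_memory_region` and its parameters are hypothetical; in real code the limit would come from `ibv_device_attr.max_mr_size`.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if chunk_count chunks of chunk_size bytes fit in one
// memory region of at most max_mr_size bytes. Written with a division
// so that the product cannot overflow.
inline bool fits_in_memory_region(std::uint64_t chunk_size,
                                  std::uint64_t chunk_count,
                                  std::uint64_t max_mr_size) {
  if (chunk_size == 0 || chunk_count == 0) {
    return true;  // nothing to register
  }
  return chunk_count <= max_mr_size / chunk_size;
}
```

A caller could reduce the chunk count or refuse to start when the check fails, instead of passing an oversized buffer to the registration call.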
Changcheng Liu
175d52ac29 msg/async/rdma: correct receive queue length info
Without this patch, the following misleading log message is emitted:
   Infiniband init requested receive queue length 4095 is too big. Setting 4095

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-08-23 10:45:05 +08:00
Sage Weil
f61b0a21d6 Merge PR #29806 into master
* refs/pull/29806/head:
	mgr/BaseMgrModule: tolerate Int or Long for health 'count'

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-22 14:07:49 -05:00
Ali Maredia
79eb107a14 Merge pull request #29298 from zhangsw/rgw-fix-bug-listobjv2-startafter
rgw: continuationToken or startAfter shouldn't be returned if not specified
2019-08-22 14:12:30 -04:00
Sage Weil
d850edf0f9 Merge PR #29780 into master
* refs/pull/29780/head:
	osd/PeeringState: semi-colon after DECLARE_LOCALS
	osd/PeeringState: on_new_interval on child PG after split

Reviewed-by: Samuel Just <sjust@redhat.com>
2019-08-22 12:52:16 -05:00
Sage Weil
2dca76ac84 Merge PR #29774 into master
* refs/pull/29774/head:
	qa/standalone/scrub/osd-scrub-snaps: snapmapper omap is now 'm'

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-08-22 12:27:26 -05:00
Sage Weil
b7134f4ea9 Merge PR #29807 into master
* refs/pull/29807/head:
	mgr/pg_autoscaler: fix race with pool deletion

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-08-22 12:26:11 -05:00
Casey Bodley
e131d6f496 Merge pull request #17719 from mikulely/fix-usage-stats
rgw: distinguish different get_usage for usage log

Reviewed-by: Robin H. Johnson <rjohnson@digitalocean.com>
2019-08-22 11:04:38 -04:00
Kefu Chai
aab5c451e1 Merge pull request #29544 from tchaikov/wip-doc-search-CSP
doc: always load resources via HTTPS

Reviewed-by: Tiago Melo <tmelo@suse.com>
2019-08-22 17:56:58 +08:00
Kefu Chai
eb247c943a doc: always load resources via HTTPS
Signed-off-by: Tiago Melo <tmelo@suse.com>
2019-08-22 16:19:17 +08:00
Xie Xingguo
ea216e52f6 Merge pull request #29755 from xiexingguo/wip-inc-recovery-4
osd: do not invalidate clear_regions of missing item at boot

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-08-22 13:48:15 +08:00
Sage Weil
69a712ad4b mgr/BaseMgrModule: tolerate Int or Long for health 'count'
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-21 17:06:32 -05:00
Andrew Schoen
f0e2c59e8c Merge pull request #29804 from alfredodeza/wip-rm41378
ceph-volume tests set the noninteractive flag for Debian

Reviewed-by: Andrew Schoen <aschoen@redhat.com>
2019-08-21 16:59:18 -05:00
Sage Weil
4e7f6b4088 Merge PR #29744 into master
* refs/pull/29744/head:
	qa/run-standalone.sh: fix python path
	qa/standalone/mon/health-mute.sh: fix up rachet test
	qa/standalone/mon/health-mute.sh: s/kill daemons/kill_daemons/

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-08-21 15:02:27 -05:00
Sage Weil
e6b0f2ab0a Merge PR #29749 into master
* refs/pull/29749/head:
	mon/HealthMonitor: remove unused label

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-21 15:02:14 -05:00
Sage Weil
d405167fa1 Merge PR #29757 into master
* refs/pull/29757/head:
	osd: always initialize local variable

Reviewed-by: Sage Weil <sage@redhat.com>
2019-08-21 15:02:01 -05:00
Sage Weil
617cdb619e Merge PR #29763 into master
* refs/pull/29763/head:
	qa/suites/rados: whitelist POOL_APP_NOT_ENABLED warning

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-08-21 15:01:50 -05:00
Sage Weil
9d45bd9cc9 mgr/pg_autoscaler: fix race with pool deletion
The pool_stats map comes from a get('df') that may not include a pool
because it was just deleted.

Fixes: https://tracker.ceph.com/issues/41386
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-21 14:56:43 -05:00
Alfredo Deza
89231c9a60 ceph-volume tests set the noninteractive flag for Debian, to avoid prompts in apt
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2019-08-21 14:15:32 -04:00
Patrick Donnelly
dad94db7ae Merge PR #28378 into master
* refs/pull/28378/head:
	qa/tasks: introduce Thrasher base class
	qa/tasks: Fix typo
	qa/tasks: manage thrashers
	qa/tasks: start DaemonWatchdog when ceph starts
	qa/tasks: make watch and bark handle more daemons
	qa/tasks: move DaemonWatchdog to new file

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-21 10:57:15 -07:00
Mykola Golub
e867804fe7 Merge pull request #29773 from dillaman/wip-41352
pybind/mgr/rbd_support: fix missing variable in error path

Reviewed-by: Mykola Golub <mgolub@suse.com>
2019-08-21 17:26:01 +03:00