In general we build the missing set (and hence clean_regions)
based on the pg log. However, there are currently still 5 cases in which
we might call missing.add to add a new pg_missing_item to the missing set
explicitly (or replace an existing pg_missing_item entirely):
1. we explicitly build the missing set on startup, in which case
we know we are trying to be compatible with pre-kraken versions,
so it should be ok for us to disable inc-recovery.
2. we are currently processing the authoritative log, and some
divergent objects are detected. For simplicity (and correctness),
we should disable inc-recovery entirely for these objects.
3. we are re-building missing set, e.g., due to the global
CEPH_OSDMAP_RECOVERY_DELETES policy changing.
In this case we know we are at the end of upgrading from a
previous version that lacks CEPH_OSDMAP_RECOVERY_DELETES support.
Hence the recommended option is to disable inc-recovery as well,
since these objects should lack inc-recovery support
too.
4. we are adding or re-adding missing object into primary's missing_loc.
It doesn't matter whether we have a correct clean_regions there
since we never actually refer to that field from missing_loc
when we actually start to perform object recovery later.
5. we are auto-repairing a corrupted object and hence need to
add it to the corresponding missing set first, e.g., by leveraging
the existing recovery procedure. In this case, we always disable
inc-recovery to make sure this object can be fully (and correctly)
recovered later.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/28855/head:
doc: document scrub summary in ceph status output
test: extend scrub control test to validate mds task status
mds: send scrub state changes to cluster log.
mds: periodically sent mds scrub status to ceph manager
mgr, mon: allow normal ceph services to register with manager
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/29167/head:
client: return -eio when sync file which unsafe reqs has been dropped
Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/28727/head:
test/crimson: resolve name collision
test: switch to ldout; let users specify mon debug level
test: add new ElectionLogic unit test framework
elector: const-ify a bunch of functions
elector: swap order of parameters in ElectionLogic::receive_propose
elector: Update Elector and ElectionLogic function documentation
elector: persist the epoch in bump_epoch()
elector: make some more ElectionLogic members private
elector: fix privacy and restore dout in Elector
elector: don't clear peer_info in bump_epoch()
elector: split ElectionLogic into its own compilation unit
elector: move all the elector callouts into the Elector
elector: make ElectionLogic private to Elector; undo most public shenanigans
elector: create declare_standlone_victory in Elector/Logic for Monitor
elector: make ElectionLogic::declare_victory private
elector: route _bump_epoch through the interface-to-be
elector: rename handle_propose_logic -> receive_propose
elector: hoist handle_victory into ElectionLogic
elector: hoist handle_ack into ElectionLogic
elector: hoist victory into ElectionLogic
elector: hoist expire into ElectionLogic
elector: hoist start into ElectionLogic
elector: hoist participating into ElectionLogic
elector: hoist init into ElectionLogic
elector: hoist defer into ElectionLogic
elector: split handle_propose in two and hoist into ElectionLogic
elector: hoist bump_epoch into ElectionLogic
elector: store accessors for ElectionLogic
elector: hoist Elector data bits out into a new ElectionLogic class
mon: Rearrange Paxos::dispatch to be a little cleaner
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
1. rename var i to be worker_id when creating Worker
"i" is assigned to Worker::id, i.e. it is the worker's id
2. rename EventCenter::idx to EventCenter::center_id
"idx" is EventCenter's index in global_centers obj.
rename it to be center_id.
3. rename EventCenter::init API's parameter n to be nevent
"n" is actually assigned to EventCenter::nevent. rename it
to be "nevent".
4. rename EventCenter::init API's parameter t to be type
"t" is corresponding to Epoll Driver's implementation's type.
5. rename EpollDriver::size to be EpollDriver::nevent
"size" is actually epoll events number, rename it to be "nevent"
6. use event_id as index name to get event instead of "j"
7. rename "nw" to be "nowait"
8. Processor::start unify variable name with Processor::accept & Processor::stop
==> auto &l to be auto &listen_socket
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
There's no need to cache stack since RDMAWorker already has
Infiniband obj ib & RDMADispatcher obj dispatcher.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. Don't use bare pointer to manage RDMADispatcher obj.
2. access RDMADispatcher obj directly instead of accessing it
from RDMAStack. This could avoid caching RDMAStack obj in
RDMAWorker & RDMADispatcher.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. Don't use bare pointer to manage Infiniband obj.
2. access Infiniband obj directly instead of accessing it from
RDMAStack. This could avoid caching RDMAStack obj in RDMAWorker
& RDMADispatcher.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
The original RDMAConnectedSocketImpl::read reads data from buffers and
prefetches data into buffers for the next round of reading. This makes the
logic a little complex and the code hard to read.
In this patch:
1) RDMAConnectedSocketImpl::buffer_prefetch private API is added to
prefetch data into buffers at the head of read_buffers.
2) Remove one call to notify() to reduce context switches.
There's no need to notify the upper layer to read data since the current
read operation hasn't finished yet.
3) Simplify RDMAConnectedSocketImpl::read implementation.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. The following three bits are meaningless in the pollfd::events field:
POLLERR, POLLHUP, and POLLNVAL.
2. QueuePair::pd is initialized in the initializer list.
There's no need to assign the same value to it again.
3. Remove the never used function Chunk::set_bound
4. Remove the never used function Chunk::set_offset
5. Remove the never used function QueuePair::is_error
6. Remove SimplePolicyMessenger used vars
7. remove socket_fd() interface since it's never used.
All data write/read is based on ConnectedSocketImpl::fd.
So, there's no need to expose socket_fd since it's never used.
8. Remove RDMAServerSocketImpl::get_fd which is not used.
BTW, RDMAServerSocketImpl::fd has the same function as get_fd.
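Item 1 above can be illustrated with a minimal sketch using plain POSIX poll(): POLLERR, POLLHUP and POLLNVAL are only ever reported in revents, so requesting them in events has no effect. The helper name below is hypothetical and is not part of the patch; the sketch only requests POLLIN and still observes POLLHUP once the writer side of a pipe is closed.

```cpp
#include <poll.h>
#include <unistd.h>

// Hypothetical helper (not from the patch): poll the read end of a pipe
// whose write end has been closed, requesting only POLLIN.
// Returns the revents reported by poll(), or 0 on failure.
inline short poll_read_end_after_writer_close() {
  int fds[2];
  if (pipe(fds) != 0) {
    return 0;
  }
  close(fds[1]);          // closing the writer hangs up the pipe

  pollfd pfd{};
  pfd.fd = fds[0];
  pfd.events = POLLIN;    // POLLHUP is deliberately NOT requested here

  int r = poll(&pfd, 1, 0);
  close(fds[0]);
  return r == 1 ? pfd.revents : 0;
}
```

The returned revents carries POLLHUP even though it was never set in events, which is why setting these bits in events is redundant.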
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
os/bluestore: more aggressive deferred submit when onode trim skipping
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Device::binding_port
1. port_id is more meaningful compared to i as variable name.
2. start port_id from 1 instead of 0.
PoolAllocator::malloc
1. make clear relationship among buffer/chunk/block/memory_region with new
variable name.
2. define the variable when it's first being used.
RDMAConnectedSocketImpl::submit
1. use "wait_copy_len" to replace "need_reserve_bytes" which stands for the memory
that is waiting to be copied into chunk.
2. use "copy_start" to replace "copy_it" which stands for the start iterator to be copied.
3. use "total_copied" to replace "total" which stands for the memory that has been copied.
allocate huge page
1. use "HUGE_PAGE_SIZE_2MB" for 2MB page alignment.
2. use "ALIGN_TO_PAGE_2MB" to align the request size to 2MB.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
The parameter "block" points to the mem_info::chunks space. The purpose of
"reinterpret_cast<mem_info *>(block) - 1;" is not obvious.
Instead, take the mem_info::chunks address and subtract the member's offset
from the start of the struct to get the mem_info address.
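A minimal sketch of this offset-based recovery follows. The struct layouts and the helper name are hypothetical stand-ins (the real mem_info and Chunk definitions may differ); only the offsetof arithmetic is the point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the allocator's types; the real layouts
// in the RDMA messenger may differ.
struct Chunk {
  char data[64];
};

struct mem_info {
  void *ctx;         // placeholder bookkeeping field
  unsigned nbufs;    // placeholder: number of chunks in this block
  Chunk chunks[1];   // chunk storage starts here
};

// Recover the owning mem_info from a pointer to its chunks member by
// subtracting the member's offset from the start of the struct.
inline mem_info *mem_info_from_chunks(char *block) {
  assert(block != nullptr);
  return reinterpret_cast<mem_info *>(block - offsetof(mem_info, chunks));
}
```

Unlike "pointer minus one", this form does not rely on chunks starting exactly sizeof(mem_info) bytes after the start of the struct, so it stays correct if fields or padding change.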
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
When releasing read chunk to pool, the chunk::offset & chunk::bound
should be reset to zero. For write chunk, it's better to reset
chunk::offset to zero and chunk::bound to chunk length which means that
[offset, bound) is writable.
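The two reset rules can be sketched as below. SketchChunk and its member functions are hypothetical names for illustration and do not mirror the real Chunk class; only the offset/bound semantics come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified chunk; only offset/bound handling is modelled.
struct SketchChunk {
  uint32_t bytes;        // capacity of the chunk buffer
  uint32_t offset = 0;
  uint32_t bound = 0;

  explicit SketchChunk(uint32_t len) : bytes(len) {}

  // Read chunk returned to the pool: nothing is readable any more.
  void reset_read_chunk() {
    offset = 0;
    bound = 0;
  }

  // Write chunk returned to the pool: [offset, bound) is the writable
  // range, so the whole buffer becomes writable again.
  void reset_write_chunk() {
    offset = 0;
    bound = bytes;
  }

  uint32_t writable() const {
    assert(bound >= offset);
    return bound - offset;
  }
};
```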
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
API usage:
int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr)
Input Parameters:
qp struct ibv_qp from ibv_create_qp
wr first work request (WR)
Output Parameters:
bad_wr pointer to first rejected WR
Return Value:
0 on success, -1 on error.
If the call fails, errno will be set to indicate the reason for the failure.
To avoid misinterpreting the result, it's better to initialize
bad_wr to nullptr before the call.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. There's no need to get stack & dispatcher from RDMAStack again
since RDMAWorker has stored the value.
2. cache the Infiniband object to be used in local scope.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
After refactoring, there's no need for the check below
- if (c != buffers.end() && (*c)->over())
- ++c;
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. It's not appropriate to allocate large arrays on the stack, e.g. rx_queue_len is 4096.
The patch allocates rx_work_request and isge on the heap instead.
2. Zero the whole rx_work_request and isge arrays at once, which avoids
zeroing the entries one by one in the while loop.
3. Rename parameter "num" to "rq_wr_num" to improve readability.
rq_wr_num i.e. receive-queue_work-request_number
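One way to get heap allocation and whole-array zeroing together is sketched below with std::vector, whose resize() value-initializes (zeroes) every element. This is an assumption about a possible approach, not the patch's actual code; sge_t, recv_wr_t, RecvBatch and make_recv_batch are hypothetical stand-ins so the sketch compiles without libibverbs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for ibv_sge / ibv_recv_wr.
struct sge_t {
  uint64_t addr;
  uint32_t length;
  uint32_t lkey;
};

struct recv_wr_t {
  uint64_t wr_id;
  recv_wr_t *next;
  sge_t *sg_list;
  int num_sge;
};

struct RecvBatch {
  std::vector<recv_wr_t> rx_work_request;
  std::vector<sge_t> isge;
};

// Build rq_wr_num chained work requests on the heap.
inline RecvBatch make_recv_batch(std::size_t rq_wr_num) {
  RecvBatch b;
  b.rx_work_request.resize(rq_wr_num);  // value-initialized: all zero
  b.isge.resize(rq_wr_num);             // value-initialized: all zero
  for (std::size_t i = 0; i < rq_wr_num; ++i) {
    b.rx_work_request[i].sg_list = &b.isge[i];
    b.rx_work_request[i].num_sge = 1;
    if (i + 1 < rq_wr_num) {
      b.rx_work_request[i].next = &b.rx_work_request[i + 1];
    }
  }
  assert(rq_wr_num == 0 || b.rx_work_request.back().next == nullptr);
  return b;
}
```

Only the fields that differ from zero are written in the loop; everything else is already cleared by the allocation itself.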
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. All values are initialized in the constructor.
This makes it easy to construct a Chunk object in the
PoolAllocator::malloc function.
2. For a read chunk, member bound is initialized to 0.
3. For a send chunk, member bound is initialized to the full space size.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Extract the long lambda function to improve readability.
The lambda had no advantage since the "this" pointer is also needed
in the original lambda function.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
1. List all asynchronous event of the RDMA device
2. Output the fatal error events to check RDMA device status
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
The original implementation makes it hard to understand:
1) Whether a timer event should be executed.
2) How long epoll should wait before timing out.
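The two questions above can be separated as in the sketch below. It is a hypothetical illustration, not the patch's implementation: the timer container, the function names and the max_wait_ms cap are all assumptions.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

using sketch_clock = std::chrono::steady_clock;
// Hypothetical timer table: expiry time -> timer id.
using timer_map = std::multimap<sketch_clock::time_point, uint64_t>;

// Question 1: should a timer event be executed now?
inline bool timer_due(const timer_map &timers, sketch_clock::time_point now) {
  return !timers.empty() && timers.begin()->first <= now;
}

// Question 2: how long may epoll wait (in milliseconds)?
inline int epoll_timeout_ms(const timer_map &timers,
                            sketch_clock::time_point now,
                            int max_wait_ms) {
  if (timers.empty()) {
    return max_wait_ms;   // no timer pending: wait up to the cap
  }
  if (timer_due(timers, now)) {
    return 0;             // a timer is already due: do not block
  }
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                timers.begin()->first - now).count();
  return ms < max_wait_ms ? static_cast<int>(ms) : max_wait_ms;
}
```

Keeping the two decisions in separate functions makes each answer readable on its own.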
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>