The client enters the WAIT state when its reconnect is delayed, or when it
expects to be replaced by a newly established socket.
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
We cannot leave the pending messages in the out_q, because with the lossless
policy, they may have been partially sent or even acknowledged.
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
server_cookie, client_cookie, connect_seq and global_seq are identifiers
of a stateful connection.
We already have some related implementations, but they are stub code
written while implementing the lossy policy and do not work properly.
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
1. always take osd_scrub_sleep for manually initiated
scrubs
2. when scrub_time_permit() returns true for scheduled
ones, the existing osd_scrub_sleep is used
3. when scrub_time_permit() returns false for scheduled
ones, there are 2 scenarios
3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
let's take osd_scrub_sleep
3.2 otherwise, let's take osd_scrub_extended_sleep
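
The rules above can be sketched as a small decision function. This is only
an illustration of the selection logic, not the OSD code; the function name
and the boolean parameters are hypothetical, and only the two option names
come from this message.

```python
def scrub_sleep_time(manual: bool, time_permit: bool,
                     osd_scrub_sleep: float,
                     osd_scrub_extended_sleep: float) -> float:
    """Pick the sleep to apply between scrub chunks (illustrative only)."""
    # Rule 1: manually initiated scrubs always use osd_scrub_sleep.
    if manual:
        return osd_scrub_sleep
    # Rule 2: scheduled scrub inside the permitted time window.
    if time_permit:
        return osd_scrub_sleep
    # Rule 3: scheduled scrub outside the window. Taking the larger of the
    # two values covers both 3.1 and 3.2.
    return max(osd_scrub_sleep, osd_scrub_extended_sleep)
```

Using max() means a misconfigured osd_scrub_extended_sleep that is smaller
than osd_scrub_sleep can never make an out-of-window scrub more aggressive
than an in-window one.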
Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
* abort_in_fault(): a fault is happening and needs to be handled.
* abort_protocol(): abort the current protocol state due to preemptive
state change.
* abort_in_close(): close this connection and abort the current protocol
state due to some fatal error.
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
There is a chance that we could use an OSD which does not have the incmap
of a certain epoch for rebuilding the monstore. An OSD does not read
and store the incmap if the MOSDMap message already has the fullmap of
that epoch, and if an OSD does not have the previous fullmap, the monitor
will just send it the fullmap. So it's not unusual that an OSD has
a fullmap of some epoch without the corresponding incmap.
Fixes: https://tracker.ceph.com/issues/41177
Signed-off-by: Kefu Chai <kchai@redhat.com>
Currently the deferred txcs hang until their number reaches
bluestore_deferred_batch_ops (default value is 64).
But in some extreme cases, the client may generate only non-deferred
txcs for a long time, while metadata updates, like osdmap,
generate a few deferred txcs at low frequency, so these txcs may hang
for a long time. The problem is that an osdmap update usually writes
3 objects: full, inc, superblock. So while these txcs hang,
the refs of these onodes are held. When the bluestore cache trims
onodes, there is an option called bluestore_cache_trim_max_skip_pinned
(default value is 64), so the deferred txcs of 22 osdmap updates hold 66 onodes.
If these onodes are at the end of the LRU waiting to be trimmed, the trim is
skipped, and more and more onodes stay cached.
Here is a more aggressive approach to avoid skipping trim for too long.
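
To make the arithmetic concrete, here is a toy model of an LRU trim that
gives up after skipping too many pinned entries. It is not the BlueStore
cache code: the function names, the "stop once the skip count reaches the
limit" rule and the onode names are assumptions made for illustration.
Only the numbers (3 objects per osdmap update, limit of 64) come from the
description above.

```python
OBJECTS_PER_OSDMAP_UPDATE = 3   # full, inc, superblock
MAX_SKIP_PINNED = 64            # bluestore_cache_trim_max_skip_pinned default


def build_lru(osdmap_updates, data_onodes=100):
    """Return (lru, pinned): pinned onodes sit at the cold end of the LRU."""
    pinned = [f"meta-{i}"
              for i in range(osdmap_updates * OBJECTS_PER_OSDMAP_UPDATE)]
    lru = pinned + [f"data-{i}" for i in range(data_onodes)]
    return lru, set(pinned)


def trim_lru(lru_cold_to_hot, pinned, max_skip_pinned, want):
    """Evict up to `want` unpinned onodes, walking from the cold end."""
    evicted = []
    skipped = 0
    for onode in lru_cold_to_hot:
        if len(evicted) >= want:
            break
        if onode in pinned:
            skipped += 1
            # Assumed rule: give up once the skip budget is exhausted.
            if skipped >= max_skip_pinned:
                break
            continue
        evicted.append(onode)
    return evicted


lru21, pinned21 = build_lru(21)   # 63 pinned onodes: trim still makes progress
lru22, pinned22 = build_lru(22)   # 66 pinned onodes: trim gives up early
progress_21 = trim_lru(lru21, pinned21, MAX_SKIP_PINNED, want=10)
progress_22 = trim_lru(lru22, pinned22, MAX_SKIP_PINNED, want=10)
```

In this model, progress_21 contains ten evicted onodes while progress_22 is
empty, which is the "trim skipped" situation the message describes.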
Fixes: http://tracker.ceph.com/issues/21531
Signed-off-by: Zengran Zhang <zhangzengran@sangfor.com.cn>
Listing with a delimiter skips the sub-entries of a directory using directory + after_delim_s,
but the code wrongly adds after_delim_s to the next marker regardless of whether it has a directory.
Fixes: http://tracker.ceph.com/issues/40905
Signed-off-by: Tianshan Qu <tianshan@xsky.com>
On the aarch64 box I'm testing, this gives us a value of
7378697629483768832, which is not what we want.
I think we are better off relying on this limit being explicitly set via
environment variables (POD_* by kubernetes/rook) or via the command line.
This partially reverts 5c6b533697, but not all of it, since we want to
keep the option itself, as it is now used by
common/config.cc when dealing with the POD_MEMORY_LIMIT env var.
Signed-off-by: Sage Weil <sage@redhat.com>
BlueStore has its own object size limit (2^32-1). Make sure the cluster
limit is below that or refuse to mkfs or mount.
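
A minimal sketch of the check described above. It is illustrative only and
not the BlueStore implementation; the function and parameter names are
hypothetical, and treating "below" as strictly less than is an assumption.

```python
# BlueStore's own object size limit, as stated in this message.
BLUESTORE_MAX_OBJECT_SIZE = 2**32 - 1


def object_size_limit_ok(cluster_max_object_size: int) -> bool:
    """True if the cluster-wide limit is strictly below BlueStore's limit."""
    return cluster_max_object_size < BLUESTORE_MAX_OBJECT_SIZE


def mkfs_or_mount(cluster_max_object_size: int) -> None:
    """Refuse to proceed when the configured limit is too large."""
    if not object_size_limit_ok(cluster_max_object_size):
        raise ValueError(
            f"object size limit {cluster_max_object_size} is not below "
            f"{BLUESTORE_MAX_OBJECT_SIZE}"
        )
```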
Signed-off-by: Sage Weil <sage@redhat.com>
Similar to https://github.com/ceph/ceph/pull/29538/, we unlock a shared_lock with
unlock, causing a crash. Also scope the single-line if statements to make the
code more concise.
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
* refs/pull/29292/head:
os/bluestore: warn on no per-pool omap
os/bluestore: fsck: warning (not error) by default on no per-pool omap
os/bluestore: fsck: int64_t for error count
os/bluestore: default size of 1 TB for testing
os/bluestore: behave if we *do* set PGMETA and PERPOOL flags
os/bluestore: do not set both PGMETA_OMAP and PERPOOL_OMAP
os/bluestore: fsck: only generate 1 error per omap_head
os/bluestore: make fsck repair convert to per-pool omap
os/bluestore: teach fsck to tolerate per-pool omap
os/bluestore: ondisk format change to 3 for per-pool omap
mon/PGMap: add data/omap breakouts for 'df detail' view
osd/osd_types: separate get_{user,allocated}_bytes() into data and omap variants
mon/PGMap: fix stored_raw calculation
mon/PGMap: add in actual omap usage into per-pool stats
osd: report per-pool omap support via store_statfs_t
os/bluestore: set per_pool_omap key on mkfs
osd/osd_types: count per-pool omap capable OSDs
os/bluestore: report omap_allocated per-pool
os/bluestore: add pool prefix to omap keys
kv/KeyValueDB: take key_prefix for estimate_prefix_size()
os/bluestore: fix manual omap key manipulation to use Onode::get_omap_key()
os/bluestore: make omap key helpers Onode methods
os/bluestore: add Onode::get_omap_prefix() helper
os/bluestore: change _do_omap_clear() args
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
The previous implementation moved extents individually. This caused
problems when moving an extent with a small alloc_size that wasn't
a multiple of the target device's alloc_size.
Instead, identify files with extents that need to be moved, and then read
the file in its entirety and rewrite it in its entirety.
Signed-off-by: Sage Weil <sage@redhat.com>
Keep an alloc_size vector so that we have this value handy at all times.
Allow bluestore to fetch this value directly instead of looking at the
bluefs_* config options since this encapsulates things a bit better, and
also isn't vulnerable to the config setting changing at runtime.
Signed-off-by: Sage Weil <sage@redhat.com>
Since these are sent as part of a POST request, which is usually urlencoded,
the JSON parser would later report invalid JSON for documents containing whitespace.
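
The general failure mode can be reproduced outside of RGW with standard
library calls. This sketch only shows why a form-encoded value has to be
decoded before it is handed to a JSON parser; it is not the RGW code, and
the sample document and helper names are made up for illustration.

```python
import json
from urllib.parse import quote_plus, unquote_plus


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parse_form_json(value: str):
    """Decode a form-encoded value first, then parse it as JSON."""
    return json.loads(unquote_plus(value))


document = '{"key": "a value with spaces"}'
encoded = quote_plus(document)          # what a form-encoded POST carries
parsed = parse_form_json(encoded)       # decode, then parse
```

Here is_valid_json(encoded) is False, while parse_form_json(encoded)
returns the original document as a dict.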
Fixes: https://tracker.ceph.com/issues/41189
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
We should never print log messages to stdout, as this should be reserved
for output of ceph-volume.
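
As an illustration of the principle, a logger can be bound to stderr so that
stdout carries only the program's real output. This is a generic sketch using
the standard logging module, not the ceph-volume logging setup; the logger
name and format string are placeholders.

```python
import logging
import sys


def make_logger(name="example", stream=None):
    """Return a logger whose messages go to stderr (or `stream`), never stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False            # keep root handlers out of the picture
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers.clear()             # avoid duplicate handlers on re-creation
    logger.addHandler(handler)
    return logger
```

With this arrangement a caller can pipe stdout into another tool and parse
it without having to filter out log lines.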
Fixes: https://tracker.ceph.com/issues/41158
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
common/Finisher: remove some lock acquisitions.
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Jianpeng Ma <jianpeng.ma@intel.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>