in a large cluster, the OSD is more likely to fail to trim the cached
osdmaps in a timely manner. and sometimes it simply cannot keep up with
the incoming osdmaps if skip_maps is true, so the osdmap cache can keep
growing to over 250GB in size. in this change:
* publish_superblock() before trimming the osdmaps, so other osdmap
consumers of OSDService.superblock won't access the osdmaps being
removed.
* trim all stale osdmaps in batches of conf->osd_target_transaction_size
if skip_maps is true. in my test, this happens when the osd only
receives osdmaps from the monitor occasionally, because the osd happens
to be chosen when the monitor wants to share a new osdmap with a random
osd.
* always use dedicated transaction(s) for trimming osdmaps. so even in
the normal case where we are able to trim all stale osdmaps in a
single batch, a separate transaction is used. we could piggyback
the commits for removing maps, but we keep it this way for simplicity.
* use std::min() instead of MIN() for type safety
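
The batching described above can be sketched as follows. This is a
minimal, self-contained model and not the actual OSD code: MapStore,
Transaction and trim_maps() are hypothetical names, the store is a plain
std::map, and "publishing the superblock" is modelled as advancing the
`oldest` epoch before the removals are applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the osdmap store: epoch -> encoded map.
using MapStore = std::map<epoch_t, std::string>;

// Hypothetical stand-in for one store transaction: the epochs it removes.
using Transaction = std::vector<epoch_t>;

// Remove every map with an epoch in [oldest, min_keep) from `store`,
// using dedicated transactions of at most `batch_size` removals each.
// Returns the transactions in the order they were applied.
std::vector<Transaction> trim_maps(MapStore& store, epoch_t& oldest,
                                   epoch_t min_keep, std::size_t batch_size)
{
  assert(batch_size > 0);
  std::vector<Transaction> applied;
  while (oldest < min_keep) {
    const std::size_t remaining = min_keep - oldest;
    // std::min requires both arguments to have the same type, which is
    // the type-safety benefit over a MIN() macro.
    const std::size_t n = std::min(remaining, batch_size);
    const epoch_t first = oldest;
    // Advance the published lower bound first, so readers of `oldest`
    // never look up an epoch that is about to be removed.
    oldest = static_cast<epoch_t>(first + n);
    Transaction t;
    for (epoch_t e = first; e < oldest; ++e)
      t.push_back(e);
    for (epoch_t e : t)
      store.erase(e);  // "apply" the dedicated transaction
    applied.push_back(std::move(t));
  }
  return applied;
}
```

With seven stale maps and a batch size of three, this model issues three
transactions (3, 3 and 1 removals) instead of a single large one.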
Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
Currently, if we fail to read a SystemMetaObj, we try to log the
MetaObject id. However, the id is usually not set because read_id has
failed, so we end up logging an empty id. Change this to log the
object name instead.
Fixes: http://tracker.ceph.com/issues/15776
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
When applying ceph patches, some warnings are reported, e.g.
doc/scripts/gen_state_diagram.py:99: trailing whitespace.
Signed-off-by: Li Peng <lip@dtdream.com>
For direct IO, we need to check whether every bufferptr is both
size-aligned and address-aligned. To do this, we currently call
is_aligned && is_n_align_sized, and each function walks all ptrs of the
bufferlist. To save one walk, add is_aligned_size_and_memory(align_size,
align_memory), which only needs to walk the list once.
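
As a rough illustration of the single-pass check, here is a sketch that
does not use the real bufferlist API. Segment and
aligned_size_and_memory_once() are hypothetical names, and addresses are
modelled as plain integers so the example needs no real allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one ptr of a buffer list.
struct Segment {
  std::uintptr_t addr;  // start address of the segment
  std::size_t len;      // length of the segment in bytes
};

// Check size alignment and address alignment of every segment in a
// single walk, instead of one walk per property.
bool aligned_size_and_memory_once(const std::vector<Segment>& segs,
                                  std::size_t align_size,
                                  std::size_t align_memory)
{
  assert(align_size > 0 && align_memory > 0);
  for (const Segment& s : segs) {
    if (s.len % align_size != 0 || s.addr % align_memory != 0)
      return false;  // stop at the first misaligned segment
  }
  return true;
}
```

The result is the same as and-ing two separate checks; only the number
of walks over the list changes.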
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
For direct IO, the content may need to be rebuilt.
In fact, rebuild_aligned_size_and_memory first checks is_n_align_sized &&
is_aligned and rebuilds only if needed.
So using rebuild_aligned_size_and_memory lets us remove the explicit check.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Change the return type of these functions from void to bool. With the
return value, callers can tell whether the content was actually rebuilt
and optimize accordingly.
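
A sketch of what a bool-returning rebuild can look like, under loud
assumptions: rebuild_size_aligned() is a hypothetical name, only size
alignment is modelled, and the zero padding is a choice made for this
example rather than a statement about bufferlist behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Size-only model of a rebuild. If every chunk length is already a
// multiple of `align`, nothing is touched and false is returned.
// Otherwise the chunks are coalesced into one chunk, zero-padded up to
// the next multiple of `align` (an assumption of this sketch), and true
// is returned so the caller knows the content was really rebuilt.
bool rebuild_size_aligned(std::vector<std::string>& chunks, std::size_t align)
{
  assert(align > 0);
  bool already_aligned = true;
  for (const std::string& c : chunks) {
    if (c.size() % align != 0) {
      already_aligned = false;
      break;
    }
  }
  if (already_aligned)
    return false;  // no copy was made

  std::string flat;
  for (const std::string& c : chunks)
    flat += c;
  const std::size_t padded = (flat.size() + align - 1) / align * align;
  flat.resize(padded, '\0');
  chunks.assign(1, std::move(flat));
  return true;
}
```

A caller can then skip any follow-up work, such as refreshing cached
pointers into the buffer, when the function returns false.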
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
One such example is popping the last entry from an object. The next
object will be automatically prefetched. When that object is received,
we do not want to alert the user that entries are available since
try_pop_front already indicated more records were available.
Fixes: http://tracker.ceph.com/issues/15755
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
When replaying a journal flush event, do not start processing the next
journal entry until after the flush is in progress to ensure the barrier
is correctly guarding against future writes.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Use this option to control whether space is preallocated when the
bluestore block/db_path/wal_path is backed by a file instead of a block
device.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Previously, fsx deleted all test data upon successful completion. Add
an option to leave the data behind for further analysis.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Fixes: http://tracker.ceph.com/issues/15745
When complete_writing_data() is called, if pending_data_bl is not empty
we still need to handle stripe transition correctly. If pending_data_bl
has more data than we can allow in the current stripe, move to the next one.
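
The stripe transition can be pictured with the following sketch. It is
not the rgw code: StripeWrite and flush_pending() are hypothetical
names, and the pending data is modelled only by its byte count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one write: which stripe, how many bytes.
struct StripeWrite {
  std::size_t stripe;
  std::size_t bytes;
};

// Drain `pending` bytes starting in `current_stripe`, which already
// holds `used_in_current` bytes. Whenever the current stripe is full,
// move to the next stripe before writing the rest.
std::vector<StripeWrite> flush_pending(std::size_t pending,
                                       std::size_t stripe_size,
                                       std::size_t used_in_current,
                                       std::size_t current_stripe)
{
  assert(stripe_size > 0 && used_in_current <= stripe_size);
  std::vector<StripeWrite> out;
  while (pending > 0) {
    const std::size_t room = stripe_size - used_in_current;
    if (room == 0) {
      ++current_stripe;      // stripe transition
      used_in_current = 0;
      continue;
    }
    const std::size_t n = std::min(pending, room);
    out.push_back({current_stripe, n});
    pending -= n;
    used_in_current += n;
  }
  return out;
}
```

For example, 10 pending bytes with a stripe size of 4 and 3 bytes already
used in stripe 0 are split into writes of 1, 4, 4 and 1 bytes across
stripes 0 to 3.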
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
This way we always return a safe upper bound on the amount of time
since we did a check. Among other things, this prevents us from
returning a value of 0, which is confusing.
Fixes: http://tracker.ceph.com/issues/15760
Signed-off-by: Sage Weil <sage@redhat.com>