Re-initializing the same DataCryptor causes a memory leak of the old
encryption key.
This commit fixes this issue.
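A minimal sketch of the pattern, with illustrative member names (not
the actual librbd code):

  void DataCryptor::set_key(const unsigned char* key, uint32_t key_length) {
    if (m_key != nullptr) {
      memset(m_key, 0, m_key_length);  // wipe the old key before freeing
      delete[] m_key;                  // previously leaked on re-init
    }
    m_key = new unsigned char[key_length];
    memcpy(m_key, key, key_length);
    m_key_length = key_length;
  }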
Signed-off-by: Or Ozeri <oro@il.ibm.com>
If DataCryptor fails, either in init_context or update_context,
the encryption context is not returned, which causes a memory leak.
This commit fixes this issue.
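The shape of the fix, sketched with the get_context()/return_context()
names from the DataCryptor interface (surrounding code simplified):

  auto* ctx = m_data_cryptor->get_context(mode);
  int r = m_data_cryptor->init_context(ctx, iv, iv_length);
  if (r < 0) {
    m_data_cryptor->return_context(ctx, mode);  // previously leaked here
    return r;
  }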
Signed-off-by: Or Ozeri <oro@il.ibm.com>
librbd/cache/pwl/ssd: make log entry 64 bit and add ssd version control
Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
In fact, we must ensure ops are in order not only within a single call
to process_writeback_dirty_entries(), but also across successive calls
to process_writeback_dirty_entries().
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
For wait_for_ops(next_ctx), next_ctx may run in the aio_thread. The
code that follows then runs on the aio thread: remove_pool_file()
calls bdev->close(), which calls _aio_stop() and executes
aio_thread.join(), triggering an assert because a thread can't join
itself. Fix it by queueing the close ctx on m_work_queue, so close()
runs in a work queue thread.
At the same time, correct the order of wait_for_ops():
flush_dirty_entries(next_ctx) may call wake_up() and start_op(), so
moving wait_for_ops() behind flush_dirty_entries(next_ctx) is more
appropriate.
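A sketch of the work queue bounce; LambdaContext and ContextWQ::queue()
are stock ceph primitives, the surrounding names are illustrative:

  // instead of calling bdev->close() inline from a context that may be
  // running in the aio_thread, requeue it onto the work queue
  m_work_queue.queue(new LambdaContext([this, on_finish](int r) {
    bdev->close();           // safe: we are not in aio_thread here,
    on_finish->complete(r);  // so _aio_stop()'s join() can't self-join
  }), 0);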
Fixes: https://tracker.ceph.com/issues/52566
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
finish_op() of the ssd cache is not at the end of the callback function
in append_op_log_entries(), and after finish_op() some operations still
need to take m_lock. So during shutdown, wait_for_ops() decides that
all ops are over and that no thread will acquire m_lock. The subsequent
shutdown code then takes m_lock, and _aio_stop() in bdev->close() waits
for all aio_write() and aio_submit() calls to finish while m_lock is
held; but the callback function of aio_write() is itself waiting for
m_lock, causing a deadlock. Move finish_op() to the end to fix the
deadlock.
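The shape of the reordering, sketched with simplified names:

  void handle_append_op_log_entries(int r) {
    {
      std::lock_guard locker(m_lock);
      // ... post-append bookkeeping that needs m_lock ...
    }
    // finish_op() must be the very last step: once it runs,
    // wait_for_ops() may return and shutdown may proceed to
    // bdev->close(), so nothing after it may touch m_lock
    m_async_op_tracker.finish_op();
  }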
Fixes: https://tracker.ceph.com/issues/52235
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Consider the following workload:
writeA(0, 4096)
writeB(0, 512)
pwl can make sure writeA persists to the cache before writeB.
But when flushing to the osd, it uses async reads to fetch the data
from the cache, and the callback functions issue the writes to the osd.
So although we issue aio-read(4096) and aio-read(512) in order, we
can't make sure they return in that order.
If aio-read(512) returns first, the write order to the next layer is
writeB(0, 512)
writeA(0, 4096).
This is wrong from the user's point of view.
To avoid this, we should first read all the data from the cache and
then send the writes in order.
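A sketch of the approach using the stock C_GatherBuilder primitive;
the helpers around it are hypothetical:

  // issue all cache reads first; only when every read has completed
  // do we send the writes to the osd, preserving the original order
  C_GatherBuilder gather(cct, new LambdaContext([this](int r) {
    issue_writes_in_order(r);    // hypothetical helper: writeA, then writeB
  }));
  for (auto &entry : entries_to_flush) {       // in user write order
    read_from_cache(entry, gather.new_sub());  // hypothetical helper
  }
  gather.activate();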
Fixes: https://tracker.ceph.com/issues/52511
Tested-by: Feng Hualong <hualong.feng@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
copy/deep_copy use the object map to judge whether an object exists.
With the librbd pwl cache, a plain flush can't flush data to the osd,
which is what updates the object map state. So we should send the flush
w/ FLUSH_SOURCE_INTERNAL to make the data reach the osd.
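A heavily simplified sketch of the intent; the signature is
approximated from the pacific-era io::ImageDispatchSpec interface and
the exact call site differs, but the key point is the
FLUSH_SOURCE_INTERNAL argument:

  // an internal-source flush forces the pwl cache to write dirty data
  // back to the osd, rather than just persisting it in the cache
  auto req = io::ImageDispatchSpec::create_flush(
      *image_ctx, io::IMAGE_DISPATCH_LAYER_INTERNAL_START, aio_comp,
      io::FLUSH_SOURCE_INTERNAL, {});
  req->send();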
Fixes: https://tracker.ceph.com/issues/53057
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
For an external flush request, a new sync point is created after it
passes the guarded request stage and before dispatch, and dispatch then
bypasses the deferred queue. But the last write request may still be in
the deferred queue: it has not been dispatched and is not associated
with any sync point. The external flush request would thus bypass the
previous write request in the deferred queue, which does not conform to
the semantics of external flush requests: an external flush should
strictly follow the order of dispatch.
An internal flush request, by contrast, is dispatched only after all
write requests associated with the previous sync point have been
persisted in the cache; C_Gather guarantees this.
It is therefore necessary to distinguish between external and internal
flush requests. An internal flush can and should be dispatched in
advance, bypassing the deferred queue, while the order of external
requests must be kept unchanged. So cancel the advance dispatch of
external flush requests.
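The resulting dispatch decision, sketched; C_FlushRequest does carry an
internal flag in the pwl code, the helpers are hypothetical:

  // only internal flushes may bypass the deferred queue; external
  // flushes must keep strict dispatch order relative to earlier writes
  if (flush_req->internal) {
    dispatch(flush_req);          // safe: C_Gather already ordered it
                                  // after the previous sync point's writes
  } else {
    enqueue_deferred(flush_req);  // hypothetical helper: keep the order
  }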
Fixes: https://tracker.ceph.com/issues/52599
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
The obvious use case is an image with a separate data pool, but it
could be useful in other places too.
While at it, the set_namespace() call in handle_v2_get_data_pool() is
redundant since create_ioctx() already takes care of it.
Fixes: https://tracker.ceph.com/issues/52961
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
WriteAroundObjectDispatch::write_same() should pass op_flags through
to dispatch_io() so that it can bypass the cache if needed.
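The one-line shape of the fix (parameters abridged):

  bool write_same(uint64_t object_no, uint64_t object_off, uint64_t length,
                  int op_flags, Context* on_dispatched) {
    // forward op_flags instead of dropping them, so fadvise-style
    // flags can make dispatch_io() bypass the cache
    return dispatch_io(object_no, object_off, length, op_flags,
                       on_dispatched);
  }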
Fixes: https://tracker.ceph.com/issues/52956
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Using an uninitialized number_log_entries causes a writesame request
space calculation error; TestMockCacheSSDWriteLog.writesame sometimes
fails because of it.
Fixes: https://tracker.ceph.com/issues/52852
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
We use neorados on the I/O path so data_io_context needs to have
the same setting as data_ctx.
Fixes: https://tracker.ceph.com/issues/52648
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When retiring, m_blocks_to_log_entries doesn't remove the corresponding
write_entry that will be retired (it should be `*it`, not `entry`).
This leads to read errors. Discard entries should also be considered
there.
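A sketch of the corrected removal loop (container and method names
follow the pwl code, simplified):

  for (auto it = retiring_entries.begin();
       it != retiring_entries.end(); ++it) {
    // remove the entry actually being retired (*it), not the outer
    // `entry`; discard entries must be handled here as well
    m_blocks_to_log_entries.remove_log_entry(*it);
  }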
Fixes: https://tracker.ceph.com/issues/52579
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated
(uninitialized) pool trigger a lockup in ValidatePoolRequest state
machine caused by blocking selfmanaged_snap_{create,remove}() calls.
There are two reactor threads by default (librados_thread_count) but we
effectively need N + 1 reactor threads for N concurrent pool validation
requests, especially for small N.
Switch to aio_selfmanaged_snap_{create,remove}(). At the time this
code was initially written, these aio variants weren't available. The
workqueue offload introduced later worked prior to the move to asio in
pacific.
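For reference, the shape of the non-blocking call (librados C++ API;
error handling and completion release elided):

  librados::AioCompletion *comp = librados::Rados::aio_create_completion();
  uint64_t snap_id;
  io_ctx.aio_selfmanaged_snap_create(&snap_id, comp);
  // the callback fires on a rados finisher thread, so no reactor
  // thread sits blocked waiting for the round trip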
Fixes: https://tracker.ceph.com/issues/52537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
TestLibRBD.TestFUA describes the following workload:
a) write/read the same image w/ pwl-cache
   write_image = open(image_name);
   read_image = open(image_name);
b) the i/o workload is:
   write(write_image)
     The write needs EXLOCK, so it acquires EXLOCK.
   read(read_image)
     In ExclusiveLock<I>::init(), the first read also needs EXLOCK,
     so it acquires it. write_image releases EXLOCK (flushing its data
     to the osd and removing the cache). read_image initializes the
     pwl-cache; the read i/o first enters the pwl-cache, misses, and
     then reads from the osd.
   write(write_image)
     The write needs EXLOCK and acquires it. This makes read_image
     remove its empty cache. write_image initializes the cache pool
     and writes the data to the cache.
   read(read_image)
     In send_set_require_lock(), only writes are set to need EXLOCK.
     So the read doesn't acquire EXLOCK and dirtily reads from the osd.
Because the second read doesn't need EXLOCK, write_image doesn't
release EXLOCK (which would flush the dirty data to the osd and shut
down the pwl-cache). So the second read doesn't return the latest data.
We should therefore make reads also require EXLOCK when pwl-cache is
enabled.
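A minimal sketch of the intended change; the predicate and signature
are assumptions standing in for the real pwl-cache check:

  io::Direction direction = io::DIRECTION_WRITE;
  if (pwl_cache_enabled) {           // hypothetical predicate
    direction = io::DIRECTION_BOTH;  // reads must also take EXLOCK, so
                                     // a reader forces the writer to
                                     // flush and release the lock
  }
  set_require_lock(direction, on_finish);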
Fixes: https://tracker.ceph.com/issues/51438
Tested-by: Feng Hualong <hualong.feng@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
ictx is deleted when "ictx->state->open()" or "ictx->state->close()"
fails, and the subsequent "lderr(ictx->cct)" then crashes.
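The crash pattern and the obvious guard, sketched:

  CephContext *cct = ictx->cct;  // save before open(): on failure,
                                 // open() deletes ictx
  int r = ictx->state->open(0);
  if (r < 0) {
    lderr(cct) << "error opening image: " << cpp_strerror(r) << dendl;
    return r;  // must not touch ictx here, it is already freed
  }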
Fixes: https://tracker.ceph.com/issues/52522
Signed-off-by: Wang ShuaiChao <wangshuaich@chinatelecom.cn>
Met the following compiler warning:
> [38/80] Building CXX object src/librbd/CMakeFiles/librbd_plugin_pwl_cache.dir/cache/pwl/ssd/WriteLog.cc.o
> ../src/librbd/cache/pwl/ssd/WriteLog.cc:37:25: warning: unused variable 'ops_appended_together' [-Wunused-const-variable]
> const unsigned long int ops_appended_together = MAX_WRITES_PER_SYNC_POINT;
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
In fact, although get_cache_bl() takes a lock internally, that lock
can't protect the bufferlist assignment "list& operator= (const list&
other)" performed on the result. So we should use copy_cache_bl().
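Sketched caller-side difference; copy_cache_bl() is the interface named
above, and its copying-under-lock behavior is the assumption here:

  ceph::bufferlist bl;
  // unsafe: bufferlist assignment shares the underlying buffer
  // pointers, and the lock inside get_cache_bl() no longer protects
  // them once it returns
  //   bl = log_entry->get_cache_bl();
  log_entry->copy_cache_bl(&bl);  // copies the data under the entry's lock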
Fixes: https://tracker.ceph.com/issues/52400
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
In SyncPointLogOperation::clear_earlier_sync_point(),
sync_point->log_entry->next_sync_point_entry was prematurely set to
nullptr. clear_earlier_sync_point() runs in the write op stage, but
next_sync_point_entry is used in the writeback stage by
handle_flushed_sync_point(), so handle_flushed_sync_point() may pass a
nullptr and cause an assert in m_work_queue. The solution is to move
the statement that sets next_sync_point_entry to nullptr to after its
use.
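The reordering, sketched (call shape simplified):

  // before: cleared in the write op stage, while writeback still
  // needed it
  //   sync_point->log_entry->next_sync_point_entry = nullptr;
  // after: clear only once the writeback stage has consumed it
  auto next = sync_point->log_entry->next_sync_point_entry;
  handle_flushed_sync_point(next);                         // uses the entry
  sync_point->log_entry->next_sync_point_entry = nullptr;  // now safe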
Fixes: https://tracker.ceph.com/issues/52465
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Currently, the existing entries are loaded after a restart and
m_bytes_allocated is calculated based on those entries. There are the
following problems:
1: The space allocated for write-same is not counted, for both the rwl
and ssd caches.
2: For the ssd cache, the size of the log entry itself is not included
and data alignment is not considered. This undercounts the allocation,
so more space is handed out later than is actually free, which can
overwrite data that hasn't been flushed to the osd and lose data.
The calculation methods of ssd and rwl are different, so add a new api,
allocated_and_cached_data(), to let each implement its own method.
For the ssd cache, we directly use m_first_valid_entry &
m_first_free_entry to calculate m_bytes_allocated.
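For the ssd side, a sketch of the direct calculation, treating both
fields as byte offsets into a ring-buffer cache pool (wrap-around
handling simplified):

  uint64_t bytes_allocated;
  if (m_first_free_entry >= m_first_valid_entry) {
    bytes_allocated = m_first_free_entry - m_first_valid_entry;
  } else {
    // the free pointer has wrapped around the ring buffer
    bytes_allocated = pool_size - (m_first_valid_entry - m_first_free_entry);
  }
  m_bytes_allocated = bytes_allocated;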
Fixes: https://tracker.ceph.com/issues/52341
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
* refs/pull/42620/head:
mds: switch mds_lock to fair mutex
common/Timer: make SafeTimer a template
common/fair_mutex: add is_locked_by_me support
common: do not compile condition_variable_debug in none debug mode
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Consider one control_block which contains multiple encoded
WriteLogCacheEntry records:
Log1: WriteLogEntry
Log2: WriteLogEntry
Log3: Non-WriteLogEntry
For this case, the current calculation is:
control_block_pos + sizeof(control_block).
But in fact it should be:
control_block_pos + sizeof(control_block) + data_length(Log1 + Log2).
The wrong first_valid_entry is persisted to the superblock and read
back after a restart. This causes reads from the wrong position, and
decode(WriteLogCacheEntry) then reports a bug.
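A sketch of the corrected advance past the control block; is_write(),
is_writesame() and write_bytes follow the WriteLogCacheEntry fields,
the rest is illustrative:

  uint64_t next_pos = control_block_pos + sizeof(control_block);
  for (auto &log : control_block_logs) {  // Log1, Log2, Log3
    if (log.is_write() || log.is_writesame()) {
      // write-type entries carry their payload after the control
      // block, padded to the allocation unit; skip past it
      next_pos += round_up_to(log.write_bytes, MIN_WRITE_ALLOC_SSD_SIZE);
    }
  }
  first_valid_entry = next_pos;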
Fixes: https://tracker.ceph.com/issues/52323
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>