Commit Graph

3237 Commits

Author SHA1 Message Date
Or Ozeri
5de8791da7 librbd/crypto: remove unused member from ShutDownCryptoRequest
m_crypto is not used - remove it.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
9992bbaa53 librbd/crypto: fix memory leak in openssl/DataCryptor
Re-initializing the same DataCryptor causes a memory leak of the old encryption key.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
044280dcbe librbd/crypto: fix memory leak in ShutDownCryptoRequest
If crypto object dispatch does not exist, a context pointer is leaked.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
3af5bb7c61 librbd/crypto: fix memory leak when DataCryptor fails
If DataCryptor fails, either in init_context or update_context,
the encryption context is not returned, which causes a memory leak.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Deepika Upadhyay
44cd7c7650
Merge pull request #42046 from CongMinYin/align-entry-bit
librbd/cache/pwl/ssd: make log entry 64 bit and add ssd version control

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2021-11-13 17:27:27 +05:30
Deepika Upadhyay
b287de4219
Merge pull request #43837 from majianpeng/librbd-fix-reorder-problem-between-process_writeback_dirty_entries
librbd/cache/pwl: fix reorder issue between func process_writeback_dirty_entries

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2021-11-13 17:26:24 +05:30
Deepika Upadhyay
b3f1af6cc7
Merge pull request #43677 from majianpeng/remove-larger-debug-message
librbd/pwl: don't need print cache_bl contents.

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2021-11-10 01:32:10 +05:30
Jianpeng Ma
76f4d29d92 librbd/cache/pwl: fix reorder issue between func process_writeback_dirty_entries
We must keep ops in order not only within a single call to process_writeback_dirty_entries,
but also across successive calls to process_writeback_dirty_entries.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-11-08 14:41:53 +08:00
Yin Congmin
dc566a3cd3 librbd/cache/pwl/ssd: add layout version control
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-11-05 11:36:57 +08:00
Deepika Upadhyay
229f5151ff
Merge pull request #43461 from CongMinYin/fix-flush-advance
librbd/cache/pwl: fix external flush dispatch in advance

Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-11-04 13:15:42 +05:30
Deepika Upadhyay
efe1192448
Merge pull request #42950 from CongMinYin/fix-dead-lock-during-shutdown
librbd/cache/pwl/ssd: fix dead lock and assert during shutdown

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2021-11-03 23:55:31 +05:30
Deepika Upadhyay
41d3c831c0
Merge pull request #43659 from majianpeng/send-internal-flush-for-rbd-copy
librbd: send FLUSH_SOURCE_INTERNAL when do copy/deep_copy. 

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2021-11-03 20:25:21 +05:30
Yin Congmin
c091ec3471 librbd/cache/pwl/ssd: make log entry pointers 64 bit (on-disk format change)
Fixes: https://tracker.ceph.com/issues/50675

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-11-02 11:46:04 +08:00
Yin Congmin
94f9873718 librbd/cache/pwl: fix assert in _aio_stop() during shutdown
The next_ctx passed to wait_for_ops(next_ctx) may run on aio_thread, so
the steps that follow also run on the aio thread. remove_pool_file()
calls bdev->close(), which calls _aio_stop(), which executes
aio_thread.join() and triggers an assert: a thread can't join itself.
Fix it by adding the close ctx to m_work_queue, so that close() runs on
a work queue thread.

At the same time, correct the order of wait_for_ops().
flush_dirty_entries(next_ctx) may call wake_up() and start_op(),
so moving wait_for_ops() after flush_dirty_entries(next_ctx) is more
appropriate.

Fixes: https://tracker.ceph.com/issues/52566

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-11-02 09:36:56 +08:00
Yin Congmin
c531768838 librbd/cache/pwl/ssd: move finish_op() to the end of callback function
In the ssd cache, finish_op() is not at the end of the callback function
in append_op_log_entries(), and after finish_op() some operations still
need to take m_lock. So, during shutdown, wait_for_ops() considers all
ops finished and assumes no thread will acquire m_lock. Later in
shutdown, m_lock is taken, and _aio_stop() in bdev->close() waits for
all aio_writes() and aio_submit() to end while m_lock is held, but the
callback function of aio_write() is waiting for m_lock, causing a
deadlock. Move finish_op() to the end of the callback to fix the
deadlock.

Fixes: https://tracker.ceph.com/issues/52235

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-11-01 14:53:49 +08:00
Yin Congmin
066b8a6d2e librbd/cache/pwl: Check the cache is clean
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-11-01 14:53:40 +08:00
Jianpeng Ma
1fc3be2480 librbd/cache/pwl: fix reorder when flush cache-data to osd.
Consider the following workload:
writeA(0, 4096)
writeB(0, 512).
pwl can make sure writeA persists to the cache before writeB.
But when flushing to the osd, it uses async reads to read the data from
the cache, and the writes to the osd are issued in the read callbacks.
So although we issue aio-read(4096) and aio-read(512) in order, we
can't guarantee the order in which they return.
If aio-read(512) returns first, the write order to the next layer is
writeB(0, 512)
writeA(0, 4096).
This is wrong from the user's point of view.

To avoid this, we should first read all the data from the cache and
then send the writes in order.

Fixes: https://tracker.ceph.com/issues/52511

Tested-by: Feng Hualong <hualong.feng@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-11-01 09:25:52 +08:00
Jianpeng Ma
a2ae83f8aa librbd: send FLUSH_SOURCE_INTERNAL when do copy/deep_copy.
copy/deep_copy use the object map to judge whether an object exists.
With the librbd pwl cache, a flush doesn't write data to the osd, which
is what changes the object map state. So we should send the flush with
FLUSH_SOURCE_INTERNAL to make the data flush to the osd.

Fixes: https://tracker.ceph.com/issues/53057
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-11-01 08:33:23 +08:00
Yin Congmin
9951868fc3 librbd/cache/pwl: cancel advance dispatch of external flush request
For an external flush request, a new syncpoint is created after passing
the guarded request and before dispatch. The dispatch then bypasses the
deferred queue, but the last write request may still be in the deferred
queue: it has not been dispatched and is not associated with any
syncpoint. The external flush request therefore bypasses the previous
write request in the deferred queue. This does not conform to the
semantics of external flush requests. External flush requests should
strictly follow the order of dispatch.

An internal flush request, however, is dispatched only after all write
requests associated with the previous syncpoint have been persisted in
the cache. C_gather guarantees it.

It is necessary to distinguish between external and internal
flush requests. Internal flushes can and should be dispatched in
advance, bypassing the deferred queue. At the same time, the order of
external requests needs to be kept unchanged. So cancel advance
dispatch of external flush requests.

Fixes: https://tracker.ceph.com/issues/52599

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-10-28 10:37:45 +08:00
Jianpeng Ma
73873e3ef3 librbd/pwl: don't need print cache_bl contents.
It will produce a very large message.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-10-27 09:32:59 +08:00
Mykola Golub
03d922ec87
Merge pull request #41657 from sunnyku/wip-rbd-50787
librbd/object_map: rbd diff between two snapshots lists entire image content

Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-10-25 16:19:48 +03:00
Mykola Golub
82c16d39c3
Merge pull request #43573 from idryomov/wip-create-ioctx-preserve-full-try
librbd: preserve CEPH_OSD_FLAG_FULL_TRY in create_ioctx()

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
2021-10-22 09:49:12 +03:00
Ilya Dryomov
7cc7efae2a librbd: preserve CEPH_OSD_FLAG_FULL_TRY in create_ioctx()
The obvious use case is an image with a separate data pool but it could
be useful in other places too.

While at it, set_namespace() call in handle_v2_get_data_pool() is
redundant since create_ioctx() already takes care of it.

Fixes: https://tracker.ceph.com/issues/52961
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-17 17:24:41 +02:00
Ilya Dryomov
0dcea098cf librbd: honor FUA op flag for write_same() in write-around cache
WriteAroundObjectDispatch::write_same() should pass op_flags through
to dispatch_io() so that it can bypass the cache if needed.

Fixes: https://tracker.ceph.com/issues/52956
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-15 18:24:58 +02:00
Ilya Dryomov
f5b9646108
Merge pull request #43182 from CongMinYin/fix-writesame-assert
librbd/cache/pwl: initialize number_log_entries

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-10 17:33:51 +02:00
Yin Congmin
dd33684733 librbd/cache/pwl: initialize number_log_entries
Using an uninitialized number_log_entries causes a writesame request
space calculation error, which sometimes makes
TestMockCacheSSDWriteLog.writesame fail.

Fixes: https://tracker.ceph.com/issues/52852

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-10-08 10:15:48 +00:00
Ilya Dryomov
4ee20f859b librbd: propagate CEPH_OSD_FLAG_FULL_TRY from IoCtx to IOContext
We use neorados on the I/O path so data_io_context needs to have
the same setting as data_ctx.

Fixes: https://tracker.ceph.com/issues/52648
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-17 20:13:04 +02:00
Ilya Dryomov
6d4535f221
Merge pull request #42555 from hualongfeng/retire_error_change
librbd/cache/pwl/ssd: remove correct m_blocks_to_log_entries entry

Reviewed-by: Jianpeng Ma <jianpeng.ma@intel.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-13 12:04:34 +02:00
Ilya Dryomov
254142a04b
Merge pull request #43038 from majianpeng/librbd-pwl-exclusive-lock
librbd: require exclusive lock for reads if pwl cache is enabled

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-13 11:44:09 +02:00
Feng Hualong
01bb75a105 librbd/cache/pwl/ssd: remove correct m_blocks_to_log_entries entry
When retiring, the write_entry that is about to be retired is not
removed from m_blocks_to_log_entries (it should be `*it`, not `entry`).
This leads to read errors. Discard entries should also be
considered.

Fixes: https://tracker.ceph.com/issues/52579
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
2021-09-13 10:49:18 +02:00
Ilya Dryomov
d54ece56cc
Merge pull request #42984 from majianpeng/pwl-ssd-race-bug
librbd/cache/pwl/ssd: fix a race between get_cache_bl() and remove_cache_bl()

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-10 11:36:45 +02:00
Ilya Dryomov
f455a7a6c3
Merge pull request #42883 from majianpeng/pwl-ssd-calc-allocated-bug
librbd/cache/pwl: fix m_bytes_{allocated,cached} calculation on reopen

Reviewed-by: Yin Congmin <congmin.yin@intel.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-10 09:41:18 +02:00
Ilya Dryomov
12ab78af19
Merge pull request #43086 from idryomov/wip-rbd-validate-pool-async
librbd: fix pool validation lockup

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-09-09 15:22:43 +02:00
Ilya Dryomov
ba04352b8f librbd: drop ValidatePoolRequest::m_op_work_queue
It is unused now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-08 12:29:20 +02:00
Ilya Dryomov
56d41a9ada librbd: fix pool validation lockup
Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated
(uninitialized) pool trigger a lockup in ValidatePoolRequest state
machine caused by blocking selfmanaged_snap_{create,remove}() calls.
There are two reactor threads by default (librados_thread_count) but we
effectively need N + 1 reactor threads for N concurrent pool validation
requests, especially for small N.

Switch to aio_selfmanaged_snap_{create,remove}().  At the time this
code was initially written, these aio variants weren't available.  The
workqueue offload introduced later worked prior to the move to asio in
pacific.

Fixes: https://tracker.ceph.com/issues/52537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-08 12:29:20 +02:00
Ilya Dryomov
a9a4cc9abf
Merge pull request #43006 from CongMinYin/fix-assert-in-handle_flushed_sync_point
librbd/cache/pwl: don't clear next_sync_point_entry prematurely

Reviewed-by: Jianpeng Ma <jianpeng.ma@intel.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-08 11:15:18 +02:00
Jianpeng Ma
621facb6e6 librbd: Read request need exclusive-lock when enable pwl-cache.
TestLibRBD.TestFUA describes the following workload:
a) write/read the same image w/ pwl-cache
  write_image = open(image_name);
  read_image  = open(image_name);
b) the i/o workload is:
   write(write_image)
      write needs EXLOCK and requests EXLOCK

   read(read_image)
      in ExclusiveLock<I>::init(), the first read needs EXLOCK,
      so it requests EXLOCK. write_image releases EXLOCK (which
      flushes data to the osd and removes the cache). read_image inits
      the pwl-cache; the read first enters the pwl-cache, misses, and
      then reads from the osd.

   write(write_image)
      write needs EXLOCK and requests EXLOCK. This makes read_image
      remove its empty cache. write_image inits the cache pool and
      writes data to the cache.

   read(read_image)
      In send_set_require_lock(), only writes are set to need EXLOCK.
      So the read doesn't request EXLOCK and does a dirty read from
      the osd.

Because the second read doesn't need EXLOCK, write_image doesn't
release EXLOCK (which would flush dirty data to the osd and shut down
the pwl-cache). So the second read doesn't see the latest data.

So reads should also need EXLOCK when pwl-cache is enabled.

Fixes: https://tracker.ceph.com/issues/51438

Tested-by: Feng Hualong <hualong.feng@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-09-08 09:51:19 +08:00
Ilya Dryomov
22903c3965 librbd: report correct error for ictx->state->close()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-07 21:03:12 +02:00
Wang ShuaiChao
fa5d61ee51 librbd: fix use-after-free on ictx in list_descendants()
Ictx is deleted when "ictx->state->open()" and "ictx->state->close()"
fail, and then "lderr(ictx->cct)" crashes.

Fixes: https://tracker.ceph.com/issues/52522
Signed-off-by: Wang ShuaiChao <wangshuaich@chinatelecom.cn>
2021-09-07 21:03:12 +02:00
Jianpeng Ma
40dad4c30c librbd/cache/pwl/ssd: Remove unused parameter.
Met the following compiler warning message:
>[38/80] Building CXX object src/librbd/CMakeFiles/librbd_plugin_pwl_cache.dir/cache/pwl/ssd/WriteLog.cc.o
>../src/librbd/cache/pwl/ssd/WriteLog.cc:37:25: warning: unused variable 'ops_appended_together' [-Wunused-const-variable]
>const unsigned long int ops_appended_together = MAX_WRITES_PER_SYNC_POINT;

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-09-07 10:33:56 +08:00
Jianpeng Ma
ca9eb28b4f librbd/cache/pwl/ssd: Remove useless locks.
Returning a reference doesn't need lock protection.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-09-07 10:33:54 +08:00
Jianpeng Ma
fe72b39537 librbd/cache/pwl/ssd: Fix a race between get_cache_bl() and remove_cache_bl()
Although get_cache_bl() uses a lock, that lock can't protect the
function "list& operator= (const list& other)".
So we should use copy_cache_bl.

Fixes: https://tracker.ceph.com/issues/52400
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-09-07 10:33:50 +08:00
Yin Congmin
a1a20041d5 librbd/cache/pwl: don't clear next_sync_point_entry prematurely
sync_point->log_entry->next_sync_point_entry was prematurely set to
nullptr in SyncPointLogOperation::clear_earlier_sync_point(). That
happens in the write op stage, but next_sync_point_entry is still used
in the writeback stage in handle_flushed_sync_point().

handle_flushed_sync_point() may then pass a nullptr, causing an assert
in m_work_queue. The solution is to move the statement that sets
next_sync_point_entry to nullptr to after it is used.

Fixes: https://tracker.ceph.com/issues/52465
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
2021-09-06 16:18:24 +02:00
Jianpeng Ma
a96ca93d69 librbd/cache/pwl: solve the problem of calc m_bytes_allocated when reload entries.
Currently, existing entries are loaded after restart and
m_bytes_allocated is calculated based on those entries. But there are
the following problems:
1: The allocation of write-same is not calculated for rwl & ssd cache.
2: For ssd cache, it does not include the size of the log entry itself
and doesn't consider data alignment. This under-counts the allocated
bytes and allows too much allocation later, which overwrites data that
has not been flushed to the osd and causes data loss.

The calculation methods of ssd and rwl are different. So add a new api
allocated_and_cached_data() to implement their own methods.

For SSD cache, we directly use m_first_valid_entry & m_first_free_entry to
calc m_bytes_allocated.

Fixes: https://tracker.ceph.com/issues/52341
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-08-31 09:02:56 +08:00
Patrick Donnelly
d64b40fef2 Merge PR #42620 into master
* refs/pull/42620/head:
	mds: switch mds_lock to fair mutex
	common/Timer: make SafeTimer a template
	common/fair_mutex: add is_locked_by_me support
	common: do not compile condition_variable_debug in non-debug mode

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-08-27 21:28:02 -04:00
Ilya Dryomov
989e8aa50d
Merge pull request #42843 from majianpeng/pwl-ssd-restart-failed
librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries()

Reviewed-by: Mahati Chamarthy <mahati.chamarthy@intel.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-26 21:49:29 +02:00
Jianpeng Ma
2d337fb122 librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries()
Consider one control_block which contains multiple encode(WriteLogCacheEntry):
Log1: WriteLogEntry
Log2: WriteLogEntry
Log3: Non-WriteLogEntry
For this case, the current calculation is: control_block_pos + sizeof(control_block).
But in fact, it should be: control_block_pos + sizeof(control_block) +
data_length(Log1 + Log2).

The wrong first_valid_entry is persisted to the superblock and read
back on restart. This causes a read from the wrong position, and
decode(WriteLogCacheEntry) then reports an error.

Fixes: https://tracker.ceph.com/issues/52323
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2021-08-25 13:07:52 +02:00
Xiubo Li
215b12ae0a common/Timer: make SafeTimer a template
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-08-25 09:42:33 +08:00
Mykola Golub
320e059c95
Merge pull request #41405 from ideepika/wip-rbd-update-feature-test
test/librbd: add unit tests for rbd update features

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-08-23 11:33:41 +03:00
Ilya Dryomov
8aaa6b0fef
Merge pull request #42792 from tchaikov/wip-librbd-cleanups
librbd: trade a map<> for a plain array

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-22 21:01:15 +02:00