Sage Weil
2a7393a446
os/newstore: more conservative default for aio queue depth
...
There appears to be a kernel aio bug when the queue depth is small.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:41 -04:00
Xiaoxi Chen
37da4292b3
os/newstore: close fd after writing with O_DIRECT
...
fix bug in 2b4c60e0a5
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:41 -04:00
Zhiqiang Wang
65055a0207
os/NewStore: need to increase the wal op length when combining overlays
...
The length of the combined overlays needs to be added to the length of
the wal op.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:41 -04:00
Xiaoxi Chen
df239f0f62
os/Newstore: Fix collection_list_range
...
We need to rule out hobject_t::max before calling get_object_key
(which calls get_filestore_key_u32 and hits an assert failure).
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:41 -04:00
Sage Weil
4c9e37de8a
os/newstore: fix race in _txc_aio_submit
...
We cannot rely on the iterator pointers being valid after we submit the
aio because we are racing with the completion. Make our loop decision
before submitting and avoid dereferencing txc after that point.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
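The pattern described in this commit can be illustrated with a minimal sketch. It is not the NewStore code: `Aio` and `submit_all` are hypothetical names, and the sketch assumes the owning structure can only be torn down once the final aio has been submitted. The point it shows is that the "is this the last one?" decision is taken before each submit, so nothing is dereferenced afterwards.

```cpp
#include <cassert>
#include <functional>
#include <list>

// Hypothetical stand-in for one queued aio; not a real Ceph type.
struct Aio {
  int id;
};

// Submit every aio in 'pending'. The submit callback may run the completion
// path, which can free or modify 'pending', so the loop decision is made
// before each submit and the list is not touched after the final one.
void submit_all(std::list<Aio>& pending,
                const std::function<void(Aio&)>& submit) {
  assert(submit);  // an empty callback would throw when invoked
  auto cur = pending.begin();
  const auto end = pending.end();
  while (cur != end) {
    Aio& aio = *cur;
    ++cur;
    const bool last = (cur == end);  // decide while the iterators are valid
    submit(aio);                     // completion may race with us from here
    if (last) {
      return;                        // do not read 'pending' or 'cur' again
    }
  }
}
```

A callback that clears the list while handling the final aio is safe here, because the loop exits on the precomputed flag rather than by comparing iterators again.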
Xiaoxi Chen
117330045f
os/newstore: no need to call fdatasync when using O_DIRECT
...
Skip ::fdatasync if in direct mode.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
c552cd20ab
osd/NewStore: fix for skipping the overlay in _do_overlay_trim
...
When the offset of the write starts at the end of the overlay, that is,
p->first + p->second.length == offset, the overlay could be skipped as
well.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
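As a rough illustration of the boundary case in this commit, the skip test can be written as a predicate. This is a sketch under assumptions: `overlay_before_write` is a hypothetical helper, not a function in NewStore, and it presumes the earlier check used a strict comparison that missed the case where the overlay ends exactly at the write offset.

```cpp
#include <cassert>
#include <cstdint>

// True if the overlay [ov_off, ov_off + ov_len) ends at or before the start
// of the write, in which case a trim pass does not need to touch it.
// The equality case (overlay end == write offset) is included.
bool overlay_before_write(uint64_t ov_off, uint64_t ov_len,
                          uint64_t write_off) {
  assert(ov_off + ov_len >= ov_off);  // assumes the range does not wrap
  return ov_off + ov_len <= write_off;
}
```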
Zhiqiang Wang
793dcc396c
os/NewStore: combine contiguous overlays when writing all the overlays
...
Combine contiguous overlay writes to reduce the number of WAL writes
and fs writes.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
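The idea of combining contiguous overlays can be sketched as follows. This is an illustration only: `Write` and `combine_contiguous` are hypothetical names, overlays are modelled as a map from offset to payload, and the sketch assumes overlays never overlap each other.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one write to issue: bytes at a logical offset.
struct Write {
  uint64_t offset;
  std::string data;
};

// Merge overlays whose byte ranges touch end-to-start, so that each merged
// run can be issued as a single write. Keys are offsets, values are payloads.
std::vector<Write> combine_contiguous(
    const std::map<uint64_t, std::string>& overlays) {
  std::vector<Write> out;
  for (const auto& kv : overlays) {
    const uint64_t off = kv.first;
    const std::string& data = kv.second;
    // Assumption of this sketch: overlays are disjoint.
    assert(out.empty() || out.back().offset + out.back().data.size() <= off);
    if (!out.empty() && out.back().offset + out.back().data.size() == off) {
      out.back().data += data;     // contiguous: extend the previous write
    } else {
      out.push_back({off, data});  // gap: start a new write
    }
  }
  return out;
}
```

Note that the length of a merged write is the sum of the lengths of the overlays folded into it, which is the quantity a caller would need to account for when sizing the resulting operation.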
Xiaoxi Chen
29ba720885
os/Newstore: batch cleanup
...
Batch the wal cleanup.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:40 -04:00
Sage Weil
4eca15a950
os/newstore: fix _txc_aio_submit
...
The aios may complete before _txc_aio_submit completes. In fact, the aio
may complete, commit to the kv store, and then queue more wal aio's before
we finish the loop. Move aios to a separate list to ensure we only submit
them once and do not fight another CPU adjusting the list.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Sage Weil
41886c5420
os/newstore: throttle over entire write lifecycle
...
Take a global throttle when we submit ops and release when they complete.
The first throttles cover the period from submit to commit, while the wal
ones also cover the async post-commit wal work. The configs are additive
since the wal ones cover both periods; this should make them reasonably
idiot-proof.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
b1136fbd33
os/NewStore: data_map shouldn't be empty when writing all overlays
...
This should be an assert instead of creating a new data_map.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
a165fe81c5
os/NewStore: clear the shared_overlays after writing all the overlays
...
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
dffa43051a
os/NewStore: don't clear overlay in the create/append case of write
...
Shouldn't clear the overlay in the create/append case of write.
Otherwise, this removes the overlay data and leads to data loss.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Sage Weil
f9f9e1b105
os/newstore: debug io_submit EAGAIN
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Sage Weil
dd79b4d832
os/newstore: release wal throttle when wal completes, not when queued
...
If we take the aio path, the io is queued immediately and the resources
are released back to the pool. Instead release them when wal completes.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
715fd3b7a2
os/newstore: todo
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
3b66712598
os/newstore: move toward state-machine
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
2317e446c5
os/newstore: use aio for wal writes, too
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
e580a82729
os/newstore: a few comments about wal
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
5d8e14653d
os/newstore: combine O_DSYNC with O_DIRECT
...
This avoids the need for an explicit fdatasync when doing O_DIRECT.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
b7a53b5874
os/newstore: basic aio support
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
ba0d8d7fdd
os/Newstore: add newstore_db_path option
...
The load on the key/value DB is heavy; allow the user to put the
DB on a separate (fast) device.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:39 -04:00
Sage Weil
143d48570f
os/newstore: throttle wal work
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
efe218b4aa
os/newstore: show # o_direct buffers in debug output
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
7e1af1e616
os/newstore: use a threadpool for applying wal events
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
dfd389e66a
os/newstore: rebuild buffers to be page-aligned for O_DIRECT
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
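For readers unfamiliar with the alignment requirement, here is a minimal POSIX sketch of copying data into a page-aligned buffer. It does not use Ceph's bufferlist; `copy_page_aligned` and `is_page_aligned` are hypothetical helpers. It covers only the memory address: O_DIRECT typically also expects the file offset and length to be suitably aligned, which this sketch does not check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdlib.h>  // posix_memalign, free
#include <unistd.h>  // sysconf

// True if the address is a multiple of the system page size.
bool is_page_aligned(const void* p) {
  const long page = sysconf(_SC_PAGESIZE);
  return page > 0 &&
         reinterpret_cast<std::uintptr_t>(p) %
                 static_cast<std::uintptr_t>(page) == 0;
}

// Copy 'len' bytes into a newly allocated page-aligned buffer.
// Returns nullptr on failure; the caller releases the result with free().
void* copy_page_aligned(const void* src, size_t len) {
  assert(src != nullptr && len > 0);
  const long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) {
    return nullptr;
  }
  void* dst = nullptr;
  if (posix_memalign(&dst, static_cast<size_t>(page), len) != 0) {
    return nullptr;
  }
  std::memcpy(dst, src, len);
  return dst;
}
```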
Sage Weil
552d95213b
ceph_test_objectstore: fix omap test cleanup
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
04f55d8d18
os/newstore: use fdatasync instead of fsync
...
On XFS at least, fdatasync is sufficient to make data readable.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
1321b880cc
os/newstore: update todo
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
65877832f8
os/Newstore: Check onode.omap_head in valid() and next()
...
The db iter will be set to KeyValueDB::Iterator() if onode.omap_head is
not present. In that case, touching the db iter causes a segmentation
fault.
Avoid touching the db iter when onode.omap_head is invalid (equal to 0).
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
1a97fd6cb7
Use .str() to output a stringstream.
...
a nit.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
9d0e925566
os/Newstore: Allow gap in _do_write append mode
...
We can allow some gap so we only need to ensure
onode.size <= offset.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
5e9c64b4dd
Implement get_omap_iterator
...
implemented get_omap_iterator
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
c86410239b
os/KeyValueDB: Add raw_key() interface for IteratorImpl
...
raw_key() is useful to split out the prefix.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
b595aac4e1
test/store_test Add get_omap_iterator test cases
...
omap iterator test cases include:
iter against omap
lower_bound
upper_bound
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Sage Weil
ca9bc6327d
os/newstore: drop sync()
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
d57547f103
os/newstore: drop sync()
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
205344d32d
os/newstore: drop flush
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f93856f71a
os/newstore: drop sync_and_flush
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
28bc4ee76e
os/newstore: use FS::zero()
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
c67c9a2bee
os/newstore: use O_DIRECT if write is page-aligned
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
5539a75efb
os/newstore: pass flags to _{open,create}_fid
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
48f639beec
os/newstore: drop unused FragmentHandle
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
93fa4f1e30
os/newstore: do not call completions from kv thread
...
Reads may call wait_wal() holding user locks, and so we cannot block
progress on WAL completion/flushing by calling callbacks that may take
user locks.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
86a3f7dd51
os/newstore: let wal cleanup kv txn get batched
...
No need to trigger another sync kv commit here; just let the next KV
commit catch it.
We could possibly do a bit better here by not waking up the kv thread at
all...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
ec21f578a7
os/newstore: fix off-by-one on overlay_max_length
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f9a7fd4e4c
os/newstore: use lower_bound for finding overlay extents in map
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
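The commit gives no detail beyond its subject, so the following is only a guess at the general technique: locating the first extent in an offset-keyed `std::map` that can intersect a given offset, using `lower_bound` and a single step back. `Overlay`, `OverlayMap`, and `first_candidate` are hypothetical names, and overlays are assumed not to overlap each other.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical overlay record keyed by its starting offset in the map.
struct Overlay {
  uint64_t length;
};
using OverlayMap = std::map<uint64_t, Overlay>;

// Return the first overlay whose byte range ends after 'offset', i.e. the
// first one that could intersect a range starting at 'offset'.
OverlayMap::const_iterator first_candidate(const OverlayMap& m,
                                           uint64_t offset) {
  auto p = m.lower_bound(offset);  // first overlay starting at or after offset
  if (p != m.begin()) {
    auto prev = std::prev(p);
    assert(prev->first < offset);  // follows from the lower_bound contract
    if (prev->first + prev->second.length > offset) {
      return prev;                 // the preceding overlay spans 'offset'
    }
  }
  return p;
}
```

Compared with scanning from `begin()`, this costs a logarithmic lookup plus one step, regardless of how many overlays precede the offset.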
Sage Weil
66aae98277
os/newstore: use overlay even if it is a new object or append
...
This avoids the fsync for small writes.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
0981428123
os/Newstore: Change assert in get_onode
...
db->get will return a negative value when the key is not found.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00