Xiaoxi Chen
29ba720885
os/Newstore: batch cleanup
Batch the wal cleanup.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:40 -04:00
Sage Weil
4eca15a950
os/newstore: fix _txc_aio_submit
The aios may complete before _txc_aio_submit completes. In fact, the aio
may complete, commit to the kv store, and then queue more wal aio's before
we finish the loop. Move aios to a separate list to ensure we only submit
them once and do not race with another CPU that is adjusting the list.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
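The fix above can be sketched in C++ as follows. TxcSketch, PendingAio and submit_pending are hypothetical stand-ins, not NewStore's real types. The shared pending list is spliced into a local list under the lock, and only that local list is submitted. Any aios queued by a racing completion therefore land on the shared list and wait for the next pass.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical stand-in for one queued aio; not NewStore's real type.
struct PendingAio {
  int id;
};

// Hypothetical transaction context whose pending_aios list may be appended
// to by another thread (e.g. a completion queueing more wal aios).
struct TxcSketch {
  std::mutex lock;
  std::list<PendingAio> pending_aios;
};

// Submit each pending aio exactly once. The shared list is spliced into a
// local list under the lock; anything queued after that lands on the now
// empty shared list and is left for the next call.
template <typename SubmitFn>
std::size_t submit_pending(TxcSketch& txc, SubmitFn submit) {
  std::list<PendingAio> to_submit;
  {
    std::lock_guard<std::mutex> l(txc.lock);
    to_submit.splice(to_submit.end(), txc.pending_aios);
  }
  std::size_t n = 0;
  for (PendingAio& aio : to_submit) {
    submit(aio);  // in real code this would hand the aio to the kernel
    ++n;
  }
  return n;
}
```

Splicing a whole std::list moves its nodes in constant time without copying, so the critical section stays short.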
Sage Weil
41886c5420
os/newstore: throttle over entire write lifecycle
Take a global throttle when we submit ops and release when they complete.
The first throttles cover the period from submit to commit, while the wal
ones also cover the async post-commit wal work. The configs are additive
since the wal ones cover both periods; this should make them reasonably
idiot-proof.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
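A sketch of the two-stage accounting described above. It uses a hypothetical ByteThrottle class rather than Ceph's Throttle. Every op takes both throttles at submit. The commit throttle is returned at kv commit. The wal throttle is held until the post-commit wal work finishes, or is returned at commit if there is no wal work. This sketch reads "additive" as "the wal limit is the commit limit plus the wal setting"; that reading is an assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Minimal byte-count throttle (a sketch, not Ceph's Throttle class).
class ByteThrottle {
 public:
  explicit ByteThrottle(uint64_t max) : max_(max) {}
  // Block until `n` more bytes fit under the limit. A request larger than
  // the whole limit is admitted once nothing else is outstanding, so it
  // cannot wait forever.
  void get(uint64_t n) {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [&] { return cur_ + n <= max_ || cur_ == 0; });
    cur_ += n;
  }
  void put(uint64_t n) {
    std::lock_guard<std::mutex> l(m_);
    assert(cur_ >= n);
    cur_ -= n;
    cv_.notify_all();
  }
  uint64_t current() {
    std::lock_guard<std::mutex> l(m_);
    return cur_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  const uint64_t max_;
  uint64_t cur_ = 0;
};

// Hypothetical lifecycle wiring. The wal limit is commit_max + wal_max, so
// when there is no wal backlog the commit throttle is the one that binds.
struct WriteThrottles {
  ByteThrottle commit;
  ByteThrottle wal;
  WriteThrottles(uint64_t commit_max, uint64_t wal_max)
      : commit(commit_max), wal(commit_max + wal_max) {}
  void on_submit(uint64_t bytes) {
    commit.get(bytes);
    wal.get(bytes);
  }
  void on_commit(uint64_t bytes, bool has_wal) {
    commit.put(bytes);
    if (!has_wal) wal.put(bytes);  // nothing left to do after commit
  }
  void on_wal_done(uint64_t bytes) { wal.put(bytes); }
};
```

Releasing the wal throttle only in on_wal_done, not when the wal io is queued, also keeps aio-backed wal writes counted until they actually finish.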
Zhiqiang Wang
b1136fbd33
os/NewStore: data_map shouldn't be empty when writing all overlays
This should be an assert instead of creating a new data_map.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
a165fe81c5
os/NewStore: clear the shared_overlays after writing all the overlays
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
dffa43051a
os/NewStore: don't clear overlay in the create/append case of write
Don't clear the overlay in the create/append case of write; doing so
removes the overlay data and leads to data loss.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Sage Weil
f9f9e1b105
os/newstore: debug io_submit EAGAIN
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Sage Weil
dd79b4d832
os/newstore: release wal throttle when wal completes, not when queued
If we take the aio path, the io is queued immediately and the resources
are released back to the pool before the write completes. Instead, release
them when the wal completes.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
715fd3b7a2
os/newstore: todo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
3b66712598
os/newstore: move toward state-machine
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
2317e446c5
os/newstore: use aio for wal writes, too
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
e580a82729
os/newstore: a few comments about wal
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
5d8e14653d
os/newstore: combine O_DSYNC with O_DIRECT
This avoids the need for an explicit fdatasync when doing O_DIRECT.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
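A sketch of the flag choice, assuming Linux (O_DIRECT is a Linux extension, not POSIX). data_open_flags and needs_fdatasync are hypothetical helpers. With O_DSYNC set, each write returns only once its data is durable, so no separate fdatasync call is needed.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes O_DIRECT from glibc's <fcntl.h>
#endif
#include <cassert>
#include <fcntl.h>

// Hypothetical helper: open(2) flags for a data file write. O_DIRECT alone
// bypasses the page cache but gives no durability guarantee, so the caller
// would still need fdatasync(). O_DSYNC makes each write(2) return only
// after the data, and the metadata needed to read it back, is on stable
// storage.
int data_open_flags(bool direct) {
  int flags = O_WRONLY;
  if (direct)
    flags |= O_DIRECT | O_DSYNC;
  return flags;
}

// Buffered (non-O_DSYNC) writes still need an explicit fdatasync() later.
bool needs_fdatasync(int flags) { return (flags & O_DSYNC) == 0; }
```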
Sage Weil
b7a53b5874
os/newstore: basic aio support
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
ba0d8d7fdd
os/Newstore: add newstore_db_path option
The load on the key/value DB is heavy; allow the user to put the
DB on a separate (fast) device.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:39 -04:00
Sage Weil
143d48570f
os/newstore: throttle wal work
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
efe218b4aa
os/newstore: show # o_direct buffers in debug output
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
7e1af1e616
os/newstore: use a threadpool for applying wal events
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
dfd389e66a
os/newstore: rebuild buffers to be page-aligned for O_DIRECT
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
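O_DIRECT requires the user buffer's address to be suitably aligned. Below is a sketch of copying an arbitrary buffer into a page-aligned one with posix_memalign. rebuild_page_aligned and AlignedBuf are hypothetical free functions and types written against the C library, not Ceph's bufferlist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <unistd.h>

// Frees memory obtained from posix_memalign().
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};
using AlignedBuf = std::unique_ptr<char, FreeDeleter>;

// Hypothetical helper: copy `len` bytes into a new buffer whose start
// address is a multiple of the page size. Returns a null pointer if the
// allocation fails.
AlignedBuf rebuild_page_aligned(const char* src, std::size_t len) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* p = nullptr;
  // The page size is a power of two and a multiple of sizeof(void*), as
  // posix_memalign() requires.
  if (posix_memalign(&p, page, len > 0 ? len : 1) != 0)
    return AlignedBuf();
  if (len > 0)
    std::memcpy(p, src, len);
  return AlignedBuf(static_cast<char*>(p));
}
```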
Sage Weil
552d95213b
ceph_test_objectstore: fix omap test cleanup
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
04f55d8d18
os/newstore: use fdatasync instead of fsync
On XFS at least, fdatasync is sufficient to make data readable.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
1321b880cc
os/newstore: update todo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
65877832f8
os/Newstore: Check onode.omap_head in valid() and next()
The db iter is set to KeyValueDB::Iterator() if onode.omap_head is
not present. In that case, touching the db iter causes a segmentation
fault.
Avoid touching the db iter when onode.omap_head is invalid (equal to 0).
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
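A sketch of the guard. A std::map stands in for the key/value store, and OmapIteratorSketch is a hypothetical class. valid() and next() check omap_head before touching the underlying iterator, which is never set up when omap_head is 0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the object's omap rows in the key/value store.
using Omap = std::map<std::string, std::string>;

// Hypothetical iterator wrapper. When omap_head is 0 the object has no omap,
// the underlying iterator is never set up, and valid()/next() must not
// touch it.
class OmapIteratorSketch {
 public:
  OmapIteratorSketch(uint64_t omap_head, const Omap* db)
      : omap_head_(omap_head), db_(omap_head ? db : nullptr) {
    if (db_) it_ = db_->begin();
  }
  bool valid() const {
    // Check omap_head first so a missing iterator is never dereferenced.
    return omap_head_ != 0 && db_ != nullptr && it_ != db_->end();
  }
  void next() {
    if (valid()) ++it_;
  }
  const std::string& key() const { return it_->first; }

 private:
  uint64_t omap_head_;
  const Omap* db_;
  Omap::const_iterator it_;
};
```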
Xiaoxi Chen
1a97fd6cb7
Use .str() to output a stringstream.
a nit.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
9d0e925566
os/Newstore: Allow gap in _do_write append mode
We can allow some gap so we only need to ensure
onode.size <= offset.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
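A sketch of the relaxed condition using hypothetical helper names. The strict variant is an assumption about what the check looked like before this change.

```cpp
#include <cassert>
#include <cstdint>

// Assumed earlier rule: only a write at exactly the current end of the
// object counts as an append.
bool is_append_strict(uint64_t onode_size, uint64_t offset) {
  return offset == onode_size;
}

// Relaxed rule: any write at or past the end counts. The bytes between
// onode_size and offset become a hole, which reads back as zeros in a
// regular file.
bool is_append_with_gap(uint64_t onode_size, uint64_t offset) {
  return onode_size <= offset;
}
```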
Xiaoxi Chen
5e9c64b4dd
Implement get_omap_iterator
Implement get_omap_iterator.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
c86410239b
os/KeyValueDB: Add raw_key() interface for IteratorImpl
raw_key() is useful to split out the prefix.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
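A sketch of what splitting a raw key into prefix and key involves. It assumes the backend stores keys as prefix + '\0' + key; that on-disk layout is an assumption. split_raw_key is a hypothetical helper, not the raw_key() signature.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split a stored key of the assumed form
// prefix + '\0' + key into (prefix, key). If there is no separator the
// whole string is returned as the key with an empty prefix.
std::pair<std::string, std::string> split_raw_key(const std::string& raw) {
  const std::string::size_type sep = raw.find('\0');
  if (sep == std::string::npos)
    return {std::string(), raw};
  return {raw.substr(0, sep), raw.substr(sep + 1)};
}
```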
Xiaoxi Chen
b595aac4e1
test/store_test: Add get_omap_iterator test cases
The omap iterator test cases cover:
- iterating over the omap
- lower_bound
- upper_bound
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Sage Weil
ca9bc6327d
os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
d57547f103
os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
205344d32d
os/newstore: drop flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f93856f71a
os/newstore: drop sync_and_flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
28bc4ee76e
os/newstore: use FS::zero()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
c67c9a2bee
os/newstore: use O_DIRECT if write is page-aligned
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
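A sketch of the alignment test, assuming a fixed 4 KiB page size and hypothetical helper names. Only the file offset and length are checked; the memory buffer must be aligned too. Linux generally requires alignment only to the filesystem or device block size, so page alignment is a conservative choice.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // assumption: 4 KiB pages

// True if v is a multiple of kPageSize (which must be a power of two).
constexpr bool is_page_aligned(uint64_t v) {
  return (v & (kPageSize - 1)) == 0;
}

// Hypothetical check: take the O_DIRECT path only when both the file offset
// and the length are page-aligned; otherwise use buffered I/O. The memory
// buffer must be aligned as well, which this helper does not check.
constexpr bool can_use_o_direct(uint64_t offset, uint64_t length) {
  return length > 0 && is_page_aligned(offset) && is_page_aligned(length);
}
```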
Sage Weil
5539a75efb
os/newstore: pass flags to _{open,create}_fid
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
48f639beec
os/newstore: drop unused FragmentHandle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
93fa4f1e30
os/newstore: do not call completions from kv thread
Reads may call wait_wal() holding user locks, and so we cannot block
progress on WAL completion/flushing by calling callbacks that may take
user locks.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
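A sketch of moving completions off the kv thread, using a hypothetical CompletionThread class built on std::thread. The kv thread only enqueues callbacks. A callback that blocks on a user lock can then delay later callbacks, but never the kv commit loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical completion thread: the kv commit thread only enqueues
// callbacks; this thread runs them.
class CompletionThread {
 public:
  CompletionThread() : worker_([this] { run(); }) {}
  // Runs every callback still queued, then joins the worker.
  ~CompletionThread() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }
  // Called from the kv thread; never runs the callback inline.
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(m_);
      q_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and queue drained
      std::function<void()> fn = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      fn();  // run without holding our own lock
      l.lock();
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;  // declared last: starts after the members it uses
};
```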
Sage Weil
86a3f7dd51
os/newstore: let wal cleanup kv txn get batched
No need to trigger another sync kv commit here; just let the next KV
commit catch it.
We could possibly do a bit better here by not waking up the kv thread at
all...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
ec21f578a7
os/newstore: fix off-by-one on overlay_max_length
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f9a7fd4e4c
os/newstore: use lower_bound for finding overlay extents in map
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
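A sketch of the lookup. It assumes a hypothetical OverlayMap from extent start offset to length, with extents that do not overlap one another. lower_bound finds the first extent at or after the write offset. The extent before it is checked separately, because it may start earlier and still extend into the range. The cost is O(log n + k) instead of a full scan.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical overlay index: extent start offset -> extent length.
// Extents are assumed not to overlap one another.
using OverlayMap = std::map<uint64_t, uint64_t>;

// Start offsets of all extents intersecting [off, off + len), in order.
std::vector<uint64_t> find_overlapping(const OverlayMap& m, uint64_t off,
                                       uint64_t len) {
  std::vector<uint64_t> out;
  if (len == 0) return out;
  const uint64_t end = off + len;
  // First extent starting at or after `off`.
  auto p = m.lower_bound(off);
  // The extent before it starts earlier but may still reach past `off`.
  if (p != m.begin()) {
    auto prev = std::prev(p);
    if (prev->first + prev->second > off) out.push_back(prev->first);
  }
  // Every extent starting before `end` intersects the range.
  for (; p != m.end() && p->first < end; ++p) out.push_back(p->first);
  return out;
}
```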
Sage Weil
66aae98277
os/newstore: use overlay even if it is a new object or append
This avoids the fsync for small writes.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
0981428123
os/Newstore: Change assert in get_onode
db->get returns a negative value when the key is not found.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
97bda73ebf
os/newstore: open by handle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
8f2c2bff30
os/newstore: use fs abstraction layer
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
ef420baf1c
os/newstore: cap fid_max below newstore_max_dir_size
Prevent fid_max from exceeding max_dir_size during preallocation.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
59cd761bca
os/newstore: keep smallish overlay extents in kv db
If we have a small overwrite, keep the extent in the key/value database.
Only write it back to the file/fragment later, and when we do, write them
all at once.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
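A sketch of the policy using hypothetical names. Writes no longer than overlay_max_length are recorded as overlays; a std::map stands in for the kv db. flush_overlays writes them all back in one pass and then clears them. The sketch assumes overlays do not overlap one another. Treating the length limit as inclusive is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-object state: offset -> pending overlay data.
struct ObjectSketch {
  std::map<uint64_t, std::string> overlays;
};

// Absorb a small write as an overlay; return false if it is too large and
// must take the normal file write path.
bool try_overlay_write(ObjectSketch& o, uint64_t off, const std::string& data,
                       uint64_t overlay_max_length) {
  if (data.size() > overlay_max_length) return false;
  o.overlays[off] = data;  // same-offset rewrite replaces the older overlay
  return true;
}

// Write every pending overlay back in one pass, then drop them all.
template <typename WriteFn>
std::size_t flush_overlays(ObjectSketch& o, WriteFn write_to_file) {
  const std::size_t n = o.overlays.size();
  for (const auto& e : o.overlays) write_to_file(e.first, e.second);
  o.overlays.clear();
  return n;
}
```

Batching the write-back means one file write (and one sync) can retire many small overwrites.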
Sage Weil
2af1e37d7d
os/newstore: assign a unique nid to each new object
Use this as the key for omap (omap_head), but keep the omap_head field
so that we can tell when no omap data is present.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
713c69884e
os/newstore: consolidate collection_list to a single implementation
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
a4d2a53cf6
Clear removed_collections after reap
The previous code forgot to clear the removed_collections queue
after reaping the collections in _reap_collection.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
d8351a8d9e
os/newstore: ref count OpSequencer
Our OpSequencer may live longer than the ObjectStore::Sequencer interface
object does.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
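A sketch of the lifetime problem. It uses std::shared_ptr as a stand-in for whatever reference counting the real code uses, and all type names are hypothetical. The public handle and each in-flight transaction hold a reference. The internal sequencer therefore outlives the handle until the last transaction drops it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical internal sequencer that orders a collection's transactions.
struct OpSequencerSketch {};

// Hypothetical public handle (playing the role of ObjectStore::Sequencer).
struct SequencerHandle {
  std::shared_ptr<OpSequencerSketch> osr =
      std::make_shared<OpSequencerSketch>();
};

// Hypothetical in-flight transaction: holding a reference keeps the
// sequencer alive even after the public handle is destroyed.
struct TxcRef {
  std::shared_ptr<OpSequencerSketch> osr;
};
```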