Commit Graph

44184 Commits

Author SHA1 Message Date
Sage Weil
41886c5420 os/newstore: throttle over entire write lifecycle
Take a global throttle when we submit ops and release when they complete.
The first throttles cover the period from submit to commit, while the wal
ones also cover the async post-commit wal work.  The configs are additive
since the wal ones cover both periods; this should make them reasonably
idiot-proof.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
b1136fbd33 os/NewStore: data_map shouldn't be empty when writing all overlays
This should be an assert instead of creating a new data_map.

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
a165fe81c5 os/NewStore: clear the shared_overlays after writing all the overlays
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Zhiqiang Wang
dffa43051a os/NewStore: don't clear overlay in the create/append case of write
Shouldn't clear the overlay in the create/append case of write.
Otherwise, this removes the overlay data and leads to data loss.

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-09-01 13:39:40 -04:00
Sage Weil
f9f9e1b105 os/newstore: debug io_submit EAGAIN
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:40 -04:00
Sage Weil
dd79b4d832 os/newstore: release wal throttle when wal completes, not when queued
If we take the aio path, the io is queued immediately and the resources
are released back to the pool.  Instead release them when wal completes.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
715fd3b7a2 os/newstore: todo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
3b66712598 os/newstore: move toward state-machine
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
2317e446c5 os/newstore: use aio for wal writes, too
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
e580a82729 os/newstore: a few comments about wal
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
5d8e14653d os/newstore: combined O_DSYNC with O_DIRECT
This avoids the need for an explicit fdatasync when doing O_DIRECT.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
b7a53b5874 os/newstore: basic aio support
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
ba0d8d7fdd os/Newstore: add newstore_db_path option
The load on the key/value DB is heavy; allow the user to put the
DB on a separate (fast) device.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:39 -04:00
Sage Weil
143d48570f os/newstore: throttle wal work
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:39 -04:00
Sage Weil
efe218b4aa os/newstore: show # o_direct buffers in debug output
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
7e1af1e616 os/newstore: use a threadpool for applying wal events
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
dfd389e66a os/newstore: rebuild buffers to be page-aligned for O_DIRECT
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
552d95213b ceph_test_objectstore: fix omap test cleanup
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
04f55d8d18 os/newstore: use fdatasync instead of fsync
On XFS at least, fdatasync is sufficient to make data readable.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Sage Weil
1321b880cc os/newstore: update todo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
65877832f8 os/Newstore: Check onode.omap_head in valid() and next()
The db iter will be set to KeyValueDB::Iterator() if onode.omap_head is
not present. In that case, if we touch the db iter we will get a segmentation
fault.

Avoid touching the db iter when onode.omap_head is invalid (equal to 0).

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
1a97fd6cb7 Use .str() to output a stringstream.
a nit.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
9d0e925566 os/Newstore: Allow gap in _do_write append mode
We can allow some gap so we only need to ensure
onode.size <= offset.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
5e9c64b4dd Implement get_omap_iterator
implemented get_omap_iterator

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
c86410239b os/KeyValueDB: Add raw_key() interface for IteratorImpl
raw_key() is useful to split out the prefix.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
b595aac4e1 test/store_test Add get_omap_iterator test cases
omap iterator test cases include:
  iter against omap
  lower_bound
  upper_bound

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Sage Weil
ca9bc6327d os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
d57547f103 os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
205344d32d os/newstore: drop flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f93856f71a os/newstore: drop sync_and_flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
28bc4ee76e os/newstore: use FS::zero()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
c67c9a2bee os/newstore: use O_DIRECT if write is page-aligned
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
5539a75efb os/newstore: pass flags to _{open,create}_fid
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
48f639beec os/newstore: drop unused FragmentHandle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
93fa4f1e30 os/newstore: do not call completions from kv thread
Reads may call wait_wal() holding user locks, and so we cannot block
progress on WAL completion/flushing by calling callbacks that may take
user locks.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
86a3f7dd51 os/newstore: let wal cleanup kv txn get batched
No need to trigger another sync kv commit here; just let the next KV
commit catch it.

We could possibly do a bit better here by not waking up the kv thread at
all...

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
ec21f578a7 os/newstore: fix off-by-one on overlay_max_length
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f9a7fd4e4c os/newstore: use lower_bound for finding overlay extents in map
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
66aae98277 os/newstore: use overlay even if it is a new object or append
This avoids the fsync for small writes.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
0981428123 os/Newstore:Change assert in get_onode
db->get will return a negative value when the key is not found.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
97bda73ebf os/newstore: open by handle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
8f2c2bff30 os/newstore: use fs abstraction layer
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
ef420baf1c os/newstore: cap fid_max below newstore_max_dir_size
Prevent fid_max from exceeding max_dir_size during preallocation.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
59cd761bca os/newstore: keep smallish overlay extents in kv db
If we have a small overwrite, keep the extent in the key/value database.
Only write it back to the file/fragment later, and when we do, write them
all at once.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
2af1e37d7d os/newstore: assigned unique nid to each new object
Use this as the key for omap (omap_head), but keep the omap_head field
so that we can tell when no omap data is present.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
713c69884e os/newstore: consolidate collection_list to a single implementation
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
a4d2a53cf6 Clear removed_collections after reap
The previous code forgot to clear the removed_collections queues
after reaping the collections in _reap_collection.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
d8351a8d9e os/newstore: ref count OpSequencer
Our OpSequencer may live longer than the ObjectStore::Sequencer interface
object does.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
fbf3d5528f os/newstore: send complete overwrite to a new fid
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
db87e423b6 os/newstore: clone omap
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00