Commit Graph

44417 Commits

Author SHA1 Message Date
Xiaoxi Chen
65877832f8 os/Newstore: Check onode.omap_head in valid() and next()
The db iter will be set to KeyValueDB::Iterator() if onode.omap_head is
not present. In that case, touching the db iter causes a segmentation
fault.

Avoid touching the db iter when onode.omap_head is invalid (equal to 0).

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
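The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the NewStore code: KvIter and OmapIter are hypothetical stand-ins, the -1 return value is an arbitrary error code chosen for the sketch, and a std::map plays the role of the key/value database.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a key/value iterator; not the Ceph KeyValueDB API.
// The map passed in must outlive the iterator.
class KvIter {
 public:
  explicit KvIter(const std::map<std::string, std::string>& kv)
      : it_(kv.begin()), end_(kv.end()) {}
  bool valid() const { return it_ != end_; }
  void next() { ++it_; }
  const std::string& key() const { return it_->first; }

 private:
  std::map<std::string, std::string>::const_iterator it_;
  std::map<std::string, std::string>::const_iterator end_;
};

// Hypothetical omap iterator: a backing iterator exists only when the
// object has omap data (omap_head != 0).
class OmapIter {
 public:
  OmapIter(std::uint64_t omap_head,
           const std::map<std::string, std::string>& kv)
      : omap_head_(omap_head) {
    if (omap_head_ != 0) it_ = std::make_shared<KvIter>(kv);
  }

  // Check omap_head first so the empty backing iterator is never touched.
  bool valid() const { return omap_head_ != 0 && it_->valid(); }

  // Returns 0 on success, -1 (arbitrary for this sketch) if there is
  // nothing to advance.
  int next() {
    if (!valid()) return -1;
    it_->next();
    return 0;
  }

 private:
  std::uint64_t omap_head_;
  std::shared_ptr<KvIter> it_;
};
```

With omap_head equal to 0, both valid() and next() return early instead of dereferencing an empty pointer, which is the failure mode the commit message describes.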
Xiaoxi Chen
1a97fd6cb7 Use .str() to output a stringstream.
a nit.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
9d0e925566 os/Newstore: Allow gap in _do_write append mode
We can allow some gap so we only need to ensure
onode.size <= offset.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
5e9c64b4dd Implement get_omap_iterator
implemented get_omap_iterator

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
c86410239b os/KeyValueDB: Add raw_key() interface for IteratorImpl
raw_key() is useful to split out the prefix.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Xiaoxi Chen
b595aac4e1 test/store_test: Add get_omap_iterator test cases
omap iterator test cases include:
  iterate against omap
  lower_bound
  upper_bound

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:38 -04:00
Sage Weil
ca9bc6327d os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
d57547f103 os/newstore: drop sync()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
205344d32d os/newstore: drop flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f93856f71a os/newstore: drop sync_and_flush
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
28bc4ee76e os/newstore: use FS::zero()
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
c67c9a2bee os/newstore: use O_DIRECT if write is page-aligned
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
5539a75efb os/newstore: pass flags to _{open,create}_fid
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
48f639beec os/newstore: drop unused FragmentHandle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
93fa4f1e30 os/newstore: do not call completions from kv thread
Reads may call wait_wal() holding user locks, and so we cannot block
progress on WAL completion/flushing by calling callbacks that may take
user locks.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
86a3f7dd51 os/newstore: let wal cleanup kv txn get batched
No need to trigger another sync kv commit here; just let the next KV
commit catch it.

We could possibly do a bit better here by not waking up the kv thread at
all...

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
ec21f578a7 os/newstore: fix off-by-one on overlay_max_length
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:37 -04:00
Sage Weil
f9a7fd4e4c os/newstore: use lower_bound for finding overlay extents in map
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
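The commit carries no description, so the following only illustrates the general technique of using lower_bound on an offset-keyed map to find extents overlapping a range. Extent and find_overlapping are hypothetical names, and the sketch assumes extents do not overlap each other and have non-zero length.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical extent record, keyed in the map by its starting offset.
struct Extent {
  std::uint64_t length;
};

// Return the start offsets of all extents overlapping [offset, offset+length).
std::vector<std::uint64_t> find_overlapping(
    const std::map<std::uint64_t, Extent>& extents,
    std::uint64_t offset, std::uint64_t length) {
  std::vector<std::uint64_t> hits;
  if (length == 0) return hits;

  // First extent that starts at or after the requested offset.
  auto it = extents.lower_bound(offset);

  // The preceding extent starts earlier but may still reach past offset.
  if (it != extents.begin()) {
    auto prev = std::prev(it);
    if (prev->first + prev->second.length > offset) it = prev;
  }

  const std::uint64_t end = offset + length;
  for (; it != extents.end() && it->first < end; ++it) {
    hits.push_back(it->first);
  }
  return hits;
}
```

Starting from lower_bound and stepping back at most one entry avoids scanning the map from the beginning, so the lookup cost is logarithmic in the number of extents plus the number of hits.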
Sage Weil
66aae98277 os/newstore: use overlay even if it is a new object or append
This avoids the fsync for small writes.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
0981428123 os/Newstore: Change assert in get_onode
db->get will return a negative value when the key is not found.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
97bda73ebf os/newstore: open by handle
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
8f2c2bff30 os/newstore: use fs abstraction layer
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
ef420baf1c os/newstore: cap fid_max below newstore_max_dir_size
Prevent fid_max from exceeding max_dir_size during preallocation.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
59cd761bca os/newstore: keep smallish overlay extents in kv db
If we have a small overwrite, keep the extent in the key/value database.
Only write it back to the file/fragment later, and when we do, write them
all at once.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
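The idea in this commit can be sketched as a toy model. Everything here is an assumption made for illustration: OverlayObject, its size threshold, and the in-memory string standing in for the file are hypothetical, and a vector of pending writes plays the role of the key/value database entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object model: file contents plus small pending overwrites.
class OverlayObject {
 public:
  explicit OverlayObject(std::size_t max_overlay_len)
      : max_len_(max_overlay_len) {}

  // Small writes are parked in the overlay; large writes go to the file.
  void write(std::size_t off, const std::string& data) {
    if (data.size() <= max_len_) {
      pending_.emplace_back(off, data);
      return;
    }
    flush();  // keep write order: older overlay data lands first
    apply(file_, off, data);
  }

  // Write every pending overlay extent back to the file in one pass.
  void flush() {
    for (const auto& p : pending_) apply(file_, p.first, p.second);
    pending_.clear();
  }

  // Reads see the file contents with the pending overlay applied on top.
  std::string read() const {
    std::string out = file_;
    for (const auto& p : pending_) apply(out, p.first, p.second);
    return out;
  }

  std::size_t pending() const { return pending_.size(); }
  const std::string& file() const { return file_; }

 private:
  static void apply(std::string& buf, std::size_t off,
                    const std::string& data) {
    if (buf.size() < off + data.size()) buf.resize(off + data.size(), '\0');
    buf.replace(off, data.size(), data);
  }

  std::size_t max_len_;
  std::string file_;
  std::vector<std::pair<std::size_t, std::string>> pending_;
};
```

In this model a small overwrite never touches the file until flush() runs, while read() still returns the up-to-date contents.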
Sage Weil
2af1e37d7d os/newstore: assign unique nid to each new object
Use this as the key for omap (omap_head), but keep the omap_head field
so that we can tell when no omap data is present.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Sage Weil
713c69884e os/newstore: consolidate collection_list to a single implementation
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:36 -04:00
Xiaoxi Chen
a4d2a53cf6 Clear removed_collections after reap
The previous code forgot to clear the removed_collections queue
after reaping the collections in _reap_collection.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2015-09-01 13:39:36 -04:00
Sage Weil
d8351a8d9e os/newstore: ref count OpSequencer
Our OpSequencer may live longer than the ObjectStore::Sequencer interface
object does.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
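The lifetime problem this commit addresses can be illustrated with shared ownership. This sketch uses std::shared_ptr as a swapped-in technique; it is not a claim about how the commit implements reference counting, and the Sequencer, OpSequencer, TransContext, and submit definitions below are hypothetical simplifications.

```cpp
#include <cassert>
#include <memory>

// Hypothetical internal state; the name mirrors the commit, not Ceph's class.
struct OpSequencer {
  int queued_ops = 0;
};

// Hypothetical user-facing handle that shares ownership of the internal state.
struct Sequencer {
  std::shared_ptr<OpSequencer> impl = std::make_shared<OpSequencer>();
};

// Hypothetical in-flight transaction holding its own reference.
struct TransContext {
  std::shared_ptr<OpSequencer> osr;
};

// Queue an operation; the returned context keeps the OpSequencer alive
// even if the caller destroys its Sequencer handle afterwards.
TransContext submit(const Sequencer& s) {
  ++s.impl->queued_ops;
  return TransContext{s.impl};
}
```

Because each in-flight transaction holds a reference, the internal state is freed only when the last holder releases it, regardless of when the interface object goes away.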
Sage Weil
fbf3d5528f os/newstore: send complete overwrite to a new fid
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
db87e423b6 os/newstore: clone omap
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
d0a4bbaf69 newstore: initial version
This includes a bunch of new ceph_test_objectstore tests, and a ton of fixes
to existing tests so that objects actually live inside the collections they
are written to.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
10c0bfeadd vstart.sh: debug newstore
2015-09-01 13:39:18 -04:00
Sage Weil
be93b09fd2 Revert "os/Makefile.am: add os/fs/XFS.cc"
This reverts commit 32331ede41.

Doh, this is in a conditional below.
2015-09-01 13:38:03 -04:00
David Zafman
78e784d202 Merge pull request #5173 from ceph/wip-12000-12200
Fast read for erasure coding pool and erasure code error handling

Error handling
Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Fast Read
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2015-09-01 10:25:08 -07:00
Sage Weil
32331ede41 os/Makefile.am: add os/fs/XFS.cc
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:20:03 -04:00
Loic Dachary
32446ffb00 tests: ceph-disk: dmcrypt simplification
* Get rid of the cryptsetup calls that are redundant with what ceph
  prepare already does
* Do not use the --dmcrypt-key-dir option. This is less coverage but it
  interferes with the udev logic and is expected to be refactored soon.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-09-01 19:04:19 +02:00
Sage Weil
b226fad968 ceph-disk: systemctl restart the ceph-disk@ service
Otherwise the second time around activating something it will do nothing.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:45:56 -04:00
Sage Weil
00e653440c ceph-disk: be a bit more verbose
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:40:41 -04:00
Loic Dachary
35c9962e7a ceph-disk: only check partition_type if partition
The multipath sanity checks of get_journal_osd_uuid must not try to
verify the partition type when the device is not a partition.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-09-01 11:33:34 -04:00
David Disseldorp
fcae1458bf ceph-disk: fix dmcrypt_map() usage for LUKS activate
29431944c7 added a call to dmcrypt_map()
during disk activation. The change is not suitable for use alongside
the recently added dmcrypt LUKS support, because:
- The callers don't correctly provide cryptsetup_parameters or luks
  arguments.
- dmcrypt_map() calls LuksFormat, which should never be performed
  during disk activation.
- The key file paths don't carry the luks suffix when required.

This commit addresses these issues. Corresponding tests and a udev file
update will follow.

Signed-off-by: David Disseldorp <ddiss@suse.de>

Conflicts:
	src/ceph-disk
2015-09-01 11:33:34 -04:00
Sage Weil
c14c3172bb ceph-disk: add trigger subcommand
Either trigger a systemd event, or do it synchronously if there is no
systemd.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:33:33 -04:00
Sage Weil
3662a225b8 udev: use ceph-disk trigger ... with single set of udev rules
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:22:25 -04:00
Sage Weil
f1b80e99b0 systemd: consolidate into a single ceph-disk@.service
This simple service will 'ceph-disk trigger DEV --sync'.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 11:22:25 -04:00
tianshan
08296dc0fe rados: make 'rados bench' support json format output
Fixes: #12864
Add '[--format json]' and '[-o | --output outfile]' support to rados bench.
The output option only takes effect with the json format.
The bench results can now be used to draw performance graphs easily.

Signed-off-by: Tianshan Qu <qutianshan@gmail.com>
2015-09-01 21:33:00 +08:00
John Spray
f420fe4683 mds: fix shutdown while in standby
Fixes: #12776
Signed-off-by: John Spray <john.spray@redhat.com>
2015-09-01 13:30:00 +01:00
Kefu Chai
82acd5e9ee Merge pull request #5695 from tchaikov/wip-12012
osd: translate sparse_read to read for ecpool

Reviewed-by: Sage Weil <sage@redhat.com>
2015-09-01 19:00:46 +08:00
Kefu Chai
076bad955d ceph_test_rados_api_aio: add a test for aio_sparse_read
Signed-off-by: Kefu Chai <kchai@redhat.com>
2015-09-01 13:52:37 +08:00
Kefu Chai
4d4920610e ceph_test_rados_api_io: add tests for sparse_read
Signed-off-by: Kefu Chai <kchai@redhat.com>
2015-09-01 13:49:22 +08:00
Kefu Chai
5ae2e7a185 ceph_test_rados: also send sparse_read in ReadOp
Signed-off-by: Kefu Chai <kchai@redhat.com>
2015-09-01 13:49:22 +08:00
Kefu Chai
a5bfde69a9 osd: should use ec_pool() when checking for an ecpool
We were using pool.info.require_rollback() in do_osd_ops() when
handling OP_SPARSE_READ to tell if a pool is an ecpool. We should
use pool.info.ec_pool() instead.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2015-09-01 13:49:21 +08:00