Commit Graph

64 Commits

Author SHA1 Message Date
Sage Weil
a5564a664c os/ObjectStore: make device uuid probe output something friendly
Otherwise, all you see is errors about the probes that failed (e.g., a
failure to decode a non-bluestore superblock as bluestore).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 11:10:54 -04:00
Somnath Roy
331e90f450 Use make_shared while creating shared_ptr
make_shared() will get rid of one extra 'new' call during shared_ptr
creation. It is also 20-30% faster than creating a shared_ptr directly
with a 'new' call.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2016-02-17 20:16:39 -05:00
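
As a minimal sketch of the change described in 331e90f450 above, the following contrasts the two ways of building a shared_ptr. The Item type and both helper functions are hypothetical stand-ins, not Ceph code, and the block does not try to reproduce the 20-30% figure quoted in the commit message.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in type; not a Ceph class.
struct Item {
  std::string name;
  int size;
  Item(std::string n, int s) : name(std::move(n)), size(s) {}
};

// Two allocations: one 'new' for the Item, and a second one made
// internally by shared_ptr for its control block.
std::shared_ptr<Item> make_with_new() {
  return std::shared_ptr<Item>(new Item("a", 1));
}

// make_shared typically places the object and the control block in a
// single allocation, which removes the extra 'new'.
std::shared_ptr<Item> make_with_make_shared() {
  return std::make_shared<Item>("a", 1);
}
```

Both helpers return a pointer to an equivalent object; the difference lies only in how many heap allocations are made along the way.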
Somnath Roy
5e4eb3fcdd OSD: Deleting transaction object right after applying transaction
Presently, the transaction object is deleted asynchronously by the
Finisher thread. Under heavy load, especially if we relax the journal
throttle further, we see high memory usage by the OSDs because of
this. In this new scheme, with the help of move semantics, transaction
objects are deleted synchronously from the filestore worker threads.
Memory growth is now well controlled, and we also see a ~3 to 4% CPU
usage benefit from the reduced number of 'new' and 'delete' calls.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2016-01-27 17:43:37 -05:00
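
The idea in 5e4eb3fcdd above, handing a transaction over with move semantics so that it is destroyed on the thread that applied it, can be sketched roughly as follows. Txn and apply_and_release are hypothetical names chosen for illustration; the actual FileStore code is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transaction; not the Ceph Transaction class.
struct Txn {
  std::vector<std::string> ops;
};

// Takes the transaction by value, so a caller can hand over ownership
// with std::move(). The transaction is destroyed when this function
// returns, on the calling (worker) thread, instead of being queued for
// deletion on another thread later.
std::size_t apply_and_release(Txn t) {
  std::size_t applied = 0;
  for (const auto& op : t.ops) {
    (void)op;  // a real backend would execute the operation here
    ++applied;
  }
  return applied;
}
```

A caller would invoke this as apply_and_release(std::move(txn)), after which the caller no longer owns the operation list.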
Haomai Wang
19c39faead KeyValueStore: Kill this
Signed-off-by: Haomai Wang <haomai@xsky.com>
2016-01-22 11:15:10 +08:00
Mykola Golub
e3d45f0b58 os/bluestore: don't include when building without libaio
Fixes: #14207
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
2016-01-12 20:51:13 +02:00
Sage Weil
98a0e107aa os/keyvaluestore: move KeyValueStore into os/keyvaluestore/*
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:08:52 -05:00
Sage Weil
82cbc079d7 os/memstore: move MemStore into os/memstore/*
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:08:52 -05:00
Sage Weil
ba2cc1eb6e os/filestore: move FileStore to os/filestore/*
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:08:52 -05:00
Sage Weil
669bec7b7e os/kstore: add new KStore backend
This is based on BlueStore, but with all of the block-related code
and complexity ripped out, and a simple striping strategy added
in its place.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:07:16 -05:00
Sage Weil
31307a5ab8 os/bluestore: label all block devices
Label all of our block devices with a simple label
that includes the osd_uuid.  Wire this into the
ObjectStore and OSD probe mechanism.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
a62ffb0d03 newstore -> bluestore
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
880a59d1b7 osd: make block device fsid probing generic
Currently the option name and invocation assume that the block device
is a journal (and FileStore journal, managed by FileJournal).  Rework
the interface so that we can probe any block device and other ObjectStore
implementations will have a chance to identify the device (and return the
osd fsid).

Switch to a static method while we are at it so we avoid instantiating
each backend.

Note that only FileStore is probed at the moment; that will change soon!

Signed-off-by: Sage Weil <sage@redhat.com>
2015-12-01 17:16:11 -05:00
Sage Weil
7fc05b4821 os/ObjectStore: helpers for validating map<string,string> and set<string> to bl
Test/validate the encoding, and reference the resulting (encoded) data in
a bufferlist.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-10-19 15:05:49 -04:00
Yan, Zheng
648c7041a2 os: disable newstore when configure --without-libaio
newstore makes extensive use of aio, so disable it when configured
with --without-libaio.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-11 10:11:43 +08:00
Sage Weil
d0a4bbaf69 newstore: initial version
This includes a bunch of new ceph_test_objectstore tests, and a ton of fixes
to existing tests so that objects actually live inside the collections they
are written to.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-01 13:39:35 -04:00
Sage Weil
4d6ee79a04 os/ObjectStore: kill hobject_t convenience wrappers
These are dangerous and no longer needed.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-06-18 17:02:49 -07:00
Sage Weil
44ce7cc1de os: rename keyvaluestore-dev -> keyvaluestore; mark experimental
Use a clean name for keyvaluestore (no -dev suffix), but mark as
experimental to ensure users know what they are signing up for.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-12-29 14:19:14 -08:00
Sage Weil
cfa22900d7 os/ObjectStore: remove collection_{add,remove}
Move the add+remove sequence that a move normally translates to
directly into that method.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-10-27 16:59:29 -07:00
David Zafman
3d9fde9d92 os: Add optional flags to generic ObjectStore creation (SKIP_JOURNAL_REPLAY and SKIP_MOUNT_OMAP)
Only FileStore cares about these flags, so they are passed on during create().

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-08-28 16:21:27 -07:00
Sage Weil
1c170776cb libosd_types, libos_types, libmon_types
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-18 22:33:42 -07:00
Guang Yang
228760ce3a Fix the PG listing issue which could miss objects for EC pool (where there is object shard and generation).
Backport: firefly
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-07-10 01:05:40 +00:00
Sage Weil
aede83281f os: rename get_*() -> decode_*()
These methods have side-effects: they move the decode iterator *and*
return a value.  Rename them to avoid confusion with typical get_*
accessors.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-06-05 09:19:05 -07:00
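
To illustrate the naming concern raised in aede83281f above, here is a small sketch of a reader whose decode method both returns a value and advances its position. The Reader class is a hypothetical example and is not the Ceph decode iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader over a buffer of 32-bit values.
class Reader {
  const std::vector<std::uint32_t>& buf_;
  std::size_t pos_ = 0;

public:
  explicit Reader(const std::vector<std::uint32_t>& buf) : buf_(buf) {}

  // Named decode_*, not get_*, because it has a side effect: each call
  // returns the current value AND moves the read position forward.
  // at() throws std::out_of_range when reading past the end.
  std::uint32_t decode_u32() { return buf_.at(pos_++); }

  // A true accessor: calling it repeatedly changes nothing.
  std::size_t offset() const { return pos_; }
};
```

Calling decode_u32() twice yields two different values, which is exactly the behavior a get_* name would hide from the reader of the calling code.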
Sage Weil
237f0fb455 os/ObjectStore: dump COLL_MOVE_RENAME
This got missed way back in ef7cffc34f
(pre-0.71).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-06 13:44:39 -08:00
Ilya Dryomov
6456802394 osd: add SETALLOCHINT operation
This is primarily for librbd/krbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

xfs is hooked up in the subsequent commits.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
Samuel Just
06ec9bd42b ObjectStore: fix OP_COLL_ADD dump output
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-02-17 14:24:55 -08:00
Sage Weil
5476b4b6eb keyvaluestore: name to keyvaluestore-dev for now
This helps warn the user that the ondisk format may be subject to change.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-29 06:40:29 -08:00
Haomai Wang
972d4b24c4 Add KeyValueStore implementation
KeyValueStore is another ObjectStore implementation alongside FileStore. It
uses a KV store wrapper (StripObjectMap), which inherits from GenericObjectMap,
to implement the ObjectStore APIs.

Each object has a header key in the KV backend, which encapsulates the object's
metadata, such as its size and the status of its keys. An object's data may be
spread across multiple keys. CRUD operations on an object first access the
object's header key to learn these details, then fetch the actual data keys.

For now the only KV backend for KeyValueStore is LevelDB; more KV backends
(RocksDB, NVM API) will be introduced in the near future.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-01-29 21:50:15 +08:00
Noah Watkins
4c4e1d0d47 libc++: use ceph:: namespaced data types
Switches the implementation of smart pointers and unordered map/set to
use the ceph:: versions.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2014-01-18 14:03:20 -08:00
Sage Weil
aa63d6730a os/MemStore: implement reference 'memstore' backend
This is (as near to) a trivial ObjectStore backend for the OSD as we can
get at the moment.  Everything is stored in memory.  We are slightly
tricky with the locking, but not overly so.

On umount we dump everything out to disk, and on mount we load it all in
again, so we have some very coarse persistence/durability... just enough
to make this usable in a non-failure environment.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-05 23:13:28 -08:00
Sage Weil
a70200e329 os/ObjectStore: pass cct to ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-04 14:46:40 -08:00
Sage Weil
4d140a71a1 os/ObjectStore: add {read,write}_meta
Move these from the OSD.  Use a generic implementation in ObjectStore that
hopefully all backends can share (so that it can remain in sync with the
start/stop scripts, ceph-disk, and other orchestration machinery).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-29 22:28:36 -08:00
Sage Weil
237d6b8375 os/ObjectStore: add static create() method
Generic way to create an ObjectStore implementation of the required type,
so that users don't need to know anything about it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-29 22:28:35 -08:00
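
The static create() approach from 237d6b8375 above is a factory method: callers name a backend type and get back an object behind the base interface. The sketch below shows the general shape only. Store, MemBackend, FileBackend, and the exact signature are assumptions made for illustration and do not mirror the real ObjectStore::create().

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base interface; callers only ever see this type.
struct Store {
  virtual ~Store() = default;
  virtual std::string type() const = 0;

  // Factory: picks the concrete backend from a type name.
  // Returns nullptr for an unknown type.
  static std::unique_ptr<Store> create(const std::string& type);
};

// Hypothetical concrete backends, hidden behind the factory.
struct MemBackend : Store {
  std::string type() const override { return "memstore"; }
};

struct FileBackend : Store {
  std::string type() const override { return "filestore"; }
};

std::unique_ptr<Store> Store::create(const std::string& type) {
  if (type == "memstore") return std::make_unique<MemBackend>();
  if (type == "filestore") return std::make_unique<FileBackend>();
  return nullptr;
}
```

With this layout, adding a backend means touching only the factory, and callers keep working against Store without knowing which implementation they received.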
Danny Al-Gaaf
a8e10d3d0a os/ObjectStore.cc: prefer prefix ++operator for non-primitive types
Prefer prefix ++operator for non-primitive types like iterators for
performance reasons. Prefix ++/-- operators avoid creating a temporary
copy.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-11-07 23:31:13 +01:00
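
The style rule applied in a8e10d3d0a above looks like this in practice. The function is a hypothetical example; the point is only the ++it form in the loop header.

```cpp
#include <cassert>
#include <map>
#include <string>

// Sums the values of a map using an explicit iterator loop.
int sum_values(const std::map<std::string, int>& m) {
  int total = 0;
  // Prefix ++it advances the iterator in place. Postfix it++ must
  // return the old value, which conceptually requires a temporary
  // copy of the iterator (an optimizer may or may not remove it).
  for (auto it = m.begin(); it != m.end(); ++it) {
    total += it->second;
  }
  return total;
}
```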
David Zafman
bab72ed394 os: Simplify collection_list* funcs by removing dynamic_cast
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-09-26 14:26:52 -07:00
David Zafman
4a757eb8e0 os/ObjectStore: Interim collection_list* functions in ObjectStore
Handle ghobject_t to hobject_t conv of collection_list* funcs
Temporary code so that this branch doesn't break master

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-09-26 11:29:05 -07:00
David Zafman
aba6efda13 common, os, osd, test, tools: FileStore must work with ghobjects rather than hobjects
Add ghobject_t to hobject.h header
Add constants NO_SHARD/NO_GEN and change gen_t/shard_t
Convert other headers from hobject_t to ghobject_t
Mostly straight hobject_t to ghobject_t for src/os cc files
Fix tools and tests and enable ceph-dencoder
Add filename generation and parsing including unittest addition
Get ceph-filestore-dump to build
Add gen/shard to DBObjectMap::ghobject_key() and update test case
Add CEPH_FS_FEATURE_INCOMPAT_SHARDS new FileStore feature
Add CEPH_OSD_FEATURE_INCOMPAT_SHARDS new osd feature

Fixes: #5862

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-09-26 11:29:05 -07:00
Danny Al-Gaaf
6e6ef01591 os/ObjectStore.cc: don't fallthrough after OP_OMAP_RMKEYRANGE
CID 1054829 (#1 of 1): Missing break in switch (MISSING_BREAK)
  unterminated_case: This case (value 37) is not terminated by a
  'break' statement.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-07-24 18:30:16 +02:00
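
The class of bug fixed in 6e6ef01591 above, a case that falls through because its break is missing, can be shown with a small sketch. The enum values and the op_name function are hypothetical and do not match the real ObjectStore op codes.

```cpp
#include <cassert>
#include <string>

// Hypothetical op codes for illustration only.
enum Op { OP_TOUCH = 1, OP_OMAP_RMKEYRANGE = 2 };

// Maps an op code to a name, in the style of a dump() routine.
std::string op_name(int op) {
  std::string name;
  switch (op) {
    case OP_TOUCH:
      name = "touch";
      break;
    case OP_OMAP_RMKEYRANGE:
      name = "omap_rmkeyrange";
      break;  // without this break, control falls into 'default'
              // and the name is overwritten with "unknown"
    default:
      name = "unknown";
      break;
  }
  return name;
}
```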
Samuel Just
1999fa2c6c ObjectStore: add omap_rmkeyrange to dump
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-16 15:30:11 -07:00
David Zafman
b10848e212 Merge branch 'master' into wip-4982-4983-oloc-rebase
2013-07-09 14:10:42 -07:00
David Zafman
e761e4e55f librados, os, osd, osdc, test: Add support for client specified namespaces
Add rados_ioctx_namespace_set_key() and librados::IoCtx::namespace_set_key()
Add namespace to admin-daemon operations
Support namespace in osd map command
Add namespace to object_locator_t and hobject_t
Add random namespaces to psim program

Feature: #4982 (OSD: namespaces pt 1 (librados/osd, not caps))

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-07-09 14:09:02 -07:00
Samuel Just
daee9dbe89 ObjectStore,Context: add register_on_complete
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-03 13:58:11 -07:00
Danny Al-Gaaf
6e241b97bb ObjectStore.cc: add missing break in switch
Fix switch handling for case OP_SPLIT_COLLECTION2, add break after
the case to prevent fall through into default case.

CID 1019562 Missing break in switch (CWE-484)

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-05-11 00:02:54 +02:00
Sage Weil
9d85d67e77 os/ObjectStore: add missing break in dump()
CID 751331 (#1 of 1): Missing break in switch (MISSING_BREAK)
unterminated_case: This case (value 35) is not terminated by a 'break' statement.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-06 17:15:14 -07:00
Samuel Just
b184ff581a FileStore: _split_collection should not create the collection
This will simplify adding a replay guard to create_collection.

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-12 10:15:03 -08:00
Samuel Just
4d6ba06309 ObjectStore: add queue_transactions with oncomplete
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-23 11:50:21 -08:00
Sage Weil
bc994045ad os: move apply_transactions() sync wrapper into ObjectStore
This has nothing to do with the backend implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 15:44:41 -08:00
Samuel Just
f2a23916d4 os/: add filestore collection_split
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-05 11:34:18 -08:00
Samuel Just
d5ab87798b src/: Add namespace and pool fields to hobject_t
From this point, hobjects in the ObjectStore will be globally unique.  This
will allow us to avoid including the collection in the ObjectMap key encoding
and thereby enable efficient collection renames and, eventually, collection
splits.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-05 16:09:49 -07:00
Sage Weil
816a512827 objectstore: tweak dump() a bit
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-12 10:59:23 -07:00
Joao Eduardo Luis
9565a8bae3 ObjectStore: Remove code duplication when dumping transactions.
By using OStreamFormatter, we can have a single function responsible for
dumping a transaction. We keep the same old functions for outputting
directly to a Formatter and to an ostream, but these are only wrappers
for the function that will really handle the dumping.

The "real" dump() function will now take only a Formatter as an argument,
to which we will output. We keep the 'dump(std::ostream& out)' function,
although now it simply creates an OStreamFormatter and passes it to the
dump() function.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-12 10:59:23 -07:00
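
The structure described in 9565a8bae3 above, one real dump() that writes to a Formatter plus a thin ostream wrapper, might look roughly like the sketch below. The Formatter interface here is a deliberately tiny hypothetical version with a single dump_string method, and the name=value output format is an assumption, not the output of Ceph's formatter classes.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal formatter interface.
struct Formatter {
  virtual ~Formatter() = default;
  virtual void dump_string(const std::string& name,
                           const std::string& value) = 0;
};

// Formatter that writes "name=value" lines to an ostream.
class OStreamFormatter : public Formatter {
  std::ostream& out_;

public:
  explicit OStreamFormatter(std::ostream& out) : out_(out) {}
  void dump_string(const std::string& name,
                   const std::string& value) override {
    out_ << name << "=" << value << "\n";
  }
};

// Hypothetical transaction holding a list of operation names.
struct Txn {
  std::vector<std::string> ops;

  // The single "real" dump: all formatting logic lives here.
  void dump(Formatter& f) const {
    for (const auto& op : ops) f.dump_string("op", op);
  }

  // Thin wrapper: builds an OStreamFormatter and delegates.
  void dump(std::ostream& out) const {
    OStreamFormatter f(out);
    dump(f);
  }
};
```

Because the ostream overload only delegates, any change to how a transaction is dumped needs to be made in one place.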