Otherwise, all you see is errors about the probes that failed (e.g., a
failure to decode a non-bluestore superblock as bluestore).
Signed-off-by: Sage Weil <sage@redhat.com>
make_shared() gets rid of one extra 'new' call during shared_ptr
creation. It is also 20-30% faster than creating a shared_ptr directly
with a 'new' call.
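
A minimal sketch of the two construction styles, for illustration only.
Payload and both helper functions are hypothetical names, not taken from
the tree. The single-allocation behaviour of make_shared is what common
standard library implementations do; the standard only recommends it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type, used only for illustration.
struct Payload {
  std::string name;
  int value;
  Payload(std::string n, int v) : name(std::move(n)), value(v) {}
};

// Two allocations: one 'new' for the Payload, a second one inside
// shared_ptr for its control block.
inline std::shared_ptr<Payload> make_with_new() {
  return std::shared_ptr<Payload>(new Payload("a", 1));
}

// Typically one allocation that holds both the object and the
// control block.
inline std::shared_ptr<Payload> make_with_make_shared() {
  return std::make_shared<Payload>("a", 1);
}
```

Both helpers return an equivalent shared_ptr; only the number of heap
allocations differs.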
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
Presently, the transaction object is deleted asynchronously by the
Finisher thread. Under heavy load, especially if we loosen the
journal throttle further, we are seeing high memory usage by
the OSDs because of this. In this new scheme, with the help of
move semantics, transaction objects are deleted synchronously
from the filestore worker threads. We are now seeing much better
controlled memory growth, as well as a ~3 to 4% cpu usage
benefit from the reduced number of 'new' and 'delete' calls.
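
The ownership transfer can be sketched as follows. This is a simplified
illustration, not the FileStore code: Txn and apply_on_worker are
hypothetical names. The point is that a transaction moved into the
worker is destroyed when the worker's call returns, rather than being
queued for a later delete on another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transaction: it just owns a list of ops.
struct Txn {
  std::vector<int> ops;
  explicit Txn(std::vector<int> o) : ops(std::move(o)) {}
};

// The worker takes the transaction by value, so the caller has to move
// it in. The ops are released when 't' goes out of scope at the end of
// this function, i.e. synchronously on the worker thread.
inline std::size_t apply_on_worker(Txn t) {
  return t.ops.size();
}
```

A caller would write apply_on_worker(std::move(txn)); after that call
the caller's txn no longer owns any ops.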
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
This is based on BlueStore, but with all of the block-related code
and complexity ripped out, and a simple striping strategy added
in its place.
Signed-off-by: Sage Weil <sage@redhat.com>
Label all of our block devices with a simple label
that includes the osd_uuid. Wire this into the
ObjectStore and OSD probe mechanism.
Signed-off-by: Sage Weil <sage@redhat.com>
Currently the option name and invocation assume that the block device
is a journal (specifically a FileStore journal, managed by FileJournal).
Rework the interface so that we can probe any block device and give other
ObjectStore implementations a chance to identify the device (and return
the osd fsid).
Switch to a static method while we are at it so we avoid instantiating
each backend.
Note that only FileStore is probed at the moment; that will change soon!
Signed-off-by: Sage Weil <sage@redhat.com>
This includes a bunch of new ceph_test_objectstore tests, and a ton of fixes
to existing tests so that objects actually live inside the collections they
are written to.
Signed-off-by: Sage Weil <sage@redhat.com>
Use a clean name for keyvaluestore (no -dev suffix), but mark as
experimental to ensure users know what they are signing up for.
Signed-off-by: Sage Weil <sage@redhat.com>
These methods have side-effects: they move the decode iterator *and*
return a value. Rename them to avoid confusion with typical get_*
accessors.
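
To illustrate the naming distinction, here is a minimal sketch with a
hypothetical Cursor class (not the real decode iterator): get_pos() is a
plain accessor, while decode_u8() both returns a value and advances the
cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical cursor over an encoded buffer, for illustration only.
class Cursor {
  const std::vector<std::uint8_t>& buf_;
  std::size_t pos_ = 0;

public:
  explicit Cursor(const std::vector<std::uint8_t>& buf) : buf_(buf) {}

  // Pure accessor: no side effects, so a get_* name fits.
  std::size_t get_pos() const { return pos_; }

  // Returns the next byte *and* advances the cursor; the decode_*
  // name signals the side effect.
  std::uint8_t decode_u8() {
    if (pos_ >= buf_.size())
      throw std::out_of_range("end of buffer");
    return buf_[pos_++];
  }
};
```

Calling decode_u8() twice yields two different bytes, which would be
surprising for something named like a getter.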
Signed-off-by: Sage Weil <sage@inktank.com>
This is primarily for librbd/krbd's benefit and is supposed to combat
fragmentation:
"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks. We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."
SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.
xfs is hooked up in the subsequent commits.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
KeyValueStore is another ObjectStore implementation alongside FileStore.
It uses a KV store wrapper (StripObjectMap), which inherits from
GenericObjectMap, to implement the ObjectStore APIs.
Each object has a header key in the KV backend, which encapsulates the
object's metadata, such as its size and the status of its keys. An object's
data may be spread across multiple keys. CRUD operations on an object must
first read the object's header key to learn these details, and then fetch
the actual data keys.
For now the only KV backend for KeyValueStore is LevelDB; more KV backends
(RocksDB, NVM API) will be introduced in the near future.
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
This is as near to a trivial ObjectStore backend for the OSD as we can
get at the moment. Everything is stored in memory. We are slightly
tricky with the locking, but not overly so.
On umount we dump everything out to disk, and on mount we load it all in
again, so we have some very coarse persistence/durability... just enough
to make this usable in a non-failure environment.
Signed-off-by: Sage Weil <sage@inktank.com>
Move these from the OSD. Use a generic implementation in ObjectStore that
hopefully all backends can share (so that it can remain in sync with the
start/stop scripts, ceph-disk, and other orchestration machinery).
Signed-off-by: Sage Weil <sage@inktank.com>
Generic way to create an ObjectStore implementation of the required type,
so that users don't need to know anything about it.
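
A rough sketch of such a factory, under stated assumptions: Store,
FileBackend, MemBackend and create_store are hypothetical names chosen
for this example and do not reflect the actual signature.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical abstract interface that callers program against.
struct Store {
  virtual ~Store() = default;
  virtual std::string type() const = 0;
};

// Hypothetical concrete backends.
struct FileBackend : Store {
  std::string type() const override { return "filestore"; }
};
struct MemBackend : Store {
  std::string type() const override { return "memstore"; }
};

// Callers pass a type name and get back the interface only; they never
// name a concrete backend class. Returns nullptr for unknown types.
inline std::unique_ptr<Store> create_store(const std::string& type) {
  if (type == "filestore")
    return std::unique_ptr<Store>(new FileBackend);
  if (type == "memstore")
    return std::unique_ptr<Store>(new MemBackend);
  return nullptr;
}
```

With this shape, adding a backend only touches the factory, not its
callers.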
Signed-off-by: Sage Weil <sage@inktank.com>
Prefer the prefix ++ operator for non-primitive types like iterators for
performance reasons. Prefix ++/-- operators avoid creating a temporary
copy.
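
The difference can be made visible with a small sketch. CountingIter is
a hypothetical iterator-like type that counts how often it is copied;
real iterators behave the same way, minus the counter.

```cpp
#include <cassert>

// Hypothetical iterator-like type that counts its own copies.
struct CountingIter {
  int pos = 0;
  static int copies;

  CountingIter() = default;
  CountingIter(const CountingIter& o) : pos(o.pos) { ++copies; }

  // Prefix: advance in place and return a reference, no copy made.
  CountingIter& operator++() {
    ++pos;
    return *this;
  }

  // Postfix: must save the old value first, which needs a copy.
  CountingIter operator++(int) {
    CountingIter old(*this);
    ++pos;
    return old;
  }
};

int CountingIter::copies = 0;
```

An optimizer can often remove the unused temporary for simple
iterators, but the prefix form never creates it in the first place.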
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Handle ghobject_t to hobject_t conv of collection_list* funcs
Temporary code so that this branch doesn't break master
Signed-off-by: David Zafman <david.zafman@inktank.com>
Add ghobject_t to hobject.h header
Add constants NO_SHARD/NO_GEN and change gen_t/shard_t
Convert other headers from hobject_t to ghobject_t
Mostly straight hobject_t to ghobject_t for src/os cc files
Fix tools and tests and enable ceph-dencoder
Add filename generation and parsing including unittest addition
Get ceph-filestore-dump to build
Add gen/shard to DBObjectMap::ghobject_key() and update test case
Add CEPH_FS_FEATURE_INCOMPAT_SHARDS new FileStore feature
Add CEPH_OSD_FEATURE_INCOMPAT_SHARDS new osd feature
Fixes: #5862
Signed-off-by: David Zafman <david.zafman@inktank.com>
CID 1054829 (#1 of 1): Missing break in switch (MISSING_BREAK)
unterminated_case: This case (value 37) is not terminated by a
'break' statement.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Add rados_ioctx_namespace_set_key() and librados::IoCtx::namespace_set_key()
Add namespace to admin-daemon operations
Support namespace in osd map command
Add namespace to object_locator_t and hobject_t
Add random namespaces to psim program
Feature: #4982 (OSD: namespaces pt 1 (librados/osd, not caps))
Signed-off-by: David Zafman <david.zafman@inktank.com>
Fix switch handling for case OP_SPLIT_COLLECTION2: add a break after
the case to prevent fall-through into the default case.
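
The pattern being fixed looks roughly like this. The op codes and the
handle() function are hypothetical and stand in for the real
transaction switch.

```cpp
#include <cassert>
#include <string>

// Hypothetical op codes, for illustration only.
enum Op { OP_TOUCH = 1, OP_SPLIT = 2 };

// Returns a log of which branches ran for the given op.
inline std::string handle(int op) {
  std::string log;
  switch (op) {
  case OP_SPLIT:
    log += "split;";
    break;  // without this break, control falls into 'default'
  default:
    log += "unknown;";
    break;
  }
  return log;
}
```

Without the first break, handle(OP_SPLIT) would run the split branch
and then also the default branch.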
CID 1019562 Missing break in switch (CWE-484)
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 751331 (#1 of 1): Missing break in switch (MISSING_BREAK)
unterminated_case: This case (value 35) is not terminated by a 'break' statement.
Signed-off-by: Sage Weil <sage@inktank.com>
From this point, hobjects in the ObjectStore will be globally unique. This
will allow us to avoid including the collection in the ObjectMap key encoding
and thereby enable efficient collection renames and, eventually, collection
splits.
Signed-off-by: Samuel Just <sam.just@inktank.com>
By using OStreamFormatter, we can have a single function responsible for
dumping a transaction. We keep the same old functions for outputting
directly to a Formatter and to an ostream, but these are only wrappers
for the function that will really handle the dumping.
The "real" dump() function will now take only a Formatter as an argument,
to which we will output. We keep the 'dump(std::ostream& out)' function,
although now it simply creates an OStreamFormatter and passes it to the
dump() function.
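
A simplified sketch of the wrapper arrangement. Formatter,
StreamFormatter and Txn here are hypothetical miniature stand-ins, not
the real classes, and the output format is made up for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal formatter interface.
struct Formatter {
  virtual ~Formatter() = default;
  virtual void dump_string(const std::string& key,
                           const std::string& val) = 0;
};

// Hypothetical formatter that writes key=value lines to an ostream.
struct StreamFormatter : Formatter {
  std::ostream& out;
  explicit StreamFormatter(std::ostream& o) : out(o) {}
  void dump_string(const std::string& key,
                   const std::string& val) override {
    out << key << "=" << val << "\n";
  }
};

struct Txn {
  std::string op = "touch";

  // The "real" dump: all output goes through the Formatter.
  void dump(Formatter& f) const { f.dump_string("op", op); }

  // Thin wrapper: build a stream-backed formatter and delegate.
  void dump(std::ostream& out) const {
    StreamFormatter f(out);
    dump(f);
  }
};
```

Any change to what gets dumped now happens in one place, and both entry
points pick it up.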
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>