"rbd bench-write" issues all write operations at the same offset at the
same time, which makes the performance results it reports unrepresentative.
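
To make the problem concrete, here is a minimal sketch of how a write
benchmark might hand out offsets so that concurrent writes do not all hit
the same location. The function name, its parameters, and the sequential
wrap-around policy are illustrative assumptions; this is not the rbd code
and not necessarily the approach taken by this patch.

```python
def sequential_offsets(image_size, io_size, count):
    """Yield `count` write offsets, advancing by io_size and wrapping around.

    Hypothetical helper: the names and the wrap-around policy are assumptions.
    """
    if io_size <= 0 or image_size < io_size:
        raise ValueError("io_size must be positive and fit in the image")
    slots = image_size // io_size  # number of non-overlapping write positions
    for i in range(count):
        yield (i % slots) * io_size
```

For a 16-byte image and 4-byte writes, six writes would land on offsets
0, 4, 8, 12, 0, 4 rather than all on offset 0.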
Fixes: #7066
Co-Author: Rongze Zhu <rongze@unitedstack.com>
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Add a -o / --options option, which would allow users to specify
rbd-specific and generic ceph client and osd options available at
mapping time in a comma separated list (similar to mount(8) mount
options).
Exposed options are:
- fsid=%s
- ip=%s
- share
- noshare
- crc
- nocrc
- osdkeepalive=%d
- osd_idle_ttl=%d
- rw
- ro (equivalent to existing --read-only flag)
The rw/ro compatibility kludge for kernels older than 3.7, added in
commit fb0f198644, is preserved.
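
As an illustration of the comma-separated format, the sketch below parses
such an option string into a dictionary. The option names are taken from
the list above; the function name, the grouping into flag, string and
integer options, and the error handling are assumptions made for this
example and do not reflect the actual rbd implementation.

```python
# Option names come from the list above; the parsing rules are assumptions.
FLAG_OPTIONS = {"share", "noshare", "crc", "nocrc", "rw", "ro"}
STRING_OPTIONS = {"fsid", "ip"}
INT_OPTIONS = {"osdkeepalive", "osd_idle_ttl"}


def parse_map_options(text):
    """Parse a comma-separated option string into a dict."""
    parsed = {}
    for token in filter(None, text.split(",")):
        name, sep, value = token.partition("=")
        if name in FLAG_OPTIONS and not sep:
            parsed[name] = True
        elif name in STRING_OPTIONS and sep and value:
            parsed[name] = value
        elif name in INT_OPTIONS and sep:
            parsed[name] = int(value)  # raises ValueError on non-numeric input
        else:
            raise ValueError(f"unknown or malformed option: {token!r}")
    return parsed
```

The sketch does not resolve conflicting pairs such as share/noshare; a
real parser would have to decide which one wins.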
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
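
The ordering can be sketched with two stand-in classes. Everything here
(class names, the `valid` flag, the callback shape) is hypothetical and
only models the idea that image-owned counters must be touched before the
user is notified; it is not the librbd code.

```python
class PerfCounters:
    """Stand-in for counters owned by the image context (hypothetical)."""

    def __init__(self):
        self.valid = True
        self.completed = 0

    def inc(self):
        if not self.valid:
            raise RuntimeError("perfcounters used after image close")
        self.completed += 1


class Completion:
    """Stand-in for an AIO completion that reports to a user callback."""

    def __init__(self, counters, callback):
        self.counters = counters
        self.callback = callback

    def complete(self):
        # Touch image-owned state first ...
        self.counters.inc()
        # ... then notify the user, who may close the image right away.
        self.callback()
        # Nothing past this point may use self.counters.
```

If the two calls in complete() were swapped, a callback that closes the
image would make the later inc() fail in this model.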
Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
The ruleset --osd_pool_default_crush_erasure_ruleset is created to be
suitable for erasure coded pools when OSDMap::build_simple is required
to build the default OSD map of a new cluster.
Signed-off-by: Loic Dachary <loic@dachary.org>
--osd-pool-default-crush-replicated-ruleset replaces
--osd-pool-default-crush-rule
If --osd-pool-default-crush-rule is set it takes precedence over
--osd-pool-default-crush-replicated-ruleset and a deprecation warning is
displayed.
The CrushWrapper::get_osd_pool_default_crush_replicated_ruleset helper is
used to implement this behaviour.
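
The precedence rule can be sketched as follows. The configuration is
modelled as a plain dictionary, and "not set" is assumed to mean the key
is absent or None; both are assumptions for illustration and this is not
the CrushWrapper helper itself.

```python
import warnings


def replicated_ruleset(conf):
    """Pick the default replicated ruleset from a dict of config values.

    Assumption: a deprecated option that is "not set" is absent or None.
    """
    old = conf.get("osd_pool_default_crush_rule")
    if old is not None:
        warnings.warn(
            "osd_pool_default_crush_rule is deprecated; use "
            "osd_pool_default_crush_replicated_ruleset instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return old
    return conf["osd_pool_default_crush_replicated_ruleset"]
```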
Signed-off-by: Loic Dachary <loic@dachary.org>
Replace the manually crafted ruleset in OSDMap::build_simple_crush_map*
with calls to add_simple_ruleset. The generated rulesets do not have the
same behavior, but that presumably does not cause any backward
compatibility problem because they are only created when a new cluster
is being initialized.
The prototypes of OSDMap::build_simple* are modified to allow for a
return code and display of a human readable error message.
The --osd-min-rep and --osd-max-rep configuration options are removed:
they were only used in the code that was removed.
Signed-off-by: Loic Dachary <loic@dachary.org>
The three rules created by build_simple are identical. They are replaced
by a single rule named replicated_rule which is set to be used by the
data, rbd and metadata pools.
Instead of hardcoding the ruleset number to zero, it is read from
osd_pool_default_crush_ruleset which defaults to zero.
The CEPH_DEFAULT_CRUSH_REPLICATED_RULESET enum is moved from osd_type.h to
config.h because it may be needed when osd_type.h is not included.
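
A rough model of the resulting layout is shown below. The function name
and the dictionary representation are assumptions for this example; only
the rule name, the pool names, and the default of zero come from the
description above.

```python
def assign_default_ruleset(conf):
    """Map each default pool to the single shared ruleset number."""
    ruleset = conf.get("osd_pool_default_crush_ruleset", 0)  # defaults to zero
    rules = {ruleset: "replicated_rule"}
    pools = {name: ruleset for name in ("data", "metadata", "rbd")}
    return rules, pools
```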
Signed-off-by: Loic Dachary <loic@dachary.org>
Assume that firstn is for replica and indep is for erasure. This is a
strong constraint but it is unlikely to make the resulting ruleset unfit
to be used in most cases.
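
The assumption amounts to a fixed mapping, sketched here with a
hypothetical helper; the function name and the returned strings are
illustrative only.

```python
def pool_type_for_step(step_mode):
    """Guess the pool type a rule is meant for from its choose mode."""
    mapping = {"firstn": "replicated", "indep": "erasure"}
    try:
        return mapping[step_mode]
    except KeyError:
        raise ValueError(f"unknown choose mode: {step_mode!r}") from None
```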
Signed-off-by: Loic Dachary <loic@dachary.org>
New features appeared during the Havana cycle.
This patch offers a general update of the doc.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Creating an erasure pool will crash the OSD because OSD::_make_pg
asserts if the type is not replicated. The tests related to erasure
coded pool creation are removed from qa/workunits/cephtool/test.sh.
The osd-create-pool.sh unit test covers the cases removed from test.sh
more extensively. The intent is to check the interactions with the MON
only, therefore it does not run an OSD and the absence of erasure code
placement group backend implementation is not an issue.
Signed-off-by: Loic Dachary <loic@dachary.org>
Looping forever on kill does not serve any useful purpose.
Reduce the verbosity of the exit trap to help diagnose error
conditions.
Signed-off-by: Loic Dachary <loic@dachary.org>
The MDS assumes pool 0 and 1 are suitable for data and metadata
respectively. Instead of relying on the CEPH_DATA_RULE and
CEPH_METADATA_RULE constants that only match by chance, set a hardcoded
value specific to MDS to reduce the fragility of the hardcoded
assumption.
Signed-off-by: Loic Dachary <loic@dachary.org>
The new ceph_osd.cc code does the ObjectStore init work before
global_init_daemonize(), and the WBThrottle thread is created when the
ObjectStore is constructed. After daemon(), the WBThrottle thread
therefore does not exist in the new process, which results in a deadlock.
When "cur_ios", a member of WBThrottle, hits its hard limit, there are two
ways to decrease it. The first is the WBThrottle thread, which is gone
after daemonizing; the other is SyncThread. SyncThread blocks at
op_tp.pause() because the threads in op_tp (the thread pool) are blocked at
wbthrottle.throttle(FileStore::doop). So no thread continues to process
jobs in the filestore layer and all threads are waiting.
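
The underlying effect, that threads started before a fork do not exist in
the child process, can be observed with a small POSIX-only sketch. The
function name and the worker thread are stand-ins invented for this
example; it does not model WBThrottle or FileStore.

```python
import os
import threading


def count_threads_after_fork():
    """Return (threads in parent, threads in child) around an os.fork()."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()  # stands in for a thread created before daemonizing

    parent_count = threading.active_count()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here; the worker is gone.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

Any work that only the pre-fork thread could perform is never done in the
child, which is why starting such threads after daemonizing avoids the
problem.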
Fixes: #7062 (http://tracker.ceph.com/issues/7062)
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Commit c76bbc2e6d, which introduced _daemon versions of some of the
argparse calls, also changed the behaviour of non-_daemon versions.
The change resulted in incorrect error messages, e.g.
$ ./rbd create b0 --size
rbd: extraneous parameter --size
instead of what should have been
$ ./rbd create b0 --size
Option --size requires an argument.
The users of _daemon versions were added in commit be801f6c50 and
removed in commit f26bd55e57, so just kill the _daemon versions and
restore the old behaviour. (This effectively reverts commit
c76bbc2e6df1.)
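
The expected behaviour can be modelled with a hypothetical helper. Only
the error text is taken from the output quoted above; the function name
and argument handling are assumptions and do not mirror the ceph argparse
code.

```python
def option_value(args, index):
    """Return the value following the option at args[index].

    Hypothetical helper; only the error text is taken from the message above.
    """
    if index + 1 >= len(args):
        raise ValueError(f"Option {args[index]} requires an argument.")
    return args[index + 1]
```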
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
The deprecated attribute argument was introduced in gcc 4.5
(http://gcc.gnu.org/gcc-4.5/changes.html) and CentOS 6 ships an older
version.
Signed-off-by: Loic Dachary <loic@dachary.org>
(currently only in some librados operations)
First create the op, only then lock and submit so that we reduce lock
contention.
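
The pattern is sketched below with a hypothetical submitter class: the
operation is built before the lock is taken, so the lock is held only for
the enqueue. The class, method names, and the dictionary used as an "op"
are assumptions for illustration, not librados code.

```python
import threading


class Submitter:
    """Hypothetical client that serializes submission with one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queue = []

    @staticmethod
    def _build_op(name, payload):
        # Potentially expensive preparation; needs no shared state.
        return {"name": name, "payload": bytes(payload)}

    def submit(self, name, payload):
        op = self._build_op(name, payload)  # done before taking the lock
        with self._lock:  # lock held only for the enqueue
            self.queue.append(op)
        return op
```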
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Use the newly-discovered (for me) deprecated attribute to mark the old
get_version() method and point users toward get_version64(). And fix a
couple of users in the kvstore code!
Signed-off-by: Sage Weil <sage@inktank.com>
The parent is always a snapshot. We may want to treat it differently
than other snaps by virtue of it (likely) being a more highly-shared
image.
By default, localize parent reads.
Signed-off-by: Sage Weil <sage@inktank.com>