Commit Graph

30554 Commits

Author SHA1 Message Date
Sage Weil
b88af07ef5 libcephfs: get osd location on -1 should return EINVAL
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-28 10:25:00 -08:00
Sage Weil
250ecf6655 qa/workunits/mon/crush_ops.sh: fix in-use rule rm test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-28 10:22:18 -08:00
Sage Weil
d4f07cd90b crush: fix get_full_location_ordered
This should return -ENOENT when an id is not present.  Broken by
746069ee62.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-28 08:55:02 -08:00
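A minimal sketch of the lookup contract described above: return 0 and fill the location when the id exists, and -ENOENT when it does not. `ToyCrushMap` and `Location` are hypothetical simplifications, not the actual CrushWrapper types, and the real method's signature may differ.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CRUSH hierarchy: item id -> ordered
// (bucket type, bucket name) pairs, from the item's parent up to the root.
using Location = std::vector<std::pair<std::string, std::string>>;

struct ToyCrushMap {
  std::map<int, Location> locations;

  // Fills *out and returns 0 on success, or -ENOENT if the id is unknown,
  // following the usual negative-errno convention.
  int get_full_location_ordered(int id, Location *out) const {
    auto it = locations.find(id);
    if (it == locations.end())
      return -ENOENT;
    *out = it->second;
    return 0;
  }
};
```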
Haomai Wang
fd57d99b6b Fix rbd bench-write improper behavior
"rbd bench-write" issues all write operations at the same offset at the same
time, which makes the performance results reported by this command unrepresentative.

fix #7066

Co-Author: Rongze Zhu <rongze@unitedstack.com>
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2013-12-28 18:33:37 +08:00
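To illustrate the problem, a minimal sketch of giving each in-flight write its own offset: writes are laid out sequentially and wrap at the end of the image. `bench_write_offsets` is a hypothetical helper; the commit does not say which offset pattern the actual fix uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: offsets for `count` in-flight writes of `io_size`
// bytes each, laid out sequentially and wrapping at `image_size`.
// Each write in a batch that fits within the image gets its own offset,
// instead of all writes sharing the same one.
std::vector<uint64_t> bench_write_offsets(uint64_t start, uint64_t io_size,
                                          uint64_t image_size, unsigned count) {
  std::vector<uint64_t> offsets;
  offsets.reserve(count);
  uint64_t off = start;
  for (unsigned i = 0; i < count; ++i) {
    if (off + io_size > image_size)
      off = 0;  // wrap around to the start of the image
    offsets.push_back(off);
    off += io_size;
  }
  return offsets;
}
```

For example, six 4096-byte writes against a 16384-byte image land at 0, 4096, 8192 and 12288 before wrapping back to 0 and 4096.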
Yehuda Sadeh
23f715ba82 Merge pull request #1005 from ceph/wip-rgw-leak
rgw: fix leak of RGWProcess

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-12-27 16:52:30 -08:00
Sage Weil
f9f5c37149 rgw: fix leak of RGWProcess
Introduced by a3e50b09a1.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-27 16:36:21 -08:00
Sage Weil
96fe80dbd7 osd: preserve user_version in snaps/clones
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-27 16:32:22 -08:00
Sage Weil
80b5487671 ceph_test_rados: test read from snapshots
This was disabled back in 2011, c54aa7db3b.
Whoops!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-27 16:32:21 -08:00
Sage Weil
2f8b602910 osd/OSDMap: observe 'osd crush chooseleaf type' option for initial rules
This option was dropped by 2a7fcc35b8.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-27 13:45:34 -08:00
Josh Durgin
9c068939c4 Merge branch 'rbd-map-options'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-27 09:56:55 -08:00
Ilya Dryomov
9b7364d245 rbd: expose options available to rbd map
Add a -o / --options option, which allows users to specify
rbd-specific and generic ceph client and osd options available at
mapping time in a comma-separated list (similar to mount(8) mount
options).

Exposed options are:

- fsid=%s
- ip=%s
- share
- noshare
- crc
- nocrc
- osdkeepalive=%d
- osd_idle_ttl=%d
- rw
- ro (equivalent to existing --read-only flag)

The rw/ro < 3.7 kernels compatibility kludge added in commit
fb0f198644 is preserved.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2013-12-27 09:56:24 -08:00
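The option string format is the same as for mount(8): comma-separated entries that are either a bare flag (share, noshare, crc, ro, ...) or a key=value pair (fsid=%s, osdkeepalive=%d, ...). A minimal sketch of splitting such a string, with `parse_map_options` as a hypothetical helper; how rbd itself validates or passes these options on is not shown here.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for a mount(8)-style option string such as
// "ro,noshare,osdkeepalive=5". Flags without '=' map to an empty value.
std::map<std::string, std::string> parse_map_options(const std::string &opts) {
  std::map<std::string, std::string> result;
  std::istringstream in(opts);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.empty())
      continue;  // tolerate stray commas such as "ro,,share"
    auto eq = item.find('=');
    if (eq == std::string::npos)
      result[item] = "";
    else
      result[item.substr(0, eq)] = item.substr(eq + 1);
  }
  return result;
}
```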
Sage Weil
b8f42b6b4d Merge pull request #1001 from dachary/wip-forward-tid
messages: add tid to string form of MForward

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-27 07:59:43 -08:00
Loic Dachary
542f8d307c Merge pull request #1002 from yuyuyu101/wip-7062
Lack of "start" member function declare in WBThrottle.h
make check runs ok

Reviewed-by: Loic Dachary <loic@dachary.org>
2013-12-27 04:28:59 -08:00
Haomai Wang
b3bda085b6 Lack of "start" member function declare in WBThrottle.h
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2013-12-27 18:11:09 +08:00
Loic Dachary
4a9c770953 messages: add tid to string form of MForward
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 07:30:04 +01:00
Sage Weil
b8fb366eab Merge pull request #837 from ceph/port/fallocate
FileJournal: zero-fill in-lieu of posix_fallocate

We may want to change that to a #warning later...

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-26 21:33:39 -08:00
Sage Weil
f5b698cb1d Merge pull request #982 from dachary/wip-default-crush-rule
osd: add default crush rule for erasure pools

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-26 21:29:36 -08:00
Sage Weil
39a9c323dc Merge pull request #974 from dachary/wip-build-depends
packaging: make check needs argparse and uuidgen

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-26 21:26:02 -08:00
Sage Weil
a39226ac9a Merge pull request #994 from yuyuyu101/wip-7062
Fix WBThrottle thread disappear problem

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-26 21:25:08 -08:00
Loic Dachary
67f99f3455 packaging: make check needs argparse and uuidgen
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 06:14:02 +01:00
Josh Durgin
8f3ad4e3b9 Merge pull request #1000 from ceph/wip-rbd-tinc-5426
fix #5426 race in librbd

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-26 18:53:02 -08:00
Josh Durgin
4cea7895da librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open.  If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.

The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.

Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-26 17:40:34 -08:00
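The fix is purely about ordering: update image-owned state first, then call the user, and touch only the completion's own state afterwards. A minimal sketch of that ordering, assuming a hypothetical `ToyCompletion` in which a plain counter stands in for the perfcounters; this is not the actual librbd AioCompletion code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical simplification of the ordering described above. `counters`
// stands in for per-image perfcounters that become invalid once the user
// closes the image from inside its callback.
struct ToyCompletion {
  uint64_t *counters = nullptr;          // owned by the image, not by us
  std::function<void()> user_callback;

  void complete() {
    // 1. Touch image-owned state first, while the image is still open.
    ++*counters;
    counters = nullptr;  // never touch image state after this point
    // 2. Only then hand control to the user, who may close the image.
    if (user_callback)
      user_callback();
    // 3. Any remaining cleanup would use only the completion's own state.
  }
};
```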
Loic Dachary
f8a4001440 osd: create default ruleset for erasure pools
The ruleset configured by --osd_pool_default_crush_erasure_ruleset is
created to be suitable for erasure coded pools when OSDMap::build_simple
is called to build the default OSD map of a new cluster.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:28:04 +01:00
Loic Dachary
8b2b5a33bd mon: implement --osd-pool-default-crush-erasure-ruleset
It must be different from the replicated default.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:13:47 +01:00
Loic Dachary
dd81858ca6 mon: implement --osd-pool-default-crush-replicated-ruleset
--osd-pool-default-crush-replicated-ruleset replaces
--osd-pool-default-crush-rule

If --osd-pool-default-crush-rule is set it takes precedence over
--osd-pool-default-crush-replicated-ruleset and a deprecation warning is
displayed.

The CrushWrapper::get_osd_pool_default_crush_replicated_ruleset helper is
used to implement this behaviour.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:13:47 +01:00
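The precedence rule above can be sketched as a small helper: the deprecated option wins when it is set, with a warning, otherwise the new option is used. `ToyConf` and `get_default_replicated_ruleset` are hypothetical, and using -1 as the "unset" value of the deprecated option is an assumption of this sketch.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical config struct; the -1 "unset" sentinel for the deprecated
// option is an assumption made for this sketch.
struct ToyConf {
  int osd_pool_default_crush_rule = -1;              // deprecated
  int osd_pool_default_crush_replicated_ruleset = 0;
};

// The deprecated option takes precedence when set and triggers a warning;
// otherwise the new option is returned.
int get_default_replicated_ruleset(const ToyConf &conf) {
  if (conf.osd_pool_default_crush_rule != -1) {
    std::cerr << "osd_pool_default_crush_rule is deprecated, use "
                 "osd_pool_default_crush_replicated_ruleset instead\n";
    return conf.osd_pool_default_crush_rule;
  }
  return conf.osd_pool_default_crush_replicated_ruleset;
}
```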
Loic Dachary
2a7fcc35b8 osd: use CrushWrapper::add_simple_ruleset
Replace the manually crafted ruleset in OSDMap::build_simple_crush_map*
with calls to add_simple_ruleset. The generated rulesets do not have the
same behavior, but that presumably does not cause any backward
compatibility problem because they are only created when a new cluster
is being initialized.

The prototypes of OSDMap::build_simple* are modified to allow for a
return code and display of a human readable error message.

The --osd-min-rep and --osd-max-rep configuration options are removed:
they were only used in the code that was removed.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:13:47 +01:00
Loic Dachary
a10fc025d7 osd: build_simple creates a single rule
The three rules created by build_simple are identical. They are replaced
by a single rule named replicated_rule which is set to be used by the
data, rbd and metadata pools.

Instead of hardcoding the ruleset number to zero, it is read from
osd_pool_default_crush_ruleset which defaults to zero.

The CEPH_DEFAULT_CRUSH_REPLICATED_RULESET enum is moved from osd_type.h to
config.h because it may be needed when osd_type.h is not included.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:13:47 +01:00
Loic Dachary
15b695937b crush: set min_rep and max_rep depending on mode
This assumes firstn is used for replicated pools and indep for erasure
coded pools. This is a strong constraint, but it is unlikely to make the
resulting ruleset unfit to be used in most cases.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-27 00:10:55 +01:00
Loic Dachary
da67f7c317 crush: add rule_type argument to add_simple_ruleset
Instead of hardcoded pg_pool_t::TYPE_REPLICATED

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 19:50:37 +01:00
Loic Dachary
2ae9c1c049 partially rename rule to ruleset
Where code is changed, take the opportunity to rename rule to ruleset to
improve naming consistency.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 18:56:50 +01:00
Josh Durgin
e244be1846 Merge branch 'leseb-doc-rbd-havana'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-26 09:55:17 -08:00
Sébastien Han
8b0b32bd1b doc: Add OpenStack Havana documentation
New features were introduced during the Havana cycle.
This patch updates the documentation accordingly.

Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
2013-12-26 09:55:03 -08:00
Loic Dachary
6e92ed1ea2 osd: factorize build_simple and build_simple_from_conf
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 10:01:19 +01:00
Loic Dachary
5cf2cdc073 qa: remove osd pool create erasure tests
Creating an erasure pool will crash the OSD because OSD::_make_pg
asserts if the type is not replicated. The tests related to erasure
coded pool creation are removed from qa/workunits/cephtool/test.sh.

The osd-create-pool.sh unit test covers the cases removed from test.sh
more extensively. The intent is to check the interactions with the MON
only, therefore it does not run an OSD and the absence of erasure code
placement group backend implementation is not an issue.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 10:01:19 +01:00
Loic Dachary
c6d876aa5f mon: osd-pool-create must not loop forever on kill
Looping forever on kill does not serve any useful purpose.
Reduce the verbosity of the exit trap to help diagnose error
conditions.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 10:01:19 +01:00
Loic Dachary
272eed3583 client: SyntheticClient uses the first available pool
The pool choice is unrelated to CEPH_DATA_RULE, which is replaced by
SYNCLIENT_FIRST_POOL.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 09:12:03 +01:00
Loic Dachary
20b3da059d mon: MDS data and metadata pool numbers are hardcoded
The MDS assumes pool 0 and 1 are suitable for data and metadata
respectively. Instead of relying on the CEPH_DATA_RULE and
CEPH_METADATA_RULE constants that only match by chance, set a hardcoded
value specific to MDS to reduce the fragility of the hardcoded
assumption.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-26 09:12:03 +01:00
Haomai Wang
bf24317bcd Fix WBThrottle thread disappear problem
The new ceph_osd.cc code does ObjectStore init work before
global_init_daemonize(), and the WBThrottle thread is created when the
ObjectStore is constructed. So after daemon(), the WBThrottle thread no
longer exists in the new process, which results in a deadlock.

When "cur_ios", a member of WBThrottle, hits the hard limit, there are
two ways to decrease it. The first is the WBThrottle thread, which is
gone after daemonizing; the other is the SyncThread. The SyncThread
blocks at op_tp.pause() because the threads in op_tp (the thread pool)
are blocked at wbthrottle.throttle(FileStore::doop). So no thread can
continue processing jobs in the filestore layer, and all threads are
waiting.

Fix #7062 (http://tracker.ceph.com/issues/7062)

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2013-12-26 11:33:05 +08:00
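The underlying rule is that fork(), and therefore daemon(), copies only the calling thread into the child, so a helper thread must be started after daemonizing, not in a constructor that runs before it. Commit b3bda085b6 in this log mentions a "start" member function declaration in WBThrottle.h. A minimal sketch of the deferred-start pattern, with `ToyThrottle` as a hypothetical class rather than the real WBThrottle:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical throttle whose worker thread is started explicitly rather
// than in the constructor, so it can be started after daemonizing.
class ToyThrottle {
 public:
  ToyThrottle() = default;            // no thread is created here
  ~ToyThrottle() { stop(); }

  void start() {                      // call after daemon()
    running_ = true;
    worker_ = std::thread([this] {
      while (running_)                // placeholder for flushing work
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    });
  }

  void stop() {
    running_ = false;
    if (worker_.joinable())
      worker_.join();
  }

  bool started() const { return worker_.joinable(); }

 private:
  std::atomic<bool> running_{false};
  std::thread worker_;
};
```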
Sage Weil
21a64c172c Merge pull request #995 from dachary/wip-deprecated
rados: deprecated attribute has no argument
2013-12-25 16:09:31 -08:00
Ilya Dryomov
87b8e54fae ceph_argparse: kill _daemon versions of argparse calls
Commit c76bbc2e6d, which introduced _daemon versions of some of the
argparse calls, also changed the behaviour of non-_daemon versions.
The change resulted in incorrect error messages, e.g.

  $ ./rbd create b0 --size
  rbd: extraneous parameter --size

instead of what should have been

  $ ./rbd create b0 --size
  Option --size requires an argument.

The users of _daemon versions were added in commit be801f6c50 and
removed in commit f26bd55e57, so just kill the _daemon versions and
restore the old behaviour.  (This effectively reverts commit
c76bbc2e6df1.)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2013-12-25 21:41:16 +02:00
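The restored behaviour distinguishes "an option is missing its value" from "an unexpected parameter was given". A minimal sketch of that check, producing the error text quoted in the commit; `parse_value_option` is a hypothetical helper, not the ceph_argparse API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: when an option that takes a value is the last
// argument, report a missing argument rather than an extraneous one.
// Returns an empty string on success (or if the option is absent).
std::string parse_value_option(const std::vector<std::string> &args,
                               const std::string &opt, std::string *value) {
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] != opt)
      continue;
    if (i + 1 >= args.size())
      return "Option " + opt + " requires an argument.";
    *value = args[i + 1];
    return "";
  }
  return "";
}
```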
Loic Dachary
ea4724d5aa rados: deprecated attribute has no argument
The optional message argument of the deprecated attribute was introduced
in gcc 4.5 (http://gcc.gnu.org/gcc-4.5/changes.html), and centos6 ships
an older gcc.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-25 10:44:07 +01:00
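One way to keep the message where the compiler supports it is to select the attribute form by compiler version. A minimal sketch; the macro `TOY_DEPRECATED` and the class `ToyIoCtx` are hypothetical, while get_version()/get_version64() mirror the method names mentioned in commit 0cdbc97614 below. Note that clang reports itself as GNUC 4.2 and would take the bare form here.

```cpp
#include <cassert>
#include <cstdint>

// GCC accepts a message argument to the deprecated attribute only from 4.5
// on, so fall back to the bare form on older GCC and to nothing elsewhere.
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define TOY_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(__GNUC__)
#define TOY_DEPRECATED(msg) __attribute__((deprecated))
#else
#define TOY_DEPRECATED(msg)
#endif

// Hypothetical API: the old 32-bit accessor is kept but flagged, so callers
// get a compile-time warning pointing them at the 64-bit variant.
struct ToyIoCtx {
  uint64_t version = 0;
  TOY_DEPRECATED("use get_version64() instead")
  uint32_t get_version() { return static_cast<uint32_t>(version); }
  uint64_t get_version64() { return version; }
};
```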
Loic Dachary
ab75df3c00 Merge pull request #988 from ceph/wip-crush-location
add 'crush location' config option

make check is ok

Reviewed-by: Loic Dachary <loic@dachary.org>
2013-12-25 01:07:04 -08:00
Sage Weil
7c9638f24b Merge pull request #993 from ceph/wip-librados-lock
Wip librados lock

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-24 10:51:01 -08:00
Yehuda Sadeh
e7bf5b2970 librados: lockless get_instance_id()
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-12-24 09:00:11 -08:00
Yehuda Sadeh
771da13b66 objecter, librados: create Objecter::Op in two phases
(currently only in some librados operations)
Create the op first, and only then take the lock and submit it, to
reduce lock contention.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-12-24 09:00:03 -08:00
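The two-phase idea can be sketched as: build the op (allocation, copying) outside the critical section, then hold the lock only while queueing it. `ToyOp` and `ToyObjecter` are hypothetical simplifications, not the Objecter API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified op and submitter illustrating the two phases.
struct ToyOp {
  std::string oid;
  std::vector<char> data;
};

class ToyObjecter {
 public:
  // Phase 1: build the op without holding the lock.
  std::unique_ptr<ToyOp> prepare_op(const std::string &oid,
                                    const std::vector<char> &data) {
    std::unique_ptr<ToyOp> op(new ToyOp);
    op->oid = oid;
    op->data = data;
    return op;
  }

  // Phase 2: take the lock only for the short step of queueing the op.
  void submit(std::unique_ptr<ToyOp> op) {
    std::lock_guard<std::mutex> guard(lock_);
    inflight_.push_back(std::move(op));
  }

  std::size_t inflight() {
    std::lock_guard<std::mutex> guard(lock_);
    return inflight_.size();
  }

 private:
  std::mutex lock_;
  std::vector<std::unique_ptr<ToyOp>> inflight_;
};
```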
Sage Weil
5ff30d6cf3 crush/CrushWrapper: note about get_immediate_parent()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-24 08:01:15 -08:00
Sage Weil
0cdbc97614 librados: mark old get_version() as deprecated
Use the newly-discovered (for me) deprecated attribute to mark the old
get_version() method and point users toward get_version64().  And fix a
couple of users in the kvstore code!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-24 07:58:08 -08:00
Sage Weil
006449ddb5 librados: deprecate aio_operate() read variant that takes snapid
The argument was ignored.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-24 07:58:07 -08:00
Sage Weil
909f8a42b6 librbd: localize or distribute parent (snap) reads
The parent is always a snapshot.  We may want to treat it differently
than other snaps by virtue of it (likely) being a more highly-shared
image.

By default, localize parent reads.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-24 07:58:07 -08:00
Sage Weil
22df773251 osdc/Objecter: use crush location and distance for LOCALIZE_READS
Use the hierarchy in the CRUSH map to determine what the closest
replica is.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-24 07:58:07 -08:00
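A minimal sketch of choosing the closest replica from CRUSH locations: the distance is the lowest hierarchy level (host, then rack, then root) at which the client and the replica share a bucket, and the replica with the smallest distance wins. Modelling a location as type=name pairs follows the 'crush location' option above; `crush_distance` and `closest_replica` are hypothetical helpers, not the Objecter or CrushWrapper code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a location maps CRUSH bucket types to names, and
// `levels` lists the types from the lowest (e.g. "host") to the highest.
using CrushLoc = std::map<std::string, std::string>;

// Distance = index of the lowest level both locations share; smaller is
// closer. Returns levels.size() if they share nothing.
std::size_t crush_distance(const CrushLoc &a, const CrushLoc &b,
                           const std::vector<std::string> &levels) {
  for (std::size_t i = 0; i < levels.size(); ++i) {
    auto ia = a.find(levels[i]);
    auto ib = b.find(levels[i]);
    if (ia != a.end() && ib != b.end() && ia->second == ib->second)
      return i;
  }
  return levels.size();
}

// Index of the replica closest to the client (the first one on ties).
std::size_t closest_replica(const CrushLoc &client,
                            const std::vector<CrushLoc> &replicas,
                            const std::vector<std::string> &levels) {
  std::size_t best = 0;
  std::size_t best_dist = levels.size() + 1;
  for (std::size_t i = 0; i < replicas.size(); ++i) {
    std::size_t d = crush_distance(client, replicas[i], levels);
    if (d < best_dist) {
      best = i;
      best_dist = d;
    }
  }
  return best;
}
```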