
32728 Commits

Author SHA1 Message Date
Samuel Just
ebb865b12c Merge pull request #1603 from ceph/wip-7983
osd/ReplicatedPG: do not hit_set_persist while potentially backfilling hit_set_*

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-04 15:17:00 -07:00
Dan Mick
f2edd959fc Merge pull request #1604 from ceph/wip-7992
ceph-post-file: fix installation of ssh key files
2014-04-04 14:41:02 -07:00
Sage Weil
2f6a62b457 ceph-post-file: fix installation of ssh key files
Fixes: #7992
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 14:39:56 -07:00
Sage Weil
e02b7f93ab osd/ReplicatedPG: do not hit_set_persist while potentially backfilling hit_set_*
The hit_set transactions may include both a modify of the new hit_set and
deletion of an old one, spanning the backfill boundary, and we may end up
sending a backfill target a blank transaction that does not correctly
remove the old object.  Later it will notice the stray object and
throw an assertion.

Fix this by skipping hit_set_persist() if any of the backfill targets are
still working on the very first hash value in the PG (which is where all
of the hit_set objects live).  This is coarse but simple.

Another solution would be to send separate ops for the trim/deletion and
new hit_set update, but that is a bit more complex and a bit more
runtime overhead (twice the messages).

Fixes: #7983
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 13:56:33 -07:00
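A minimal sketch of the coarse check described in e02b7f93ab. It assumes each backfill target's progress can be reduced to a single hash value below which every object has already been copied; the real OSD tracks an hobject_t per target. The function name `can_persist_hit_set` and its parameters are hypothetical.

```python
from typing import Iterable


def can_persist_hit_set(first_pg_hash: int, backfill_progress: Iterable[int]) -> bool:
    """Return False while any backfill target is still on the PG's first hash.

    Hypothetical model: each entry in backfill_progress is the hash value below
    which that target has received every object.  All hit_set objects live at
    first_pg_hash, so persisting is only safe once every target has moved past
    it; otherwise one transaction (modify the new hit_set, delete an old one)
    could straddle the backfill boundary.
    """
    return all(progress > first_pg_hash for progress in backfill_progress)


# No backfill in progress: nothing can straddle the boundary.
print(can_persist_hit_set(0, []))        # True
# One target has not finished the first hash value yet: skip persisting.
print(can_persist_hit_set(0, [0, 42]))   # False
```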
Sage Weil
4aef403dbc doc/release-notes: note about emperor backport of mon auth fix
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 12:59:41 -07:00
Joao Eduardo Luis
db266a3fb2 mon: MonCommands.h: have 'auth' read-only operations require 'x' cap
This reintroduces the same semantics that were in place in dumpling prior
to the refactoring of the cap/command matching code.

We haven't added this requirement to auth read-write operations as that
would have the potential to break a lot of well-configured keyrings once
the users upgraded, without any significant gain -- we assume that if
they have set 'rw' caps on a given entity, they are indeed expecting said
entity to be a sort-of-privileged entity with regard to monitor access.

Fixes: #7919

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 12:51:27 -07:00
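To make the rule in db266a3fb2 concrete, here is a hypothetical model of it: read-only `auth` commands need `x` in addition to `r`, while read-write ones still only need `rw`. The command subsets and the helpers `required_caps` and `is_allowed` are illustrative assumptions; the monitor's real cap matching is declared in MonCommands.h and is considerably richer.

```python
from typing import Set

# Illustrative subsets only; the real command table is much larger.
READ_ONLY_AUTH = {"auth get", "auth list"}
READ_WRITE_AUTH = {"auth add", "auth del"}


def required_caps(command: str) -> Set[str]:
    if command in READ_ONLY_AUTH:
        return {"r", "x"}   # read-only auth ops now also need 'x'
    if command in READ_WRITE_AUTH:
        return {"r", "w"}   # unchanged: 'rw' is still sufficient
    raise ValueError(f"unknown command: {command}")


def is_allowed(command: str, granted: str) -> bool:
    # Every required cap letter must appear in the granted cap string.
    return required_caps(command) <= set(granted)


print(is_allowed("auth get", "r"))    # False
print(is_allowed("auth get", "rx"))   # True
print(is_allowed("auth add", "rw"))   # True
```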
Samuel Just
82d2551c8c Merge pull request #1602 from ceph/wip-cache-create-fix
ReplicatedPG: fix CEPH_OSD_OP_CREATE on cache pools

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-04 10:34:40 -07:00
Ilya Dryomov
b219c8f917 ReplicatedPG: fix CEPH_OSD_OP_CREATE on cache pools
The following

./ceph osd pool create data-cache 8 8
./ceph osd tier add data data-cache
./ceph osd tier cache-mode data-cache writeback
./ceph osd tier set-overlay data data-cache

./rados -p data create foo
./rados -p data stat foo

results in

  error stat-ing data/foo: No such file or directory

even though foo exists in the data-cache pool, as it should.  STAT
checks for (exists && !is_whiteout()), but the whiteout flag isn't
cleared on CREATE as it is on WRITE and WRITEFULL.  The problem is
that, for newly created 0-sized cache pool objects, the CREATE handler in
do_osd_ops() doesn't get a chance to queue OP_TOUCH, and so the logic
in prepare_transaction() considers CREATE to be a read and therefore
doesn't clear the whiteout.  Fix it by allowing the CREATE handler to queue
OP_TOUCH at all times, mimicking WRITE and WRITEFULL behaviour.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-04 20:23:14 +04:00
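The failure mode described in b219c8f917 can be modeled in a few lines. This is a hypothetical simplification, not the ReplicatedPG code: `CacheObject`, `handle_create` and `prepare_transaction` are illustrative names, and the pre-fix condition for queueing OP_TOUCH is reduced to "the object does not exist yet".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheObject:
    exists: bool
    whiteout: bool


def stat(obj: CacheObject) -> bool:
    # STAT checks (exists && !is_whiteout()).
    return obj.exists and not obj.whiteout


def prepare_transaction(obj: CacheObject, queued_ops: List[str]) -> None:
    # An op that queued nothing is treated as a read, so the whiteout
    # flag is left untouched; a write-like op clears it.
    if queued_ops:
        obj.exists = True
        obj.whiteout = False


def handle_create(obj: CacheObject, always_touch: bool) -> None:
    queued: List[str] = []
    # Pre-fix behaviour in this model (always_touch=False): OP_TOUCH is
    # skipped whenever the cache already holds an entry for the object.
    if always_touch or not obj.exists:
        queued.append("OP_TOUCH")
    prepare_transaction(obj, queued)


old = CacheObject(exists=True, whiteout=True)
handle_create(old, always_touch=False)
print(stat(old))   # False, i.e. "No such file or directory"

new = CacheObject(exists=True, whiteout=True)
handle_create(new, always_touch=True)
print(stat(new))   # True
```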
Sage Weil
2bd548e915 Merge pull request #1600 from ceph/wip-7922
Wip 7922

Passes my manual testing and the new teuthology test case.

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 09:22:42 -07:00
David Zafman
be8b228140 osd: Send REJECT to all previously acquired reservations
When getting a REJECT from a backfill target, tell already GRANTed targets to
go back to RepNotRecovering state by sending a REJECT to them.

Fixes: #7922

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-03 22:13:17 -07:00
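A minimal synchronous sketch of the reservation rule in be8b228140, assuming a simple request/response per backfill target. The real primary drives this through its asynchronous recovery state machine; `reserve_backfill_targets`, `request` and `send_reject` are hypothetical names.

```python
from typing import Callable, List


def reserve_backfill_targets(targets: List[str],
                             request: Callable[[str], bool],
                             send_reject: Callable[[str], None]) -> bool:
    """Try to reserve every target; on any REJECT, release those already GRANTed."""
    granted: List[str] = []
    for target in targets:
        if request(target):
            granted.append(target)
        else:
            # Tell already-GRANTed targets to go back to RepNotRecovering.
            for g in granted:
                send_reject(g)
            return False
    return True


released: List[str] = []
ok = reserve_backfill_targets(["osd.1", "osd.2", "osd.3"],
                              lambda t: t != "osd.3", released.append)
print(ok, released)   # False ['osd.1', 'osd.2']
```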
Sage Weil
18201efd65 doc/release-notes: v0.79 release notes
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-03 18:28:15 -07:00
Dan Mick
4dc62669ec Fix byte-order dependency in calculation of initial challenge
Fixes: #7977
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 18:28:15 -07:00
Sage Weil
80a1ed8a74 Merge pull request #1599 from ceph/wip-7978
rgw: only look at next placement rule if we're not at the last rule

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 17:44:13 -07:00
Yehuda Sadeh
0552ecbabb rgw: only look at next placement rule if we're not at the last rule
Fixes: #7978
We tried to move to the next placement rule, but we were already at the
last one, so we ended up looping forever.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-04-03 15:15:41 -07:00
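The essence of the fix in 0552ecbabb is a bounds check before moving to the next placement rule. The sketch below is hypothetical (`select_placement_rule` and the `usable` predicate are not rgw APIs) and only shows a loop that stops after the last rule instead of retrying forever.

```python
from typing import Callable, List, Optional


def select_placement_rule(rules: List[str],
                          usable: Callable[[str], bool]) -> Optional[str]:
    i = 0
    while i < len(rules):
        if usable(rules[i]):
            return rules[i]
        # Advance only while another rule remains; the loop condition stops
        # us after the last rule rather than spinning on it.
        i += 1
    return None


print(select_placement_rule(["default", "fast"], lambda r: r == "fast"))  # fast
print(select_placement_rule(["default", "fast"], lambda r: False))        # None
```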
Sage Weil
31df91e091 osd: add 'osd debug reject backfill probability' option
This will make the OSD randomly reject backfill reservation requests.  This
exercises the failure code paths but does not break overall behavior
because the primary will back off and retry later.

This should help us reproduce #7922.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-03 12:06:08 -07:00
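A sketch of how a probability knob like `osd debug reject backfill probability` can exercise the failure path while the primary's back-off-and-retry keeps overall behaviour intact. Only the option name comes from 31df91e091; the functions, the retry bound and the use of Python's `random` module are assumptions.

```python
import random
from typing import Optional


def answer_reservation(reject_probability: float, rng: random.Random) -> str:
    # Reject randomly even when a slot is free, to exercise failure paths.
    if rng.random() < reject_probability:
        return "REJECT"
    return "GRANT"


def reserve_with_retry(reject_probability: float, rng: random.Random,
                       max_attempts: int) -> Optional[int]:
    """Return the attempt number that was granted, or None if all were rejected."""
    for attempt in range(1, max_attempts + 1):
        if answer_reservation(reject_probability, rng) == "GRANT":
            return attempt
        # A real primary would back off here before retrying.
    return None


rng = random.Random(7922)
print(reserve_with_retry(0.0, rng, 5))   # 1: never rejected
print(reserve_with_retry(1.0, rng, 5))   # None: always rejected
```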
Sage Weil
90c4540b5b Merge pull request #1598 from ceph/wip-test-alloc-hint-ec-fix
qa: test_alloc_hint: set ec ruleset-failure-domain to osd

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 11:45:21 -07:00
Sage Weil
9f41975c40 Merge pull request #1581 from ceph/wip-init
a few deb changes
2014-04-03 11:44:29 -07:00
Ilya Dryomov
d323634024 qa: test_alloc_hint: set ec ruleset-failure-domain to osd
Create a custom profile with ruleset-failure-domain=osd.  (The default
ruleset-failure-domain=host won't do because this script assumes and
works only if all osds are on the same host.)  While at it, set k and m
explicitly to avoid trouble in the future.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-03 21:16:14 +04:00
Sage Weil
60d1975682 Merge pull request #1593 from dachary/wip-vstart-erasure-code-default
vstart: set a sensible default for ruleset-failure-domain

Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-03 09:57:49 -07:00
Sage Weil
cdcd8368a7 Merge pull request #1596 from ceph/wip-vstop-unmap
Unmap rbd images when stopping the whole cluster

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 09:57:04 -07:00
Ilya Dryomov
8e46fe00fa stop.sh: unmap rbd images when stopping the whole cluster
Unmap rbd images when stopping the whole cluster.  Not doing so results
in images that cannot be unmapped until the same cluster is brought
back up.  Issue a warning if we failed to unmap all images.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-03 18:14:57 +04:00
Ilya Dryomov
afc5dc530c stop.sh: do not trace commands
Command tracing here doesn't bring any value and simply pollutes the
terminal, as the script always runs to completion.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-03 18:14:57 +04:00
Ilya Dryomov
0110a19b50 stop.sh: indent 4 spaces universally
Currently there is a mix of tabs and 4-space indentation.  Switch to
4-space indentation throughout.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-03 18:03:23 +04:00
Loic Dachary
e4a8535ad1 vstart: set a sensible default for ruleset-failure-domain
Set ruleset-failure-domain=osd so that

  ./ceph osd pool create ecpool 12 12 erasure
  ./rados --pool ecpool put SOMETHING /etc/group

works by default. When using a vstart cluster the default failure
domain (host) won't work because all OSDs are in "localhost".

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-04-03 14:07:19 +02:00
Josh Durgin
89f38c09f8 Merge pull request #1592 from ceph/wip-7965
lockdep: fix when instantiated multiple times (bug 7965)

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-02 17:03:09 -07:00
Sage Weil
c43822cdaf lockdep: reset state on shutdown
If we shut down, clear out all of the lockdep state.  This ensures that if
we start up again on another cct, we will not be confused by old type ids
and dependency state.

Possibly contributed to #7965.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-02 16:46:33 -07:00
Sage Weil
7a49f3da55 lockdep: do not initialize if already started
If we have already registered a cct for lockdep, do not accept another one.
We already check that the cct matches when we shut down.  Thus we will run
for the life span of a single cct and no longer.

Fixes: #7965
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-02 16:46:30 -07:00
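The two lockdep changes above (refuse a second cct while one is registered, and clear all state on shutdown) can be sketched as a small registry. `Lockdep`, `register`, `unregister` and `lock_id` are hypothetical names, and only type ids are modeled; the real implementation is C++ with global state that also tracks lock dependencies.

```python
from typing import Dict, Optional


class Lockdep:
    def __init__(self) -> None:
        self.cct: Optional[object] = None
        self.type_ids: Dict[str, int] = {}

    def register(self, cct: object) -> bool:
        # Do not initialize if already started with some cct.
        if self.cct is not None:
            return False
        self.cct = cct
        return True

    def lock_id(self, name: str) -> int:
        # Assign a stable type id per lock name (illustrative only).
        return self.type_ids.setdefault(name, len(self.type_ids))

    def unregister(self, cct: object) -> None:
        # The cct must match the one we started with.
        if cct is not self.cct:
            raise RuntimeError("lockdep shutdown with a different cct")
        # Reset all state so a later cct does not see stale type ids.
        self.type_ids.clear()
        self.cct = None


first, second = object(), object()
ld = Lockdep()
print(ld.register(first))    # True
print(ld.register(second))   # False: already started with `first`
ld.unregister(first)
print(ld.register(second))   # True: state was reset on shutdown
```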
Samuel Just
eae5a37779 Merge pull request #1591 from ceph/wip-7915
mon: bump snap_epoch when adding a tier (fixes 7915)

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-02 16:13:59 -07:00
Sage Weil
6bf46e23e0 OSDMap: bump snap_epoch when adding a tier
When we make an existing pool a tier, we start copying the snap metadata
from the base tier.  That includes removed_snaps.  In order for the OSD
to recognize that this value is changing for the first time, we need to
set snap_epoch, or else the OSD doesn't update its in-memory PGPool
with removed snaps and we eventually hit an assertion failure because
PGPool::cached_remove_snaps is incorrect (e.g., empty).

Fix this by bumping snap_epoch when we add the new tier.

Fixes: #7915
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-02 16:03:37 -07:00
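A hypothetical model of why the bump in 6bf46e23e0 matters, assuming (as the commit describes) that the OSD only refreshes its cached removed snaps when the pool's snap_epoch says they changed in the epoch being applied. `PoolInfo`, `PGPoolCache` and `add_tier` are illustrative names, not the OSDMap or PGPool API.

```python
from dataclasses import dataclass


@dataclass
class PoolInfo:
    removed_snaps: frozenset = frozenset()
    snap_epoch: int = 0


@dataclass
class PGPoolCache:
    cached_removed_snaps: frozenset = frozenset()

    def update(self, pool: PoolInfo, epoch: int) -> None:
        # Refresh only if the pool says its snap metadata changed in this epoch.
        if pool.snap_epoch == epoch:
            self.cached_removed_snaps = pool.removed_snaps


def add_tier(tier: PoolInfo, base: PoolInfo, epoch: int,
             bump_snap_epoch: bool) -> PoolInfo:
    # Making a pool a tier copies the base tier's snap metadata...
    updated = PoolInfo(removed_snaps=base.removed_snaps, snap_epoch=tier.snap_epoch)
    # ...and the fix also records that this metadata changed in `epoch`.
    if bump_snap_epoch:
        updated.snap_epoch = epoch
    return updated


base = PoolInfo(removed_snaps=frozenset({3, 4}), snap_epoch=5)
cache = PoolInfo()

stale = PGPoolCache()
stale.update(add_tier(cache, base, epoch=10, bump_snap_epoch=False), epoch=10)
print(sorted(stale.cached_removed_snaps))   # []: the OSD never noticed the change

fresh = PGPoolCache()
fresh.update(add_tier(cache, base, epoch=10, bump_snap_epoch=True), epoch=10)
print(sorted(fresh.cached_removed_snaps))   # [3, 4]
```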
Samuel Just
27e353ccc1 Merge pull request #1580 from ceph/wip-7937
osd: fix scrub logic for snapdir object

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-02 15:15:56 -07:00
Samuel Just
01445d5c62 ReplicatedPG::_scrub: don't bail early for snapdir
Fixes: #7937
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-02 15:12:41 -07:00
Mohammad Salehe
7909262f21 debian: fix control to allow upgrades
Signed-off-by: Mohammad Salehe <salehe+dev@gmail.com>
2014-04-02 11:29:38 -07:00
Sage Weil
250a10296b Merge pull request #1590 from ceph/wip-7939
PG: set role for replicated even if role != shard

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-02 10:52:11 -07:00
Samuel Just
d6258b63e5 Merge pull request #1579 from ceph/wip-7907
osd/ReplicatedPG: mark_unrollbackable when _rollback_to head

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-02 10:35:38 -07:00
Sage Weil
17732dc0c8 debian: move rbdmap config and sysvinit/upstart scripts into ceph-common
Fixes: #7171
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-02 10:29:08 -07:00
Sage Weil
86a032f2c2 Merge pull request #1586 from ceph/wip-dirfrag
mds: fix check for merging/splitting dirfrag

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-02 08:44:33 -07:00
Sage Weil
84e62e9f0e Merge pull request #1587 from onlyjob/debian
init.d: correcting rbdmap LSB header / init order:

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-02 08:43:02 -07:00
Dmitry Smirnov
1d42de5446 init.d: correcting rbdmap init order:
 * Require "$remote_fs" since it guarantees /usr availability
   (rbd executable is in /usr/bin/rbd)
 * Speed up init.d rbd mapping on machines acting as MON/OSD
   by starting rbdmap after /init.d/ceph (when possible) and
   shutting down rbd before ceph.
 * Map rbd devices before starting X (helpful when /home is mounted from rbd).
2014-04-03 01:25:28 +11:00
Yan, Zheng
771e88a401 mds: fix check for merging/splitting dirfrag
Check the actual number of items instead of the number of cached items.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 15:32:33 +08:00
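A sketch of the corrected decision in 771e88a401, assuming the authoritative size is the dirfrag's file plus subdirectory count rather than the number of dentries currently cached. `frag_action`, `split_size` and `merge_size` are hypothetical names, and the threshold values below are illustrative.

```python
from typing import Optional


def frag_action(num_files: int, num_subdirs: int,
                split_size: int, merge_size: int) -> Optional[str]:
    """Decide whether a dirfrag should be split, merged, or left alone.

    The inputs are meant to be the authoritative counts for the whole dirfrag,
    not just the dentries that happen to be cached in memory.
    """
    items = num_files + num_subdirs
    if items > split_size:
        return "split"
    if items < merge_size:
        return "merge"
    return None


print(frag_action(12000, 0, split_size=10000, merge_size=50))  # split
print(frag_action(40, 0, split_size=10000, merge_size=50))     # merge
```

The second call shows the shape of the old bug: feeding in only the 40 cached dentries of that same 12000-entry dirfrag would wrongly suggest a merge.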
Sage Weil
edb8a5965e Merge pull request #1583 from ceph/wip-largedir
Wip largedir

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-01 21:47:51 -07:00
Yan, Zheng
43bc39beab mds: ignore CDir::check_rstats() when debug_scatterstat is off
It uses a lot of CPU when the dirfrag is large.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 12:19:34 +08:00
Yan, Zheng
5a9b99aa91 mds: initialize bloom filter according to dirfrag size
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 12:19:34 +08:00
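The commit does not say which sizing rule is used. As an illustration only, the textbook Bloom filter formulas size a filter for n expected items and a target false-positive rate p as m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions. `bloom_params` and the choice of p are assumptions, not the MDS's actual parameters.

```python
import math
from typing import Tuple


def bloom_params(expected_items: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (bits, hash_functions) for a Bloom filter sized to expected_items."""
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    n = max(1, expected_items)
    bits = math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


print(bloom_params(1000, 0.01))   # (9586, 7)
```

Sizing from the dirfrag's item count, rather than using one fixed size, keeps the false-positive rate roughly constant as directories grow.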
Yan, Zheng
16af25fba3 mds: add dentries in dirfrag to LRU in reverse order
Files in a dirfrag are usually processed in the order of readdir
results. Files at the beginning of the readdir results are more likely
to be used in the future than files at the end.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 12:19:26 +08:00
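A small illustration of why the insertion order in 16af25fba3 matters, using an `OrderedDict` as a stand-in LRU whose front is evicted first. The `LRU` class and `load_dirfrag` are hypothetical; the MDS has its own LRU implementation.

```python
from collections import OrderedDict
from typing import List


class LRU:
    """Stand-in LRU: the front of the OrderedDict is evicted first."""

    def __init__(self) -> None:
        self._items: "OrderedDict[str, None]" = OrderedDict()

    def insert(self, key: str) -> None:
        self._items[key] = None
        self._items.move_to_end(key)  # newest entries sit at the MRU end

    def expire(self) -> str:
        key, _ = self._items.popitem(last=False)  # evict from the LRU end
        return key


def load_dirfrag(lru: LRU, names_in_readdir_order: List[str]) -> None:
    # Insert in reverse so the first readdir entries end up most recently
    # used and are trimmed last.
    for name in reversed(names_in_readdir_order):
        lru.insert(name)


lru = LRU()
load_dirfrag(lru, ["a", "b", "c"])
print(lru.expire(), lru.expire())   # c b
```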
Sage Weil
d351e5fb12 Merge pull request #1584 from ceph/wip-multimds
Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-01 21:01:07 -07:00
Yan, Zheng
06ecb2c74c mds: handle freeze authpin race
For a cross-authority rename, the MDS first freezes the source inode's
authpin. This happens while the source dentry isn't locked, so by the
time the inode's authpin becomes frozen, the source dentry may have
changed and be linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 11:03:11 +08:00
Yan, Zheng
d1967f3251 mds: treat cluster as degraded when there is clientreplay MDS
This forbids exporting subtrees and fragmenting dirfrags when there
is an MDS in clientreplay state. While replaying client requests, the
MDS may need to authpin some remote objects, and exporting subtrees or
fragmenting dirfrags would slow down that replay.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 11:03:10 +08:00
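A hypothetical sketch of the gating rule in d1967f3251: treat the cluster as degraded if any MDS is in a recovery state, now including clientreplay, and refuse subtree exports and dirfrag fragmentation while it is. The state set is simplified, and `cluster_is_degraded` and `may_export_or_fragment` are illustrative names, not the MDSMap API.

```python
from typing import Dict

# Simplified set of MDS states that make the cluster "degraded" in this model.
DEGRADED_STATES = {"replay", "resolve", "reconnect", "rejoin", "clientreplay"}


def cluster_is_degraded(mds_states: Dict[str, str]) -> bool:
    return any(state in DEGRADED_STATES for state in mds_states.values())


def may_export_or_fragment(mds_states: Dict[str, str]) -> bool:
    # Subtree exports and dirfrag fragmentation wait until no MDS is recovering.
    return not cluster_is_degraded(mds_states)


print(may_export_or_fragment({"a": "active", "b": "active"}))        # True
print(may_export_or_fragment({"a": "active", "b": "clientreplay"}))  # False
```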
Yan, Zheng
b65a818407 mds: don't start new segment while finishing disambiguate imports
This avoids inserting an ESubtreeMap among the EImportFinish events that
finish disambiguate imports, because the ESubtreeMap reflects the
subtree state once all EImportFinish events have been replayed.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 11:03:10 +08:00
Yan, Zheng
ff44a99a59 mds: trim non-auth subtree more aggressively
When a non-auth dirfrag is pinned by an uncommitted slave update,
there can still be non-auth child dirfrags that are trimmable.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-02 11:03:00 +08:00
Sage Weil
e095b1d493 debian: make ceph-common own etc/ceph, var/log/ceph
Clients can make use of these directories, and ceph-common is required by
ceph, so nothing should break here.

Change the purge postrm script to be for ceph-common (it does nothing else).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-01 14:42:30 -07:00
Sage Weil
d4d39a01ca osd/ReplicatedPG: mark_unrollbackable when _rollback_to head
We fell into the case in _rollback_to where we just set ctx->modify = true
and don't explicitly mark the ctx as unrollbackable.  Later, we screw up
in proc_replica_log as a result because we think we can rollback this
update to the head when in reality we cannot.

Fixes: #7907
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-01 14:27:31 -07:00