25607 Commits

Author SHA1 Message Date
John Wilkins
f24dbdefa4 doc: Added Create a Cluster section.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-26 14:04:16 -07:00
John Wilkins
b631cc6713 doc: Added ceph-deploy package management (install | uninstall ) section.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-26 14:03:44 -07:00
John Wilkins
d85c6904db doc: Added new quick start preamble and index.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-26 14:02:42 -07:00
John Wilkins
3ff7eef99d doc: Added ceph-deploy preflight.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-26 14:01:46 -07:00
John Wilkins
9365674036 doc: Added ceph-deploy quick start.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-26 14:01:20 -07:00
David Zafman
e0c39c1e21 Merge branch 'wip-4822' into next
Reviewed-by: Sam Just <sam.just@inktank.com>
2013-04-26 13:31:48 -07:00
Greg Farnum
ebbdef29fa monitor: squash signed/unsigned comparison warning
This is a safe range to do comparisons against, and we compare
against the signed rank inside the loop.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-26 12:37:03 -07:00
Greg Farnum
5fa3cbf520 mon: use brute force to find a sync provider if our first one fails
We try and select a random monitor first, but if that fails we should
make sure that nobody's available before asserting.

Fixes #4812

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-26 12:32:10 -07:00
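
The fallback described in commit 5fa3cbf520 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the monitor's actual C++ code; `pick_sync_provider`, `candidates`, and `is_available` are hypothetical names introduced here.

```python
import random


def pick_sync_provider(candidates, is_available, rng=random):
    """Return an available provider, or None if nobody is available.

    Hypothetical sketch: try one random candidate first, then check every
    remaining candidate before giving up.
    """
    if not candidates:
        return None
    first = rng.choice(candidates)
    if is_available(first):
        return first
    # Brute force: scan everyone else, so that we only give up (the point
    # at which the real code would assert) when truly nobody is available.
    for mon in candidates:
        if mon != first and is_available(mon):
            return mon
    return None
```

A caller would treat a `None` result as the "nobody is available" case that the commit message says must be confirmed before asserting.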
Yehuda Sadeh
56ac098b88 Merge branch 'wip-4760' into next
2013-04-26 12:33:03 -07:00
Sage Weil
a92b4c7558 Merge branch 'wip-mon-fwd' into next
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-26 12:24:31 -07:00
Yehuda Sadeh
c880e9578e rgw: fix compilation for certain architectures
Casting.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
Yehuda Sadeh
a8b1bfa1cc rgw: fix list buckets limit
When a limit was set, we didn't break from the iterating loop
once the limit was reached. Also, S3 does not enforce any limit,
so keep that behavior.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
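
The loop behavior described in commit a8b1bfa1cc might look like the sketch below. It is written in Python purely for illustration; `list_buckets` and its parameters are hypothetical, and using `None` to mean "no limit" is an assumption made here to mirror the note that S3 does not enforce one.

```python
def list_buckets(buckets, limit=None):
    """Collect bucket names, stopping once `limit` entries are gathered.

    Hypothetical sketch: limit=None means no limit is enforced.
    """
    result = []
    for name in buckets:
        if limit is not None and len(result) >= limit:
            break  # the early exit that the commit says was missing
        result.append(name)
    return result
```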
Yehuda Sadeh
f2df87625c rgw: fix bucket listing when reaching limit
Bucket listing was broken when limit was set.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
Yehuda Sadeh
2264078a61 rgw: swift list containers can return 204
In order to keep compatibility with swift, if a plain formatter
is being used, we should return 204 when there are no containers.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
Yehuda Sadeh
960eac2600 rgw: fix plain formatter flush
The plain formatter flush needs to append eol if needed, and
not to clear the sections stack.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
Yehuda Sadeh
7144ae8624 rgw: fix bucket count when stating account
We need to add up the num of buckets and not just set it
as we don't read the entire list of buckets in one operation.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
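
The accumulation described in commit 7144ae8624 can be illustrated with a short Python sketch. The name `count_buckets` and the idea of passing in pre-fetched pages are assumptions for illustration; the real code reads the bucket list in several operations inside rgw.

```python
def count_buckets(pages):
    """Sum the bucket count over every page of a paginated listing.

    Hypothetical sketch: each element of `pages` is one partial listing.
    """
    total = 0
    for page in pages:
        # Add to the running total; assigning here instead would leave
        # only the size of the last page read.
        total += len(page)
    return total
```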
Yehuda Sadeh
1670a2bffb rgw: trivial cleanups post code review
Following code review of #4760.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-26 12:10:14 -07:00
Dan Mick
98f532e800 Makefile.am: Add -lpthread to fix build on newer ld in Raring Ringtail
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-26 12:05:22 -07:00
Sage Weil
cbc3b91cef mon: mark PaxosServiceMessage forward fields deprecated
These are no longer used; we manage forward state explicitly via the
Monitor sessions instead.  Mark them deprecated so we don't accidentally
rely on them.  Also, fix the annoying "mon.-1" garbage debug output that
is confusing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-26 10:48:49 -07:00
Sage Weil
77c068d1af mon: fix double-forwarding check
The PaxosServiceMessage fields are no longer filled in.  Use Session::proxy_con
instead.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-26 10:48:10 -07:00
David Zafman
e3b602adf7 osd: Fix logic in OSDMap::containing_subtree_is_down()
Check for up OSDs as we walk up the crushmap hierarchy

fixes: #4822

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-04-26 10:24:43 -07:00
Dan Mick
a2a23ccd95 debian/rules: use multiline search to look for Build-Depends
When Build-Depends was split into multiple lines (in commit
8f5c665744), the grep for
libgoogle-perftools-dev broke.  Replace grep with perl for multiline
matching.

Fixes: #4818
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 89692e099f)
2013-04-26 10:19:04 -07:00
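
Commit a2a23ccd95 replaces grep with perl; the same multiline idea can be sketched in Python with the standard `re` module. This is not the rule used in debian/rules. The function name `build_depends_mentions` and the sample control text in the usage note are hypothetical.

```python
import re

# Match the Build-Depends field together with its continuation lines
# (lines starting with a space or tab), so the whole field can be
# searched as one unit instead of line by line.
BUILD_DEPENDS_RE = re.compile(r"^Build-Depends:.*(?:\n[ \t].*)*", re.MULTILINE)


def build_depends_mentions(control_text, package):
    """Return True if `package` appears anywhere in the Build-Depends field."""
    match = BUILD_DEPENDS_RE.search(control_text)
    return bool(match) and package in match.group(0)
```

Given a control file whose Build-Depends field spans several lines, a line-oriented search anchored on the field name would only see the first line, while this pattern also covers the indented continuation lines.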
Sage Weil
f768fbba24 client: re-fix cap releases
Encode cap releases if NOT replay.  <facepalm>  Thanks, Greg!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-26 10:12:37 -07:00
Sam Lang
5121e56c25 client: don't embed cap releases in clientreplay
If the client is sending replay requests, avoid sending embedded caps,
since the mds already has the client's caps from the reconnect.
This matches the behavior of the kernel client.

Fixes #4742.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-26 09:51:45 -07:00
Sage Weil
2146930ef0 mon: do not forward other mon's requests to other mons
The request forwarding infrastructure is there for client requests.
However, we (ab)use it for mon's sending MLog messages: LogClient sends an
MLog message to itself, and that is either handled locally (if leader) or
forwarded to the leader.

If that races with an election, we were forwarding an MLog from another mon
to the leader.  This is not necessary; the original MLog sender will resend
the request on election_finish() to the latest leader.

The fix is to adjust forward_request_leader() to only forward messages from
a mon if that mon is itself.

This was reproduced while testing the fix for #4748.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-25 16:47:15 -07:00
Samuel Just
a5cade1fe7 PG: clear want_acting when we leave Primary
This is somewhat annoying actually.  Intuitively we want to
clear_primary_state when we leave primary, but when we restart
peering due to a change in prior set status, we can't afford
to forget most of our peering state.  want_acting, on the
other hand, should never persist across peering attempts.
In fact, in the future, want_acting should be pulled into
the Primary state structure.

Fixes: #3904
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-04-25 16:24:41 -07:00
Sage Weil
3ce35a6743 mon: get own entity_inst_t via messenger, not monmap
There are intervals during bootstrap(*) during which we are part of the
monmap, but our name (mon->name) does not match the monmap's.  This means
that calling monmap->get_inst(mon->name) is not a safe way to get our own
entity_inst_t.

Instead, use messenger->get_myinst().  This includes our addr (obviously)
and an up-to-date entity_name_t, too: in bootstrap we adjust the messenger
name at the same time as mon->rank, based on the contents of the monmap.

monmap->get_inst(mon->rank) would work too.

* During mkfs, the monmap may have noname-foo instead of the name if it was
  generated from the mon_host lines or dns or whatever by
  MonMap::build_initial().  This was the case for #4811.

Fixes: #4811
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-25 15:18:55 -07:00
Sage Weil
b0ba41235a Merge pull request #239 from ceph/wip-4760
#4760

Second patch Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-25 13:11:59 -07:00
Sage Weil
42ab1f4561 Merge pull request #246 from ceph/wip-4793
#4793

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-25 11:52:30 -07:00
Li Wang
303e739e5b radosgw: receiving unexpected error code while accessing a non-existing object by authorized not-owner user
This patch fixes a bug in the radosgw swift compatibility code:
if an authorized user who is not the owner accesses a non-existing
object in a container, the user will receive an unexpected error code.
To reproduce this bug, do the following steps:

1 User1 creates a container, and grants the read/write permission to user2

curl -X PUT -i -k -H "X-Auth-Token: $user1_token" $url/$container
curl -X POST -i -k -H "X-Auth-Token: $user1_token" -H "X-Container-Read: $user2" -H "X-Container-Write: $user2" $url/$container

2 User2 queries the object 'obj' in the newly created container
by using HEAD instruction, note the container currently is empty

curl -X HEAD -i -k -H "X-Auth-Token: $user2_token" $url/$container/obj

3 The response received by user2 is '401 Authorization Required',
rather than the expected '404 Not Found', the details are as follows,

HTTP/1.1 401 Authorization Required
Date: Tue, 16 Apr 2013 01:52:49 GMT
Server: Apache/2.2.22 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 12
Vary: Accept-Encoding

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-25 11:36:50 -07:00
Sage Weil
cd7e52cc76 init-ceph: use remote config when starting daemons on remote nodes (-a)
If you use -a to start a remote daemon, assume the remote config is present
instead of pushing the local config.  This makes more sense and simplifies
things.

Note that this means that -a in concert with -c foo means that foo must
also be present on the remote node in the same path.  That, however, is a
use case that I don't particularly care about right now.  :)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-04-25 11:13:33 -07:00
Sage Weil
ea54e6603f Merge branch 'wip-4748-b' into next
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-25 10:21:11 -07:00
David Zafman
f4804849b7 Merge branch 'wip-4778' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-24 17:33:00 -07:00
David Zafman
ac3dda214d scrub clears inconsistent flag set by deep scrub
Add new num_deep_scrub_errors and num_shallow_scrub_errors to object_stat_sum_t
Show deep-scrub error count when outputting regular scrub errors
Set invalid size in case of a stat error which sets read_error
For now do deep-scrub after repair (see #4783)

fixes: #4778
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-04-24 17:32:39 -07:00
Josh Durgin
4322522028 Merge pull request #242 from ceph/wip-objectcacher-enoent
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-04-24 16:20:59 -07:00
Josh Durgin
82d5cd601e ObjectCacher: remove all buffers from a non-existent object
Once we're sure an object doesn't exist, we retry all the waiters in
order, and they return -ENOENT immediately. If there were a bunch of
BufferHeads waiting for data (rx state), they would be left behind
while the reads that triggered them were complete from the cache
user's perspective. These extra rx BufferHeads would pin the object in
the lru, so they wouldn't be removed by release_set(). This meant that
the assert during shutdown of the cache would be triggered.

To fix this, remove any BufferHeads in this state immediately when we
find out the object doesn't exist. Use the same condition as readx for
determining whether this is safe - if we got -ENOENT and all
BufferHeads for the object are clean or rx.

Fixes: #3664
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-24 15:54:07 -07:00
Greg Farnum
fcaabf1a22 mon: when electing, be sure acked leaders have new enough stores to lead
In general anybody participating in an election should be new enough to
lead thanks to the bootstrap process, but we've observed situations in
which a monitor is leader but gets so busy that it gets booted out
without noticing for a while, then processes the election messages
which were spawned, responds to them, and the other monitors kick those
up to a new election epoch. Then the old and behind monitor gets
elected as the new leader, which does bad things to our sync.

To deal with this, add the paxos first and last committed versions
to the MMonElection messages, and consider those values when deciding
whether to defer to a peer. Only defer to them if their newest value
is newer than our oldest, but also *do* defer to them if their oldest
value is newer than our newest even if we out-rank them otherwise.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-24 15:40:13 -07:00
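
The deferral rule in commit fcaabf1a22 can be sketched as a small decision function. This Python version is only an illustration. The function and parameter names are hypothetical, the handling of exactly equal versions is a guess, and the final rank comparison assumes that a lower rank wins an election.

```python
def should_defer(my_rank, my_first, my_last, peer_rank, peer_first, peer_last):
    """Decide whether to defer to a peer during an election.

    Hypothetical sketch: *_first and *_last stand for the paxos first and
    last committed versions carried in the election messages.
    """
    # The peer's oldest version is newer than our newest: we are too far
    # behind to lead, so defer even if we out-rank the peer.
    if peer_first > my_last:
        return True
    # The peer's newest version is not newer than our oldest: the peer is
    # too far behind to lead, so never defer to it.
    if peer_last <= my_first:
        return False
    # Stores overlap; fall back to rank (assumption: lower rank wins).
    return peer_rank < my_rank
```

For example, a monitor holding versions 100 to 200 would defer to any peer whose first committed version is above 200, and would refuse to defer to a peer whose last committed version is 100 or lower, whatever the ranks are.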
Greg Farnum
fb8bad3105 mon: be more careful about making sure we're up-to-date on sync check
We were looking at our own paxos_max_join_drift and using that to
calculate whether we were new enough to join without syncing, but
if those numbers don't match across monitors they might have trimmed. Use
the number they provide us as their first version and compare to that
as well.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-24 15:40:13 -07:00
Sage Weil
290b5eb0f1 rgw: fix i386 compile error
error: rgw/rgw_op.cc:665:63: no matching function for call to ‘min(uint64_t, size_t&)’

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-24 15:07:28 -07:00
Samuel Just
14f2392263 FileStore::_split_collection: src or dest may be removed on replay
If the collection is subsequently removed, the _split_collection
might get replayed and find either src or dest removed.

Fixes: #4806
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-24 15:05:41 -07:00
Sage Weil
3604c98232 librados: fix calc_snap_set_diff interval calculation
When calculating the [a,b] interval over which a given clone is valid, do
not assume that b == the clone id; that is *not* true when the original
end snap has been deleted/trimmed.

While we are here, make the code a bit cleaner to read.

Fixes: #4785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-24 14:34:40 -07:00
Samuel Just
5668e5b5a4 Merge remote-tracking branch 'upstream/wip_2476' into next
Fixes: #2476
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-24 14:04:07 -07:00
Samuel Just
81a6165c13 PG: call check_recovery_sources in remove_down_peer_info
If we transition out of peering due to affected
prior set, we won't trigger start_peering_interval
and check_recovery_sources won't get called.  This
will leave an entry in missing_loc_sources without
a matching missing set.  We always want to
check_recovery_sources with remove_down_peer_info.

Fixes: #4805
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-24 13:20:44 -07:00
Sage Weil
a9791dae1b mon: send clients away while synchronizing
When we are out of quorum, we waitlist client messages or (eventually)
send them elsewhere.  If we are synchronizing, do the same.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-24 12:26:37 -07:00
Sage Weil
12bc9a7aa9 mkcephfs: give mon. key 'allow *' mon caps
This will ease the transition from mkcephfs to ceph-deploy by allowing
ceph-create-keys to use the mon. keyring file in $mon_data and get the
caps it needs.

Fixes: #4756
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-24 11:23:25 -07:00
Josh Durgin
cce1c91ae8 PendingReleaseNotes: note about rbd resize --allow-shrink
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-24 10:16:03 -07:00
Yehuda Sadeh
9abec309e8 rgw: list container only shows stats if needed
Fixes: #4759
Add a new request param 'stats' for the swift list containers
request. If set to 'false' it disables stats retrieval, which
makes it go faster. Also, don't dump stats if format is plain,
as they're not going to be dumped.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-24 08:49:28 -07:00
Sage Weil
c7a0477bad rbd: fix cli-integration tests for striping change
We don't set the striping feature when we are using backward-compatible
(default) striping now; fix the test accordingly.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-24 08:36:06 -07:00
Gary Lowell
446641aa34 95-ceph-osd-alt.rules: Fix missing parent parameter
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-04-24 08:22:04 -07:00
Samuel Just
1f7ff412ab ReplicatedPG: timeout watches based on last_became_active
This way a notify on an object with a single defunct watcher
won't necessarily have to wait the full timeout if the pg
has been active for a while.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-23 20:54:57 -07:00