Commit Graph

27705 Commits

Author SHA1 Message Date
Yehuda Sadeh
7cd0bd85d4 rgw: bucket entry point object ver fixes
Multiple fixes:
 - sync master, secondary entry point ver on creation
 - use correct entry point version when removing entry point
 - check correct version on bucket removal

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Yehuda Sadeh
89ecba209b rgw: remove s->objv_tracker
The tracker was never initialized correctly anyway. It was only supposed to
be used for buckets, but it was never initialized in that case.
Using s->bucket_info.objv_tracker instead.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Yehuda Sadeh
85f3f09b0a rgw: forward delete bucket request to master after removal
We can only forward the bucket removal to the master if it was
successfully removed locally.
The master region has no knowledge about whether the
bucket can be removed or not, e.g., when there are still objects in the
bucket. If we send the request to the master first, it'll happily remove the
bucket even though the local removal might fail in the end.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Yehuda Sadeh
989a4d93d8 rgw: adjust error for bucket removal on secondary region
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Yehuda Sadeh
2e51823563 rgw: forward x_amz_meta headers when forwarding a request
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Yehuda Sadeh
4f4bdbd5cb rgw: fix bucket re-creation on secondary region
We had a problem with bucket recreation, where we identified
that the bucket already existed, but missed the fact that it's
the same bucket, so removal of the bucket index was wrong.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-19 13:21:49 -07:00
Sage Weil
0de708516c mon/MonClient: fix small leak
We need to delete the version_req_d here.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:48 -07:00
Sage Weil
d1b83be14c msgr: mark addr-based [lazy_]send_message and get_connection deprecated
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:48 -07:00
Sage Weil
11c47cc4e3 client: mark_down by con
We have the con handy; use it.  This avoids generating a spurious RESET
event, which we do not need or do anything useful with.  Note that in this
case we are not attaching anything to the Connection priv field.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:48 -07:00
Sage Weil
000d4d38a4 mon: mark_down session by con, not addr
We have the ConnectionRef here; use it.  This avoids generating a spurious
RESET event for the connection.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:48 -07:00
Sage Weil
30de04066d mon: break con <-> session ref cycle in mon even if shutting down
If we get a reset during shutdown, we should still break the cycle to avoid
tripping the valgrind leak detection.  Note that we are touching no
internal Monitor state here and the locking has not changed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:48 -07:00
Sage Weil
564075c9ad msg/SimpleMessenger: remove duplicated interface docs
Document these in the interface, not the implementation; having two copies
clutters the header and invites them to get out of sync.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
27868ca5ac msgr: update docs for mark_down, mark_down_all semantics
* RESET events
* note that the reset detection only happens if it is enabled in the
  policy.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
8dcf0b199a msgr: generate reset event on mark_down to addr (not con)
If the caller is marking down an addr, they presumably don't have the
Connection* handy, so we should generate a reset event to help them
clean up con <-> session ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
bfadcd2a0e osd/ReplicatedPG: fix obc leak on invalid LIST_SNAPS op
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
561ac0b173 osd: break con <-> session cycle when marking down old peers
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
41c67e0236 osd: make ms_handle_reset debug more useful
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-19 13:21:47 -07:00
Sage Weil
4ed7942997 init-ceph: don't activate-all for vstart clusters
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-18 17:10:51 -07:00
Sage Weil
f9e9f9cb19 mon/PGMonitor: fix 'pg map' output key names
This got lost in a big file of fixes a while back.  :/

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-18 16:53:23 -07:00
Samuel Just
9ab539eaae PG: add perf counter for peering latency
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-18 15:13:50 -07:00
Sage Weil
921a4aac8a cls_lock: fix duration test
It's possible for us to just be really slow when getting the reply to the
first op or doing the second op, resulting in a successful lock.  If we
do get a success, assert that at least that amount of time has passed to
avoid any false positives.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-18 14:08:41 -07:00
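The reasoning in 921a4aac8a (a second lock attempt may legitimately succeed if the test was slow enough for the first lock to expire) can be sketched as follows. This is a minimal Python model, not the cls_lock test itself: `ExpiringLock`, `FakeClock`, and `check_second_lock` are hypothetical names introduced only for illustration.

```python
class FakeClock:
    """Hypothetical clock that advances by a fixed step on every read."""

    def __init__(self, step):
        self.now = 0.0
        self.step = step

    def __call__(self):
        t = self.now
        self.now += self.step
        return t


class ExpiringLock:
    """Hypothetical lock that expires `duration` seconds after it is taken."""

    def __init__(self, clock):
        self._clock = clock
        self._expires_at = None

    def try_lock(self, duration):
        now = self._clock()
        if self._expires_at is not None and now < self._expires_at:
            return False  # still held by the earlier lock
        self._expires_at = now + duration
        return True


def check_second_lock(lock, clock, duration):
    """Take the lock twice; tolerate a second success only if enough time passed."""
    start = clock()
    assert lock.try_lock(duration)
    ok = lock.try_lock(duration)
    if ok:
        # A success is acceptable only when at least `duration` has elapsed,
        # i.e. the first lock really had time to expire.
        assert clock() - start >= duration
    return ok
```

With a clock that does not advance, the second attempt fails as the test normally expects; with a slow clock, the second attempt succeeds and the elapsed-time assertion guards against a false positive.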
Yan, Zheng
dd0246d229 mds: tracedn should be NULL for LOOKUPINO/LOOKUPHASH reply
Fixes: #5658
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-18 14:08:23 -07:00
Samuel Just
f3f92fe210 FileStore: add global replay guard for split, collection_rename
In the event of a split or collection rename, we need to ensure that
we don't replay any operations on objects within those collections
prior to that point.  Thus, we mark a global replay guard on the
collection after doing a syncfs and make sure to check that in
_check_replay_guard() for all object operations.

Fixes: #5154
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-18 13:24:02 -07:00
Sage Weil
723d691f7a msg/Pipe: do not hold pipe_lock for verify_authorizer()
We shouldn't hold the pipe_lock while doing the ms_verify_authorizer
upcalls.

Fix by unlocking a bit earlier, and verifying our state is still correct
in the failure path.

This regression was introduced by ecab4bb951.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-18 12:17:08 -07:00
Sage Weil
29c0252dc4 mon: fix off-by-one in check for when sync falls behind
This is what e213b1bc25 intended to do
but managed to bungle by using >= instead of >.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-18 08:38:59 -07:00
Sage Weil
59f3455e48 Merge pull request #444 from ceph/wip-osd-latency
osd: include op queue age histogram in osd_stat_t

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-17 22:03:07 -07:00
Sage Weil
07dfb6f4af rgw: drop unused assignment
rgw/rgw_rados.cc: In member function 'virtual int RGWPutObjProcessor_Atomic::handle_data(ceph::bufferlist&, off_t, void**)':
rgw/rgw_rados.cc:648:5: warning: parameter 'ofs' set but not used [-Wunused-but-set-parameter]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-17 21:36:58 -07:00
Sage Weil
aa460c414f mon: make 'health' warn about slow requests
Currently we see slow request warnings go by in the cluster log, but they
are not reflected by 'ceph health'.  Use the new op queue histograms to
raise a flag there as well.

For example:

HEALTH_WARN 59 requests are blocked > 32 sec; 2 osds have slow requests
21 ops are blocked > 65.536 sec
38 ops are blocked > 32.768 sec
16 ops are blocked > 65.536 sec on osd.1
23 ops are blocked > 32.768 sec on osd.1
5 ops are blocked > 65.536 sec on osd.2
15 ops are blocked > 32.768 sec on osd.2
2 osds have slow requests

Fixes: #5505
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 18:21:12 -07:00
Sage Weil
82722efaea osd: include op queue age histogram in osd_stat_t
This includes a simple power-of-2 histogram of op ages in the op queue
inside osd_stat_t.  This can be used for a coarse view of overall cluster
performance (it will get summed by the mon), to identify specific outlier
osds who have a higher latency than the others, or to identify stuck ops.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-17 18:21:12 -07:00
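The power-of-2 histogram described in 82722efaea can be sketched roughly as below. This is an illustrative Python model under stated assumptions, not the layout of osd_stat_t: the millisecond unit, the bucket boundaries, and the function names `bucket_index`, `build_histogram`, and `sum_histograms` are all hypothetical.

```python
from typing import List


def bucket_index(age_ms: int) -> int:
    """Return the power-of-2 bucket for an op age in milliseconds.

    Bucket 0 holds age 0; bucket i (i >= 1) holds ages in [2**(i-1), 2**i).
    """
    return age_ms.bit_length()


def build_histogram(ages_ms: List[int]) -> List[int]:
    """Count queued ops per power-of-2 age bucket."""
    hist: List[int] = []
    for age in ages_ms:
        idx = bucket_index(age)
        if idx >= len(hist):
            # Grow the histogram so that bucket idx exists.
            hist.extend([0] * (idx + 1 - len(hist)))
        hist[idx] += 1
    return hist


def sum_histograms(a: List[int], b: List[int]) -> List[int]:
    """Element-wise sum, as an aggregator might do across several OSDs."""
    n = max(len(a), len(b))
    return [
        (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        for i in range(n)
    ]
```

Because each bucket doubles in width, a short list of counters covers ages from milliseconds to minutes, and summing per-OSD lists element by element gives a coarse cluster-wide view while the per-OSD lists still expose outliers.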
Sage Weil
2e216b5474 qa/workunits/cephtool/test.sh: test 'osd create <uuid>'
Make sure it gives us back the same id.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-07-17 18:18:51 -07:00
Samuel Just
b41f1ba485 PG: start flush on primary only after we process the master log
Once we start serving reads, stray objects must have already
been removed.  Therefore, we have to flush all operations
up to the transaction writing out the authoritative log.
On replicas, we flush in Stray() if we will not eventually
be activated and in ReplicaActive if we are in the acting
set.  This way a replica won't serve a replica read until
the store is consistent.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-17 18:11:39 -07:00
Samuel Just
278c7b5922 ReplicatedPG: replace clean_up_local with a debug check
Stray objects should have been cleaned up in the merge_log
transactions.  Only on the primary have those operations
necessarily been flushed at activate().

Fixes: #5084
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-17 18:11:27 -07:00
Greg Farnum
1a84411209 msgr: fix a typo/goto-cross from dd4addef2d
We didn't build or review carefully enough!

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-17 15:23:12 -07:00
Sage Weil
ea1c623406 Merge pull request #441 from ceph/wip-5626
msgr fixes for lossless peer sessions

Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-17 14:50:41 -07:00
Sage Weil
57bd6fd51b osd: make 'from dead osd' message more informative
I thought I saw some weirdness here.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:39:04 -07:00
Sage Weil
16568d9e1f msg/Pipe: a bit of additional debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:39:04 -07:00
Sage Weil
ecab4bb951 msg/Pipe: hold pipe_lock during important parts of accept()
Previously we did not bother with locking for accept() because we were
not visible to any other threads.  However, we need to close accepting
Pipes from mark_down_all(), which means we need to handle interference.

Fix up the locking so that we hold pipe_lock when looking at Pipe state
and verify that we are still in the ACCEPTING state any time we retake
the lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:39:04 -07:00
Sage Weil
687fe888b3 msgr: close accepting_pipes from mark_down_all()
We need to catch these pipes too, particularly when doing a rebind(),
to avoid them leaking through.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:39:04 -07:00
Sage Weil
dd4addef2d msgr: maintain list of accepting pipes
New pipes exist in a sort of limbo before we know who the peer is and
add them to rank_pipe.  Keep a list of them in accepting_pipes for that
period.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:39:04 -07:00
Sage Weil
994e2bf224 msgr: adjust nonce on rebind()
We can have a situation where:

 - we have a pipe to a peer
 - pipe goes to standby (on peer)
 - we rebind to a new port
 - ....
 - we rebind again to the same old port
 - we connect to peer

and get reattached to the ancient pipe from two instances back.  Avoid that
by picking a new nonce each time we rebind.

Add 1,000,000 each time so that the port is still legible in the printed
output.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:38:57 -07:00
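The nonce adjustment in 994e2bf224 can be illustrated with a small Python sketch. It assumes the starting nonce is below 1,000,000 so that its decimal digits survive in the low-order positions; `next_nonce` and `split_nonce` are hypothetical helper names, not messenger functions.

```python
from typing import Tuple

NONCE_STEP = 1_000_000  # the increment named in the commit message


def next_nonce(nonce: int) -> int:
    """Return a fresh nonce for the next rebind."""
    return nonce + NONCE_STEP


def split_nonce(nonce: int) -> Tuple[int, int]:
    """Split a nonce into (number of rebinds, original low-order value).

    Only meaningful under the assumption that the original value
    was smaller than NONCE_STEP.
    """
    return divmod(nonce, NONCE_STEP)
```

Adding a round decimal step means each rebind yields a distinct nonce, so a peer cannot match the new instance to a pipe from an earlier one, while the original value can still be read off the trailing digits in log output.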
Sage Weil
07a0860a18 msgr: mark_down_all() after, not before, rebind
If we are shutting down all old connections and binding to new ports,
we want to avoid a sequence like:

 - close all previous connections
 - new connection comes in on old port
 - rebind to new ports
 -> connection from old port leaks through

As a first step, close all connections after we shut down the old
accepter and before we start the new one.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:36:37 -07:00
Sage Weil
ad548e72fd msg/Pipe: unlock msgr->lock earlier in accept()
Small cleanup.  Nothing needs msgr->lock for the previously larger
window.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:34:40 -07:00
Sage Weil
9f1c272618 msg/Pipe: avoid creating empty out_q entry
We need to maintain the invariant that all sub queues in out_q are never
empty.  Fix discard_requeued_up_to() to avoid creating an entry unless we
know it is already present.

This bug leads to an incorrect reconnect attempt when

 - we accept a pipe (lossless peer)
 - they send some stuff, maybe
 - fault
 - we initiate reconnect, even though we have nothing queued

In particular, we shouldn't reconnect because we aren't checking for
resets, and the fact that our out_seq is 0 while the peer's might be
something else entirely will trigger asserts later.

This fixes at least one source of #5626, and possibly #5517.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:34:40 -07:00
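The invariant from 9f1c272618 (no sub-queue in out_q is ever empty) can be modelled in Python with a map whose lookup inserts missing keys. This is a sketch under assumptions, not the Pipe code: the priority key `HIGHEST_PRIO`, the use of bare sequence numbers as queue items, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque

HIGHEST_PRIO = 255  # hypothetical key for the requeued-message sub-queue

OutQ = DefaultDict[int, Deque[int]]


def discard_up_to_buggy(out_q: OutQ, seq: int) -> None:
    """Indexing a defaultdict inserts an empty deque when the key is absent."""
    q = out_q[HIGHEST_PRIO]
    while q and q[0] <= seq:
        q.popleft()
    # An empty sub-queue may now be left behind, breaking the invariant.


def discard_up_to_fixed(out_q: OutQ, seq: int) -> None:
    """Touch the sub-queue only if it already exists; drop it once drained."""
    if HIGHEST_PRIO not in out_q:
        return
    q = out_q[HIGHEST_PRIO]
    while q and q[0] <= seq:
        q.popleft()
    if not q:
        del out_q[HIGHEST_PRIO]


def has_pending(out_q: OutQ) -> bool:
    """Relies on the invariant: any entry at all means something is queued."""
    return len(out_q) > 0
```

In this model, code that treats "out_q is non-empty" as "there is something to send" is misled by the buggy variant, which mirrors the spurious reconnect described in the commit message.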
Sage Weil
579d858aab msg/Pipe: assert lock is held in various helpers
These all require that we hold pipe_lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 14:34:39 -07:00
Joao Eduardo Luis
0ebf23cee8 ceph_mon: obtain backup monmap if store is marked with 'force_sync'
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-17 14:12:13 -07:00
Sage Weil
d1501938f5 mon/OSDMonitor: make 'osd pool mksnap ...' not expose uncommitted state
We were returning success without waiting if the pending pool state had
the snap.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-17 09:44:50 -07:00
Sage Weil
56c5b83589 qa/workunits/cephtest/test.sh: put 'osd ls' before any 'osd create' tests
A monc/mon connection fault or the dup command test flag may mean an extra
osd id is created that isn't actually up; reorder so that doesn't screw
up 'osd ls'.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-17 09:36:36 -07:00
Joao Eduardo Luis
ad9a1044db mon: MonCommands: remove obsolete 'sync status' command
Obsoleted by the sync refactor from
da0aff28ab

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-17 09:26:20 -07:00
Samuel Just
884fa2fcb6 OSD::_try_resurrect_pg: fix cur/pgid confusion
This bug prevented resurrection of ancestor pgs where
necessary.

Fixes: #5269
This may result in pg A being created just before pg B
is resurrected and split into A and B, resulting in one
or the other operation getting an EEXIST.

Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-16 17:33:27 -07:00
Sage Weil
7e16b72dc3 mon/AuthMonitor: make 'auth del ...' idempotent
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-16 17:21:33 -07:00