Commit Graph

29547 Commits

Author SHA1 Message Date
Samuel Just
9ec35d5ccf ReplicatedPG: replace backfill_pos with last_backfill_started
last_backfill_started reflects what pinfo.last_backfill will be
once all currently outstanding backfills complete.  backfill_pos
was tricky since we couldn't correctly initialize it without
doing the first backfill scan pair.

In recover_backfill, we rescan from last_backfill_started rather
than from backfill_pos.  This ensures that we capture all clones
created between last_backfill_started and what previously had been
backfill_pos without special handling in make_writeable.  The main
downside is that we will tend to "rescan" last_backfill_started.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 16:03:59 -07:00
Samuel Just
8774f03d39 PG::BackfillInfo: introduce trim_to
We'll use this to trim off last_backfill_started since it'll
often be included in rescans.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 16:03:58 -07:00
Samuel Just
46dfd91975 PG::BackfillInterval: use trim() in pop_front()
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 16:03:58 -07:00
Samuel Just
0a9a2d7b9c ReplicatedPG::prepare_transaction: info.last_backfill is inclusive
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 16:03:58 -07:00
Sage Weil
5939eaceb0 upstart: fail osd start if crush update fails
If the update for the CRUSH position fails for some reason, do not
start the OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-28 15:58:29 -07:00
Sage Weil
177e2ab1ca init-ceph: make crush update on osd start time out
If the monitor is not currently available, this crush update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up.  Instead, make it time out after 10 seconds and then
abort startup.  This prevents startup of an OSD if we failed to update
the CRUSH position for some reason.

In fact, do not start up the OSD if the CRUSH update fails for any
reason--not just a timeout!

Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-28 15:58:29 -07:00
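
The startup rule described in 177e2ab1ca and 5939eaceb0 (give the CRUSH update a 10 second limit, and refuse to start the OSD if the update fails for any reason) can be sketched as follows. This is a minimal Python illustration, not the init-ceph shell code. The function names and the stand-in commands are hypothetical, and the real scripts invoke the cluster's own tooling.

```python
import subprocess
import sys

# Stand-in commands for illustration only (hypothetical, not Ceph commands).
QUICK_OK = [sys.executable, "-c", "pass"]
FAILS = [sys.executable, "-c", "import sys; sys.exit(3)"]
HANGS = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_timeout(cmd, timeout_s=10.0):
    """Return True only if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return False
    except OSError:
        # The command could not be launched at all.
        return False
    return result.returncode == 0


def start_osd_if_crush_updated(update_cmd, start_osd, timeout_s=10.0):
    """Call start_osd() only when the CRUSH update succeeded in time."""
    if not run_with_timeout(update_cmd, timeout_s):
        return False  # timeout or any other failure: abort startup
    start_osd()
    return True
```

Passing the start step as a callable keeps the sketch testable; a hung update and a failed update are treated identically, which matches the "for any reason" wording of the commit message.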
Sage Weil
ac8dcdbeed Merge pull request #778 from ceph/wip-6621
radosgw-admin: accept negative values for quota params

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-28 14:28:25 -07:00
Yehuda Sadeh
d5d36d0baa radosgw-admin: accept negative values for quota params
and document that in the usage output.

Fixes: #6621

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-10-28 14:15:43 -07:00
athanatos
7cbfdbf38d Merge pull request #760 from ceph/wip-6585
Wip 6585

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-10-28 13:50:34 -07:00
Samuel Just
8db03ed027 ReplicatedBackend: don't hold ObjectContexts in pull completion callback
We need a sequencer flush to ensure that all Contexts which hold
ObjectContextRefs have been run or deleted.
C_ReplicatedBackend_OnPullComplete, however, gets queued in a second
work queue in order to avoid performing expensive push related reads
in the FileStore finisher.

Rather than keep the object contexts around, we instead put off
removing the object from the pulling map until the callback
fires and read the object context out of the pulling map.  This
way the ObjectContextRef will be cleaned up along with the rest
of the pulling map in on_change.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:35:17 -07:00
Samuel Just
5a416dab6e ReplicatedPG: put repops even in TrimObjects
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:35:17 -07:00
Samuel Just
420182a1e8 ReplicatedPG: improved on_flushed error output
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:35:17 -07:00
Samuel Just
ce33892271 PG: call on_flushed on FlushEvt
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:35:10 -07:00
Samuel Just
6f975e35a1 PG,ReplicatedPG: remove the waiting_for_backfill_peer mechanism
See previous patch.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:34:17 -07:00
Samuel Just
3d0d69fed0 ReplicatedPG: have make_writeable adjust backfill_pos
If we are writing to backfill_pos and create a clone, we end
up failing to send the transaction creating the clone to the
backfill peer.  This is fine as long as we end up backfilling
the clone.  To that end, we simply add the clone to
backfill_info and adjust backfill_pos accordingly.  This is less
brittle than the waiting_for_backfill_pos mechanism since it
works even if we wait between that check and issuing the repop,
which can happen for copy_from.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:34:16 -07:00
Samuel Just
3de32bd368 ReplicatedBackend: fix failed push error output
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:34:16 -07:00
Samuel Just
807dde4814 ReplicatedPG,osd_types: move rw tracking from its own map to ObjectContext
We also modify recovering to hold a reference to the recovering obc
in order to ensure that our backfill_read_lock doesn't outlive the
obc.

ReplicatedPG::op_applied no longer clears repop->obc since we need
it to live until the op is finally cleaned up.  This is fine since
repop->obc is now an ObjectContextRef and can clean itself up.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:32:56 -07:00
Samuel Just
2cadc231ae osd_types,OpRequest: move osd_req_id into OpRequest
This way I can have OpRequest included from osd_types.h.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:31:08 -07:00
Samuel Just
9b003b327e OpRequest: move method implementations into cc
I need to remove the osd_types.h include.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:31:08 -07:00
Samuel Just
c4442d70ed ReplicatedPG: reset new_obs and new_snapset in execute_ctx
This way, if execute_ctx is rerun on the same OpContext, we
won't erroneously reuse a stale snapset/object_info.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-28 13:30:42 -07:00
huangjun
8a62bf1c04 fix the bug where setting pgp_num=-1 using "ceph osd pool set data|metadata|rbd -1"
sets pgp_num to a huge number.

Signed-off-by: huangjun <hjwsm1989@gmail.com>
(cherry picked from commit bf198e673f)
2013-10-28 13:29:50 -07:00
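
One way a value of -1 can turn into a huge number, as 8a62bf1c04 describes, is wraparound when a negative input is stored in an unsigned field. The sketch below assumes a 32-bit unsigned field purely for illustration; the actual type used by Ceph is not stated in the commit message, and `as_uint32` and `parse_pg_count` are hypothetical helper names.

```python
def as_uint32(value):
    """Reinterpret a Python int as an unsigned 32-bit value (wraps around)."""
    return value & 0xFFFFFFFF


def parse_pg_count(text):
    """Parse a placement-group count, rejecting zero and negative input."""
    value = int(text)
    if value <= 0:
        raise ValueError("pg count must be positive, got %d" % value)
    return value
```

Under that assumption, `as_uint32(-1)` evaluates to 4294967295, while `parse_pg_count("-1")` raises `ValueError` before the value can be stored.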
Greg Farnum
5eb836f23a ReplicatedPG: take and drop read locks when doing backfill
All our interfaces are in place, so now we can actually take and
drop the locks.
1) Take locks in ReplicatedPG::recover_backfill. This is the entry
into the backfill code path, and covers all objects which are
added to backfills_in_flight (via prep_backfill_object_push()). If we
can't get the lock right away, we stop the backfill movement there
until we can do so.
2) Drop the locks in ReplicatedPG::on_peer_recover(), called when the
push is completed.
2b) Further drop the locks on all backfills_in_flight objects in
_clear_recovery_state(), for when we cancel peering.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-27 10:40:32 -07:00
Greg Farnum
058c74ab23 PG: switch the start_recovery_ops interface to specify work to do as a param
We previously inferred whether there was useful work to be done
by looking at the number of ops started, but with the upcoming
introduction of the rw_manager read locking on backfill, we could
start no ops while still having work to do. Switch around the
interfaces to specify these as separate pieces of information.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-27 10:40:32 -07:00
Greg Farnum
87daef76cd ReplicatedPG: implement the RWTracker mechanisms for backfill read locking
We want backfill to take read locks on the objects it's pushing. Add
a get_backfill_read(hobject_t) function, a corresponding drop_backfill_read(),
and a backfill_waiting_on_read member in ObjState. Check that member when
getting a write lock, and in put_write(). Tell callers to requeue the recovery
if necessary, and clean up the backfill block when its read lock is dropped.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-27 10:40:32 -07:00
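
The interplay that 87daef76cd describes between backfill read locks and write locks can be sketched with a toy per-object state. This is a simplified Python model under stated assumptions, not the RWTracker code. Only the names `get_backfill_read`, `drop_backfill_read`, `put_write` and `backfill_waiting_on_read` come from the commit message; the remaining fields, the return conventions, and the single-threaded model are assumptions.

```python
class ObjState:
    """Toy lock state for one object (hypothetical simplification)."""

    def __init__(self):
        self.writes = 0                        # write locks currently held
        self.backfill_read_held = False        # backfill holds its read lock
        self.backfill_waiting_on_read = False  # backfill is blocked on writers

    def get_write(self):
        """Take a write lock unless backfill holds or is waiting for a read."""
        if self.backfill_read_held or self.backfill_waiting_on_read:
            return False
        self.writes += 1
        return True

    def put_write(self):
        """Drop a write lock; return True if recovery should be requeued."""
        assert self.writes > 0
        self.writes -= 1
        return self.writes == 0 and self.backfill_waiting_on_read

    def get_backfill_read(self):
        """Take the backfill read lock, or record that backfill is waiting."""
        if self.writes > 0:
            self.backfill_waiting_on_read = True
            return False
        self.backfill_waiting_on_read = False
        self.backfill_read_held = True
        return True

    def drop_backfill_read(self):
        """Release the backfill read lock so writers may proceed again."""
        assert self.backfill_read_held
        self.backfill_read_held = False
```

In this model a waiting backfill blocks new writers, so the last `put_write()` is the point at which the caller learns that recovery needs to be requeued.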
Greg Farnum
96ed5b8c38 ReplicatedPG: separate RWTracker's waitlist from getting locks
This way we can try and get locks which aren't associated with
an OpRequest.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-27 10:40:32 -07:00
Greg Farnum
f0f67507dd common: add an hobject_t::is_min() function
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-27 10:40:32 -07:00
Sage Weil
c2cd460950 Merge pull request #765 from ceph/wip-6635
mon: OSDMonitor: Make 'osd pool rename' idempotent

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-25 17:53:30 -07:00
Sage Weil
8282e24dd6 mon/OSDMonitor: make racing dup pool rename behave
If we get dup pool rename requests that are racing, make sure the second
one comes back with 'success' if the rename entry already exists in the
pending_inc map.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-25 17:45:06 -07:00
Joao Eduardo Luis
c14c98d3f0 mon: OSDMonitor: Make 'osd pool rename' idempotent
'ceph osd pool rename' takes two arguments: source pool and dest pool.
If by chance 'source pool' does not exist and 'destination pool' does,
then, to ensure the command is idempotent, we assume that
'source pool' no longer exists because it was already renamed.

However, while we will return success in such a case, we want to make sure
to let the user know that we made such an assumption.  Mostly to warn the
user of such a thing in case of a mistake on the user's part (say, the
user didn't notice that the source pool didn't exist, while the dest did),
but also to make sure that the user is not surprised by the command
returning success if the user expected an ENOENT or EEXIST.

Fixes: #6635

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-26 01:28:10 +01:00
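
The idempotent rename rule from c14c98d3f0 can be sketched as a small decision function. This is a hedged Python illustration over a plain dictionary of pools, not the OSDMonitor code; `rename_pool`, its return convention, and the message wording are hypothetical.

```python
import errno


def rename_pool(pools, src, dst):
    """Rename src to dst in pools; return (status, message), 0 on success."""
    if src not in pools:
        if dst in pools:
            # Assume an earlier, identical request already did the rename,
            # report success, and tell the user about the assumption.
            msg = ("pool '%s' does not exist, pool '%s' does; "
                   "assuming the rename already happened" % (src, dst))
            return 0, msg
        return -errno.ENOENT, "pool '%s' does not exist" % src
    if dst in pools:
        return -errno.EEXIST, "pool '%s' already exists" % dst
    pools[dst] = pools.pop(src)
    return 0, ""
```

Issuing the same rename twice returns success both times in this sketch, and the second call carries the explanatory message instead of an ENOENT error.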
Gregory Farnum
0f1fed6fe7 Merge pull request #769 from ceph/wip-copy-get
With this branch we make copy-get significantly easier to extend by applying our standard encode/decode stuff to it, instead of doing an inline encode-onto-the-payload. We also add some infrastructure for dealing with completion of RepGathers.

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-25 13:57:21 -07:00
Greg Farnum
aea985c142 Objecter: expose the copy-get()'ed object's category
In the OSD, store the category in the CopyOp using this.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:57 -07:00
Greg Farnum
06b5bf675a osd: add category to object_copy_data_t
We don't bump the encoding version -- and stick it in the middle --
since it's still brand-new. For simplicity, we encode it unconditionally
rather than trying to embed it alongside the attrs or with its own
"complete" flag in the cursor.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:56 -07:00
Greg Farnum
61f2e5d994 OSD: add back CEPH_OSD_OP_COPY_GET, and use it in the Objecter
This one is encoded with version information. We are not doing anything
to control which op gets sent by the client, but after discussion with
Sam we think this op isn't accessible enough to clients (right now it's
only triggered by a client sending copy-from, which can only happen via
ceph-test-rados) to require compatibility versioning.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:56 -07:00
Greg Farnum
15c8267e34 OSD: rename CEPH_OSD_OP_COPY_GET -> CEPH_OSD_OP_COPY_GET_CLASSIC
In order to introduce versioning of copy-get, we need to make it a
different op that has the versioning infrastructure from the start.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:56 -07:00
Greg Farnum
b75b7ad679 ReplicatedPG: copy: move the COPY_GET implementation into its own function
It was getting long, isn't terribly dependent on access to do_osd_ops()
state, and will be easier to make generic as its own function.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:56 -07:00
Greg Farnum
80f36963b7 osd: Add a new object_copy_data_t, and use it in the OSD/Objecter
Right now this is very primitive, but we're about to extend it to
deal with request versioning appropriately, and to add some
extra fields.
Sadly we are doing a little extra copying in the Objecter as a result, but
too bad -- being able to do updates will be worth it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:52:56 -07:00
Greg Farnum
808fa9ad39 ReplicatedPG: cache: don't handle cache if the obc is blocked
Right now the only way that can happen is if we're in the middle of a
promote!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:45 -07:00
Greg Farnum
91b589fb1f ReplicatedPG: copy: add a C_KickBlockedObject
As the name says, you give it an obc and it kicks the block list
when finish()ed.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:45 -07:00
Greg Farnum
ade8f19650 ReplicatedPG: add a Context *ondone to RepGathers
Make a few changes to make sure we trigger it when appropriate. We'll use
this shortly for object promotion, and perhaps for other things in future.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:45 -07:00
Greg Farnum
b403ca80d9 ReplicatedPG: copy: rename CopyOp::version -> user_version
This version is a user version, and since we're in the OSD we
should call it such. (In particular, we may want to keep track
of the internal version too when doing cache promotes.)

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:44 -07:00
Greg Farnum
4e139fc318 ReplicatedPG: copy: do not let start_copy() return error codes
There's no failure it can actually run into, and handling error
codes in some of its callers is going to be a pain.
While we're here, document the parameters.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:44 -07:00
Greg Farnum
178f9a2a45 ObjectStore: add a bufferlist-based getattrs() function
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-25 13:36:44 -07:00
Sage Weil
4f7114a945 Merge branch 'wip-osd-fixes' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-10-25 12:56:02 -07:00
Sage Weil
e17ff196a4 osd/osd_types: init SnapSet::seq in ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-25 12:50:17 -07:00
Sage Weil
d2b661d0ef os/FileStore: fix getattr return value when using omap
The return value should be the length of the value, even when it was
stored in omap.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-25 12:49:57 -07:00
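
The contract stated in d2b661d0ef, that the returned length must not depend on where the attribute is stored, can be illustrated as follows. This is a minimal Python sketch using two dictionaries as stand-ins for inline xattrs and omap; `get_attr` and both parameter names are hypothetical and do not mirror the FileStore interface.

```python
def get_attr(inline_xattrs, omap_xattrs, name):
    """Return (length, value) for an attribute from either backing store."""
    if name in inline_xattrs:
        value = inline_xattrs[name]
    elif name in omap_xattrs:
        value = omap_xattrs[name]  # spilled to omap: same return contract
    else:
        raise KeyError(name)
    return len(value), value
```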
Sage Weil
3a469bb2ae os/ObjectStore: fix RMATTRS encoding
Apparently nobody uses this!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-25 12:49:51 -07:00
Samuel Just
847ea60592 PGLog::read_log: don't add items past backfill line to missing
Fixes: #6574
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-10-25 12:34:13 -07:00
Sage Weil
4be4abe932 Merge pull request #764 from ceph/wip-rbd-parent-info
rbd.py: increase parent name size limit

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-25 10:09:28 -07:00
Josh Durgin
3c0042cde5 rbd.py: increase parent name size limit
64 characters isn't all that long. 4096 ought to be enough for anyone.

Fixes: #6072
Backport: dumpling, cuttlefish
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-10-24 17:31:04 -07:00
Samuel Just
87d3f88742 PGMap::dirty_all should be asserting about osd_epochs, not in.osd_epochs
Fixes: #6627
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-24 16:44:04 -07:00