17453 Commits

Author SHA1 Message Date
Samuel Just
fd3231c6a3 Merge remote branch 'upstream/wip-backfill-ordering' into wip-backfill 2011-12-20 17:05:29 -08:00
Samuel Just
7eb287308c PG: add some documentation
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-20 16:52:38 -08:00
Samuel Just
ffd1b437ed ReplicatedPG: delay op while snapdir is missing/degraded
We cannot get/update a snapcontext if snapdir is missing/degraded.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-20 16:50:35 -08:00
Samuel Just
45b9659fea ReplicatedPG: don't manage waiting_on_backfill in start/finish_recovery_op
Set waiting_on_backfill in recover_backfill and clear in do_scan.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-20 15:28:11 -08:00
Samuel Just
5e9d1019a8 ReplicatedPG: apply_repop: apply local_t before op_t
We create snap_collections in local_t and clone into them in op_t.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-20 10:53:36 -08:00
Samuel Just
a798a85fc2 PG: Do not update_snap_collections for log entries > last_backfill
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-19 16:26:56 -08:00
Samuel Just
2401176beb PG: Fix stat debug output
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-19 16:26:31 -08:00
Samuel Just
1362d3e10d calc_acting: Prefer up[0] as primary if possible
Previously, we could get into a state where although up[0] has been
fully backfilled, acting[0] could be selected as a primary if it is able
to pull another peer into the acting set.  This also collects the logic
of choosing the best info into a helper function.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-19 16:23:48 -08:00
Samuel Just
b5c3259012 ReplicatedPG: fix backfill mismatch error output
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-15 13:15:50 -08:00
Samuel Just
41f64be039 ReplicatedPG: calc_clone_subsets fix other clone_overlap case
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-15 13:15:50 -08:00
Samuel Just
5b41c470db OSD: use disk_tp.pause() without osd_lock
Previously, we called disk_tp.pause_new().  This can cause a race
where snap_trimmer queues more transactions after we flush the
store.  Calling disk_tp.pause() under the osd_lock causes a
deadlock with pg removal.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-15 13:15:45 -08:00
Sage Weil
7a7aab259c osd: wait for src_oid if it is on the other side of last_backfill from oid
If the target object is before last_backfill, then the backfill_target
will be asked to apply the operation.  If one of the src objects is past
last_backfill, that will fail, so we need to wait for the src object to
be not degraded.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-14 19:08:27 -08:00
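The condition described in this commit can be sketched roughly as follows. This is an illustrative sketch only, not the Ceph code: `must_wait_for_src` is a hypothetical helper, objects are modelled as plain strings in sort order, and treating "before last_backfill" as inclusive is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Ceph function). Objects are modelled as
// strings that sort in backfill order.
// Assumption: an object counts as backfilled when it is <= last_backfill.
bool must_wait_for_src(const std::string& oid,
                       const std::string& src_oid,
                       const std::string& last_backfill) {
  const bool target_backfilled = oid <= last_backfill;
  const bool src_backfilled = src_oid <= last_backfill;
  // If the two objects sit on different sides of last_backfill, the
  // backfill target cannot apply the op consistently, so the op waits.
  return target_backfilled != src_backfilled;
}
```

With `last_backfill` at `"m"`, a write to `"b"` that reads from `"x"` would wait, while a write to `"b"` that reads from `"c"` would not.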
Sage Weil
ca2e8e5a25 osd: EINVAL on mismatched locator without waiting for degraded
No reason to recover before returning an error.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-14 19:08:27 -08:00
Sage Weil
0c5470465d osd: preserve write order when waiting on src_oids
We need to preserve the order of write operations on each object.  If we
have a write on X that needs to read from Y, and Y is degraded, then we
need to wait for Y to repair.  Doing so blindly will allow other writes
to X to proceed while our clone op is still waiting, violating the
ordering.

Fix this by adding blocked_by and blocking vars to the ObjectContext.  If
we wait on a src_oid, the oid is "blocked" by that object, and any
subsequent writes should also wait on the same queue.

Use a helper to do the cleanup when we complete recovery, or when the
pg resets.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-14 19:08:27 -08:00
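A minimal sketch of the queueing idea in this commit, under stated assumptions: `SketchPG`, `submit_write`, and `on_recovered` are hypothetical names, ops are plain integers, and the real change stores blocked_by and blocking on the ObjectContext rather than in maps as done here.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model; not the Ceph data structures.
struct SketchPG {
  std::map<std::string, std::string> blocked_by;    // oid -> src it waits on
  std::map<std::string, std::deque<int>> waiting;   // src -> ops, arrival order
  std::vector<int> applied;                         // ops in applied order

  // degraded_src is empty when the op reads from no degraded source.
  void submit_write(int op, const std::string& oid,
                    const std::string& degraded_src) {
    auto b = blocked_by.find(oid);
    if (b != blocked_by.end()) {
      // oid is already blocked: later writes join the same queue.
      waiting[b->second].push_back(op);
      return;
    }
    if (!degraded_src.empty()) {
      // First op that must wait: mark oid as blocked by the source.
      blocked_by[oid] = degraded_src;
      waiting[degraded_src].push_back(op);
      return;
    }
    applied.push_back(op);
  }

  // Cleanup helper: unblock everything that waited on src, in order.
  void on_recovered(const std::string& src) {
    for (auto it = blocked_by.begin(); it != blocked_by.end();) {
      if (it->second == src) it = blocked_by.erase(it); else ++it;
    }
    auto w = waiting.find(src);
    if (w == waiting.end()) return;
    for (int op : w->second) applied.push_back(op);
    waiting.erase(w);
  }
};
```

In this model, a second write to X that arrives while X is blocked by Y is queued behind the first, so the two are applied in their original order once Y recovers.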
Samuel Just
62c830f0ac ReplicatedPG: add_object_context_to_pg_stat, obc->ssc may be null
obc->ssc is not necessarily filled in by get_object_context.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 15:51:17 -08:00
Samuel Just
cda5f0d3f4 PG: clear waiting_on_backfill during clear_recovery_state
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 15:19:15 -08:00
Samuel Just
d32fd8c521 ReplicatedPG: list snapid 0 on collection_list_partial for backfill
0 will list all objects, CEPH_NO_SNAP will list only head objects.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 15:17:44 -08:00
Samuel Just
d9d05117e2 Merge remote branch 'upstream/master' into wip_backfill_merged 2011-12-14 11:40:15 -08:00
Samuel Just
07b3ba813e ReplicatedPG: collection_list_partial also takes a snapid
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:39:21 -08:00
Samuel Just
7213c4577d PG: Ask for digest at most once at a time
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:22 -08:00
Sage Weil
940a55e0ef osd: track backfill target pg stats
Maintain backfill target pg stats to be the summation over objects to
the left of last_backfill.  Reflect this in the degraded stats we report
to the monitor.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:22 -08:00
Sage Weil
b9eea7090e osd: object_stat_sum_t::clear()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:22 -08:00
Samuel Just
7832e17e51 PG: activate, backfill replica can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:22 -08:00
Samuel Just
51deeef67a ReplicatedPG: calc_*_subsets must consider last_backfill
Objects yet to be backfilled do not show up in the missing set.  Thus,
we cannot use an object past last_backfill to clone into the object we
are pushing/pulling.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:22 -08:00
Samuel Just
9d633a4f1f PG: A backfill osd can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:22 -08:00
Samuel Just
f483df1589 PG: there may now be backfill entries in the acting set
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:21 -08:00
Samuel Just
999846f7e9 PG: fix phantom entry in peer_info
In GetLog, do not call pg->peer_info[newest_update_osd] if
newest_update_osd is osd->whoami.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
71893b0e4d osd: remove bad !is_incomplete() assert
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
57baf9efa9 osd: fix signed/unsigned comp
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
f1caaa37dd osd: fix calc_acting()
Look at usable, not want.size(), so we don't count backfill targets.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
f83a787ea9 osd: some recover_backfill() comments
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
cd0c8fb324 osd: add incomplete, backfill states; simplify calculation
Set/clear states in peering state machine state ctor/dtors where possible.

Set degraded if the number of non-backfilling replicas is lower than the
target replication factor.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
af7536d0aa hobject_t: fix hobject(sobject_t) constructor
Initialize max

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
e1006d76c6 osd: more backfill changes
Always ship log for updates to backfill targets to preserve the repgather
ordering.

Fix up recover_backfill() bounds.  Re-scan the local collection on every pass
in case there were concurrent modifications.  (This could be optimized.)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
9bb77b494c osd: observe last_backfill in merge_log() and helpers
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:21 -08:00
Sage Weil
f1ae9ed55f objectstore: make list by hash *next > instead of >=
This means we should set it to a hash boundary or the last item of our
result set (not the next item we didn't include).

It means that during backfill we can set our last_backfill to the last
object we did recover and be sure that any new files locally will be
included in the next result set, and we can bound that result set by that
last object recovered and not include it in the resulting range.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:32:19 -08:00
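The exclusive-cursor listing described here could look roughly like the sketch below. It is not the ObjectStore interface: `list_after` is a hypothetical helper, entries are plain strings kept in a sorted vector instead of hash-ordered objects, and an empty cursor is assumed to mean "start from the beginning".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return up to max entries strictly greater than
// *next, then advance *next to the last entry returned (not to the first
// entry left out).
std::vector<std::string> list_after(const std::vector<std::string>& sorted,
                                    std::string* next, std::size_t max) {
  std::vector<std::string> out;
  // upper_bound finds the first element > *next, so the cursor itself
  // is never returned again.
  auto it = std::upper_bound(sorted.begin(), sorted.end(), *next);
  for (; it != sorted.end() && out.size() < max; ++it) {
    out.push_back(*it);
  }
  if (!out.empty()) {
    *next = out.back();
  }
  return out;
}
```

Because the cursor is the last item returned, an entry created between two calls that sorts after the cursor is still picked up by the next call, and the cursor entry itself is not repeated.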
Sage Weil
c03c49ca94 osd: initialize repop gather set in issue_repop instead of new_repop
Simpler.  It will also make the last_backfill correction live in one
place.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
Sage Weil
baa21c9bfa osd: implement PG::copy_range()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
Sage Weil
de19a6bb41 Revert "osd: don't keep push state on replicas"
This reverts commit 69c77e33f8530993dbc280525bd21218ea6f9ddb.

sub_op_pull() calls send_push_op directly, does not pass push_start().
2011-12-14 11:31:34 -08:00
Sage Weil
b99e135848 osd: make backfill (basically) work again
Still need to handle concurrent updates, log recovery vs backfill, etc.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
Sage Weil
88ee86d0bf osd: keep backfill targets in acting set
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
Sage Weil
9288f0e0f3 osd: advance last_backfill by keys only
This ensures that transactions are never split by last_backfill.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
Sage Weil
f7a0b9c5c5 hobject_t: fix sorting by hash key
Use get_effective_key() to return key (if explicit) or object name.  Sort
by that within each hash value.

Clean up operator<< so that it prints things in sort order.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:34 -08:00
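As an illustration of the ordering this commit describes, here is a minimal sketch. `SketchObject` is a hypothetical stand-in, not the real hobject_t; only `get_effective_key()` is named in the commit message, and the field layout is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-in for hobject_t.
struct SketchObject {
  uint32_t hash;
  std::string key;   // explicit key; empty when none was given (assumption)
  std::string name;  // object name

  // Return the explicit key if there is one, otherwise the object name.
  const std::string& get_effective_key() const {
    return key.empty() ? name : key;
  }
};

// Order by hash first, then by effective key within each hash value.
inline bool operator<(const SketchObject& a, const SketchObject& b) {
  return std::tie(a.hash, a.get_effective_key()) <
         std::tie(b.hash, b.get_effective_key());
}
```

In this sketch, two objects with the same hash are ordered by explicit key when one is set, so an object named "zzz" with key "apple" sorts before an object named "zebra" that has no key.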
Sage Weil
91ee3375f0 osd: osd_kill_backfill_at
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
400c27da9d osd: track backfill with last_backfill, not interval_set<>
We always fill from the bottom up anyway.  Using an hobject_t also gives us
a precise bound.  It also makes things conceptually simpler: last_complete
and last_backfill bounding each of the two dimensions of updatedness.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
0e7f4affa5 osd: pg whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
b5de19b51c osd: kill unused PG::trim_write_ahead
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
e63c595a33 osd: kill unused PG::Log::copy_after_unless_divergent
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
693950bfb3 osd: cleanup lingering backlog refs
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00
Sage Weil
d84a9f6f2e osd: cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-12-14 11:31:33 -08:00