19259 Commits

Author SHA1 Message Date
Sage Weil
2a22ff4385 mon: thrash pg_temp mapping, too
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-26 16:50:03 -07:00
Joao Eduardo Luis
6910d83897 filestore: fix a journal replay issue with collection_add()
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 16:36:03 -07:00
Joao Eduardo Luis
96108c657c filestore: fix a journal replay issue with collection_add()
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-27 00:31:55 +01:00
Sage Weil
ead5d2a813 osd: filter osds removed from probe set from peer_info_requested
Peer_info_requested should be a strict subset of the probe set.  Filter
osds that are dropped from probe from peer_info_requested.  We could also
restart peering from scratch here, but this is less expensive, because we
don't have to re-probe everyone.

Once we adjust the probe and peer_info_requested sets, (re)check if we're
done: we may have been blocked on a previous peer_info_requested entry.

The situation I saw was:

  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetInfo",
          "enter_time": "2012-04-25 14:39:56.905748",
          "requested_info_from": [
                { "osd": 193}]},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2012-04-25 14:39:56.905748",
          "probing_osds": [
                79,
                191,
                195],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2012-04-25 14:39:56.905742"}]}

Once in this state, cycling osd.193 doesn't help, because the prior_set
is not affected.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-26 16:03:10 -07:00
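Note on ead5d2a813: the filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the actual OSD code: the function name filter_requested and the use of plain std::set<int> are assumptions made for the example, and "done" is taken to mean that no requested entry remains.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not from the Ceph tree): drop every OSD from
// `requested` that is no longer part of `probe`, then report whether
// anything is still outstanding.
bool filter_requested(const std::set<int>& probe, std::set<int>& requested) {
  for (auto it = requested.begin(); it != requested.end();) {
    if (probe.count(*it) == 0)
      it = requested.erase(it);  // osd was dropped from the probe set
    else
      ++it;
  }
  // Re-check completion: an entry we were waiting on may just have been removed.
  return requested.empty();
}
```

With the state shown in the message (probing 79, 191, 195 while still waiting on osd 193), the filter would empty the requested set and the re-check would report completion instead of waiting indefinitely.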
Samuel Just
201bace5db Merge branch 'next' 2012-04-26 15:53:27 -07:00
Samuel Just
3e880174dd PG: get_infos() should not post GotInfo
The MNotifyRec handler also posts GotInfo under the same conditions
after calling get_infos().

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-26 15:45:35 -07:00
Samuel Just
7fe45fd65d Revert "PG: whitelist MNotifyRec in started"
This reverts commit 9579365720.
2012-04-26 15:38:42 -07:00
Josh Durgin
cbe795a7fe test_librbd: rollback when mapped to a snapshot should fail
Rollback is effectively a write, and returns -EROFS when mapped to a
snapshot since 3ef3ab8a15.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-26 12:41:13 -07:00
Joao Eduardo Luis
f873a771ee workload_generator: get rid of our lock.
We don't need the lock in the WorkloadGenerator class. Everything that does
need a lock is handled by TestFileStoreState, and all that remains can be
handled by an atomic_t.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 20:19:59 +01:00
Joao Eduardo Luis
436f5d656c TestFileStoreState: make 'm_in_flight' var an atomic_t.
This allows us to increase, decrease and retrieve its value without the
need to lock the class.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 20:18:28 +01:00
Samuel Just
841cfcfd0d Merge branch 'next' 2012-04-26 10:51:34 -07:00
Samuel Just
9579365720 PG: whitelist MNotifyRec in started
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-26 10:51:08 -07:00
Samuel Just
be9b38ec9c RefCountedObject: fix constructor warning
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-26 10:38:45 -07:00
Joao Eduardo Luis
35dc2dea58 workload_generator: specify number of ops to run, or 0 to run forever.
New option '--test-num-ops VAL' -- if (VAL == 0) then run forever; fi

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 17:00:45 +01:00
Joao Eduardo Luis
823afcdfb7 workload_generator: Delegate store tracking to TestFileStoreState.
We had a lot of duplicate code between the WorkloadGenerator and the
TestFileStoreState classes, and the last one is far more versatile than
what we initially had in the WorkloadGenerator. Therefore, delegate
everything we can to the TestFileStoreState class.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 16:58:38 +01:00
Joao Eduardo Luis
3903b5a666 TestFileStoreState: Fix issues affecting proper behavior when inherited.
Fix wait_for_ready() and make the C_OnFinished class' member variables
protected instead of private (to allow proper inheritance).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 16:29:50 +01:00
Joao Eduardo Luis
22ade4ae4b Makefile.am: test_filestore_workloadgen doesn't need gtests lib.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-26 16:26:43 +01:00
Yehuda Sadeh
81a248a11c Merge branch 'wip-2342' 2012-04-25 15:23:34 -07:00
Yehuda Sadeh
39f9935416 RefCountedObject: relocate from msg/Message.h to common/RefCountedObj.h
Following a popular request.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-04-25 15:22:52 -07:00
Yehuda Sadeh
70f70d803a librados: call notification under different thread context
This fixes #2342. We shouldn't call notify on the dispatcher
context. We should also make sure that we don't hold
the client lock while waiting for the responses.
Also, pushed the client_lock locking into the
ctx->notify().

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-04-25 14:14:29 -07:00
Sage Weil
8bc818c4e3 mon: 'osd thrash <num epochs>'
Thrash the osdmap for N iterations.  Randomly mark OSDs up, down, in, out,
and up_thru in order to generate a difficult osdmap history for peering
to chew through.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-25 13:43:41 -07:00
Sage Weil
fa9847986e osd: filter osds removed from probe set from peer_info_requested
Peer_info_requested should be a strict subset of the probe set.  Filter
osds that are dropped from probe from peer_info_requested.  We could also
restart peering from scratch here, but this is less expensive, because we
don't have to re-probe everyone.

Once we adjust the probe and peer_info_requested sets, (re)check if we're
done: we may have been blocked on a previous peer_info_requested entry.

The situation I saw was:

  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetInfo",
          "enter_time": "2012-04-25 14:39:56.905748",
          "requested_info_from": [
                { "osd": 193}]},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2012-04-25 14:39:56.905748",
          "probing_osds": [
                79,
                191,
                195],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2012-04-25 14:39:56.905742"}]}

Once in this state, cycling osd.193 doesn't help, because the prior_set
is not affected.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-25 13:07:34 -07:00
Sage Weil
f022a94956 mon: add 'mon osd min up ratio' and 'mon osd min in ratio'
Prevent the monitor from marking osds down or out when too many are already
in that state.  At this point the cluster is already broken and there is
little point in continuing to mark things down/out.

Setting these to 0 obviously disables the feature (by setting a minimum
of 0).

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-25 11:15:37 -07:00
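Note on f022a94956: one way to picture the minimum-ratio guard is the sketch below. The function name, its parameters, and the choice to compare the current up ratio against the minimum are assumptions for illustration only; the real monitor code may compute this differently.

```cpp
#include <cassert>

// Hypothetical guard (not the monitor's actual helper): refuse to mark
// another OSD down once the fraction of up OSDs has fallen below the
// configured minimum.
bool can_mark_down(int num_osds, int num_up, double min_up_ratio) {
  if (num_osds <= 0)
    return false;  // nothing to mark down
  assert(num_up >= 0 && num_up <= num_osds);
  const double up_ratio = static_cast<double>(num_up) / num_osds;
  // With min_up_ratio == 0 this is always true, i.e. the check is disabled.
  return up_ratio >= min_up_ratio;
}
```

The same shape would apply to the "in" ratio, with the number of in OSDs in place of num_up.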
Sage Weil
ba1d3b1da0 mon: use can_mark_*() helpers
So we can generalize beyond NO* flags.  We'll soon be adding other reasons
to not mark things up/down/in/out.  This lets us keep all those checks in
one place.

The helper methods will tell us why we can't do the thing (e.g., "NODOWN
flag is set").  The callers will generally tell us exactly what didn't
happen (e.g., "failure report of X ignored").

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-25 11:15:34 -07:00
Joao Eduardo Luis
75ccd8172c DeterministicOpSequence: add 'ceph_asserts()' where we expect != NULL.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-25 16:27:41 +01:00
Joao Eduardo Luis
196640207b TestFileStoreState: distinguish between 'get_coll()' and 'get_coll_at()'
get_coll_at(int pos) should return the collection at the map's position
'pos', but 'pos' was being used as a map key. Therefore, we add a new
function 'get_coll(int key)' to mimic this behavior, and we make
'get_coll_at()' follow its intended behavior.

This patch may affect the test_filestore_idempotent_sequence tester, since
it uses the 'get_coll_at()' function a lot, and we changed this function's
behavior.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-25 16:26:32 +01:00
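Note on 196640207b: the difference between lookup by key and lookup by position in an ordered map can be sketched as below. The CollMap alias, the string payload, and the free-function signatures are assumptions for the example; the real TestFileStoreState members have their own types.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Stand-in for the collection map (hypothetical types).
using CollMap = std::map<int, std::string>;

// Look up by map key; returns nullptr when the key is absent.
const std::string* get_coll(const CollMap& m, int key) {
  auto it = m.find(key);
  return it == m.end() ? nullptr : &it->second;
}

// Look up by position in iteration order; returns nullptr when out of range.
const std::string* get_coll_at(const CollMap& m, int pos) {
  if (pos < 0 || static_cast<std::size_t>(pos) >= m.size())
    return nullptr;
  auto it = std::next(m.begin(), pos);
  return &it->second;
}
```

For a map holding keys 10 and 20, position 1 refers to the entry with key 20, while key 1 does not exist at all, which is the confusion the commit describes.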
Joao Eduardo Luis
44dafc8702 run_seed_to.sh: Add valgrind support.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-25 15:35:03 +01:00
Joao Eduardo Luis
4430c01cfc TestFileStoreState: free memory on terminus.
So far, it hasn't triggered any segfault, but I'm not yet convinced there
is no problem whatsoever.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-25 15:34:39 +01:00
Greg Farnum
4bfcbe6ab8 mon: decode old PGMap Incrementals differently from new ones
We need to distinguish between the old 0 (meaning undefined) and
the new 0 (meaning switch to 0 and disable the flags). So rev the
encoding version on PGMap::Incremental, and if you decode an old
version with [near]full_ratio == 0, set the ratio to -1 instead. Then
when applying the Incremental interpret -1 as no change.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-04-24 16:44:23 -07:00
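Note on 4bfcbe6ab8: the sentinel handling described above might look roughly like this. The version constant, function names, and use of double are assumptions for illustration; the actual PGMap::Incremental encoding is not reproduced here.

```cpp
#include <cassert>

// Hypothetical first encoding version in which 0 means "set the ratio to 0".
constexpr int kRatioZeroIsExplicit = 2;

// Decode a ratio field: in old encodings 0 meant "undefined", so map it
// to the -1 sentinel; newer encodings are passed through unchanged.
double decode_ratio(int struct_v, double raw) {
  if (struct_v < kRatioZeroIsExplicit && raw == 0.0)
    return -1.0;
  return raw;
}

// Apply an incremental's ratio: a negative value means "no change".
void apply_ratio(double inc_ratio, double& current) {
  if (inc_ratio >= 0.0)
    current = inc_ratio;
}
```

Under these assumptions an old incremental carrying 0 leaves the current ratio alone, while a new one carrying 0 sets it to 0.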
Sage Weil
59957da257 mon: do not mark osds out if NOOUT flag is set
Do not mark down osds out when NOOUT flag is set.  This is more or less
equivalent to setting a very long 'mon osd down out interval', but
reversible and less annoying.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 15:47:40 -07:00
Sage Weil
2673875f0c mon: do not mark booting osds in if NOIN flag is set
If the NOIN osdmap flag is set, do not mark booting osds in.  Normally
we would for a range of reasons (always, new, auto-marked-out), but block
them all.

Do not limit manual 'ceph osd in N' commands.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 15:47:40 -07:00
Sage Weil
9ff535ad2c mon: always remove booting osds from down_pending_out
The down_pending_out tracks OSDs that are down that we may want to
auto-mark out.  If an osd boots, it should be removed from this list
because it is no longer down; it doesn't matter whether it is marked in
or not.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 15:28:36 -07:00
Sage Weil
addfb2c670 mon: prevent osd mark-down with NODOWN flag
If the NODOWN osdmap flag is set,

 - ignore osd failure reports
 - do not mark osds down due to lack of osd/pg stats

We *do* still allow explicit admin 'ceph osd down N' commands, and a
booting OSD to mark the previous instance of itself down.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:31:42 -07:00
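Note on addfb2c670: the policy in the message above can be summarized as a small decision function. The enum and function names are invented for this sketch and do not correspond to identifiers in the monitor.

```cpp
#include <cassert>

// Hypothetical classification of why a mark-down is being requested.
enum class DownReason { FailureReport, MissingStats, AdminCommand, SelfReboot };

// With NODOWN set, only explicit admin commands and a booting OSD
// replacing its own previous instance may still mark an OSD down.
bool may_mark_down(bool nodown_set, DownReason why) {
  if (!nodown_set)
    return true;
  return why == DownReason::AdminCommand || why == DownReason::SelfReboot;
}
```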
Sage Weil
f0773863ce osd: do not attempt to boot if NOUP
If NOUP is set, do not send the boot message.

We already send onetime subscriptions to the osdmap, so we will find out
about osdmap flag changes.  If it is cleared later, we'll pass into
start_boot() and _got_boot_version() again and send it then.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:31:42 -07:00
Sage Weil
d84255ecba mon: prevent osd from booting if NOUP
Do not add an osd attempting to boot to the map if NOUP is set.  Instead,
send it the latest osdmap so it knows that it's not allowed to boot.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:16:42 -07:00
Sage Weil
d3f55dd917 mon: 'osd {set,unset} {noin,noout,noup,nodown}'
Move the set/unset flag code into a helper, and also use that for the
pause/unpause commands.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 14:10:47 -07:00
Sage Weil
6003325d24 osdmap: add NOUP, NODOWN, NOIN, NOOUT flags
These prevent OSDs from being marked up, down, in, or out, respectively.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 14:10:46 -07:00
Josh Durgin
34ef3f3765 Merge remote branch 'origin/wip-rbd-snapid' into next
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 13:59:09 -07:00
Sage Weil
e51772ca19 librbd: pass errors removing head back to user
In particular, the OSD may return EBUSY if there are still watchers.
Ignore ENOENT, as that may indicate we are cleaning up a previously
aborted removal.

Fixes: #2311
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 13:39:38 -07:00
Sage Weil
ccf7d9309e mon: clean up handle_osd_timeouts a bit
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 10:55:22 -07:00
Sage Weil
36ffed45e8 mon: fix pg stats timeout
We clear out the osd entry when an osd goes up or down.  Thus, if we find
it missing from an up osd, we should start the timer.  Otherwise we get
behavior like this

2012-04-24 13:22:47.888291 7fa5bc587700 mon.peon5752@0(leader).osd e21633 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:22:50.076394 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:22:52.903558 7fa5bc587700 mon.peon5752@0(leader).osd e21638 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:15.144532 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:17.967118 7fa5bc587700 mon.peon5752@0(leader).osd e21663 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:22.173778 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:22.981556 7fa5bc587700 mon.peon5752@0(leader).osd e21668 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:45.245380 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot

when the pg stats message doesn't arrive quickly enough.

Fixes: #2341
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-04-24 10:55:18 -07:00
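Note on 36ffed45e8: the fix amounts to starting the timer when an up OSD has no entry, rather than treating the missing entry as a timeout. A rough sketch follows; the function name, the map of last-report times, and the use of plain doubles for time are all assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical timeout pass: returns the OSDs whose stats are overdue.
// An up OSD with no recorded report gets its timer started at `now`
// instead of being marked down immediately.
std::vector<int> handle_timeouts(const std::vector<int>& up_osds,
                                 std::map<int, double>& last_report,
                                 double now, double timeout) {
  std::vector<int> to_mark_down;
  for (int osd : up_osds) {
    auto it = last_report.find(osd);
    if (it == last_report.end()) {
      last_report[osd] = now;  // entry was cleared on up/down: start the timer
      continue;
    }
    if (now - it->second > timeout)
      to_mark_down.push_back(osd);
  }
  return to_mark_down;
}
```

In this sketch a freshly booted osd 521 would get a full timeout interval to deliver its first stats message, which avoids the boot and mark-down loop shown in the log excerpt.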
Sage Weil
7b832f4266 mon: fix whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 10:49:30 -07:00
Greg Farnum
2b302015e7 mon: fix pgmonitor ratio commands
The indices were set incorrectly when I whipped this up. That's what
you get for not testing nor being careful enough in review.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-04-24 10:35:32 -07:00
Josh Durgin
d28f850fe1 test_rbd: add tests for snap_set and more complicated resizing
* snap_set to a deleted (and recreated) snapshot
* resizing down (truncating) and back up
* resizing to non-object-aligned sizes

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Josh Durgin
7add136f90 librbd: reset needs_refresh flag before re-reading header
This way we can't miss an update if we get a notify during ictx_refresh.
Specifically, a race like this:

Thread 1               Thread 2              Process 2

ictx_refresh()
read_header()
                                             snap_create()
                       notify()
                       need_refresh = true
process header...
need_refresh = false

If this happened, we would not re-read the header with the new
snapshot, so the snapshot would not happen at the intended point
in time, but only after we re-read the header again.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
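Note on 7add136f90: the ordering that closes the race can be sketched as below. The ImageState struct, the function names, and the callback standing in for the header read are assumptions for this example; std::atomic is used here purely for illustration and is not claimed to be what librbd uses.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for the image context.
struct ImageState {
  std::atomic<bool> needs_refresh{true};
  int header_version = 0;
};

// Called when another client reports a header change.
void on_notify(ImageState& s) { s.needs_refresh = true; }

// Clear the flag BEFORE reading the header. A notify that lands while the
// read is in progress then leaves the flag set, so the next check re-reads.
void refresh(ImageState& s, const std::function<int()>& read_header) {
  s.needs_refresh = false;
  s.header_version = read_header();
}
```

If the flag were cleared after the read instead, a notify arriving during the read would be overwritten and the newer header would go unnoticed until some later refresh.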
Josh Durgin
3ef3ab8a15 librbd: clean up snapshot handling a bit
* snapid should determine whether our mapped snapshot is gone, not snapname
* snap_set(<nonexistent_snap>) shouldn't reset us to CEPH_NOSNAP
* snapname should be set before using it in the perfcounter name
* snapname and image name don't need to be passed as arguments since an
  ImageCtx already contains that info
* ictx_check() doesn't need to check for non-existent snaps - only I/Os care,
  so check in check_io() instead

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Josh Durgin
e17b5a85be librbd: clarify handle_sparse_read condition
The earlier condition is >. != means < at this point, and the nesting
is unnecessary.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Sage Weil
2bdaba4f01 run_seed_to.sh: rework the script, make it more flexible and broaden the tests.
Allow for '-h' and other options such as disabling the journal sync tests,
specifying that it is to be run on a btrfs FS, enabling exit on error (default is
now 'off'), and allow certain env variables to specify additional options
to each store.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-23 20:31:02 -07:00
Sage Weil
e65b797164 librbd: rev version for discard addition
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 19:43:25 -07:00
Sage Weil
19ba34753e osdmaptool: fix clitests for lack of localized pgs
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-23 14:48:02 -07:00