20365 Commits

Author SHA1 Message Date
John Wilkins
8a89d40e6b doc: remove last reference to ceph-cookbooks.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-13 14:16:08 -07:00
John Wilkins
2011956745 doc: cookbooks issue resolved, so changed 'ceph-cookbooks' back to 'ceph.'
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-13 14:08:41 -07:00
Samuel Just
53600798f7 OSD: send_still_alive when we get a reply if we reported failure
When we get a ping reply, remove the peer from the failure_queue
and send a still alive message if the peer is in the failure_pending
map.

Otherwise, the monitor could slowly accumulate sporadic failure reports
leading to an osd being incorrectly marked out.

This bug may have been contributing to the wrongly-marked-down
thrashing observed on some systems.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-13 12:18:46 -07:00
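
The ping-reply handling described in 53600798f7 can be modeled roughly as follows. This is a minimal Python sketch, not the OSD's C++ code: the class FailureTracker, its method names, and the `sent` list that stands in for messages to the monitor are hypothetical. Only the names failure_queue and failure_pending and the idea of a still-alive message come from the commit message.

```python
class FailureTracker:
    """Toy model of an OSD's bookkeeping for peer failure reports."""

    def __init__(self):
        self.failure_queue = set()    # peers we intend to report as failed
        self.failure_pending = set()  # peers already reported to the monitor
        self.sent = []                # stand-in for messages sent to the monitor

    def note_missed_ping(self, peer):
        # Queue a failure report for a peer that stopped answering pings.
        self.failure_queue.add(peer)

    def flush_reports(self):
        # Send queued failure reports and remember that they are outstanding.
        for peer in sorted(self.failure_queue):
            self.sent.append(("failure", peer))
            self.failure_pending.add(peer)
        self.failure_queue.clear()

    def handle_ping_reply(self, peer):
        # The peer answered: drop any report that has not been sent yet.
        self.failure_queue.discard(peer)
        # If a report was already sent, retract it so the monitor does not
        # accumulate stale reports against a live peer.
        if peer in self.failure_pending:
            self.failure_pending.discard(peer)
            self.sent.append(("still_alive", peer))
```

In this model a peer that replies before its report is flushed produces no monitor traffic at all, while a peer that replies afterwards produces exactly one retraction.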
Samuel Just
5924f8e4a8 PG: merge_log always use stats from authoritative replica
If the osd receiving the log has divergent entries, it will
also have a "divergent" stat structure.  In general, it suffices
to simply trust the stat structure shipped with the authoritative
log and info since merge_log is only used to merge an authoritative
log.

Probably fixes #2769.

In cases like #2769, this bug can result in a primary with a stat
structure which double counts an operation: once for the
divergent operation, and once for the replay.  It turned up
in a regression suite run as a scrub stat mismatch.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-13 10:19:24 -07:00
Josh Durgin
3dd65a897b qa: download tests from specified branch
These Python tests aren't installed, so they need to be downloaded.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-13 09:42:27 -07:00
Samuel Just
bcfa573f5f ReplicatedPG: don't mark repop done until apply completes
Consider the following sequence:
1. issue, apply repop
2. replicas and primary commit
  Here, repop->waitfor_(ack|disk) are empty, so we mark
  repop->done and remove_repop.
3. interval change, repops still in queue are marked aborted
4. activate, last_update_applied = last_update
5. the repop from step 1 enters apply_repop, is not aborted,
   and finds that last_update_applied has passed it by.

Fixes #2749

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-12 16:52:37 -07:00
Sage Weil
10ec5926c3 test_librbd: fix warnings
test/test_librbd.cc: In member function ‘virtual void LibRBD_TestClone_Test::TestBody()’:
warning: test/test_librbd.cc:1040:111: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat]
warning: test/test_librbd.cc:1040:111: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat]
warning: test/test_librbd.cc:1040:111: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘int64_t {aka long long int}’ [-Wformat]

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-12 16:14:41 -07:00
Samuel Just
5450567a67 ReplicatedPG,PG: dump recovery/backfill state on pg query
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-12 14:06:03 -07:00
Sage Weil
b133c49017 Merge remote-tracking branch 'gh/wip-2101'
2012-07-12 13:11:33 -07:00
Josh Durgin
508bf3fb96 rbd: enable layering when using the new format
We'll add options for different features later.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-12 11:27:32 -07:00
John Wilkins
dfe29aff7f doc: reverted file and role names.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-12 11:46:43 -07:00
Tommi Virtanen
f8478d4c56 upstart: Make ceph-osd always set the crush location.
This used to be conditional on config having osd_crush_location set,
but with that, minimal configuration left the OSD completely out of
the crush map, and prevented the OSD from starting properly.

Note: Ceph does not currently let this mechanism automatically move
hosts to another location in the CRUSH hierarchy. This means if you
let this run with defaults, setting osd_crush_location later will not
take effect. Set up your config file (or Chef environment) fully
before starting the OSDs the first time.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
2012-07-12 10:47:29 -07:00
Sage Weil
5ceb7c734a doc: fix config metavariables discussion
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-12 10:00:41 -07:00
Sage Weil
d1054df6be doc: perf counters
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-12 10:00:41 -07:00
John Wilkins
09c60b4399 doc: added :: to code example.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-12 09:00:19 -07:00
John Wilkins
ad8beeb407 doc: minor edits.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-12 08:55:15 -07:00
John Wilkins
63a1799853 doc: cookbook name change broke some things in doc. Fixed.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-12 08:47:47 -07:00
Yehuda Sadeh
cc8df29e19 rados tool: bulk objects removal
Issue #2776. Allow the removal of multiple objects in a single
rados tool command:

  # rados -p pool rm obj1 [obj2 [...]]

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-07-11 20:06:32 -07:00
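
The `rm obj1 [obj2 [...]]` form in cc8df29e19 takes one or more object names. The following Python sketch shows that argument shape with argparse. It is an illustration only, not the rados tool's implementation: the program name rados-sketch, the dict used as a pool, and the choice to keep going after a missing object are all assumptions.

```python
import argparse


def build_parser():
    """Parser accepting: -p POOL rm obj1 [obj2 [...]] (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="rados-sketch")
    parser.add_argument("-p", "--pool", required=True)
    sub = parser.add_subparsers(dest="command", required=True)
    rm = sub.add_parser("rm")
    # nargs="+" requires at least one name and accepts any number of them.
    rm.add_argument("objects", nargs="+", metavar="obj")
    return parser


def remove_objects(pool, names):
    """Remove each name from `pool` (a dict stand-in); return names not found."""
    missing = []
    for name in names:
        if name in pool:
            del pool[name]
        else:
            missing.append(name)
    return missing
```

For example, `build_parser().parse_args(["-p", "data", "rm", "obj1", "obj2"])` yields a namespace whose `objects` attribute is the list of both names, which can then be passed to `remove_objects`.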
Sage Weil
762a5b6383 Merge remote-tracking branch 'gh/wip-cct'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-07-11 19:59:32 -07:00
Sage Weil
f20b602296 Merge branch 'next'
Conflicts:
	src/rados.cc
2012-07-11 18:56:00 -07:00
Sage Weil
99a048d882 rados: more usage cleanup
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 18:54:30 -07:00
Dan Mick
0081c8e420 rados: usage message
Bad linebreaks, wrapping, stringification, missing doc for bench args

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-07-11 18:53:35 -07:00
John Wilkins
0782db3694 doc: changed role file names as part of update to roles.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-11 17:35:38 -07:00
John Wilkins
e5997f4e11 doc: added DHO config.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-11 17:35:01 -07:00
Yehuda Sadeh
173d592a4e rados tool: remove -t param option for target pool
Bug #2772. This fixes an issue that was introduced when we
added the 'rados cp' command. The -t param was already used
for rados bench. With this change the only way to specify
a target pool is using --target-pool.
Though this problem is post-argonaut, the 'rados cp' command
has been backported, so we need this fix there too.

Backport: argonaut

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-11 17:11:15 -07:00
Sage Weil
31c8dcc11c crush: sum and check quantized weights for bucket
Sum the quantized weights for each bucket, and check that for overflow.

This could change the results of a compile marginally if the map is using
non-divisible weight values that quantize funny.  The old code might
calculate a bucket sum that is not the actual sum of the quantized weights.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 16:36:47 -07:00
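
The difference described in 31c8dcc11c, between the sum of the quantized weights and a quantized sum, can be seen in a small example. This Python sketch assumes CRUSH's 16.16 fixed-point weight representation (0x10000 is 1.0), quantization by truncation, and a 32-bit overflow limit. The function names are hypothetical and the truncation and limit details are assumptions, not taken from the commit.

```python
FIXED_ONE = 0x10000    # 1.0 in 16.16 fixed point
U32_MAX = 0xFFFFFFFF   # assumed overflow limit for a bucket weight


def quantize(weight):
    """Convert a float weight to 16.16 fixed point by truncation (assumed)."""
    return int(weight * FIXED_ONE)


def bucket_weight(item_weights):
    """Sum the quantized item weights and reject a sum that overflows 32 bits."""
    total = sum(quantize(w) for w in item_weights)
    if total > U32_MAX:
        raise OverflowError("bucket weight does not fit in 32 bits")
    return total


weights = [0.1, 0.1, 0.1]
print(bucket_weight(weights))   # 19659: sum of the quantized weights
print(quantize(sum(weights)))   # 19660: quantizing the float sum instead
```

The two results differ by one unit because each 0.1 loses its fractional part when it is quantized separately, which is the kind of marginal change the commit message warns about.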
caleb miles
675a1b7bbc crush: Set maximum device/bucket weights.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
2012-07-11 16:03:44 -07:00
caleb miles
c9fc5a2477 crush: prevent integer overflow on reweight
Disallow setting OSD weights to a value over 10,000 and cap bucket weight
at 10,000,000 in a CRUSH map. Addresses issue #2101.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
2012-07-11 16:03:13 -07:00
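
The two limits in c9fc5a2477 amount to a pair of bounds checks. The Python sketch below uses the values from the commit message. The function names are hypothetical, and reading "cap" as clamping the bucket sum (rather than rejecting it) is an assumption.

```python
MAX_OSD_WEIGHT = 10_000          # limit named in the commit message
MAX_BUCKET_WEIGHT = 10_000_000   # limit named in the commit message


def check_reweight(osd_weight):
    """Reject an OSD weight above the allowed maximum."""
    if osd_weight > MAX_OSD_WEIGHT:
        raise ValueError(f"OSD weight {osd_weight} exceeds {MAX_OSD_WEIGHT}")
    return osd_weight


def capped_bucket_weight(item_weights):
    """Sum item weights, clamping the result to the bucket maximum (assumed)."""
    return min(sum(item_weights), MAX_BUCKET_WEIGHT)
```

Bounding both values before they are converted or summed keeps later integer arithmetic within a known range.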
Dan Mick
d29ec1e24d rados: usage message
Bad linebreaks, wrapping, stringification, missing doc for bench args

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-07-11 15:32:04 -07:00
Sage Weil
2c001b28fb Makefile: don't install crush headers
This is leftover from when we built a libcrush.so.  We can re-add when we
start doing that again.

Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 09:19:16 -07:00
Sage Weil
22d0648db2 librados: simplify cct refcounting
get() in ctor, put() in dtor.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 09:04:50 -07:00
Sage Weil
c5bcb04b9a lockdep: stop lockdep when its cct goes away
When a cct is destroyed, tell lockdep so that it can shut down if it needed
it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 08:58:22 -07:00
Sage Weil
7adc6c08f1 mon: simplify logmonitor check_subs; less noise
* simple helper to translate name to id
* verify sub type is valid in caller
* assert sub type is valid in method
* simplify iterator usage

Among other things, this gets rid of this noise in the logs:

2012-07-10 20:51:42.617152 7facb23f1700  1 mon.a@1(peon).log v310 check_sub sub monmap not log type

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 21:27:50 -07:00
Sage Weil
fa96e19f4d Merge branch 'stable' into next
2012-07-10 18:21:29 -07:00
Sage Weil
0f917c2f14 osd: guard class call decoding
Backport: argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:21:06 -07:00
Sage Weil
0ff6c97983 test_stress_watch: just one librados instance
This was creating a new cluster connection/session per iteration, and
along with it a few service threads and sockets and so forth.

Unfortunately, librados leaks like a sieve, starting with CephContext
and ceph::crypto::init().  See #845 and #2067.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:21:00 -07:00
Samuel Just
ee1c029da4 ReplicatedPG: don't warn if backfill peer stats don't match
pinfo.stats might be wrong if we did log-based recovery on the
backfilled portion in addition to continuing backfill.

bug #2750

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-10 18:19:57 -07:00
Sage Weil
d3c97dae78 librados: take lock when signaling notify cond
When we are signaling the cond to indicate that a notify is complete,
take the appropriate lock.  This removes the possibility of a race
that loses our signal.  (That would be very difficult given that there
are network round trips involved, but this makes the lock/cond usage
"correct.")

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:18:28 -07:00
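
The lock/cond pattern that d3c97dae78 describes as "correct" looks roughly like this. It is a Python sketch with a hypothetical Notify class, not librados code. Note that Python's threading.Condition raises RuntimeError if notify is called without holding the lock, so the unlocked signal that the commit fixes cannot be reproduced directly here; the sketch shows only the corrected shape.

```python
import threading


class Notify:
    """One-shot completion flag guarded by a condition variable and its lock."""

    def __init__(self):
        self.cond = threading.Condition(threading.Lock())
        self.done = False

    def complete(self):
        # Hold the lock while changing the flag and signaling, so a waiter
        # cannot test `done` and then go to sleep between the two steps.
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def wait(self, timeout=None):
        # wait_for re-checks the predicate under the lock on every wakeup
        # and returns its final value (False if the timeout expired).
        with self.cond:
            return self.cond.wait_for(lambda: self.done, timeout)
```

Because the flag is written and read under the same lock, a call to `wait` that starts after `complete` returns immediately, and one that starts before it is woken by the signal.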
Sage Weil
ec490d878d client: fix locking for SafeCond users
Need to wait on flock, not client_lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:17:43 -07:00
Tommi Virtanen
38e2de3d20 doc: No ssh -t -t, forcing a pty allocation there makes it hang.
Earlier, this was a single -t, and that is overridden by the fact that
stdin is not a tty, so that did nothing.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
2012-07-10 16:13:48 -07:00
John Wilkins
79e3416cf6 doc: removed the ceph directory per tommi's update to the chef-cookbooks.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-10 16:03:05 -07:00
John Wilkins
5c84f01349 doc: Adding apt update message. VM users didn't get the package otherwise.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-07-10 15:23:56 -07:00
Dan Mick
83339a0cbb Merge branch 'wip-rbd-clone-dmick' into master
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-10 14:20:18 -07:00
Sage Weil
fe5c0cd9a9 osd: guard class call decoding
Backport: argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 14:03:06 -07:00
Dan Mick
2a6af20863 rbd: update manpage for clone command
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 14:00:06 -07:00
Dan Mick
e3531497d4 rbd: update cli test reference files
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 13:59:58 -07:00
Dan Mick
7b0c71cca4 librados: pool_get_name handles "not found" wrong
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 13:59:34 -07:00
Dan Mick
6ad5961043 rbd, librbd: add tests for cloning
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 13:59:34 -07:00
Dan Mick
64949d429d librbd, rbd, rbd.py: Add parent info reporting
Split out new parent info into separate retrieval methods;
structure packing on rbd_image_info_t was becoming a problem.
Deprecate old parent fields in favor of new ones.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 13:59:34 -07:00
Dan Mick
a94fc8c81f rbd, librbd, rbd.py: cloning (copy-on-write child image of snapshot)
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-07-10 13:59:33 -07:00