Commit Graph

23343 Commits

Author SHA1 Message Date
Josh Durgin
8fea6dee76 rbd: add --pretty-format option
This is the same option the rados and radosgw-admin tools use for more
human-readable json/xml.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-16 13:14:49 -08:00
Josh Durgin
6934ac3f81 rbd: move Formatter construction to main
Each method that uses a formatter is doing the same thing.
Simplify by constructing and handling errors only once.
Also use a scoped_ptr for easy clean up.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-16 13:14:49 -08:00
Josh Durgin
98487b5622 rbd: fix long lines
Several lines longer than 80 characters have crept in recently.
The older ones generally don't have very useful history,
so I'm not worried about obscuring the history any more.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-16 13:14:48 -08:00
Stratos Psomadakis
84c5d85764 rbd: support plain/json/xml output formatting
This patch renames the --format option to --image-format, for
specifying the RBD image format, and uses --format to specify the
output formatting (to be consistent with the other ceph tools). To
avoid breaking backwards compatibility with existing scripts, rbd will
still accept --format [1|2] for the image format, but will print a
warning message, noting its use is deprecated.

The rbd subcommands that support the new --format option are: ls, info, snap
list, children, showmapped, lock list.

Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-16 13:14:48 -08:00
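
The option handling described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the rbd tool's actual code: the function name `resolve_format_options`, the default output format of "plain", and the rule that an explicit --image-format wins over a legacy --format value are all assumptions made for the example.

```python
# Hypothetical sketch of the --format / --image-format split described above.
OUTPUT_FORMATS = {"plain", "json", "xml"}
LEGACY_IMAGE_FORMATS = {"1", "2"}


def resolve_format_options(format_arg=None, image_format_arg=None):
    """Return (output_format, image_format, warnings).

    format_arg and image_format_arg are the raw strings given to
    --format and --image-format, or None when the option is absent.
    """
    warnings = []
    output_format = "plain"  # assumed default, not taken from the commit
    image_format = None
    if image_format_arg is not None:
        image_format = int(image_format_arg)

    if format_arg is not None:
        if format_arg in LEGACY_IMAGE_FORMATS:
            # Old scripts pass --format 1|2 to choose the image format:
            # still accept it, but warn that this use is deprecated.
            warnings.append(
                "--format for the image format is deprecated; "
                "use --image-format"
            )
            if image_format is None:  # assumed precedence rule
                image_format = int(format_arg)
        elif format_arg in OUTPUT_FORMATS:
            output_format = format_arg
        else:
            raise ValueError("unknown format: %r" % format_arg)

    return output_format, image_format, warnings
```

Calling `resolve_format_options("json")` selects JSON output with no warning, while `resolve_format_options("2")` keeps plain output, selects image format 2, and returns one deprecation warning.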
Sage Weil
8e33a8b9e1 mon: note scrub errors in health summary
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-16 11:21:45 -08:00
Sage Weil
a586966a3c osd: fix rescrub after repair
We were rescrubbing if INCONSISTENT is set, but that is now persistent.
Add a new scrub_after_recovery flag that is reset on each peering interval
and set that when repair encounters errors.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-16 11:21:45 -08:00
Gary Lowell
476eb24be9 Merge branch 'wip-rpm-update'
Merges a workaround for odd AS_IF behaviour in configure.ac.
2013-01-16 11:17:11 -08:00
Danny Al-Gaaf
c1a86ab142 configure.ac: fix problem with --enable-cephfs-java
The AS_IF used to cover java related checks via --enable-cephfs-java
didn't work correctly. Use a plain 'if/fi' instead to make sure this
section is only executed if --enable-cephfs-java is used.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-01-16 09:41:13 -08:00
Sam Lang
1d50affc75 mds: fix usage typo for ceph-mds
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-16 09:43:58 -06:00
Sage Weil
2dc2b4808b mds: use #defines for bits per cap
Hard-coding 0xff in SimpleLock.h is too far away from where we add new cap
bits.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-15 22:43:42 -08:00
Gary Lowell
cf149c8c83 Merge branch 'wip-rpm-update'
Clean-up the handling of ceph java bindings in the rpm specfile and
configure.ac.
2013-01-15 12:41:54 -08:00
Sage Weil
d56af797f9 osd: note must_scrub* flags in PG operator<<
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 19:20:54 -08:00
Sage Weil
2baf1253ee osd: base INCONSISTENT pg state on persistent scrub errors
This makes the state persistent across PG peering and OSD restarts.

This has the side-effect that, on recovery, we rescrub any PGs marked
inconsistent.  This is new behavior!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 19:20:53 -08:00
Sage Weil
26a63df97b osd: fix scrub scheduling for 0.0
The initial value for pair<utime_t,pg_t> can match pg 0.0, preventing it
from being manually scrubbed.  Fix!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 19:20:53 -08:00
Sage Weil
389bed5d33 osd: note last_clean_scrub_stamp, last_scrub_errors
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
Sage Weil
2475066c32 osd: add num_scrub_errors to object_stat_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
Sage Weil
d738328488 osd: add last_clean_scrub_stamp to pg_stat_t, pg_history_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
Sage Weil
6f6a41937f osd: fix object_stat_sum_t dump signedness
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
Sage Weil
299548024a osd: change scrub min/max thresholds
The previous 'osd scrub min interval' was mostly meaningless and useless.
Meanwhile, the 'osd scrub max interval' would only trigger a scrub if the
load was sufficiently low; if it was high, the PG might *never* scrub.

Instead, make the 'min' what the max used to be.  If it has been more than
this many seconds, and the load is low, scrub.  And add an additional
condition that if it has been more than the max threshold, scrub the PG
no matter what--regardless of the load.

Note that this does not change the default scrub interval for less-loaded
clusters, but it *does* change the meaning of existing config options.

Fixes: #3786
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
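
The new threshold semantics described in the commit above amount to a small decision rule. A minimal sketch in Python, assuming hypothetical parameter names (`seconds_since_scrub`, `load_is_low`, `min_interval`, `max_interval`) that do not come from the OSD source:

```python
def should_scrub(seconds_since_scrub, load_is_low, min_interval, max_interval):
    """Decide whether a PG should be scrubbed now.

    Past max_interval the PG is scrubbed regardless of load.
    Past min_interval it is scrubbed only when the load is low.
    """
    if seconds_since_scrub > max_interval:
        return True
    if seconds_since_scrub > min_interval and load_is_low:
        return True
    return False
```

Under this rule a busy cluster can defer a scrub only until the max threshold is reached, so a PG can no longer go unscrubbed indefinitely.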
Sage Weil
16d67c798b osd/PG: remove useless osd_scrub_min_interval check
This was already a no-op: we don't call PG::scrub_sched() unless it has
been osd_scrub_max_interval seconds since we last scrubbed.  Unless we
explicitly requested it, in which case we don't want this check anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:40 -08:00
Sage Weil
a148120776 osd: move scrub schedule random backoff to separate helper
Separate this from the load check, which will soon vary depending on the
PG.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:39 -08:00
Sage Weil
62ee6e099a osd/PG: trigger scrub via scrub schedule, must_ flags
When a scrub is requested, flag it and move it to the front of the
scrub schedule instead of immediately queuing it.  This avoids
bypassing the scrub reservation framework, which can lead to a heavier
impact on performance.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:39 -08:00
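
One way to picture "flag it and move it to the front of the scrub schedule" is a schedule ordered by a per-PG stamp, where a requested scrub is re-registered with the smallest possible stamp. The Python sketch below is an assumption-laden illustration: the class `ScrubSchedule`, its methods, and the use of negative infinity as the "front" stamp are hypothetical and not taken from the OSD code.

```python
class ScrubSchedule:
    """Toy scrub schedule ordered by (stamp, pg id), oldest stamp first."""

    def __init__(self):
        self._stamps = {}        # pg id -> stamp used for ordering
        self.must_scrub = set()  # PGs with an explicitly requested scrub

    def register(self, pgid, last_scrub_stamp):
        """Schedule a PG according to when it was last scrubbed."""
        self._stamps[pgid] = last_scrub_stamp

    def request_scrub(self, pgid):
        """Flag the PG and move it to the front; do not run it directly."""
        self.must_scrub.add(pgid)
        self._stamps[pgid] = float("-inf")

    def order(self):
        """Return PG ids in the order the scheduler would consider them."""
        return sorted(self._stamps, key=lambda pg: (self._stamps[pg], pg))
```

Because the requested PG still goes through the schedule, whatever reservation or throttling the scheduler applies continues to cover it.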
Sage Weil
1441095d6b osd/PG: introduce flags to indicate explicitly requested scrubs
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:39 -08:00
Sage Weil
796907e215 osd/PG: move scrub schedule registration into a helper
Simplifies callers, and will let us easily modify the decision of when
to schedule the PG for scrub.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-14 18:24:39 -08:00
Gary Lowell
be0c4b34bc ac_prog_javah.m4: Use AC_CANONICAL_TARGET instead of AC_CANONICAL_SYSTEM.
2013-01-14 14:11:54 -08:00
Noah Watkins
e182c1fd31 Merge branch 'wip-java-sync'
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Joe Buck <jbbuck@gmail.com>
2013-01-14 13:23:49 -08:00
Noah Watkins
13cb196ea7 java: add fine grained synchronization
Adds r/w lock to protect against some races.

1. Mutual exclusion for mount/unmount prevents races between the two in
libcephfs, which isn't safe (access to ceph_mount_info state).

2. An extremely narrow race between unmount and ceph_* calls in
libcephfs. ThreadA calls ceph_xxx, is_mounted test passes, then ThreadB
calls unmount and destroys the client. ThreadA resumes with a bad client
pointer.

3. Race between unmount and ceph_* calls in JNI. In JNI we hold the
CephContext reference across ceph_* calls. If the ceph mount were to be
released while a thread was returning from a ceph_* call then an attempt
to write to the log (e.g. the return value) would reference bad context.
Since ceph_release is only called by finalize() then no thread can be in
JNI.  So this is actually safe.

Using a r/w lock here provides a trade-off between allowing concurrency into
libcephfs, and not having to constantly update the Java bindings. The
only assumption is that unmount/mount race with the rest of the
interface.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-14 13:11:09 -08:00
Noah Watkins
85c1035754 java: remove all intrinsic locks
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-14 13:11:09 -08:00
Noah Watkins
2b9da45d98 java: remove unnecessary synchronization
The body of ceph_unmount is a call to a synchronized method.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-14 13:11:09 -08:00
Noah Watkins
fb8a488e22 java: remove create/release synchronization
The constructor calls create, and finalize() calls release. Since each
of these can only happen once (enforced by Java), there is no fear of a
race condition.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-14 13:11:09 -08:00
Sage Weil
017b6d63db Revert "osdmap: spread replicas across hosts with default crush map"
This reverts commit 7ea5d84fa3.

This breaks both teuthology and vstart in its current state.
2013-01-14 07:37:59 -08:00
Joao Eduardo Luis
410906e049 mon: OSDMonitor: don't output to stdout in plain text if json is specified
Fixes: #3748

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-12 23:04:20 -08:00
Sage Weil
7ea5d84fa3 osdmap: spread replicas across hosts with default crush map
This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating.  Better to
err on the side of doing the right thing for more people.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-01-11 17:10:24 -08:00
Joao Eduardo Luis
3610e72e4f mon: OSDMonitor: only share osdmap with up OSDs
Try to share the map with a randomly picked OSD; if the picked OSD is
not 'up', then try to find the nearest 'up' OSD in the map by doing a
backward and a forward linear search on the map -- this would be O(n) in
the worst case scenario, as we only do a single iteration starting on the
picked position, incrementing and decrementing two different iterators
until we find an appropriate OSD or we exhaust the map.

Fixes: #3629
Backport: bobtail

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-12 01:09:01 +00:00
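
The search described in the commit above (one pass outward from a random position, with one cursor moving forward and one moving backward) can be sketched as below. This is a Python illustration, not the monitor's code: `find_nearest_up`, `pick_up_osd`, and the `is_up` callback are hypothetical names, and the order in which the two directions are probed at equal distance is an assumption.

```python
import random


def find_nearest_up(osd_ids, is_up, start):
    """Return the 'up' OSD closest to index start, or None if none is up.

    Each entry is visited at most once, so the worst case is O(n).
    """
    n = len(osd_ids)
    fwd = start      # moves toward the end of the map
    bwd = start - 1  # moves toward the beginning of the map
    while fwd < n or bwd >= 0:
        if fwd < n:
            if is_up(osd_ids[fwd]):
                return osd_ids[fwd]
            fwd += 1
        if bwd >= 0:
            if is_up(osd_ids[bwd]):
                return osd_ids[bwd]
            bwd -= 1
    return None  # map exhausted without finding an 'up' OSD


def pick_up_osd(osd_ids, is_up, rng=random):
    """Pick a random position, then fall back to the nearest 'up' OSD."""
    if not osd_ids:
        return None
    return find_nearest_up(osd_ids, is_up, rng.randrange(len(osd_ids)))
```

For example, `pick_up_osd([0, 1, 2, 3], {0, 3}.__contains__)` returns either 0 or 3 depending on the random starting position.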
Dan Mick
1f721804df rbd: Fix tabs
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-01-11 16:25:29 -08:00
John Wilkins
34138993eb doc: Updates to CRUSH paper.
fixes: 3329, 3707, 3711, 3389

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-11 15:56:02 -08:00
Dan Mick
e94b06a192 rbd: make 'add' modprobe rbd so it has a chance of success
Check for existence of /sys/bus/rbd first to avoid unnecessary calls

Fixes: #3784
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-01-11 14:28:50 -08:00
Dan Mick
15bb00cafc rbd: call udevadm settle on map/unmap
When we map/unmap devices, udev gets called to manage device nodes;
this will allow the command to wait for those manipulations to complete,
particularly for test runs, so that the device tree is stable by the
time the command exits.

--no-settle is also provided to avoid this behavior if desired (say,
for a series of 'map' commands, perhaps the user wants to wait for
settling only on the last of the series).

Fixes: #3635
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-01-11 14:28:50 -08:00
Samuel Just
66eb93b836 OSD: only trim up to the oldest map still in use by a pg
map_cache.cached_lb() provides us with a lower bound across
all pgs for in-use osdmaps.  We cannot trim past this since
those maps are still in use.

backport: bobtail
Fixes: #3770
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-11 12:17:10 -08:00
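
The trimming bound in the commit above can be read as taking the minimum of two epochs. A minimal Python sketch, assuming hypothetical names (`trim_target`, `epochs_to_trim`, `in_use_lower_bound`) and assuming that epochs strictly below the target are the ones removed:

```python
def trim_target(oldest_wanted, in_use_lower_bound):
    """Never trim past the oldest osdmap epoch still in use by a PG."""
    return min(oldest_wanted, in_use_lower_bound)


def epochs_to_trim(first_stored, oldest_wanted, in_use_lower_bound):
    """List the stored epochs that are safe to remove."""
    target = trim_target(oldest_wanted, in_use_lower_bound)
    return list(range(first_stored, target))
```

Here `in_use_lower_bound` plays the role the commit assigns to map_cache.cached_lb(): maps at or above it may still be referenced and are kept.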
Samuel Just
8cf79f252a OSD: check for empty command in do_command
Fixes: #3878
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-01-11 12:15:53 -08:00
John Wilkins
3e1472955b Merge pull request #32 from imjustmatthew/imjustmatthew_docs
Correct typo in mon docs 'ceph.com' to 'ceph.conf'
2013-01-11 12:09:25 -08:00
Matthew Roy
0f161f1e59 Correct typo in mon docs 'ceph.com' to 'ceph.conf'
2013-01-11 14:59:53 -05:00
Alex Elder
aeb02061de qa/run_xfstests.sh: use cloned xfstests repository
Use our own copy of the xfstests repository rather than hitting
the upstream one repeatedly.

Signed-off-by: Alex Elder <elder@inktank.com>
2013-01-11 12:49:36 -06:00
Joao Eduardo Luis
8d0fa15e6a mon: Monitor: only schedule a timecheck after election if we are not alone
Fixes: #3790

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-11 10:15:20 -08:00
Sage Weil
310112f702 Merge remote-tracking branch 'gh/wip-3633'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-10 18:05:27 -08:00
Joao Eduardo Luis
58e03ecb89 mon: Monitor: unify 'ceph health' and 'ceph status'; add json output
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-01-11 00:44:21 +00:00
Joao Eduardo Luis
bc57c7a9f8 mon: Monitor: use 'else if' on handle_command instead of bunches of 'if'
... when the options are mutually exclusive.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-01-11 00:44:21 +00:00
Joao Eduardo Luis
7a7fff5725 mon: Monitor: move a couple of if's together on handle_command()
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-01-11 00:44:21 +00:00
Joao Eduardo Luis
ff1c254b82 mon: Monitor: reduce indentation level; make code more readable
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-01-11 00:44:21 +00:00
Joao Eduardo Luis
684d4ba242 mon: Monitor: add timecheck infrastructure to detect clock skews
Fixes: #3633
Fixes: #3695

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-11 00:44:21 +00:00