Commit Graph

18485 Commits

Author SHA1 Message Date
Sage Weil
25cceca0a4 doc: slow osd requests
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
75ad8979e7 doc: diagnose full osd cluster
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
956e2e2274 mon: list nearfull/full osd detail
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
2bec51a21e doc: describe 'stuck' states we check for
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
d72b821741 doc: document some osd failure recovery scenarios
- simple osd failure
- ceph health [detail]
- peering failure ('down') state
- unfound objects

Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
2b87d4f29f osd: list might_have_unfound locations in query result
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
2822fe506d mon: include unfound count in health detail
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:29 -08:00
Sage Weil
8b0bd12796 mon: refactor health, include optional detail
'ceph health' to get the usual summary, 'ceph health detail' to
additionally get a comprehensive list of problems found.

Eventually we can format this as yaml, json, whatever, too.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 17:05:22 -08:00
Samuel Just
98f8219dd3 Merge branch 'wip_omap'
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-06 11:46:47 -08:00
Samuel Just
b6c2e839a4 test_rados_api_aio: add omap
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
4c4fcea323 osd: testing for tmap auto upgrade
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
adace1cf98 ReplicatedPG: transparently upgrade TMAP
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
2abf37762b RadosModel: Add omap operations to RadosModel
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
8228798646 ReplicatedPG: Add omap ops to ReplicatedPG
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
81c22dfbd2 librados: Added omap operations to librados
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
d2bf68d1df osdc: Add omap operation stubs to Objecter::ObjectOperation
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
b85f7d7b13 ReplicatedPG: add omap_header to recovery
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Samuel Just
d8dcb28e50 librados: add tmap_put to ObjectWriteOperation
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 11:44:36 -08:00
Sage Weil
b52d408758 Merge branch 'wip-1796'
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-06 11:03:01 -08:00
Sage Weil
195301ef97 mds: respawn when blacklisted
If we are blacklisted by the OSD cluster, it's because we were too slow
and were replaced by another ceph-mds.  Respawn and re-register as a
standby.

If we get some other write error, shut down.

Fixes: #1796
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-06 11:02:17 -08:00
Sage Weil
769ef369db journaler: add generic write error handler
Specify a generic callback for any write error the journaler encounters.
This is more helpful than passing up write errors to specific callers
because

 - there are several of them
 - journaler initiates writes on its own (like the head)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-06 11:02:17 -08:00
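As a rough illustration of the generic write error handler described in 769ef369db, the Python sketch below routes every failed write to a single callback, whether a caller requested the write or the journaler started it on its own. `ToyJournaler`, `append_entry`, `write_head`, and the integer result codes are hypothetical stand-ins for illustration, not the actual journaler interface.

```python
class ToyJournaler:
    """Toy model: one error callback shared by all write paths (names are assumptions)."""

    def __init__(self, on_write_error):
        # Single generic handler, registered once by the owner of the journaler.
        self.on_write_error = on_write_error

    def _finish_write(self, result):
        # A negative result stands in for a failed write (assumed convention).
        if result < 0:
            self.on_write_error(result)

    def append_entry(self, result):
        # Write requested by a caller; 'result' simulates the write's outcome.
        self._finish_write(result)

    def write_head(self, result):
        # Write the journaler initiates on its own, with no caller to report to.
        self._finish_write(result)


errors = []
journaler = ToyJournaler(errors.append)
journaler.append_entry(0)   # success: handler is not invoked
journaler.write_head(-5)    # failure on a self-initiated write still reaches the handler
```

With one handler, the owner can apply a single policy to all write failures instead of threading error returns through each caller.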
Sage Weil
50682189d9 Merge remote-tracking branch 'gh/wip-2105'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-06 10:49:18 -08:00
Sage Weil
86186405bc .gitignore: src/ocf/rbd
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 10:24:04 -08:00
Sage Weil
e3b4ba99cc filestore: create snap_0 on mkfs
If we create a new filestore, apply one transaction, and then crash, we
want to make sure we roll back to a consistent reference point--empty.  The
simplest solution is to create that snap_0 during mkfs.  This avoids
strangeness like

2012-02-27 00:42:00.336703 7fb1381ef780 filestore(/ceph/osd.0) mkfs in /ceph/osd.0
2012-02-27 00:42:00.341399 7fb1381ef780 journal _open /ceph/osd.0.journal fd 10: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2012-02-27 00:42:00.349705 7fb1381ef780 filestore(/ceph/osd.0) mkjournal created journal on /ceph/osd.0.journal
2012-02-27 00:42:00.349728 7fb1381ef780 filestore(/ceph/osd.0) mkfs done in /ceph/osd.0
2012-02-27 00:42:00.349787 7fb1381ef780 filestore(/ceph/osd.0) mount FIEMAP ioctl is NOT supported
2012-02-27 00:42:00.349800 7fb1381ef780 filestore(/ceph/osd.0) mount detected btrfs
2012-02-27 00:42:00.349813 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs CLONE_RANGE ioctl is supported
2012-02-27 00:42:00.357023 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs SNAP_CREATE is supported
2012-02-27 00:42:00.405174 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs SNAP_DESTROY is supported
2012-02-27 00:42:00.405214 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs START_SYNC got (25) Inappropriate ioctl for device
2012-02-27 00:42:00.405228 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs START_SYNC is NOT supported: (25) Inappropriate ioctl for device
2012-02-27 00:42:00.405235 7fb1381ef780 filestore(/ceph/osd.0) mount WARNING: btrfs snaps enabled, but no SNAP_CREATE_V2 ioctl (from kernel 2.6.37+)
2012-02-27 00:42:00.405561 7fb1381ef780 filestore(/ceph/osd.0) mount found snaps <>
2012-02-27 00:42:00.405576 7fb1381ef780 filestore(/ceph/osd.0) mount WARNING: no consistent snaps found, store may be in inconsistent state

and subsequent badness if we fail before a proper commit is made.

Fixes: #2105
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-06 09:19:32 -08:00
Sage Weil
a14d44fca2 filestore: drop useless read_op_seq() arg
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-06 09:19:16 -08:00
Sage Weil
b78b725d53 Merge pull request #9 from fghaas/ocf-ra
OCF resource agents: add rbd

Reviewed-by: Sage Weil <sage@newdream.net>
Reviewed-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2012-03-06 09:14:25 -08:00
Florian Haas
affda7c01a rbd OCF RA: fix whitespace inconsistency
Signed-off-by: Florian Haas <florian@hastexo.com>
2012-03-06 09:58:52 +01:00
Sage Weil
d9d5cf2ec1 Merge remote branch 'gh/wip-msgr-interface'
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-05 22:48:07 -08:00
Sage Weil
ed0f605365 Merge remote branch 'gh/wip-swift-acls'
Lightly-reviewed-by: Sage Weil <sage@newdream.net>
2012-03-05 14:35:30 -08:00
Sage Weil
3e95dfdf88 osd: delay non-replayed ops during replay
If we get new (non-replayed) ops during replay, those need to wait until
after the replayed ops are ordered and applied.  Otherwise we break the op
ordering completely, particularly with something like

 - pg not active
 - get op 1, put on waiting_for_active
 - pg enters replay
 - get op 2, apply immediately
 - finish replay, requeue op 1

Fixes: #2082
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-05 14:21:31 -08:00
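The ordering problem listed in 3e95dfdf88 can be reproduced with a small toy model. This is only a sketch under stated assumptions: `ToyPG`, `delayed_during_replay`, and the `delay_new_ops` switch are hypothetical names, and the real OSD logic is considerably more involved.

```python
from collections import deque


class ToyPG:
    """Toy placement group; all names here are illustrative assumptions."""

    def __init__(self):
        self.state = "inactive"
        self.waiting_for_active = deque()
        self.delayed_during_replay = deque()  # hypothetical queue for new ops
        self.applied = []

    def handle_op(self, op, delay_new_ops):
        if self.state == "inactive":
            # pg not active: park the op until the pg becomes active
            self.waiting_for_active.append(op)
        elif self.state == "replay" and delay_new_ops:
            # fixed behavior: new ops wait until replay has finished
            self.delayed_during_replay.append(op)
        else:
            # old behavior during replay (or normal active state): apply now
            self.applied.append(op)

    def enter_replay(self):
        self.state = "replay"

    def finish_replay(self):
        self.state = "active"
        # Requeue older waiters first, then ops that arrived during replay.
        for queue in (self.waiting_for_active, self.delayed_during_replay):
            while queue:
                self.applied.append(queue.popleft())


def run(delay_new_ops):
    pg = ToyPG()
    pg.handle_op(1, delay_new_ops)   # get op 1, put on waiting_for_active
    pg.enter_replay()                # pg enters replay
    pg.handle_op(2, delay_new_ops)   # get op 2
    pg.finish_replay()               # finish replay, requeue op 1
    return pg.applied
```

In this model `run(False)` applies op 2 before op 1, which is the broken ordering from the commit message, while `run(True)` preserves arrival order.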
Sage Weil
702f09ea74 librados: close narrow shutdown race
timer.shutdown() will drop and retake the lock, so set DISCONNECTED first
to avoid a message slipping in and reaching the objecter like so:

INFO:teuthology.task.rados.rados.0.err:osdc/Objecter.cc: In function 'void Objecter::handle_osd_op_reply(MOSDOpReply*)' thread 7f0bc2b1b700 time 2012-03-03 18:35:25.302135
INFO:teuthology.task.rados.rados.0.err:osdc/Objecter.cc: 1151: FAILED assert(initialized)
INFO:teuthology.task.rados.rados.0.err: ceph version 0.43-46-g2e57997 (commit:2e57997894944696fcc737aae9b57e30b6bb5bdc)
INFO:teuthology.task.rados.rados.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xb3) [0x7f0bc59bd66f]
INFO:teuthology.task.rados.rados.0.err: 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x82) [0x7f0bc58e885e]
INFO:teuthology.task.rados.rados.0.err: 3: (librados::RadosClient::_dispatch(Message*)+0x66) [0x7f0bc58a2674]
INFO:teuthology.task.rados.rados.0.err: 4: (librados::RadosClient::ms_dispatch(Message*)+0x130) [0x7f0bc58a246e]
INFO:teuthology.task.rados.rados.0.err: 5: (Messenger::ms_deliver_dispatch(Message*)+0x8b) [0x7f0bc5a4e859]
INFO:teuthology.task.rados.rados.0.err: 6: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7f0bc5a377fc]
INFO:teuthology.task.rados.rados.0.err: 7: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x7f0bc58b5512]
INFO:teuthology.task.rados.rados.0.err: 8: (Thread::_entry_func(void*)+0x23) [0x7f0bc5ac4c75]
INFO:teuthology.task.rados.rados.0.err: 9: (()+0x7971) [0x7f0bc5110971]
INFO:teuthology.task.rados.rados.0.err: 10: (clone()+0x6d) [0x7f0bc495092d]

Fixes: #2135
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-03-05 14:21:12 -08:00
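The race in 702f09ea74 can be sketched deterministically by modeling the moment the lock is dropped as a callback. Everything below is a toy under assumptions: `ToyClient`, its fields, and the order in which the objecter is torn down relative to the timer are guesses made for illustration, not the librados implementation.

```python
class ToyClient:
    """Toy model of a client shutting down while a reply is in flight."""

    def __init__(self):
        self.state = "CONNECTED"
        self.objecter_initialized = True
        self.delivered = []
        self.dropped = []

    def dispatch(self, msg):
        if self.state == "DISCONNECTED":
            # Messages arriving after disconnect are discarded.
            self.dropped.append(msg)
            return
        if not self.objecter_initialized:
            # Stands in for the FAILED assert(initialized) in the trace.
            raise RuntimeError("assert(initialized)")
        self.delivered.append(msg)

    def shutdown(self, set_disconnected_first, while_unlocked):
        if set_disconnected_first:
            self.state = "DISCONNECTED"
        self.objecter_initialized = False  # assumed to happen before timer shutdown
        while_unlocked()  # models timer.shutdown() dropping and retaking the lock
        self.state = "DISCONNECTED"


def race(set_disconnected_first):
    client = ToyClient()
    try:
        # A reply slips in during the window where the lock is not held.
        client.shutdown(set_disconnected_first, lambda: client.dispatch("reply"))
    except RuntimeError:
        return "assert"
    return "dropped" if client.dropped == ["reply"] else "delivered"
```

Here `race(False)` hits the simulated assert, while `race(True)` drops the late reply, which mirrors the intent of setting DISCONNECTED before the timer is shut down.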
Sage Weil
743da9bd22 osd: don't trust pusher's data_complete
The pusher doesn't know what clone_overlap we'll see, so it has no idea
if we are data_complete from our perspective, making this check useless.
In particular, we screw up if we race with a recalculation of
clone_overlap.

Fixes: #2133
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-05 14:21:00 -08:00
Sage Weil
e1a9e18b38 osd: warn if recovery still has missing at end
We shouldn't get to this point.  If we do, recover_primary didn't do what
it needed to.  Dump the remaining missing set and hope we can debug.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-05 14:20:48 -08:00
Florian Haas
c31b86963a OCF resource agents: add rbd
Add a resource agent for mapping, unmapping and monitoring RBD devices.

Maps an RBD on start, unmaps it on stop. Checks "rbd showmapped"
output for monitoring whether the device is mapped, thus does not
rely on the ceph-rbdnamer udev magic to be enabled.

This RA is cloneable and essentially allows people to use RBD devices
as a drop-in replacement for
- iSCSI devices,
- host-based mirrored devices using md RAID-1,
- DRBD devices
in Pacemaker clusters.
2012-03-05 21:30:30 +01:00
Sage Weil
75cbed61e9 DBObjectMap: remove stray ;
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-03 21:01:45 -08:00
Sage Weil
0272b5906d LevelDBStore: #include types.h
This fixes some compile errors on one of my boxes (squeeze).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-03 14:45:44 -08:00
Sage Weil
004ec667a6 .gitignore: *.tar.bz2
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-02 14:59:51 -08:00
Greg Farnum
6e2a16b8c2 msgr: start re-ordering functions into a better order
This is the start of making the SimpleMessenger interface legible
to users. In addition to moving the configuration and accessor
functions to the top of the file, it adds virtual to the functions
which are part of the defined Messenger interface.
You can tell from some of the comments that work remains.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 14:46:06 -08:00
Sage Weil
38537ba74a Merge branch 'stable'
2012-03-02 13:45:03 -08:00
Greg Farnum
38bec5da48 msgr: remove refcounting of Messengers.
This was pretty pointless since each Messenger has a well-defined
exit point and shutdown process.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
091b176016 msgr: make nonce a required part of the SimpleMessenger constructor.
With that, remove the set_nonce function and the gratuitous passing
of nonce around through layers of functions.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
26e48f4234 msgr: Require that init functions are called before bind() and start().
Fix up callers to handle these constraints.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
29be52820d librados: remove gratuitous call to add_dispatcher_head.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
cd174c5e2b msgr: promote the started bool to Messenger.
Make it a protected member of Messenger instead of a public part of
SimpleMessenger.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:30 -08:00
Greg Farnum
578bc9c420 msgr: Remove the SimpleMessenger::bind() nonce parameter.
Instead, use the just-established nonce value.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
ef244773ee msgr: Remove the SimpleMessenger start/start_with_nonce distinction.
Instead, have a settable nonce value that you can fill in any time
after construction and that it uses during regular start().

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
ffa595598d msgr: Remove SimpleMessenger::register_entity
This function has been vestigial for a long time. Remove it and move
its remaining functionality into the constructor.
Update users to the new interface (this is remarkably easy and
simplifies the code).

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
3bd1d2ae4a msgr: add start() and wait() stubs to the Messenger interface
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:24 -08:00
Sage Weil
70360f840e github.com/NewDreamNetwork -> github.com/ceph
2012-03-02 11:00:08 -08:00
Sage Weil
cacf0fdec8 filestore: fix rollback safety check
There is a window in the old check between when current/commit_op_seq is
written and the snapshot is taken.  If ceph-osd crashes, we'll be unable to
start because we'll believe current/ was in use without proper checkpoints.

Instead, make the snapped/not snapped state of current/ explicit.

Fixes: #2118
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-03-02 09:50:11 -08:00