Commit Graph

18241 Commits

Author SHA1 Message Date
Sage Weil
e42a0e9f59 crush: move (de)compile into CrushCompiler class
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-19 14:44:06 -08:00
Sage Weil
4dd8c3542a crush: uninline encode/decode
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-19 12:08:11 -08:00
Sage Weil
6b5be27634 crush: cleanup: use temp var for curstep
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-19 11:59:11 -08:00
Sage Weil
ff5178c86a mds: use want_state to indicate shutdown
State becomes DNE when we receive the first map, and want_state makes more
sense anyway.  Fixes MDS startup.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-19 07:41:47 -08:00
Sage Weil
344c202203 osd: fix up argument to PG::init()
Commit cefa55b288 moved PG initialization into init(), but passed acting for
both the up and acting args.  This led to confusion between primary and replica.

Also fix debug print so that the output is useful.

Fixes: #2075, #2070
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 22:17:35 -08:00
Sage Weil
2500a9b691 SimpleMessenger: drop unused sigint()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 22:12:26 -08:00
Sage Weil
1f5e446d8a msgr: promote SimpleMessenger::Policy to Messenger::Policy
This is part of the generic interface, not specific to the implementation.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 22:12:26 -08:00
Sage Weil
10016923c9 mds: ignore all msgr callbacks on shutdown, not just dispatch
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 22:12:26 -08:00
Sage Weil
1f240ca4ff mon: discard messages while shutting down
Add SHUTDOWN state.  Ignore any msgr callbacks if set.

Fixes a crash like

2012-02-18T21:57:58.912 INFO:teuthology.task.ceph:Shutting down mon daemons...
2012-02-18T21:57:58.912 DEBUG:teuthology.task.ceph.mon.a:waiting for process to exit
2012-02-18T21:57:58.913 INFO:teuthology.task.ceph.mon.a.err:2012-02-18 21:57:58.927759 7fe98dfa1700 mon.a@1(peon) e1 *** Got Signal Terminated ***
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err:*** Caught signal (Segmentation fault) **
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err: in thread 7fe98d7a0700
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err: ceph version 0.41-382-gc1db900 (commit:c1db9009c2cde9dc7ab8857b0d28a1b6d931e98a)
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x5b0871]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 2: (()+0xfb40) [0x7fe991a1eb40]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 3: (PerfCounters::set(int, unsigned long)+0x1a) [0x52008a]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 4: (PGMonitor::update_logger()+0x96) [0x4d4bf6]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 5: (PGMonitor::update_from_paxos()+0xa70) [0x4e0980]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 6: (Monitor::_ms_dispatch(Message*)+0x143b) [0x47bd6b]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 7: (Monitor::ms_dispatch(Message*)+0x90) [0x489210]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 8: (SimpleMessenger::dispatch_entry()+0x89a) [0x53959a]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x46358c]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 10: (()+0x7971) [0x7fe991a16971]
2012-02-18T21:57:59.017 INFO:teuthology.task.ceph.mon.a.err: 11: (clone()+0x6d) [0x7fe9902a592d]

which is analogous to #2014.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 21:41:13 -08:00
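A minimal sketch of the guard described in 1f240ca4ff, assuming a hypothetical `MiniMonitor` class and `Msg` type (not Ceph's real `Monitor` or `Message`, whose callback signatures differ). Once `shutdown()` sets the SHUTDOWN state, every messenger callback returns early instead of reaching code like `PGMonitor::update_logger()` that touches torn-down state.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a message; not Ceph's Message class.
struct Msg { int type; };

class MiniMonitor {
 public:
  enum State { STATE_PEON, STATE_LEADER, STATE_SHUTDOWN };

  // Returns true if the message was consumed (dispatched or dropped).
  bool ms_dispatch(const Msg& m) {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ == STATE_SHUTDOWN) {
      ++dropped_;  // discard: subsystems may already be torn down
      return true;
    }
    handle(m);
    return true;
  }

  // Other messenger callbacks get the same early-out.
  void ms_handle_reset() {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ == STATE_SHUTDOWN) return;
    ++resets_;
  }

  void shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    state_ = STATE_SHUTDOWN;
  }

  int handled() const { return handled_; }
  int dropped() const { return dropped_; }
  int resets() const { return resets_; }

 private:
  // Placeholder for real dispatch work (e.g. paxos updates).
  void handle(const Msg&) { ++handled_; }

  std::mutex lock_;
  State state_ = STATE_PEON;
  int handled_ = 0, dropped_ = 0, resets_ = 0;
};
```

Checking the state under the same lock that `shutdown()` takes means a callback either runs fully before shutdown or is dropped, never interleaved with teardown.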
Sage Weil
787dd17097 msgr: fix shutdown vs accept race
This is a kludge.  The real fix is to rewrite SimpleMessenger as a state
machine.

Fixes: #2073
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 14:28:44 -08:00
Sage Weil
c3a509a0f6 mds: drop all messages during suicide
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-18 14:28:42 -08:00
Sage Weil
fe0859aad5 Merge remote branch 'gh/wip-pg-states'
2012-02-18 14:00:50 -08:00
Sage Weil
6e89d9ca06 osd: update_stats() in GetInfo state start
This is the first stage of peering.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 16:24:10 -08:00
Sage Weil
fb31f63170 osd: don't update_stats() on proc_replica_info
Nothing changes here...

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 16:24:10 -08:00
Sage Weil
9e309c493e filestore: hold journal_lock during replay
Hold journal_lock during replay so that we don't stomp on variables like
op_seq and open_ops that the commit thread cares about.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 16:24:10 -08:00
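A rough sketch of the locking pattern in 9e309c493e, assuming a hypothetical `MiniFileStore` with only the two shared fields the commit names. Replay holds `journal_lock_` for its whole duration, and the commit-thread side takes the same lock, so it never observes a partially replayed `op_seq_`.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical store; op_seq_ and open_ops_ are shared between journal
// replay and the commit thread, so both sides take journal_lock_.
class MiniFileStore {
 public:
  // Hold journal_lock_ for the whole replay so the commit thread cannot
  // see an intermediate op_seq_ or a nonzero open_ops_ count.
  void journal_replay(const std::vector<std::uint64_t>& seqs) {
    std::lock_guard<std::mutex> l(journal_lock_);
    for (std::uint64_t s : seqs) {
      ++open_ops_;
      apply_op(s);
      op_seq_ = s;
      --open_ops_;
    }
  }

  // Commit-thread side: snapshot the sequence number under the same lock.
  std::uint64_t commit_start() {
    std::lock_guard<std::mutex> l(journal_lock_);
    return op_seq_;
  }

  int open_ops() {
    std::lock_guard<std::mutex> l(journal_lock_);
    return open_ops_;
  }

 private:
  void apply_op(std::uint64_t) { /* apply the journaled transaction */ }

  std::mutex journal_lock_;
  std::uint64_t op_seq_ = 0;
  int open_ops_ = 0;
};
```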
Sage Weil
06a2202b96 osd: only complete/deregister repop once
It's now possible to send the ack and deregister the repop before the
op_applied() happens.  And when that happens, we'll call eval_repop() once
more.  Don't do anything in that case.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 16:24:10 -08:00
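An illustrative sketch of the idempotent completion in 06a2202b96, assuming hypothetical `RepOp` and `MiniPG` types rather than Ceph's real ones. A late `op_applied()` can still hold a reference to a repop that was already answered and deregistered; the `done` flag makes that extra `eval_repop()` call a no-op.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>

// Hypothetical replicated-op record; names loosely follow the commit text.
struct RepOp {
  unsigned tid = 0;
  std::set<int> waitfor_ack, waitfor_commit;
  bool done = false;  // reply sent and repop deregistered
};

class MiniPG {
 public:
  std::map<unsigned, std::shared_ptr<RepOp>> repop_map;  // registered repops
  int replies_sent = 0;

  std::shared_ptr<RepOp> new_repop(unsigned tid, std::set<int> replicas) {
    auto r = std::make_shared<RepOp>();
    r->tid = tid;
    r->waitfor_ack = replicas;
    r->waitfor_commit = replicas;
    repop_map[tid] = r;
    return r;
  }

  // May run several times for one repop, including after deregistration.
  // Takes the shared_ptr by value so erasing it from repop_map is safe.
  void eval_repop(std::shared_ptr<RepOp> r) {
    if (r->done) return;  // already completed: nothing left to do
    if (r->waitfor_ack.empty() && r->waitfor_commit.empty()) {
      ++replies_sent;            // reply to the client exactly once
      r->done = true;
      repop_map.erase(r->tid);   // deregister
    }
  }

  void op_commit(std::shared_ptr<RepOp> r, int osd) {
    r->waitfor_ack.erase(osd);   // a commit implies an ack
    r->waitfor_commit.erase(osd);
    eval_repop(r);
  }
};
```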
Josh Durgin
c1db9009c2 Merge branch 'next'
2012-02-17 14:31:44 -08:00
Josh Durgin
4925e9c6d9 man: regenerate man pages
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-17 14:27:13 -08:00
Josh Durgin
304389ca0e man: move man page fixes to rst
83cf1b62fd and e5f49104ab updated the nroff output but not the rst source.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-17 14:27:06 -08:00
Florian Haas
a446f32394 doc: fix snapshot creation/deletion syntax in rbd man page (trivial)
Creating a snapshot requires using "rbd snap create",
as opposed to just "rbd create". Also for purposes of
clarification, add note that removing a snapshot similarly
requires "rbd snap rm".

Thanks to Josh Durgin for the explanation on IRC.

Signed-off-by: Florian Haas <florian@hastexo.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-17 14:27:00 -08:00
Sage Weil
7837c19b56 osd: make op_commit imply op_applied for purposes of repop completion
For repop completion, we want waitfor_ack and _commit to be empty.  For
replicas, a commit reply implies ack, so ack is always a subset of commit.
But for the local write, we wait for applied separately, so a repop can stay
open after we have sent the reply to the client, still consuming memory and
generating 'old request' warnings in the logs (when the filestore is taking a
long time to apply to the fs).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 13:48:02 -08:00
Sage Weil
d6c767456c osd: add REMAPPED state
Set this bit whenever up != acting.  This tells you that the OSDMap is
explicitly remapping the PG to different nodes (than what CRUSH specified).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 13:46:11 -08:00
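A small sketch of the rule in d6c767456c, with hypothetical flag values (Ceph's real `PG_STATE_*` constants are defined elsewhere and may differ). The REMAPPED bit simply tracks whether the acting set differs from the up set that CRUSH produced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical state bits; values chosen for illustration only.
enum : unsigned {
  PG_STATE_ACTIVE = 1u << 0,
  PG_STATE_CLEAN = 1u << 1,
  PG_STATE_DEGRADED = 1u << 2,
  PG_STATE_REMAPPED = 1u << 3,
};

// up: OSDs that CRUSH maps the PG to; acting: OSDs actually serving it
// (these differ when the OSDMap carries an explicit override).
unsigned update_remapped(unsigned state, const std::vector<int>& up,
                         const std::vector<int>& acting) {
  if (up != acting)
    return state | PG_STATE_REMAPPED;
  return state & ~PG_STATE_REMAPPED;
}
```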
Sage Weil
8e6f9ca8ac osd: refactor recovery completion
- rename is_all_update() -> needs_recovery(), reverse logic.
- drop up != acting check; that has nothing to do with
  recovery itself
- drop trigger in Active::react(const ActMap&)... it's nonsensical
- CompleteRecovery always leads to finish_recovery (or acting set change)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-17 13:19:57 -08:00
Sage Weil
8c0e184c50 osd: introduce RECOVERING pg state
Since clean now means not degraded, we need some other indication that
recovery has completed and we are "done" (given the current up/down state
of the OSDs).

Adding a 'recovering' state also makes it clearer to users that work is
being done, as opposed to the current situation, where they look for the
absence of 'clean'.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-17 10:56:58 -08:00
Sage Weil
db41bdda7e paxos: fix is_consistent() check
If our last_committed == 1, we don't need a separate stash.  This is the
logic that slurp() follows, so fix is_consistent() to match.

Fixes: #2077
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-17 10:23:12 -08:00
Tom Callaway
d913e5e670 osd: change nested iterator name
Don't shadow the iterator variable.

Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
2012-02-17 09:17:27 -08:00
Tom Callaway
2325da8635 add missing #includes to build on gcc 4.7
Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
2012-02-17 09:17:22 -08:00
Tom Callaway
d938246c50 mds: comment out unused code in mds dump_pop_map
Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
2012-02-17 09:17:05 -08:00
Sage Weil
07504607e3 Merge branch 'next'
2012-02-16 21:00:49 -08:00
Sage Weil
95633b9b88 osd: fix _activate_committed replica->primary message
Normally we take a fresh map reference in PG::lock().  However,
_activate_committed needs to make sure the map hasn't changed significantly
before acting.  In the case of #2068, the OSD map has moved forward and
the mapping has changed, but the PG hasn't processed that yet, and thus
mis-tags the MOSDPGInfo message.

Tag the message with epoch e, and also pass down the primary's address
to send the message to the right location.

Fixes: #2068
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-16 21:00:35 -08:00
Sage Weil
41425f6be9 osd: skip threadpool pause on shutdown when blackholed
We can't pause the threadpools if they're blocked on a blackholed
filestore.  Instead, just call _exit().

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-16 15:18:58 -08:00
Sage Weil
4b3bb5ab37 osd: fix _activate_committed replica->primary message
Normally we take a fresh map reference in PG::lock().  However,
_activate_committed needs to make sure the map hasn't changed significantly
before acting.  In the case of #2068, the OSD map has moved forward and
the mapping has changed, but the PG hasn't processed that yet, and thus
mis-tags the MOSDPGInfo message.

Tag the message with epoch e, and also pass down the primary's address
to send the message to the right location.

Fixes: #2068
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-16 09:12:45 -08:00
Sage Weil
82eceb9a3b osd: do not always clear DEGRADED/set CLEAN on recovery finish
Clean means we have exactly the right number of replicas and recovery is
complete.  Degraded means we do not have enough replicas, either because
recovery is in progress, or because acting is too small.

A consequence is that if we have a PG with len(up) == 1 but a pg_temp
mapping so that len(acting) == 2, it will be active and not clean.

Fixes: #2060
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-15 15:20:35 -08:00
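The two definitions in 82eceb9a3b can be written as a pair of predicates. This is a sketch only: `pool_size` (the desired replica count), `PgHealth`, and `compute_health` are hypothetical names, not Ceph code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the two flags.
struct PgHealth {
  bool clean;
  bool degraded;
};

PgHealth compute_health(std::size_t pool_size, std::size_t acting_size,
                        bool recovery_complete) {
  PgHealth h;
  // Degraded: not enough replicas, either because recovery is still in
  // progress or because the acting set is smaller than the pool size.
  h.degraded = !recovery_complete || acting_size < pool_size;
  // Clean: exactly the right number of replicas and recovery is complete.
  h.clean = recovery_complete && acting_size == pool_size;
  return h;
}
```

In this sketch, an acting set larger than the pool size (as a pg_temp mapping can produce) comes out neither clean nor degraded.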
Wido den Hollander
45701f5b68 init: Only check if auto start is disabled when the issued command is "start"
This still makes sure daemons don't start on boot.

When auto start was disabled, the check also prevented logrotate from doing its job.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 09:29:15 -08:00
Holger Macht
543e8b98d0 ceph.spec.in: Move libcls_*.so from -devel to base package
OSDs (src/osd/ClassHandler.cc) specifically look for libcls_*.so in
/usr/$libdir/rados-classes, so libcls_rbd.so and libcls_rgw.so need to
be shipped along with the base package.

Signed-off-by: Holger Macht <hmacht@suse.de>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 09:28:41 -08:00
Sage Weil
1a994bed63 objclass: add debug_objclass knob, default to off
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 09:04:22 -08:00
Sage Weil
ba0ef62f86 osd: reduce watch/notify debug noise
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 09:03:28 -08:00
Sage Weil
ebbfdefa12 msgr: mark_all_down on shutdown
This ensures we destroy all the Pipes and discard their messages.  Among
other things, this can avoid

2012-02-15 03:16:46.385242 7fe712b9a700 mon.f@5(peon) e1 *** Got Signal Terminated ***
2012-02-15 03:16:46.470227 7fe712b9a700 mon.f@5(peon) e1 shutdown
msg/SimpleMessenger.h: In function 'virtual SimpleMessenger::Pipe::~Pipe()' thread 7fe716a37780 time 2012-02-15 03:16:46.471005
msg/SimpleMessenger.h: 234: FAILED assert(!i->second->is_on_list())
 ceph version 0.41-362-g40802ae (commit:40802ae883a94d205a8716065b80ad5d7ff57d12)
 1: (SimpleMessenger::Pipe::~Pipe()+0x199) [0x4669d9]
 2: (SimpleMessenger::~SimpleMessenger()+0x31) [0x552231]
 3: (main()+0x3026) [0x4614a6]
 4: (__libc_start_main()+0xfe) [0x7fe714dd6d8e]
 5: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x45e219]

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 08:21:10 -08:00
Sage Weil
c1b6b218d2 osd: do not sync_and_flush if blackholed
If we have blackholed, this will block forever.  In that case, don't bother.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 08:21:02 -08:00
Sage Weil
e6ffe31bdf workqueue: make pause/unpause count
We can pause() multiple times, and we need as many unpause()s to actually
resume work.

This resolves problems where we have two actors interested in pausing a
queue, both want to stop work, and they aren't interacting/coordinating.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-15 08:20:32 -08:00
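A minimal sketch of the counted pause described in e6ffe31bdf, using a hypothetical `PauseGate` class rather than Ceph's actual work queue. Each `pause()` increments a counter, and work resumes only when a matching number of `unpause()` calls brings it back to zero, so two uncoordinated actors cannot accidentally unpause each other. It does not model waiting for in-flight items to drain.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical counted pause; workers call wait_unpaused() before each item.
class PauseGate {
 public:
  void pause() {
    std::lock_guard<std::mutex> l(m_);
    ++pause_count_;
  }

  void unpause() {
    std::lock_guard<std::mutex> l(m_);
    assert(pause_count_ > 0);
    if (--pause_count_ == 0)
      cv_.notify_all();  // only the last unpause resumes the workers
  }

  bool paused() {
    std::lock_guard<std::mutex> l(m_);
    return pause_count_ > 0;
  }

  // Blocks the calling worker while any actor still holds a pause.
  void wait_unpaused() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return pause_count_ == 0; });
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int pause_count_ = 0;
};
```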
Sage Weil
40802ae883 osd: exit code 0 on SIGINT/SIGTERM
This makes daemon-handler happy...

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 22:05:36 -08:00
Sage Weil
2aafdeada8 signals: check write(2) return values
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-14 21:04:05 -08:00
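The async signal handlers installed for SIG{HUP,INT,TERM} in the commits below, and the write(2) checks added here, fit the common self-pipe pattern. This sketch assumes that pattern; the names `on_signal`, `install_async_handler`, and `next_signal` are hypothetical, not Ceph's API. The handler only writes one byte to a non-blocking pipe and reports a failed write with async-signal-safe calls; an ordinary thread reads the byte later and does the real work.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

// Self-pipe: g_pipe[1] is written from signal context, g_pipe[0] is read
// from a normal thread.
static int g_pipe[2] = {-1, -1};

extern "C" void on_signal(int signum) {
  int saved = errno;  // a handler must not clobber errno
  unsigned char b = static_cast<unsigned char>(signum);
  ssize_t r = write(g_pipe[1], &b, 1);
  if (r != 1) {
    // write(2) can fail (e.g. EAGAIN on a full pipe); report it using only
    // async-signal-safe calls instead of silently ignoring the result.
    static const char msg[] = "signal handler: pipe write failed\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
  }
  errno = saved;
}

bool install_async_handler(int signum) {
  if (g_pipe[0] < 0) {
    if (pipe(g_pipe) != 0) return false;
    // Never block inside the handler if the reader falls behind.
    if (fcntl(g_pipe[1], F_SETFL, O_NONBLOCK) == -1) return false;
  }
  struct sigaction sa = {};
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  return sigaction(signum, &sa, nullptr) == 0;
}

// Called from an ordinary thread: returns the next signal number, or -1.
int next_signal() {
  unsigned char b;
  ssize_t r = read(g_pipe[0], &b, 1);
  return r == 1 ? b : -1;
}
```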
Sage Weil
9cd090038f osd: semi-clean shutdown on signal
Make some effort to stop work in progress, remove pid file, and exit with
informative error code.

Note that this is much simpler than the shutdown() exit path; I'm not sure
whether a complete teardown is useful.  It's also difficult to maintain
and get right with everything else going on, and it's not clear that it's
worth the effort right now.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:54 -08:00
Sage Weil
ec066829a7 mds: remove some cruft
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:54 -08:00
Sage Weil
395dc659b9 mds: remove pidfile
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00
Sage Weil
bbe5cd755f mon: do a clean shutdown on SIGINT/SIGTERM
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00
Sage Weil
eafe832791 mon: install async signal handlers for SIG{HUP,INT,TERM}
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00
Sage Weil
e905564bb2 osd: install async signal handlers for SIG{HUP,INT,TERM}
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00
Sage Weil
be704fe1d9 mds: install async signal handlers for SIG{HUP,INT,TERM}
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00
Sage Weil
afa1f9e392 signal: remove unused/obsolete handle_shutdown_signal
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 21:03:53 -08:00