18403 Commits

Author SHA1 Message Date
Greg Farnum
38bec5da48 msgr: remove refcounting of Messengers.
This was pretty pointless since each Messenger has a well-defined
exit point and shutdown process.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
091b176016 msgr: make nonce a required part of the SimpleMessenger constructor.
With that, remove the set_nonce function and the gratuitous passing
of nonce around through layers of functions.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
26e48f4234 msgr: Require that init functions are called before bind() and start().
Fix up callers to handle these constraints.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
29be52820d librados: remove gratuitous call to add_dispatcher_head.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:36 -08:00
Greg Farnum
cd174c5e2b msgr: promote the started bool to Messenger.
Make it a protected member of Messenger instead of a public part of
SimpleMessenger.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 12:32:30 -08:00
Greg Farnum
578bc9c420 msgr: Remove the SimpleMessenger::bind() nonce parameter.
Instead, use the just-established nonce value.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
ef244773ee msgr: Remove the SimpleMessenger start/start_with_nonce distinction.
Instead, have a settable nonce value that you can fill in any time
after construction and that it uses during regular start().

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
ffa595598d msgr: Remove SimpleMessenger::register_entity
This function has been vestigial for a long time. Remove it and move
its remaining functionality into the constructor.
Update users to the new interface (this is remarkably easy and
simplifies the code).

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:27 -08:00
Greg Farnum
3bd1d2ae4a msgr: add start() and wait() stubs to the Messenger interface
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-03-02 11:20:24 -08:00
Yehuda Sadeh
85d04c6ceb rgw: don't check for ECANCELED in the _impl() functions
We already check it in the outer functions.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-29 14:46:33 -08:00
Yehuda Sadeh
86340655ff rgw: don't retry certain operations if we raced
The atomic get/put scheme was retrying writes in cases where it lost
races (head object was rewritten by another client). Instead we can
just back off and return success.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-29 14:46:33 -08:00
Sage Weil
b1f264406f msgr: fix race in learned_addr()
- two connect() threads
- both hit if (need_addr) check
- one takes lock, sets addr, need_addr = false, unlocks
- continues to ::encode(ms_addr, ...);
- meanwhile, second thread set ms_addr _again_, but copies peer port into
  place before adjusting it.  racing ::encode() sees bad port and sends it
  to the peer.

Fix this two ways:

- don't copy bad port into place; set it first
- re-check need_addr after taking lock

Fixes: #1747
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-29 13:22:34 -08:00
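
The two fixes described in b1f264406f (build the corrected address before storing it, and re-check the flag once the lock is held) can be sketched as follows. This is a minimal illustration, not the SimpleMessenger code: `Addr`, `AddrHolder`, `bound_port_` and the use of `std::atomic` are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an entity address; not Ceph's real address type.
struct Addr {
  std::uint32_t ip;
  std::uint16_t port;
};

// Hypothetical holder for "my address as a peer reported it".
class AddrHolder {
public:
  explicit AddrHolder(std::uint16_t bound_port) : bound_port_(bound_port) {}

  // Called by any connecting thread with the address the peer saw us as.
  void learned_addr(const Addr& peer_view) {
    if (!need_addr_.load(std::memory_order_acquire))
      return;                                // fast path: already learned
    std::lock_guard<std::mutex> guard(lock_);
    if (!need_addr_.load(std::memory_order_relaxed))
      return;                                // re-check: another thread won
    Addr fixed = peer_view;
    fixed.port = bound_port_;                // correct the port first...
    addr_ = fixed;                           // ...then store the final value
    need_addr_.store(false, std::memory_order_release);
  }

  Addr get() const {
    std::lock_guard<std::mutex> guard(lock_);
    return addr_;
  }

private:
  mutable std::mutex lock_;
  std::atomic<bool> need_addr_{true};
  Addr addr_{};
  std::uint16_t bound_port_;
};
```

With the re-check in place, a second caller that passed the unlocked test never overwrites the stored address, so a reader cannot observe the peer's port in it.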
Sage Weil
8a2b76411e msgr: print existing->state before failing assert
May help with #1378.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 12:28:19 -08:00
Sage Weil
cbb128090c Merge remote-tracking branch 'gh/wip-2121'
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-29 11:07:03 -08:00
Sage Weil
052d64e1c4 osd: unregister signal handlers on shutdown
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:46:13 -08:00
Sage Weil
db96831bbd mon: unregister signal handlers on shutdown
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:46:06 -08:00
Sage Weil
8e9bf6111e mds: unregister SIGHUP too
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:45:56 -08:00
Sage Weil
bb5c76400c radosgw: handle SIGHUP
Fixes: #2121
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:45:46 -08:00
Sage Weil
9c7b63e122 init-radosgw: add 'reload' command to send SIGHUP
Fixes: #2121
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:23:22 -08:00
Sage Weil
e843766504 osd: fix typo in recovery_state query dump
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:21:22 -08:00
Sage Weil
0e03e9dd8d osd: add missing space to scrub error
[ERR] 18.5 osd.3: soid 8a5e37ad/rb.0.0.000000002b99/headextra attr _, extra attr snapset

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:17:07 -08:00
Greg Farnum
2437ce02c1 msgr: discard the local_pipe's queue on shutdown.
To facilitate this, we do two things:
1) actually identify the number of special code values we pass around
2) use that to prevent trying to put() those non-pointer values in
Pipe::discard_queue().
Then we just call local_pipe.discard_queue() in wait() like happens
(indirectly, via reaping) with all the normal Pipes in rank_pipe.

But this does make me think that we may be approaching the point
where it's appropriate to create a subclass LocalPipe (against a
RemotePipe like our current Pipe implementation is mostly intended
to be).

Should fix #2086.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-29 09:12:46 -08:00
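
The idea in 2437ce02c1 of counting the special code values, so that a queue discard can skip them instead of calling put() on a non-pointer, might look roughly like this. The names (`Message`, `CODE_*`, `NUM_CODES`, `is_code`) are hypothetical and the queue is simplified to a single `std::deque`; it is a sketch of the technique, not the Pipe implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical refcounted message; put() drops one reference.
struct Message {
  int refs = 1;
  void put() { --refs; }
};

// Hypothetical special codes carried in the queue as fake pointer values.
// NUM_CODES is the count that lets us tell codes from real pointers.
enum : std::uintptr_t {
  CODE_CONNECT = 1,
  CODE_ACCEPT = 2,
  CODE_RESET = 3,
  NUM_CODES = 4
};

inline Message* as_code(std::uintptr_t code) {
  return reinterpret_cast<Message*>(code);
}

inline bool is_code(const Message* m) {
  return reinterpret_cast<std::uintptr_t>(m) < NUM_CODES;
}

// Drop everything queued, releasing only entries that are real messages.
inline void discard_queue(std::deque<Message*>& q) {
  for (Message* m : q) {
    if (!is_code(m))
      m->put();
  }
  q.clear();
}
```

Keeping the count next to the code definitions means a newly added code is covered by the check without touching the discard loop.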
Sage Weil
7690f0b959 osd: remove down OSDs from peer_info on reset
If an OSD goes down, remove it from peer_info. In particular, I saw

2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering]  PriorSet: affected_by_map osd.1 now down
...
2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.1 is down
2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
...
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an osd goes down we want to ensure we remove it from peer_info.
Handling this in Reset and Started states captures all of the nested
states, which forward the event (or re-post transit to Reset).  We can
also drop the Primary reaction, which is now superfluous.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-29 09:10:57 -08:00
Sage Weil
fe94c0414e Merge branch 'next'
2012-02-28 17:04:55 -08:00
Josh Durgin
b9a675a293 mon: report pgs stuck inactive/unclean/stale in health check
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-28 13:53:15 -08:00
Greg Farnum
d10e1f46df mon: fix slurp_latest to fill in any missing incrementals
Fixes #1789.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-28 12:29:05 -08:00
Sage Weil
7b48cca184 test_osd_types: fix unit test for new pg_t::is_split() prototype
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:33:18 -08:00
Sage Weil
fd0712dfb4 Makefile: drop separate libjson_spirit.la
automake seems to have difficulty with the .la dependency on another .la.
Since libjson_spirit.la is only used by libcommon.la anyway, just build it
directly into that.  Sigh.

...
CXXLD libjson_spirit.la
AR libmds.a
CXXLD libcls_rbd.la
CXXLD libcls_rgw.la
CXXLD cephfs
CCLD test_ioctls
CC libcommon_la-ceph_ver.lo
CXX libcommon_la-version.lo
CXX ceph_dencoder.o
CCLD mount.ceph
CC ceph_ver.o
CXX test_libhadoopcephfs_build-version.o
CXXLD test_libhadoopcephfs_build
CXXLD libcommon.la
libtool: link: cannot find the library `libjson_spirit.la' or unhandled argument `libjson_spirit.la'

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:30:47 -08:00
Sage Weil
edd35c04b4 osd: drop useless ENOMEM check
new throws exception; doesn't return NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:26:04 -08:00
Sage Weil
a7de459f69 ceph-osd: clarify error messages
So we know where the error came from.  And use real error codes in init().

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:11:59 -08:00
Wido den Hollander
97926e1846 init: Actually do start the daemons when 'service ceph start <type>' is specified
A bug in my previous patch prevented any daemon with auto_start set to false from starting.

This patch allows:
* /etc/init.d/ceph start osd|mds|mon
* service ceph start osd|mds|mon

It however does not start daemons if auto_start is disabled when you invoke:
* /etc/init.d/ceph start
* service ceph start

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2012-02-28 09:10:52 -08:00
Sage Weil
f317028f42 doc: beginnings of documentation of stuck pgs and pg states
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-27 15:41:57 -08:00
Sage Weil
1917024134 filestore: make less noise on ENOENT
Don't generate high-level log spam on every open error.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-27 15:13:35 -08:00
Greg Farnum
244b702966 pg: use get_cluster_inst instead of get_inst in activate
This was mistakenly broken in 4b3bb5ab37

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sam Just <sam.just@dreamhost.com>
2012-02-27 14:49:34 -08:00
Sage Weil
f02195b40f Merge branch 'wip-split2'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-27 14:37:41 -08:00
Sage Weil
b6a04174bc osd: pg_t::is_split(): make children out param a pointer, and optional
Also unit test it.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:35:21 -08:00
Sage Weil
85ed06e973 osd: bypass split code
Until it is fully implemented.  It's also disabled in the monitor
currently, but just in case it gets into the OSDMap, do nothing for now.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:18:21 -08:00
Sage Weil
15d5324904 osd: fix pg locking flags
Two things we need to handle:

 - callers who already hold map_lock (split_pg())
 - callers who already hold another pg->lock, and want to skip the lockdep
   check for this one.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:16:27 -08:00
Sage Weil
fc7b11a9ee osd: partially refactor pg split
This partially refactors the OSD split code to do the split synchronously
when processing a new OSDMap.  It is incomplete in that it does not yet
do anything useful for the PG.  The full solution needs to:

- Do the split synchronously when applying the map update.
- Reset the parent pg so that it repeers.  This will cause problems until
  we consistently consider this a new interval when looking backwards in
  time; this needs to be fixed.  Anybody doing generate_past_intervals()
  or similar will need to consider a split/merge event as an interval
  boundary.
- The recovery state machine should trigger appropriately when this
  happens.
- The old PG that was split should probably be handled identically to the
  new children.  That means deleting the old PG instance and creating a new
  PG object for the newly-split child.  Ditto for merge.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:04:22 -08:00
Sage Weil
d9cf33223e osd: implement pg_t::is_split()
Test to determine if a pg has split between two pool sizes, and if so,
what its children are.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:44:24 -08:00
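
A split test of the kind d9cf33223e describes, with the optional pointer out-parameter that b6a04174bc introduces, could be sketched as below. This is a simplified illustration over bare placement seeds, not the pg_t code: `mask_for`, `stable_mod` and the free-function `is_split` signature are assumptions, and it assumes that the parent of a new seed is found by applying the old pool size's mapping to that seed.

```cpp
#include <cassert>
#include <set>

// Smallest (2^n - 1) mask that covers seeds in [0, pg_num).
inline unsigned mask_for(unsigned pg_num) {
  unsigned m = 1;
  while (m < pg_num) m <<= 1;
  return m - 1;
}

// Stable-mod style mapping of a raw value into [0, pg_num).
inline unsigned stable_mod(unsigned x, unsigned pg_num, unsigned mask) {
  return (x & mask) < pg_num ? (x & mask) : (x & (mask >> 1));
}

// Returns true if `seed` gains children when the pool grows from old_num
// to new_num. If `children` is non-null, the child seeds are added to it.
inline bool is_split(unsigned seed, unsigned old_num, unsigned new_num,
                     std::set<unsigned>* children = nullptr) {
  if (new_num <= old_num || seed >= old_num) return false;
  const unsigned mask = mask_for(old_num);
  bool split = false;
  for (unsigned c = old_num; c < new_num; ++c) {
    if (stable_mod(c, old_num, mask) == seed) {
      split = true;                      // seed c used to map to `seed`
      if (children) children->insert(c);
    }
  }
  return split;
}
```

Callers that only need a yes/no answer can omit the last argument; callers that need the children pass the address of a set.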
Sage Weil
6a081888fd osd: factor hobject key into child pgid calc during split
When we calculate the object's new pg, take the locator key into
consideration, to avoid a crash like

osd/OSD.cc: In function 'void OSD::split_pg(PG*, std::map<pg_t, PG*>&,ObjectStore::Transaction&)' thread 7fe3df8c4700 time 2012-02-20 18:22:19.900886
osd/OSD.cc: 4066: FAILED assert(child)

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:44:24 -08:00
Sage Weil
ee4d99099f journaler: log on unexpected objecter error
This will help with #2110, #1796, #1640.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:39:53 -08:00
Sage Weil
91b119a064 osd: fix recursive map_lock via check_replay_queue()
Also drop activate_pg() helper while we're at it, so it's clear that we
are the only user.

recursive lock of OSD::map_lock (33)
 ceph version 0.42-146-g7ad35ce (commit:7ad35ce489cc5f9169eb838e1196fa2ca4d6e985)
2012-02-24 12:30:16.541416 1: (PG::lock(bool)+0x2a) [0xa09348]
2012-02-24 12:30:16.541424 2: (OSD::_lookup_lock_pg(pg_t)+0xbd) [0x84b8df]
2012-02-24 12:30:16.541431 3: (OSD::activate_pg(pg_t, utime_t)+0x9f) [0x87463b]
2012-02-24 12:30:16.541442 4: (OSD::check_replay_queue()+0x12f) [0x87452d]
2012-02-24 12:30:16.541450 5: (OSD::tick()+0x23c) [0x8535ea]
2012-02-24 12:30:16.541456 6: (OSD::C_Tick::finish(int)+0x1f) [0x881671]
2012-02-24 12:30:16.541462 7: (SafeTimer::timer_thread()+0x2d5) [0x8f8211]
2012-02-24 12:30:16.541468 8: (SafeTimerThread::entry()+0x1c) [0x8f923c]
2012-02-24 12:30:16.541475 9: (Thread::_entry_func(void*)+0x23) [0x9c8109]
2012-02-24 12:30:16.541485 10: (()+0x68ba) [0x7f9dbed838ba]
2012-02-24 12:30:16.541491 11: (clone()+0x6d) [0x7f9dbd66f02d]
2012-02-24 12:30:16.541495 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f9db9d98700 time 2012-02-24 12:30:16.541504

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Sam Just <samuel.just@dreamhost.com>
2012-02-27 09:56:21 -08:00
Sage Weil
402ece5e31 init-ceph: stick with /var/run for the time being
/run isn't present on older systems.  Stick with the old location until it
is more pervasive, or we add an autoconf option to control it.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-26 20:56:05 -08:00
Laszlo Boszormenyi
41295b584a debian: /var/run/ceph -> /run/ceph
/run/ceph should exist for creating UNIX domain sockets.
ceph uses UNIX domain sockets for internal communication. Create their
directory on startup as /run is on a virtual filesystem.

Last-Update: <2012-02-26>
Bug-Debian: http://bugs.debian.org/660238
Forwarded: <ceph-devel@vger.kernel.org>
Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
2012-02-26 20:47:53 -08:00
Laszlo Boszormenyi
0d8b5756e1 debian: build-{indep,arch}
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
2012-02-26 20:45:52 -08:00
Laszlo Boszormenyi
3ad6ccb4a6 debian: sdparm|hdparm, new standards version
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
2012-02-26 20:45:06 -08:00
Yehuda Sadeh
266902a993 rgw: initialize bucket_id in bucket structure
might make valgrind a little bit less noisy.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-24 17:01:32 -08:00
Sage Weil
f8f6e4d850 rgw: _exit(0) on SIGTERM
We need to do something a bit smarter to get coverage information, but this
is a start.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-24 15:32:07 -08:00
Sage Weil
708be0a5ab Merge remote branch 'gh/wip-crush-adjust'
Reviewed-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2012-02-24 13:52:32 -08:00