RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-16 16:39:21 +00:00

Author	SHA1	Message	Date
Greg Farnum	38bec5da48	msgr: remove refcounting of Messengers. This was pretty pointless since each Messenger has a well-defined exit point and shutdown process. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 12:32:36 -08:00
Greg Farnum	091b176016	msgr: make nonce a required part of the SimpleMessenger constructor. With that, remove the set_nonce function and the gratuitous passing of nonce around through layers of functions. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 12:32:36 -08:00
Greg Farnum	26e48f4234	msgr: Require that init functions are called before bind() and start(). Fix up callers to handle these constraints. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 12:32:36 -08:00
Greg Farnum	29be52820d	librados: remove gratuitous call to add_dispatcher_head. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 12:32:36 -08:00
Greg Farnum	cd174c5e2b	msgr: promote the started bool to Messenger. Make it a protected member of Messenger instead of a public part of SimpleMessenger. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 12:32:30 -08:00
Greg Farnum	578bc9c420	msgr: Remove the SimpleMessenger::bind() nonce parameter. Instead, use the just-established nonce value. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 11:20:27 -08:00
Greg Farnum	ef244773ee	msgr: Remove the SimpleMessenger start/start_with_nonce distinction. Instead, have a settable nonce value that you can fill in any time after construction and that it uses during regular start(). Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 11:20:27 -08:00
Greg Farnum	ffa595598d	msgr: Remove SimpleMessenger::register_entity This function has been vestigial for a long time. Remove it and move its remaining functionality into the constructor. Update users to the new interface (this is remarkably easy and simplifies the code). Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 11:20:27 -08:00
Greg Farnum	3bd1d2ae4a	msgr: add start() and wait() stubs to the Messenger interface Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-03-02 11:20:24 -08:00
Yehuda Sadeh	85d04c6ceb	rgw: don't check for ECANCELED in the _impl() functions We already check it in the outer functions. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-02-29 14:46:33 -08:00
Yehuda Sadeh	86340655ff	rgw: don't retry certain operations if we raced The atomic get/put scheme was retrying writes in case where it lost races (head object was rewritten by another client). Instead we can just back off and return success. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-02-29 14:46:33 -08:00
Sage Weil	b1f264406f	msgr: fix race in learned_addr() - two connect() threads - both hit if (need_addr) check - one takes lock, sets addr, need_addr = false, unlocks - continues to ::encode(ms_addr, ...); - meanwhile, second thread set ms_addr _again_, but copies peer port into place before adjusting it. racing ::encode() sees bad port and sends it to the peer. Fix this two ways: - don't copy bad port into place; set it first - re-check need_addr after taking lock Fixes: #1747 Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-02-29 13:22:34 -08:00
Sage Weil	8a2b76411e	msgr: print existing->state before failing assert May help with #1378. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-29 12:28:19 -08:00
Sage Weil	cbb128090c	Merge remote-tracking branch 'gh/wip-2121' Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>	2012-02-29 11:07:03 -08:00
Sage Weil	052d64e1c4	osd: unregister signal handlers on shutdown Signed-off-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-29 09:46:13 -08:00
Sage Weil	db96831bbd	mon: unregister signal handlers on shutdown Signed-off-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-29 09:46:06 -08:00
Sage Weil	8e9bf6111e	mds: unregister SIGHUP too Signed-off-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-29 09:45:56 -08:00
Sage Weil	bb5c76400c	radosgw: handle SIGHUP Fixes: #2121 Signed-off-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-29 09:45:46 -08:00
Sage Weil	9c7b63e122	init-radosgw: add 'reload' command to send SIGHUP Fixes: #2121 Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-29 09:23:22 -08:00
Sage Weil	e843766504	osd: fix typo in recovery_state query dump Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-29 09:21:22 -08:00
Sage Weil	0e03e9dd8d	osd: add missing space to scrub error [ERR] 18.5 osd.3: soid 8a5e37ad/rb.0.0.000000002b99/headextra attr _, extra attr snapset Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-29 09:17:07 -08:00
Greg Farnum	2437ce02c1	msgr: discard the local_pipe's queue on shutdown. To facilitate this, we do two things: 1) actually identify the number of special code values we pass around 2) use that to prevent trying to put() those non-pointer values in Pipe::discard_queue(). Then we just call local_pipe.discard_queue() in wait() like happens (indirectly, via reaping) with all the normal Pipes in rank_pipe. But this does make me think that we may be approaching the point where it's appropriate to create a subclass LocalPipe (against a RemotePipe like our current Pipe implementation is mostly intended to be). Should fix #2086. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-02-29 09:12:46 -08:00
Sage Weil	7690f0b959	osd: remove down OSDs from peer_info on reset If an OSD goes down, remove it from peer_info. In particular, I saw 2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap 2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] PriorSet: affected_by_map osd.1 now down ... 2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw) 2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior prior osd.1 is down 2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior final: probe 3,5 down 1 blocked_by {} ... 
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog 2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598) 2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598) 2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) 2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598) Any time an osd goes down we want to ensure we remove it from peer_info. Handling this in Reset and Started states captures all of the nested states, which forward the event (or re-post transit to Reset). We can also drop the Primary reaction, which is now superfluous. Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Samuel Just <samuel.just@dreamhost.com> Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>	2012-02-29 09:10:57 -08:00
Sage Weil	fe94c0414e	Merge branch 'next'	2012-02-28 17:04:55 -08:00
Josh Durgin	b9a675a293	mon: report pgs stuck inactive/unclean/stale in health check Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-28 13:53:15 -08:00
Greg Farnum	d10e1f46df	mon: fix slurp_latest to fill in any missing incrementals Fixes #1789. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2012-02-28 12:29:05 -08:00
Sage Weil	7b48cca184	test_osd_types: fix unit test for new pg_t::is_split() prototype Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-28 09:33:18 -08:00
Sage Weil	fd0712dfb4	Makefile: drop separate libjson_spirit.la automake seems to have difficulty with the .la dependency on another .la. Since libjson_spirit.la is only used by libcommon.la anyway, just build it directly into that. Sigh. ... CXXLD libjson_spirit.la AR libmds.a CXXLD libcls_rbd.la CXXLD libcls_rgw.la CXXLD cephfs CCLD test_ioctls CC libcommon_la-ceph_ver.lo CXX libcommon_la-version.lo CXX ceph_dencoder.o CCLD mount.ceph CC ceph_ver.o CXX test_libhadoopcephfs_build-version.o CXXLD test_libhadoopcephfs_build CXXLD libcommon.la libtool: link: cannot find the library `libjson_spirit.la' or unhandled argument `libjson_spirit.la' Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-28 09:30:47 -08:00
Sage Weil	edd35c04b4	osd: drop useless ENOMEM check new throws exception; doesn't return NULL. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-28 09:26:04 -08:00
Sage Weil	a7de459f69	ceph-osd: clarify error messages So we know where the error came from. And use real error codes in init(). Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-28 09:11:59 -08:00
Wido den Hollander	97926e1846	init: Actually do start the daemons when 'service ceph start <type>' is specified A bug in my previous patch prevented any daemon with auto_start set to false from starting. This patch allows: * /etc/init.d/ceph start osd|mds|mon * service ceph start osd|mds|mon It however does not start daemons if auto_start is disabled when you invoke: * /etc/init.d/ceph start * service ceph start Signed-off-by: Wido den Hollander <wido@widodh.nl>	2012-02-28 09:10:52 -08:00
Sage Weil	f317028f42	doc: beginnings of documentation of stuck pgs and pg states Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-02-27 15:41:57 -08:00
Sage Weil	1917024134	filestore: make less noise on ENOENT Don't generate high-level log spam on every open error. Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Samuel Just <samuel.just@dreamhost.com>	2012-02-27 15:13:35 -08:00
Greg Farnum	244b702966	pg: use get_cluster_inst instead of get_inst in activate This was mistakenly broken in `4b3bb5ab37` Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Reviewed-by: Sam Just <sam.just@dreamhost.com>	2012-02-27 14:49:34 -08:00
Sage Weil	f02195b40f	Merge branch 'wip-split2' Reviewed-by: Samuel Just <samuel.just@dreamhost.com>	2012-02-27 14:37:41 -08:00
Sage Weil	b6a04174bc	osd: pg_t::is_split(): make children out param a pointer, and optional Also unit test it. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 14:35:21 -08:00
Sage Weil	85ed06e973	osd: bypass split code Until it is fully implemented. It's also disabled in the monitor currently, but just in case it gets into the OSDMap, do nothing for now. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 14:18:21 -08:00
Sage Weil	15d5324904	osd: fix pg locking flags Two things we need to handle: - callers who already hold map_lock (split_pg()) - callers who already hold another pg->lock, and want to skip the lockdep check for this one. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 14:16:27 -08:00
Sage Weil	fc7b11a9ee	osd: partially refactor pg split This partially refactors the OSD split code to do the split synchronously when processing a new OSDMap. It is incomplete in that it does not yet do anything useful for the PG. The full solution needs to: - Do the split synchronously when applying the map update. - Reset the parent pg so that it repeers. This will cause problems until we consistently consider this a new interval when looking backwards in time; this needs to be fixed. Anybody doing generate_past_intervals() or similar will need to consider a split/merge event as an interval boundary. - The recovery state machine should trigger appropriately when this happens. - The old PG that was split should probably be handled identically to the new children. That means deleting the old PG instance and creating a new PG object for the newly-split child. Ditto for merge. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 14:04:22 -08:00
Sage Weil	d9cf33223e	osd: implement pg_t::is_split() Test to determine if a pg has split between two pool sizes, and if so, what its children are. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 11:44:24 -08:00
Sage Weil	6a081888fd	osd: factor hobject key into child pgid calc during split When we calculate the object's new pg, take the locator key into consideration, to avoid a crash like osd/OSD.cc: In function 'void OSD::split_pg(PG, std::map<pg_t, PG>&,ObjectStore::Transaction&)' thread 7fe3df8c4700 time 2012-02-20 18:22:19.900886 osd/OSD.cc: 4066: FAILED assert(child) Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 11:44:24 -08:00
Sage Weil	ee4d99099f	journaler: log on unexpected objecter error This will help with #2110, #1796, #1640. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-27 11:39:53 -08:00
Sage Weil	91b119a064	osd: fix recursive map_lock via check_replay_queue() Also drop activate_pg() helper while we're at it, so it's clear that we are the only user. recursive lock of OSD::map_lock (33) ceph version 0.42-146-g7ad35ce (commit:7ad35ce489cc5f9169eb838e1196fa2ca4d6e985) 2012-02-24 12:30:16.541416 1: (PG::lock(bool)+0x2a) [0xa09348] 2012-02-24 12:30:16.541424 2: (OSD::_lookup_lock_pg(pg_t)+0xbd) [0x84b8df] 2012-02-24 12:30:16.541431 3: (OSD::activate_pg(pg_t, utime_t)+0x9f) [0x87463b] 2012-02-24 12:30:16.541442 4: (OSD::check_replay_queue()+0x12f) [0x87452d] 2012-02-24 12:30:16.541450 5: (OSD::tick()+0x23c) [0x8535ea] 2012-02-24 12:30:16.541456 6: (OSD::C_Tick::finish(int)+0x1f) [0x881671] 2012-02-24 12:30:16.541462 7: (SafeTimer::timer_thread()+0x2d5) [0x8f8211] 2012-02-24 12:30:16.541468 8: (SafeTimerThread::entry()+0x1c) [0x8f923c] 2012-02-24 12:30:16.541475 9: (Thread::_entry_func(void)+0x23) [0x9c8109] 2012-02-24 12:30:16.541485 10: (()+0x68ba) [0x7f9dbed838ba] 2012-02-24 12:30:16.541491 11: (clone()+0x6d) [0x7f9dbd66f02d] 2012-02-24 12:30:16.541495 common/lockdep.cc: In function 'int lockdep_will_lock(const char, int)' thread 7f9db9d98700 time 2012-02-24 12:30:16.541504 Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Reviewed-by: Sam Just <samuel.just@dreamhost.com>	2012-02-27 09:56:21 -08:00
Sage Weil	402ece5e31	init-ceph: stick with /var/run for the time being /run isn't present on older systems. Stick with the old location until it is more pervasive, or we add an autoconf option to control it. Signed-off-by: Sage Weil <sage.weil@dreamhost.com>	2012-02-26 20:56:05 -08:00
Laszlo Boszormenyi	41295b584a	debian: /var/run/ceph -> /run/ceph /run/ceph should exist for creating UNIX domain sockets. ceph uses UNIX domain sockets for internal communication. Create their directory on startup as /run is on a virtual filesystem. Last-Update: <2012-02-26> Bug-Debian: http://bugs.debian.org/660238 Forwarded: <ceph-devel@vger.kernel.org> Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>	2012-02-26 20:47:53 -08:00
Laszlo Boszormenyi	0d8b5756e1	debian: build-{indep,arch} Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>	2012-02-26 20:45:52 -08:00
Laszlo Boszormenyi	3ad6ccb4a6	debian: sdparm|hdparm, new standards version Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>	2012-02-26 20:45:06 -08:00
Yehuda Sadeh	266902a993	rgw: initialize bucket_id in bucket structure might make valgrind a little bit less noisy. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-02-24 17:01:32 -08:00
Sage Weil	f8f6e4d850	rgw: _exit(0) on SIGTERM We need to do something a bit smarter to get coverage information, but this is a start. Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-24 15:32:07 -08:00
Sage Weil	708be0a5ab	Merge remote branch 'gh/wip-crush-adjust' Reviewed-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>	2012-02-24 13:52:32 -08:00


18403 Commits