Commit Graph

18411 Commits

Author SHA1 Message Date
Sage Weil
70360f840e github.com/NewDreamNetwork -> github.com/ceph 2012-03-02 11:00:08 -08:00
Sage Weil
cacf0fdec8 filestore: fix rollback safety check
There is a window in the old check between when current/commit_op_seq is
written and the snapshot is taken.  If ceph-osd crashes, we'll be unable to
start because we'll believe current/ was in use without proper checkpoints.

Instead, make the snapped/not snapped state of current/ explicit.

Fixes: #2118
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-03-02 09:50:11 -08:00
Sage Weil
098cd92140 Merge remote branch 'gh/wip_fs_omap'
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-03-02 09:35:11 -08:00
Josh Durgin
3a83517256 RadosModel: separate initialization and construction
Several error codes needed to be checked.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 17:18:31 -08:00
Josh Durgin
2b176fbe2b Merge branch 'next' 2012-03-01 17:17:38 -08:00
Josh Durgin
cd31388578 librados: only shutdown objecter after it's initialized
The objecter is only initialized once the RadosClient state is
CONNECTED from the perspective of a RadosClient::shutdown()
caller. Error paths in RadosClient::connect() may call shutdown while
still in the CONNECTING state.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 17:16:29 -08:00
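
The state check described in the commit above can be modelled roughly as follows. This is a minimal sketch, not librados code: the class `RadosClientSketch`, its fields, and the `fail` parameter are hypothetical names used only to show the idea that `shutdown()` tears down the objecter only when the client reached the CONNECTED state.

```python
# Hypothetical model of the "only shut down what was initialized" rule.
# None of these names come from librados; they are illustrative only.
DISCONNECTED, CONNECTING, CONNECTED = range(3)


class RadosClientSketch:
    def __init__(self):
        self.state = DISCONNECTED
        self.objecter_initialized = False
        self.objecter_shutdowns = 0  # counts objecter teardown calls

    def connect(self, fail=False):
        self.state = CONNECTING
        if fail:
            # Error path: shutdown() is called while still CONNECTING.
            self.shutdown()
            return -1
        self.objecter_initialized = True
        self.state = CONNECTED
        return 0

    def shutdown(self):
        # The objecter exists only once the state is CONNECTED, so the
        # teardown is skipped on the CONNECTING error path.
        if self.state == CONNECTED:
            self.objecter_shutdowns += 1
            self.objecter_initialized = False
        self.state = DISCONNECTED
```

A failed `connect(fail=True)` leaves `objecter_shutdowns` at 0, while a successful connect followed by `shutdown()` tears the objecter down exactly once.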
Samuel Just
2c275efb72 Makefile: add headers for distcheck
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
feaf44e764 ReplicatedPG: Add omap to recovery
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
9331e63359 MOSDSubOp: Add entry for omap recovery
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
6a624b960a test: Add KeyValueDB atomicity checker
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
82199d5d31 os/: DBObjectMap and KeyValueDB interface with tests
DBObjectMap is an implementation of ObjectMap in terms of KeyValueDB.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
2ab6f023ea ObjectStore.h: Initial ObjectStore omap interfaces
ObjectMap.h defines the interface which will be implemented by
leveldb.  store_test now tests basic omap operations.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
e9dd01f502 os/CollectionIndex: Add debugging constructor and Path::coll()
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
d9b130faf0 Added LevelDBStore
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
58a3b7f75a Added leveldb submodule
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-01 10:11:42 -08:00
Samuel Just
cddcc2d269 Makefile: make check-local relative to $(srcdir)
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-29 20:46:05 -08:00
Sage Weil
749281eda7 Makefile: add json_spirit headers to tarball
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 16:21:15 -08:00
Yehuda Sadeh
85d04c6ceb rgw: don't check for ECANCELED in the _impl() functions
We already check it in the outer functions.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-29 14:46:33 -08:00
Yehuda Sadeh
86340655ff rgw: don't retry certain operations if we raced
The atomic get/put scheme was retrying writes in cases where it lost
races (head object was rewritten by another client). Instead we can
just back off and return success.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-29 14:46:33 -08:00
Sage Weil
b1f264406f msgr: fix race in learned_addr()
- two connect() threads
- both hit if (need_addr) check
- one takes lock, sets addr, need_addr = false, unlocks
- continues to ::encode(ms_addr, ...);
- meanwhile, second thread set ms_addr _again_, but copies peer port into
  place before adjusting it.  racing ::encode() sees bad port and sends it
  to the peer.

Fix this two ways:

- don't copy bad port into place; set it first
- re-check need_addr after taking lock

Fixes: #1747
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-29 13:22:34 -08:00
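
The two fixes listed in the commit above (build the correct value before publishing it, and re-check the flag after taking the lock) can be sketched as below. This is an illustration under assumptions, not the messenger's actual code: `MessengerSketch`, `my_port`, `times_set`, and the port value 6800 are hypothetical.

```python
import threading


class MessengerSketch:
    """Hypothetical stand-in for the messenger; not Ceph's class."""

    def __init__(self):
        self.lock = threading.Lock()
        self.need_addr = True
        self.addr = None      # (ip, port) once learned
        self.my_port = 6800   # assumed local port, illustrative value
        self.times_set = 0    # how many threads actually set the address

    def learned_addr(self, peer_view):
        """peer_view is the (ip, port) the peer reports for us."""
        if not self.need_addr:       # cheap unlocked check
            return
        with self.lock:
            if not self.need_addr:   # re-check: another thread may have won
                return
            ip, _peer_port = peer_view
            # Build the complete address with our own port first, then
            # publish it in one step, so no reader sees the peer's port.
            self.addr = (ip, self.my_port)
            self.times_set += 1
            self.need_addr = False
```

With several threads calling `learned_addr()` concurrently, only the first one through the locked re-check sets the address, and the stored port is always the local one.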
Sage Weil
8a2b76411e msgr: print existing->state before failing assert
May help with #1378.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 12:28:19 -08:00
Sage Weil
cbb128090c Merge remote-tracking branch 'gh/wip-2121'
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-29 11:07:03 -08:00
Sage Weil
052d64e1c4 osd: unregister signal handlers on shutdown
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:46:13 -08:00
Sage Weil
db96831bbd mon: unregister signal handlers on shutdown
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:46:06 -08:00
Sage Weil
8e9bf6111e mds: unregister SIGHUP too
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:45:56 -08:00
Sage Weil
bb5c76400c radosgw: handle SIGHUP
Fixes: #2121
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-29 09:45:46 -08:00
Sage Weil
9c7b63e122 init-radosgw: add 'reload' command to send SIGHUP
Fixes: #2121
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:23:22 -08:00
Sage Weil
e843766504 osd: fix typo in recovery_state query dump
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:21:22 -08:00
Sage Weil
0e03e9dd8d osd: add missing space to scrub error
[ERR] 18.5 osd.3: soid 8a5e37ad/rb.0.0.000000002b99/headextra attr _, extra attr snapset

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-29 09:17:07 -08:00
Greg Farnum
2437ce02c1 msgr: discard the local_pipe's queue on shutdown.
To facilitate this, we do two things:
1) actually identify the number of special code values we pass around
2) use that to prevent trying to put() those non-pointer values in
Pipe::discard_queue().
Then we just call local_pipe.discard_queue() in wait(), as happens
(indirectly, via reaping) with all the normal Pipes in rank_pipe.

But this does make me think that we may be approaching the point
where it's appropriate to create a subclass LocalPipe (against a
RemotePipe like our current Pipe implementation is mostly intended
to be).

Should fix #2086.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-29 09:12:46 -08:00
Sage Weil
7690f0b959 osd: remove down OSDs from peer_info on reset
If an OSD goes down, remove it from peer_info. In particular, I saw

2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering]  PriorSet: affected_by_map osd.1 now down
...
2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.1 is down
2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
...
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an osd goes down we want to ensure we remove it from peer_info.
Handling this in Reset and Started states captures all of the nested
states, which forward the event (or re-post transit to Reset).  We can
also drop the Primary reaction, which is now superfluous.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-29 09:10:57 -08:00
Sage Weil
fe94c0414e Merge branch 'next' 2012-02-28 17:04:55 -08:00
Josh Durgin
b9a675a293 mon: report pgs stuck inactive/unclean/stale in health check
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-28 13:53:15 -08:00
Greg Farnum
d10e1f46df mon: fix slurp_latest to fill in any missing incrementals
Fixes #1789.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-28 12:29:05 -08:00
Sage Weil
7b48cca184 test_osd_types: fix unit test for new pg_t::is_split() prototype
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:33:18 -08:00
Sage Weil
fd0712dfb4 Makefile: drop separate libjson_spirit.la
automake seems to have difficulty with the .la dependency on another .la.
Since libjson_spirit.la is only used by libcommon.la anyway, just build it
directly into that.  Sigh.

...
CXXLD libjson_spirit.la
AR libmds.a
CXXLD libcls_rbd.la
CXXLD libcls_rgw.la
CXXLD cephfs
CCLD test_ioctls
CC libcommon_la-ceph_ver.lo
CXX libcommon_la-version.lo
CXX ceph_dencoder.o
CCLD mount.ceph
CC ceph_ver.o
CXX test_libhadoopcephfs_build-version.o
CXXLD test_libhadoopcephfs_build
CXXLD libcommon.la
libtool: link: cannot find the library `libjson_spirit.la' or unhandled argument `libjson_spirit.la'

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:30:47 -08:00
Sage Weil
edd35c04b4 osd: drop useless ENOMEM check
new throws an exception on failure; it doesn't return NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:26:04 -08:00
Sage Weil
a7de459f69 ceph-osd: clarify error messages
So we know where the error came from.  And use real error codes in init().

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-28 09:11:59 -08:00
Wido den Hollander
97926e1846 init: Actually do start the daemons when 'service ceph start <type>' is specified
A bug in my previous patch prevented any daemon with auto_start set to false from starting.

This patch allows:
* /etc/init.d/ceph start osd|mds|mon
* service ceph start osd|mds|mon

It however does not start daemons if auto_start is disabled when you invoke:
* /etc/init.d/ceph start
* service ceph start

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2012-02-28 09:10:52 -08:00
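
The start-up rule described in the commit above can be modelled as a small selection function. The real init script is a shell script; this Python sketch only mirrors the behaviour stated in the message, and the function name `daemons_to_start` and the tuple layout are assumptions made for illustration.

```python
def daemons_to_start(daemons, requested_type=None):
    """Return the names of daemons a 'start' invocation would launch.

    daemons: list of (name, daemon_type, auto_start) tuples (hypothetical layout).
    requested_type: 'osd', 'mds', 'mon', or None when no type is given.
    """
    selected = []
    for name, daemon_type, auto_start in daemons:
        if requested_type is not None:
            # An explicit type starts matching daemons even when
            # auto_start is disabled.
            if daemon_type == requested_type:
                selected.append(name)
        elif auto_start:
            # A bare 'start' skips daemons with auto_start disabled.
            selected.append(name)
    return selected
```

For example, with an `osd.0` entry that has auto_start disabled and a `mon.a` entry that has it enabled, a bare start selects only `mon.a`, while a start with type `osd` selects `osd.0`.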
Sage Weil
f317028f42 doc: beginnings of documentation of stuck pgs and pg states
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-27 15:41:57 -08:00
Sage Weil
1917024134 filestore: make less noise on ENOENT
Don't generate high-level log spam on every open error.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-27 15:13:35 -08:00
Greg Farnum
244b702966 pg: use get_cluster_inst instead of get_inst in activate
This was mistakenly broken in 4b3bb5ab37

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sam Just <sam.just@dreamhost.com>
2012-02-27 14:49:34 -08:00
Sage Weil
f02195b40f Merge branch 'wip-split2'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-27 14:37:41 -08:00
Sage Weil
b6a04174bc osd: pg_t::is_split(): make children out param a pointer, and optional
Also unit test it.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:35:21 -08:00
Sage Weil
85ed06e973 osd: bypass split code
Until it is fully implemented.  It's also disabled in the monitor
currently, but just in case it gets into the OSDMap, do nothing for now.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:18:21 -08:00
Sage Weil
15d5324904 osd: fix pg locking flags
Two things we need to handle:

 - callers who already hold map_lock (split_pg())
 - callers who already hold another pg->lock, and want to skip the lockdep
   check for this one.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:16:27 -08:00
Sage Weil
fc7b11a9ee osd: partially refactor pg split
This partially refactors the OSD split code to do the split synchronously
when processing a new OSDMap.  It is incomplete in that it does not yet
do anything useful for the PG.  The full solution needs to:

- Do the split synchronously when applying the map update.
- Reset the parent pg so that it repeers.  This will cause problems until
  we consistently consider this a new interval when looking backwards in
  time; this needs to be fixed.  Anybody doing generate_past_intervals()
  or similar will need to consider a split/merge event as an interval
  boundary.
- The recovery state machine should trigger appropriately when this
  happens.
- The old PG that was split should probably be handled identically to the
  new children.  That means deleting the old PG instance and creating a new
  PG object for the newly-split child.  Ditto for merge.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 14:04:22 -08:00
Sage Weil
d9cf33223e osd: implement pg_t::is_split()
Test to determine if a pg has split between two pool sizes, and if so,
what its children are.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:44:24 -08:00
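
One way to picture the test described in the commit above is the sketch below. It is not the code from the commit: it assumes placement seeds are mapped with a stable-mod rule (modelled on Ceph's stable-mod helper as commonly described), and the names `pg_mask`, `stable_mod`, and `is_split` here are illustrative Python functions, not the C++ method's signature. Under that assumption, the children of a pg are the new seeds that would have mapped to it under the old pg count.

```python
def pg_mask(pg_num):
    # Smallest (2^k - 1) that covers seeds 0 .. pg_num - 1.
    return (1 << (pg_num - 1).bit_length()) - 1


def stable_mod(x, b, bmask):
    # Map x into [0, b) so that growing b moves only some seeds.
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def is_split(seed, old_pg_num, new_pg_num):
    """Return (did_split, children) for a pg seed when the pool grows."""
    if new_pg_num <= old_pg_num or seed >= old_pg_num:
        return False, set()
    old_mask = pg_mask(old_pg_num)
    # A child is a newly created seed that used to fold onto this parent.
    children = {
        x
        for x in range(old_pg_num, new_pg_num)
        if stable_mod(x, old_pg_num, old_mask) == seed
    }
    return bool(children), children
```

Under these assumptions, growing a pool from 4 to 8 pgs gives seed 1 the single child 5, while an unchanged pg count reports no split.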
Sage Weil
6a081888fd osd: factor hobject key into child pgid calc during split
When we calculate the object's new pg, take the locator key into
consideration, to avoid a crash like

osd/OSD.cc: In function 'void OSD::split_pg(PG*, std::map<pg_t, PG*>&,ObjectStore::Transaction&)' thread 7fe3df8c4700 time 2012-02-20 18:22:19.900886
osd/OSD.cc: 4066: FAILED assert(child)

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:44:24 -08:00
Sage Weil
ee4d99099f journaler: log on unexpected objecter error
This will help with #2110, #1796, #1640.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-27 11:39:53 -08:00