Ceph is a distributed object, block, and file storage platform
Sage Weil 7690f0b959 osd: remove down OSDs from peer_info on reset
If an OSD goes down, remove it from peer_info. In particular, I saw:

2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering]  PriorSet: affected_by_map osd.1 now down
...
2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.1 is down
2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
...
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an OSD goes down, we want to ensure we remove it from peer_info.
Handling this in the Reset and Started states covers all of the nested
states, which either forward the event or re-post it and transit to
Reset.  We can also drop the Primary reaction, which is now superfluous.
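
The cleanup described above can be sketched roughly as follows. This is a
minimal, self-contained illustration and not the actual Ceph code: ToyOSDMap,
ToyPG, and the string-valued peer_info are hypothetical stand-ins, and the
sketch assumes only that peer_info is a map keyed by OSD id.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the OSD map: only tracks which OSD ids are up.
struct ToyOSDMap {
  std::set<int> up;
  bool is_down(int osd) const { return up.count(osd) == 0; }
};

// Hypothetical stand-in for a PG; peer_info maps OSD id -> last known info.
struct ToyPG {
  std::map<int, std::string> peer_info;

  // Erase every peer_info entry whose OSD is down in the given map.
  // Returns the number of entries removed.
  int remove_down_peer_info(const ToyOSDMap& osdmap) {
    int removed = 0;
    for (auto p = peer_info.begin(); p != peer_info.end(); ) {
      if (osdmap.is_down(p->first)) {
        p = peer_info.erase(p);  // erase returns the next valid iterator
        ++removed;
      } else {
        ++p;
      }
    }
    return removed;
  }
};
```

With peer_info holding entries for osd.1 and osd.3 and a map in which only
osd.3 and osd.5 are up (as in the log above), the call removes the stale
osd.1 entry, so a later calc_acting-style pass could no longer pick osd.1 as
the holder of the newest update.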

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-02-29 09:10:57 -08:00
admin -- doc: Switch doxygen integration from breathe to asphyxiate. (2012-01-09 19:03:56 -08:00)
ceph-object-corpus@b2639b83db -- test/encoding/readable.sh: skip old version with known incompatibilities (2012-02-13 14:08:25 -08:00)
debian -- debian: build-{indep,arch} (2012-02-26 20:45:52 -08:00)
doc -- doc: beginnings of documentation of stuck pgs and pg states (2012-02-27 15:41:57 -08:00)
fusetrace
keys -- doc: Write about deb installation. (2011-09-02 12:34:05 -07:00)
m4 -- Makefile: Add recent acx_pthread.m4 that has a fix for nostdlib issue. (2012-01-12 09:17:06 -08:00)
man -- ceph-dencoder: man page (2012-02-23 18:48:57 -08:00)
qa -- qa/btrfs/test_rmdir_async_snap (2012-02-20 10:56:42 -08:00)
src -- osd: remove down OSDs from peer_info on reset (2012-02-29 09:10:57 -08:00)
udev -- udev: drop device number from name (2011-12-08 16:36:47 -08:00)
wireshark
.gitignore -- .gitignore: src/ocf/ceph (2011-12-30 09:17:06 -08:00)
.gitmodules -- add ceph-object-corpus.git submodule (2012-02-08 13:17:22 -08:00)
AUTHORS
autogen.sh
ceph.spec.in -- ceph.spec.in: add ceph-dencoder (2012-02-23 18:48:57 -08:00)
ChangeLog
CodingStyle -- CodingStyle: whitespace (2011-07-14 10:50:08 -07:00)
configure.ac -- v0.42.2 (2012-02-24 13:00:39 -08:00)
COPYING -- add libjson_spirit.la (2012-02-24 11:24:44 -08:00)
COPYING-LGPL2.1 -- COPYING: note licenses for all files, not just the default (2012-01-12 10:03:27 -08:00)
do_autogen.sh -- do_autogen.sh: -T for --without-tcmalloc (2012-02-24 11:15:04 -08:00)
Doxyfile -- doxygen: Use first sentence as brief description. (2012-01-09 19:03:56 -08:00)
INSTALL
Makefile.am -- Makefile: include run-cli-tests-maybe-unset-ccache in dist tarball. (2011-09-23 15:55:01 -07:00)
NEWS
README -- c* -> ceph-* (2011-09-22 15:08:25 -07:00)
RELEASE_CHECKLIST
SubmittingPatches

Ceph - a scalable distributed storage system
--------------------------------------------

Please see http://ceph.newdream.net/ for current info.

----

To build the server daemons and the FUSE client, run:

$ ./autogen.sh
$ ./configure
$ make

(Note that the FUSE client will only be built if libfuse is present.)

----

A quick summary of the binaries that will be built in src/:

daemons:
 ceph-mon -- monitor daemon.  handles cluster state and configuration
         information.
 ceph-osd -- storage daemon.  stores objects on a given block device.
 ceph-mds -- metadata daemon.  handles file system namespace.
 ceph-fuse -- fuse client.

tools:
 ceph -- send management commands to the monitor cluster.
 rados -- interact with the object store
 rbd -- manipulate rados block device images
 monmaptool -- create/edit mon map
 osdmaptool -- create/edit osd map 
 crushtool -- create/edit crush map

scripts:
 mkcephfs -- cluster mkfs tool
 init-ceph -- init.d start/stop script