mirror of
https://github.com/ceph/ceph
synced 2024-12-17 00:46:05 +00:00
Ceph is a distributed object, block, and file storage platform
7690f0b959
If an OSD goes down, remove it from peer_info. In particular, I saw

    2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
    2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] PriorSet: affected_by_map osd.1 now down
    ...
    2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
    2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior prior osd.1 is down
    2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
    ...
    2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
    2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
    2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
    2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
    2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an osd goes down we want to ensure we remove it from peer_info. Handling this in Reset and Started states captures all of the nested states, which forward the event (or re-post transit to Reset). We can also drop the Primary reaction, which is now superfluous.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>

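
The idea in the commit message (an outer state reacts to a map advance on behalf of every nested state, and drops cached info for peers that are now down) can be illustrated with a small standalone sketch. This is not the Ceph code and does not use the state machine library Ceph uses; `ToyPG`, `PeerInfo`, `OsdMapView`, `State`, `dispatch_adv_map`, and `pick_newest` are all hypothetical names invented for illustration, and the integer `version` is a simplified stand-in for a real version such as `10'1`.

```cpp
#include <map>
#include <set>

// Hypothetical stand-in for the info a primary caches about each peer.
struct PeerInfo {
    unsigned version;  // simplified placeholder for a version like 10'1
};

// Hypothetical, minimal view of an OSD map: only the set of OSDs that are up.
struct OsdMapView {
    std::set<int> up;
    bool is_up(int osd) const { return up.count(osd) > 0; }
};

// Toy placement group that holds the cached peer_info.
struct ToyPG {
    std::map<int, PeerInfo> peer_info;

    // Drop the cached entry of every peer the new map marks down.
    void remove_down_peer_info(const OsdMapView& map) {
        for (auto it = peer_info.begin(); it != peer_info.end();) {
            if (!map.is_up(it->first)) {
                it = peer_info.erase(it);
            } else {
                ++it;
            }
        }
    }
};

// Toy hierarchical state: a state that has no reaction of its own
// leaves the event unhandled so that its parent gets a chance.
struct State {
    State* parent;
    explicit State(State* p) : parent(p) {}
    virtual ~State() = default;
    // Returns true if this state consumed the map advance.
    virtual bool on_adv_map(ToyPG&, const OsdMapView&) { return false; }
};

// Outer state: the single place where down peers are forgotten.
struct Started : State {
    Started() : State(nullptr) {}
    bool on_adv_map(ToyPG& pg, const OsdMapView& map) override {
        pg.remove_down_peer_info(map);
        return true;
    }
};

// Nested state without its own reaction to a map advance.
struct GetLog : State {
    explicit GetLog(State* p) : State(p) {}
};

// Deliver the event to the innermost state and walk outward until handled.
bool dispatch_adv_map(State* innermost, ToyPG& pg, const OsdMapView& map) {
    for (State* s = innermost; s != nullptr; s = s->parent) {
        if (s->on_adv_map(pg, map)) {
            return true;
        }
    }
    return false;
}

// Pick the peer with the highest version; the lowest OSD id wins ties.
// Returns -1 when no peer info is cached.
int pick_newest(const ToyPG& pg) {
    int best = -1;
    unsigned best_version = 0;
    for (const auto& entry : pg.peer_info) {
        if (best == -1 || entry.second.version > best_version) {
            best = entry.first;
            best_version = entry.second.version;
        }
    }
    return best;
}
```

With peers 1 and 3 cached at the same version, `pick_newest` in this sketch selects osd 1, mirroring the stale choice in the log above. After a map that lists only 3 and 5 as up is dispatched to a `GetLog` nested inside `Started`, the entry for osd 1 is gone and osd 3 is selected instead, even though `GetLog` itself has no reaction to the event.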
||
---|---|---|
admin | ||
ceph-object-corpus@b2639b83db | ||
debian | ||
doc | ||
fusetrace | ||
keys | ||
m4 | ||
man | ||
qa | ||
src | ||
udev | ||
wireshark | ||
.gitignore | ||
.gitmodules | ||
AUTHORS | ||
autogen.sh | ||
ceph.spec.in | ||
ChangeLog | ||
CodingStyle | ||
configure.ac | ||
COPYING | ||
COPYING-LGPL2.1 | ||
do_autogen.sh | ||
Doxyfile | ||
INSTALL | ||
Makefile.am | ||
NEWS | ||
README | ||
RELEASE_CHECKLIST | ||
SubmittingPatches |
Ceph - a scalable distributed storage system
-----------------------------------------

Please see http://ceph.newdream.net/ for current info.

----

To build the server daemons, and FUSE client,

    $ ./autogen.sh
    $ ./configure
    $ make

(Note that the FUSE client will only be built if libfuse is present.)

----

A quick summary of binaries that will be built in src/

daemons:
 ceph-mon -- monitor daemon. handles cluster state and configuration
             information.
 ceph-osd -- storage daemon. stores objects on a given block device.
 ceph-mds -- metadata daemon. handles file system namespace.
 ceph-fuse -- fuse client.

tools:
 ceph -- send management commands to the monitor cluster.
 rados -- interact with the object store
 rbd -- manipulate rados block device images
 monmaptool -- create/edit mon map
 osdmaptool -- create/edit osd map
 crushtool -- create/edit crush map

scripts:
 mkcephfs -- cluster mkfs tool
 init-ceph -- init.d start/stop script