Ceph is a distributed object, block, and file storage platform
Jim Schutt a5297388a7 msgr: Correctly handle half-open connections.
If poll() says a socket is ready for reading, but zero bytes
are read, that means that the peer has sent a FIN.  Handle that.
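
The zero-byte read case described above can be sketched in C as follows. This is a minimal illustration, not the code from this commit: the enum read_status, its values, and the function read_when_ready() are hypothetical names invented for the example. It assumes a stream socket and a non-zero buffer length.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not taken from the Ceph source. */
enum read_status {
    READ_DATA,        /* one or more bytes were read */
    READ_PEER_CLOSED, /* socket was readable but recv() returned 0: peer sent FIN */
    READ_RETRY,       /* timeout or transient condition; caller may try again */
    READ_ERROR        /* hard error from poll() or recv(); errno is set */
};

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Assumes fd is a stream socket and len > 0, so a return of 0 from
 * recv() can only mean end of stream.
 */
static enum read_status read_when_ready(int fd, char *buf, size_t len,
                                        int timeout_ms, ssize_t *nread)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    *nread = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return errno == EINTR ? READ_RETRY : READ_ERROR;
    if (rc == 0)
        return READ_RETRY;          /* timed out with no events */

    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0) {
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            return READ_RETRY;
        return READ_ERROR;
    }
    if (n == 0) {
        /*
         * poll() will keep reporting this socket as readable from now on.
         * Returning to poll() without closing it makes the thread spin.
         */
        return READ_PEER_CLOSED;
    }

    *nread = n;
    return READ_DATA;
}
```

A caller would treat READ_PEER_CLOSED like any other connection failure: stop polling the descriptor, close it, and let the higher layer decide whether to reconnect.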

One way the incorrect handling showed up is as follows:

Under a heavy write load, clients log many messages like this:

[19021.523192] libceph:  tid 876 timed out on osd6, will reset osd
[19021.523328] libceph:  tid 866 timed out on osd10, will reset osd
[19081.616032] libceph:  tid 841 timed out on osd0, will reset osd
[19081.616121] libceph:  tid 826 timed out on osd2, will reset osd
[19081.616176] libceph:  tid 806 timed out on osd3, will reset osd
[19081.616226] libceph:  tid 875 timed out on osd9, will reset osd
[19081.616275] libceph:  tid 834 timed out on osd12, will reset osd
[19081.616326] libceph:  tid 874 timed out on osd10, will reset osd

After the clients are done writing and the file system should
be quiet, osd hosts have a high load with many active threads:

$ ps u -C cosd
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1383  162 11.5 1456248 943224 ?      Ssl  11:31 406:59 /usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf

$ for p in `ps -C cosd -o pid --no-headers`; do grep -nH State /proc/$p/task/*/status | grep -v sleep; done
/proc/1383/task/10702/status:2:State:   R (running)
/proc/1383/task/10710/status:2:State:   R (running)
/proc/1383/task/10717/status:2:State:   R (running)
/proc/1383/task/11396/status:2:State:   R (running)
/proc/1383/task/27111/status:2:State:   R (running)
/proc/1383/task/27117/status:2:State:   R (running)
/proc/1383/task/27162/status:2:State:   R (running)
/proc/1383/task/27694/status:2:State:   R (running)
/proc/1383/task/27704/status:2:State:   R (running)
/proc/1383/task/27728/status:2:State:   R (running)

With this fix applied, a heavy load still causes many client
resets of osds, but no runaway threads result.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-03 09:10:58 -08:00
debian make: create /etc/ceph if it doesn't exist 2010-12-02 17:35:32 -08:00
fusetrace
man man: add man page for cephfs 2010-12-02 16:18:38 -08:00
qa mds: fix null_snapflush with multiple intervening snaps 2010-11-10 20:58:49 -08:00
src msgr: Correctly handle half-open connections. 2010-12-03 09:10:58 -08:00
web
wireshark
.gitignore
AUTHORS
autogen.sh autogen.sh: check for pkg-config 2010-11-05 11:34:11 -07:00
builddebs.sh debian: update scripts to do packaging fixes 2010-10-18 10:19:28 -07:00
ceph.spec.in man: add man page for cephfs 2010-12-02 16:18:38 -08:00
ChangeLog
configure.ac ceph v0.23.2 2010-12-01 10:03:26 -08:00
COPYING
INSTALL
Makefile.am
NEWS
publish.sh debian: sign/publish specific deb version 2010-10-18 13:29:21 -07:00
pull.sh
push.sh
README
RELEASE_CHECKLIST v0.22 2010-10-15 15:34:44 -07:00
release.sh debian: 0.22-4 2010-10-21 17:31:58 -07:00
sign.sh debian: sign/publish specific deb version 2010-10-18 13:29:21 -07:00
SubmittingPatches SubmittingPatches: initial version 2010-10-28 14:55:09 -07:00
update_pbuilder.sh

Ceph - a scalable distributed file system
-----------------------------------------

Please see http://ceph.newdream.net/ for current info.

----

To build the server daemons and the FUSE client:

$ ./autogen.sh
$ ./configure

$ make
 or
$ cd src
$ make

(Note that the FUSE client will only be built if libfuse is present.)

----

A quick summary of the binaries that will be built in src/:

daemons:
 cmon -- monitor daemon.  handles cluster state and configuration
         information.
 cosd -- storage daemon.  stores objects on a given block device.
 cmds -- metadata daemon.  handles file system namespace.
 ceph -- send management commands to the monitor cluster.

userland clients:
 cfuse -- fuse client.
 csyn -- synthetic workload generator client.

tools:
 monmaptool -- create/edit mon map
 osdmaptool -- create/edit osd map 
 crushtool -- create/edit crush map

scripts:
 mkcephfs -- cluster mkfs tool
 init-ceph -- init.d start/stop script