ceph/web/source.body


<div class="mainsegment">
	<h3>Getting Started</h3>
	<div>
		The Ceph source code is managed with Git.  For a Git crash course, there is a <a href="http://www.kernel.org/pub/software/scm/git/docs/tutorial.html">tutorial</a> and more from the <a href="http://git.or.cz/#documentation">official Git site</a>.  Here is a quick <a href="http://git.or.cz/course/svn.html">crash course for Subversion users</a>.

		<p>The Ceph project is always looking for more participants. If you are interested in using Ceph, or contributing to its development, please <a href="http://lists.sourceforge.net/mailman/listinfo/ceph-devel">join our mailing list</a> and <a href="mailto:ceph-devel@lists.sourceforge.net">drop us a line</a>.

		<h4>Checking out the source</h4>
		<div>
			You can check out a working copy (actually, clone the repository) with
<pre>
git clone git://ceph.newdream.net/ceph.git
</pre>
or
<pre>
git clone http://ceph.newdream.net/git/ceph.git
</pre>
			To pull the latest,
<pre>
git pull
</pre>
			You can browse the git repo at <a href="http://ceph.newdream.net/git">http://ceph.newdream.net/git</a>.
		</div>

		<h4>Build Targets</h4>
		<div>
			There are a range of binary targets, mostly for ease of development and testing:
			<ul>
			<li><b>cmon</b> -- monitor</li>
			<li><b>cosd</b> -- OSD storage daemon</li>
			<li><b>cmds</b> -- MDS metadata server</li>
			<li><b>cfuse</b> -- client, mountable via FUSE</li>
			<li><b>csyn</b> -- client sythetic workload generator</li>
			<li><b>cmonctl</b> -- control tool</li>
			<p>
			<li><b>fakesyn</b> -- places all logical elements (MDS, client, etc.) in a single binary, with synchronous message delivery (for easy debugging!).  Includes synthetic workload generation.</li>
			<li><b>fakefuse</b> -- same as fakesyn, but mounts a single client via FUSE.</li>
			</ul>
		</div>

		<h4>Runtime Environment</h4>
		<div>
			Few quick steps to get things started.  Note that these instructions assume either that you are running on one node, or have a shared directory (e.g. over NFS) mounted on each node.

			<ol>
			<li>Checkout, change into the <tt>ceph/src</tt> directory, and build.  E.g.,
<pre>
git clone git://ceph.newdream.net/ceph.git
cd ceph
./autogen.sh
./configure   # of CXXFLAGS="-g" ./configure to disable optimizations (for debugging)
cd src
make
</pre>

			<li>Create a <tt>log/</tt> dir for various runtime stats.
<pre>
mkdir log
</pre>
			<li>Identify the EBOFS block devices. This is accomplished with symlinks (or actual files) in the <tt>dev/</tt> directory.  Devices can be identified by symlinks named after the hostname (e.g. <tt>osd.googoo-1</tt>), logical OSD number (e.g. <tt>osd4</tt>), or simply <tt>osd.all</tt> (in that order of preference).  For example,
<pre>
mkdir dev
ln -s /dev/sda3 dev/osd.all   # all nodes use /dev/sda3
ln -s /dev/sda4 dev/osd0      # except osd0, which should use /dev/sd4
</pre>
				That is, when an osd starts up, it first looks for <tt>dev/osd$n</tt>, then <tt>dev/osd.all</tt>, in that order.

				These need not be "real" devices--they can be regular files too.  To get going with fakesyn, for example, or to test a whole "cluster" running on the same node,
<pre>
# create small "disks" for osd0-osd3
for f in 0 1 2 3; do                                 # default is 4 OSDs
dd if=/dev/zero of=dev/osd$f bs=1048576 count=1024   # 1 GB each
done
</pre>
				Note that if your home/working directory is mounted via NFS or similar, you'll want to symlink <tt>dev/</tt> to a directory on a local disk.
		</div>


		<h4>Starting up a full "cluster" on a single host</h4>
		<div>
			You can start up a the full cluster of daemons on a single host.  Assuming you've created a set of individual files for each OSD's block device (the second option of #3 above), there is a <tt>start.sh</tt> and <tt>stop.sh</tt> script that will start up on port 12345.
<p>
One caveat here is that the ceph daemons need to know what IP they are reachable at; they determine that by doing a lookup on the machine's hostname.  Since many/most systems map the hostname to 127.0.0.1 in <tt>/etc/hosts</tt>, you either need to change that (the easiest approach, usually) or add a <tt>--bind 1.2.3.4</tt> argument to cmon/cosd/cmds to help them out.
<p>
Note that the monitor has the only fixed and static ip:port in the system.  The rest of the cluster daemons bind to a random port and register themselves with the monitor.
		</div>

		<h4>Mounting with FUSE</h4>
		<div>
			The easiest route is <tt>fakefuse</tt>:
<pre>
modprobe fuse  # make sure fuse module is loaded
mkdir mnt      # or whereever you want your mount point
make fakefuse && ./fakefuse --mkfs --debug_ms 1 mnt
</pre>
			You should be able to ls, copy files, or whatever else (in another terminal; fakefuse will stay in the foreground).  Control-C will kill fuse and cause an orderly shutdown.  Alternatively, <tt>fusermount -u mnt</tt> will unmount.  If fakefuse crashes or hangs, you may need to <tt>kill -9 fakefuse</tt> and/or <tt>fusermount -u mnt</tt> to clean up.  Overall, FUSE is pretty well-behaved.

If you have the cluster daemon's already running (as above), you can mount via the standalone fuse client:
<pre>
modprobe fuse
mkdir mnt
make cfuse && ./cfuse mnt
</pre>
		</div>

		<h4>Running the kernel client in a UML instance</h4>
		<div>
			Any recent mainline kernel will do here.
<pre>
$ cd linux
$ patch -p1 < ~/ceph/src/kernel/kconfig.patch
patching file fs/Kconfig
patching file fs/Makefile
$ cp ~/ceph/src/kernel/sample.uml.config .config
$ ln -s ~/ceph/src/kernel fs/ceph
$ ln -s ~/ceph/src/include/ceph_fs.h include/linux
$ make ARCH=um
</pre>
			I am using <a href="http://uml.nagafix.co.uk/Debian-3.1/Debian-3.1-AMD64-root_fs.bz2">this x86_64 Debian UML root fs image</a>, but any image will do (see <a href="http://user-mode-linux.sf.net">http://user-mode-linux.sf.net</a>) as long as the architecture (e.g. x86_64 vs i386) matches your host.  Start up the UML guest instance with something like
<pre>
./linux ubda=Debian-3.1-AMD64-root_fs mem=256M eth0=tuntap,,,1.2.3.4  # 1.2.3.4 is the _host_ ip
</pre>
			Note that if UML crashes/oopses/whatever, you can restart quick-and-dirty (up arrow + enter) with
<pre>
reset ; killall -9 linux ; ./linux ubda=Debian-3.1-AMD64-root_fs mem=256M eth0=tuntap,,,1.2.3.4
</pre>
			You'll need to configure the network in UML with an unused IP.  For my debian-based root fs image, this <tt>/etc/network/interfaces</tt> file does the trick:
<pre>
iface eth0 inet static
        address 1.2.3.5     # unused ip in your host's netowrk for the uml guest
        netmask 255.0.0.0
        gateway 1.2.3.4     # host ip
auto eth0
</pre>
			Note that you need install uml-utilities (<tt>apt-get install uml-utilities</tt> on debian distros) and add yourself to the <tt>uml-net</tt> group on the host (or run the UML instance as root) for the network to start up properly.
			<p>
			Inside UML, you'll want an <tt>/etc/fstab</tt> line like
<pre>
none            /host           hostfs  defaults        0       0
</pre>
			You can then load the kernel client module and mount from the UML instance with
<pre>
insmod /host/path/to/ceph/src/kernel/ceph.ko
mount -t ceph 1.2.3.4:/ mnt  # 1.2.3.4 is host
</pre>

		</div>

		<h4>Running fakesyn -- everything one process</h4>
		<div>
			A quick example, assuming you've set up "fake" EBOFS devices as above:
<pre>
make fakesyn && ./fakesyn --mkfs --debug_ms 1 --debug_client 3 --syn rw 1 100000
# where those options mean:
#	--mkfs               # start with a fresh file system
#	--debug_ms 1         # show message delivery
#	--debug_client 3     # show limited client stuff
#	--syn rw 1 100000    # write 1MB to a file in 100,000 byte chunks, then read it back
</pre>
			One the synthetic workload finishes, the synthetic client unmounts, and the whole system shuts down.

			The full set of command line arguments can be found in <tt>config.cc</tt>.
		</div>

	</div>
</div>