ceph/web/source.body
2007-12-19 13:28:33 -08:00

217 lines
11 KiB
Plaintext

<div class="mainsegment">
<h3>Getting Started</h3>
<div>
The Ceph source code is managed with Subversion. For information on accessing the repository, please refer to the <a href="http://sourceforge.net/docs/E09">SourceForge's Subversion documentation</a>.
<p>The Ceph project is always looking for more participants. If you are interested in using Ceph, or contributing to its development, please <a href="http://lists.sourceforge.net/mailman/listinfo/ceph-devel">join our mailing list</a> and <a href="mailto:ceph-devel@lists.sourceforge.net">drop us a line</a>.
<h4>Checking out</h4>
<div>
You can check out a working copy with
<pre>
svn co https://ceph.svn.sourceforge.net/svnroot/ceph/trunk/ceph
</pre>
</div>
<h4>Build Targets</h4>
<div>
There are a range of binary targets, mostly for ease of development and testing:
<ul>
<li><b>fakesyn</b> -- places all logical elements (MDS, client, etc.) in a single binary, with synchronous message delivery (for easy debugging!). Includes synthetic workload generation.</li>
<li><b>fakefuse</b> -- same as fakesyn, but mounts a single client via FUSE.</li>
<li><b>newsyn</b> -- starts up all logical elements using MPI. As with fakesyn, it includes synthetic workload generation.</li>
<li><b>cosd</b> -- standalone OSD</li>
<li><b>cmon</b> -- standalone monitor</li>
<li><b>cmds</b> -- standalone MDS</li>
<li><b>cfuse</b> -- standalone client, mountable via FUSE</li>
</ul>
For most development, fakesyn, fakefuse, and newsyn are sufficient.
</div>
<h4>Runtime Environment</h4>
<div>
Few quick steps to get things started. Note that these instructions assume either that you are running on one node, or have a shared directory (e.g. over NFS) mounted on each node.
<ol>
<li>Checkout, change into the <tt>ceph/</tt> directory, and build. E.g.,
<pre>
svn co https://ceph.svn.sourceforge.net/svnroot/ceph
cd ~/ceph/trunk/ceph
make mpi=no fuse=no
</pre>
(You can omit the mpi=no or fuse=no if you happen to have those installed.)
<li>Create a <tt>log/</tt> dir for various runtime stats.
<pre>
mkdir log
</pre>
<li>Identify the EBOFS block devices. This is accomplished with symlinks (or actual files) in the <tt>dev/</tt> directory. Devices can be identified by symlinks named after the hostname (e.g. <tt>osd.googoo-1</tt>), logical OSD number (e.g. <tt>osd4</tt>), or simply <tt>osd.all</tt> (in that order of preference). For example,
<pre>
mkdir dev
ln -s /dev/sda3 dev/osd.all # all nodes use /dev/sda3
ln -s /dev/sda4 dev/osd0 # except osd0, which should use /dev/sd4
</pre>
That is, when an osd starts up, it first looks for <tt>dev/osd$n</tt>, then <tt>dev/osd.all</tt>, in that order.
These need not be "real" devices--they can be regular files too. To get going with fakesyn, for example, or to test a whole "cluster" running on the same node,
<pre>
# create small "disks" for osd0-osd3
for f in 0 1 2 3; do # default is 4 OSDs
dd if=/dev/zero of=dev/osd$f bs=1048576 count=1024 # 1 GB each
done
</pre>
Note that if your home/working directory is mounted via NFS or similar, you'll want to symlink <tt>dev/</tt> to a directory on a local disk.
</div>
<h4>Running fakesyn -- everything one process</h4>
<div>
A quick example, assuming you've set up "fake" EBOFS devices as above:
<pre>
make fakesyn && ./fakesyn --mkfs --debug_ms 1 --debug_client 3 --syn rw 1 100000
# where those options mean:
# --mkfs # start with a fresh file system
# --debug_ms 1 # show message delivery
# --debug_client 3 # show limited client stuff
# --syn rw 1 100000 # write 1MB to a file in 100,000 byte chunks, then read it back
</pre>
One the synthetic workload finishes, the synthetic client unmounts, and the whole system shuts down.
The full set of command line arguments can be found in <tt>config.cc</tt>.
</div>
<h4>Starting up a full "cluster" on a single host</h4>
<div>
You can start up a the full cluster of daemons on a single host. Assuming you've created a set of individual files for each OSD's block device (the second option of #3 above), you can create a <tt>stop.sh</tt> script like
<pre>
#!/bin/sh
killall cosd cmds cmon
</pre>
and a <tt>start.sh</tt> script like
<pre>
#!/bin/sh
./stop.sh
./mkmonmap 1.2.3.4:12345 # your IP here; any unused port will do
./cmon --mkfs --mon 0 &
./cosd --mkfs --osd 0 &
./cosd --mkfs --osd 1 &
./cosd --mkfs --osd 2 &
./cosd --mkfs --osd 3 &
./cmds &
</pre>
Note that the IP you specify is for the monitor. This is the only fixed and static ip:port in the system. The rest of the cluster daemons bind to a random port and register themselves with the monitor.
</div>
<h4>Mounting with FUSE</h4>
<div>
The easiest route is <tt>fakefuse</tt>:
<pre>
modprobe fuse # make sure fuse module is loaded
mkdir mnt # or whereever you want your mount point
make fakefuse && ./fakefuse --mkfs --debug_ms 1 mnt
</pre>
You should be able to ls, copy files, or whatever else (in another terminal; fakefuse will stay in the foreground). Control-C will kill fuse and cause an orderly shutdown. Alternatively, <tt>fusermount -u mnt</tt> will unmount. If fakefuse crashes or hangs, you may need to <tt>kill -9 fakefuse</tt> and/or <tt>fusermount -u mnt</tt> to clean up. Overall, FUSE is pretty well-behaved.
If you have the cluster daemon's already running (as above), you can mount via the standalone fuse client:
<pre>
modprobe fuse
mkdir mnt
make cfuse && ./cfuse mnt
</pre>
</div>
<h4>Running the kernel client in a UML instance</h4>
<div>
Any recent mainline kernel will do here.
<pre>
$ cd linux
$ patch -p1 < ~/ceph/trunk/ceph/kernel/kconfig.patch
patching file fs/Kconfig
patching file fs/Makefile
$ cp ~/ceph/trunk/ceph/kernel/sample.uml.config .config
$ ln -s ~/ceph/trunk/ceph/kernel fs/ceph
$ ln -s ~/ceph/trunk/ceph/include/ceph_fs.h include/linux
$ make ARCH=um
</pre>
I am using <a href="http://uml.nagafix.co.uk/Debian-3.1/Debian-3.1-AMD64-root_fs.bz2">this x86_64 Debian UML root fs image</a>, but any image will do (see <a href="http://user-mode-linux.sf.net">http://user-mode-linux.sf.net</a>) as long as the architecture (e.g. x86_64 vs i386) matches your host. Start up the UML instance with something like
<pre>
./linux ubda=Debian-3.1-AMD64-root_fs mem=256M eth0=tuntap,,,1.2.3.4 # 1.2.3.4 is the _host_ ip
</pre>
Note that if UML crashes/oopses/whatever, you can restart quick-and-dirty (up arrow + enter) with
<pre>
reset ; killall -9 linux ; ./linux ubda=Debian-3.1-AMD64-root_fs mem=256M eth0=tuntap,,,1.2.3.4
</pre>
You'll need to configure the network in UML with an unused IP. For my debian-based root fs image, this <tt>/etc/network/interfaces</tt> file does the trick:
<pre>
iface eth0 inet static
address 1.2.3.5 # unused ip in your host's netowrk
netmask 255.0.0.0
gateway 1.2.3.4 # host ip
auto eth0
</pre>
Note that you need install uml-utilities (<tt>apt-get install uml-utilities</tt> on debian distros) and add yourself to the <tt>uml-net</tt> group on the host (or run the UML instance as root) for the network to start up properly.
<p>
Inside UML, you'll want an <tt>/etc/fstab</tt> line like
<pre>
none /host hostfs defaults 0 0
</pre>
You can then load the kernel client module and mount from the UML instance with
<pre>
insmod /host/path/to/ceph/trunk/ceph/kernel/ceph.ko
mount -t ceph 1.2.3.4:/ -o monport=12345,ip=1.2.3.5 mnt # 1.2.3.4 is host, 1.2.3.5 is uml ip
</pre>
</div>
<h4>Running on multiple nodes</h4>
<div>
If you're ready to start things up on multiple nodes (or even just multiple processes on the same node), <tt>newsyn</tt> is the easiest way to get things launched. It uses MPI to start up all the processes. Assuming you have MPICH2 (or similar) installed,
<pre>
mpd & # for a single host
mpiboot -n 10 # for multiple hosts (see MPICH docs)
make newsyn && mpiexec -l -n 10 ./newsyn --mkfs --nummds 1 --numosd 6 --numclient 20 --syn writefile 100 16384
</pre>
You will probably want to make <tt>dev/osd.all</tt> a symlink to some block device that exists on every node you're starting an OSD on. Otherwise, you'll need a symlink (for "block device" file) for each osd.
If you want to mount a distributed FS (instead of generating a synthetic workload), try
<pre>
make newsyn && mpiexec -l -n 10 ./newsyn --mkfs --nummds 2 --numosd 6 --numclient 0 # 0 clients, just mds and osds
# in another terminal,
mkdir mnt
make cfuse && ./cfuse mnt
# and in yet another terminal,
ls mnt
touch mnt/asdf # etc
</pre>
Currently, when the last client (<tt>cfuse</tt> instance, in this case) shuts down, the whole thing will shut down. Assuming things shut down cleanly, you should be able to start things up again without the <tt>--mkfs</tt> flag and recover the prior file system state.
</div>
<h4>Structure</h4>
<div>
Here's a crude table diagram that shows how the major (user space) pieces fit together. Ingore the MDS bits; that's mostly wrong.
<table border=0>
<tr> <td></td> <td>Application</td> </tr>
<tr> <td></td> <td class=kernel>kernel</td> </tr>
<tr> <td>Application</td> <td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/client/fuse.cc?view=markup">FUSE glue</a></td> </tr>
<tr> <td class=entity colspan=2><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/client/Client.h?view=markup">Client</a></td> <td class=net width=50></td><td class=entity colspan=2><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/mds/MDS.h?view=markup">MDS</a></td> </tr>
<tr> <td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osdc/Filer.h?view=markup">Filer</a></td> <td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osdc/ObjectCacher.h?view=markup">ObjectCacher</a></td> <td></td><td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/mds/MDLog.h?view=markup">MDLog</td><td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/mds/MDStore.h?view=markup">MDStore</a></td> </tr>
<tr> <td></td> <td class=lib colspan=4><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osdc/Objecter.h?view=markup">Objecter</a></td> </tr>
<tr> <td colspan=2></td> <td class=net colspan=2>(message layer)</td> </tr>
<tr> <td colspan=2></td> <td class=entity colspan=2><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osd/OSD.h?view=markup">OSD</a></td> </tr>
<tr> <td colspan=2></td> <td class=abstract colspan=2><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osd/ObjectStore.h?view=markup">ObjectStore</a></td> </tr>
<tr> <td colspan=2></td> <td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/ebofs/Ebofs.h?view=markup">EBOFS</a></td> <td rowspan=2 class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/osd/FakeStore.h?view=markup">FakeStore</a></td> </tr>
<tr> <td colspan=2></td> <td class=lib><a href="http://svn.sourceforge.net/viewvc/ceph/trunk/ceph/ebofs/BlockDevice.h?view=markup">BlockDevice</a></td> </tr>
<tr> <td colspan=2></td> <td class=kernel colspan=2>Kernel POSIX interface</td> </tr>
<tr> <td height=30></td> </tr>
<tr> <td>Key:</td> <td class=net>Network</td> <td class=entity>Entity</td> <td class=lib>Lib/module</td> <td class=abstract>Abstract interface</td> <td class=kernel>Kernel</td> </tr>
</table>
</div>
</div>
</div>