

<div class="mainsegment">
	<h3>Current Roadmap</h3>
	<div>
		Here is a brief summary of what we're currently working on and where we expect Ceph to be in the foreseeable future.  These dates are rough estimates and depend heavily on how much interest Ceph generates in the larger community.

		<ul>
		<li><b>Q2 2007</b> -- MDS recovery for multi-MDS configurations.
		<li><b>Q3 2007</b> -- Distributed monitors.
		<li><b>Q4 2007</b> -- Snapshots.
		</ul>

	</div>

	<h3>Tasks</h3>
	<div>
		Although Ceph is currently a working prototype that demonstrates the key features of the architecture, a variety of features need to be implemented in order to make Ceph a stable file system that can be used in production environments.  Some of these tasks are outlined below.  If you are a kernel or file system developer and are interested in contributing to Ceph, please join the email list and <a href="mailto:ceph-devel@lists.sourceforge.net">drop us a line</a>.

		<p>

		<h4>Snapshots</h4>
		<div>
			The distributed object storage fabric (RADOS) includes a simple mechanism for versioning objects and for performing copy-on-write when old objects are updated.  To use this mechanism for flexible snapshots, the MDS needs to be extended to manage versioned directory entries and maintain some additional directory links.  For more information, see <a href="http://www.soe.ucsc.edu/~sage/papers/290s.osdsnapshots.pdf">this tech report</a>.
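			<p>As a rough illustration of copy-on-write versioning, here is a minimal sketch in Python.  It is not the RADOS implementation: the class <code>VersionedObject</code>, the integer snapshot ids, and the rule that the first write after a new snapshot preserves the old contents are all assumptions made for the example.</p>

```python
class VersionedObject:
    """Toy model of an object with copy-on-write clones (hypothetical, not RADOS)."""

    def __init__(self, data=b""):
        self.head = bytes(data)  # current, writable contents
        self.clones = {}         # snapshot id -> contents preserved for that snapshot
        self.newest_snap = 0     # newest snapshot id that already has a clone

    def write(self, data, current_snap):
        # Copy-on-write: the first write after a new snapshot keeps the old contents.
        if current_snap > self.newest_snap:
            self.clones[current_snap] = self.head
            self.newest_snap = current_snap
        self.head = bytes(data)

    def read(self, snap_id=None):
        if snap_id is None:
            return self.head
        # The oldest clone with id >= snap_id holds the contents seen by that snapshot.
        candidates = [s for s in self.clones if s >= snap_id]
        return self.clones[min(candidates)] if candidates else self.head


obj = VersionedObject(b"v1")
obj.write(b"v2", current_snap=0)  # no snapshot exists yet, nothing is preserved
obj.write(b"v3", current_snap=1)  # snapshot 1 was taken: b"v2" is cloned first
obj.write(b"v4", current_snap=3)  # snapshots 2 and 3 share one clone of b"v3"
```

			<p>An object that is never written after a snapshot needs no clone at all, which is what makes this kind of scheme cheap for mostly static data.</p>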
		</div>

		<h4>Content-addressable Storage</h4>
		<div>
			The underlying problems of reliable, scalable and distributed object storage are solved by the RADOS object storage system.  This mechanism can be leveraged to implement a content-addressable storage system (i.e. one that stores duplicated data only once) by selecting a suitable chunking strategy and naming objects by the hash of their contents.  Ideally, we'd like to incorporate this into the overall Ceph file system, so that different parts of the file system can be selectively stored normally or by their hash.  The system could also (perhaps lazily) detect duplicated data when it is written and adjust the underlying storage strategy accordingly in order to optimize for space efficiency or performance.
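			<p>The idea of naming objects by the hash of their contents can be sketched as follows.  This is only an illustration under stated assumptions: fixed-size 4 KB chunks, SHA-256 as the hash, and an in-memory dictionary standing in for the object store.  The names <code>ContentStore</code>, <code>put</code>, and <code>get</code> are hypothetical and not part of Ceph.</p>

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed-size chunking; other strategies are possible


class ContentStore:
    """Toy content-addressable store: each chunk is named by its SHA-256 digest."""

    def __init__(self):
        self.objects = {}  # hex digest -> chunk bytes (stand-in for the object store)

    def put(self, data):
        """Store data and return the ordered list of chunk names (the 'recipe')."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            name = hashlib.sha256(chunk).hexdigest()
            self.objects.setdefault(name, chunk)  # duplicate chunks are stored once
            recipe.append(name)
        return recipe

    def get(self, recipe):
        """Reassemble the original data from its chunk names."""
        return b"".join(self.objects[name] for name in recipe)


store = ContentStore()
data = b"a" * 8192 + b"b" * 4096  # the first two chunks are identical
recipe = store.put(data)
```

			<p>Here the three-chunk file occupies only two stored objects, because the two identical chunks hash to the same name.</p>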
		</div>

		<h4>EBOFS</h4>
		<div>
			Each Ceph OSD (storage node) runs a custom object "file system" called EBOFS to store objects on locally attached disks.  Although the current implementation of EBOFS is fully functional and already demonstrates promising performance (outperforming ext2/3, XFS, and ReiserFS under the workloads we anticipate), a range of improvements will be needed before it is ready for prime-time.  These include:
			<ul>
			<li><b>Intelligent use of NVRAM for metadata and/or data journaling.</b>  A subset of write requests needs to be synchronously committed to stable storage in order to ensure both good performance and strong file system integrity guarantees.  Even a small amount of NVRAM (whether it is flash, MRAM, battery-backed DRAM, etc.) will enable very low latency for small writes while maintaining data safety.
			<li><b>RAID-aware allocation.</b>  Although we conceptually think of each OSD as a disk with an attached CPU, memory, and network interface, it is more likely that the actual OSDs deployed in production systems will be small- to medium-sized storage servers: a standard server with a locally attached array of SAS or SATA disks.  To take proper advantage of the parallelism inherent in the use of multiple disks, and to reap the performance and reliability rewards, the EBOFS allocator and disk scheduling algorithms have to be aware of the underlying structure of the array (be it RAID0, 1, 5, 10, etc.).
			</ul>
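			<p>To make the RAID-aware allocation item more concrete, here is a small sketch of one thing an allocator could do: round an extent out to full-stripe boundaries, so that a parity array can write whole stripes instead of doing read-modify-write on parity.  This is not how EBOFS works today; the function names and the example geometry (five disks, one parity disk, 64 KB chunks) are assumptions chosen for illustration.</p>

```python
def full_stripe_bytes(chunk_bytes, total_disks, parity_disks):
    """Bytes of user data in one full stripe (parity chunks hold no user data)."""
    return chunk_bytes * (total_disks - parity_disks)


def align_extent(offset, length, stripe):
    """Expand [offset, offset + length) outward to full-stripe boundaries."""
    start = (offset // stripe) * stripe
    end = -(-(offset + length) // stripe) * stripe  # ceiling division
    return start, end - start


# Hypothetical RAID5 array: 5 disks, 1 parity chunk per stripe, 64 KB chunks.
stripe = full_stripe_bytes(64 * 1024, total_disks=5, parity_disks=1)
aligned = align_extent(offset=300000, length=100000, stripe=stripe)
```

			<p>With this geometry a stripe carries 256 KB of data, and the 100000-byte request starting at byte 300000 is widened to the single stripe that contains it.</p>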
		</div>


		<h4>Native kernel client</h4>
		<div>
			The prototype Ceph client is implemented as a user-space library.  Although it can be mounted under Linux via the <a href="http://fuse.sourceforge.net/">FUSE (file system in userspace)</a> library, this incurs a significant performance penalty and limits Ceph's ability to provide strong POSIX semantics and consistency.  A native Linux kernel implementation of the client is needed in order to properly take advantage of the performance and consistency features of Ceph.  Because the client's interaction with the Ceph MDS and OSDs is more complicated than that of existing network file systems like NFS, this is a non-trivial endeavor.  We are actively looking for experienced kernel programmers to help us out!
		</div>

		<h4>CRUSH tools</h4>
		<div>
			Ceph utilizes a novel data distribution function called CRUSH to distribute data (in the form of objects) to storage nodes (OSDs).  CRUSH is designed to generate a balanced distribution while allowing the storage cluster to be dynamically expanded or contracted, and to separate object replicas across failure domains to enhance data safety.  There is a certain amount of finesse involved in properly managing the OSD hierarchy from which CRUSH generates its distribution in order to minimize the amount of data migration that results from changes.  An administrator tool would be useful for helping to manage the CRUSH mapping function in order to best exploit the available storage and network infrastructure.  For more information, please refer to the technical paper describing <a href="publications.html">CRUSH</a>.
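			<p>The two goals described above (replicas in separate failure domains, little data movement when the cluster changes) can be illustrated with a much simpler technique than CRUSH.  The sketch below uses rendezvous (highest-random-weight) hashing over a two-level hierarchy of racks and OSDs.  It is explicitly not the CRUSH algorithm, and the names <code>score</code>, <code>place</code>, and the rack and OSD labels are hypothetical.  It also assumes every rack contains at least one OSD.</p>

```python
import hashlib


def score(obj, node):
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj}/{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj, cluster, replicas):
    """Pick one OSD from each of the `replicas` highest-scoring racks.

    cluster maps a rack name (the failure domain) to a non-empty list of OSD names.
    """
    racks = sorted(cluster, key=lambda r: score(obj, r), reverse=True)[:replicas]
    return [max(cluster[r], key=lambda o: score(obj, o)) for r in racks]


cluster = {
    "rackA": ["osd0", "osd1"],
    "rackB": ["osd2", "osd3"],
    "rackC": ["osd4", "osd5"],
}
before = {f"obj{i}": place(f"obj{i}", cluster, replicas=2) for i in range(100)}

# Expand the cluster: add one OSD to rackA and recompute every placement.
grown = dict(cluster, rackA=cluster["rackA"] + ["osd6"])
after = {name: place(name, grown, replicas=2) for name in before}
moved = [name for name in before if before[name] != after[name]]
```

			<p>Every object listed in <code>moved</code> changed only because it now maps to the new OSD; no data shuffles between the existing OSDs.  Deciding how to arrange and reweight the hierarchy so that this stays true in a real cluster is the kind of job an administrator tool would help with.</p>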
		</div>

		<p>The Ceph project is always looking for more participants.  If any of these projects sounds interesting to you, please <a href="http://lists.sourceforge.net/mailman/listinfo/ceph-devel">join our mailing list</a> and <a href="mailto:ceph-devel@lists.sourceforge.net">drop us a line</a>.
	</div>
</div>


<b>Please feel free to <a href="mailto:sage@newdream.net">contact us</a> with any questions or comments.</b>