Ceph is a distributed network file system designed to provide excellent performance, reliability, and scalability. Ceph fills two significant gaps in the array of currently available file systems:
<ol>
<li><b>Petabyte-scale storage</b> -- Ceph is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph gracefully handles workloads in which tens thousands of clients or more simultaneously access the same file, or write to the same directory--usage scenarios that bring existing enterprise storage systems to their knees.
<li><b>Robust, open-source distributed storage</b> -- Ceph is released under the terms of the LGPL, which means it is free software (as in speech). Ceph will provide a variety of key features that are sorely lacking from existing open-source file systems, including snapshots, seamless scalability (the ability to simply add disks to expand volumes), and intelligent load balancing.
Here are some of the key features that make Ceph different from existing file systems that you may have worked with:
<ol>
<li><b>Seamless scaling</b> -- A Ceph filesystem can be seamlessly expanded by simply adding storage nodes (OSDs). However, unlike most existing file systems, Ceph proactively migrates data onto new devices in order to maintain a balanced distribution of data that effectively utilizes all available resources (disk bandwidth and spindles) and avoids data hot spots (e.g., active data residing primarly on old disks while newer disks sit empty and idle).
<li><b>Strong reliability and fast recovery</b> -- All data in Ceph is replicated across multiple OSDs. If any OSD fails, data is automatically re-replicated to other devices. However, unlike typical RAID systems, the replicas for data on each disk are spread out among a large number of other disks, and when a disk fails, the replacement replicas are also distributed across many disks. This allows recovery to proceed in parallel (with dozens of disks copying to dozens of other disks), removing the need for explicit "spare" disks (which are effectively wasted until they are needed) and preventing a single disk from becoming a "RAID rebuild" bottleneck.
<li><b>Adaptive MDS</b> -- The Ceph metadata server (MDS) is designed to dynamically adapt its behavior to the current workload. If thousands of clients suddenly access a single file or directory, that metadata is dynamically replicated across multiple servers to distribute the workload. Similarly, as the size and popularity of the file system hierarchy changes over time, that hierarchy is dynamically redistributed among available metadata servers in order to balance load and most effectively use server resources. (In contrast, current file systems force system administrators to carve their data set into static "volumes" and assign volumes to servers. Volume sizes and workloads inevitably shift over time, forcing administrators to constantly shuffle data between servers or manually allocate new resources where they are currently needed.)
</ol>
For more information about the underlying architecture of Ceph, please see the <a href="overview.html">Overview</a>. This project is based on a substantial body of research conducted by the <a href="http://ssrc.cse.ucsc.edu/proj/ceph.html">Storage Systems Research Center</a> at the University of California, Santa Cruz over the past few years that has resulted in a number of <a href="publications.html">publications</a>.
<!--
Unlike conventional network file systems (exemplified by NFS), Ceph maximizes the separation of metadata management from data storage by utilizing intelligent storage nodes called Object Storage Devices (OSDs). Each OSD combines a CPU, memory, and network interface (essentially, a small, lightweight server) with a hard disk or RAID. This basic architecture allows low-level block allocation, data replication, failure detection, and failure recovery to be distributed among potentially thousands of OSDs. This approach allows clients (hosts or applications accessing the file system) to communicate directly with OSDs storing the data they need, while avoiding the bottleneck inherent in NFS-like systems with a single point of access.
<p>File system metadata (directory namespace) is managed by a separate, small cluster of metadata servers (MDSs), which is responsible for coordinating access to the file system. The Ceph MDS uses a load balancing architecture that dynamically distributes responsibility for management the file system hierarchy across dozens (or even hundreds) of MDS servers, allowing Ceph to scale significantly better than other distributed file systems.
<p>Ceph is currently in the prototype stage, and is under very active development. The file system is mountable and more or less usable, but a variety of subsystems are not yet fully functional (most notably including MDS failure recovery, reliable failure monitoring, and flexible snapshots).
<p>The Ceph project is actively seeking participants. If you are interested in using Ceph, or contributing to its development, please <a href="http://lists.sourceforge.net/mailman/listinfo/ceph-devel">join our mailing list</a> and <a href="mailto:ceph-devel@lists.sourceforge.net">drop us a line</a>.
A paper describing the Ceph filesystem will be presented at <a href="http://www.usenix.org/events/osdi06/">OSDI '06</a> (7th USENIX Symposium on Operating Systems Design and Implementation) in Seattle on November 8th. The following week a paper describing CRUSH (the special-purpose mapping function used to distribute data) will be presented at <a href="http://sc06.supercomputing.org">SC'06</a>, the International Conference for High Performance Computing in Tampa on November 16th. We hope to see you there!
After a few too many months of summer distractions, I've finally moved the Ceph CVS code base over from UCSC to Subversion on SourceForge, and created a Ceph home page. This is largely in preparation for upcoming paper publications which will hopefully increase Ceph's exposure and attract some interest to the project. Yay!
</div>
</div>
<b>Please feel free to <a href="mailto:sage@newdream.net">contact us</a> with any questions or comments.</b>