diff --git a/trunk/web/index.body b/trunk/web/index.body
index d2ec4e5a72e..2027ff5e655 100644
--- a/trunk/web/index.body
+++ b/trunk/web/index.body
@@ -12,8 +12,8 @@ We are actively seeking experienced C/C++ and Linux kernel developers who are in
Ceph is a distributed network file system designed to provide excellent performance, reliability, and scalability. Ceph fills two significant gaps in the array of currently available file systems:
    -
  1. Petabyte-scale storage -- Ceph is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph is designed to gracefully handle workloads in which tens of thousands of clients or more simultaneously access the same file, or write to the same directory--usage scenarios that bring typical enterprise storage systems to their knees. -
  2. Robust, open-source distributed storage -- Ceph is released under the terms of the LGPL, which means it is free software (as in speech). Ceph will provide a variety of key features that are sorely lacking from existing open-source file systems, including seamless scalability (the ability to simply add disks to expand volumes), intelligent load balancing, content-addressable storage (CAS), and snapshot functionality. +
  1. Robust, open-source distributed storage -- Ceph is released under the terms of the LGPL, which means it is free software (as in speech and beer). Ceph will provide a variety of key features that are generally lacking from existing open-source file systems, including seamless scalability (the ability to simply add disks to expand volumes), intelligent load balancing, and efficient, easy-to-use snapshot functionality. +
  2. Scalability -- Ceph is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file, or write to the same directory--usage scenarios that bring typical enterprise storage systems to their knees.
Here are some of the key features that make Ceph different from existing file systems that you may have worked with:
@@ -21,7 +21,7 @@ We are actively seeking experienced C/C++ and Linux kernel developers who are in
  1. Seamless scaling -- A Ceph filesystem can be seamlessly expanded by simply adding storage nodes (OSDs). However, unlike most existing file systems, Ceph proactively migrates data onto new devices in order to maintain a balanced distribution of data. This effectively utilizes all available resources (disk bandwidth and spindles) and avoids data hot spots (e.g., active data residing primarily on old disks while newer disks sit empty and idle).
  2. Strong reliability and fast recovery -- All data in Ceph is replicated across multiple OSDs. If any OSD fails, data is automatically re-replicated to other devices. However, unlike typical RAID systems, the replicas for data on each disk are spread out among a large number of other disks, and when a disk fails, the replacement replicas are also distributed across many disks. This allows recovery to proceed in parallel (with dozens of disks copying to dozens of other disks), removing the need for explicit "spare" disks (which are effectively wasted until they are needed) and preventing a single disk from becoming a "RAID rebuild" bottleneck. -
  3. Adaptive MDS -- The Ceph metadata server (MDS) is designed to dynamically adapt its behavior to the current workload. If thousands of clients suddenly access a single file or directory, that metadata is dynamically replicated across multiple servers to distribute the workload. Similarly, as the size and popularity of the file system hierarchy change over time, that hierarchy is dynamically redistributed among available metadata servers in order to balance load and most effectively use server resources. (In contrast, current file systems force system administrators to carve their data set into static "volumes" and assign volumes to servers. Volume sizes and workloads inevitably shift over time, forcing administrators to constantly shuffle data between servers or manually allocate new resources where they are currently needed.) +
  3. Adaptive MDS -- The Ceph metadata server (MDS) is designed to dynamically adapt its behavior to the current workload. As the size and popularity of the file system hierarchy change over time, that hierarchy is dynamically redistributed among available metadata servers in order to balance load and most effectively use server resources. (In contrast, current file systems force system administrators to carve their data set into static "volumes" and assign volumes to servers. Volume sizes and workloads inevitably shift over time, forcing administrators to constantly shuffle data between servers or manually allocate new resources where they are currently needed.) Similarly, if thousands of clients suddenly access a single file or directory, that metadata is dynamically replicated across multiple servers to distribute the workload.
For more information about the underlying architecture of Ceph, please see the Overview. This project is based on a substantial body of research conducted by the Storage Systems Research Center at the University of California, Santa Cruz over the past few years that has resulted in a number of publications.
diff --git a/trunk/web/overview.body b/trunk/web/overview.body
index 9bc31fa616b..60339495333 100644
--- a/trunk/web/overview.body
+++ b/trunk/web/overview.body
@@ -21,6 +21,8 @@ Ceph fills this gap by providing a scalable, reliable file system that can seaml

+ A thorough overview of the system architecture can be found in this paper that appeared at OSDI '06. +

A Ceph installation consists of three main elements: clients, metadata servers (MDSs), and object storage devices (OSDs). Ceph clients can either be individual processes linking directly to a user-space client library, or a host mounting the Ceph file system natively (à la NFS). OSDs are servers with attached disks and are responsible for storing data.

The Ceph architecture is based on three key design principles that set it apart from traditional file systems.
@@ -29,7 +31,7 @@ Ceph fills this gap by providing a scalable, reliable file system that can seaml

  • Separation of metadata and data management.
    A small set of metadata servers (MDSs) manage the file system hierarchy (namespace). Clients communicate with an MDS to open/close files, get directory listings, remove files, or perform any other operation that involves file names. Once a file is opened, clients communicate directly with OSDs (object-storage devices) to read and write data. A large Ceph system may involve anywhere from one to many dozens (or possibly hundreds) of MDSs, and anywhere from four to hundreds or thousands of OSDs.

    - Both file data and file system metadata are striped over multiple objects, each of which is replicated on multiple OSDs for reliability. A special-purpose mapping function called CRUSH is used to determine which OSDs store which objects. CRUSH resembles a hash function in that this mapping is pseudo-random (it appears random, but is actually deterministic). This provides load balancing across all devices that is relatively invulnerable to "hot spots," while Ceph's policy of redistributing data ensures that workload remains balanced and all devices are equally utilized even when the storage cluster is expanded or OSDs are removed.
    + Both file data and file system metadata are striped over multiple objects, each of which is replicated on multiple OSDs for reliability. A special-purpose mapping function called CRUSH is used to determine which OSDs store which objects. CRUSH resembles a hash function in that this mapping is pseudo-random (it appears random, but is actually deterministic). This provides load balancing across all devices that is relatively invulnerable to "hot spots," while Ceph's policy of redistributing data ensures that workload remains balanced and all devices are equally utilized even when the storage cluster is expanded or OSDs are removed.
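
The idea of a deterministic, pseudo-random mapping from objects to OSDs can be illustrated with a small sketch. This is not CRUSH: it uses rendezvous (highest-random-weight) hashing as a simpler stand-in, and the names `place`, `osds`, and `objects`, the replica count of 2, and the cluster size are all assumptions made for illustration. It only shows the general properties the text describes: any client can compute a placement without a lookup table, load spreads roughly evenly, and adding an OSD relocates only a fraction of the objects.

```python
import hashlib
from collections import Counter


def place(object_name, osds, replicas=2):
    """Pick `replicas` distinct OSDs for an object via rendezvous hashing.

    Illustrative stand-in only; this is NOT the CRUSH algorithm.
    """
    def score(osd):
        # Hash the (object, OSD) pair; the result looks random but is
        # fully determined by its inputs.
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The OSDs with the highest scores hold the replicas.
    return sorted(osds, key=score, reverse=True)[:replicas]


# Hypothetical cluster of 8 OSDs holding 10,000 objects.
osds = [f"osd{i}" for i in range(8)]
objects = [f"obj{i}" for i in range(10000)]

before = {name: place(name, osds) for name in objects}
load = Counter(osd for placement in before.values() for osd in placement)

# Expand the cluster by one OSD and recompute every placement.
after = {name: place(name, osds + ["osd8"]) for name in objects}
moved = sum(1 for name in objects if before[name] != after[name])

print("replicas per OSD before expansion:", dict(load))
print("objects whose placement changed:", moved, "of", len(objects))
```

In this sketch, a placement changes only when the new OSD scores high enough to become one of an object's replicas, so the objects that move are exactly those that gain a copy on the new device. Real CRUSH additionally handles device weights and failure domains, which this example leaves out.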