
======================
 Architecture of Ceph
======================

- Introduction to Ceph Project

  - High-level overview of project benefits for users (few paragraphs, mention each subproject)
  - Introduction to sub-projects (few paragraphs to a page each)

    - RADOS
    - RGW
    - RBD
    - Ceph

  - Example scenarios Ceph projects are/not suitable for
  - (Very) High-Level overview of Ceph

    This would include an introduction to basic project terminology,
    the concept of OSDs, MDSes, and Monitors, and things like
    that. What they do, some of why they're awesome, but not how they
    work.

- Discussion of MDS terminology, daemon types (active, standby,
  standby-replay)

.. todo:: write me

====================================
 Object Store Architecture Overview
====================================
.. image:: object_store.png

.. todo:: write more here

=================================
 Library architecture
=================================
Ceph is structured into libraries which are built and then combined together to
make executables and other libraries.

- libcommon: a collection of utilities which are available to nearly every Ceph
  library and executable. In general, libcommon should not contain global
  variables, because it is intended to be linked into libraries such as
  libceph.so.

- libglobal: a collection of utilities focused on the needs of Ceph daemon
  programs. In here you will find pidfile management functions, signal
  handlers, and so forth.

.. todo:: document other libraries

=================================
 Configuration Management System
=================================
The configuration management system exists to provide every daemon with the
proper configuration information. The configuration can be viewed as a set of
key-value pairs.

How can the configuration be set? Well, there are several sources:
 - the Ceph configuration file, usually named ceph.conf
 - command line arguments, for example::

     --debug-ms=1
     --debug-pg=10

 - arguments injected at runtime by using injectargs

======================================================
 The Configuration File
======================================================
Most configuration settings originate in the Ceph configuration file.

How do we find the configuration file? Well, in order, we check:
 - the default locations
 - the environment variable CEPH_CONF
 - the command line argument -c

Each stanza of the configuration file describes the key-value pairs that will be in
effect for a particular subset of the daemons. The "global" stanza applies to
everything. The "mon", "osd", and "mds" stanzas specify settings to take effect
for all monitors, all OSDs, and all MDS servers, respectively.  A stanza of the
form mon.$name, osd.$name, or mds.$name gives settings for the monitor, OSD, or
MDS of that name, respectively. Configuration values that appear later in the
file win over earlier ones.
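
The stanza rules above can be modelled in a few lines of C++. This is a minimal
sketch, not the parser Ceph uses. The ``ConfLine`` struct and the
``stanza_applies`` and ``resolve`` functions are hypothetical names, and the
sketch implements only the two rules stated in this section: which stanzas
apply to a daemon, and later lines winning over earlier ones.

```cpp
#include <string>
#include <vector>

// One "key = value" line together with the stanza it appeared in.
// (Hypothetical type, for illustration only.)
struct ConfLine {
    std::string stanza;  // e.g. "global", "osd", "osd.0"
    std::string key;
    std::string value;
};

// True if a stanza applies to the daemon with the given type and name.
bool stanza_applies(const std::string& stanza,
                    const std::string& type,    // "mon", "osd", or "mds"
                    const std::string& name) {  // e.g. "osd.0"
    return stanza == "global" || stanza == type || stanza == name;
}

// Walk the lines in file order; a later applicable line overrides an
// earlier one. Returns `fallback` if no applicable line sets the key.
std::string resolve(const std::vector<ConfLine>& file,
                    const std::string& type,
                    const std::string& name,
                    const std::string& key,
                    const std::string& fallback) {
    std::string result = fallback;
    for (const ConfLine& line : file) {
        if (line.key == key && stanza_applies(line.stanza, type, name)) {
            result = line.value;
        }
    }
    return result;
}
```

Given a file that sets ``debug_ms`` in ``global``, then in ``osd``, then in
``osd.0``, this sketch would resolve the ``osd.0`` value for osd.0, the ``osd``
value for any other OSD, and the ``global`` value for a monitor or MDS.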

A sample configuration file can be found in src/sample.ceph.conf.

======================================================
 Metavariables
======================================================
The configuration system supports certain "metavariables." If these occur
inside a configuration value, they are expanded into something else, similar to
how bash shell expansion works.

There are a few different metavariables:
 - $host: expands to the current hostname
 - $type: expands to one of "mds", "osd", or "mon"
 - $id: expands to the daemon identifier. For osd.0, this would be "0"; for mds.a, it would be "a"
 - $num: same as $id
 - $name: expands to $type.$id
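
The expansion rules listed above can be illustrated with a small, self-contained
C++ sketch. It is not Ceph's implementation: ``DaemonIdentity`` and
``expand_metavariables`` are hypothetical names, and the choice to leave unknown
``$`` words untouched is an assumption made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Identity of one daemon, e.g. type "osd" and id "0" on host "node1".
// (Hypothetical type, for illustration only.)
struct DaemonIdentity {
    std::string host;
    std::string type;
    std::string id;
};

// Replace each known $metavariable in `value`; leave unknown ones untouched.
std::string expand_metavariables(const std::string& value,
                                 const DaemonIdentity& who) {
    const std::map<std::string, std::string> vars = {
        {"host", who.host},
        {"type", who.type},
        {"id",   who.id},
        {"num",  who.id},                   // $num is the same as $id
        {"name", who.type + "." + who.id},  // $name is $type.$id
    };

    std::string out;
    std::size_t i = 0;
    while (i < value.size()) {
        if (value[i] != '$') {
            out += value[i++];
            continue;
        }
        // Collect the identifier (letters, digits, '_') after the '$'.
        std::size_t j = i + 1;
        while (j < value.size() &&
               (std::isalnum(static_cast<unsigned char>(value[j])) ||
                value[j] == '_')) {
            ++j;
        }
        const auto it = vars.find(value.substr(i + 1, j - i - 1));
        if (it != vars.end()) {
            out += it->second;            // known metavariable: substitute
        } else {
            out.append(value, i, j - i);  // unknown: copy the text verbatim
        }
        i = j;
    }
    return out;
}
```

For a daemon with type ``osd`` and id ``0``, a made-up value such as
``/var/log/ceph/$name.log`` would expand to ``/var/log/ceph/osd.0.log`` under
this sketch.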

======================================================
 Interfacing with the Configuration Management System
======================================================
There are two ways for Ceph code to get configuration values. One way is to
read them directly from a variable named "g_conf," or equivalently,
"g_ceph_context->_conf." The other is to register an observer that will be
called every time a relevant configuration value changes.  The observer is
first called soon after the initial configuration is read, and again whenever
one of its values changes. Each observer tracks a set of keys and is invoked
only when one of those keys changes.

The interface to implement is found in common/config_obs.h.
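
To make the observer idea concrete, here is a toy model in standalone C++. It
does not reproduce the interface in common/config_obs.h. ``ToyConfigObserver``,
``ToyConfig``, and ``LogFileObserver`` are hypothetical names, and the model is
single-threaded. It shows only the behaviour described above: an observer
declares the keys it tracks and is called when one of them changes.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical observer interface: which keys to watch, and a callback.
class ToyConfigObserver {
public:
    virtual ~ToyConfigObserver() = default;
    virtual std::set<std::string> tracked_keys() const = 0;
    virtual void on_change(const std::string& key,
                           const std::string& value) = 0;
};

// Hypothetical configuration store that notifies interested observers.
class ToyConfig {
public:
    void add_observer(ToyConfigObserver* obs) { observers_.push_back(obs); }

    // Store a value; notify observers only if the value actually changed
    // and the observer tracks this key.
    void set(const std::string& key, const std::string& value) {
        const auto it = values_.find(key);
        if (it != values_.end() && it->second == value) {
            return;  // unchanged: nobody is called
        }
        values_[key] = value;
        for (ToyConfigObserver* obs : observers_) {
            if (obs->tracked_keys().count(key) > 0) {
                obs->on_change(key, value);
            }
        }
    }

private:
    std::map<std::string, std::string> values_;
    std::vector<ToyConfigObserver*> observers_;
};

// Example observer: keeps its own copy of one string-valued option.
class LogFileObserver : public ToyConfigObserver {
public:
    std::set<std::string> tracked_keys() const override {
        return {"log_file"};
    }
    void on_change(const std::string&, const std::string& value) override {
        log_file = value;  // reinitialization work would go here
        ++notifications;
    }

    std::string log_file;
    int notifications = 0;
};
```

In this model, setting ``debug_ms`` would not call ``LogFileObserver`` at all,
while changing ``log_file`` would call it once with the new value. The observer
keeps a private copy of the option, so the rest of its code never reads the
shared configuration directly.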

The observer method should be preferred in new code because:
 - It is more flexible, allowing the code to do whatever reinitialization needs
   to be done to implement the new configuration value.
 - It is the only way to create a std::string configuration variable that can
   be changed by injectargs.
 - Even for int-valued configuration options, changing the values in one thread
   while another thread is reading them can lead to subtle and
   impossible-to-diagnose bugs.

For these reasons, reading directly from g_conf should be considered deprecated
and not done in new code.  Do not ever alter g_conf.

=================================
 Debug Logs
=================================
The main debugging tool for Ceph is the dout and derr logging functions.
Collectively, these are referred to as "dout logging."

Dout has several log facilities, which can be set at various log
levels using the configuration management system. So it is possible to enable
debugging just for the messenger, by setting debug_ms to 10, for example.

Dout is implemented mainly in common/DoutStreambuf.cc.

The dout macro avoids even generating log messages which are not going to be
used, by enclosing them in an "if" statement. What this means is that if you
have the debug level set at 0, and you run this code

``dout(20) << "myfoo() = " << myfoo() << dendl;``


myfoo() will not be called here.
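
The effect of wrapping the stream expression in an "if" can be reproduced with
a toy macro. The sketch below is not the real dout macro. ``TOY_DOUT``,
``toy_debug_level``, and ``toy_log`` are hypothetical stand-ins, and the log
sink is an in-memory string stream rather than a file or syslog.

```cpp
#include <sstream>

// Hypothetical stand-ins for the configured debug level and the log sink.
static int toy_debug_level = 0;
static std::ostringstream toy_log;

// Expands to an if/else, so everything streamed after the macro belongs to
// the else branch and is skipped entirely when the level is too high.
#define TOY_DOUT(level) \
    if ((level) > toy_debug_level) { } else toy_log

static int myfoo_calls = 0;

int myfoo() {
    ++myfoo_calls;  // count calls so a skipped evaluation is observable
    return 42;
}

// Log once at level 20 and report how many times myfoo() ran.
int log_at_level_20() {
    myfoo_calls = 0;
    TOY_DOUT(20) << "myfoo() = " << myfoo() << '\n';
    return myfoo_calls;
}
```

With ``toy_debug_level`` at 0, ``log_at_level_20()`` returns 0 because the whole
stream expression, including the call to ``myfoo()``, sits in the branch that is
not taken. Raising ``toy_debug_level`` to 20 makes it return 1.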

Unfortunately, the performance of debug logging is relatively low. This is
because there is a single, process-wide mutex which every debug output
statement takes, and every debug output statement leads to a write() system
call or a call to syslog(). There is also a computational overhead to using C++
streams to consider. So you will need to be parsimonious in your logging to get
the best performance.

Sometimes, enabling logging can hide race conditions and other bugs by changing
the timing of events. Keep this in mind when debugging.

=================================
 CephContext
=================================
A CephContext represents a single view of the Ceph cluster. It comes complete
with a configuration, a set of performance counters (PerfCounters), and a
heartbeat map. You can find more information about CephContext in
src/common/ceph_context.h.

Generally, you will have only one CephContext in your application, called
g_ceph_context. However, in library code, it is possible that the library user
will initialize multiple CephContexts. For example, this would happen if the
user called rados_create more than once.

A CephContext is required to issue log messages. Why is this? Well, without
the CephContext, we would not know which log messages were disabled and which
were enabled.  The dout() macro implicitly references g_ceph_context, so it
can't be used in library code.  It is fine to use dout and derr in daemons, but
in library code, you must use ldout and lderr, and pass in your own CephContext
object. The compiler will enforce this restriction.
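
Why library logging needs an explicit context can be seen in a toy model with
two contexts at different debug levels. This is only a sketch. ``ToyContext``
and ``TOY_LDOUT`` are hypothetical names that mimic the shape of passing a
context to the logging macro, and they are not the definitions Ceph uses.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a context: it owns the debug level and the sink.
struct ToyContext {
    int debug_level = 0;
    std::ostringstream log;
};

// Library-style macro: the caller must say which context to consult, so
// there is no hidden dependency on a process-wide global.
#define TOY_LDOUT(cct, level) \
    if ((level) > (cct)->debug_level) { } else (cct)->log

// Library code logs through whichever context it was handed.
void library_call(ToyContext* cct) {
    TOY_LDOUT(cct, 10) << "library_call()\n";
}

// Convenience helper for the example: run the call and return what was logged.
std::string run_with_level(int level) {
    ToyContext cct;
    cct.debug_level = level;
    library_call(&cct);
    return cct.log.str();
}
```

Here ``run_with_level(0)`` yields an empty log while ``run_with_level(10)``
yields one line. Two contexts in the same process can therefore disagree about
what is enabled, which a macro tied to a single global context could not
express.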