.. Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>

======================
Architecture of Ceph
======================

- Introduction to Ceph Project

  - High-level overview of project benefits for users (few paragraphs, mention each subproject)
  - Introduction to sub-projects (few paragraphs to a page each)

    - RADOS
    - RGW
    - RBD
    - Ceph

  - Example scenarios Ceph projects are/not suitable for
  - (Very) High-Level overview of Ceph

    This would include an introduction to basic project terminology,
    the concept of OSDs, MDSes, and Monitors, and things like
    that. What they do, some of why they're awesome, but not how they
    work.

- Discussion of MDS terminology, daemon types (active, standby,
  standby-replay)

.. todo:: write me


=================================
Library architecture
=================================

Ceph is structured into libraries which are built and then combined together to
make executables and other libraries.

- libcommon: a collection of utilities which are available to nearly every Ceph
  library and executable. In general, libcommon should not contain global
  variables, because it is intended to be linked into libraries such as
  libceph.so.

- libglobal: a collection of utilities focused on the needs of Ceph daemon
  programs. In here you will find pidfile management functions, signal
  handlers, and so forth.

.. todo:: document other libraries


=================================
Configuration Management System
=================================

The configuration management system exists to provide every daemon with the
proper configuration information. The configuration can be viewed as a set of
key-value pairs.

The configuration can be set from several sources:

- the Ceph configuration file, usually named ceph.conf
- command line arguments, for example::

    --debug-ms=1
    --debug-pg=10

- arguments injected at runtime by using injectargs


======================================================
The Configuration File
======================================================

Most configuration settings originate in the Ceph configuration file.

To find the configuration file, we check the following, in order:

- the default locations
- the environment variable CEPH_CONF
- the command line argument -c

Each stanza of the configuration file describes the key-value pairs that will be
in effect for a particular subset of the daemons. The "global" stanza applies to
everything. The "mon", "osd", and "mds" stanzas specify settings that take
effect for all monitors, all OSDs, and all MDS servers, respectively. A stanza
of the form mon.$name, osd.$name, or mds.$name gives settings for the monitor,
OSD, or MDS of that name. Configuration values that appear later in the file win
over earlier ones.

A sample configuration file can be found in src/sample.ceph.conf.

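
As a rough illustration of how stanzas combine, the following C++ sketch
resolves the settings for one daemon from a list of entries kept in file order.
It is not Ceph's parser: ``Entry``, ``applies_to``, and ``resolve`` are
hypothetical names invented for this example, and the sketch assumes only the
two rules stated above (which stanzas apply to a daemon, and that later values
win).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One "key = value" line, tagged with the stanza it appeared in.
// Hypothetical type for illustration; not part of Ceph.
struct Entry {
  std::string stanza;
  std::string key;
  std::string value;
};

// True if a stanza applies to the daemon "type.id": "global" applies to
// everything, "osd" to every OSD, and "osd.0" only to that one daemon.
bool applies_to(const std::string& stanza, const std::string& type,
                const std::string& id) {
  return stanza == "global" || stanza == type || stanza == type + "." + id;
}

// Walk the entries in file order. A later matching entry overwrites an
// earlier one, mirroring the "later values win" rule described above.
std::map<std::string, std::string> resolve(const std::vector<Entry>& file,
                                           const std::string& type,
                                           const std::string& id) {
  std::map<std::string, std::string> conf;
  for (const Entry& e : file) {
    if (applies_to(e.stanza, type, id)) {
      conf[e.key] = e.value;
    }
  }
  return conf;
}
```

Given entries that set the same key to 0 under ``global``, 1 under ``osd``, and
10 under ``osd.0``, in that order, ``resolve(file, "osd", "0")`` ends up with 10,
while ``resolve(file, "osd", "1")`` ends up with 1.
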

======================================================
Metavariables
======================================================

The configuration system supports certain "metavariables." If these occur
inside a configuration value, they are expanded into something else, similar to
how bash shell expansion works.

There are a few different metavariables:

- $host: expands to the current hostname
- $type: expands to one of "mds", "osd", or "mon"
- $id: expands to the daemon identifier. For osd.0, this would be "0"; for mds.a, it would be "a"
- $num: same as $id
- $name: expands to $type.$id

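
The expansion rules listed above can be sketched in a few lines of C++. This is
an illustration only, not the code Ceph uses: ``DaemonIdentity`` and
``expand_metavariables`` are hypothetical names, and the choice to leave
unrecognized ``$`` sequences untouched is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Identity of the daemon whose configuration is being expanded.
// Hypothetical type for illustration; not part of Ceph.
struct DaemonIdentity {
  std::string host;  // e.g. "node1"
  std::string type;  // "mds", "osd", or "mon"
  std::string id;    // e.g. "0" for osd.0, "a" for mds.a
};

// Replace each $metavariable in `value` with its expansion. Names that are
// not recognized are copied through unchanged (an assumption of this sketch).
std::string expand_metavariables(const std::string& value,
                                 const DaemonIdentity& d) {
  const std::map<std::string, std::string> vars = {
      {"host", d.host},
      {"type", d.type},
      {"id", d.id},
      {"num", d.id},  // $num is the same as $id
      {"name", d.type + "." + d.id},
  };
  std::string out;
  std::size_t i = 0;
  while (i < value.size()) {
    if (value[i] != '$') {
      out += value[i++];
      continue;
    }
    // Collect the lowercase letters that follow the '$'.
    std::size_t j = i + 1;
    while (j < value.size() && value[j] >= 'a' && value[j] <= 'z') {
      ++j;
    }
    const std::string name = value.substr(i + 1, j - i - 1);
    const auto it = vars.find(name);
    if (it != vars.end()) {
      out += it->second;
    } else {
      out += value.substr(i, j - i);  // unknown: keep the text as-is
    }
    i = j;
  }
  return out;
}
```

For a daemon with type ``osd`` and id ``0``, an illustrative value such as
``/var/log/ceph/$name.log`` expands to ``/var/log/ceph/osd.0.log``.
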

======================================================
Interfacing with the Configuration Management System
======================================================

There are two ways for Ceph code to get configuration values. One way is to
read them directly from a variable named "g_conf," or equivalently,
"g_ceph_context->_conf." The other is to register an observer. Each observer
tracks a set of keys; it is called soon after the initial configuration is
read, and again every time one of its tracked keys changes.

The interface to implement is found in common/config_obs.h.

The observer method should be preferred in new code because:

- It is more flexible, allowing the code to do whatever reinitialization needs
  to be done to implement the new configuration value.
- It is the only way to create a std::string configuration variable that can
  be changed by injectargs.
- Even for int-valued configuration options, changing the values in one thread
  while another thread is reading them can lead to subtle and
  impossible-to-diagnose bugs.

For these reasons, reading directly from g_conf should be considered deprecated
and should not be done in new code. Never alter g_conf.

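
The observer approach can be pictured with the following self-contained C++
sketch. It does not reproduce the interface in common/config_obs.h: the classes
``ConfigObserver``, ``Config``, and ``LogLevelWatcher`` and all of their methods
are hypothetical, chosen only to show an observer that tracks a set of keys and
is invoked when one of them changes. The initial callback after startup and any
locking are left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical observer interface for illustration; the real one is in
// common/config_obs.h and may look quite different.
class ConfigObserver {
 public:
  virtual ~ConfigObserver() = default;
  // Keys this observer wants to hear about.
  virtual std::set<std::string> tracked_keys() const = 0;
  // Called with the new value whenever a tracked key changes.
  virtual void on_change(const std::string& key, const std::string& value) = 0;
};

// Toy configuration store that notifies observers when a value changes.
class Config {
 public:
  void add_observer(ConfigObserver* obs) { observers_.push_back(obs); }

  void set(const std::string& key, const std::string& value) {
    const auto it = values_.find(key);
    if (it != values_.end() && it->second == value) {
      return;  // value unchanged, so nobody is notified
    }
    values_[key] = value;
    for (ConfigObserver* obs : observers_) {
      if (obs->tracked_keys().count(key) > 0) {
        obs->on_change(key, value);
      }
    }
  }

 private:
  std::map<std::string, std::string> values_;
  std::vector<ConfigObserver*> observers_;
};

// Example observer: keeps its own copy of one setting, so the rest of the
// code never reads the shared configuration directly.
class LogLevelWatcher : public ConfigObserver {
 public:
  std::set<std::string> tracked_keys() const override { return {"debug_ms"}; }
  void on_change(const std::string&, const std::string& value) override {
    level = std::stoi(value);
    ++notifications;
  }
  int level = 0;
  int notifications = 0;
};
```

The point of the design is that the observer decides what to do with a new
value (here, parse it and cache it), rather than having other code read a
shared variable that another thread may be changing.
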

=================================
Debug Logs
=================================

The main debugging tools for Ceph are the dout and derr logging functions.
Collectively, these are referred to as "dout logging."

Dout has several log facilities, which can be set at various log
levels using the configuration management system. So it is possible to enable
debugging just for the messenger, by setting debug_ms to 10, for example.

Dout is implemented mainly in common/DoutStreambuf.cc.

The dout macro avoids even generating log messages which are not going to be
used, by enclosing them in an "if" statement. What this means is that if you
have the debug level set at 0, and you run this code

``dout(20) << "myfoo() = " << myfoo() << dendl;``

myfoo() will not be called here.

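
To see why myfoo() is never called, consider the following minimal C++ sketch of
a macro that wraps a stream expression in an "if". It is not Ceph's dout:
``my_dout``, ``g_debug_level``, ``g_log``, and ``g_calls`` are hypothetical names
for this example, and output is captured in a string stream rather than written
to a log file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; Ceph's real dout machinery is
// more involved.
static int g_debug_level = 0;
static std::ostringstream g_log;  // captures output instead of writing a file
static int g_calls = 0;           // counts how often myfoo() actually runs

// Expands to an if/else whose else-branch is the stream expression that
// follows the macro. When the message level is above the current debug
// level, the whole expression is skipped, so its operands are never
// evaluated. The empty "{}" branch avoids dangling-else surprises.
#define my_dout(level) \
  if ((level) > g_debug_level) {} else g_log

int myfoo() {
  ++g_calls;  // side effect lets us observe whether myfoo() ran
  return 42;
}

void log_something() {
  my_dout(20) << "myfoo() = " << myfoo() << "\n";
}
```

With ``g_debug_level`` at 0, calling ``log_something()`` leaves ``g_calls`` at 0
and the log empty; raising the level to 20 makes the same call run myfoo() and
append the message.
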

Unfortunately, debug logging is relatively slow. This is
because there is a single, process-wide mutex which every debug output
statement takes, and every debug output statement leads to a write() system
call or a call to syslog(). There is also the computational overhead of C++
streams to consider. So you will need to be parsimonious in your logging to get
the best performance.

Sometimes, enabling logging can hide race conditions and other bugs by changing
the timing of events. Keep this in mind when debugging.


=================================
CephContext
=================================

A CephContext represents a single view of the Ceph cluster. It comes complete
with a configuration, a set of performance counters (PerfCounters), and a
heartbeat map. You can find more information about CephContext in
src/common/ceph_context.h.

Generally, you will have only one CephContext in your application, called
g_ceph_context. However, in library code, it is possible that the library user
will initialize multiple CephContexts. For example, this would happen if they
called rados_create more than once.

A CephContext is required to issue log messages. Without
the CephContext, we would not know which log messages were disabled and which
were enabled. The dout() macro implicitly references g_ceph_context, so it
can't be used in library code. It is fine to use dout and derr in daemons, but
in library code, you must use ldout and lderr, and pass in your own CephContext
object. The compiler will enforce this restriction.

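
The difference between a logging macro that relies on a global context and one
that takes the context explicitly can be sketched as follows. This is a toy
model, not Ceph's ldout: ``MiniContext``, ``my_ldout``, and ``do_work`` are
hypothetical names, and the context here carries only a debug level and an
in-memory log.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical miniature of a per-instance context; the real CephContext
// holds much more (configuration, PerfCounters, heartbeat map).
struct MiniContext {
  int debug_level = 0;
  std::ostringstream log;
};

// Context-taking logging macro in the spirit of ldout: the caller must say
// which context decides whether the message is enabled and where it goes.
#define my_ldout(cct, level) \
  if ((level) > (cct)->debug_level) {} else (cct)->log

// Library-style function: it logs through whichever context it was given,
// so two contexts in one process do not interfere with each other.
void do_work(MiniContext* cct) {
  my_ldout(cct, 10) << "working\n";
}
```

Two contexts with different debug levels can then coexist: the same
``do_work`` call logs through one and stays silent through the other.
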