mirror of
https://github.com/ceph/ceph
synced 2024-12-22 03:22:00 +00:00
d54ded4991
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
723 lines
18 KiB
ReStructuredText
723 lines
18 KiB
ReStructuredText
======================
|
|
OSD Config Reference
|
|
======================
|
|
|
|
You can configure OSDs in the Ceph configuration file, but OSDs can use the
|
|
default values and a very minimal configuration. A minimal OSD configuration
|
|
sets ``osd journal size`` and ``osd host``, and uses default values for
|
|
nearly everything else.
|
|
|
|
|
|
General Settings
|
|
================
|
|
|
|
The following settings provide an OSD's ID, and determine paths to data and
|
|
journals. Ceph deployment scripts typically generate the UUID automatically. We
|
|
**DO NOT** recommend changing the default paths for data or journals, as it
|
|
makes it more problematic to troubleshoot Ceph later.
|
|
|
|
The journal size should be at least twice the product of the expected drive
|
|
speed multiplied by ``filestore max sync interval``. However, the most common
|
|
practice is to partition the journal drive (often an SSD), and mount it such
|
|
that Ceph uses the entire partition for the journal.
|
|
|
|
|
|
``osd uuid``
|
|
|
|
:Description: The universally unique identifier (UUID) for the OSD.
|
|
:Type: UUID
|
|
:Default: The UUID.
|
|
|
|
|
|
``osd data``
|
|
|
|
:Description: The path to the OSDs data. You must create the directory when
|
|
deploying Ceph. You should mount a drive for OSD data at this
|
|
mount point. We do not recommend changing the default.
|
|
|
|
:Type: String
|
|
:Default: ``/var/lib/ceph/osd/$cluster-$id``
|
|
|
|
|
|
``osd journal``
|
|
|
|
:Description: The path to the OSD's journal. This may be a path to a file or a
|
|
block device (such as a partition of an SSD). If it is a file,
|
|
you must create the directory to contain it. We recommend using a
|
|
drive separate from the ``osd data`` drive.
|
|
|
|
:Type: String
|
|
:Default: ``/var/lib/ceph/osd/$cluster-$id/journal``
|
|
|
|
|
|
``osd journal size``
|
|
|
|
:Description: The size of the journal in megabytes. If this is 0, and the
|
|
journal is a block device, the entire block device is used.
|
|
Since v0.54, this is ignored if the journal is a block device,
|
|
and the entire block device is used.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``5120``
|
|
:Recommended: Begin with 1GB. Should be at least twice the product of the
|
|
expected speed multiplied by ``filestore max sync interval``.
|
|
|
|
|
|
``osd max write size``
|
|
|
|
:Description: The maximum size of a write in megabytes.
|
|
:Type: 32-bit Integer
|
|
:Default: ``90``
|
|
|
|
|
|
``osd client message size cap``
|
|
|
|
:Description: The largest client data message allowed in memory.
|
|
:Type: 64-bit Integer Unsigned
|
|
:Default: 500MB default. ``500*1024L*1024L``
|
|
|
|
|
|
``osd class dir``
|
|
|
|
:Description: The class path for RADOS class plug-ins.
|
|
:Type: String
|
|
:Default: ``$libdir/rados-classes``
|
|
|
|
|
|
Monitor OSD Interaction
|
|
=======================
|
|
|
|
OSDs check each other's heartbeats and report to monitors periodically. Ceph can
|
|
use default values in many cases. However, if your network has latency issues,
|
|
you may need to adopt longer intervals. See `Configuring Monitor/OSD
|
|
Interaction`_ for a detailed discussion of heartbeats.
|
|
|
|
|
|
``osd heartbeat address``
|
|
|
|
:Description: An OSD's network address for heartbeats.
|
|
:Type: Address
|
|
:Default: The host address.
|
|
|
|
|
|
``osd heartbeat interval``
|
|
|
|
:Description: How often an OSD pings its peers (in seconds).
|
|
:Type: 32-bit Integer
|
|
:Default: ``6``
|
|
|
|
|
|
``osd heartbeat grace``
|
|
|
|
:Description: The elapsed time when an OSD hasn't shown a heartbeat that the
|
|
cluster considers it ``down``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``20``
|
|
|
|
|
|
``osd mon heartbeat interval``
|
|
|
|
:Description: How often the OSD pings a monitor if it has no OSD peers.
|
|
:Type: 32-bit Integer
|
|
:Default: ``30``
|
|
|
|
|
|
``osd mon report interval max``
|
|
|
|
:Description: The maximum time in seconds for an OSD to report to a monitor
|
|
before the monitor considers the OSD ``down``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``120``
|
|
|
|
|
|
``osd mon report interval min``
|
|
|
|
:Description: The minimum number of seconds for an OSD to report to a monitor
|
|
to avoid the monitor considering the OSD ``down``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``5``
|
|
:Valid Range: Should be less than ``osd mon report interval max``
|
|
|
|
|
|
``osd mon ack timeout``
|
|
|
|
:Description: The number of seconds to wait for a monitor to acknowledge a
|
|
request for statistics.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``30``
|
|
|
|
|
|
``osd min down reporters``
|
|
|
|
:Description: The minimum number of OSDs required to report a ``down`` OSD.
|
|
:Type: 32-bit Integer
|
|
:Default: ``1``
|
|
|
|
|
|
``osd min down reports``
|
|
|
|
:Description: The minimum number of times an OSD must report that another
|
|
is ``down``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``3``
|
|
|
|
|
|
Data Placement
|
|
==============
|
|
|
|
When you create pools and set the number of placement groups for the pool, Ceph
|
|
uses default values when you don't specifically override the defaults. **We
|
|
recommend** overridding some of the defaults. Specifically, we recommend setting
|
|
a pools replica size and overriding the default number of placement groups. You
|
|
can specifically set these values when running `pool`_ commands. You can also
|
|
override the defaults by adding new ones in the ``[global]`` section of your
|
|
Ceph configuration file.
|
|
|
|
|
|
.. literalinclude:: pool-pg.conf
|
|
:language: ini
|
|
|
|
|
|
``osd pg bits``
|
|
|
|
:Description: Placement group bits per OSD.
|
|
:Type: 32-bit Integer
|
|
:Default: ``6``
|
|
|
|
|
|
``osd pgp bits``
|
|
|
|
:Description: The number of bits per OSD for PGPs.
|
|
:Type: 32-bit Integer
|
|
:Default: ``6``
|
|
|
|
|
|
``osd crush chooseleaf type``
|
|
|
|
:Description: The bucket type to use for ``chooseleaf`` in a CRUSH rule. Uses
|
|
ordinal rank rather than name.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``1``. Typically a host containing one or more OSDs.
|
|
|
|
|
|
``osd min rep``
|
|
|
|
:Description: The minimum number of replicas for a ruleset.
|
|
:Type: 32-bit Integer
|
|
:Default: ``1``
|
|
|
|
|
|
``osd max rep``
|
|
|
|
:Description: The maximum number of replicas for a ruleset.
|
|
:Type: 32-bit Integer
|
|
:Default: ``10``
|
|
|
|
|
|
``osd pool default crush rule``
|
|
|
|
:Description: The default CRUSH ruleset to use when creating a pool.
|
|
:Type: 32-bit Integer
|
|
:Default: ``0``
|
|
|
|
|
|
``osd pool default size``
|
|
|
|
:Description: Sets the number of replicas for objects in the pool. The default
|
|
value is the same as
|
|
``ceph osd pool set {pool-name} size {size}``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``2``
|
|
|
|
|
|
``osd pool default min size``
|
|
|
|
:Descrption: Sets the minimum number of written replicas for objects in the
|
|
pool in order to acknowledge a write operation to the client.
|
|
If minimum is not met, Ceph will not acknowledge the write to the
|
|
client. This setting ensures a minimum number of replicas when
|
|
operating in ``degraded`` mode.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``0``, which means no particular minimum. If ``0``,
|
|
minimum is ``size - (size / 2)``.
|
|
|
|
|
|
``osd pool default pg num``
|
|
|
|
:Description: The default number of placement groups for a pool. The default
|
|
value is the same as ``pg_num`` with ``mkpool``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``8``
|
|
|
|
|
|
``osd pool default pgp num``
|
|
|
|
:Description: The default number of placement groups for placement for a pool.
|
|
The default value is the same as ``pgp_num`` with ``mkpool``.
|
|
PG and PGP should be equal (for now).
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``8``
|
|
|
|
|
|
``osd pool default flags``
|
|
|
|
:Description: The default flags for new pools.
|
|
:Type: 32-bit Integer
|
|
:Default: ``0``
|
|
|
|
|
|
``osd max pgls``
|
|
|
|
:Description: The maximum number of placement groups to list. A client
|
|
requesting a large number can tie up the OSD.
|
|
|
|
:Type: Unsigned 64-bit Integer
|
|
:Default: ``1024``
|
|
:Note: Default should be fine.
|
|
|
|
|
|
``osd min pg log entries``
|
|
|
|
:Description: The minimum number of placement group logs to maintain
|
|
when trimming log files.
|
|
|
|
:Type: 32-bit Int Unsigned
|
|
:Default: ``1000``
|
|
|
|
|
|
``osd default data pool replay window``
|
|
|
|
:Description: The time (in seconds) for an OSD to wait for a client to replay
|
|
a request.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``45``
|
|
|
|
|
|
Scrubbing
|
|
=========
|
|
|
|
In addition to making multiple copies of objects, Ceph insures data integrity by
|
|
scrubbing placement groups. Ceph scrubbing is analogous to ``fsck`` on the
|
|
object storage layer. For each placement group, Ceph generates a catalog of all
|
|
objects and compares each primary object and its replicas to ensure that no
|
|
objects are missing or mismatched. Light scrubbing (daily) checks the object
|
|
size and attributes. Deep scrubbing (weekly) reads the data and uses checksums
|
|
to ensure data integrity.
|
|
|
|
Scrubbing is important for maintaining data integrity, but it can reduce
|
|
performance. You can adjust the following settings to increase or decrease
|
|
scrubbing operations.
|
|
|
|
|
|
``osd max scrubs``
|
|
|
|
:Description: The maximum number of scrub operations for an OSD.
|
|
:Type: 32-bit Int
|
|
:Default: ``1``
|
|
|
|
|
|
``osd scrub thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a scrub thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``60``
|
|
|
|
|
|
``osd scrub finalize thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a scrub finalize
|
|
thread.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``60*10``
|
|
|
|
|
|
``osd scrub load threshold``
|
|
|
|
:Description: The maximum CPU load. Ceph will not scrub when the CPU load is
|
|
higher than this number. Default is 50%.
|
|
|
|
:Type: Float
|
|
:Default: ``0.5``
|
|
|
|
|
|
``osd scrub min interval``
|
|
|
|
:Description: The maximum interval in seconds for scrubbing the OSD when the
|
|
cluster load is low.
|
|
|
|
:Type: Float
|
|
:Default: 5 minutes. ``300``
|
|
|
|
|
|
``osd scrub max interval``
|
|
|
|
:Description: The maximum interval in seconds for scrubbing the OSD
|
|
irrespective of cluster load.
|
|
|
|
:Type: Float
|
|
:Default: Once per day. ``60*60*24``
|
|
|
|
|
|
``osd deep scrub interval``
|
|
|
|
:Description: The interval for "deep" scrubbing (fully reading all data).
|
|
:Type: Float
|
|
:Default: Once per week. ``60*60*24*7``
|
|
|
|
|
|
``osd deep scrub stride``
|
|
|
|
:Description: Read size when doing a deep scrub.
|
|
:Type: 32-bit Int
|
|
:Default: 512 KB. ``524288``
|
|
|
|
|
|
Operations
|
|
==========
|
|
|
|
Operations settings allow you to configure the number of threads for servicing
|
|
requests. If you set ``osd op threads`` to ``0``, it disables multi-threading.
|
|
By default, Ceph uses two threads with a 30 second timeout and a 30 second
|
|
complaint time if an operation doesn't complete within those time parameters.
|
|
You can set operations priority weights between client operations and
|
|
recovery operations to ensure optimal performance during recovery.
|
|
|
|
|
|
``osd op threads``
|
|
|
|
:Description: The number of threads to service OSD operations. Set to ``0`` to
|
|
disable it. Increasing the number may increase the request
|
|
processing rate.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``2``
|
|
|
|
|
|
``osd client op priority``
|
|
|
|
:Description: The priority set for client operations. It is relative to
|
|
``osd recovery op priority``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``63``
|
|
:Valid Range: 1-63
|
|
|
|
|
|
``osd recovery op priority``
|
|
|
|
:Description: The priority set for recovery operations. It is relative to
|
|
``osd client op priority``.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``10``
|
|
:Valid Range: 1-63
|
|
|
|
|
|
``osd op thread timeout``
|
|
|
|
:Description: The OSD operation thread timeout in seconds.
|
|
:Type: 32-bit Integer
|
|
:Default: ``30``
|
|
|
|
|
|
``osd op complaint time``
|
|
|
|
:Description: An operation becomes complaint worthy after the specified number
|
|
of seconds have elapsed.
|
|
|
|
:Type: Float
|
|
:Default: ``30``
|
|
|
|
|
|
``osd disk threads``
|
|
|
|
:Description: The number of disk threads, which are used to perform background
|
|
disk intensive OSD operations such as scrubbing and snap
|
|
trimming.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``1``
|
|
|
|
|
|
``osd op history size``
|
|
|
|
:Description: The maximum number of completed operations to track.
|
|
:Type: 32-bit Unsigned Integer
|
|
:Default: ``20``
|
|
|
|
|
|
``osd op history duration``
|
|
|
|
:Description: The oldest completed operation to track.
|
|
:Type: 32-bit Unsigned Integer
|
|
:Default: ``600``
|
|
|
|
|
|
``osd op log threshold``
|
|
|
|
:Description: How many operations logs to display at once.
|
|
:Type: 32-bit Integer
|
|
:Default: ``5``
|
|
|
|
|
|
|
|
Backfilling
|
|
===========
|
|
|
|
When you add or remove OSDs to a cluster, the CRUSH algorithm will want to
|
|
rebalance the cluster by moving placement groups to or from OSDs to restore the
|
|
balance. The process of migrating placement groups and the objects they contain
|
|
can reduce the cluster's operational performance considerably. To maintain
|
|
operational performance, Ceph performs this migration with 'backfilling', which
|
|
allows Ceph to set backfill operations to a lower priority than requests to read
|
|
or write data.
|
|
|
|
|
|
``osd max backfills``
|
|
|
|
:Description: The maximum number of backfills allowed to or from a single OSD.
|
|
:Type: 64-bit Unsigned Integer
|
|
:Default: ``10``
|
|
|
|
|
|
``osd backfill scan min``
|
|
|
|
:Description: The scan interval in seconds for backfill operations when cluster
|
|
load is low.
|
|
:Type: 32-bit Integer
|
|
:Default: ``64``
|
|
|
|
|
|
``osd backfill scan max``
|
|
|
|
:Description: The maximum scan interval in seconds for backfill operations
|
|
irrespective of cluster load.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``512``
|
|
|
|
|
|
``osd backfill full ratio``
|
|
|
|
:Description: Refuse to accept backfill requests when the OSD's full ratio is
|
|
above this value.
|
|
:Type: Float
|
|
:Default: ``0.85``
|
|
|
|
|
|
``osd backfill retry interval``
|
|
|
|
:Description: The number of seconds to wait before retrying backfill requests.
|
|
:Type: Double
|
|
:Default: ``10.0``
|
|
|
|
|
|
|
|
|
|
OSD Map
|
|
=======
|
|
|
|
OSD maps reflect the OSD daemons operating in the cluster. Over time, the
|
|
number of map epochs increases. Ceph provides some settings to ensure that
|
|
Ceph performs well as the OSD map grows larger.
|
|
|
|
|
|
``osd map dedup``
|
|
|
|
:Description: Enable removing duplicates in the OSD map.
|
|
:Type: Boolean
|
|
:Default: ``true``
|
|
|
|
|
|
``osd map cache size``
|
|
|
|
:Description: The size of the OSD map cache in megabytes.
|
|
:Type: 32-bit Integer
|
|
:Default: ``500``
|
|
|
|
|
|
``osd map cache bl size``
|
|
|
|
:Description: The size of the in-memory OSD map cache in OSD daemons.
|
|
:Type: 32-bit Integer
|
|
:Default: ``50``
|
|
|
|
|
|
``osd map cache bl inc size``
|
|
|
|
:Description: The size of the in-memory OSD map cache incrementals in
|
|
OSD daemons.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``100``
|
|
|
|
|
|
``osd map message max``
|
|
|
|
:Description: The maximum map entries allowed per MOSDMap message.
|
|
:Type: 32-bit Integer
|
|
:Default: ``100``
|
|
|
|
|
|
|
|
Recovery
|
|
========
|
|
|
|
When the cluster starts or when an OSD crashes and restarts, the OSD begins
|
|
peering with other OSDs before writes can occur. See `Monitoring OSDs and PGs`_
|
|
for details.
|
|
|
|
If an OSD crashed and comes back online, usually it will be out of sync with
|
|
other OSDs containing more recent versions of objects in the placement groups.
|
|
When this happens, the OSD goes into recovery mode and seeks to get the latest
|
|
copy of the data and bring its map back up to date. Depending upon how long the
|
|
OSD was down, the OSD's objects and placement groups may be significantly out of
|
|
date. Also, if a failure domain went down (e.g., a rack), more than one OSD may
|
|
come back online at the same time. This can make the recovery process time
|
|
consuming and resource intensive.
|
|
|
|
To maintain operational performance, Ceph performs recovery with limitations on
|
|
the number recovery requests, threads and object chunk sizes which allows Ceph
|
|
perform well in a degraded state.
|
|
|
|
|
|
``osd recovery delay start``
|
|
|
|
:Description: After peering completes, Ceph will delay for the specified number
|
|
of seconds before starting to recover objects.
|
|
|
|
:Type: Float
|
|
:Default: ``15``
|
|
|
|
|
|
``osd recovery max active``
|
|
|
|
:Description: The number of active recovery requests per OSD at one time. More
|
|
requests will accelerate recovery, but the requests places an
|
|
increased load on the cluster.
|
|
|
|
:Type: 32-bit Integer
|
|
:Default: ``5``
|
|
|
|
|
|
``osd recovery max chunk``
|
|
|
|
:Description: The maximum size of a recovered chunk of data to push.
|
|
:Type: 64-bit Integer Unsigned
|
|
:Default: ``1 << 20``
|
|
|
|
|
|
``osd recovery threads``
|
|
|
|
:Description: The number of threads for recovering data.
|
|
:Type: 32-bit Integer
|
|
:Default: ``1``
|
|
|
|
|
|
``osd recovery thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a recovery thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``30``
|
|
|
|
|
|
``osd recover clone overlap``
|
|
|
|
:Description: Preserves clone overlap during recovery. Should always be set
|
|
to ``true``.
|
|
|
|
:Type: Boolean
|
|
:Default: ``true``
|
|
|
|
|
|
|
|
Miscellaneous
|
|
=============
|
|
|
|
|
|
``osd snap trim thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a snap trim thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``60*60*1``
|
|
|
|
|
|
``osd backlog thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a backlog thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``60*60*1``
|
|
|
|
|
|
``osd default notify timeout``
|
|
|
|
:Description: The OSD default notification timeout (in seconds).
|
|
:Type: 32-bit Integer Unsigned
|
|
:Default: ``30``
|
|
|
|
|
|
``osd check for log corruption``
|
|
|
|
:Description: Check log files for corruption. Can be computationally expensive.
|
|
:Type: Boolean
|
|
:Default: ``false``
|
|
|
|
|
|
``osd remove thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a remove OSD thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``60*60``
|
|
|
|
|
|
``osd command thread timeout``
|
|
|
|
:Description: The maximum time in seconds before timing out a command thread.
|
|
:Type: 32-bit Integer
|
|
:Default: ``10*60``
|
|
|
|
|
|
``osd command max records``
|
|
|
|
:Description: Limits the number of lost objects to return.
|
|
:Type: 32-bit Integer
|
|
:Default: ``256``
|
|
|
|
|
|
``osd auto upgrade tmap``
|
|
|
|
:Description: Uses ``tmap`` for ``omap`` on old objects.
|
|
:Type: Boolean
|
|
:Default: ``true``
|
|
|
|
|
|
``osd tmapput sets users tmap``
|
|
|
|
:Description: Uses ``tmap`` for debugging only.
|
|
:Type: Boolean
|
|
:Default: ``false``
|
|
|
|
|
|
``osd preserve trimmed log``
|
|
|
|
:Description: Preserves trimmed log files, but uses more disk space.
|
|
:Type: Boolean
|
|
:Default: ``false``
|
|
|
|
|
|
|
|
.. _pool: ../../operations/pools
|
|
.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction
|
|
.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering |