.\" Man page generated from reStructuredText.
.
.TH "CRUSHTOOL" "8" "January 12, 2014" "dev" "Ceph"
.SH NAME
crushtool \- CRUSH map manipulation tool
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.nf
\fBcrushtool\fP ( \-d \fImap\fP | \-c \fImap.txt\fP | \-\-build \-\-num_osds \fInumosds\fP
\fIlayer1\fP \fI\&...\fP | \-\-test ) [ \-o \fIoutfile\fP ]
.fi
.sp
.SH DESCRIPTION
.sp
\fBcrushtool\fP is a utility that lets you create, compile, decompile
and test CRUSH map files.
.sp
CRUSH is a pseudo\-random data distribution algorithm that efficiently
maps input values (typically data objects) across a heterogeneous,
hierarchically structured device map. The algorithm was originally
described in detail in the following paper (although it has evolved
somewhat since then):
.INDENT 0.0
.INDENT 3.5
\fI\%http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf\fP
.UNINDENT
.UNINDENT
.sp
The tool has four modes of operation.
.INDENT 0.0
.TP
.B \-\-compile|\-c map.txt
will compile a plaintext map.txt into a binary map file.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-decompile|\-d map
will take the compiled map and decompile it into a plaintext source
file, suitable for editing.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-build \-\-num_osds {num\-osds} layer1 ...
will create a map with the given layer structure. See below for a
detailed explanation.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-test
will perform a dry run of a CRUSH mapping for a range of input
object names. See below for a detailed explanation.
.UNINDENT
.sp
Unlike other Ceph tools, \fBcrushtool\fP does not accept generic options
such as \fB\-\-debug\-crush\fP from the command line. They can, however, be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
CEPH_ARGS="\-\-debug\-crush 0" crushtool ...
.ft P
.fi
.UNINDENT
.UNINDENT
.SH RUNNING TESTS WITH --TEST
.sp
The test mode will use the input crush map (as specified with \fB\-i
map\fP) and perform a dry run of CRUSH mapping or random placement
(if \fB\-\-simulate\fP is set). On completion, two kinds of reports can be
created. The \fB\-\-show\-...\fP options output human\-readable information
on stderr. The \fB\-\-output\-csv\fP option creates CSV files that are
documented by the \fB\-\-help\-output\fP option.
.INDENT 0.0
.TP
.B \-\-show\-statistics
for each rule, display the mapping of each object. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
CRUSH rule 1 x 24 [11,6]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that object \fB24\fP is mapped to devices \fB[11,6]\fP by rule
\fB1\fP\&. At the end of the mapping details, a summary of the
distribution is displayed. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
rule 1 (metadata) num_rep 5 result size == 5: 1024/1024
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that rule \fB1\fP, which is named \fBmetadata\fP, successfully
mapped \fB1024\fP objects to \fBresult size == 5\fP devices when trying
to map them to \fBnum_rep 5\fP replicas. When it fails to provide the
required mapping, presumably because the number of \fBtries\fP must
be increased, a breakdown of the failures is displayed. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that although \fBnum_rep 10\fP replicas were required, \fB4\fP
out of \fB1024\fP objects (\fB4/1024\fP) were mapped to only \fBresult size
== 8\fP devices.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-bad\-mappings
display which objects failed to be mapped to the required number of
devices. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that when rule \fB1\fP was required to map \fB7\fP devices, it
could only map six: \fB[8,10,2,11,6,9]\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-utilization
display the expected and actual utilisation for each device, for
each number of replicas. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
device 0: stored : 951 expected : 853.333
device 1: stored : 963 expected : 853.333
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that device \fB0\fP stored \fB951\fP objects and was expected to
store about \fB853\fP\&.
Implies \fB\-\-show\-statistics\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-utilization\-all
displays the same as \fB\-\-show\-utilization\fP but does not suppress
output when the weight of a device is zero.
Implies \fB\-\-show\-statistics\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-choose\-tries
display how many attempts were needed to find a device mapping.
For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
0: 95224
1: 3745
2: 2225
\&..
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that \fB95224\fP mappings succeeded without retries, \fB3745\fP
mappings succeeded after one retry, etc. There are as many rows
as the value of the \fB\-\-set\-choose\-total\-tries\fP option.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output\-csv
create CSV files (in the current directory) containing information
documented by \fB\-\-help\-output\fP\&. The files are named after the rule
used when collecting the statistics. For instance, if the rule
\fBmetadata\fP is used, the CSV files will be:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
metadata\-absolute_weights.csv
metadata\-device_utilization.csv
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The first line of each file briefly explains the column layout. For
instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
metadata\-absolute_weights.csv
Device ID, Absolute Weight
0,1
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output\-name NAME
prepend \fBNAME\fP to the file names generated when \fB\-\-output\-csv\fP
is specified. For instance \fB\-\-output\-name FOO\fP will create
files:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
FOO\-metadata\-absolute_weights.csv
FOO\-metadata\-device_utilization.csv
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
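.sp
The summary lines printed by \fB\-\-show\-statistics\fP have a regular
layout, so a short script can total the objects that received fewer
devices than requested. The following Python sketch is illustrative
only: it assumes the exact line layout shown in the examples above,
and \fBparse_summary\fP and \fBshort_mappings\fP are hypothetical helper
names, not part of Ceph.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def parse_summary(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    tokens = line.split()
    # Assumed layout: rule <id> (<name>) num_rep <n> result size == <k>: <count>/<total>
    if len(tokens) != 10 or tokens[0] != "rule" or tokens[3] != "num_rep":
        return None
    mapped, total = tokens[9].split("/")
    return {
        "rule": int(tokens[1]),
        "name": tokens[2].strip("()"),
        "num_rep": int(tokens[4]),
        "size": int(tokens[8].rstrip(":")),
        "count": int(mapped),
        "total": int(total),
    }


def short_mappings(lines):
    """Count objects that received fewer devices than num_rep, per rule id."""
    short = {}
    for line in lines:
        row = parse_summary(line)
        if row is not None and row["size"] < row["num_rep"]:
            short[row["rule"]] = short.get(row["rule"], 0) + row["count"]
    return short


report = [
    "rule 1 (metadata) num_rep 10 result size == 8: 4/1024",
    "rule 1 (metadata) num_rep 10 result size == 9: 93/1024",
    "rule 1 (metadata) num_rep 10 result size == 10: 927/1024",
]
print(short_mappings(report))  # prints {1: 97}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With the three sample lines, 4 + 93 = 97 of the 1024 objects fall short
of the 10 requested replicas, and the remaining 927 are fully mapped.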
.sp
The \fB\-\-set\-...\fP options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example, the bad mapping reported by:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
could be fixed by increasing \fBchoose\-total\-tries\fP as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings \-\-set\-choose\-total\-tries 500
.ft P
.fi
.UNINDENT
.UNINDENT
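.sp
To judge whether raising the number of tries helped, it can be useful
to count how many devices each bad mapping is missing. The Python
sketch below assumes the \fBbad mapping\fP line layout shown in this
section; \fBparse_bad_mapping\fP is a hypothetical helper name, not part
of Ceph.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def parse_bad_mapping(line):
    """Split a 'bad mapping' line into fields, or return None if it does not match."""
    tokens = line.split()
    # Assumed layout: bad mapping rule <id> x <object> num_rep <n> result [<d1>,<d2>,...]
    if len(tokens) != 10 or tokens[:3] != ["bad", "mapping", "rule"]:
        return None
    devices = [int(d) for d in tokens[9].strip("[]").split(",") if d]
    num_rep = int(tokens[7])
    return {
        "rule": int(tokens[3]),
        "x": int(tokens[5]),
        "num_rep": num_rep,
        "devices": devices,
        # Number of devices the rule failed to provide for this object.
        "missing": num_rep - len(devices),
    }


sample = "bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]"
info = parse_bad_mapping(sample)
print(info["x"], info["missing"])  # prints 781 1
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For the sample line, object 781 received six of the seven requested
devices, so one device is missing.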
.SH BUILDING A MAP WITH --BUILD
.sp
The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.
.sp
Each layer consists of:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
bucket ( uniform | list | tree | straw ) size
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The \fBbucket\fP is the type of the buckets in the layer
(e.g. "rack"). Each bucket name will be built by appending a unique
number to the \fBbucket\fP string (e.g. "rack0", "rack1"...).
.sp
The second component is the type of bucket: \fBstraw\fP should be used
most of the time.
.sp
The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.
.SH EXAMPLE
.sp
Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.
.sp
To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-o crushmap \-\-build \-\-num_osds 320 \e
    node straw 4 \e
    rack straw 20 \e
    row straw 2 \e
    root straw 0
# id weight type name reweight
\-87 320 root root
\-85 160 row row0
\-81 80 rack rack0
\-1 4 node node0
0 1 osd.0 1
1 1 osd.1 1
2 1 osd.2 1
3 1 osd.3 1
\-2 4 node node1
4 1 osd.4 1
5 1 osd.5 1
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
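.sp
The layer arithmetic behind this example can be checked with a few
lines of Python. This is a sketch under stated assumptions, not the
crushtool implementation: it assumes each layer groups the items of the
previous layer into buckets of at most \fBsize\fP items, that a size of
zero puts everything into a single bucket, and \fBplan_layers\fP is a
hypothetical helper name.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def plan_layers(num_osds, layers):
    """Return (type, bucket_count) per layer for a list of (type, size) pairs."""
    plan = []
    items = num_osds  # number of items to group at the current level
    for bucket_type, size in layers:
        # Assumption: size 0 means no limit, so one bucket holds everything.
        count = 1 if size == 0 else -(-items // size)  # ceiling division
        plan.append((bucket_type, count))
        items = count  # the next layer groups the buckets just created
    return plan


layers = [("node", 4), ("rack", 20), ("row", 2), ("root", 0)]
plan = plan_layers(320, layers)
print(plan)  # prints [('node', 80), ('rack', 4), ('row', 2), ('root', 1)]
print(sum(count for _, count in plan))  # prints 87
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The total of 87 buckets is consistent with the ids in the output above
(\-1 for the first node, \-81 for the first rack, \-85 for the first row
and \-87 for the root), if bucket ids are assigned sequentially layer by
layer.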
.sp
CRUSH rulesets are created so the generated crushmap can be
tested. They are the same rulesets as the ones created by default when
creating a new Ceph cluster. They can be further edited with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# decompile
crushtool \-d crushmap \-o map.txt

# edit
emacs map.txt

# recompile
crushtool \-c map.txt \-o crushmap
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AVAILABILITY
.sp
\fBcrushtool\fP is part of Ceph, a massively scalable, open-source, distributed storage system. Please
refer to the Ceph documentation at \fI\%http://ceph.com/docs\fP for more
information.
.SH SEE ALSO
.sp
\fBceph\fP(8),
\fBosdmaptool\fP(8)
.SH AUTHORS
.sp
John Wilkins, Sage Weil, Loic Dachary
.SH COPYRIGHT
2010-2014, Inktank Storage, Inc. and contributors. Licensed under Creative Commons BY-SA
.\" Generated by docutils manpage writer.
.