.\" Man page generated from reStructuredText.
|
|
|
|
.
|
|
|
|
.TH "CRUSHTOOL" "8" "December 09, 2013" "dev" "Ceph"
|
.SH NAME
|
|
|
|
crushtool \- CRUSH map manipulation tool
|
.
|
|
|
|
.nr rst2man-indent-level 0
|
|
|
|
.
|
|
|
|
.de1 rstReportMargin
|
|
|
|
\\$1 \\n[an-margin]
|
|
|
|
level \\n[rst2man-indent-level]
|
|
|
|
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
|
|
-
|
|
|
|
\\n[rst2man-indent0]
|
|
|
|
\\n[rst2man-indent1]
|
|
|
|
\\n[rst2man-indent2]
|
|
|
|
..
|
|
|
|
.de1 INDENT
|
|
|
|
.\" .rstReportMargin pre:
|
|
|
|
. RS \\$1
|
|
|
|
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
|
|
|
|
. nr rst2man-indent-level +1
|
|
|
|
.\" .rstReportMargin post:
|
|
|
|
..
|
|
|
|
.de UNINDENT
|
|
|
|
. RE
|
|
|
|
.\" indent \\n[an-margin]
|
|
|
|
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
|
|
.nr rst2man-indent-level -1
|
|
|
|
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
|
|
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
|
|
|
|
..
|
|
|
|
.
|
.SH SYNOPSIS
|
.nf
|
\fBcrushtool\fP ( \-d \fImap\fP | \-c \fImap.txt\fP | \-\-build \-\-num_osds \fInumosds\fP
|
\fIlayer1\fP \fI\&...\fP | \-\-test ) [ \-o \fIoutfile\fP ]
|
.fi
|
|
|
|
.sp
|
.SH DESCRIPTION
|
.sp
|
|
|
|
\fBcrushtool\fP is a utility that lets you create, compile, and
|
|
|
|
decompile CRUSH map files.
|
|
|
|
.sp
|
|
|
|
CRUSH is a pseudo\-random data distribution algorithm that efficiently
|
|
|
|
maps input values (typically data objects) across a heterogeneous,
|
|
|
|
hierarchically structured device map. The algorithm was originally
|
|
|
|
described in detail in the following paper (although it has evolved
|
|
|
|
some since then):
|
|
|
|
.INDENT 0.0
|
|
|
|
.INDENT 3.5
|
|
|
|
\fI\%http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf\fP
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
The tool has four modes of operation.
|
.INDENT 0.0
|
.TP
|
.B \-c map.txt
|
|
|
|
will compile a plaintext map.txt into a binary map file.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
.TP
|
.B \-d map
|
|
|
|
will take the compiled map and decompile it into a plaintext source
|
|
|
|
file, suitable for editing.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
.TP
|
.B \-\-build \-\-num_osds {num\-osds} layer1 ...
|
will create a relatively generic map with the given layer
|
|
|
|
structure. See below for examples.
|
|
|
|
.UNINDENT
|
.INDENT 0.0
|
|
|
|
.TP
|
.B \-\-test
|
|
|
|
will perform a dry run of a CRUSH mapping for a range of input object
|
|
|
|
names. See \fBcrushtool \-\-help\fP for more information.
|
.UNINDENT
|
.SH RUNNING TESTS
|
|
|
|
.sp
|
|
|
|
The test mode will use the input crush map (as specified with
\fB\-i map\fP) and perform a dry run of CRUSH mapping or random
placement (if \fB\-\-simulate\fP is set). On completion, two kinds of
reports can be created. The \fB\-\-show\-...\fP options output
human\-readable information on stderr. The \fB\-\-output\-csv\fP option
creates CSV files that are documented by the \fB\-\-help\-output\fP
option.
|
.INDENT 0.0
|
.TP
|
.B \-\-show\-statistics
|
|
|
|
for each rule display the mapping of each object. For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
CRUSH rule 1 x 24 [11,6]
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that object \fB24\fP is mapped to devices \fB[11,6]\fP by rule
|
|
|
|
\fB1\fP\&. At the end of the mapping details, a summary of the
|
|
|
|
distribution is displayed. For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
rule 1 (metadata) num_rep 5 result size == 5: 1024/1024
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that rule \fB1\fP which is named \fBmetadata\fP successfully
|
|
|
|
mapped \fB1024\fP objects to \fBresult size == 5\fP devices when trying
|
|
|
|
to map them to \fBnum_rep 5\fP replicas. When it fails to provide the
|
|
|
|
required mapping, presumably because the number of \fBtries\fP must
|
|
|
|
be increased, a breakdown of the failures is displayed. For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
|
|
|
|
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
|
|
|
|
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that although \fBnum_rep 10\fP replicas were required, \fB4\fP
out of \fB1024\fP objects (\fB4/1024\fP) were mapped to only
\fBresult size == 8\fP devices.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-show\-bad\-mappings
|
|
|
|
display which objects failed to be mapped to the required number of
|
|
|
|
devices. For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that when rule \fB1\fP was required to map \fB7\fP devices, it
|
|
|
|
could only map six: \fB[8,10,2,11,6,9]\fP\&.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-show\-utilization
|
|
|
|
display the expected and actual utilisation for each device, for
|
|
|
|
each number of replicas. For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
device 0: stored : 951 expected : 853.333
|
|
|
|
device 1: stored : 963 expected : 853.333
|
|
|
|
\&...
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that device \fB0\fP stored \fB951\fP objects and was expected to store about \fB853\fP\&.
|
|
|
|
Implies \fB\-\-show\-statistics\fP\&.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-show\-utilization\-all
|
|
|
|
displays the same as \fB\-\-show\-utilization\fP but does not suppress
|
|
|
|
output when the weight of a device is zero.
|
|
|
|
Implies \fB\-\-show\-statistics\fP\&.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-show\-choose\-tries
|
|
|
|
display how many attempts were needed to find a device mapping.
|
|
|
|
For instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
0: 95224
|
|
|
|
1: 3745
|
|
|
|
2: 2225
|
|
|
|
\&..
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
shows that \fB95224\fP mappings succeeded without retries, \fB3745\fP
|
|
|
|
mappings succeeded with one retry, etc. There are as many rows
|
|
|
|
as the value of the \fB\-\-set\-choose\-total\-tries\fP option.
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-output\-csv
|
|
|
|
create CSV files (in the current directory) containing information
|
|
|
|
documented by \fB\-\-help\-output\fP\&. The files are named after the rule
|
|
|
|
used when collecting the statistics. For instance, if the rule
|
|
|
|
metadata is used, the CSV files will be:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
metadata\-absolute_weights.csv
|
|
|
|
metadata\-device_utilization.csv
|
|
|
|
\&...
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
The first line of the file briefly explains the column layout. For
|
|
|
|
instance:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
metadata\-absolute_weights.csv
|
|
|
|
Device ID, Absolute Weight
|
|
|
|
0,1
|
|
|
|
\&...
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.INDENT 0.0
|
|
|
|
.TP
|
|
|
|
.B \-\-output\-name NAME
|
|
|
|
prepend \fBNAME\fP to the file names generated when \fB\-\-output\-csv\fP
|
|
|
|
is specified. For instance, \fB\-\-output\-name FOO\fP will create
|
|
|
|
files:
|
|
|
|
.INDENT 7.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
FOO\-metadata\-absolute_weights.csv
|
|
|
|
FOO\-metadata\-device_utilization.csv
|
|
|
|
\&...
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
The \fB\-\-set\-...\fP options can be used to modify the tunables of the
|
|
|
|
input crush map, provided the \fB\-\-enable\-unsafe\-tunables\fP option is
|
|
|
|
also set to disable the safeguard. The input crush map is modified in
|
|
|
|
memory. For example:
|
|
|
|
.INDENT 0.0
|
|
|
|
.INDENT 3.5
|
|
|
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings
|
|
|
|
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
|
|
|
|
.ft P
|
|
|
|
.fi
|
|
|
|
.UNINDENT
|
|
|
|
.UNINDENT
|
|
|
|
.sp
|
|
|
|
could be fixed by increasing the \fBchoose\-total\-tries\fP as follows:
|
|
|
|
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings \-\-enable\-unsafe\-tunables \-\-set\-choose\-total\-tries 500
.ft P
.fi
.UNINDENT
.UNINDENT
|
.SH BUILDING A MAP
|
.sp
|
|
|
|
The build mode will generate relatively generic hierarchical maps. The
|
|
|
|
first argument simply specifies the number of devices (leaves) in the
|
|
|
|
CRUSH hierarchy. Each layer describes how the layer (or raw devices)
|
|
|
|
preceding it should be grouped.
|
|
|
|
.sp
|
|
|
|
Each layer consists of:
|
.INDENT 0.0
|
|
|
|
.INDENT 3.5
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
name ( uniform | list | tree | straw ) size
|
|
|
|
.ft P
|
|
|
|
.fi
|
.UNINDENT
|
|
|
|
.UNINDENT
|
.sp
|
|
|
|
The first element is the name for the elements in the layer
(e.g. "rack"). Each element\(aqs name is formed by appending a number
to the provided name.
|
|
|
|
.sp
|
|
|
|
The second component is the type of CRUSH bucket.
|
|
|
|
.sp
|
|
|
|
The third component is the maximum size of the bucket. If the size is
|
|
|
|
0, a single bucket will be generated that includes everything in the
|
|
|
|
preceding layer.
|
.SH EXAMPLE
|
.sp
|
Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
|
|
|
|
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
|
|
|
|
allows us to deploy 320 Ceph OSD Daemons. Let\(aqs assume a 42U rack with 2U nodes,
|
|
|
|
leaving an extra 2U for a rack switch.
|
|
|
|
.sp
|
|
|
|
To reflect our hierarchy of devices, nodes, racks and rows, we would execute
|
|
|
|
the following:
|
.INDENT 0.0
|
|
|
|
.INDENT 3.5
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
crushtool \-o crushmap \-\-build \-\-num_osds 320 node straw 4 rack straw 20 row straw 2
|
.ft P
|
|
|
|
.fi
|
.UNINDENT
|
|
|
|
.UNINDENT
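.sp
As a rough check of the layer arithmetic in the command above, the
following shell sketch computes how many buckets each layer would
contain. It assumes every layer divides evenly into the one below it,
which holds for this example. The variable names are illustrative only
and are not part of \fBcrushtool\fP\&.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# Illustrative only: bucket counts for the example with 320 devices.
# Assumes each layer size divides the layer below it evenly.
osds=320
nodes=$((osds / 4))    # node straw 4: 4 devices per node
racks=$((nodes / 20))  # rack straw 20: 20 nodes per rack
rows=$((racks / 2))    # row straw 2: 2 racks per row
echo "nodes=$nodes racks=$racks rows=$rows"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Run as written, the sketch prints \fBnodes=80 racks=4 rows=2\fP,
matching the 80 nodes, four racks and two rows described above.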
|
.sp
|
|
|
|
To adjust the default (generic) mapping rules, we can run:
|
.INDENT 0.0
|
|
|
|
.INDENT 3.5
|
.sp
|
|
|
|
.nf
|
|
|
|
.ft C
|
|
|
|
# decompile
|
crushtool \-d crushmap \-o map.txt
|
|
|
|
|
# edit
|
|
|
|
vi map.txt
|
|
|
|
|
|
|
|
# recompile
|
crushtool \-c map.txt \-o crushmap
|
.ft P
|
|
|
|
.fi
|
.UNINDENT
|
|
|
|
.UNINDENT
|
.SH AVAILABILITY
|
.sp
|
|
|
|
\fBcrushtool\fP is part of the Ceph distributed file system. Please
|
refer to the Ceph documentation at \fI\%http://ceph.com/docs\fP for more
|
information.
|
.SH SEE ALSO
|
.sp
|
|
|
|
\fBceph\fP(8),
|
|
|
|
\fBosdmaptool\fP(8)
|
|
|
|
.SH COPYRIGHT
|
2010-2013, Inktank Storage, Inc. and contributors. Licensed under Creative Commons BY-SA
|
.\" Generated by docutils manpage writer.
|
|
|
|
.
|