Ceph is a distributed object, block, and file storage platform
Go to file
Patrick Donnelly a747aeac13
log: avoid heap allocations for most log entries
Each log Entry now exists on the stack and uses a large (4k) buffer for its log
stream. This Entry is std::move'd to the queues (std::vector and
boost::circular_buffer) in the Log, involving only memory copies in the general
case. There are two memory copies (std::move) for any given Entry, once in
Log::submit_entry and again in Log::_flush

In practice, this eliminates 100% of allocations outside of startup
allocations

I've run a simple experiment with the MDS that copies /usr/bin to CephFS. I got
measurements for the number of allocations from the heap profiler and the
profile of CPU usage in the MDS.

** Before this patch **

    == Heap profile: ==

    $ google-pprof --alloc_objects --text bin/ceph-mds out/mds.a.profile.0001.heap
    Total: 1105048 objects
      433329  39.2%  39.2%   433329  39.2% ceph::logging::Log::create_entry
      209311  18.9%  58.2%   209311  18.9% __gnu_cxx::__aligned_membuf::_M_addr (inline)
      192963  17.5%  75.6%   192963  17.5% __gnu_cxx::new_allocator::allocate (inline)
       61774   5.6%  81.2%    61774   5.6% std::__cxx11::basic_string::_M_mutate
       37689   3.4%  84.6%    37689   3.4% ceph::buffer::raw_combined::create (inline)
       22773   2.1%  86.7%    22773   2.1% mempool::pool_allocator::allocate (inline)
       17761   1.6%  88.3%    20523   1.9% std::pair::pair (inline)
       15795   1.4%  89.7%    15797   1.4% std::swap (inline)
       11011   1.0%  90.7%   130061  11.8% std::__cxx11::list::_M_insert (inline)
       10822   1.0%  91.7%    10822   1.0% std::__cxx11::basic_string::reserve
        9108   0.8%  92.5%    32721   3.0% __gnu_cxx::new_allocator::construct (inline)
        8608   0.8%  93.3%     8610   0.8% std::_Deque_base::_M_initialize_map (inline)
        7694   0.7%  94.0%     7694   0.7% std::__cxx11::basic_string::_M_capacity (inline)
        6160   0.6%  94.5%     6160   0.6% Journaler::wrap_finisher
        6084   0.6%  95.1%    70892   6.4% std::map::operator[] (inline)
        5347   0.5%  95.6%     5347   0.5% MutationImpl::add_projected_fnode
        4381   0.4%  96.0%     7706   0.7% mempool::pool_allocator::construct (inline)
        3588   0.3%  96.3%   182966  16.6% Locker::_do_cap_update
        3049   0.3%  96.6%     5280   0.5% std::__shared_count::__shared_count (inline)
        3043   0.3%  96.9%     3043   0.3% MDSLogContextBase::MDSLogContextBase (inline)
        3038   0.3%  97.1%    14763   1.3% std::__uninitialized_copy::__uninit_copy (inline)

   So approximately 430k heap allocations for Entry were created! The
   basic_string::_M_mutate is also another allocation via PrebufferedStreambuf

	== Profile data ==

	Selecting interesting functions

Samples: 798K of event 'cycles:pp', Event count (approx
  Children      Self  Com  Shared Object        Symbol
+    1.04%     1.04%  log  libceph-common.so.0  [.] ceph::logging::Log::_flush
+    0.05%     0.05%  log  libceph-common.so.0  [.] ceph::logging::Log::flush
+    0.00%     0.00%  log  libceph-common.so.0  [.] ceph::logging::Log::_log_safe_write
+    0.00%     0.00%  log  libceph-common.so.0  [.] ceph::logging::Log::entry
+    0.00%     0.00%  log  libceph-common.so.0  [.] ceph::logging::Log::_flush_logbuf
...

  Children      Self  Command         Shared Object        Symbol
+    3.69%     0.00%  safe_timer      libceph-common.so.0  [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+    0.53%     0.00%  ms_dispatch     libceph-common.so.0  [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+    0.13%     0.00%  fn_anonymous    libceph-common.so.0  [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+    0.00%     0.00%  ms_dispatch     libceph-common.so.0  [.] CachedPrebufferedStreambuf::create
+    0.00%     0.00%  fn_anonymous    libceph-common.so.0  [.] CachedPrebufferedStreambuf::create

  Children      Self  Command         Shared Object        Symbol
+    0.07%     0.07%  fn_anonymous    libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  ms_dispatch     libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  ms_dispatch     ceph-mds             [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+    0.00%     0.00%  md_submit       libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  fn_anonymous    ceph-mds             [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+    0.00%     0.00%  safe_timer      libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  mds_rank_progr  libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  mds_rank_progr  ceph-mds             [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+    0.00%     0.00%  msgr-worker-0   libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  msgr-worker-2   libceph-common.so.0  [.] ceph::logging::Log::create_entry
+    0.00%     0.00%  md_submit       ceph-mds             [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+    0.00%     0.00%  msgr-worker-1   libceph-common.so.0  [.] ceph::logging::Log::create_entry

  Children      Self  Command         Shared Object        Symbol
+    8.29%     0.00%  ms_dispatch     libstdc++.so.6.0.24  [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+    7.55%     1.46%  ms_dispatch     libstdc++.so.6.0.24  [.] std::ostream::_M_insert<long>
+    3.87%     0.00%  fn_anonymous    libstdc++.so.6.0.24  [.] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+    2.92%     0.00%  md_submit       libstdc++.so.6.0.24  [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+    2.38%     0.00%  fn_anonymous    libstdc++.so.6.0.24  [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+    2.22%     2.22%  fn_anonymous    libstdc++.so.6.0.24  [.] std::ostream::sentry::sentry
+    1.89%     0.13%  fn_anonymous    libstdc++.so.6.0.24  [.] std::__ostream_insert<char, std::char_traits<char> >
+    0.71%     0.00%  ms_dispatch     libstdc++.so.6.0.24  [.] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+    0.39%     0.06%  fn_anonymous    libstdc++.so.6.0.24  [.] std::ostream::_M_insert<long>
+    0.29%     0.21%  ms_dispatch     libstdc++.so.6.0.24  [.] std::__ostream_insert<char, std::char_traits<char> >
+    0.27%     0.27%  ms_dispatch     libstdc++.so.6.0.24  [.] std::ostream::sentry::~sentry
+    0.27%     0.27%  fn_anonymous    libstdc++.so.6.0.24  [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>
+    0.22%     0.22%  ms_dispatch     libstdc++.so.6.0.24  [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn
+    0.20%     0.20%  ms_dispatch     libstdc++.so.6.0.24  [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>
+    0.15%     0.15%  fn_anonymous    libstdc++.so.6.0.24  [.] std::ostream::sentry::~sentry
+    0.14%     0.14%  ms_dispatch     libstdc++.so.6.0.24  [.] std::ostream::sentry::sentry
+    0.13%     0.00%  ms_dispatch     libstdc++.so.6.0.24  [.] std::ostream::_M_insert<unsigned long>
+    0.13%     0.13%  fn_anonymous    libstdc++.so.6.0.24  [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn
+    0.00%     0.00%  fn_anonymous    libstdc++.so.6.0.24  [.] std::ostream::_M_insert<unsigned long>
+    0.00%     0.00%  ms_dispatch     libstdc++.so.6.0.24  [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>

    And the unittest_log time:

    $ bin/unittest_log
    [==========] Running 15 tests from 1 test case
    [----------] Global test environment set-up
    [----------] 15 tests from Log
    [ RUN      ] Log.Simple
    [       OK ] Log.Simple (0 ms)
    [ RUN      ] Log.ReuseBad
    [       OK ] Log.ReuseBad (1 ms)
    [ RUN      ] Log.ManyNoGather
    [       OK ] Log.ManyNoGather (0 ms)
    [ RUN      ] Log.ManyGatherLog
    [       OK ] Log.ManyGatherLog (12 ms)
    [ RUN      ] Log.ManyGatherLogStringAssign
    [       OK ] Log.ManyGatherLogStringAssign (27 ms)
    [ RUN      ] Log.ManyGatherLogStringAssignWithReserve
    [       OK ] Log.ManyGatherLogStringAssignWithReserve (27 ms)
    [ RUN      ] Log.ManyGatherLogPrebuf
    [       OK ] Log.ManyGatherLogPrebuf (15 ms)
    [ RUN      ] Log.ManyGatherLogPrebufOverflow
    [       OK ] Log.ManyGatherLogPrebufOverflow (15 ms)
    [ RUN      ] Log.ManyGather
    [       OK ] Log.ManyGather (8 ms)
    [ RUN      ] Log.InternalSegv

    [WARNING] /home/pdonnell/cephfs-shell/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads
    [       OK ] Log.InternalSegv (8 ms)
    [ RUN      ] Log.LargeLog
    [       OK ] Log.LargeLog (43 ms)
    [ RUN      ] Log.TimeSwitch
    [       OK ] Log.TimeSwitch (1 ms)
    [ RUN      ] Log.TimeFormat
    [       OK ] Log.TimeFormat (0 ms)
    [ RUN      ] Log.Speed_gather
    [       OK ] Log.Speed_gather (1779 ms)
    [ RUN      ] Log.Speed_nogather
    [       OK ] Log.Speed_nogather (64 ms)
    [----------] 15 tests from Log (2000 ms total)

    [----------] Global test environment tear-down
    [==========] 15 tests from 1 test case ran. (2000 ms total)
    [  PASSED  ] 15 tests

** After Patch **

The StackStreamBuf uses 4k for its default buffer. This appears to be more than
reasonable for preventing allocations for logging

    == Heap profile: ==

	$ google-pprof --alloc_objects --text bin/ceph-mds out/mds.a.profile.0001.heap
	Total: 1052127 objects
      384957  36.6%  36.6%   384957  36.6% __gnu_cxx::new_allocator::allocate (inline)
      274720  26.1%  62.7%   274720  26.1% __gnu_cxx::__aligned_membuf::_M_addr (inline)
       68496   6.5%  69.2%    68496   6.5% std::__cxx11::basic_string::_M_mutate
       44140   4.2%  73.4%    51828   4.9% std::pair::pair (inline)
       43091   4.1%  77.5%    43091   4.1% ceph::buffer::raw_combined::create (inline)
       27706   2.6%  80.1%   236407  22.5% std::__cxx11::list::_M_insert (inline)
       25699   2.4%  82.6%    25699   2.4% std::__cxx11::basic_string::reserve
       23183   2.2%  84.8%    23183   2.2% mempool::pool_allocator::allocate (inline)
       19466   1.9%  86.6%    81716   7.8% __gnu_cxx::new_allocator::construct (inline)
       17606   1.7%  88.3%    17606   1.7% std::__cxx11::basic_string::_M_capacity (inline)
       16879   1.6%  89.9%    16881   1.6% std::swap (inline)
        8572   0.8%  90.7%     8574   0.8% std::_Deque_base::_M_initialize_map (inline)
        8477   0.8%  91.5%    11808   1.1% mempool::pool_allocator::construct (inline)
        6166   0.6%  92.1%     6166   0.6% Journaler::wrap_finisher
        6080   0.6%  92.7%   134975  12.8% std::map::operator[] (inline)
        6079   0.6%  93.3%     6079   0.6% MutationImpl::add_projected_fnode

	== Profile data ==

    Samples: 62K of event 'cycles:u', Event count (approx.)
      Overhead  Command         Shared Object         Symbol
    +    5.91%  log             libc-2.23.so          [.] vfprintf
    +    5.75%  ms_dispatch     libstdc++.so.6.0.24   [.] std::__ostream_insert<char, std::char_traits<char> >
    +    4.59%  ms_dispatch     ceph-mds              [.] StackStringBuf<4096ul>::xsputn
    +    4.26%  ms_dispatch     libc-2.23.so          [.] __memmove_ssse3_back
    +    4.07%  log             libceph-common.so.0   [.] ceph::logging::Log::_flush
    +    2.48%  ms_dispatch     libstdc++.so.6.0.24   [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<l
    +    2.13%  fn_anonymous    libstdc++.so.6.0.24   [.] std::__ostream_insert<char, std::char_traits<char> >
    +    2.09%  ms_dispatch     ceph-mds              [.] CDir::check_rstats
    +    2.06%  ms_dispatch     libstdc++.so.6.0.24   [.] std::ostream::sentry::sentry
    +    1.98%  ms_dispatch     libstdc++.so.6.0.24   [.] std::ostream::sentry::~sentry
    +    1.87%  log             libc-2.23.so          [.] __strcpy_sse2_unaligned
    +    1.60%  fn_anonymous    ceph-mds              [.] StackStringBuf<4096ul>::xsputn
    +    1.46%  log             libc-2.23.so          [.] _IO_default_xsputn
    +    1.45%  log             libc-2.23.so          [.] _itoa_word
    +    1.43%  fn_anonymous    libc-2.23.so          [.] __memmove_ssse3_back
    +    1.40%  ms_dispatch     libstdc++.so.6.0.24   [.] std::ostream::_M_insert<long>
    +    0.98%  fn_anonymous    libstdc++.so.6.0.24   [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<l
    +    0.89%  ms_dispatch     libstdc++.so.6.0.24   [.] 0x
    +    0.88%  ms_dispatch     libstdc++.so.6.0.24   [.] std::_Rb_tree_increment

    And the unittest_log time:

    $ bin/unittest_log
    [==========] Running 13 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 13 tests from Log
    [ RUN      ] Log.Simple
    [       OK ] Log.Simple (1 ms)
    [ RUN      ] Log.ReuseBad
    [       OK ] Log.ReuseBad (0 ms)
    [ RUN      ] Log.ManyNoGather
    [       OK ] Log.ManyNoGather (0 ms)
    [ RUN      ] Log.ManyGatherLog
    [       OK ] Log.ManyGatherLog (83 ms)
    [ RUN      ] Log.ManyGatherLogStringAssign
    [       OK ] Log.ManyGatherLogStringAssign (79 ms)
    [ RUN      ] Log.ManyGatherLogStackSpillover
    [       OK ] Log.ManyGatherLogStackSpillover (81 ms)
    [ RUN      ] Log.ManyGather
    [       OK ] Log.ManyGather (80 ms)
    [ RUN      ] Log.InternalSegv

    [WARNING] /home/pdonnell/ceph/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads.
    [       OK ] Log.InternalSegv (7 ms)
    [ RUN      ] Log.LargeLog
    [       OK ] Log.LargeLog (55 ms)
    [ RUN      ] Log.TimeSwitch
    [       OK ] Log.TimeSwitch (4 ms)
    [ RUN      ] Log.TimeFormat
    [       OK ] Log.TimeFormat (1 ms)
    [ RUN      ] Log.Speed_gather
    [       OK ] Log.Speed_gather (1441 ms)
    [ RUN      ] Log.Speed_nogather
    [       OK ] Log.Speed_nogather (63 ms)
    [----------] 13 tests from Log (1895 ms total)

    [----------] Global test environment tear-down
    [==========] 13 tests from 1 test case ran. (1895 ms total)
    [  PASSED  ] 13 tests.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-09-14 19:46:08 -07:00
.github
admin
alpine
bin
ceph-erasure-code-corpus@2d7d78b9cc
ceph-object-corpus@e32bf8ca3d
cmake/modules
debian
doc Merge PR #23824 into master 2018-09-11 10:14:20 -07:00
etc
examples
fusetrace
keys
man
mirroring
monitoring/grafana
qa Merge pull request #23497 from noahdesu/insights 2018-09-11 15:56:10 +01:00
selinux
share
src log: avoid heap allocations for most log entries 2018-09-14 19:46:08 -07:00
sudoers.d
systemd
udev
.gitattributes
.githubmap
.gitignore
.gitmodule_mirrors
.gitmodules
.mailmap
.organizationmap
.peoplemap
AUTHORS
ceph.spec.in rpm: Fix Fedora error "No matching package to install: 'Cython3'" 2018-09-07 13:32:08 +10:00
CMakeLists.txt
CodingStyle
CONTRIBUTING.rst
COPYING
COPYING-GPL2
COPYING-LGPL2.1
do_cmake.sh
do_freebsd.sh
doc_deps.deb.txt
Doxyfile
install-deps.sh install-deps: s/openldap-client/openldap24-client/ 2018-09-04 23:42:37 +08:00
make-apk.sh
make-debs.sh
make-dist
make-srpm.sh
PendingReleaseNotes osd/PrimaryLogPG: s/list_missing/list_unfound/ 2018-09-06 09:52:20 +08:00
pom.xml
README.aix
README.alpine.md
README.FreeBSD
README.md
README.solaris
README.xio
run-make-check.sh
SubmittingPatches.rst

Ceph - a scalable distributed storage system

Please see http://ceph.com/ for current info.

Contributing Code

Most of Ceph is licensed under the LGPL version 2.1. Some miscellaneous code is under BSD-style license or is public domain. The documentation is licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0). There are a handful of headers included here that are licensed under the GPL. Please see the file COPYING for a full inventory of licenses by file.

Code contributions must include a valid "Signed-off-by" acknowledging the license for the modified or contributed file. Please see the file SubmittingPatches.rst for details on what that means and on how to generate and submit patches.

We do not require assignment of copyright to contribute code; code is contributed under the terms of the applicable license.

Checking out the source

You can clone from github with

git clone git@github.com:ceph/ceph

or, if you are not a github user,

git clone git://github.com/ceph/ceph

Ceph contains many git submodules that need to be checked out with

git submodule update --init --recursive

Build Prerequisites

The list of Debian or RPM packages dependencies can be installed with:

./install-deps.sh

Building Ceph

Note that these instructions are meant for developers who are compiling the code for development and testing. To build binaries suitable for installation we recommend you build deb or rpm packages, or refer to the ceph.spec.in or debian/rules to see which configuration options are specified for production builds.

Prerequisite: CMake 2.8.12

Build instructions:

./do_cmake.sh
cd build
make

This assumes you make your build dir a subdirectory of the ceph.git checkout. If you put it elsewhere, just replace .. in do_cmake.sh with a correct path to the checkout. Any additional CMake args can be specified setting ARGS before invoking do_cmake. See cmake options for more details. Eg.

ARGS="-DCMAKE_C_COMPILER=gcc-7" ./do_cmake.sh

To build only certain targets use:

make [target name]

To install:

make install

CMake Options

If you run the cmake command by hand, there are many options you can set with "-D". For example the option to build the RADOS Gateway is defaulted to ON. To build without the RADOS Gateway:

cmake -DWITH_RADOSGW=OFF [path to top level ceph directory]

Another example below is building with debugging and alternate locations for a couple of external dependencies:

cmake -DLEVELDB_PREFIX="/opt/hyperleveldb" -DOFED_PREFIX="/opt/ofed" \
-DCMAKE_INSTALL_PREFIX=/opt/accelio -DCMAKE_C_FLAGS="-O0 -g3 -gdwarf-4" \
..

To view an exhaustive list of -D options, you can invoke cmake with:

cmake -LH

If you often pipe make to less and would like to maintain the diagnostic colors for errors and warnings (and if your compiler supports it), you can invoke cmake with:

cmake -DDIAGNOSTICS_COLOR=always ..

Then you'll get the diagnostic colors when you execute:

make | less -R

Other available values for 'DIAGNOSTICS_COLOR' are 'auto' (default) and 'never'.

Building a source tarball

To build a complete source tarball with everything needed to build from source and/or build a (deb or rpm) package, run

./make-dist

This will create a tarball like ceph-$version.tar.bz2 from git. (Ensure that any changes you want to include in your working directory are committed to git.)

Running a test cluster

To run a functional test cluster,

cd build
make vstart        # builds just enough to run vstart
../src/vstart.sh --debug --new -x --localhost --bluestore
./bin/ceph -s

Almost all of the usual commands are available in the bin/ directory. For example,

./bin/rados -p rbd bench 30 write
./bin/rbd create foo --size 1000

To shut down the test cluster,

../src/stop.sh

To start or stop individual daemons, the sysvinit script can be used:

./bin/init-ceph restart osd.0
./bin/init-ceph stop

Running unit tests

To build and run all tests (in parallel using all processors), use ctest:

cd build
make
ctest -j$(nproc)

(Note: Many targets built from src/test are not run using ctest. Targets starting with "unittest" are run in make check and thus can be run with ctest. Targets starting with "ceph_test" can not, and should be run by hand.)

When failures occur, look in build/Testing/Temporary for logs.

To build and run all tests and their dependencies without other unnecessary targets in Ceph:

cd build
make check -j$(nproc)

To run an individual test manually, run ctest with -R (regex matching):

ctest -R [regex matching test name(s)]

(Note: ctest does not build the test it's running or the dependencies needed to run it)

To run an individual test manually and see all the tests output, run ctest with the -V (verbose) flag:

ctest -V -R [regex matching test name(s)]

To run an tests manually and run the jobs in parallel, run ctest with the -j flag:

ctest -j [number of jobs]

There are many other flags you can give ctest for better control over manual test execution. To view these options run:

man ctest

Building the Documentation

Prerequisites

The list of package dependencies for building the documentation can be found in doc_deps.deb.txt:

sudo apt-get install `cat doc_deps.deb.txt`

Building the Documentation

To build the documentation, ensure that you are in the top-level /ceph directory, and execute the build script. For example:

admin/build-doc