When an OSD shares a device between data and journal, the previous code
created the journal partition from the end of the device. sgdisk uses a
uint32_t to get the last sector, and with a large disk a uint32_t is
too small.
As a result, the journal partition is created backwards from a sector
in the middle of the disk, leaving space before and after it. The data
partition then uses whichever of these spaces is larger, and the
remainder is left unused.
This patch creates the journal partition from the start of the device
as a workaround.
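The truncation can be illustrated with a small sketch. It is not sgdisk's
code: the 3 TB disk size, the 512-byte sector size and the helper names are
assumptions chosen only to show how a last-sector value that needs more than
32 bits wraps around to a sector well inside the disk.

```python
SECTOR_SIZE = 512          # bytes; assumed logical sector size
UINT32_MAX = 2**32 - 1     # largest value a uint32_t can hold


def truncate_to_uint32(value):
    """Mimic storing a value in a uint32_t: keep only the low 32 bits."""
    return value & UINT32_MAX


def last_sector(disk_bytes, sector_size=SECTOR_SIZE):
    """Index of the last sector on a disk of the given size."""
    return disk_bytes // sector_size - 1


disk_bytes = 3 * 10**12                    # hypothetical 3 TB drive
real_last = last_sector(disk_bytes)        # needs 33 bits
seen_last = truncate_to_uint32(real_last)  # what a 32-bit variable would hold

print(real_last, seen_last)
```

With these assumed numbers the truncated value lands roughly a quarter of the
way into the disk, which matches the symptom described above.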
Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
When starting we often loop over many daemon instances. Currently we stop
on the first error and do not try to start other daemons.
Instead, try them all, but return a failure if anything did not start.
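The intended control flow can be sketched as follows. This is an illustration
in Python, not the init script itself; start_all, start_one and fake_start
are hypothetical names.

```python
def start_all(daemons, start_one):
    """Try to start every daemon; report failure if any did not start.

    start_one is a caller-supplied function returning True on success.
    Returns (status, failed) where status is 0 only if all daemons started.
    """
    failed = []
    for name in daemons:
        try:
            started = start_one(name)
        except Exception:
            started = False
        if not started:
            # Remember the failure, but keep going with the other daemons.
            failed.append(name)
    status = 1 if failed else 0
    return status, failed


# Stand-in starter in which "osd.1" refuses to start.
attempted = []


def fake_start(name):
    attempted.append(name)
    return name != "osd.1"


status, failed = start_all(["mon.a", "osd.0", "osd.1", "osd.2"], fake_start)
print(status, failed)
```

Here osd.2 is still attempted after osd.1 fails, and the overall status is
nonzero.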
Fixes: #2545
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
Call observers so that the logging infrastructure gets initialized and we
start logging. Otherwise, unless a default log setting has been modified,
we won't start logging until we daemonize, and we won't get the nice
version banner in the log file.
Unlike the previous attempt to fix this (a3091774), we do this after all
of the lockdep initialization has completed.
Signed-off-by: Sage Weil <sage@inktank.com>
This allows us to return the appropriate overall health status on
Monitor::get_health().
Fixes: 4574
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
This involves three pieces:
For intrusive_ptr type references, we use TrackedIntPtr instead. This
uses get_with_id and put_with_id to associate an id and backtrace with
each particular ref instance.
For refs taken via direct calls to get() and put(), get and put now
require a tag string. The PG tracks individual ref counts for each tag
as well as the total.
Finally, PGs register/unregister themselves on construction/destruction
with OSDService.
As a result, on shutdown, we can check for live pgs and determine where
the references are held.
This behavior is compiled out by default, but can be included with the
--enable-pgrefdebugging flag.
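The tagged get/put and registration pieces can be sketched roughly as below.
This is a Python illustration of the idea, not the C++ implementation: the
class names, the registry and the example tags are hypothetical, and the
TrackedIntPtr id/backtrace tracking is not modelled.

```python
from collections import Counter


class LiveRegistry:
    """Hypothetical stand-in for the service that objects register with."""

    def __init__(self):
        self.live = set()

    def add(self, obj):
        self.live.add(obj)

    def remove(self, obj):
        self.live.discard(obj)


class TaggedRef:
    """Ref-counted object that tracks a count per tag as well as the total."""

    def __init__(self, name, registry):
        self.name = name
        self.total = 0
        self.by_tag = Counter()
        self.registry = registry
        registry.add(self)          # register on construction

    def get(self, tag):
        self.by_tag[tag] += 1
        self.total += 1

    def put(self, tag):
        if self.by_tag[tag] <= 0:
            raise ValueError("put without matching get for tag %r" % tag)
        self.by_tag[tag] -= 1
        self.total -= 1
        if self.total == 0:
            # Last ref dropped: simulate destruction by unregistering.
            self.registry.remove(self)

    def held_tags(self):
        """Tags that still hold at least one reference."""
        return {tag: n for tag, n in self.by_tag.items() if n > 0}


registry = LiveRegistry()
pg = TaggedRef("pg 1.0", registry)
pg.get("scrub")
pg.get("recovery")
pg.get("recovery")
pg.put("scrub")

# At "shutdown", list whatever is still live and which tags hold it.
leaks = {obj.name: obj.held_tags() for obj in registry.live}
print(leaks)
```

In this sketch the report shows that the remaining references were taken
under the "recovery" tag, which is the kind of answer the per-tag counts are
meant to provide.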
Signed-off-by: Samuel Just <sam.just@inktank.com>
Clarify the description; this is the subtree type that we won't mark out
if it is all down, but anything less than it will be.
Signed-off-by: Sage Weil <sage@inktank.com>
In EMetaBlob::add_root(), we should log the projected root xattrs
instead of original ones to reflect xattr changes.
Signed-off-by: Kuan Kai Chiu <big.chiu@bigtera.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
MDS crashes while journaling dirty root inode in handle_client_setxattr
and handle_client_removexattr. We should use journal_dirty_inode to
safely log root inode here.
Signed-off-by: Kuan Kai Chiu <big.chiu@bigtera.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Add libboost-system-dev (bug #4725).
Add hdparm to rpm installation requirements. The hdparm
command is used to determine if write-caching is enabled on
the journal device.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Say a service establishes that it will only keep 500 versions once a given
condition X is true. Now say that condition X only becomes true after
the service has committed some 800 versions.
Once we decide to trim, this service would trim all 300 surplus versions
in one go. After that, each committed version would also trim the
previous version.
Trimming an unbounded number of versions is not a good practice
as it will generate bigger transactions (thus a greater workload on
leveldb) and therefore bigger messages too.
Constantly trimming versions implies more frequent accesses to leveldb,
and keeping around a couple more versions won't hurt us in any significant
way, so let us put off trimming unless we go over a predefined minimum.
This patch adds two new options:
  paxos service trim min - minimum number of versions to trigger a trim
                           (default: 30, 0 disables it)
  paxos service trim max - maximum number of versions to trim during a
                           single proposal
                           (default: 50, 0 disables it)
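How the two options interact can be sketched as below. This is a Python
illustration, not the monitor code: versions_to_trim is a hypothetical
helper, and treating a surplus equal to the minimum as enough to trigger a
trim is an assumption.

```python
def versions_to_trim(surplus, trim_min=30, trim_max=50):
    """Number of versions to trim in one proposal.

    surplus  -- versions held beyond what the service wants to keep
    trim_min -- do not trim while surplus is below this (0 disables the check)
    trim_max -- never trim more than this at once (0 disables the cap)
    """
    if surplus <= 0:
        return 0
    if trim_min > 0 and surplus < trim_min:
        # Too few surplus versions to be worth a trim; put it off.
        return 0
    if trim_max > 0:
        return min(surplus, trim_max)
    return surplus


# The example above: 800 committed, 500 kept, so 300 surplus versions.
print(versions_to_trim(800 - 500))
```

With the defaults, the 300 surplus versions are removed 50 at a time over
several proposals instead of in one go, and a surplus of only a few versions
triggers no trim at all.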
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>