In 98e23980f4, is_readable() was changed to
call is_active(), but is_active() also checks is_bootstrapping(), so
the change altered the semantics.
As a result, we may fail PaxosService::is_readable() (due to bootstrapping)
and then try to call Paxos::wait_for_readable(). That will assert that
Paxos::is_readable() is false, but it will be true and we will crash.
Revert that part of the change, since the semantic change was not
intentional.
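To make the failure concrete, here is a heavily simplified C++ model of the two checks. The struct layouts and the `bootstrapping` flag are assumptions for illustration, not Ceph's actual classes. It shows how a PaxosService check that adds a bootstrapping test can report "not readable" while Paxos reports "readable", which is exactly the state in which waiting trips the assertion.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for Paxos.
struct Paxos {
  bool active = true;  // Paxos is active and can serve reads

  bool is_readable() const { return active; }

  // Only valid to wait while Paxos is *not* readable; otherwise the
  // assertion fires (the crash described above).
  void wait_for_readable() {
    assert(!is_readable());
    // A real implementation would queue a callback here.
  }
};

// Hypothetical stand-in for PaxosService.
struct PaxosService {
  Paxos &paxos;
  bool bootstrapping = false;  // the monitor is (re)bootstrapping

  explicit PaxosService(Paxos &p) : paxos(p) {}

  // Behavior after 98e23980f4: routed through is_active(), which also
  // checks bootstrapping, so it can be false while Paxos is readable.
  bool is_readable_via_is_active() const {
    return !bootstrapping && paxos.is_readable();
  }

  // Behavior after the revert: no bootstrapping check, so a "false" here
  // always means Paxos itself is not readable and waiting is safe.
  bool is_readable() const { return paxos.is_readable(); }
};
```

With `bootstrapping` set, the first check returns false while `paxos.is_readable()` returns true; calling `wait_for_readable()` at that point would abort.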
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The existing read_iterate takes a size_t for the length, which limits it
to 4GB on 32-bit machines. Instead, take a uint64_t length for the new
read_iterate2().
Return 0 instead of the number of bytes read; this makes the user-facing
API a bit simpler.
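A minimal sketch of chunked iteration over a 64-bit length, assuming a hypothetical `read_chunk` reader and a per-chunk callback. This is not librbd's implementation; it only illustrates why a `uint64_t` length avoids the 32-bit `size_t` limit and why returning 0 (rather than a byte count) keeps the contract simple.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the underlying image read: fill `buf` with
// `len` bytes starting at `off`; return 0 or a negative errno.
using ChunkReader = std::function<int(uint64_t off, uint64_t len, char *buf)>;

// Hypothetical user callback, called once per chunk; a negative return
// aborts the iteration and is passed back to the caller.
using ChunkCallback = std::function<int(uint64_t off, uint64_t len, const char *buf)>;

// Walk [off, off + len) in chunks of at most `chunk_size` bytes. The total
// length is a uint64_t, so it is not capped by a 32-bit size_t; only the
// per-chunk buffer must fit in memory. Returns 0 on success.
int read_iterate2_sketch(uint64_t off, uint64_t len, uint64_t chunk_size,
                         const ChunkReader &read_chunk, const ChunkCallback &cb) {
  if (chunk_size == 0) return -EINVAL;  // would never make progress
  std::vector<char> buf;
  while (len > 0) {
    uint64_t n = std::min(len, chunk_size);
    buf.resize(static_cast<size_t>(n));
    int r = read_chunk(off, n, buf.data());
    if (r < 0) return r;
    r = cb(off, n, buf.data());
    if (r < 0) return r;
    off += n;
    len -= n;
  }
  return 0;
}
```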
Fixes: #4665
Signed-off-by: Sage Weil <sage@inktank.com>
Keep returning the byte count from the internal method.
The read() method returns the number of bytes read, trimmed to the end of
the image; use the other read() variant (which uses aio_read()) to do this
instead of read_iterate().
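The trimming that read() performs amounts to clipping the request at the image size. A minimal sketch, using a hypothetical helper name rather than anything from librbd:

```cpp
#include <algorithm>
#include <cstdint>

// Number of bytes a read of [off, off + len) should return once it is
// trimmed to the end of an image of `image_size` bytes. A read that starts
// at or past the end returns 0 bytes.
uint64_t trimmed_read_len(uint64_t image_size, uint64_t off, uint64_t len) {
  if (off >= image_size) return 0;
  return std::min(len, image_size - off);
}
```

For a 100-byte image, a 20-byte read at offset 90 returns 10 bytes.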
Signed-off-by: Sage Weil <sage@inktank.com>
On each election, we resend routed requests to the new leader (or
requeue for ourselves). Therefore, if we receive a forwarded request,
we should drop it on the floor if there is a new election. Add a field
in the PaxosServiceMessage struct to track which election epoch we
received the request in, and drop it in PaxosService::dispatch() if
that is in the past.
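A sketch of the stamp-and-drop logic. The type is a trimmed-down stand-in for PaxosServiceMessage, and the `forwarded` and `rx_election_epoch` field names are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down message carrying the new field.
struct PaxosServiceMessage {
  bool forwarded = false;          // arrived via another monitor
  uint64_t rx_election_epoch = 0;  // election epoch when we received it
};

struct PaxosServiceSketch {
  uint64_t election_epoch = 0;  // our current election epoch

  // Record the election epoch at receipt time.
  void receive(PaxosServiceMessage &m) const { m.rx_election_epoch = election_epoch; }

  // Returns true if the message should be processed. A forwarded request
  // received in an earlier election is dropped, because the forwarding
  // monitor resends it to whichever monitor won the new election.
  bool dispatch(const PaxosServiceMessage &m) const {
    if (m.forwarded && m.rx_election_epoch < election_epoch) return false;
    return true;
  }
};
```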
Signed-off-by: Sage Weil <sage@inktank.com>
If we have forwarded requests and are then elected leader, requeue those
requests for ourselves so they are handled normally, and clear out the
routed_requests map.
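A minimal model of the requeue step, assuming a hypothetical `MonitorSketch` in which requests are plain strings keyed by a routing id:

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <string>

struct MonitorSketch {
  std::map<uint64_t, std::string> routed_requests;  // forwarded, awaiting reply
  std::deque<std::string> waiting_for_dispatch;      // local dispatch queue

  // Called when this monitor wins an election.
  void win_election() {
    // Requeue every forwarded request for ourselves, in routing-id order...
    for (const auto &entry : routed_requests)
      waiting_for_dispatch.push_back(entry.second);
    // ...and drop the routes: there is no remote leader left to reply.
    routed_requests.clear();
  }
};
```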
Signed-off-by: Sage Weil <sage@inktank.com>
Keep a reference to the source Connection* for forwarded requests. This
makes the reply path slightly cleaner, and will help us later.
Signed-off-by: Sage Weil <sage@inktank.com>
The journal no longer assumes corruption if it finds a valid entry
after an invalid entry. Instead, these tests now exercise corruption
detection via the header's committed_up_to member.
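A sketch of the detection rule as described, with hypothetical types: replay stops at the first invalid entry, and only stopping short of the header's `committed_up_to` counts as corruption. A valid entry found after an invalid one is no longer evidence of corruption on its own.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one journal entry found during replay.
struct EntrySketch {
  uint64_t seq;
  bool valid;  // header, footer, and checksum all check out
};

enum class ReplayResult { Ok, Corrupt };

// `committed_up_to` is the highest sequence the journal header says was
// committed; replay must reach at least that far.
ReplayResult replay_sketch(const std::vector<EntrySketch> &entries,
                           uint64_t committed_up_to) {
  uint64_t last_good = 0;
  for (const auto &e : entries) {
    if (!e.valid) break;  // treat the first invalid entry as end of journal
    last_good = e.seq;
  }
  return last_good < committed_up_to ? ReplayResult::Corrupt : ReplayResult::Ok;
}
```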
Fixes: #4792
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The old code would only do the push once per remote node (due to the
list in $pushed_to) but would reset $unique on each attempt. This would
break if a remote host was processed twice.
Fix by just skipping the $pushed_to optimization entirely.
Fixes: #4794
Reported-by: Andreas Friedrich <andreas.friedrich@ts.fujitsu.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Two fixes for CentOS 6.3 and other systems with udev versions
prior to 172. The persistent disk name based on the GPT UUID does
not exist on these systems, so use the by_path persistent name
instead for the journal symlink.
GPT label fields are not available for use in udev rules. Add a
ceph-disk-udev wrapper script that extracts the partition
type GUID from the label and calls ceph-disk-activate if it is
a Ceph GUID type. (Bug #4632)
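The wrapper's decision boils down to a case-insensitive GUID comparison. A C++ sketch of that check; the actual wrapper is a script, and the function names and caller-supplied list of Ceph type GUIDs here are assumptions:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercase a GUID string so comparisons ignore hex-digit case.
std::string lower_guid(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Activate only if the partition's type GUID (read from the GPT label)
// matches one of the Ceph partition type GUIDs supplied by the caller.
bool should_activate(const std::string &part_type_guid,
                     const std::vector<std::string> &ceph_type_guids) {
  const std::string g = lower_guid(part_type_guid);
  return std::any_of(ceph_type_guids.begin(), ceph_type_guids.end(),
                     [&](const std::string &c) { return lower_guid(c) == g; });
}
```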
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Instead of allowing services to directly use 'propose_pending()' on
other services, we instead add two new functions:
- request_proposal() to request 'this' service to propose its
pending value; and
- request_proposal(PaxosService *other) so that 'this' service
can request a proposal from 'other'
These functions should allow us to enforce a greater set of
constraints at the time of a cross-proposal, either by making sure a
service will (e.g.) hold off its own proposals until said proposal
is performed, or by having the other service enforce a tighter
set of constraints that wouldn't otherwise be enforced when using
'propose_pending()' directly.
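A sketch of the indirection, using a hypothetical class and an example constraint (defer while a proposal is already in flight). It illustrates the idea of a single checkpoint before proposing; it is not the monitor's actual code.

```cpp
// Hypothetical sketch: services ask each other for proposals through
// request_proposal() instead of calling propose_pending() directly.
class PaxosServiceSketch {
 public:
  // Ask *this* service to propose its pending value.
  void request_proposal() {
    if (proposal_in_flight) {
      // Example constraint: hold off until the current proposal finishes.
      deferred = true;
      return;
    }
    propose_pending();
  }

  // Ask another service to propose; that service applies its own checks.
  void request_proposal(PaxosServiceSketch *other) { other->request_proposal(); }

  bool proposal_in_flight = false;
  bool deferred = false;
  int proposals = 0;

 private:
  void propose_pending() {
    ++proposals;
    deferred = false;
  }
};
```

Because `propose_pending()` is private, the only way for one service to trigger another's proposal is through the checked entry point.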
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
This moves our version pointer up so that we don't re-log (by re-consuming)
log messages to /var/log/ceph/ceph.log on ceph-mon restart. OTOH, it means
we rewrite the summary of the last 50 messages, but we consider that to be
relatively cheap (and something we *always* did in bobtail and
earlier anyway).
Signed-off-by: Sage Weil <sage@inktank.com>
Set an interval at which to periodically write a full copy of the map,
lower than the trim point (which is generally a very large number of
commits).
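A sketch of the per-commit decision, assuming a hypothetical `full_interval` tunable chosen well below the trim point so that a full map always exists within the retained range:

```cpp
#include <cstdint>

// Returns true if commit `version` should also store a full copy of the
// map. An interval of 0 disables periodic full maps in this sketch.
bool should_write_full_map(uint64_t version, uint64_t full_interval) {
  return full_interval > 0 && version % full_interval == 0;
}
```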
Signed-off-by: Sage Weil <sage@inktank.com>
Look for a monmap in the following order of priority:
- our latest monmap version
- a backup monmap version created during sync start, if the store
appears to be in the state left by an aborted sync
- a mkfs monmap version
- a mkfs monmap version
If none of these are found, we should go ahead and try to build a
monmap from ceph.conf to join an existing cluster.
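The priority order above maps onto a simple fall-through. A sketch with hypothetical inputs, where each `std::optional` holds an encoded monmap if that source exists in the store (requires C++17):

```cpp
#include <optional>
#include <string>

struct MonmapSources {
  std::optional<std::string> latest;       // our latest monmap version
  std::optional<std::string> sync_backup;  // backup taken at sync start
  bool store_in_aborted_sync = false;      // store looks like a sync aborted
  std::optional<std::string> mkfs;         // monmap written at mkfs time
};

// Returns the monmap to use, or std::nullopt if the caller should build
// one from ceph.conf and try to join an existing cluster.
std::optional<std::string> pick_monmap(const MonmapSources &s) {
  if (s.latest) return s.latest;
  if (s.store_in_aborted_sync && s.sync_backup) return s.sync_backup;
  if (s.mkfs) return s.mkfs;
  return std::nullopt;
}
```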
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
If we end up attempting a store sync after at least one earlier
attempt has failed, we might not have a monmap in the store to back
up. In that case, back up the current monmap being used by the
monitor instead.
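A sketch of the fallback, with a hypothetical function name: prefer the monmap stored on disk, and use the in-memory monmap only when the store has none.

```cpp
#include <optional>
#include <string>

std::string monmap_to_backup(const std::optional<std::string> &store_monmap,
                             const std::string &in_memory_monmap) {
  // A previous failed sync may have left the store without a monmap.
  return store_monmap.value_or(in_memory_monmap);
}
```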
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Fixes: #4776
Backport: bobtail
When copying an object onto itself, make sure we don't send its
tail to garbage collection.
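A hypothetical sketch of the guard, assuming the copy path decides which of the destination's old tail parts to hand to garbage collection. `ObjKey` and `tail_to_gc` are illustrative names, not RGW's API. When source and destination are the same object, the "old" tail is the data the new object still references, so nothing is queued.

```cpp
#include <string>
#include <vector>

struct ObjKey {
  std::string bucket, name;
  bool operator==(const ObjKey &o) const { return bucket == o.bucket && name == o.name; }
};

// Returns the tail parts that may safely be garbage collected after a copy.
std::vector<std::string> tail_to_gc(const ObjKey &src, const ObjKey &dst,
                                    const std::vector<std::string> &old_dst_tail) {
  if (src == dst) return {};  // copy onto itself: the tail is still in use
  return old_dst_tail;
}
```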
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Only set the STRIPINGV2 feature if the striping parameters are non-default.
Specifically, fix the case where the passed-in size and count are == 0.
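A sketch of the check, assuming the default layout is a stripe unit equal to the object size (`1 << order`) and a stripe count of 1, with 0 meaning "use the default". The feature constant is a placeholder, not librbd's definition.

```cpp
#include <cstdint>

// Placeholder feature bit for this sketch only.
constexpr uint64_t FEATURE_STRIPINGV2_SKETCH = 1ull << 1;

// Set the striping feature only for a non-default layout.
uint64_t apply_striping_feature(uint64_t features, int order,
                                uint64_t stripe_unit, uint64_t stripe_count) {
  const uint64_t object_size = 1ull << order;
  const bool unit_default = stripe_unit == 0 || stripe_unit == object_size;
  const bool count_default = stripe_count == 0 || stripe_count == 1;
  if (!unit_default || !count_default)
    features |= FEATURE_STRIPINGV2_SKETCH;
  return features;
}
```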
Fixes: #4710
Signed-off-by: Sage Weil <sage@inktank.com>
Out-of-order journal entry writes using aio may cause entry
n+2 to be written before entry n. This does not indicate
corruption.
Fixes: #4736
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
If getting or creating the keys fails with permission denied, exit
gracefully instead of retrying.
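A sketch of the retry policy in C++, assuming a hypothetical `try_get_keys` call that returns 0 or a negative errno; the actual tool may be structured differently. Transient failures are retried, while `-EACCES` is returned at once because retrying cannot fix missing permissions.

```cpp
#include <cerrno>
#include <functional>

int fetch_keys_with_retry(const std::function<int()> &try_get_keys, int max_attempts) {
  int r = -EAGAIN;
  for (int i = 0; i < max_attempts; ++i) {
    r = try_get_keys();
    if (r == 0 || r == -EACCES) return r;  // success, or give up immediately
  }
  return r;  // last transient error after exhausting attempts
}
```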
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Change local names to be clearer
Break real_log() into common function get_log()
Move infos_oid, biginfo_oid and log_oid to globals for general use
Feature: #4201 (osd: data loss: pg export/import/remove)
Signed-off-by: David Zafman <david.zafman@inktank.com>