Compare all keys within the sync'ed prefixes across members of the quorum
and compare the key counts and CRC for inconsistencies.
Currently this is a one-shot inefficient hammer. We'll want to make this
work in chunks before it is usable in production environments.
Protect with a feature bit to avoid sending MMonScrub to mons who can't
decode it.
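
A minimal sketch of the comparison step, assuming each quorum member
reports a per-prefix key count and CRC. The names PrefixSummary,
ScrubResult and find_inconsistent are hypothetical and are not taken
from the patch; the choice of the first entry as the reference is also
an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-prefix summary reported by one monitor.
struct PrefixSummary {
  uint64_t key_count = 0;  // number of keys seen under the prefix
  uint32_t crc = 0;        // running CRC over those keys and values
  bool operator==(const PrefixSummary& o) const {
    return key_count == o.key_count && crc == o.crc;
  }
};

// prefix name -> summary, for a single monitor
using ScrubResult = std::map<std::string, PrefixSummary>;

// Return the ranks whose result differs from the reference result.
// The reference is the lowest-ranked entry (an assumption made for
// this sketch only).
std::vector<int> find_inconsistent(const std::map<int, ScrubResult>& by_rank) {
  std::vector<int> bad;
  if (by_rank.empty())
    return bad;
  const ScrubResult& ref = by_rank.begin()->second;
  for (const auto& entry : by_rank) {
    if (!(entry.second == ref))
      bad.push_back(entry.first);
  }
  return bad;
}
```

Because every key is summarized in a single pass, a sketch like this has
the same one-shot cost noted above; chunking would mean comparing partial
summaries per key range instead.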
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The current interaction between sync and stashing full osdmaps only on
active mons means that a sync can result in an incomplete osdmap_full
history:
- mon.c starts a full sync
- during sync, active osdmap service should_stash_full() is true and
includes a full in the txn
- mon.c sync finishes
- mon.c update_from_paxos gets "latest" stashed that it got from the
paxos txn
- mon.c does *not* walk to previous inc maps to complete its collection
of full maps.
To fix this, we disable the periodic/random stash of full maps by the
osdmap service.
This introduces a new problem: we must have at least one full map (the first
one) in order for a mon that just synced to build its full collection.
Extend the encode_trim() process to allow the osdmap service to include
the oldest full map with the trim txn. This is more complex than just
writing the full maps in the txn, but cheaper--we only write the full
map at trim time.
This *might* be related to previous bugs where the full osdmap was
missing, or cases where leveldb keys seemed to 'disappear'.
Fixes: #5512
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
This reverts commit 352f362567.
Reverting this commit because it causes problems with the debian build, and
reopening #5492. The root problem appears to be lack of support by GNU
autotools for installing into both /sbin and /usr/sbin using the standard
location variables.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Bucket link was assuming the bucket head object was holding the
bucket acl, which is not true anymore.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
The once-upon-a-time unique O_LAZY value I chose forever ago is now
O_NOATIME, which means that some clients are choosing relaxed
consistency without meaning to.
It is highly unlikely that a real O_LAZY will ever exist, and we can
select it in the ceph case with the ioctl or libcephfs call, so drop
any support for doing this via open(2) flags.
Update doc/lazy_posix.txt file re: lazy io.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
On i386 this fails to build with
common/crc32c-intel.c: In function 'ceph_have_crc32c_intel':
error: common/crc32c-intel.c:79:9: PIC register clobbered by 'ebx' in 'asm'
ARM had more to complain about.
Not sure where this test came from, but it is clearly not meant for
anything other than x86_64.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Add unit tests for the ObjectContext methods ondisk_write_lock,
ondisk_write_unlock, ondisk_read_lock and ondisk_read_unlock.
A class derived from ::testing::Test is created with two sub-classes
(Thread_read_lock & Thread_write_lock) to provide a separate thread
that can block with cond.Wait(). usleep(3) is used in the main thread
to wait for the expected side effect with increasing delays (up to
MAX_DELAY).
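
The wait-with-increasing-delays pattern could look roughly like the
sketch below. It is an illustration only: wait_for is a hypothetical
helper, the doubling schedule and the starting delay are assumptions,
and std::this_thread::sleep_for stands in for the usleep(3) call used
by the test.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll `done` with growing sleeps. Returns true as soon as `done`
// reports success, or false once the next delay would reach max_delay
// and a final check still fails.
bool wait_for(const std::function<bool()>& done,
              std::chrono::microseconds max_delay) {
  std::chrono::microseconds delay(2);  // assumed starting delay
  while (delay < max_delay) {
    if (done())
      return true;
    std::this_thread::sleep_for(delay);
    delay *= 2;  // assumed schedule: double the delay on each retry
  }
  return done();
}
```

Starting with a short delay keeps the common case fast, while the upper
bound prevents a test from hanging when the side effect never happens.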
http://tracker.ceph.com/issues/5487 refs #5487
Signed-off-by: Loic Dachary <loic@dachary.org>
http://tracker.ceph.com/issues/3074 fixes #3074
This patch adds support for the --help option.
For now, it displays the usage of the generic options used in radosgw.
Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
It is possible to start a sync when our newest monmap is 0. Usually we see
e0 from probe, but that isn't always published as part of the very first
paxos transaction due to the way PaxosService::_active generates its
first commit.
In any case, having e0 here is harmless.
Fixes: #5509
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
The function was referring to the bucket info object directly, instead of
going through the helper functions, which is now a must.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
From: Yan, Zheng <yan.zheng@intel.com>
Simple reproducer for #5453, modified to run for a finite number of
iterations.
Signed-off-by: Sage Weil <sage@inktank.com>
108000 is about 3 hours if paxos is going full-bore (1 proposal/second).
That ought to be pretty safe. Otherwise, we start trimming too soon and a
slow sync will just have to restart when it finishes.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Do not assume that, because at least one OSD has an hb_front addr, they
all do; otherwise we will end up assigning garbage here and later thinking
it is an addr (or, more precisely, != entity_addr_t()).
Fixes: #5460
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
If we have a real addr for hb_front for a given osd and then a new map
has the osd coming up without an hb_front, we need to clear the addr
field.
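
As a rough illustration of the clearing rule, assuming addresses are
kept in a per-OSD table and a blank entry means "no hb_front". The
function mark_up and the use of std::string for addresses are
hypothetical simplifications, not the actual map code.

```cpp
#include <map>
#include <string>
#include <vector>

// hb_front: one entry per OSD id, empty string meaning a blank addr.
// new_hb_front: hb_front addrs carried by the new map for OSDs that
// are coming up. An OSD absent from it has no hb_front.
void mark_up(std::vector<std::string>& hb_front, int osd,
             const std::map<int, std::string>& new_hb_front) {
  auto it = new_hb_front.find(osd);
  if (it != new_hb_front.end()) {
    hb_front.at(osd) = it->second;
  } else {
    // Clear any stale addr left over from a previous map.
    hb_front.at(osd).clear();
  }
}
```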
Also, improve the debug output in add_heartbeat_peer() so we can tell if
we have no connection or a connection to a blank addr.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Previously we would sample how many ops to start under the lock, drop it,
and start that many. This is racy because multiple threads can jump in
and we start too many ops. Instead, claim as many slots as we can and
release them back later if we do not end up using them.
Take care to re-wake the work-queue since we are releasing more resources
for wq use.
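
The claim-then-release idea can be sketched as follows. SlotPool and
its members are hypothetical names, and std::mutex stands in for
whatever lock the real code holds; this is not the patched code itself.

```cpp
#include <algorithm>
#include <mutex>

class SlotPool {
 public:
  explicit SlotPool(int max) : max_(max), active_(0) {}

  // Reserve up to `want` slots while holding the lock, so concurrent
  // callers can never reserve more than max_ in total.
  int claim(int want) {
    std::lock_guard<std::mutex> l(lock_);
    int n = std::max(0, std::min(want, max_ - active_));
    active_ += n;
    return n;
  }

  // Give back slots that were claimed but not used. Returns true when
  // capacity was freed, in which case the caller should re-wake the
  // work queue.
  bool release(int n) {
    std::lock_guard<std::mutex> l(lock_);
    active_ -= n;
    return n > 0;
  }

  int active() {
    std::lock_guard<std::mutex> l(lock_);
    return active_;
  }

 private:
  std::mutex lock_;
  int max_;
  int active_;
};
```

Reserving up front means a second thread sees the reduced capacity
immediately, instead of acting on a sample that went stale once the
lock was dropped.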
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>