The filestore requires hobjects to be globally unique.
Fixes: #5240
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
uint64_t is passed in, but int was extracted. This fails on 32-bit builds.
Fixes: #5220
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 17029b270d)
The tests covers 100% of the LOC of merge_old_entry. It is broken down
in 13 cases to enumerate all the situations it must address. Each case
is isolated in a independant code block where the conditions are
reproduced. For instance:
info.last_backfill = hobject_t();
info.last_backfill.hash = 1;
oe.soid.hash = 2;
creates the conditions where merge_log_entry is expected to silently
ignore entries containing an object that is greater than
last_backfill.
PGLogTest is derived from PGLog to get access to the protected members.
Signed-off-by: Loic Dachary <loic@dachary.org>
The ObjectCacher and MonClient classes both instantiate Finisher
threads. We need to make sure they are created *after* the fork(2)
or else the process will fail to join() them on shutdown, and the
threads will not exist while fuse is doing useful work.
Put CephFuse on the heap and move all this initalization into the child
block, and make sure errors are passed back to the parent.
Fix-proposed-by: Alexandre Marangone <alexandre.maragone@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
We were double-incrementing p, both in the for statement and in the
body. While we are here, drop the unnecessary else's.
Signed-off-by: Sage Weil <sage@inktank.com>
there is no side effect.
The PGLog::clear function is added to reset all data members to the
same state they have after the object is constructed.
Signed-off-by: Loic Dachary <loic@dachary.org>
In the scenario:
- leader wins, peons lose
- leader sees it is too far behind on paxos and bootstraps
- leader tries to sync with someone, waits for a quorum of the others
- peons sit around forever waiting
The problem is that they never time out because paxos never issues a lease,
which is the normal timeout that lets them detect a leader failure.
Avoid this by starting the lease timeout as soon as we lose the election.
The timeout callback just does a bootstrap and does not rely on any other
state.
I see one possible danger here: there may be some "normal" cases where the
leader takes a long time to issue its first lease that we currently
tolerate, but won't with this new check in place. I hope that raising
the lease interval/timeout or reducing the allowed paxos drift will make
that a non-issue. If it is problematic, we will need a separate explicit
"i am alive" from the leader while it is getting ready to issue the lease
to prevent a live-lock.
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
If the client is not connected, discard the message. They will
reconnect and resend anyway, so there is no point in processing it
twice (now and later).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
- trim more at a time (by an order of magnitude)
- rename fields to paxos_trim_{min,max}; only trim when there are min items
that are trimmable, and trim at most max items at a time.
- adjust the paxos_service_trim_{min,max} values up by a factor of 2.
Since we are compacting every time we trim, adjusting these up mean less
frequent compactions and less overall work for the monitor.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
OSDMap::get_down_at() asserts that the osd exists.
Fixes: #5223
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
All lines of code are tested. The conditions under which some methods
could corrupt the content of a pg_missing_t object have not been
investigated. Since the data members are public, the caller is
ultimately responsible for the consistency of the object and the
methods have no way to enforce it.
The semantics of is_missing have been discussed in
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/15280
Signed-off-by: Loic Dachary <loic@dachary.org>
We don't actually need to write out the pg map epoch on every
activate_map as long as:
a) the osd does not trim past the oldest pg map persisted
b) the pg does update the persisted map epoch from time
to time.
To that end, we now keep a reference to the last map persisted.
The OSD already does not trim past the oldest live OSDMapRef.
Second, handle_activate_map will trim if the difference between
the current map and the last_persisted_map is large enough.
Fixes: #4731
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
Conflicts:
src/common/config_opts.h
src/osd/PG.cc
- last_persisted_osdmap_ref gets set in the non-static
PG::write_info
Conflicts:
src/osd/PG.cc
CID 716927 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "diri->snaprealm" to function
"SnapRealm::resolve_snapname(std::string const &
Make sure not to dereference diri->snaprealm.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>