We can no longer use the messenger source information to determine
the origin of a message, since an osd might hold more than one
shard of a particular pg. Thus, we need to include a pg_shard_t
'from' field to indicate the origin. Similarly, pg_t is no longer
sufficient to specify the destination pg, so we use spg_t instead.
In the event that we get a message from an old peer, we default
'from' to pg_shard_t(get_source().num(), ghobject_t::no_shard())
and the spg_t to spg_t(pgid, ghobject_t::no_shard()). This suffices
because non-NO_SHARD shards can only appear once ec pools have
been enabled -- and doing that bans unenlightened osds.
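A minimal sketch of the fallback, using simplified stand-ins for
pg_shard_t and spg_t rather than the real Ceph types; the sentinel
name and struct layout are assumptions for illustration only:

    // Illustrative only: simplified stand-ins showing how a message from a
    // pre-EC ("unenlightened") peer can be given default shard values.
    #include <cstdint>
    #include <iostream>

    static const int8_t NO_SHARD = -1;   // assumed sentinel for "no shard"

    struct pg_shard_t {
      int32_t osd;     // originating osd id
      int8_t  shard;   // shard within the pg, NO_SHARD for replicated pools
    };

    struct spg_t {
      uint64_t pgid;   // placeholder for the real pg_t
      int8_t   shard;  // destination shard, NO_SHARD for replicated pools
    };

    // When an old peer omits the new fields, fall back to the messenger
    // source and NO_SHARD; this is safe because non-NO_SHARD shards require
    // EC pools, and enabling EC pools excludes old osds from the cluster.
    pg_shard_t default_from(int32_t source_osd) {
      return pg_shard_t{source_osd, NO_SHARD};
    }

    spg_t default_spg(uint64_t pgid) {
      return spg_t{pgid, NO_SHARD};
    }

    int main() {
      pg_shard_t from = default_from(3);
      spg_t dest = default_spg(42);
      std::cout << "from osd." << from.osd << " shard " << int(from.shard)
                << " -> pg " << dest.pgid << " shard " << int(dest.shard)
                << "\n";
    }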
Signed-off-by: Samuel Just <sam.just@inktank.com>
If ne.version < oe.version, the correct answer is to roll back oe.version
if possible, regardless of what the entries are.
Also, rearrange to deal with the fact that we cannot roll back a missing
object.
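A rough sketch of the decision rule, with simplified stand-ins for the
log entries and object state (not the actual PGLog code):

    // Illustrative sketch of the rule described above.
    #include <iostream>

    struct eversion_t { unsigned epoch = 0, version = 0; };
    bool operator<(const eversion_t& a, const eversion_t& b) {
      return a.epoch < b.epoch ||
             (a.epoch == b.epoch && a.version < b.version);
    }

    struct log_entry_t {
      eversion_t version;
      bool can_rollback = false;  // whether the entry recorded rollback info
    };

    enum class Action { Rollback, MarkMissing, KeepNew };

    // If the authoritative (new) entry is older than the divergent (old)
    // entry, the old entry must be undone: roll it back when rollback info
    // exists and the object is present; otherwise mark the object missing
    // so it is recovered from an authoritative copy.
    Action merge_old_entry(const log_entry_t& ne, const log_entry_t& oe,
                           bool object_missing) {
      if (ne.version < oe.version) {
        if (!object_missing && oe.can_rollback)
          return Action::Rollback;
        return Action::MarkMissing;
      }
      return Action::KeepNew;
    }

    int main() {
      log_entry_t ne{{10, 5}}, oe{{10, 7}, /*can_rollback=*/true};
      std::cout << int(merge_old_entry(ne, oe, /*object_missing=*/false))
                << "\n";
    }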
Signed-off-by: Samuel Just <sam.just@inktank.com>
It's handy to let a pool answer false to ec_pool() but true to
require_rollback(), so that a replicated pool can exercise the
rollback mechanisms without permitting non-NO_SHARD shards.
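A small sketch of the idea, assuming a hypothetical debug knob on a
simplified pg_pool_t; the flag name is made up for illustration:

    // Illustrative only: a replicated pool can report
    // require_rollback() == true while ec_pool() stays false, so the
    // rollback machinery is exercised without non-NO_SHARD shards.
    #include <cassert>

    struct pg_pool_t {
      bool is_erasure = false;            // real EC pool?
      bool debug_force_rollback = false;  // hypothetical test knob

      bool ec_pool() const { return is_erasure; }
      bool require_rollback() const {
        return is_erasure || debug_force_rollback;
      }
    };

    int main() {
      pg_pool_t p;
      p.debug_force_rollback = true;  // replicated pool testing rollback
      assert(!p.ec_pool() && p.require_rollback());
    }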
Signed-off-by: Samuel Just <sam.just@inktank.com>
The boost interval_set class is not available on centos6/rhel6. Until that
dependency is sorted out, fix the build.
Signed-off-by: Sage Weil <sage@inktank.com>
The OSD should return no data if the read size is trimmed to zero by the
truncate_seq/truncate_size check. We cannot rely on ObjectStore::read()
to do that for us, because it reads the entire object when the 'len'
parameter is zero.
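A sketch of the guard, with simplified types standing in for the OSD's
read path; the point is that a length trimmed to zero must short-circuit
before reaching the store:

    // Illustrative only: if the truncate check trims the read length to
    // zero, return an empty result instead of calling into the object
    // store, because a zero 'len' there means "read everything".
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    std::vector<char> do_read(uint64_t offset, uint64_t len,
                              uint64_t truncate_size, uint64_t object_size,
                              const std::vector<char>& object_data) {
      // Clip the request at the effective (truncated) object size.
      uint64_t limit = std::min<uint64_t>(truncate_size, object_size);
      if (offset >= limit)
        len = 0;
      else
        len = std::min<uint64_t>(len, limit - offset);

      if (len == 0)
        return {};   // return no data; do NOT pass len=0 down to the store

      auto first = object_data.begin() + static_cast<std::ptrdiff_t>(offset);
      return std::vector<char>(first, first + static_cast<std::ptrdiff_t>(len));
    }

    int main() {
      std::vector<char> obj(4096, 'x');
      auto out = do_read(/*offset=*/1024, /*len=*/2048,
                         /*truncate_size=*/1024, /*object_size=*/obj.size(),
                         obj);
      return out.empty() ? 0 : 1;   // trimmed to zero -> no data returned
    }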
Fixes: #7371
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Each ceph-osd process's Objecter instance has a sequence
of tids that starts at 1. To ensure these are unique
across all time, set the client incarnation to the
OSDMap epoch in which we booted.
Note that the MDS does something similar (except the
incarnation is actually the restart count for the MDS
rank, since the MDSMap tracks that explicitly).
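A toy sketch of the idea, with assumed names rather than the Objecter's
real interface: the (incarnation, tid) pair stays unique even though tids
restart at 1 on every boot:

    // Illustrative only: fold the boot epoch into a per-incarnation
    // namespace so the tid space never repeats across osd restarts.
    #include <cstdint>
    #include <utility>

    struct Objecter {
      uint32_t incarnation = 0;  // set to the OSDMap epoch at boot
      uint64_t last_tid = 0;     // per-process counter, starts at 1

      std::pair<uint32_t, uint64_t> next_op_id() {
        return {incarnation, ++last_tid};
      }
    };

    int main() {
      Objecter o;
      o.incarnation = 8123;      // e.g. the epoch in which this osd booted
      auto id = o.next_op_id();  // (8123, 1) -- unique across all boots
      (void)id;
    }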
Backport: emperor
Signed-off-by: Sage Weil <sage@inktank.com>
We need to focus agent attention on those PGs that most need it. For
starters, full PGs need immediate attention so that we can unblock IO.
More generally, fuller PGs give us the best payoff in terms of data
evicted versus effort expended finding candidate objects.
Restructure the agent queue with priorities. Quantize evict_effort so that
PGs do not jump between priorities too frequently.
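A sketch of one way to quantize, with an assumed bucket width; the real
quantum and mapping may differ:

    // Illustrative only: quantize a continuous evict_effort into a small
    // number of priority buckets so PGs don't bounce between agent queues
    // on tiny fullness changes.
    #include <algorithm>
    #include <cstdio>

    // Round effort (0.0 - 1.0) down to a multiple of 'quantum', then use
    // it as the queue priority: fuller PGs land in higher buckets.
    int agent_priority(double evict_effort, double quantum = 0.1) {
      evict_effort = std::clamp(evict_effort, 0.0, 1.0);
      return static_cast<int>(evict_effort / quantum);
    }

    int main() {
      std::printf("%d %d %d\n",
                  agent_priority(0.31), agent_priority(0.33),  // same bucket
                  agent_priority(0.97));                       // near-full
    }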
Signed-off-by: Sage Weil <sage@inktank.com>
If we are full and get a write request to a new object, put the op on a
wait list. Wake up when the agent frees up some space.
Note that we do not block writes to existing objects. That would be a
more aggressive strategy, but it is difficult to know up front whether we
will increase the size of the object or not, so we just leave it be. I
suspect this strategy is "good enough".
Also note that we do not yet prioritize agent attention toward the PGs
that most need eviction (e.g., those that are full).
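A sketch of the policy with hypothetical names, not the OSD's actual code:

    // Illustrative only: writes that would create a new object while the
    // pool is full are parked on a wait list; when the agent frees space,
    // they are requeued.
    #include <deque>
    #include <functional>

    struct CacheFullPolicy {
      bool pool_is_full = false;
      std::deque<std::function<void()>> waiting_for_not_full;

      // Returns true if the op was deferred. Writes to existing objects
      // pass through: we can't easily tell whether they grow the object.
      bool maybe_defer_write(bool object_exists, std::function<void()> op) {
        if (pool_is_full && !object_exists) {
          waiting_for_not_full.push_back(std::move(op));
          return true;
        }
        return false;
      }

      // Called by the agent after evicting enough to drop below full.
      void on_space_freed() {
        pool_is_full = false;
        while (!waiting_for_not_full.empty()) {
          waiting_for_not_full.front()();   // requeue / retry the op
          waiting_for_not_full.pop_front();
        }
      }
    };

    int main() {
      CacheFullPolicy cache;
      cache.pool_is_full = true;
      bool deferred = cache.maybe_defer_write(/*object_exists=*/false, []{});
      cache.on_space_freed();               // requeues the deferred op
      return deferred ? 0 : 1;
    }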
Signed-off-by: Sage Weil <sage@inktank.com>