This was mixed up with min/max_op_len. Also, max_ops wasn't being used
during the initial object creation stage, which flooded the OSDs, or
during run().
Signed-off-by: Sage Weil <sage@newdream.net>
If we search_for_missing() on a host, make a corresponding entry in our
peer_missing map (if it isn't already there). This ensures we get (empty)
entries for strays, which makes all_unfound_are_queried_or_lost() happy.
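
A minimal sketch of the idea, not the actual PG code: the Missing struct,
the PeerMissing alias, and both helper names are hypothetical stand-ins,
and the check below ignores the "or lost" half of the real function.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-peer missing set; not the real Ceph type.
struct Missing {
  std::set<std::string> objects;
};

using PeerMissing = std::map<int, Missing>;

// Record that `osd` was searched. Inserts an empty entry when none exists
// and leaves an existing entry untouched. Returns true if one was created.
bool note_searched(PeerMissing& peer_missing, int osd) {
  assert(osd >= 0);  // OSD ids are assumed non-negative here
  return peer_missing.insert(std::make_pair(osd, Missing())).second;
}

// Simplified check: every OSD that might hold unfound objects has an entry,
// even if that entry is empty (as it would be for a stray with nothing missing).
bool all_queried(const std::set<int>& might_have_unfound,
                 const PeerMissing& peer_missing) {
  for (int osd : might_have_unfound) {
    if (peer_missing.count(osd) == 0) return false;
  }
  return true;
}
```

Using insert() rather than assignment means a peer that already reported
real missing objects keeps them.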
Signed-off-by: Sage Weil <sage@newdream.net>
Otherwise, a zero-length write to an offset past the end of the file will
cause the internal accounting to reflect the extended size of the file,
while the file on disk is not extended.
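
For illustration only: a zero-length pwrite() past EOF leaves a regular
file's size unchanged on POSIX systems, so size accounting should not
grow either. Both helper names below are hypothetical, and the second
function is one possible way to keep accounting in step with the disk,
not necessarily what this change does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Creates an empty temporary file, issues a zero-length write at `offset`,
// and returns the resulting on-disk size (or -1 on error).
long long size_after_zero_length_write(off_t offset) {
  assert(offset >= 0);
  char path[] = "/tmp/zero_len_write_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);  // the file disappears once fd is closed

  // Zero bytes at an offset past EOF: the file is not extended.
  ssize_t r = pwrite(fd, "", 0, offset);
  struct stat st;
  long long size = -1;
  if (r == 0 && fstat(fd, &st) == 0) size = st.st_size;
  close(fd);
  return size;
}

// Accounting that mirrors the disk: the tracked size only grows when
// at least one byte is actually written.
unsigned long long tracked_size_after_write(unsigned long long tracked,
                                            unsigned long long offset,
                                            unsigned long long len) {
  if (len == 0) return tracked;
  return std::max(tracked, offset + len);
}
```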
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Capture Alexandre's script for reproducing #1774 here for posterity, until
we write a properly harnessed test for this. Currently, workunits can't
mount/unmount, and we don't have a way to make ceph-fuse drop its cache.
Signed-off-by: Sage Weil <sage@newdream.net>
The tail needs to refer to the entry preceding the first entry in the
log. This updates copy_up_to() to match the basic structure of the other
copy_*() methods.
Signed-off-by: Sage Weil <sage@newdream.net>
Since last_backfill is hobject_t(), we can set this equal to last_update.
This fixes a problem where last_complete precedes the abbreviated log we
send to the replica below.
Signed-off-by: Sage Weil <sage@newdream.net>
This (mostly) copies debian/copyright for now, but there are format
restrictions for that file. Suggestions for a cleaner way to handle this
are welcome. In the meantime, this is better...
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
rados_ioctx_locator_set_key is void. The return value seems to have
been uninitialized, so the tests failed rarely.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
The num_objects check doesn't make sense, and could only make trimming
happen more often than it should. Sage did not remember why it was
added.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
This helps prevent problems with retrying requests being detected as
duplicates. The problem occurs when the log is trimmed too
aggressively, and an earlier tid is removed from the log, while a
later one is not. The later request will be detected as a duplicate
and responded to immediately, possibly violating the ordering of the
requests.
Partially fixes #1490.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Use upper_bound rather than lower_bound to compute the initial pd within
insert_trace, so that we don't attempt to remove it if it happens to be
in the same frag as the new reply.
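
The difference between the two bounds can be sketched on a plain
std::map. The Dir alias, the sample names, and the helper functions are
hypothetical; the real insert_trace code works on the client's own
structures, so this only shows the standard library semantics involved.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a name-ordered map of directory entries.
using Dir = std::map<std::string, int>;

// Key of the first entry that is >= name, or "" if there is none.
// When name is present, this is the entry for name itself.
std::string key_at_lower_bound(const Dir& dir, const std::string& name) {
  auto it = dir.lower_bound(name);
  return it == dir.end() ? std::string() : it->first;
}

// Key of the first entry that is strictly > name, or "" if there is none.
// This never lands on the entry for name itself.
std::string key_at_upper_bound(const Dir& dir, const std::string& name) {
  auto it = dir.upper_bound(name);
  return it == dir.end() ? std::string() : it->first;
}

// Small sample directory used for demonstration.
Dir sample_dir() {
  return Dir{{"a", 1}, {"b", 2}, {"c", 3}};
}
```

With lower_bound, a cursor started at "b" points at "b" itself, so any
cleanup that walks from the cursor could touch the entry just inserted;
upper_bound starts strictly after it.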
Fixes: #1774
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
It looks like this was broken in def36668a1.
Passing uninitialized memory to resolve_addrs(), and needlessly
allocating a buffer.
Signed-off-by: Sage Weil <sage@newdream.net>
This makes backfill restart less of a special case: we send an info AND
log, just like we do normally. Code paths are more similar than before.
The main change here is that the backfill target gets a pg log with recent
history, which allows it to more reliably detect dup operations.
Signed-off-by: Sage Weil <sage@newdream.net>
We need at least one non-incomplete replica during a rw interval in order
to peer. The backfilling/incomplete replicas get log entries, but not
all object writes, so they are (mostly) excluded from the peering process
(find_best_info(), in particular).
We can't do this during the PriorSet calculation because we don't have
their PG::Info yet. But, once we get it, we need to make sure at least one
of the replicas during the last rw interval is not incomplete, or else we
should mark the pg DOWN (just like the PriorSet calculation does).
This logic mostly mirrors that of PriorSet, but additionally requires
the replicas be !incomplete.
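
A rough sketch of the extra check, under stated assumptions: PeerInfo is
a hypothetical stand-in for PG::Info with only the flag that matters
here, OSDs are plain ints, and the function name is made up. The real
logic also has to walk past intervals the way PriorSet does.

```cpp
#include <map>
#include <set>

// Hypothetical stand-in for PG::Info; only the incomplete flag is modeled.
struct PeerInfo {
  bool incomplete;
};

// True if at least one OSD from the last rw interval's acting set has
// reported its info and is not incomplete. A caller would mark the pg
// DOWN when this returns false.
bool have_complete_replica(const std::set<int>& last_rw_acting,
                           const std::map<int, PeerInfo>& peer_info) {
  for (int osd : last_rw_acting) {
    auto it = peer_info.find(osd);
    if (it != peer_info.end() && !it->second.incomplete) return true;
  }
  return false;
}
```

An OSD with no entry in peer_info is treated as unusable, since we
cannot tell whether it is complete until its info arrives.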
Signed-off-by: Sage Weil <sage@newdream.net>
Right now this is only exposed via the monitor command interface:
osd pool create <poolname> [pg_num [pgp_num]]
but it can be expanded to other interfaces as appropriate.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Match prototype and implementation argument names and types
(textually; that is, use the std:: prefix).
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
If it takes 2*mds_beacon_grace seconds (30 seconds total by default)
to get an ack back, maybe it's the monitor and not us. Try a reconnect,
which will just add the teensiest bit of load if we're wrong.
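
The threshold can be sketched as below. The function name and the
Seconds alias are hypothetical, and the 15-second grace used in the
usage note is only inferred from the "30 seconds total" default above.

```cpp
#include <cassert>
#include <chrono>

using Seconds = std::chrono::duration<double>;

// True once the time since the last acked beacon exceeds twice the grace
// period, at which point a reconnect to the monitor is worth trying.
bool should_reconnect(Seconds since_last_ack, Seconds beacon_grace) {
  assert(beacon_grace.count() > 0);
  return since_last_ack > 2 * beacon_grace;
}
```

With a 15-second grace, should_reconnect(Seconds(29), Seconds(15)) is
false and should_reconnect(Seconds(31), Seconds(15)) is true.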
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>