Filestore now properly fails to clone a non-existent object, which means
we should create one.
Fixes: #2062
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
These are also defined internally in ceph_fs.h, so use a guard. Annoying,
but gives us consistent naming (ceph_*/CEPH_*, not LIBCEPHFS_SETATTR_*).
Signed-off-by: Sage Weil <sage@newdream.net>
For now, until we have a better handle on the ext4 bug, and demonstrate
that it is a clear performance win with the full stack.
Signed-off-by: Sage Weil <sage@newdream.net>
Now, push progress is represented by ObjectRecoveryProgress. In
particular, rather than tracking data_subset_*ing, we track the furthest
offset before which the data will be consistent once cloning is complete.
sub_op_push now separates the pull response implementation from the
replica push implementation.
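A minimal sketch of the idea of tracking "the furthest offset before which the data will be consistent" as a single cursor. All names here (RecoveryProgress, data_complete_to, advance) are illustrative stand-ins, not the actual Ceph ObjectRecoveryProgress definition:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical recovery-progress cursor: instead of separate
// data_subset_pushing/pulling sets, keep one offset below which the
// object's data is known to be consistent once cloning completes.
struct RecoveryProgress {
  uint64_t data_complete_to = 0;  // bytes [0, data_complete_to) are consistent
  bool data_complete = false;

  // Record that another contiguous chunk has been pushed.
  void advance(uint64_t chunk_len, uint64_t object_size) {
    data_complete_to += chunk_len;
    if (data_complete_to >= object_size) {
      data_complete_to = object_size;
      data_complete = true;
    }
  }
};
```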
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Require it for osd <-> osd and osd <-> mon communication.
This covers all the new encoding changes, except hobject_t, which is used
between the rados command line tool and the OSD for an object listing
position marker. We can't distinguish between specific types of clients,
though, and we don't want to introduce any incompatibility with other
clients, so we'll just have to make do here. :(
Signed-off-by: Sage Weil <sage@newdream.net>
A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
If is_degraded returns true for backfill, the object may not be
in any replica's missing set. Only call start_recovery_op if
we actually started an op. This bug could cause the PG to get
stuck in backfill.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
We haven't used this feature for years and years, and don't plan to. It
was there to facilitate "read shedding", where the primary OSD would
forward a read request to a replica. However, a replica can't reply
directly to the client in that case because OSDs don't initiate
connections (they used to).
Rip this out for now, especially since osd_peer_stat_t just changed.
Signed-off-by: Sage Weil <sage@newdream.net>
We weren't using this, and it had broken (raw) encoding. The constructor
also didn't initialize fields properly.
Clear out the struct and use the new encoding scheme, so we can cleanly
add fields moving forward.
Signed-off-by: Sage Weil <sage@newdream.net>
We recently added a flush on activate, but we are still building the
transaction (the caller queues it), so calling osr.flush() here is totally
useless.
Instead, set a flag 'need_flush', and do the flush the next time we receive
some work.
This has the added benefit of doing the flush in the worker thread, outside
of osd_lock.
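The deferred-flush pattern described above can be sketched roughly as follows. The names (Sequencer, need_flush, on_activate, do_request) mirror the description but are simplified stand-ins, not the actual Ceph code:

```cpp
#include <cassert>

// Simplified stand-in for an op sequencer that can be flushed.
struct Sequencer {
  int flush_count = 0;
  void flush() { ++flush_count; }
};

struct PGLike {
  Sequencer osr;
  bool need_flush = false;

  void on_activate() {
    // The activation transaction is still being built (the caller queues
    // it), so flushing here would be useless. Just note that a flush is
    // owed.
    need_flush = true;
  }

  void do_request() {
    // First work item after activation: do the deferred flush here, in
    // the worker thread, outside osd_lock.
    if (need_flush) {
      osr.flush();
      need_flush = false;
    }
    // ... handle the request ...
  }
};
```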
Signed-off-by: Sage Weil <sage@newdream.net>
If we are blackholing the disk, we need to make flush() wait forever, or
else the flush() logic will return (the IO wasn't queued!) and higher
layers will continue and (eventually) misbehave.
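The completion logic can be sketched as a predicate: a blackholed journal never reports flush completion, because the dropped IO will never be acked. This is an illustrative model (FileJournalLike, flush_would_complete are hypothetical names), not the actual Ceph implementation:

```cpp
#include <cassert>

struct FileJournalLike {
  bool blackhole = false;  // IOs are silently dropped, never queued
  int pending_ios = 0;

  // flush() should block until this returns true. With blackhole set,
  // it never does, so higher layers wait forever instead of proceeding
  // on IO that was never actually written.
  bool flush_would_complete() const {
    if (blackhole)
      return false;
    return pending_ios == 0;
  }
};
```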
Signed-off-by: Sage Weil <sage@newdream.net>
We can receive an op with an old SnapContext that includes snaps that we've
already trimmed or are in the process of trimming. Filter them out!
Otherwise we will recreate and add links into collections we've already
marked as removed, and we'll get things like ENOTEMPTY when we try to
remove them. Or just leave them lying around.
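The filtering step amounts to dropping every snap id in the incoming SnapContext that the OSD has already trimmed or is trimming. A self-contained sketch, with filter_snapc and snapid_t simplified for illustration rather than taken from the actual Ceph code:

```cpp
#include <set>
#include <vector>

using snapid_t = unsigned long;  // simplified stand-in for Ceph's snapid_t

// Drop snaps that were already trimmed (or are being trimmed), so the op
// can't recreate links into snap collections already marked as removed.
std::vector<snapid_t> filter_snapc(const std::vector<snapid_t>& snaps,
                                   const std::set<snapid_t>& trimmed) {
  std::vector<snapid_t> out;
  for (snapid_t s : snaps)
    if (!trimmed.count(s))
      out.push_back(s);
  return out;
}
```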
Fixes: #1949
Signed-off-by: Sage Weil <sage@newdream.net>
The PG may be doing work relative to a different epoch than what the osd
has. Make sure the PG removal message is queued under that epoch to avoid
confusing/crashing the recipient like so:
2012-02-10 23:26:35.691793 7f387281f700 osd.3 514 queue_pg_for_deletion: 0.0
osd/OSD.cc: In function 'void OSD::handle_pg_remove(OpRequest*)' thread 7f387281f700 time 2012-02-10 23:26:35.691820
osd/OSD.cc: 4860: FAILED assert(pg->get_primary() == m->get_source().num())
Signed-off-by: Sage Weil <sage@newdream.net>