The test for unfound objects was reversed, leading us to try to pull
unfound objects and refrain from pulling objects that we knew how to
get. Should fix bug #585.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
This is a debug tool that can dump out Ceph information at various
epochs. For instance, it can show how the OSDmap changed over time.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
ReplicatedPG::get_object_context takes three parameters. The last two
are "const object_locator_t& oloc" and "bool can_create".
Unfortunately, booleans can degrade to ints, and ints can be used to
initialize objects of type object_locator_t.
So when you make a call like:
> ctx->snapset_obc = get_object_context(snapoid, true);
What happens is that you actually call:
> get_object_context(snapoid, object_locator(1), false);
So you pass an invalid and *not* blank object_locator_t, and pass false
for can_create. This is not what the caller wanted. This change gets rid
of the default parameters and fixes the callers.
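
The conversion chain above can be reproduced in isolation. This is a
minimal sketch, not the Ceph code: Locator, Result, lookup_with_defaults
and lookup_explicit are hypothetical stand-ins, and the sketch assumes
the locator type has a non-explicit constructor taking an int.

```cpp
// Hypothetical stand-in for object_locator_t. The non-explicit int
// constructor is what allows bool -> int -> Locator to happen silently.
struct Locator {
    int pool;
    Locator() : pool(-1) {}      // "blank" locator
    Locator(int p) : pool(p) {}  // not marked explicit
};

// Records which arguments the callee actually received.
struct Result {
    int pool;
    bool can_create;
};

// Old shape: both trailing parameters have defaults, so a two-argument
// call binds its second argument to oloc, not to can_create.
Result lookup_with_defaults(int oid, const Locator& oloc = Locator(),
                            bool can_create = false) {
    (void)oid;
    return Result{oloc.pool, can_create};
}

// New shape: no defaults, so every caller must spell out both arguments.
Result lookup_explicit(int oid, const Locator& oloc, bool can_create) {
    (void)oid;
    return Result{oloc.pool, can_create};
}
```

In this sketch, lookup_with_defaults(7, true) receives a locator with
pool 1 and can_create == false, while lookup_explicit(7, true) does not
compile because the third argument is missing.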
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Don't loop in ReplicatedPG::start_recovery_ops. Both recover_replicas
and recover_primary already loop to do as many recovery ops as they can,
so there is no need to repeat that here. Also, the former loop provably
would never execute more than once because of the way the code was
structured.
If there are no more recovery operations to do, and PG::is_all_uptodate
is true at the end of ReplicatedPG::start_recovery_ops, call
finish_recovery.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
This was missed by 184fbf582b, so any fs created between that commit
and now won't decode properly. It's more important
to make an fs prior to that work, though, so that the upgrade path from
the last stable version works.
Signed-off-by: Sage Weil <sage@newdream.net>
dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.
Signed-off-by: Sage Weil <sage@newdream.net>
We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out. Make sure we don't do a
scatterlock gather during that time that will confuse the inode auth
(which has its dirfrags fragmented differently).
Signed-off-by: Sage Weil <sage@newdream.net>
This makes the helper work for merge as well as split. Remove the special
fixups in the caller that were making split work before.
Signed-off-by: Sage Weil <sage@newdream.net>
This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log flush timer to go off.
This isn't a complete solution; in-progress requests won't know to flush.
Signed-off-by: Sage Weil <sage@newdream.net>
Track discover requests by tid. The old system of tracking outstanding
discovers was kludgey and somewhat broken. Also there is a possibility
of getting dup replies if someone does kick_requests().
There is still room for improvement with the logic determining when a
discover is sent: we may want to discover multiple dirfrags in parallel,
but the current code will only do one at a time.
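
Tracking by tid can be sketched as a map keyed on the transaction id. This
is only an illustration under assumed names (DiscoverInfo, DiscoverTracker,
start, handle_reply and outstanding are all hypothetical), not the MDS
implementation. The point is that a duplicate reply finds no entry and
can be dropped.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record of what one outstanding discover asked for.
struct DiscoverInfo {
    std::string want_path;
};

class DiscoverTracker {
    std::uint64_t last_tid = 0;
    std::map<std::uint64_t, DiscoverInfo> pending;  // tid -> request

public:
    // Register a new discover and return the tid to put in the message.
    std::uint64_t start(const std::string& path) {
        std::uint64_t tid = ++last_tid;
        pending[tid] = DiscoverInfo{path};
        return tid;
    }

    // Returns true if the reply matched an outstanding discover.
    // A duplicate or unknown tid returns false and changes nothing.
    bool handle_reply(std::uint64_t tid) {
        return pending.erase(tid) == 1;
    }

    std::size_t outstanding() const { return pending.size(); }
};
```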
Signed-off-by: Sage Weil <sage@newdream.net>
comment
If the inode already exists in our cache, adjust our (existing) fragments.
But it might not. In that case, we just replay the metablob.
Signed-off-by: Sage Weil <sage@newdream.net>
RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a result, __u64 and friends are not defined.
We have a Ceph-local copy of fiemap.h, so use it
if the system version is broken.
While we're at it, fix up the configure message to
note we're using a local copy.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
MDSMonitor: create_new_fs adapted to use the max_mds parameter
max_mds is now a configurable value and create_new_fs will initialize
max_mds to the specified value.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Add a feature bit DIRLAYOUTHASH.
Also fix client request routing for lookups (we were only hashing when
a Dentry pointer was provided, not when a relative path was).
Signed-off-by: Sage Weil <sage@newdream.net>
There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push all those objects out to
the replicas. Formerly, we would not start the second phase until there
were no missing objects at all on the primary.
This change modifies that so that we will start the second phase even if
there are unfound objects. However, we will still wait for all findable
missing objects to be brought to us, of course.
Get rid of uptodate_set. We can find the same information by looking at
the missing and missing_loc sets directly. Keeping the uptodate_set...
er... up-to-date would be very difficult in the presence of all the things
that can modify the missing and missing_loc sets.
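
The condition for starting the second phase can be derived from the two
sets alone. The following is a rough sketch under simplifying
assumptions: ObjectId, OsdId, count_unfound and can_start_replica_push
are hypothetical names, and the real missing and missing_loc structures
carry more state than plain sets and maps.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

using ObjectId = std::string;  // assumption: objects identified by name
using OsdId = int;

// missing:     objects the primary does not have yet.
// missing_loc: for each missing object, the OSDs known to hold a copy.
// An object is "unfound" if it is missing and has no known location.
std::size_t count_unfound(
    const std::set<ObjectId>& missing,
    const std::map<ObjectId, std::set<OsdId>>& missing_loc) {
    std::size_t unfound = 0;
    for (const ObjectId& oid : missing) {
        auto it = missing_loc.find(oid);
        if (it == missing_loc.end() || it->second.empty())
            ++unfound;
    }
    return unfound;
}

// The push to replicas may start once every findable object has been
// pulled, i.e. everything still missing is unfound.
bool can_start_replica_push(
    const std::set<ObjectId>& missing,
    const std::map<ObjectId, std::set<OsdId>>& missing_loc) {
    return count_unfound(missing, missing_loc) == missing.size();
}
```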
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging multi-OSD scenarios.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Add items to the bloom filter when trimming, and look for them
in the filter in the few places where a simple existence
check suffices for our needs.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
You can now add items to a bloom filter and check for their existence.
This is intended to be used when trimming items out of the cache; the
filter is cleared when you mark_complete and is not transferred between
nodes. Nor does it change how you set or remove the STATE_COMPLETE flag.
You must explicitly check the bloom filter as appropriate; likewise, if
you start to fill it in you must always continue filling it in until
you delete the current instance of the filter.
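
For readers unfamiliar with the add/check/clear contract, here is a
small self-contained sketch. It is not the filter added by this change:
SimpleBloom, its size, and its hashing scheme are assumptions made for
illustration. It shows why a negative answer is definite while a
positive one only means "maybe", and why an item skipped during filling
would later be reported as absent.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical minimal bloom filter over strings.
class SimpleBloom {
    static constexpr std::size_t kBits = 1024;
    static constexpr std::size_t kHashes = 3;
    std::bitset<kBits> bits;

    // Two base hashes combined as h1 + i*h2 (double hashing).
    static std::size_t h1(const std::string& s) {
        return std::hash<std::string>{}(s);
    }
    static std::size_t h2(const std::string& s) {
        return std::hash<std::string>{}(s + "#") | 1;  // force non-zero
    }

public:
    // Record an item, e.g. when it is trimmed from the cache.
    void add(const std::string& s) {
        for (std::size_t i = 0; i < kHashes; ++i)
            bits.set((h1(s) + i * h2(s)) % kBits);
    }

    // false: definitely never added. true: possibly added.
    bool maybe_contains(const std::string& s) const {
        for (std::size_t i = 0; i < kHashes; ++i)
            if (!bits.test((h1(s) + i * h2(s)) % kBits))
                return false;
        return true;
    }

    // Forget everything, as when the directory is marked complete.
    void clear() { bits.reset(); }
};
```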
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>