The error handling code in MDCache::handle_discover_reply() has two
main issues. First, it does not wake waiters if the dir_auth_hint in
the reply message is equal to our own nodeid; this can happen when a
discover races with a subtree import. Second, it checks for the
existence of the cached directory fragment to decide whether to take
waiters from the inode or from the directory fragment. That check is
unreliable because a subtree import can add directory fragments to
the cache.
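A rough sketch of the intended control flow, using simplified
stand-in types rather than the real MDCache API:

    // If the dir_auth_hint in an error reply points back at our own
    // nodeid, a concurrent subtree import made us auth while the
    // discover was in flight, so wake the local waiters instead of
    // forwarding the discover again.
    #include <functional>
    #include <vector>

    struct DiscoverReply { int dir_auth_hint; };

    struct CacheSketch {
      int my_nodeid;
      std::vector<std::function<void()>> waiters;

      void handle_error_reply(const DiscoverReply &m) {
        if (m.dir_auth_hint == my_nodeid) {
          for (auto &w : waiters)  // we are auth now: retry locally
            w();
          waiters.clear();
        } else {
          // re-send the discover to the hinted auth mds (omitted)
        }
      }
    };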
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
When a frozen inode is encountered, MDCache::handle_discover() sends
the reply immediately if the reply message is not empty. When handling
"discover ino" requests, the reply message always contains the base
directory fragment. But the requestor already has the base directory
fragment, so the only effect of the reply is to wake the requestor and
make it send the same "discover ino" request again. The requestor
therefore keeps sending "discover ino" requests but can't make any
progress. The fix is to set want_base_dir to false for
MDCache::discover_ino(). After setting want_base_dir to false, the
code that handles "discover ino" errors also needs updating.
This patch also removes unused error handling code for flag_error_dn.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
We should delete a dir fragment's bloom filter after exporting the dir
fragment to another MDS. Otherwise the residual bloom filter may cause
problems if the MDS imports the dir fragment again later.
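A minimal sketch of the idea, with illustrative names (finish_export
and the types are stand-ins, not the real CDir code):

    #include <memory>

    struct BloomFilterSketch { /* dentry existence hints */ };

    struct DirFragSketch {
      std::unique_ptr<BloomFilterSketch> bloom;

      void finish_export() {
        // hand replicas and state over to the importing mds (omitted)
        // drop the filter: if we import this dirfrag again later,
        // hints from the previous incarnation may be stale
        bloom.reset();
      }
    };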
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
If predirty_journal_parents() does not propagate changes in a dir's
fragstat into the corresponding inode's dirstat, it should mark the
inode as dirfrag dirty. This happens when we modify dir fragments
that are auth subtree roots.
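A rough sketch of the rule, using hypothetical stand-in types:

    struct InodeSketch { bool dirty_dirfrag = false; /* dirstat ... */ };

    struct DirFragSketch {
      InodeSketch *in;
      bool is_auth_subtree_root;
      /* fragstat ... */
    };

    void predirty_parents(DirFragSketch &dir) {
      if (!dir.is_auth_subtree_root) {
        // fold the fragstat delta into in->dirstat and journal it
        // (omitted)
      } else {
        // the delta cannot cross the subtree boundary now; record
        // that the dirfrag still holds an unpropagated change
        dir.in->dirty_dirfrag = true;
      }
    }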
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
At that point, the request has already auth pinned and locked some
objects, so CDir::fetch() should ignore the can_auth_pin() check and
continue to fetch the freezing dir.
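A rough sketch of the change; the flag name ignore_authpinnability is
illustrative, not necessarily the real parameter:

    struct CDirSketch {
      bool freezing = false;
      bool can_auth_pin() const { return !freezing; }

      void fetch(bool ignore_authpinnability) {
        if (!can_auth_pin() && !ignore_authpinnability) {
          // queue a waiter and retry after the freeze completes
          // (omitted); a request that already holds auth pins must
          // not wait here, or it may stall forever
          return;
        }
        // read the dirfrag from the metadata pool (omitted)
      }
    };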
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The functional tests for the create operations should add and specify
non-default pools, but we don't yet have a set of library methods to
do that (to interact with the monitor).
Reuse the old preferred_pg field. Only use it if the new CREATEPOOLID
feature is present and the value is >= 0.
Verify that the data pool is allowed, or return EINVAL to the client.
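A rough sketch of the decode/validation rule; the feature bit value
and function name are placeholders:

    #include <cerrno>
    #include <set>

    constexpr unsigned FEATURE_CREATEPOOLID = 1u << 0;  // placeholder bit

    int pick_data_pool(unsigned peer_features, int preferred_pg,
                       const std::set<int> &allowed_pools,
                       int default_pool) {
      int pool = default_pool;
      if ((peer_features & FEATURE_CREATEPOOLID) && preferred_pg >= 0)
        pool = preferred_pg;   // old field reinterpreted as a pool id
      if (!allowed_pools.count(pool))
        return -EINVAL;        // data pool not permitted
      return pool;
    }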
Signed-off-by: Sage Weil <sage@inktank.com>
This is a poor interface. The Hadoop code is shifting to specifying
this information at file creation time instead.
Signed-off-by: Sage Weil <sage@inktank.com>
If we had a pending failure report and then send a cancellation, take
it out of our pending list so that we don't keep resending
cancellations.
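A minimal sketch of the fix, with stand-in types:

    #include <map>

    struct FailureReport { /* target osd, failed_since, ... */ };
    std::map<int, FailureReport> pending_failures;  // keyed by osd id

    void send_failure_cancellation(int osd) {
      auto it = pending_failures.find(osd);
      if (it == pending_failures.end())
        return;
      // tell the monitor the failure report is withdrawn (omitted)
      pending_failures.erase(it);  // the fix: don't resend next tick
    }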
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, keeping this state on Active worked, but now we might go
back through WaitRemoteRecoveryReserved without resetting Active.
Signed-off-by: Samuel Just <sam.just@inktank.com>
We don't want to change missing sets during a chunky scrub, since
that would cause !is_clean() and derail the rest of the scrub.
Instead, move the missing, inconsistent, and authoritative sets into
the scrubber and add to them during scrub_compare_maps(). Then handle
repairing objects all at once in scrub_finish().
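A rough sketch of the restructuring, with object names standing in
for the real hobject_t type:

    #include <map>
    #include <set>
    #include <string>

    struct ScrubberSketch {
      std::set<std::string> missing;
      std::set<std::string> inconsistent;
      std::map<std::string, int> authoritative;  // object -> good replica
    };

    void scrub_compare_maps(ScrubberSketch &s) {
      // compare the replicas' scrub maps for the current chunk and
      // append findings to s.missing / s.inconsistent /
      // s.authoritative (omitted)
    }

    void scrub_finish(ScrubberSketch &s) {
      // only now, with the whole scrub done, queue repairs; touching
      // the PG missing set mid-scrub would have flipped !is_clean()
      for (const auto &entry : s.authoritative) {
        (void)entry;  // recover entry.first from replica entry.second
      }
    }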
Signed-off-by: Samuel Just <sam.just@inktank.com>
Add tests for:
- sparse import makes expected sparse images
- sparse export makes expected sparse files
- sparse import from stdin also creates sparse images
- import from a partially-sparse file leads to a partially-sparse image
- import from stdin with zeros leads to a sparse image
- export from a zeros image to a file leads to a sparse file
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Try to accumulate image-sized blocks when importing from stdin, even
if each read is shorter than requested; if we get a full block and
it's all zeroes, we can seek past it and make a sparse output file.
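A rough sketch of the accumulation loop, assuming POSIX read() on the
stdin fd:

    #include <unistd.h>
    #include <algorithm>
    #include <vector>

    // keep reading until a full block is buffered; short reads are
    // normal on a pipe
    bool read_full_block(int fd, std::vector<char> &buf) {
      size_t got = 0;
      while (got < buf.size()) {
        ssize_t r = read(fd, buf.data() + got, buf.size() - got);
        if (r <= 0) {          // EOF or error: partial block
          buf.resize(got);
          return false;
        }
        got += static_cast<size_t>(r);
      }
      return true;
    }

    bool is_all_zero(const std::vector<char> &buf) {
      return std::all_of(buf.begin(), buf.end(),
                         [](char c) { return c == 0; });
    }
    // caller: on a full all-zero block, advance the write offset by
    // buf.size() instead of writing, leaving a hole in the image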
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
We can get a pattern like so:
- new mon session
- after, say, 120 seconds, we decide to send a stats msg
- outstanding_pg_stats is finally true, so we immediately time out (30
  second grace) and reconnect to a new mon
-> repeat
The problem is that we don't reset the last_sent timestamp when we
send, and that we do this check after sending instead of before. Fix
both.
This should resolve issue #3661, where osds that have no pgs updating
do not send stats messages to the mon to check in, and eventually get
marked down as a result.
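A minimal sketch of the corrected ordering, with stand-in names:

    #include <ctime>

    struct MonStatsState {
      time_t last_sent = 0;
      bool outstanding_pg_stats = false;
    };

    void tick(MonStatsState &s, time_t now, time_t grace,
              bool have_stats) {
      // the timeout check now happens *before* sending, so a message
      // sent this tick can't immediately count against the grace
      if (s.outstanding_pg_stats && now - s.last_sent > grace) {
        // mon session looks dead: reconnect to a new mon (omitted)
        return;
      }
      if (have_stats) {
        // send the pg stats message (omitted)
        s.outstanding_pg_stats = true;
        s.last_sent = now;  // the missing reset
      }
    }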
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
- Replaced mentions of ceph.conf with "Ceph configuration" to clarify
  usage when running multiple clusters on the same hardware.
- Added a [client] entry so people know it can be set too.
- Updated existing auth example.
- Added an authentication section with a link to the cephx guide.
- Added section for running multiple clusters. Per Tommi.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
These are from looking through the shortlog from 0.48.2..next.
The description of the min_size defaults could probably be improved.
I did not look closely at radosgw or cephfs changes.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This avoids the situation where a librados or other user with the
default of 'cephx,none' and no keyring authenticates against a cluster
whose required auth is 'none', and an annoying warning is generated
every time. Now we only print a helpful message if we actually failed
to authenticate.
Signed-off-by: Sage Weil <sage@inktank.com>
This means we can drop the scrub repair state_clear() call. We could
probably drop others, but let's leave that for another day.
Signed-off-by: Sage Weil <sage@inktank.com>
If both cephx and none are accepted auth methods, and the cephx
keyring cannot be found, then resort to using none instead of
failing.
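A rough sketch of the fallback; pick_auth_method is an illustrative
name, not the real API:

    #include <string>
    #include <vector>

    std::string pick_auth_method(const std::vector<std::string> &allowed,
                                 bool have_keyring) {
      bool none_ok = false;
      for (const auto &m : allowed) {
        if (m == "cephx" && have_keyring)
          return "cephx";            // preferred whenever usable
        if (m == "none")
          none_ok = true;
      }
      return none_ok ? "none" : "";  // empty string == hard failure
    }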
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>