Previously we just had to give up on ESTALE. Now
we can attempt to recover!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The slave can also hold some auth pins from locks that the
master has asked it to grab. It's possible we can intelligently
determine how many, but for now just drop the assert.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Previously it ignored the auth pin required to hold the snap xlock,
which is currently always held for a rename on a dir. This would lead
to a permanent hang on the request. Now we account for it!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We were already taking rdlocks on the source tree, to make
sure that each slave MDS could traverse to the source dentry. Now,
if there are slave MDSes, we take rdlocks on each destination
ancestor to make sure the slaves can also traverse there.
This fixes an fsstress bug.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
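A rough illustration of which destination ancestors end up rdlocked. This is a sketch under simplifying assumptions, not the MDS code: paths are plain '/'-separated strings rather than CInode/CDentry objects, and the helper name dest_ancestors_to_rdlock is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: list the ancestor directories of an absolute
// destination path, from the root down, that a rename would rdlock so
// that slave MDSes can traverse to the destination dentry.
// The final path component (the dentry itself) is not included.
std::vector<std::string> dest_ancestors_to_rdlock(const std::string& destpath,
                                                  bool have_slaves) {
  std::vector<std::string> out;
  if (!have_slaves)
    return out;  // no other MDS needs to traverse the destination path
  out.push_back("/");
  // Each '/' after the leading one ends an ancestor component.
  std::size_t pos = destpath.find('/', 1);
  while (pos != std::string::npos) {
    out.push_back(destpath.substr(0, pos));
    pos = destpath.find('/', pos + 1);
  }
  return out;
}
```

For a rename to /a/b/c with slaves involved, this yields "/", "/a" and "/a/b"; with no slaves it yields nothing, matching the "if there are slave MDSes" condition.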
We were being sloppy before with the ESubtreeMap vs import/export events.
Fix that by doing a few things:
- add an ambig flag to the subtree map items, and set it for in-progress
imports. That means an ESubtreeMap followed by EImportFinish will do
the right thing now.
- adjust the dir_auth on EExport journaling (handle_export_dir_ack) so
that our journaled subtree_map state is always in sync with what we
see during replay.
Also document clearly what the dir_auth variations actually mean.
Signed-off-by: Sage Weil <sage@newdream.net>
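A toy model of how the ambig flag lets an ESubtreeMap and a later EImportFinish compose during replay. Everything here is a hypothetical simplification (SubtreeEntry, SubtreeMap, and both replay functions are illustrative names; real subtrees are keyed by dirfrag and carry full dir_auth pairs, not a single bool).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-subtree record as seen during replay.
struct SubtreeEntry {
  bool auth = false;   // do we (tentatively) own this subtree?
  bool ambig = false;  // set for imports that were still in progress
};

using SubtreeMap = std::map<std::uint64_t, SubtreeEntry>;

// Replaying one ESubtreeMap item: in-progress imports are recorded
// as auth but flagged ambiguous.
void replay_subtree_map_item(SubtreeMap& m, std::uint64_t ino, bool importing) {
  SubtreeEntry e;
  e.auth = true;
  e.ambig = importing;
  m[ino] = e;
}

// Replaying EImportFinish settles an ambiguous entry: keep the subtree
// on success, give it back on abort. Unambiguous entries are untouched.
void replay_import_finish(SubtreeMap& m, std::uint64_t ino, bool success) {
  auto it = m.find(ino);
  if (it == m.end() || !it->second.ambig)
    return;
  it->second.ambig = false;
  it->second.auth = success;
}
```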
If we are in PREPPING, we need to drop the stickydirs() on the inodes, and
not the pins on the dirfrags. Do this in the helper so we can keep the
call chains simple.
Also deal with the case where we get a cancel in PREPPED state.
Signed-off-by: Sage Weil <sage@newdream.net>
The prepping nodes may need to discover bounds from the failed node and
may hang indefinitely. Meanwhile, we won't send out mds_resolve messages
until in-progress migrations complete. Deadlock.
In certain cases the importing node can manufacture the replica. If it
doesn't realize that right off, though, it will get hung up trying to
discover from the wrong node, get referred to the failed node, and block
waiting for recovery. The replica forging is a bit suspect anyway, so
let's avoid the whole thing if we can!
Signed-off-by: Sage Weil <sage@newdream.net>
Use helpers for common code shared between handle_export_cancel and
handle_mds_failure_or_stop.
Also include handling for IMPORT_PREPPING state, even though we don't use
it yet.
Signed-off-by: Sage Weil <sage@newdream.net>
During replay we trim non-auth inodes on EExport or EImportFinish abort.
Subtree trimming may be delayed, too.
Skip parents if the diri is in the same blob, or if it is journaled in the
current segment *and* it is in a subtree that is unambiguously auth. We can't
easily be more precise than that because the actual event we care about on
replay is EExport, but the migrator doesn't twiddle auth bits to false until
later.
Also, reset last_journaled on import.
This fixes replay bugs like
2011-04-13 18:15:18.064029 7f65588ef710 mds1.journal EImportStart.replay 10000000015 bounds []
2011-04-13 18:15:18.064034 7f65588ef710 mds1.journal EMetaBlob.replay 2 dirlumps by unknown0
2011-04-13 18:15:18.064040 7f65588ef710 mds1.journal EMetaBlob.replay dir 10000000010
2011-04-13 18:15:18.064046 7f65588ef710 mds1.journal EMetaBlob.replay missing dir ino 10000000010
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)', in thread '0x7f65588ef710'
mds/journal.cc: 407: FAILED assert(0)
ceph version 0.25-683-g653580a (commit:653580ae84c471c34872f14a0308c78af71f7243)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa53d26]
2: (EMetaBlob::replay(MDS*, LogSegment*)+0x7eb) [0x7a737d]
Fixes: #994
Signed-off-by: Sage Weil <sage@newdream.net>
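The skip rule above, restated as a standalone predicate. The function and parameter names are hypothetical; in the real code these conditions come from the blob, the inode's last_journaled segment, and the subtree's auth state.

```cpp
#include <cassert>

// Hypothetical predicate: may we omit the parents of a directory inode
// (diri) from the metablob being journaled?
bool can_skip_parents(bool diri_in_same_blob,
                      bool journaled_in_current_segment,
                      bool subtree_unambiguously_auth) {
  if (diri_in_same_blob)
    return true;  // parents are already in this blob
  // Otherwise both conditions must hold; either one alone is not enough,
  // since replay may trim non-auth inodes on EExport/EImportFinish abort.
  return journaled_in_current_segment && subtree_unambiguously_auth;
}
```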
mkcephfs was bailing out when given a config with no MDSs defined,
because we run under set -e and grep returns a nonzero exit status
when it finds no matching lines.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Stop accepting old-style section names of the form $type$id. Instead,
we want section names of the form $type.$id. So [osd0] will no longer
be a valid section name; instead, use [osd.0].
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
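A minimal sketch of the new naming rule. The helper is hypothetical and assumes type names are purely alphabetic; sections with no id (such as [global] or [osd]) are accepted, while the old $type$id form is rejected.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical validator for config section names.
//   "osd.0", "mds.a"  -> valid ($type.$id)
//   "osd", "global"   -> valid (no id)
//   "osd0"            -> invalid (old $type$id style)
bool is_valid_section_name(const std::string& name) {
  std::size_t i = 0;
  while (i < name.size() && std::isalpha(static_cast<unsigned char>(name[i])))
    ++i;
  if (i == 0)
    return false;              // must start with a type
  if (i == name.size())
    return true;               // bare type, no id
  if (name[i] != '.')
    return false;              // id glued to the type, e.g. "osd0"
  return i + 1 < name.size();  // require a non-empty id after the dot
}
```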
Normalize key names in md_config_t::get_val and md_config_t::set_val
Remove unused fields from struct config_option.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
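One plausible shape for the key normalization, shown as a sketch. The assumption (not stated in the commit) is that spaces and dashes are folded to underscores, so "log file", "log-file" and "log_file" all name the same option; normalize_key_name is used here as an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical normalization applied before looking up an option in
// get_val/set_val: treat ' ' and '-' as '_'.
std::string normalize_key_name(std::string key) {
  std::replace(key.begin(), key.end(), ' ', '_');
  std::replace(key.begin(), key.end(), '-', '_');
  return key;
}
```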
RADOS Gateway: get rid of RGWOp::err. We already have req_state::err, and
that represents the same thing.
Standardize nomenclature for errors. 'errno' is our internal
representation of the error. 'code' is what is returned by S3.
'message' is the message at the end. Improve rgw_err.
dump_errno should not modify req_state; it should only dump the error.
A new function, set_req_state_err, sets the error based on an 'errno'.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
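A simplified sketch of the errno/code split. The struct layouts and the errno-to-S3 table are illustrative assumptions, not the gateway's actual definitions; NoSuchKey (404), AccessDenied (403) and InternalError (500) are standard S3 error codes. This dump_errno returns a string instead of writing a response, but it keeps the key property: it only reads the state.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical, trimmed-down error holder.
struct rgw_err {
  int http_ret = 200;   // HTTP status sent to the client
  std::string s3_code;  // S3 'code', e.g. "NoSuchKey"
  std::string message;  // human-readable 'message'
};

struct req_state {
  rgw_err err;
};

// Set the request's error from an internal errno (sign is ignored).
void set_req_state_err(req_state* s, int err_no) {
  if (err_no < 0)
    err_no = -err_no;
  switch (err_no) {
  case 0:
    s->err.http_ret = 200; s->err.s3_code.clear(); break;
  case ENOENT:
    s->err.http_ret = 404; s->err.s3_code = "NoSuchKey"; break;
  case EACCES:
  case EPERM:
    s->err.http_ret = 403; s->err.s3_code = "AccessDenied"; break;
  default:
    s->err.http_ret = 500; s->err.s3_code = "InternalError"; break;
  }
}

// Takes the state by const reference: dumping must not modify it.
std::string dump_errno(const req_state& s) {
  return std::to_string(s.err.http_ret) + " " + s.err.s3_code;
}
```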
Reading a config file into any md_config_t structure except g_conf used
to be impossible. This is because the config_option code used to
contain explicit references to g_conf. Those have been removed, so now
any md_config_t should be able to read a configuration file.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
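The general technique, sketched: describe each option by where it lives inside any md_config_t (here with C++ pointers-to-member) instead of by a reference into the global g_conf. The miniature md_config_t, the config_options table, and this set_val are hypothetical; the real table may use a different mechanism, such as byte offsets.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical miniature config structure.
struct md_config_t {
  std::string log_file;
  int debug_ms = 0;
};

// Each option records a member pointer, not an address inside g_conf.
struct config_option {
  const char* name;
  std::string md_config_t::*str_member;  // null if not a string option
  int md_config_t::*int_member;          // null if not an int option
};

const config_option config_options[] = {
  {"log_file", &md_config_t::log_file, nullptr},
  {"debug_ms", nullptr, &md_config_t::debug_ms},
};

// Sets a value on whichever config instance is passed in.
bool set_val(md_config_t& conf, const char* key, const std::string& val) {
  for (const auto& opt : config_options) {
    if (std::strcmp(opt.name, key) != 0)
      continue;
    if (opt.str_member)
      conf.*opt.str_member = val;
    else
      conf.*opt.int_member = std::stoi(val);
    return true;
  }
  return false;  // unknown option
}
```

Because nothing in the table mentions a particular instance, a config file parser built on set_val can fill in any md_config_t, not just g_conf.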
Now log_per_instance (the symlink dance) works with both log_file and
log_dir. This will facilitate gradually removing log_dir.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
ReadOp should read the received length to prevent a buffer error.
Check error codes on WriteOp and ReadOp.
Signed-off-by: Samuel Just <rexludorum@gmail.com>
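The underlying pattern, sketched against a plain POSIX file descriptor (an assumption; the commit's ReadOp is not shown): trust only the byte count each read() returns, and check for errors instead of assuming success. read_fully is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Read up to `want` bytes from fd into `out`, using only the lengths
// read() actually reports. Returns bytes received, or -errno on error.
ssize_t read_fully(int fd, std::string& out, std::size_t want) {
  out.clear();
  char buf[4096];
  while (out.size() < want) {
    std::size_t chunk = std::min(sizeof(buf), want - out.size());
    ssize_t r = ::read(fd, buf, chunk);
    if (r < 0) {
      if (errno == EINTR)
        continue;     // interrupted by a signal: retry
      return -errno;  // report the error instead of ignoring it
    }
    if (r == 0)
      break;          // EOF: fewer bytes than requested
    out.append(buf, static_cast<std::size_t>(r));  // received length only
  }
  return static_cast<ssize_t>(out.size());
}
```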
Triggered by mds_kill_import_at 5. We were clearing the export_locks
prior to calling export_unlock (der!).
Signed-off-by: Sage Weil <sage@newdream.net>
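A toy illustration of why the order matters, assuming (as the fix implies) that export_unlock walks export_locks to decide what to release. The Exporter struct and finish_export are hypothetical; real locks are SimpleLock objects, not strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical miniature of the exporter's lock bookkeeping.
struct Exporter {
  std::set<std::string> export_locks;  // locks held for the export
  std::vector<std::string> released;   // what export_unlock() dropped

  // Releases every lock still recorded in export_locks.
  void export_unlock() {
    for (const auto& l : export_locks)
      released.push_back(l);
  }

  // Release first, then forget. Clearing export_locks before calling
  // export_unlock() would leave it nothing to release, so the locks
  // would stay held.
  void finish_export() {
    export_unlock();
    export_locks.clear();
  }
};
```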