We only want to do this if is_active(). Otherwise, the normal
requeueing code will do its thing, taking care to get the queue orders
correct.
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
We don't need to do quite so many writes. It can be slow when we are
thrashing and aren't doing anything in parallel.
Fixes: #8932
Signed-off-by: Sage Weil <sage@redhat.com>
We could race with another thread that deletes this right after we call
dec(). Our access of cct would then become a use-after-free. Valgrind
managed to turn this up.
Copy it into a local variable before the dec() to be safe, and move the
dout line below to make this possibility explicit and obvious in the code.
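Roughly, the pattern becomes (a sketch only; member and debug-level
names are illustrative):

  CephContext *local_cct = cct;      // copy before dropping the ref
  int v = nref.dec();
  // 'this' may already be freed by another thread at this point, so
  // only the local copy of cct is safe to dereference
  ldout(local_cct, 20) << "put: nref now " << v << dendl;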
Signed-off-by: Sage Weil <sage@redhat.com>
Fixes: #8442
Backport: firefly
Data pools might have strict write alignment requirements. Use pool
alignment info when setting the max_chunk_size for the write.
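Something along these lines (a sketch; variable names are illustrative,
and pool_required_alignment() is the librados accessor that returns 0
when the pool has no alignment constraint):

  uint64_t align = ioctx.pool_required_alignment();
  if (align > 1)
    max_chunk_size -= max_chunk_size % align;   // round down to alignment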
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
osd: set pg flag INCOMPLETE_CLONES when turning off cache pool
Reviewed-by: Greg Farnum <greg@inktank.com>
First patch Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
We were carrying a bare Message*, which could get freed if the op was
canceled (or possibly completed). Instead, just stash the entity_name_t,
the only piece we need. The Connection is properly ref counted so no
worries there.
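i.e., roughly (sketch):

  // copied while the message is still valid; safe to keep around after
  // the op is canceled or completed
  entity_name_t source = m->get_source();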
Fixes: #8926
Signed-off-by: Sage Weil <sage@redhat.com>
When closing the journal, check must_write_header and update the
journal header if must_write_header is already set.
This avoids needless journal replay after restarting the osd.
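Roughly (a sketch; the header-writing helper name is illustrative):

  // in close(): flush any pending header update before shutting down
  if (must_write_header) {
    write_header();
    must_write_header = false;
  }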
Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Reviewed-by: Sage Weil <sage@redhat.com>
This breaks ref cycles between the local_connection and session, and lets
us drop the explicit set_priv() calls in OSD::shutdown().
Signed-off-by: Sage Weil <sage@redhat.com>
Adding the available help arguments from the man page
Fixes: #8112
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
Whitespace removal to make all help options align in a similar fashion
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
We cannot assume that just because cache_mode is NONE we will have
all clones present; check for the absence of the INCOMPLETE_CLONES flag
here too.
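The condition is roughly (sketch; pool.info stands in for the pg_pool_t
we are checking):

  bool expect_all_clones =
    pool.info.cache_mode == pg_pool_t::CACHEMODE_NONE &&
    !pool.info.has_flag(pg_pool_t::FLAG_INCOMPLETE_CLONES);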
Signed-off-by: Sage Weil <sage@redhat.com>
During recovery, we can clone subsets if we know that all clones will be
present. We skip this on caching pools because they may not be; do the
same when INCOMPLETE_CLONES is set.
Signed-off-by: Sage Weil <sage@redhat.com>
When scrubbing, do not complain about missing clones when we are in a
caching mode *or* when the INCOMPLETE_CLONES flag is set. Both are
indicators that we may be missing clones and that this is okay.
Fixes: #8882
Signed-off-by: Sage Weil <sage@redhat.com>
Set a flag on the pg_pool_t when we change the cache_mode to NONE. This
is because object promotion may promote heads without all of the clones,
and when we switch the cache_mode back those objects may remain. Do
this on any cache_mode change (to or from NONE) to capture legacy
pools that were set up before this flag existed.
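Roughly (sketch; the pending pg_pool_t is called p here for
illustration):

  // on any cache_mode transition to or from NONE, remember that some
  // heads may have been promoted without their clones
  p.flags |= pg_pool_t::FLAG_INCOMPLETE_CLONES;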
Signed-off-by: Sage Weil <sage@redhat.com>
If we have a pending pool value but the cache_mode hasn't changed, this is
still a no-op (and we don't need to block).
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
If we are doing a lookup and the main xattr fails, we'll check if there is an
alt xattr. If it exists, but the nlink on the inode is only 1, we will
kill the xattr. This cleans up the mess left over by an incomplete
lfn_unlink operation.
This resolves the problem with an lfn_link to a second long name that
hashes to the same short_name: we will ignore the old name the moment the
old link goes away.
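In pseudocode, the lookup path does roughly this (helper names are
illustrative):

  // main xattr did not match; fall back to the alt xattr
  if (get_alt_attr(fd, &alt_name) == 0) {
    struct stat st;
    ::fstat(fd, &st);
    if (st.st_nlink == 1) {
      // only one link remains, so the alt name is a leftover from an
      // incomplete lfn_unlink; remove it and ignore it
      remove_alt_attr(fd);
    }
  }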
Fixes: #8701
Signed-off-by: Sage Weil <sage@redhat.com>
After we unlink, if the nlink on the inode is still non-zero, remove the
alt xattr. We can *only* do this after the rename or unlink operation
because we don't want to leave a file system link in place without the
matching xattr; hence the fsync_dir() call.
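Roughly (sketch; helper name illustrative):

  // only after the unlink/rename and fsync_dir() have committed:
  struct stat st;
  if (::fstat(fd, &st) == 0 && st.st_nlink > 0)
    remove_alt_attr(fd);   // the surviving link no longer needs it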
Note that this might leak an alt xattr if we happen to fail after the
rename/unlink but before the removexattr is committed. We'll fix that
next.
Signed-off-by: Sage Weil <sage@redhat.com>
Add a helper to close fds when we leave scope. This is important when
injecting failures by throwing exceptions.
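Something like (sketch):

  struct FDCloser {
    int fd;
    explicit FDCloser(int f) : fd(f) {}
    ~FDCloser() {
      if (fd >= 0)
        ::close(fd);   // runs on scope exit, including when an
                       // injected failure throws
    }
  };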
Signed-off-by: Sage Weil <sage@redhat.com>
When we rename an object (collection_move_rename) to a different name, and
the name is long, we run into problems because the lfn xattr can only track
a single long name linking to the inode. For example, suppose we have
foobar -> foo_123_0 (attr: foobar) where foobar hashes to 123.
At first, collection_add could only link a file to another file in a
different collection with the same name. Allowing collection_move_rename
to rename the file, however, means that we have to convert:
col1/foobar -> foo_123_0 (attr: foobar)
to
col1/foobaz -> foo_234_0 (attr: foobaz)
This is a problem because if we link, reset xattr, unlink we end up with
col1/foobar -> foo_123_0 (attr: foobaz)
if we restart after we reset the attr. This will cause the initial foobar
lookup to fail since the attr doesn't match, and the file won't be able to be
looked up.
Fix this by allowing *two* (long) names to link to the same inode. If we
lfn_link a second (different) name, move the previous name to the "alt"
xattr and set the new name. (This works because link is always followed
by unlink.) On lookup, check either xattr.
Don't even bother to remove the alt xattr on unlink. This works as long
as the old name and new name don't hash to the same shortname and end up
in the same LFN chain. (Don't worry, we'll fix that next.)
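In rough pseudocode (attr helpers are illustrative):

  // lfn_link with a second, different long name for the same inode:
  old_name = get_attr(fd);       // current primary long name
  set_alt_attr(fd, old_name);    // demote it to the alt slot
  set_attr(fd, new_name);        // the new name becomes primary
  // lookup: a candidate matches if it equals either attr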
Fixes part of #8701
Signed-off-by: Sage Weil <sage@redhat.com>
- rbd-fuse depends on librados2/librbd1
- ceph-devel depends on specific releases of libs and libcephfs_jni1
- librbd1 depends on librados2
- python-ceph does not depend on libcephfs1
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 7cf8132239)
Move files, postun scriptlet, and add dependencies on ceph-common
where appropriate
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit e131b9d5a5)
In the cases where we are taking a write lock and are careful
enough that we know we should succeed (i.e., we assert(got)),
use the get_write_greedy() variant that skips the checks for
waiters (be they ops or backfill) that are normally necessary
to avoid starvation. We don't care about starvation here
because our op is already in-progress and can't easily be
aborted, and new ops won't start because they do make those
checks.
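i.e., the callers that assert(got) switch to something like (sketch):

  bool got = obc->get_write_greedy(op);   // skip the starvation checks
  assert(got);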
Fixes: #8889
Signed-off-by: Sage Weil <sage@redhat.com>
There are several lockers that need to take a write lock because an
operation is already in progress and they know it is safe to do so.
In particular, they need to skip
the starvation checks (op waiters, backfill waiting).
Signed-off-by: Sage Weil <sage@redhat.com>
mon: AuthMonitor: always encode full regardless of keyserver having keys
Reviewed-by: Gregory Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>