Previously (in 26f6a8e48ae575f17c850e28e969d55bceefbc0f), for reasons that
are somewhat obscured by passage of time, we did
+ if ((other_wanted & (CEPH_CAP_GRD|CEPH_CAP_GWR)) ||
But then we noticed that the loner may want to RD/WR and we are losing the
loner status for some other reason. So just recently in
b48dfeba3f we changed it to
+ if (((other_wanted|loner_wanted) & (CEPH_CAP_GRD|CEPH_CAP_GWR)) ||
Then we noticed that a non-loner wanting to read and a loner wanting to
read (i.e., no writers!) would lead to MIX, even when we want SYNC.
So in 07b36992da we changed to
+ if (((other_wanted|loner_wanted) & CEPH_CAP_GWR) ||
This appears to be correct. The possible choices (wrt caps wanted):
  loner  other  want
  R      R      SYNC
  R      R|W    MIX
  R      W      MIX
  R|W    R      MIX
  R|W    R|W    MIX
  R|W    W      MIX
  W      R      MIX
  W      R|W    MIX
  W      W      MIX
Which means any writer -> we want MIX. We only want SYNC when there is
nobody who wants to write. Because you can't write in SYNC. Which in
retrospect seems obvious.
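
A minimal sketch of the rule in the table above, covering only the first
clause of the condition (the real test continues after the ||). The bit
values, the enum, and the helper name are hypothetical stand-ins; the real
CEPH_CAP_GRD and CEPH_CAP_GWR constants come from the Ceph headers.

```cpp
#include <cassert>

// Hypothetical bit values for illustration only.
constexpr unsigned CAP_GRD = 1u << 0;  // client wants to read
constexpr unsigned CAP_GWR = 1u << 1;  // client wants to write

enum class FileLockTarget { SYNC, MIX };

// Any writer, loner or not, means MIX. SYNC is chosen only when
// nobody wants to write.
inline FileLockTarget choose_target(unsigned loner_wanted,
                                    unsigned other_wanted) {
  if ((other_wanted | loner_wanted) & CAP_GWR)
    return FileLockTarget::MIX;
  return FileLockTarget::SYNC;
}
```

With these stand-ins, choose_target(CAP_GRD, CAP_GRD) is the only row of the
table that yields SYNC.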
Signed-off-by: Sage Weil <sage@inktank.com>
This will catch buffer decoding errors (maybe the block is empty) and
return an error string.
May fix (or possibly paper over) #3459.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
mount -a uses this, but also passes it to mount.fuse.ceph, and libceph
complains:
fuse: unknown option `noauto'
Signed-off-by: Sage Weil <sage@inktank.com>
This handles the remainder of 3581; it's a lot like the problem in
mkcephfs, but it isn't mkcephfs.
Fixes: #3581
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Valgrind outputs a warning for unrecognized system calls,
and does so for the syscall(__SYS_syncfs,...) and
syscall(__NR_syncfs, ...) calls. This patch avoids making
those calls (and the warning, when run in valgrind) if the
syncfs libc call is available.
INFO:teuthology.task.ceph.osd.1.err:--10568-- WARNING: unhandled syscall: 306
INFO:teuthology.task.ceph.osd.1.err:--10568-- You may be able to write your own handler.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Nevertheless we consider this a bug. Please report
INFO:teuthology.task.ceph.osd.1.err:--10568-- it at http://valgrind.org/support/bug_reports.html.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Fixes: #3535
New object attributes are now configurable. A list
can be specified via the 'rgw extended http attrs'
config param.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Fixes: #3529
Added a new option: rgw_s3_success_create_obj_status.
Expected values are 0, 200, 201, 204. A value of 0
will skip the special handling altogether. Any other
value falls back to 200.
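
The value handling described above could be sketched as follows. The helper
name is hypothetical and this is not the rgw implementation; it only mirrors
the rules stated in this message.

```cpp
#include <cassert>

// Maps the configured rgw_s3_success_create_obj_status value to the
// status that would be used. A return value of 0 means "skip the special
// handling" (assumed convention for this sketch).
inline int effective_create_obj_status(int configured) {
  switch (configured) {
    case 0:
      return 0;           // special handling disabled
    case 200:
    case 201:
    case 204:
      return configured;  // accepted as-is
    default:
      return 200;         // anything else falls back to 200
  }
}
```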
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Commit d9dce4e927 broke journal replay
because the commit thread may try to do a commit, and the ops are not
being applied via the normal work queue. Add back in a simpler form of the
old op quiescing (simpler because there is a single thread doing the
replay).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
With retries, it's possible for notifies to be received more than once
when they are resent to different OSDs, since the OSDs only track them
in memory.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #3590
This was triggered when we tried to run the mds with cephx enabled
against a mon without cephx support. We didn't handle the
returned error at all; this commit fixes that. It also makes
sure that we don't continue initialization until rotating
keys are in place (as the osd does).
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Watches update the on-disk state in the OSD, and aren't idempotent,
so refreshing them must be treated as a separate transaction by the OSD.
Notifies are just in-memory state, and resending them will result in
acceptable behavior:
- if it's the same osd, the resent op will be recognized as a duplicate
- if it's a different osd, a new notify will be triggered since the new osd
can't tell whether the original notify was received by any watchers
Using a new tid for each resend can cause some unnecessary extra work,
as the first case turns into the second.
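
A minimal sketch of the tid choice implied above, assuming a resend path
that knows whether the op is a watch or a notify. The enum and function
names are hypothetical and do not come from the Objecter code.

```cpp
#include <cassert>
#include <cstdint>

enum class LingerKind { Watch, Notify };

// Picks the tid to use when resending a lingering op.
// Watches get a fresh tid so the OSD treats the refresh as a separate
// transaction. Notifies keep their original tid so the same OSD can
// recognize the resend as a duplicate.
inline std::uint64_t tid_for_resend(LingerKind kind,
                                    std::uint64_t original_tid,
                                    std::uint64_t& next_tid) {
  if (kind == LingerKind::Watch)
    return next_tid++;
  return original_tid;
}
```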
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
A rename operation can call predirty_journal_parents() several times,
so a directory fragment's rstat can also be modified several times.
But only the first modification is journaled because EMetaBlob::add_dir()
does not update an existing dirlump.
For example: when handling 'mv a/b/c a/c', Server::_rename_prepare may
first decrease the nested file counts of directories a and b by one, then
increase directory a's nested file count by one.
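
The effect can be sketched with a toy journal blob. The struct and method
names are hypothetical, and a plain map stands in for the dirlump list; this
is not the EMetaBlob code.

```cpp
#include <cassert>
#include <map>
#include <string>

struct MetaBlobSketch {
  // directory name -> nested file count recorded in the journal entry
  std::map<std::string, long> nested_files;

  // Behaviour described as the bug: if the directory already has an
  // entry, the new value is silently dropped.
  void add_dir_keep_first(const std::string& dir, long rfiles) {
    nested_files.emplace(dir, rfiles);  // no-op when dir is present
  }

  // Behaviour after a fix along these lines: an existing entry is
  // refreshed with the latest value.
  void add_dir_update(const std::string& dir, long rfiles) {
    nested_files[dir] = rfiles;
  }
};
```

If directory a starts with 5 nested files, 'mv a/b/c a/c' first records 4
and then 5. With add_dir_keep_first the journal keeps the stale 4; with
add_dir_update it ends up with 5.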
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Add CRYPTO_CXXFLAGS to unittest_formatter_CXXFLAGS so that pk11pub.h,
which is included by src/common/ceph_crypto.h, can be found.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Do not generate errors each time we fail to open a config file; only
generate one at the end if a search path was specified and none were
usable, right before we (already) exit. This avoids spamming stderr
about each path we tried in the search list before we found a good one.
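
A minimal sketch of the deferred-error search, under the assumption that the
caller supplies a predicate saying whether a path can be opened. All names
here are hypothetical and the error text is illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct SearchResult {
  bool found = false;
  std::string chosen;               // first usable path, if any
  std::vector<std::string> errors;  // at most one summary error
};

// Tries each path in order and stays quiet about individual failures.
// A single error is produced only if a search path was given and no
// entry in it was usable.
inline SearchResult find_config(
    const std::vector<std::string>& search_path,
    const std::function<bool(const std::string&)>& can_open) {
  SearchResult r;
  for (const auto& path : search_path) {
    if (can_open(path)) {
      r.found = true;
      r.chosen = path;
      return r;
    }
  }
  if (!search_path.empty())
    r.errors.push_back("unable to open any config file in search path");
  return r;
}
```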
Signed-off-by: Sage Weil <sage@inktank.com>
Complain about config parsing errors even when it is the default
config file.
We may also want to fail instead of continuing, but that is a separate
issue.
Signed-off-by: Sage Weil <sage@inktank.com>
osd max backfills: 5 was too low for a default, 10
seems to work better in testing. The message
priority system should minimize disruption of
push and pull operations anyway.
osd recovery max chunk: 1MB was too small for a
default. 8MB is reasonable for a single push
and will allow us to recover an rbd block in
one push rather than 4, reducing client io
latency during log-based recovery.
osd recovery op priority: 10 rather than 30 will
further reduce the client io latency impact of
push and pull operations.
Signed-off-by: Samuel Just <sam.just@inktank.com>
We pass a pointer because it is an optional argument, but we shouldn't
put the bufferlist on the heap or else we have to manage its life
cycle, and that's fragile (and previously broken).
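
The pattern can be sketched like this, with std::string standing in for
bufferlist and hypothetical function names. The callee takes a pointer only
so that the argument is optional; the caller keeps the object on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for bufferlist in this sketch.
using Buffer = std::string;

// Optional output argument: pass nullptr when the data is not needed.
inline int read_op(Buffer* out) {
  if (out)
    out->append("payload");
  return 0;
}

inline std::size_t caller_with_stack_buffer() {
  Buffer bl;     // automatic storage, destroyed when the function returns
  read_op(&bl);  // the pointer is only borrowed for the call
  return bl.size();
}
```

No new or delete is involved, so there is no ownership to track.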
Signed-off-by: Sage Weil <sage@inktank.com>
A list is overkill; just use a seq and make sure it increments to ensure
the op_submit_finish calls are in order.
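
A minimal sketch of the sequence check, using a hypothetical class. It
returns false on an out-of-order call so that the behaviour is easy to
observe; the real code may well assert instead.

```cpp
#include <cassert>
#include <cstdint>

class SubmitSequencer {
  std::uint64_t next_seq_ = 0;       // last seq handed out by start()
  std::uint64_t last_finished_ = 0;  // last seq accepted by finish()

 public:
  // Called when an op submission begins; returns its sequence number.
  std::uint64_t start() { return ++next_seq_; }

  // Called when an op submission finishes. Accepts only the seq that
  // directly follows the previous one.
  bool finish(std::uint64_t seq) {
    if (seq != last_finished_ + 1)
      return false;
    last_finished_ = seq;
    return true;
  }
};
```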
Signed-off-by: Sage Weil <sage@inktank.com>