This reverts commit c3107009f6.
This appears to be causing problems in the objecter by corrupting
the stack. Until that is resolved, let's revert.
Signed-off-by: Sage Weil <sage@inktank.com>
Instead of having a hardcoded default, use a configurable one. It is
limited to 65536 until future testing confirms there are no side effects
of increasing it past this value, but because it is adjustable the user
still has the freedom to specify whatever maximum value they want.
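A minimal sketch of checking a requested pg count against a configurable maximum instead of a hardcoded one. `Config`, `max_pg_num_per_pool`, and `validate_pg_num` are hypothetical names, not the option or function this commit adds, and the choice of error codes is an assumption; only the 65536 value comes from the text above.

```cpp
#include <cerrno>

// Hypothetical configuration holder; a real daemon would read this value
// from its config system instead of compiling it in.
struct Config {
  int max_pg_num_per_pool = 65536;  // value taken from the commit text
};

// Returns 0 if the requested pg count is acceptable, otherwise a negative
// errno-style code (which codes to use is an assumption of this sketch).
int validate_pg_num(const Config& conf, int requested) {
  if (requested <= 0)
    return -EINVAL;
  if (requested > conf.max_pg_num_per_pool)
    return -ERANGE;
  return 0;
}
```

Raising `max_pg_num_per_pool` lifts the cap without a rebuild, which is the point of making it configurable.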
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
We iterate over ops and, if the pool does not exist and other conditions
are true, we immediately return ENOENT and cancel the op. Increment the
iterator at the top of the loop to avoid invalidating it.
We also need to switch to a map<>, because hash_map<> mutations may
invalidate any/all iterators.
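A sketch of the increment-before-erase pattern, using a hypothetical `Op` record keyed by tid and a hypothetical `cancel_ops_for_missing_pools` helper. `std::map::erase` invalidates only iterators to the erased element, so advancing the loop iterator before the element can be erased keeps it valid.

```cpp
#include <map>

// Hypothetical op record; only the target pool matters here.
struct Op {
  int pool;
};

// Cancels every op whose pool no longer exists and returns how many were
// cancelled. The loop iterator is advanced before the current element can
// be erased; std::map::erase invalidates only iterators to the erased
// element, so `p` stays valid.
int cancel_ops_for_missing_pools(std::map<int, Op>& ops,
                                 bool (*pool_exists)(int)) {
  int cancelled = 0;
  auto p = ops.begin();
  while (p != ops.end()) {
    auto cur = p++;  // increment at the top of the loop
    if (!pool_exists(cur->second.pool)) {
      ops.erase(cur);  // cancel the op; p already points past it
      ++cancelled;
    }
  }
  return cancelled;
}
```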
Fixes: #3613
Signed-off-by: Sage Weil <sage@inktank.com>
It turns out that our suites don't exercise fsync, at least not very much
(I couldn't find it in any of the places I looked). This tester
was written by Ted Ts'o and updated by Chris Mason; I just made it
work on a smaller dataset (256MB) because 8GB against a small cluster takes
more time than we want to wait.
Signed-off-by: Greg Farnum <greg@inktank.com>
Skipping the top 4 calls in the backtrace (the index starts at 0)
actually skips the call that takes the lock.
Skip 3 instead.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
This script was heuristically using short sleep commands in order to
give udev activity time to complete.
There's a command "udevadm settle" which actually looks at the udev
queue and waits until its processing is done. Much, much better.
This also rearranges the get_id function a bit, splitting it into one
function that gets the id and another that loops, retrying after a
short delay if getting the id fails.
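The structure of the reworked helper, sketched in C++ rather than the script's own shell (`get_id_with_retry` and `GetIdFn` are hypothetical names): one callable makes a single attempt, and a wrapper retries it after a short delay.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// One attempt at reading the id; returns std::nullopt on failure.
using GetIdFn = std::function<std::optional<std::string>()>;

// Calls get_id up to max_tries times, sleeping for `delay` between failed
// attempts. Returns the first id obtained, or std::nullopt if every
// attempt failed.
std::optional<std::string> get_id_with_retry(const GetIdFn& get_id,
                                             int max_tries,
                                             std::chrono::milliseconds delay) {
  for (int attempt = 0; attempt < max_tries; ++attempt) {
    if (auto id = get_id())
      return id;
    if (attempt + 1 < max_tries)
      std::this_thread::sleep_for(delay);  // give udev a moment, then retry
  }
  return std::nullopt;
}
```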
Signed-off-by: Alex Elder <elder@inktank.com>
Mark the directory so that upstart will manage the daemon. Eventually,
this should be generalized to allow ceph-disk-* usage with other init
systems.
Signed-off-by: Sage Weil <sage@inktank.com>
We need to distinguish between daemons managed by upstart and sysvinit
(and, eventually, systemd). Only start daemons when 'upstart' is present.
Note that sysvinit will only start daemons when the 'host = ...' line is
in ceph.conf, so there is a similar "opt-in".
Signed-off-by: Sage Weil <sage@inktank.com>
Backfill messages modify the stats on the replica and therefore
must be sent with the same priority as sub_op_modify to ensure
ordering. Using recovery_op_priority caused the following
sequence:
1) Primary(1) sends MOSDPGBackfill FINISH with updated stats (v1)
2) Primary(1) sends SubOp modify for new client op with stats (v2)
3) Replica(2) receives SubOp with stats (v2)
4) Replica(2) receives MOSDPGBackfill FINISH with stats (v1)
5) Replica(2) responds and Primary(1) resets pgtemp, making
Replica(2) the new Primary(2)
6) PG stats on Primary(2) are several ops old.
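A small model of why mixing priorities reorders the two messages, assuming both are queued on the replica before dispatch and that a higher number means higher priority. The priority values 10 and 63 are made up for illustration. Within a single priority the queue is FIFO, so sending the backfill message at the sub_op_modify priority preserves send order.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// A message carrying a stats version, tagged with a send priority.
struct Msg {
  std::string name;
  int priority;
  int stats_version;
  std::uint64_t seq;  // send order, keeps FIFO order within a priority
};

// Higher priority first; equal priority falls back to send order.
struct MsgCompare {
  bool operator()(const Msg& a, const Msg& b) const {
    if (a.priority != b.priority)
      return a.priority < b.priority;
    return a.seq > b.seq;
  }
};

// Delivers the queued messages in priority order and returns the stats
// version the replica ends up with (each message overwrites its stats).
int final_replica_stats(const std::vector<Msg>& sent) {
  std::priority_queue<Msg, std::vector<Msg>, MsgCompare> q(sent.begin(),
                                                           sent.end());
  int stats = 0;
  while (!q.empty()) {
    stats = q.top().stats_version;
    q.pop();
  }
  return stats;
}
```

With mixed priorities the later, higher-priority SubOp (v2) is applied first and the stale FINISH (v1) overwrites it; with equal priorities the replica ends at v2.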
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There are internal ordering requirements which may be sensitive
to assigned priority. We don't want a mix of priorities from
old clients with priorities from new clients causing trouble.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Testing the tick delay with a fork/suspend is causing
corruption in the lockdep code. This approach uses
a config option to sleep the tick thread for a number
of seconds, avoiding the entire fork/suspend mess.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
This way, attempting to use format 2 images works when you upgrade the
python bindings before librbd, and attempting to use functions
that librbd does not have results in more understandable errors.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This can happen when:
- mon sends create pg
- it gets created
- osd remaps the pg to a different osd but does not update pg status to the mon
- mkpg resent to the new osd
or something along those lines. It seems unusual, but in the end who
really cares why the mon doesn't know about the pg creation yet.
Note that this check was added in the initial commit where acting/up was
added; there is no specific condition of concern we are protecting against.
Instead, ignore the message. We'll get a query soon anyway.
This 'fixes' #3614.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
There are limits on the number of pgs possible per pool, and by allowing
the 'osd pool create' command to succeed with a pg count beyond them, we
were opening the door to anomalous behavior.
Fixes: #3617
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
rbd.list() returns a list of names, but nothing stops those images from
going away before rbd.open(). Check for ENOENT and ignore it if that
happens; warn on other errors.
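The same pattern sketched in C++ with a hypothetical `open_image` hook that returns 0 or a negative errno: ENOENT means the image was removed after listing and is skipped quietly; other errors produce a warning. Continuing past those other errors is an assumption, since the text only says to warn.

```cpp
#include <cerrno>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical hook: opens an image by name, returns 0 or a negative errno.
using OpenFn = int (*)(const std::string& name);

// Opens every listed image and returns the names that were opened.
// -ENOENT (removed between list and open) is ignored; any other error is
// reported as a warning and that image is skipped.
std::vector<std::string> open_listed_images(const std::vector<std::string>& names,
                                            OpenFn open_image) {
  std::vector<std::string> opened;
  for (const auto& name : names) {
    int r = open_image(name);
    if (r == -ENOENT)
      continue;  // expected race with removal
    if (r < 0) {
      std::cerr << "warning: failed to open " << name << ": " << r << '\n';
      continue;
    }
    opened.push_back(name);
  }
  return opened;
}
```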
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
obc->watchers now has a ref to the connection as well. This piece of
disconnect_session_watchers essentially parallels remove_watcher and
should generally do the same thing.
Signed-off-by: Samuel Just <sam.just@inktank.com>
If disconnect_session_watches races with watch removal, the session
might no longer have a valid obc ref. In that case, move on to
the next obc.
Note, there is no danger of any obcs being *added* to the session
since the session/connection at this point is dead.
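Ceph's object contexts use their own reference counting, so this is only an analogy using `std::weak_ptr`: `lock()` yields an empty pointer when a concurrent removal has already dropped the object, and the loop moves on to the next entry. `ObjectContext` here is a stand-in and `disconnect_watches` is a hypothetical name.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for an object context (obc) referenced by a watch.
struct ObjectContext {
  std::string oid;
};

// Visits each obc a dead session was watching. If a racing watch removal
// already released the obc, lock() returns nullptr and we skip it.
std::vector<std::string> disconnect_watches(
    const std::vector<std::weak_ptr<ObjectContext>>& watched) {
  std::vector<std::string> disconnected;
  for (const auto& w : watched) {
    std::shared_ptr<ObjectContext> obc = w.lock();
    if (!obc)
      continue;  // no valid obc ref any more: move on
    disconnected.push_back(obc->oid);
  }
  return disconnected;
}
```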
Signed-off-by: Samuel Just <sam.just@inktank.com>
This will catch buffer decoding errors (maybe the block is empty) and
return an error string.
May fix (or possibly paper over) #3459.
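A sketch of catching a decode failure and turning it into an error string instead of letting the exception escape. The `decode_error` exception, the little-endian `decode_u32` decoder, and `try_decode` are hypothetical stand-ins, not Ceph's buffer API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception thrown when a buffer runs out mid-decode.
struct decode_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Decodes a little-endian uint32 at pos and advances pos; throws on
// short input (for example, an empty block).
std::uint32_t decode_u32(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
  if (buf.size() - pos < 4)
    throw decode_error("end of buffer");
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
  pos += 4;
  return v;
}

// Stores the decoded value in *out and returns "" on success, or an error
// string describing the decode failure.
std::string try_decode(const std::vector<std::uint8_t>& buf, std::uint32_t* out) {
  try {
    std::size_t pos = 0;
    *out = decode_u32(buf, pos);
    return "";
  } catch (const decode_error& e) {
    return std::string("failed to decode: ") + e.what();
  }
}
```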
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
mount -a uses the noauto option, but also passes it on to mount.fuse.ceph, and libceph
complains:
fuse: unknown option `noauto'
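One way to keep such options away from FUSE is to filter the comma-separated option list before passing it on. This is a hypothetical C++ helper (`strip_mount_only_options`), not the actual change to mount.fuse.ceph.

```cpp
#include <set>
#include <sstream>
#include <string>

// Removes options that only matter to mount(8), such as noauto, from a
// comma-separated option string; empty entries are dropped as well.
std::string strip_mount_only_options(const std::string& opts,
                                     const std::set<std::string>& drop) {
  std::stringstream in(opts);
  std::string opt, out;
  while (std::getline(in, opt, ',')) {
    if (opt.empty() || drop.count(opt))
      continue;
    if (!out.empty())
      out += ',';
    out += opt;
  }
  return out;
}
```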
Signed-off-by: Sage Weil <sage@inktank.com>
This handles the remainder of #3581; it's a lot like the problem in
mkcephfs, but it isn't mkcephfs.
Fixes: #3581
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Valgrind outputs a warning for unrecognized system calls,
and does so for the syscall(__SYS_syncfs,...) and
syscall(__NR_syncfs, ...) calls. This patch avoids making
those calls (and the warning, when run in valgrind) if the
syncfs libc call is available.
INFO:teuthology.task.ceph.osd.1.err:--10568-- WARNING: unhandled syscall: 306
INFO:teuthology.task.ceph.osd.1.err:--10568-- You may be able to write your own handler.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Nevertheless we consider this a bug. Please report
INFO:teuthology.task.ceph.osd.1.err:--10568-- it at http://valgrind.org/support/bug_reports.html.
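A Linux-only sketch of preferring the libc wrapper, assuming a hypothetical build-time macro `HAVE_SYNCFS` that is defined when libc provides `syncfs()`. `SYS_syncfs` comes from the Linux headers; the final `sync()` fallback is an addition for platforms with neither, not part of the commit.

```cpp
#include <unistd.h>
#include <sys/syscall.h>

// Flushes the filesystem containing fd. Calling the libc wrapper when it
// exists avoids the raw syscall that valgrind reports as unhandled.
int sync_filesystem(int fd) {
#if defined(HAVE_SYNCFS)  // hypothetical configure-time macro
  return ::syncfs(fd);
#elif defined(SYS_syncfs)
  // libc has no wrapper, but the kernel headers define the syscall number.
  return static_cast<int>(::syscall(SYS_syncfs, fd));
#else
  // Last resort: flush every filesystem.
  (void)fd;
  ::sync();
  return 0;
#endif
}
```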
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Previously, handle_watch_timeout would gladly write to an object while
that object was degraded or being scrubbed. Now, we instead queue a
callback, run on scrub completion or from finish_degraded_object, that
calls handle_watch_timeout again. The callback mechanism requires that
registered callbacks expect to be called with the pg lock (and no other
locks) already held.
The callback will release the obc and pg refs unconditionally. Thus,
we need to replace the unconnected_watchers pointer with NULL to
ensure that unregister_unconnected_watcher fails to cancel the
event and does not release the resources a second time.
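A minimal sketch of the deferral mechanism, assuming callbacks are keyed by object name and run with the pg lock already held. `DegradedWaiters` is a hypothetical class whose method names echo the text; the same shape would apply to callbacks waiting for scrub completion.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-PG bookkeeping for callbacks waiting on degraded objects.
class DegradedWaiters {
 public:
  using Callback = std::function<void()>;

  // Defers cb (e.g. a re-run of a watch timeout) until the object recovers.
  void wait_for_degraded(const std::string& oid, Callback cb) {
    waiting_[oid].push_back(std::move(cb));
  }

  // Called once oid is no longer degraded: runs and drops its callbacks.
  // They are moved out first, so a callback that re-queues itself for the
  // same object is kept for the next round instead of running again now.
  void finish_degraded_object(const std::string& oid) {
    auto it = waiting_.find(oid);
    if (it == waiting_.end())
      return;
    std::vector<Callback> cbs = std::move(it->second);
    waiting_.erase(it);
    for (auto& cb : cbs)
      cb();
  }

 private:
  std::map<std::string, std::vector<Callback>> waiting_;
};
```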
Signed-off-by: Samuel Just <sam.just@inktank.com>
Session refs are not really valid on their own; the
corresponding Connection must remain live for at least
as long as the Session.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Fixes: #3535
New object attributes are now configurable. A list
can be specified via the 'rgw extended http attrs'
config param.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Fixes: #3529
Added a new option: rgw_s3_success_create_obj_status.
Expected values are 0, 200, 201, and 204. A value of 0
skips the special handling altogether. Any other value
defaults to 200.
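The mapping described above, as a hypothetical helper (`success_create_obj_status` is not an rgw function name). A return of 0 tells the caller to skip the special handling.

```cpp
// Maps the configured rgw_s3_success_create_obj_status value to the HTTP
// status for a successful object creation. 0 means "no special handling";
// any unsupported value falls back to 200.
int success_create_obj_status(int configured) {
  switch (configured) {
    case 0:
    case 200:
    case 201:
    case 204:
      return configured;
    default:
      return 200;
  }
}
```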
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Commit d9dce4e927 broke journal replay
because the commit thread may try to do a commit, and the ops are not
being applied via the normal work queue. Add back in a simpler form of the
old op quiescing (simpler because there is a single thread doing the
replay).
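A sketch of op quiescing under stated assumptions: the single replay thread brackets each op with `op_start()`/`op_finish()`, and the commit thread calls `pause()` before committing and `resume()` afterwards. `OpQuiescer` is a hypothetical name, not the FileStore code this commit restores.

```cpp
#include <condition_variable>
#include <mutex>

class OpQuiescer {
 public:
  // Replay thread: before applying an op; waits while a commit is pending.
  void op_start() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return !paused_; });
    ++in_flight_;
  }
  // Replay thread: after the op has been applied.
  void op_finish() {
    std::lock_guard<std::mutex> l(lock_);
    --in_flight_;
    cond_.notify_all();
  }
  // Commit thread: block new ops and wait for the in-flight one to drain.
  // With a single replay thread, at most one op is ever in flight.
  void pause() {
    std::unique_lock<std::mutex> l(lock_);
    paused_ = true;
    cond_.wait(l, [this] { return in_flight_ == 0; });
  }
  // Commit thread: allow op application to continue.
  void resume() {
    std::lock_guard<std::mutex> l(lock_);
    paused_ = false;
    cond_.notify_all();
  }
  int in_flight() {
    std::lock_guard<std::mutex> l(lock_);
    return in_flight_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool paused_ = false;
  int in_flight_ = 0;
};
```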
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>