Previously, using the state on active worked, but now we might
go back through WaitRemoteRecoveryReserved without resetting
Active.
Signed-off-by: Samuel Just <sam.just@inktank.com>
We don't want to change missing sets during a chunky
scrub, since doing so would cause !is_clean() and derail
the rest of the scrub. Instead, move the missing,
inconsistent, and authoritative sets into the scrubber
and add to them during scrub_compare_maps(). Then,
handle repairing all the objects at once in scrub_finish().
Signed-off-by: Samuel Just <sam.just@inktank.com>
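The accumulate-then-repair approach can be sketched as follows. This is a minimal illustration, not Ceph's scrubber: objects are modelled as strings and replicas as int ids, and `compare_chunk()` and `finish()` are hypothetical stand-ins for scrub_compare_maps() and scrub_finish(). Each chunk only adds to the scrubber's own sets, and nothing is acted on until the end.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins: objects are named by strings, replicas by int ids.
using ObjectId = std::string;
using ShardId = int;

struct Scrubber {
  // Accumulated across chunks; the PG's real missing sets are untouched
  // until the whole scrub finishes.
  std::map<ShardId, std::set<ObjectId>> missing;
  std::map<ShardId, std::set<ObjectId>> inconsistent;
  std::map<ObjectId, ShardId> authoritative;  // object -> shard with good copy

  // Called once per chunk (analogous to scrub_compare_maps()).
  void compare_chunk(const std::map<ShardId, std::set<ObjectId>>& chunk_missing,
                     const std::map<ShardId, std::set<ObjectId>>& chunk_bad,
                     const std::map<ObjectId, ShardId>& chunk_auth) {
    for (const auto& [shard, objs] : chunk_missing)
      missing[shard].insert(objs.begin(), objs.end());
    for (const auto& [shard, objs] : chunk_bad)
      inconsistent[shard].insert(objs.begin(), objs.end());
    authoritative.insert(chunk_auth.begin(), chunk_auth.end());
  }

  // Called once at the end (analogous to scrub_finish()): returns how many
  // (shard, object) repairs would be scheduled, then resets the state.
  std::size_t finish() {
    std::size_t repairs = 0;
    for (const auto& entry : missing) repairs += entry.second.size();
    for (const auto& entry : inconsistent) repairs += entry.second.size();
    missing.clear();
    inconsistent.clear();
    authoritative.clear();
    return repairs;
  }
};
```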
Add tests for:
- sparse import makes expected sparse images
- sparse export makes expected sparse files
- sparse import from stdin also creates sparse images
- import from partially-sparse file leads to partially-sparse image
- import from stdin with zeros leads to sparse image
- export from zeros-image to file leads to sparse file
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Try to accumulate image-sized blocks when importing from stdin, even if
each read is shorter than requested; if we get a full block and it's
all zeroes, we can seek and make a sparse output file.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
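The block accumulation described in this commit can be sketched as follows. This is a minimal illustration, not rbd's actual import code: the hypothetical `Reader` callback stands in for read(2) on stdin (including short reads), and the output image is modelled as a map from offset to written data, where a skipped all-zero block corresponds to seeking past it and leaving a hole.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// A reader fills up to `len` bytes of `buf` and returns how many it wrote;
// 0 means end of input. Like read(2) on a pipe, it may return short reads.
using Reader = std::function<std::size_t(char* buf, std::size_t len)>;

// Keep reading until `block` is full or the input ends.
std::size_t fill_block(const Reader& rd, std::string& block) {
  std::size_t got = 0;
  while (got < block.size()) {
    std::size_t n = rd(&block[got], block.size() - got);
    if (n == 0) break;  // EOF
    got += n;
  }
  return got;
}

// Copy input into `out` (offset -> data). Full all-zero blocks are skipped,
// i.e. seeked over, leaving a hole; everything else is written.
std::size_t sparse_import(const Reader& rd, std::size_t block_size,
                          std::map<std::size_t, std::string>& out) {
  std::string block(block_size, '\0');
  std::size_t offset = 0;
  for (;;) {
    std::size_t got = fill_block(rd, block);
    if (got == 0) break;
    bool full = (got == block_size);
    bool zero = std::all_of(block.begin(), block.begin() + got,
                            [](char c) { return c == '\0'; });
    if (!(full && zero))
      out[offset] = block.substr(0, got);
    offset += got;
    if (!full) break;  // a short final block means EOF
  }
  return offset;  // total bytes consumed, i.e. the image size
}
```

Only a *full* zero block becomes a hole; a short trailing block is always written, which keeps the sketch simple at the cost of occasionally writing a few zero bytes at the end.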
We can get a pattern like so:
- new mon session
- after say 120 seconds, we decide to send a stats msg
- outstanding_pg_stats is finally true, we immediately time out (30-second
grace), and reconnect to a new mon
-> repeat
The problem is that we don't reset the last_sent timestamp when we send,
and that we do this check after sending instead of before. Fix both.
This should resolve issue #3661, where OSDs whose PGs are not updating
do not send stats messages to the mon to check in, and eventually
get marked down as a result.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
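Both fixes can be illustrated with a small timer model. This is a hypothetical sketch, not the OSD's code: times are plain seconds, and `StatsSender`, `tick()`, and `ack()` are invented names. The timeout check runs before a send, and a send stamps `last_sent`, so a message sent at t=120 is never judged against a session-start timestamp of t=0.

```cpp
#include <cassert>

// Hypothetical model of the per-OSD stats timer; times are plain seconds.
struct StatsSender {
  double last_sent = 0;     // when we last sent a stats message
  bool outstanding = false; // a stats message awaits acknowledgement
  double grace = 30;        // ack timeout
  double interval = 120;    // how often to send unprompted

  // Returns true if the mon session should be considered dead.
  bool tick(double now) {
    // Check for a timeout first, before possibly sending again.
    if (outstanding && now - last_sent > grace)
      return true;  // genuinely timed out: reconnect
    if (!outstanding && now - last_sent >= interval) {
      outstanding = true;
      last_sent = now;  // stamp the send time
    }
    return false;
  }

  void ack() { outstanding = false; }
};
```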
This avoids the situation where a librados or other user with the default
of 'cephx,none' and no keyring authenticates against a cluster that
requires 'none', and an annoying warning is generated every time. Now
we only print a helpful message if we actually failed.
Signed-off-by: Sage Weil <sage@inktank.com>
This means we can drop the scrub repair state_clear() call. We probably
can drop others, but let's leave that for another day.
Signed-off-by: Sage Weil <sage@inktank.com>
If both cephx and none are accepted auth methods, and
the cephx keyring cannot be found, fall back to
none instead of failing.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
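The fallback rule can be expressed as a small selection function. This is a hypothetical sketch of the described behavior, not Ceph's auth code: `pick_auth_method()` and its arguments are invented, and an empty result stands for "no usable method", which is where a caller would print the helpful error message.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Prefer cephx, but fall back to none when cephx has no keyring and none
// is also acceptable. Returns "" if nothing is usable.
std::string pick_auth_method(const std::vector<std::string>& allowed,
                             bool have_keyring) {
  auto allows = [&](const std::string& m) {
    return std::find(allowed.begin(), allowed.end(), m) != allowed.end();
  };
  if (allows("cephx") && have_keyring) return "cephx";
  if (allows("none")) return "none";  // fall back instead of failing
  return "";                          // caller reports a helpful error
}
```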
If we do a scrub repair, we need to go from clean to recovery again to
copy objects around.
This fixes a simple repair of a missing object, either on the primary or
replica.
Signed-off-by: Sage Weil <sage@inktank.com>
We set SCRUBBING when we queue a pg for scrub. If we dequeue and
call scrub() but abort for some reason (!active, degraded, etc.), clear
that state bit.
Bug is easily reproduced with 'ceph osd scrub N' during cluster startup
when PGs are peering; some PGs can get left in the scrubbing state.
Signed-off-by: Sage Weil <sage@inktank.com>
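The shape of the fix is that every early return from scrub() clears the bit that queueing set. The sketch below is hypothetical: the state constants and the `PG` struct are simplified stand-ins modelled on the description, not Ceph's definitions.

```cpp
#include <cassert>

// Hypothetical PG state bits; only the ones needed for the sketch.
enum : unsigned {
  PG_STATE_ACTIVE    = 1u << 0,
  PG_STATE_DEGRADED  = 1u << 1,
  PG_STATE_SCRUBBING = 1u << 2,
};

struct PG {
  unsigned state = 0;
  bool state_test(unsigned b) const { return (state & b) != 0; }
  void state_set(unsigned b) { state |= b; }
  void state_clear(unsigned b) { state &= ~b; }

  // Queueing a scrub marks the PG as scrubbing.
  void queue_scrub() { state_set(PG_STATE_SCRUBBING); }

  // Returns true if the scrub actually ran.
  bool scrub() {
    if (!state_test(PG_STATE_ACTIVE) || state_test(PG_STATE_DEGRADED)) {
      state_clear(PG_STATE_SCRUBBING);  // don't leave the bit set on abort
      return false;
    }
    // ... scrub work would go here ...
    state_clear(PG_STATE_SCRUBBING);
    return true;
  }
};
```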
Add 'ceph osd ls' to the help; make the help for 'ceph osd tell N bench' look
more like injectargs, which says <osd-id or *>, to make it clear you
can benchmark all OSDs simultaneously.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Just call the common daemonize function. Otherwise we end up
not initializing stdout/stderr correctly.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
We need to signal the cond in the same interval where we hold the lock
*and* modify the queue. Otherwise, we can have a race like:
queue has 1 item, max is 1.
A: enter submit_entry, signal cond, wait on condition
B: enter submit_entry, signal cond, wait on condition
C: flush wakes up, flushes 1 previous item
A: retakes lock, enqueues something, exits
B: retakes lock, condition fails, waits
-> C is never woken up as there are 2 items waiting
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
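The rule "signal in the same critical section that modifies the queue" can be illustrated with the standard library. This is a hypothetical sketch using std::mutex and std::condition_variable rather than Ceph's own lock and condition classes; `SubmitQueue`, `submit_entry()`, and `flush_once()` are invented names. Because each waiter re-checks a predicate under the lock and every modification is followed by a notify while the lock is held, no thread can miss the change it is waiting for.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical bounded submit queue drained by one flusher thread.
class SubmitQueue {
 public:
  explicit SubmitQueue(std::size_t max) : max_(max) {}

  void submit_entry(int e) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return q_.size() < max_; });
    q_.push_back(e);
    cond_.notify_all();  // same critical section as the modification
  }

  // Flusher: waits for work, then takes everything currently queued.
  std::size_t flush_once() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return !q_.empty(); });
    std::size_t n = q_.size();
    q_.clear();
    cond_.notify_all();  // wake submitters blocked on a full queue
    return n;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<int> q_;
  std::size_t max_;
};
```

With max set to 1 and two submitters, as in the race above, the flusher is always woken once an item lands in the queue.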
handle_notify_timeout and remove_notify currently do not clean up this
state, leaving dangling Notification* pointers. Further, we only use this mapping
in unwatch in order to determine which notifies to update. We can
accomplish the same thing by iterating through the obc->notifs mapping
since all notifications relevant for a given watch would have been for
the same obc as the watch.
Signed-off-by: Samuel Just <sam.just@inktank.com>
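The idea of finding affected notifies by scanning the object context, instead of keeping a second watch-to-notify index, can be sketched as below. The types here are hypothetical simplifications (a notify tracks the set of watcher names it still waits on) and do not reflect Ceph's real Notification or ObjectContext layout.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical model: an object context owns its in-flight notifications;
// each notification tracks the watchers it still waits on.
struct Notification {
  std::set<std::string> pending_watchers;
};

struct ObjectContext {
  std::map<int, std::unique_ptr<Notification>> notifs;  // notify id -> state
  std::set<std::string> watchers;
};

// Any notify involving this watcher is on the same object context, so a scan
// of obc.notifs finds them all; there is no second map to leave dangling.
// Returns how many notifications were updated.
int unwatch(ObjectContext& obc, const std::string& watcher) {
  obc.watchers.erase(watcher);
  int updated = 0;
  for (auto& entry : obc.notifs)
    updated += static_cast<int>(entry.second->pending_watchers.erase(watcher));
  return updated;
}
```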
Fixes: #3638
rgw exit timeout secs: number of seconds to wait for the process
to exit cleanly before forcing exit. If set to 0, it will wait
indefinitely.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
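The option's semantics (a bounded wait, with 0 meaning no bound) map naturally onto a condition variable. This is a hypothetical sketch, not radosgw's shutdown code: `wait_for_clean_exit()` and the `done` flag are invented, and the caller would force the exit when it returns false.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Wait up to `timeout_secs` for `done` to become true; 0 waits indefinitely.
// Returns true on a clean exit, false if the caller should force exit.
bool wait_for_clean_exit(std::mutex& m, std::condition_variable& cv,
                         const bool& done, int timeout_secs) {
  std::unique_lock<std::mutex> l(m);
  auto pred = [&] { return done; };
  if (timeout_secs == 0) {
    cv.wait(l, pred);  // no deadline
    return true;
  }
  return cv.wait_for(l, std::chrono::seconds(timeout_secs), pred);
}
```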
Fixes: #3648
We cannot assign a NULL pointer to an STL string. This is only
relevant to Swift, when uploading an object without specifying a
content type, and when the suffix cannot be determined.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
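Constructing or assigning a std::string from a null `const char*` is undefined behavior, so a lookup that may return NULL has to be checked first. The sketch below is illustrative only: `mime_for_suffix()` is a hypothetical lookup, and falling back to an empty string is an assumption, not necessarily what rgw does.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical suffix lookup that returns nullptr for unknown suffixes.
const char* mime_for_suffix(const char* suffix) {
  if (suffix && std::strcmp(suffix, "txt") == 0) return "text/plain";
  return nullptr;  // unknown or missing suffix: no content type
}

std::string content_type_for(const char* suffix) {
  const char* ct = mime_for_suffix(suffix);
  std::string s;
  if (ct)
    s = ct;  // only assign when non-null
  return s;
}
```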
Fixes: #3653
No need to initialize keystone, including the keystone
revocation thread, which was verbose if keystone was
not set up. This removes some unhelpful errors from the
log.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Fixes: #3649
No need to have an extra config option to enable keystone; use keystone
whenever the keystone url has been specified. Also, fix bad error
handling that turned a failure to authenticate into successfully
authenticating a bad user.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
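Both changes can be sketched in a few lines. This is a hypothetical illustration, not rgw's code: `Config`, `use_keystone()`, and `authenticate()` are invented names, and the errno-style negative return values are an assumption. Keystone is enabled purely by the presence of a URL, and every failure path returns a negative code so it can never be mistaken for success (0).

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical config; only the URL field matters here.
struct Config {
  std::string keystone_url;
};

// Keystone is enabled exactly when a URL is configured; no separate flag.
bool use_keystone(const Config& c) { return !c.keystone_url.empty(); }

// Returns 0 on success or a negative errno-style code on failure.
int authenticate(const Config& c, bool token_valid) {
  if (!use_keystone(c))
    return -EPERM;   // keystone not configured: cannot validate the token
  if (!token_valid)
    return -EACCES;  // reject the user; never fall through to 0
  return 0;
}
```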
This requires us to copy bufferlists in two cases, since bufferlist
does not have a const iterator at this time.
Signed-off-by: Samuel Just <sam.just@inktank.com>
If we return an error, send the message to stderr. This makes things
more easily scriptable because error messages won't take the place of
expected output.
Signed-off-by: Sage Weil <sage@inktank.com>
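The convention of keeping results on stdout and errors on stderr can be sketched as below. `run_command()` and its message text are hypothetical; taking the streams as parameters (std::cout and std::cerr in real use) makes the split easy to check, and a script capturing stdout only ever sees real output.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Write the result to `out` on success and the message to `err` on failure.
// Returns the process exit status.
int run_command(int result, const std::string& output,
                std::ostream& out, std::ostream& err) {
  if (result < 0) {
    err << "error: " << output << "\n";
    return 1;
  }
  out << output << "\n";
  return 0;
}
```

In a real tool this would be called as `run_command(r, msg, std::cout, std::cerr)`.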