All our interfaces are in place, so now we can actually take and
drop the locks.
1) Take locks in ReplicatedPG::recover_backfill. This is the entry
into the backfill code path, and covers all objects which are
added to backfills_in_flight (via prep_backfill_object_push()). If we
can't get the lock right away, we stop the backfill movement there
until we can do so.
2) Drop the locks in ReplicatedPG::on_peer_recover(), called when the
push is completed.
2b) Further drop the locks on all backfills_in_flight objects in
_clear_recovery_state(), for when we cancel peering.
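The lock lifecycle in steps 1 to 2b can be sketched as a simplified model. `ObjectId`, `ReadLockTable`, `try_read`, `release` and `BackfillDriver` are hypothetical names, and the push itself is omitted. Only the method names `recover_backfill`, `on_peer_recover` and `clear_recovery_state` loosely echo the real ReplicatedPG functions; this is not their implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for hobject_t.
using ObjectId = std::string;

// Hypothetical lock table: backfill may read-lock an object unless it is write-locked.
struct ReadLockTable {
  std::set<ObjectId> write_locked;
  std::multiset<ObjectId> backfill_readers;

  bool try_read(const ObjectId& oid) {
    if (write_locked.count(oid)) return false;
    backfill_readers.insert(oid);
    return true;
  }
  void release(const ObjectId& oid) {
    auto it = backfill_readers.find(oid);
    if (it != backfill_readers.end()) backfill_readers.erase(it);
  }
};

struct BackfillDriver {
  ReadLockTable& locks;
  std::set<ObjectId> backfills_in_flight;

  // 1) Lock each object before it joins backfills_in_flight. Stop at the first
  //    object whose lock is unavailable; a later pass resumes from there.
  std::size_t recover_backfill(const std::vector<ObjectId>& to_push) {
    std::size_t started = 0;
    for (const auto& oid : to_push) {
      if (!locks.try_read(oid)) break;
      backfills_in_flight.insert(oid);
      ++started;  // the push for oid would be sent here
    }
    return started;
  }

  // 2) The push for oid completed: drop its lock.
  void on_peer_recover(const ObjectId& oid) {
    if (backfills_in_flight.erase(oid)) locks.release(oid);
  }

  // 2b) Peering was cancelled: drop the locks on everything still in flight.
  void clear_recovery_state() {
    for (const auto& oid : backfills_in_flight) locks.release(oid);
    backfills_in_flight.clear();
  }
};
```

With "b" write-locked, a pass over {"a", "b", "c"} starts only "a" and stops; once the write lock goes away, a later pass picks up "b" and "c".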
Signed-off-by: Greg Farnum <greg@inktank.com>
We previously inferred whether there was useful work to be done
by looking at the number of ops started, but with the upcoming
introduction of the rw_manager read locking on backfill, we could
start no ops while still having work to do. Switch around the
interfaces to specify these as separate pieces of information.
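Here is a minimal sketch of why one count is not enough. It assumes a hypothetical `RecoveryResult` struct and `start_recovery_ops()` function, not the actual ReplicatedPG interface. When every pending object is blocked on a lock, zero ops start even though work remains.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: ops started and "more work pending" are reported separately.
struct RecoveryResult {
  std::size_t ops_started = 0;
  bool work_remaining = false;
};

// Hypothetical recovery pass over `pending` objects, `blocked` of which cannot
// start right now (e.g. their locks are unavailable), limited to `max_ops`.
RecoveryResult start_recovery_ops(std::size_t pending, std::size_t blocked,
                                  std::size_t max_ops) {
  RecoveryResult r;
  std::size_t startable = pending > blocked ? pending - blocked : 0;
  r.ops_started = startable < max_ops ? startable : max_ops;
  // Inferring "done" from ops_started == 0 would be wrong when everything is blocked.
  r.work_remaining = pending > r.ops_started;
  return r;
}
```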
Signed-off-by: Greg Farnum <greg@inktank.com>
We want backfill to take read locks on the objects it's pushing. Add
a get_backfill_read(hobject_t) function, a corresponding drop_backfill_read(),
and a backfill_waiting_on_read member in ObjState. Check that member when
getting a write lock, and in put_write(). Tell callers to requeue the recovery
if necessary, and clean up the backfill block when its read lock is dropped.
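The sketch below shows one way this bookkeeping could fit together. The `ObjState` name and the `backfill_waiting_on_read` member come from the description above. Everything else is an assumption for illustration: the counters, `backfill_read_held`, and the exact rules in `get_write()` and `put_write()`. It is not Ceph's actual RWTracker logic.

```cpp
#include <cassert>

// Hypothetical, simplified per-object lock state.
struct ObjState {
  int writers = 0;
  bool backfill_read_held = false;
  bool backfill_waiting_on_read = false;

  // Backfill asks for a read lock. If writers hold the object, remember that
  // backfill is waiting so new writers are refused and the last writer requeues.
  bool get_backfill_read() {
    if (writers > 0) {
      backfill_waiting_on_read = true;
      return false;
    }
    backfill_read_held = true;
    backfill_waiting_on_read = false;
    return true;
  }

  // Clean up the backfill block when its read lock is dropped.
  void drop_backfill_read() { backfill_read_held = false; }

  // Writers are refused while backfill holds, or is waiting on, the read lock.
  bool get_write() {
    if (backfill_read_held || backfill_waiting_on_read) return false;
    ++writers;
    return true;
  }

  // Returns true if the caller should requeue recovery: the last writer just
  // left while backfill was waiting for it.
  bool put_write() {
    --writers;
    return writers == 0 && backfill_waiting_on_read;
  }
};
```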
Signed-off-by: Greg Farnum <greg@inktank.com>
http://tracker.ceph.com/issues/5374
Fixes: #5374
This adds option parsing for a user, password and tenant, so that a
token can be requested with them.
That token is then used to authenticate against Keystone, instead
of relying on the admin token.
The admin token can still be used to authenticate, so this doesn't
change the existing behaviour.
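The two authentication paths can be sketched assuming the Keystone v2.0 identity API, where a token is requested by POSTing a password-credentials body to `/v2.0/tokens`. The commit does not say which Keystone API version is used. `KeystoneConfig`, `json_escape`, `token_request_body` and `use_user_credentials` are hypothetical names, and the HTTP request itself is omitted.

```cpp
#include <cassert>
#include <string>

// Minimal JSON string escaping for this sketch: quotes, backslashes and newlines only.
std::string json_escape(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      default: out += c;
    }
  }
  return out;
}

// Hypothetical configuration holding both ways to authenticate.
struct KeystoneConfig {
  std::string admin_token;
  std::string user, password, tenant;
};

// Request body for a Keystone v2.0 password-credentials token request (assumed API version).
std::string token_request_body(const KeystoneConfig& c) {
  return "{\"auth\":{\"tenantName\":\"" + json_escape(c.tenant) +
         "\",\"passwordCredentials\":{\"username\":\"" + json_escape(c.user) +
         "\",\"password\":\"" + json_escape(c.password) + "\"}}}";
}

// Use the user credentials when all three are set; otherwise keep using the admin token.
bool use_user_credentials(const KeystoneConfig& c) {
  return !c.user.empty() && !c.password.empty() && !c.tenant.empty();
}
```

Leaving any of the three options unset keeps the admin-token path, which matches the "existing behaviour is unchanged" goal.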
Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
If duplicate pool rename requests race, make sure the second one
returns success if the rename entry already exists in the
pending_inc map.
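Here is a minimal sketch of the duplicate check. It uses a hypothetical `PoolRenameState` in which `pending_renames` stands in for the rename entries of the monitor's pending incremental. The map layout and the order of the error checks are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

struct PoolRenameState {
  std::map<std::string, int> pools;            // committed pool name -> pool id
  std::map<int, std::string> pending_renames;  // stand-in for renames in pending_inc

  int rename(const std::string& src, const std::string& dst) {
    auto it = pools.find(src);
    if (it == pools.end()) return -ENOENT;
    auto p = pending_renames.find(it->second);
    if (p != pending_renames.end() && p->second == dst)
      return 0;  // a racing duplicate of a rename that is already queued
    if (pools.count(dst)) return -EEXIST;
    pending_renames[it->second] = dst;
    return 0;
  }
};
```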
Signed-off-by: Sage Weil <sage@inktank.com>
rebuild_page_aligned relies on rebuild to create memory that is aligned
according to list::is_page_aligned(). However, when the bufferlist only
contains a single ptr whose size is not list::is_n_page_size(),
rebuild does not create the expected aligned bufferlist.
The allocation of the ptr is moved out of rebuild, which is now given the
ptr as an argument. For consistency, the rebuild_page_aligned function always
requires an aligned ptr, allocated with buffer::create_page_aligned(_len).
The test
    bufferlist bl;
    bufferptr ptr(buffer::create_page_aligned(2));
    ptr.set_offset(1);
    ptr.set_length(1);
    bl.append(ptr);
    EXPECT_FALSE(bl.is_page_aligned());
    bl.rebuild_page_aligned();
    EXPECT_FALSE(bl.is_page_aligned());
demonstrated the problem. Its final assertion was assumed to describe a
feature but should have been identified as a bug. The last line is
replaced with
    EXPECT_TRUE(bl.is_page_aligned());
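The shape of the fix can be modelled with a hypothetical `MiniList` of raw segments, which is not Ceph's bufferlist. Here `rebuild()` takes a caller-supplied buffer instead of allocating, and `rebuild_page_aligned()` always hands it a page-aligned one, so even a single unaligned segment gets copied. `kPageSize` stands in for CEPH_PAGE_SIZE, and `alloc_page_aligned()` is an illustrative helper built on `std::aligned_alloc` (C++17).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumption: stands in for CEPH_PAGE_SIZE

struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};
using OwnedBuf = std::unique_ptr<char, FreeDeleter>;

// Rounds len up to whole pages, because std::aligned_alloc requires the size
// to be a multiple of the alignment.
OwnedBuf alloc_page_aligned(std::size_t len) {
  std::size_t rounded = (len + kPageSize - 1) / kPageSize * kPageSize;
  if (rounded == 0) rounded = kPageSize;
  return OwnedBuf(static_cast<char*>(std::aligned_alloc(kPageSize, rounded)));
}

struct Segment {
  const char* data;
  std::size_t len;
};

struct MiniList {
  std::vector<Segment> segs;
  std::vector<OwnedBuf> owned;  // buffers allocated on behalf of rebuild()

  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& s : segs) n += s.len;
    return n;
  }

  // Page aligned only if every segment starts on a page boundary.
  bool is_page_aligned() const {
    for (const auto& s : segs)
      if (reinterpret_cast<std::uintptr_t>(s.data) % kPageSize != 0) return false;
    return true;
  }

  // rebuild() no longer allocates: it copies every segment into the buffer it is given.
  void rebuild(OwnedBuf dst) {
    std::size_t len = length();
    std::size_t off = 0;
    for (const auto& s : segs) {
      std::memcpy(dst.get() + off, s.data, s.len);
      off += s.len;
    }
    segs.assign(1, Segment{dst.get(), len});
    owned.push_back(std::move(dst));
  }

  // Always supply an aligned buffer, even if the list already holds a single segment.
  void rebuild_page_aligned() { rebuild(alloc_page_aligned(length())); }
};
```

A one-byte segment at offset 1 of a page-aligned allocation, as in the test above, is reported unaligned until `rebuild_page_aligned()` copies it into fresh aligned memory.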
Most tests related to is_page_aligned() wrongly assumed that
    bufferptr ptr(2);
is never page aligned. Most of the time it is not, but sometimes it is,
when the pointer address happens to fall on a CEPH_PAGE_SIZE boundary,
which triggered #6614. Non-aligned ptrs are now created as follows instead:
    bufferptr ptr(buffer::create_page_aligned(2));
    ptr.set_offset(1);
    ptr.set_length(1);
http://tracker.ceph.com/issues/6614
Fixes: #6614
Signed-off-by: Loic Dachary <loic@dachary.org>
'ceph osd pool rename' takes two arguments: source pool and dest pool.
If 'source pool' does not exist and 'dest pool' does, then, to keep the
command idempotent, we assume that 'source pool' no longer exists because
it was already renamed.
However, while we return success in such a case, we want to let the user
know that we made this assumption. This is mostly to warn the user in case
of a mistake on their part (say, the user didn't notice that the source
pool didn't exist while the dest did), but also to make sure the user is
not surprised by the command returning success when they expected an
ENOENT or EEXIST.
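The resulting behaviour can be sketched as a hypothetical `pool_rename()` that returns an errno-style code and fills in a message for the user. The message wording and the plain `std::set` of pool names are assumptions for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <string>

int pool_rename(std::set<std::string>& pools, const std::string& src,
                const std::string& dst, std::string& msg) {
  bool have_src = pools.count(src) > 0;
  bool have_dst = pools.count(dst) > 0;
  if (!have_src && have_dst) {
    // Idempotent case: report success, but state the assumption explicitly.
    msg = "pool '" + src + "' does not exist; pool '" + dst +
          "' does, so assuming it was already renamed";
    return 0;
  }
  if (!have_src) {
    msg = "unrecognized pool '" + src + "'";
    return -ENOENT;
  }
  if (have_dst) {
    msg = "pool '" + dst + "' already exists";
    return -EEXIST;
  }
  pools.erase(src);
  pools.insert(dst);
  msg = "pool '" + src + "' renamed to '" + dst + "'";
  return 0;
}
```

Repeating a successful rename returns 0 with the explanatory message instead of ENOENT. A genuine collision, where both pools exist, still fails with EEXIST.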
Fixes: #6635
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
On a fresh centos-6.4 install, after running yum-builddep ceph and following
the http://ceph.com/docs/next/install/building-ceph/ instructions to run:
    cd ceph
    ./autogen.sh
    ./configure
    make
it fails because make is not installed. This is probably not a problem for
most people, because few developers do not already have make installed.
Signed-off-by: Loic Dachary <loic@dachary.org>
With this branch we make copy-get significantly easier to extend by applying our standard encode/decode stuff to it, instead of doing an inline encode-onto-the-payload. We also add some infrastructure for dealing with completion of RepGathers.
Reviewed-by: Sage Weil <sage@inktank.com>
We don't bump the encoding version -- and stick it in the middle --
since it's still brand-new. For simplicity, we encode it unconditionally
rather than trying to embed it alongside the attrs or with its own
"complete" flag in the cursor.
Signed-off-by: Greg Farnum <greg@inktank.com>
This one is encoded with version information. We are not doing anything
to control which op gets sent by the client, but after discussion with
Sam we think this op isn't accessible enough to clients (right now it's
only triggered by a client sending copy-from, which can only happen via
ceph-test-rados) to require compatibility versioning.
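Here is a generic sketch of versioned encoding. It assumes a hypothetical envelope of `[struct_v][compat_v][u32 length][payload]` and a hypothetical `CopyGetReply` with two fields. It is loosely modelled on the idea behind Ceph's versioned encode/decode but is not its actual wire format. The length prefix lets an older decoder skip fields that a newer encoder appended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kMyVersion = 1;  // version this code writes and understands
constexpr std::uint8_t kMyCompat = 1;   // oldest decoder version that can read it

void put_u32(Bytes& b, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xff));
}

struct Reader {
  const Bytes& b;
  std::size_t off = 0;
  void need(std::size_t n) {
    if (off + n > b.size()) throw std::runtime_error("short buffer");
  }
  std::uint8_t u8() { need(1); return b[off++]; }
  std::uint32_t u32() {
    need(4);
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(b[off++]) << (8 * i);
    return v;
  }
};

struct CopyGetReply {  // hypothetical fields
  std::uint32_t size = 0;
  std::string data;
};

// newer_fields simulates data a future version would append after the known fields.
Bytes encode(const CopyGetReply& r, std::uint8_t struct_v = kMyVersion,
             const Bytes& newer_fields = {}) {
  Bytes payload;
  put_u32(payload, r.size);
  put_u32(payload, static_cast<std::uint32_t>(r.data.size()));
  payload.insert(payload.end(), r.data.begin(), r.data.end());
  payload.insert(payload.end(), newer_fields.begin(), newer_fields.end());
  Bytes out{struct_v, kMyCompat};
  put_u32(out, static_cast<std::uint32_t>(payload.size()));
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

CopyGetReply decode(const Bytes& b) {
  Reader rd{b};
  std::uint8_t struct_v = rd.u8();
  std::uint8_t compat = rd.u8();
  if (compat > kMyVersion) throw std::runtime_error("encoding too new");
  std::uint32_t len = rd.u32();
  rd.need(len);
  std::size_t end = rd.off + len;
  CopyGetReply r;
  r.size = rd.u32();
  std::uint32_t n = rd.u32();
  rd.need(n);
  r.data.assign(reinterpret_cast<const char*>(b.data()) + rd.off, n);
  (void)struct_v;  // a real decoder would branch on struct_v to read optional fields
  rd.off = end;    // skip any fields appended by newer versions
  return r;
}
```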
Signed-off-by: Greg Farnum <greg@inktank.com>
In order to introduce versioning of copy-get, we need to make it a
different op that has the versioning infrastructure from the start.
Signed-off-by: Greg Farnum <greg@inktank.com>
It was getting long, isn't terribly dependent on access to do_osd_ops()
state, and will be easier to make generic as its own function.
Signed-off-by: Greg Farnum <greg@inktank.com>
Right now this is very primitive, but we're about to extend it to
handle request versioning appropriately and to add some
extra fields.
Sadly, we do a little extra copying in the Objecter as a result, but
too bad: being able to do updates will be worth it.
Signed-off-by: Greg Farnum <greg@inktank.com>
Make a few changes to ensure we trigger it when appropriate. We'll use
this shortly for object promotion, and perhaps for other things in the future.
Signed-off-by: Greg Farnum <greg@inktank.com>
This version is a user version, and since we're in the OSD we
should call it such. (In particular, we may want to keep track
of the internal version too when doing cache promotes.)
Signed-off-by: Greg Farnum <greg@inktank.com>
There's no failure it can actually run into, and handling error
codes in some of its callers is going to be a pain.
While we're here, document the parameters.
Signed-off-by: Greg Farnum <greg@inktank.com>
Fixes: #6574
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
64 characters isn't all that long. 4096 ought to be enough for anyone.
Fixes: #6072
Backport: dumpling, cuttlefish
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Add a chkconfig line for RHEL-based distros to make chkconfig start rbdmap earlier on boot and stop it later on shutdown. This will help prevent shutdown/reboot from hanging your system forever in the event that some daemon has a file held open on an rbd-mounted filesystem.
Signed-off-by: Adam Twardowski <adam.twardowski@gmail.com>
(cherry picked from commit 80384a1a24)
0x21 '!' is the first character that doesn't need encoding, so we can
expand the lower bound check.
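Here is a sketch of the check, assuming percent-encoding with uppercase hex digits. Every byte at or below 0x20 or at or above 0x7f is encoded, so 0x21 '!' is the first byte that can pass through unchanged. The reserved characters in the `switch` are an illustrative assumption, not necessarily the exact set rgw uses.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

static bool char_needs_url_encoding(unsigned char c) {
  // Lower bound: controls and space (0x00..0x20); upper bound: DEL and non-ASCII.
  if (c <= 0x20 || c >= 0x7f) return true;
  switch (c) {  // assumed reserved/unsafe set
    case '"': case '#': case '%': case '&': case '+': case ',': case '/':
    case ':': case ';': case '<': case '=': case '>': case '?': case '@':
    case '[': case '\\': case ']': case '^': case '`': case '{': case '}':
      return true;
  }
  return false;
}

std::string url_encode(const std::string& src) {
  std::string dst;
  for (unsigned char c : src) {
    if (char_needs_url_encoding(c)) {
      char buf[4];
      std::snprintf(buf, sizeof(buf), "%%%02X", c);  // e.g. '\n' -> "%0A"
      dst += buf;
    } else {
      dst += static_cast<char>(c);
    }
  }
  return dst;
}
```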
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Add a chkconfig line for RHEL-based distros to make chkconfig start rbdmap earlier on boot and stop it later on shutdown. This will help prevent shutdown/reboot from hanging your system forever in the event that some daemon has a file held open on an rbd-mounted filesystem.
Signed-off-by: Adam Twardowski <adam.twardowski@gmail.com>
This fixes copy operations for objects whose names contain unsafe
characters, like a newline. Such copies would otherwise return a 403,
since the GET to the source rgw would be unable to verify the signature
on a partially valid bucket name.
Fixes: #6604
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This is useful outside of the s3 interface. Rename url_escape() to
url_encode() for consistency with the existing common url_decode()
function. This is in preparation for the next commit, which needs
to escape url-unsafe characters in another place.
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Send the last marker, whether or not the log is truncated, in the same
format as the data log list, so clients don't need extra complexity to
handle the difference. Keep bucket index logs the same, since they
already contain the marker and are not used in exactly the same way
as metadata and data logs.
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Consumers of this api need to know their position in the log. It's
readily available when fetching the log, so return it. Without the
marker in this call, a client could not easily or efficiently figure
out its position in the log, since it would require getting the global
last marker in the log, and then reading all the log entries.
This would be slow for large logs, and would be subject to races that
would cause potentially very expensive duplicate work.
Returning this atomically while fetching the log entries simplifies
all of this.
Fixes: #6615
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
There's no reason to restrict returning the marker to the case where
less than the whole log is returned, since there's already a truncated
flag to tell the client what happened.
Giving the client the last marker makes it easy to consume when the
log entries do not contain their own marker. If the last marker is not
returned, the client cannot get the last marker without racing with
updates to the log.
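Here is a minimal sketch of why always returning the last marker helps. It uses a hypothetical in-memory `Log` whose entries do not carry their own markers; markers here are just entry indices stored as strings. The client resumes from the returned marker and never has to ask separately, and racily, for the log's current end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LogListing {
  std::vector<std::string> entries;
  bool truncated = false;
  std::string marker;  // marker of the last returned entry
};

struct Log {
  std::vector<std::string> entries;  // entries carry no marker of their own

  // List up to max_entries entries after after_marker ("" means from the start).
  LogListing list(const std::string& after_marker, std::size_t max_entries) const {
    std::size_t i = after_marker.empty() ? 0 : std::stoul(after_marker) + 1;
    LogListing out;
    for (; i < entries.size() && out.entries.size() < max_entries; ++i)
      out.entries.push_back(entries[i]);
    out.truncated = i < entries.size();
    // Return the last marker whether or not the listing was truncated.
    out.marker = out.entries.empty() ? after_marker : std::to_string(i - 1);
    return out;
  }
};

// Client side: read everything currently in the log and return the position reached.
std::string consume_all(const Log& log, std::string marker,
                        std::vector<std::string>& seen, std::size_t page) {
  if (page == 0) page = 1;  // avoid looping forever on empty pages
  for (;;) {
    LogListing l = log.list(marker, page);
    seen.insert(seen.end(), l.entries.begin(), l.entries.end());
    marker = l.marker;
    if (!l.truncated) return marker;
  }
}
```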
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>