During recovery we submit transactions like:
- delete a/foo
- move tmp/foo to a/foo
This prevents the EEXIST check in collection_move from doing any good,
since the destination never exists. We need to do that remove at least
sometimes, because we may be overwriting an existing/older version of the
object.
So,
- set the guard after we do the move, so that
- the delete won't be repeated, and
- the EEXIST check will work
Also check the guard for good measure (although that doesn't do anything
specifically useful in this scenario).
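To illustrate the ordering problem, here is a minimal toy model in Python. It is not the FileStore code: the ToyStore class, its per-object sequence-number guard, and the return strings are all hypothetical, chosen only to show why setting the guard after the move makes a replay of "delete, then move" harmless.

```python
class ToyStore:
    """Toy in-memory store; the guard layout here is an assumption."""

    def __init__(self, use_guard=True):
        self.use_guard = use_guard
        self.objects = {}  # path -> data
        self.guards = {}   # path -> highest op sequence already applied

    def _replayed(self, path, seq):
        # An op is a replay if the object already carries a guard >= seq.
        return self.use_guard and self.guards.get(path, -1) >= seq

    def remove(self, path, seq):
        if self._replayed(path, seq):
            return "skipped"
        self.objects.pop(path, None)
        return "done"

    def move(self, src, dst, seq):
        # Check the guard, then fall back to the EEXIST-style check.
        if self._replayed(dst, seq) or dst in self.objects:
            return "skipped"
        if src not in self.objects:
            return "enoent"
        self.objects[dst] = self.objects.pop(src)
        if self.use_guard:
            self.guards[dst] = seq  # set the guard after the move
        return "done"


def recover(store, seq):
    # The recovery transaction from the text: delete a/foo, move tmp/foo.
    return [store.remove("a/foo", seq), store.move("tmp/foo", "a/foo", seq + 1)]


guarded = ToyStore(use_guard=True)
guarded.objects = {"a/foo": "old", "tmp/foo": "new"}
first = recover(guarded, 10)
replay = recover(guarded, 10)

unguarded = ToyStore(use_guard=False)
unguarded.objects = {"a/foo": "old", "tmp/foo": "new"}
recover(unguarded, 10)
bad_replay = recover(unguarded, 10)
```

In this model the guarded replay skips both ops and a/foo keeps the new data, while the unguarded replay deletes a/foo and then finds no source left to move.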
Fixes: #2164
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
leveldb adds -I flags to CFLAGS and CXXFLAGS, but if these macros are
overridden in the make command line, the flags are dropped, and the
build fails. leveldb should probably use AM_CFLAGS instead, but the
spec file can specify the preferred CFLAGS in the configure command
line, and then everything will work as expected.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the first write that creates an object includes a truncate_seq and
truncate_size, we were taking the truncate path and doing a truncate op
in our transaction prior to the write, and then setting the object_info
size appropriately. However, if the object doesn't exist, the truncate
op fails even though the oi.size gets set.
Later, this turns up as a scrub error (see #2080).
Fix this by skipping the truncate if it is a new object. Instead, we
should just initialize our truncate_{seq,size} metadata so that we're all
up to date for any later writes.
Alternatively, we could touch the object and then truncate it (up) to the
large size, but this is sort of a waste; data beyond a short object eof is
defined to be zeros, so all we would accomplish is making recovery work
harder by copying zeros around.
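A rough Python sketch of the intended behavior follows. The ObjectInfo fields mirror the names used above, but do_write, the dict-based store, and the simplified truncate semantics are assumptions made for illustration, not the OSD implementation.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    size: int = 0
    truncate_seq: int = 0
    truncate_size: int = 0


def do_write(objects, infos, name, offset, data, truncate_seq, truncate_size):
    """Apply one write in a toy store and return the ops it queued."""
    ops = []
    is_new = name not in objects
    oi = infos.setdefault(name, ObjectInfo())
    if truncate_seq > oi.truncate_seq:
        if not is_new:
            # Existing object: really truncate (shrink or zero-extend).
            buf = objects[name]
            pad = b"\0" * max(0, truncate_size - len(buf))
            objects[name] = buf[:truncate_size] + pad
            oi.size = truncate_size
            ops.append("truncate")
        # New or not, record the metadata so later writes compare
        # against an up-to-date truncate_seq/truncate_size.
        oi.truncate_seq = truncate_seq
        oi.truncate_size = truncate_size
    buf = objects.get(name, b"")
    buf = buf + b"\0" * max(0, offset - len(buf))
    objects[name] = buf[:offset] + data + buf[offset + len(data):]
    oi.size = len(objects[name])
    ops.append("write")
    return ops


objects, infos = {}, {}
first_ops = do_write(objects, infos, "foo", 0, b"abc", 5, 4096)
second_ops = do_write(objects, infos, "foo", 2, b"z", 6, 2)
```

Here the first write creates the object without a truncate op, so oi.size matches the 3 bytes actually stored; only the second write, against an existing object, queues a truncate.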
Fixes: #2080
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
This breaks because:
- we don't have the head or current snapset
- get_object_context() creates a new snapset, which is wrong
We probably can only do this if we are certain we can construct/modify
the old snapset and end up with the correct one.
Signed-off-by: Sage Weil <sage@newdream.net>
We specifically want to use this during recovery to avoid loading the obc
or ssc for a previous version of the object and populating the watchers.
We know we won't have any existing obc here because it is missing (old or
dne).
For the snapset context, we provide it explicitly when we recover the head
or snapset object (which we always do first). For clones, we re-use the
existing get_snapset_context(), which will either have the ssc open or
can load it from the head/snapset object.
Signed-off-by: Sage Weil <sage@newdream.net>
We set degraded if we don't have enough "active" replicas, which excludes
the backfill target. We need to recheck that when we finish recovery and
the backfill target is now complete.
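As a simplified illustration of the recheck (the function name, arguments, and replica counting below are hypothetical, not the PG code), the degraded flag depends on whether the backfill target counts as active yet:

```python
def is_degraded(acting, backfill_target, backfill_complete, wanted):
    """Return True if fewer than `wanted` replicas count as active."""
    active = [
        osd for osd in acting
        # The backfill target only counts once backfill has finished.
        if osd != backfill_target or backfill_complete
    ]
    return len(active) < wanted


# While backfilling, osd 2 is excluded, so the PG is degraded.
during = is_degraded([0, 1, 2], backfill_target=2,
                     backfill_complete=False, wanted=3)
# Re-evaluating once recovery finishes clears the flag.
after = is_degraded([0, 1, 2], backfill_target=2,
                    backfill_complete=True, wanted=3)
```

If the check were run only once, before backfill completed, the flag would stay set even though all replicas are now complete.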
Fixes: #2160
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
This covers:
- the push/pull changes in 0.43 (which we forgot to protect against; see
#2132)
- the new omap stuff for 0.44
Maybe we could make this finer grained so that ceph-osd would fail only
when mismatched versions are talking _and_ there is actual omap data in
play, but it's not worth the effort at this point.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This makes 'make check' happy; otherwise we would need to create
a bucket name that starts with a period. This version is better.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
The source object may either not exist or be the wrong size
during replay if the destination object was deleted in a future
already-applied operation. This should not impact correctness
of the replay.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
We want to be able to provide default config values other than
the ones we set in common/config_opts.h. This can be useful when we
want different defaults for different modules (e.g., rgw, rgw-admin).
Just passing them on the command line won't do, because then we'd override
any config set by the user, so we need to process them before the regular
parsing (but after initializing the config context).
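The layering order can be sketched in Python as below. This is only a model of the precedence: the function names and the option names (enable_ops_log, log_level) are made up for the example and do not correspond to the real config code.

```python
def build_config(compiled, module_defaults, user_conf, user_args):
    """Intended order: module defaults sit below anything the user set."""
    conf = dict(compiled)          # initialize the config context
    conf.update(module_defaults)   # alternative per-module defaults
    conf.update(user_conf)         # regular parsing: config file...
    conf.update(user_args)         # ...then the user's command line
    return conf


def build_config_wrong(compiled, module_defaults, user_conf, user_args):
    """Module defaults passed like command-line args: they win over the user."""
    conf = dict(compiled)
    conf.update(user_conf)
    conf.update(user_args)
    conf.update(module_defaults)
    return conf


compiled = {"enable_ops_log": False, "log_level": 1}
module_defaults = {"enable_ops_log": True}   # hypothetical module default
user_conf = {"enable_ops_log": False}        # user explicitly disables it

good = build_config(compiled, module_defaults, user_conf, {})
bad = build_config_wrong(compiled, module_defaults, user_conf, {})
```

With the intended order the user's explicit setting survives; with the command-line approach the module default silently overrides it.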
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
It's turned on by default. So now we're using the
'rgw enable ops log' config param in ceph.conf, instead
of RGW_SHOULD_LOG_DEFAULT in the apache conf.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Need this to make a linker error go away on my squeeze dev box. We
probably need to make sure librgw doesn't touch fcgi, once that is
revisited down the line. Opened #2166.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We need to sync the object_map too. We can _almost_ check to see if there
are keys for the object and only do it then, except that they may have
existed previously and then been deleted.
So, always sync. leveldb is reasonably nice about this... it should just
be another fsync.
Signed-off-by: Sage Weil <sage@newdream.net>
The old strategy was to initiate a commit after any non-idempotent
transaction. This only worked if the transaction was idempotent with
respect to itself, or could be replayed partially without problems,
and in reality that isn't the case. For example:
- clone A -> B
- write to A
- <sync>
If we crash before the sync, and replay the clone A->B, we corrupt B with
the new A data.
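The failure can be reproduced with a toy model in Python. The dict-based store and the apply helper are illustrative assumptions, not the journal code; the point is only that replaying the whole transaction against partially updated state is not idempotent.

```python
def apply(store, op):
    """Apply one toy op to an in-memory store."""
    kind = op[0]
    if kind == "clone":
        _, src, dst = op
        store[dst] = store[src]
    elif kind == "write":
        _, name, data = op
        store[name] = data


txn = [("clone", "A", "B"), ("write", "A", "new")]

store = {"A": "old"}
for op in txn:
    apply(store, op)
after_first = dict(store)  # B holds the old A data, as intended

# Crash before the sync: the ops already hit the store, but the journal
# still replays the transaction from the start.
for op in txn:
    apply(store, op)
after_replay = dict(store)  # B now holds the new A data
```

After the replay, B contains "new" instead of "old", which is the corruption described above.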
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>