Commit Graph

30790 Commits

Author SHA1 Message Date
Loic Dachary
08c17b7c5c qa: cleanup cephtool/test.sh tmp files
When run in a shared environment (as opposed to a machine created for
the sole purpose of running this test), it is important to clean up
leftovers to avoid polluting the /tmp space. Create a common temporary
directory for all tmp files.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-14 17:31:04 +01:00
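
The cleanup pattern described in the commit above (one shared temporary directory, removed when the test ends) can be sketched as follows. This is only an illustration in Python, not the shell code of cephtool/test.sh; `run_with_tmpdir`, `example_body`, the prefix, and the file name are hypothetical.

```python
import os
import shutil
import tempfile


def run_with_tmpdir(body, prefix="cephtool-test."):
    """Create one private temp directory, run body(tmpdir), always clean up."""
    tmpdir = tempfile.mkdtemp(prefix=prefix)  # hypothetical prefix
    try:
        return body(tmpdir)
    finally:
        # Remove the directory and everything the test left in it,
        # even if the body raised an exception.
        shutil.rmtree(tmpdir, ignore_errors=True)


def example_body(tmpdir):
    # Every scratch file lives under the shared directory, never
    # directly under /tmp.
    path = os.path.join(tmpdir, "scratch.bin")  # hypothetical file name
    with open(path, "w") as f:
        f.write("scratch")
    return path


leftover = run_with_tmpdir(example_body)
```

After the call returns, neither the scratch file nor its parent directory should remain on disk.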
Sage Weil
66a4f8a291 Merge pull request #1071 from ceph/wip-max-file-size
allow mds max file size to be adjusted

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-01-13 17:43:49 -08:00
Sage Weil
c5cacf4e56 Merge pull request #1058 from ceph/wip-cache-snap
snap/clone promotion, flush, and other goodies

This is now passing the thrashing with both cache and snap ops:
  sage-2014-01-13_15:45:26-rados:thrash-wip-cache-snap-testing-basic-plana

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-13 16:50:17 -08:00
Sage Weil
be8db8c338 osd/ReplicatedPG: use get_object_context in trim_object
find_object_context() has all the logic to choose a particular clone given
a logical snap.  In the trim case, we want none of that: we just need to
pull the obc for a specific clone instance.  Note that this changes
none of the failure cases (previously we asserted r == 0).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:50 -08:00
Sage Weil
b5ae76e8fe ceph_test_rados: do not delete in-use snaps
There are a bunch of ops that read from snaps.  Do not delete a snap
while they are in use.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
8b39719a10 osd/OSDMonitor: fix 'osd tier add ...' pool mangling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
d41a1d3d82 osd/ReplicatedPG: update ObjectContext's object_info_t for new hit_set objects
We were fabricating an object_info_t correctly and writing it to disk, but
it was not reflected by the in-memory ObjectContext.  If something came
along quickly (like backfill) and tried to use it, the info would be
invalid.

Fix this by fabricating it in the obc and copying it to the new_obs for
the update.

Fixes: #7122
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
10547e6713 osd/ReplicatedPG: always return ENOENT on deleted snap
Previously, if a snap was deleted but the clone was there and we hadn't
trimmed it yet, we would still return the data.  Instead, return ENOENT
unconditionally (even if it's not removed yet).  This makes the behavior
from the client perspective more predictable and consistent.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
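
The read rule from the commit above can be sketched as a toy model. This is not the ReplicatedPG code; `read_clone` and its arguments are hypothetical names, and the on-disk state is modeled as a plain dict.

```python
import errno


def read_clone(clone_data, snap, removed_snaps):
    """Return (status, data) for a read at `snap`.

    clone_data:    dict mapping snap id -> bytes still on disk (toy model)
    removed_snaps: set of snap ids that have been deleted
    """
    if snap in removed_snaps:
        # The snap is logically gone; whether the trimmer has already
        # removed the clone does not matter.
        return -errno.ENOENT, None
    if snap not in clone_data:
        return -errno.ENOENT, None
    return 0, clone_data[snap]


# Clone for snap 4 is still on disk, but snap 4 has been deleted.
status_deleted, data_deleted = read_clone({4: b"old"}, 4, {4})
# Same clone, snap not deleted.
status_live, data_live = read_clone({4: b"old"}, 4, set())
```

In this model the client sees the same result before and after trimming, which is the predictability the commit message is after.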
Sage Weil
8cab9e7657 ceph_test_rados_api_tier: partial test for promote vs snap trim race
This reliably returns ENODEV due to the test at the finish of flush.  Not
because we are actually racing with trim, though: the trimmer doesn't run
at all.  I believe it captures the important property, though.  Namely:
we should not write a promoted object that is "behind" the snap trimmer's
progress.  The fact that we are in front of it (the trimmer hasn't started
yet) should not matter since the object is logically deleted anyway.

We probably want to make the OSD return ENODEV on read in the normal case
when you try to access a clone that is pending trimming.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
8221a2a54d osd/ReplicatedPG: cleanly abort flush if the object no longer exists
If the object no longer exists (for example, because the snap trimmer just
killed it) clean up the flush state without trying to mark the object
clean.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
f3ce2549c5 osd/Replicated: mark obc !exists on snap trim
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
48306e47d0 mon: debug propagate_snaps_to_tiers
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
6719d30288 osd: fix propagation of removed snaps to other tiers
When we update removed_snaps we do not update snap_seq.  Drop this broken
optimization.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
7e80fa068e osd/ReplicatedPG: handle promote that races with snap deletion
If we are promoting a clone and realize that the object is no longer
defined for any snaps, abort the copy and delete any temp object.

If the defined snaps have changed, make sure they are updated in memory
so that on promote completion the snapshot metadata is correct.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
cd42368e3c osd/ReplicatedPG: simplify copy-from temp object handling
Previously the caller was generating a temp object name and passing it
down in several different ways.  Instead, generate one when we realize
that we need it, and store it in *one* place (CopyResults), where
the completions can get at the information.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
1a7335d535 ceph_test_rados_misc: test bad version for copy-from
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
7daab5ac61 osd/ReplicatedPG: adjust flow in process_copy_chunk
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
0b816c3342 osd/ReplicatedPG: make CopyResults inline in CopyOp
No reason to put this on the heap.  Make the lifetime match that of the
CopyOp.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
d00116c6ac ceph_test_rados: flush can also fail due to snap trimming
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
7eede85f8f osd/ReplicatedPG: handle promotion of rollback, src_oids, etc.
Make other find_object_context() callers handle the case where the object
in question needs to be promoted.  We add a flag here that forces a promote
for these secondary objects so that the entire operation happens in the
same pool.  Forwarding is not allowed in this case.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
ac446b5df3 osd/ReplicatedPG: preserve clean/dirty state on clone
If we have a clean object and clone it in make_writeable(), the clone
should also be clean (it does not need to be written back to the base
pool).  If the object was dirty, the clone should be dirty.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
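
A minimal sketch of the rule in the commit above, assuming a toy object state with a single dirty flag. `ObjState` and `make_clone` are hypothetical names and do not mirror the real make_writeable() code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjState:
    name: str
    snap: object  # "head" or a snap id
    dirty: bool   # True if the object still needs to be written back


def make_clone(head, snap_id):
    """Clone the head at snap_id, inheriting its clean/dirty state."""
    # Only the snap changes; the dirty flag is copied unchanged.
    return replace(head, snap=snap_id)


clean_clone = make_clone(ObjState("foo", "head", dirty=False), 3)
dirty_clone = make_clone(ObjState("foo", "head", dirty=True), 3)
```

A clone of a clean head stays clean (no writeback to the base pool is needed), and a clone of a dirty head stays dirty.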
Sage Weil
27eb4c5e93 ceph_test_rados: improve read debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
627bdead1e osd/ReplicatedPG: infer snaps from head when promoting oldest clean clone
Consider:

 - base and cache have same object foo; marked clean in cache pool
 - modify + clone foo in cache pool.  foo clone is clean.
 - foo clone is evicted
 - foo clone is read, and promoted
 - we read foo@something from base pool, and get the head's content

copy-get does not provide us with a snaps list.  Instead, we use the
snap_seq from the head to infer what the snaps vector was in the cache
pool and will be in the base pool when we flush the updates to the object.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
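
One plausible reading of the inference in the commit above, as a toy sketch. The commit text does not spell out the exact rule, so the selection used here (snaps newer than the base head's snap_seq, up to and including the clone's snap id) is an assumption, and `infer_clone_snaps` and its parameters are hypothetical names.

```python
def infer_clone_snaps(snaps_newest_first, clone_snap, head_snap_seq):
    """Guess the snaps vector for a promoted clone.

    snaps_newest_first: snap ids known for the object, newest first
    clone_snap:         snap id of the clone being promoted
    head_snap_seq:      snap_seq reported by the base pool's head object

    Assumption: the head content we read stands in for every snap taken
    after head_snap_seq, up to and including the clone's own snap id.
    """
    return [s for s in snaps_newest_first if head_snap_seq < s <= clone_snap]


# Snaps 8, 6, 5 and 3 exist; the base head last saw snap 3, and we are
# promoting the clone with snap id 6.
inferred = infer_clone_snaps([8, 6, 5, 3], clone_snap=6, head_snap_seq=3)
```

With these inputs the sketch yields snaps 6 and 5: snap 8 is newer than the clone, and snap 3 was already reflected in the base pool.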
Sage Weil
21f3dcbd33 osd: include snap_seq in copy-get results
This is needed by the cache layer when reading a logical snap from a head
object on the backend in order to correctly recreate the clone in the
cache layer.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
c6b73eb469 osd/ReplicatedPG: always set obc->ssc SnapSetContext for clones
This can be useful!

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:46 -08:00
Sage Weil
934de77c66 osd/ReplicatedPG: do not promote nonexistent clones
Do not promote a clone for a snap that we know doesn't exist.  If
find_object_context() didn't give us a missing_oid, there is nothing to
promote.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:46 -08:00
Sage Weil
55b83f16d2 ceph_test_rados: is_dirty on non-flushing objects only
This makes its results reliable.  Otherwise, we can't mix the is_dirty
test with flush, which eliminates much of its value.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:45 -08:00
Sage Weil
af5a407cb2 ceph_test_rados: assert on read error
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:44 -08:00
Sage Weil
b70c476ab4 ceph_test_rados: make flush clean correct snap in model
2014-01-13 16:19:44 -08:00
Sage Weil
ac635513d3 ceph_test_rados: IsDirty on random snaps
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:44 -08:00
Sage Weil
6f4f651357 ceph_test_rados: test flush/evict on snaps
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:44 -08:00
Sage Weil
9688642c82 ceph_test_rados: don't update any state on successful cache-evict
- we didn't touch the user_version
- we didn't change the clean/dirty state

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:44 -08:00
Sage Weil
fc9f8ad59b ceph_test_rados_api_tier: test flush on snaps/clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
b2f752a9e1 osd/ReplicatedPG: construct appropriate snapc for flush/writeback
Construct a snap context that will trigger the appropriate cloning (if any)
on the base pool.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
5b8d957b9c osd: add pg_log_entry_t event type CLEAN
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
c91166eca0 osd/ReplicatedPG: refuse to flush when older dirty clones are present
If the next oldest clone is dirty, we cannot flush.  That is, we must
always flush starting with the oldest dirty clone.

Note that we can never have a sequence like dirty -> clean -> dirty,
because clones are only dirty on creation, are created in order, and cannot
be flushed (cleaned) out of order.  Thus checking the previous clone is
sufficient (and thankfully cheap).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
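
The ordering rule in the commit above can be illustrated with a toy model in which the clones of one object are a list of dirty flags ordered oldest to newest. `can_flush` and `flush` are hypothetical names; the real check lives in ReplicatedPG and is not reproduced here.

```python
def can_flush(dirty_flags, index):
    """True if the clone at `index` may be flushed.

    dirty_flags: list of booleans, oldest clone first.
    Only the next oldest clone is checked, which is enough because
    clones are cleaned strictly in order.
    """
    if index == 0:
        return True  # nothing older exists
    return not dirty_flags[index - 1]


def flush(dirty_flags, index):
    """Mark the clone clean if allowed; return whether the flush happened."""
    if not can_flush(dirty_flags, index):
        return False
    dirty_flags[index] = False
    return True


clones = [False, True, True]          # clean, dirty, dirty (oldest first)
refused = flush(clones, 2)            # older dirty clone at index 1
accepted = flush(clones, 1)           # oldest dirty clone
accepted_after = flush(clones, 2)     # now its predecessor is clean
```

Because a flush is refused whenever the previous clone is dirty, the list can never reach a dirty, clean, dirty pattern, so checking one predecessor is sufficient.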
Sage Weil
6bff648de9 vstart.sh: allow MDS=0
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
de8e8b5d09 osd/ReplicatedPG: make cache-[try-]flush CACHE instead of WR ops
This will allow us to send a flush op on a snap.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
4e8259db4f osd/ReplicatedPG: allow cache-evict on snaps
We do three things here:

 - make cache-evict a CACHE instead of WR op, allowing us to submit it
   on snaps (not just head)
 - allow eviction of a snap
 - verify that all snaps are missing before evicting a head

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
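
The eviction checks listed in the commit above can be sketched as a toy predicate. `can_evict` and its arguments are hypothetical, and the model ignores everything except which clones are present in the cache pool.

```python
def can_evict(target, present_clones):
    """Decide whether `target` may be evicted from the cache pool.

    target:         "head" or the snap id of a clone
    present_clones: set of snap ids whose clones are still in the cache pool
    """
    if target == "head":
        # The head may only go once every clone is already missing.
        return len(present_clones) == 0
    # A clone can be evicted on its own, as long as it is actually there.
    return target in present_clones


head_blocked = can_evict("head", {3, 5})
clone_ok = can_evict(3, {3, 5})
head_ok = can_evict("head", set())
```

In this model the head is the last piece of an object to leave the cache pool, matching the third bullet of the commit message.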
Sage Weil
90e352ca73 osd: add rados CACHE mode (different from RD and WR)
It is useful to distinguish cache operations from read and modify
operations.  Specifically, we will allow cache ops to be sent for
snaps and also allow those ops to result in a write.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
1f4350e212 ceph_test_rados_api_tier: test promotion of clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
c05765e8bb osd/ReplicatedPG: update snap_mapper for promoted clones
A clone that comes into existence via promotion takes an entirely
different path than a typical clone (which comes into existence via a
CLONE op in make_writeable()).  Make sure snap_mapper is updated
accordingly.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
5c94d530fb osd/ReplicatedPG: only encode SnapSet on head objects in finish_ctx
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:41 -08:00
Sage Weil
38fe575d56 osd/ReplicatedPG: always encode snaps in finish_ctx
On promote we use finish_ctx to build the final log entries, and need to
encode the snaps vector in that case.  (Normally this is done by
make_writeable or explicitly by the snap trimmer.)

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:40 -08:00
Sage Weil
bfd4530189 osd/ReplicatedPG: mirror SnapSet info when promoting head
When we promote the head for an object, get the list of snaps from the
backend pool and construct an appropriate SnapSet.  Note that this is
always placed on the head in the cache pool, since we will have a
whiteout object in this case.

Also note that the SnapSet's list of snapids will not include any snaps
for which there were no clones.  This is fine, since it is only used for
creating clones, and we've already done that.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:40 -08:00
Sage Weil
0554735872 osd/osd_types: SnapSet::from_snap_set
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
c70edf3e03 osd/ReplicatedPG: add PROMOTE log entry type
This is an alternative to MODIFY that indicates the object was just
promoted from another tier.  Thankfully, is_modify() is used in very
few places!

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
b840aae1e7 osd/ReplicatedPG: adjust clone stats when promoting clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
6dd0a1f0d6 osd/ReplicatedPG: include snaps in copy-get results
When promoting a snapped object, we need to also get the set of snaps over
which the clone is defined.  This is not strictly available except via the
list-snaps rados call, but that is only used on the snapdir object much
earlier when the head (whiteout) is promoted, and is not conveniently
available now.  Adding it to the internal copy-get is not exposed via
librados (copy-get is not exposed at all) so I don't think this is a
problem.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
d22ecf3e06 osd/ReplicatedPG: using missing_oid to decide which object to promote
find_object_context() now tells us which object it could use if it
doesn't find it on disk.  Promote that one.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00