Commit Graph

30831 Commits

Author SHA1 Message Date
Greg Farnum
045e1d75a7 OSDMap: add primary-specifying pg_to_acting_osds
This works the same as pg_to_up_acting_osds

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
93d481a5d2 mon, osdmaptool: switch to primary-specifying pg_to_up_acting_osds
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
9749f30cdf OSDMap: implement pg_to_up_acting_osds with primary interface
Use our pointer calling conventions instead of a reference for the
new version of the function.

Right now we're just setting the primaries equal to the first member
of up and acting (or -1 if none), but very shortly we'll modify our
private OSDMap functions to export them based on the contents of temp_primary.
While in general anybody querying for the mapping information will
need to pay attention to who the primary is as well, we have lots
of callers who will need real code changes to do so. To serve them,
we keep a version that does not export the primary, but asserts
that the primary matches the first entry in its list.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
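
A minimal sketch of the interface described in 9749f30cdf, not the actual Ceph code. ToyOSDMap and its up_set/acting_set members are hypothetical stand-ins, and the pg argument that the real functions take is omitted. It only illustrates pointer out-params, the "first member or -1" rule for the primaries, and a legacy overload that asserts the primary matches the first entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for OSDMap; all names here are illustrative only.
struct ToyOSDMap {
  std::vector<int> up_set;      // OSDs in the up set for one PG
  std::vector<int> acting_set;  // OSDs in the acting set for one PG

  // Primary-exporting variant: out params are pointers, not references.
  void pg_to_up_acting_osds(std::vector<int> *up, int *up_primary,
                            std::vector<int> *acting,
                            int *acting_primary) const {
    if (up) *up = up_set;
    // For now the primary is simply the first member, or -1 if the set is empty.
    if (up_primary) *up_primary = up_set.empty() ? -1 : up_set.front();
    if (acting) *acting = acting_set;
    if (acting_primary)
      *acting_primary = acting_set.empty() ? -1 : acting_set.front();
  }

  // Legacy variant for callers that do not yet look at the primary.
  // It asserts that the primary matches the first entry of each list.
  void pg_to_up_acting_osds(std::vector<int> &up,
                            std::vector<int> &acting) const {
    int up_p = -1, acting_p = -1;
    pg_to_up_acting_osds(&up, &up_p, &acting, &acting_p);
    assert(up_p == (up.empty() ? -1 : up.front()));
    assert(acting_p == (acting.empty() ? -1 : acting.front()));
  }
};
```

Once the primary can differ from the first entry (the temp_primary work mentioned in the commit message), the legacy overload's asserts would catch callers that still assume otherwise.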
Greg Farnum
5b699782d1 OSDMap: switch pg_to_osds to have an explicit primary param
Use pointers instead of references for the out params, too!

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
5367d92e0b OSDMap: rename _raw_to_temp_osds() -> _get_temp_osds()
This function does not use (and never has used!) the raw vector, so remove it
and don't use a name which implies it is doing any sort of conversion.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
69a2ec27f0 OSDMap: unify the pg_to_acting_osds and pg_to_up_acting_osds implementations
These were the same except for a call to _raw_to_up_osds(). Move the
existing pg_to_up_acting_osds into a private function taking a pointer,
only fill in the up vector if it's a non-NULL pointer, and call it via
the obvious header implementations.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
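
The unification in 69a2ec27f0 might look roughly like the sketch below. ToyMap, raw_up and raw_acting are hypothetical, and the real functions also take a pg argument. The point is only the shape: one private worker taking a pointer for the optional up vector, plus two thin public wrappers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in; not the real OSDMap class.
class ToyMap {
 public:
  std::vector<int> raw_up{0, 1, 2};  // illustrative sample data
  std::vector<int> raw_acting{1, 2};

  // Caller only wants the acting set: pass a null up pointer.
  void pg_to_acting_osds(std::vector<int> &acting) const {
    _pg_to_up_acting_osds(nullptr, acting);
  }

  // Caller wants both sets.
  void pg_to_up_acting_osds(std::vector<int> &up,
                            std::vector<int> &acting) const {
    _pg_to_up_acting_osds(&up, acting);
  }

 private:
  // Shared implementation: fill in up only when the pointer is non-NULL.
  void _pg_to_up_acting_osds(std::vector<int> *up,
                             std::vector<int> &acting) const {
    if (up) *up = raw_up;
    acting = raw_acting;
  }
};
```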
Greg Farnum
c1a95f83ac OSDMap: remove get_pg_primary() function
This was used only by SyntheticClient, and that wants get_pg_acting_primary()
anyway. Delete the easily-misused get_pg_primary() and switch.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
7a9c1712f4 OSDMap: doc the different pg->OSD mapping functions
Some of these look like what you should use for mapping and they absolutely
are not suitable for that. Make it clearer.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
268ae82ac3 osd: do not misuse calc_pg_role
We've been using the role returned from this to determine if we're
the primary or not. Don't.
This is mostly about removing a few asserts; while in there I also
redirected some calls to use static dereference instead of going through
the osdmap lookup path.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:05 -08:00
Greg Farnum
a09d4f171e PG: do not use role == 0 as a determinant of primacy
We already have an is_primary() function to use instead.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-01-15 16:33:04 -08:00
Josh Durgin
c60ae09b38 Merge pull request #978 from ceph/wip-3454
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-01-15 15:28:31 -08:00
Yehuda Sadeh
644afd67b9 radosgw-admin: add temp url params to usage
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-01-15 15:12:40 -08:00
athanatos
980ef0e8b3 Merge pull request #1089 from dachary/wip-mailmap
mailmap: add athanatos <sam.just@inktank.com>

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-15 10:25:28 -08:00
athanatos
73e469c966 Merge pull request #963 from dachary/wip-erasure-code-api
erasure code interface helpers

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-15 10:22:47 -08:00
John Wilkins
970f9387bd doc: Updated paths for OSDs using the OS disk.
fixes: #6682

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-01-15 10:08:28 -08:00
Loic Dachary
1ffe4226c2 mailmap: add athanatos <sam.just@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-15 09:23:09 +01:00
Sage Weil
4050eae32c Merge pull request #1084 from dachary/wip-cephtool-test
qa: cleanup cephtool/test.sh tmp files

Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-14 21:57:48 -08:00
Gregory Farnum
f19adc919d Merge pull request #1085 from dachary/ceph-master
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-01-14 15:12:34 -08:00
Loic Dachary
4b5f2570e9 common: fix bufferlist::append(istream) test
bufferlist::append(istream) now filters out empty lines; reflect this in
the test

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-15 00:08:41 +01:00
Sage Weil
e55a08964f doc/release-notes: v0.75
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-14 09:37:52 -08:00
Loic Dachary
08c17b7c5c qa: cleanup cephtool/test.sh tmp files
When run in a shared environment (as opposed to a machine created for
the purpose of running this test only), it is important to clean up
leftovers to avoid polluting the /tmp space. Create a common temporary
directory for all tmp files.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-14 17:31:04 +01:00
Ken Dreyer
03d7d97d5d Merge branch 'next'
2014-01-14 16:16:41 +00:00
Loic Dachary
a520026eb4 Merge pull request #1076 from dachary/wip-vector-op
erasure-code: use uintptr_t instead of long long

Reviewed-by: Andreas Peters <andreas.joachim.peters@cern.ch>
2014-01-14 08:10:59 -08:00
Loic Dachary
dc4e212d6a Merge pull request #1078 from ceph/wip-mon-pgmap
mon: make 'pg getmap' not include a trailing newline

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-01-13 22:38:09 -08:00
Sage Weil
66a4f8a291 Merge pull request #1071 from ceph/wip-max-file-size
allow mds max file size to be adjusted

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-01-13 17:43:49 -08:00
Sage Weil
c5cacf4e56 Merge pull request #1058 from ceph/wip-cache-snap
snap/clone promotion, flush, and other goodies

This is now passing the thrashing with both cache and snap ops:
  sage-2014-01-13_15:45:26-rados:thrash-wip-cache-snap-testing-basic-plana

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-13 16:50:17 -08:00
Sage Weil
be8db8c338 osd/ReplicatedPG: use get_object_context in trim_object
find_object_context() has all the logic to choose a particular clone given
a logical snap.  In the trim case, we want none of that: we just need to
pull the obc for a specific clone instance.  Note that this changes
none of the failure cases (previously we asserted r == 0).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:50 -08:00
Sage Weil
b5ae76e8fe ceph_test_rados: do not delete in-use snaps
There are a bunch of ops that read from snaps.  Do not delete a snap
while they are in use.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
8b39719a10 osd/OSDMonitor: fix 'osd tier add ...' pool mangling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
d41a1d3d82 osd/ReplicatedPG: update ObjectContext's object_info_t for new hit_set objects
We were fabricating an object_info_t correctly and writing it to disk, but
it was not reflected by the in-memory ObjectContext.  If something came
along quickly (like backfill) and tried to use it, the info would be
invalid.

Fix this by fabricating it in the obc and copying it to the new_obs for
the update.

Fixes: #7122
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
10547e6713 osd/ReplicatedPG: always return ENOENT on deleted snap
Previously, if a snap was deleted but the clone was there and we hadn't
trimmed it yet, we would still return the data.  Instead, return ENOENT
unconditionally (even if it's not removed yet).  This makes the behavior from
the client perspective more predictable and consistent.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
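
As a rough illustration of the behavior change in 10547e6713 (not the ReplicatedPG code itself), the check can be thought of as below. read_clone, removed_snaps and clone_on_disk are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Hypothetical helper: returns 0 if the read may proceed, -ENOENT otherwise.
// removed_snaps holds ids of deleted snaps; clone_on_disk says whether the
// clone's data has not yet been trimmed.
int read_clone(const std::set<unsigned> &removed_snaps, unsigned snap,
               bool clone_on_disk) {
  // A deleted snap is always reported as missing, even if the trimmer has
  // not removed the clone's data yet.
  if (removed_snaps.count(snap)) return -ENOENT;
  return clone_on_disk ? 0 : -ENOENT;
}
```

With this ordering, the result seen by a client does not depend on how far the snap trimmer has progressed.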
Sage Weil
8cab9e7657 ceph_test_rados_api_tier: partial test for promote vs snap trim race
This reliably returns ENODEV due to the test at the finish of flush.  Not
because we are actually racing with trim, though: the trimmer doesn't run
at all.  I believe it captures the important property, though.  Namely:
we should not write a promoted object that is "behind" the snap trimmer's
progress.  The fact that we are in front of it (the trimmer hasn't started
yet) should not matter since the object is logically deleted anyway.

We probably want to make the OSD return ENODEV on read in the normal case
when you try to access a clone that is pending trimming.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
8221a2a54d osd/ReplicatedPG: cleanly abort flush if the object no longer exists
If the object no longer exists (for example, because the snap trimmer just
killed it) clean up the flush state without trying to mark the object
clean.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
f3ce2549c5 osd/Replicated: mark obc !exists on snap trim
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:49 -08:00
Sage Weil
48306e47d0 mon: debug propagate_snaps_to_tiers
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
6719d30288 osd: fix propagation of removed snaps to other tiers
When we update removed_snaps we do not update snap_seq.  Drop this broken
optimization.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
7e80fa068e osd/ReplicatedPG: handle promote that races with snap deletion
If we are promoting a clone and realize that the object is no longer
defined for any snaps, abort the copy and delete any temp object.

If the defined snaps have changed, make sure they are updated in memory
so that on promote completion the snapshot metadata is correct.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
cd42368e3c osd/ReplicatedPG: simplify copy-from temp object handling
Previously the caller was generating a temp object name and passing it
down in several different ways.  Instead, generate one when we realize
that we need it, and store it in *one* place (CopyResults), where
the completions can get at the information.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
1a7335d535 ceph_test_rados_misc: test bad version for copy-from
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
7daab5ac61 osd/ReplicatedPG: adjust flow in process_copy_chunk
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:48 -08:00
Sage Weil
0b816c3342 osd/ReplicatedPG: make CopyResults inline in CopyOp
No reason to put this on the heap.  Make the lifetime match that of the
CopyOp.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
d00116c6ac ceph_test_rados: flush can also fail due to snap trimming
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
7eede85f8f osd/ReplicatedPG: handle promotion of rollback, src_oids, etc.
Make other find_object_context() callers handle the case where the object
in question needs to be promoted.  We add a flag here that forces a promote
for these secondary objects so that the entire operation happens in the
same pool.  Forwarding is not allowed in this case.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
ac446b5df3 osd/ReplicatedPG: preserve clean/dirty state on clone
If we have a clean object and clone it in make_writeable(), the clone
should also be clean (it does not need to be written back to the base
pool).  If the object was dirty, the clone should be dirty.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
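
The rule in ac446b5df3 can be sketched as follows. ToyObjectInfo and make_clone are hypothetical; the real logic lives in make_writeable() and works on object_info_t, whose fields are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified object metadata.
struct ToyObjectInfo {
  std::string name;
  bool dirty = false;  // true if the object still needs a flush to the base pool
};

// Create a clone that inherits the head's clean/dirty state, so a clean
// head never yields a clone that would be written back unnecessarily.
ToyObjectInfo make_clone(const ToyObjectInfo &head,
                         const std::string &clone_name) {
  ToyObjectInfo clone;
  clone.name = clone_name;
  clone.dirty = head.dirty;
  return clone;
}
```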
Sage Weil
27eb4c5e93 ceph_test_rados: improve read debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
627bdead1e osd/ReplicatedPG: infer snaps from head when promoting oldest clean clone
Consider:

 - base and cache have same object foo; marked clean in cache pool
 - modify + clone foo in cache pool.  foo clone is clean.
 - foo clone is evicted
 - foo clone is read, and promoted
 - we read foo@something from base pool, and get the head's content

copy-get does not provide us with a snaps list.  Instead, we use the
snap_seq from the head to infer what the snaps vector was in the cache
pool and will be in the base pool when we flush the updates to the object.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
21f3dcbd33 osd: include snap_seq in copy-get results
This is needed by the cache layer when reading a logical snap from a head
object on the backend in order to correctly recreate the clone in the
cache layer.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:47 -08:00
Sage Weil
c6b73eb469 osd/ReplicatedPG: always set obc->ssc SnapSetContext for clones
This can be useful!

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:46 -08:00
Sage Weil
934de77c66 osd/ReplicatedPG: do not promote nonexistent clones
Do not promote a clone for a snap that we know doesn't exist.  If
find_object_context() didn't give us a missing_oid, there is nothing to
promote.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:46 -08:00
Sage Weil
55b83f16d2 ceph_test_rados: is_dirty on non-flushing objects only
This makes its results reliable.  Otherwise, we can't mix the is_dirty
test with flush, which eliminates much of its value.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:45 -08:00