Commit Graph

30929 Commits

Author SHA1 Message Date
Sage Weil
9688642c82 ceph_test_rados: don't update any state on successful cache-evict
- we didn't touch the user_version
- we didn't change the clean/dirty state

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:44 -08:00
Sage Weil
fc9f8ad59b ceph_test_rados_api_tier: test flush on snaps/clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
b2f752a9e1 osd/ReplicatedPG: construct appropriate snapc for flush/writeback
Construct a snap context that will trigger the appropriate cloning (if any)
on the base pool.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
5b8d957b9c osd: add pg_log_entry_t event type CLEAN
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
c91166eca0 osd/ReplicatedPG: refuse to flush when older dirty clones are present
If the next oldest clone is dirty, we cannot flush.  That is, we must
always flush starting with the oldest dirty clone.

Note that we can never have a sequence like dirty -> clean -> dirty,
because clones are only dirty on creation, are created in order, and cannot
be flushed (cleaned) out of order.  Thus checking the previous clone is
sufficient (and thankfully cheap).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
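
The ordering rule described in c91166eca0 can be modelled in a few lines. This is a minimal Python sketch, not the ReplicatedPG code: the `Clone` class and the `can_flush` function are hypothetical names, and the clones are assumed to be kept in a list sorted oldest first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clone:
    snapid: int  # hypothetical id; a larger value means a newer clone
    dirty: bool  # True until the clone has been flushed to the base pool


def can_flush(clones: List[Clone], index: int) -> bool:
    """Return True if clones[index] may be flushed.

    Assumes `clones` is sorted oldest first. Clones are dirty on creation,
    are created in order, and are only cleaned in order, so a sequence like
    dirty -> clean -> dirty cannot occur and checking the single next-oldest
    clone is sufficient.
    """
    if index == 0:
        return True  # there is no older clone to block the flush
    return not clones[index - 1].dirty


clones = [Clone(1, dirty=False), Clone(2, dirty=True), Clone(3, dirty=True)]
print(can_flush(clones, 1))  # clone 2 is the oldest dirty clone
print(can_flush(clones, 2))  # clone 3 is blocked by dirty clone 2
```

The check is constant time per flush request, which matches the "thankfully cheap" remark in the commit message.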
Sage Weil
6bff648de9 vstart.sh: allow MDS=0
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
de8e8b5d09 osd/ReplicatedPG: make cache-[try-]flush CACHE instead of WR ops
This will allow us to send a flush op on a snap.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:43 -08:00
Sage Weil
4e8259db4f osd/ReplicatedPG: allow cache-evict on snaps
We do three things here:

 - make cache-evict a CACHE instead of WR op, allowing us to submit it
   on snaps (not just head)
 - allow eviction of a snap
 - verify that all snaps are missing before evicting a head

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
90e352ca73 osd: add rados CACHE mode (different from RD and WR)
It is useful to distinguish cache operations from read and modify
operations.  Specifically, we will allow cache ops to be sent for
snaps and also allow those ops to result in a write.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
1f4350e212 ceph_test_rados_api_tier: test promotion of clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
c05765e8bb osd/ReplicatedPG: update snap_mapper for promoted clones
A clone that comes into existence via promotion takes an entirely
different path than a typical clone (which comes into existence via a
CLONE op in make_writeable()).  Make sure snap_mapper is updated
accordingly.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:42 -08:00
Sage Weil
5c94d530fb osd/ReplicatedPG: only encode SnapSet on head objects in finish_ctx
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:41 -08:00
Sage Weil
38fe575d56 osd/ReplicatedPG: always encode snaps in finish_ctx
On promote we use finish_ctx to build the final log entries, and need to
encode the snaps vector in that case.  (Normally this is done by
make_writeable or explicitly by the snap trimmer.)

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:40 -08:00
Sage Weil
bfd4530189 osd/ReplicatedPG: mirror SnapSet info when promoting head
When we promote the head for an object, get the list of snaps from the
backend pool and construct an appropriate SnapSet.  Note that this is
always placed on the head in the cache pool, since we will have a
whiteout object in this case.

Also note that the SnapSet's list of snapids will not include any snaps
for which there were no clones.  This is fine, since it is only used for
creating clones, and we've already done that.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:40 -08:00
Sage Weil
0554735872 osd/osd_types: SnapSet::from_snap_set
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
c70edf3e03 osd/ReplicatedPG: add PROMOTE log entry type
This is an alternative to MODIFY that indicates the object was just
promoted from another tier.  Thankfully, is_modify() is used in very
few places!

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
b840aae1e7 osd/ReplicatedPG: adjust clone stats when promoting clones
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
6dd0a1f0d6 osd/ReplicatedPG: include snaps in copy-get results
When promoting a snapped object, we need to also get the set of snaps over
which the clone is defined.  This is not strictly available except via the
list-snaps rados call, but that is only used on the snapdir object much
earlier when the head (whiteout) is promoted, and is not conveniently
available now.  Adding it to the internal copy-get is not exposed via
librados (copy-get is not exposed at all) so I don't think this is a
problem.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
d22ecf3e06 osd/ReplicatedPG: using missing_oid to decide which object to promote
find_object_context() now tells us which object it could use if it
doesn't find it on disk.  Promote that one.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
c3c1541c73 osd/ReplicatedPG: make find_object_context() pass missing_oid
Previously we would return a snapid that we are blocked on if it is
missing.  This is necessary because the missing clone does not always
match the logical snap we are trying to read.

Extend this to return a full hobject_t that is the missing object we want.
For the missing clone case, this cleans things up slightly.  More
importantly, it lets find_object_context also tell us which on-disk
object is missing that, if it could be promoted, would help.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 16:19:39 -08:00
Sage Weil
33b5ef4030 mon/PGMap: make decode version match encode version
These should have been bumped way back in 091809b8.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 15:52:18 -08:00
Sage Weil
a5aaab3c57 ceph-dencoder: include offset in 'stray data' error message
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 15:52:16 -08:00
Sage Weil
1308225cc4 buffer: do not append trailing newline when appending empty istream
If we call

 bl.append(some_istream);

do not include a \n if the istream is empty (which apparently is not
the same thing as eof).  This was causing 'ceph pg getmap' to include a
trailing newline.

Probably we don't want this newline at all!  But all callers need to be
fixed for that change.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-13 15:52:13 -08:00
athanatos
196e3d6278 Merge pull request #931 from ceph/wip-5858-rebase
Wip 5858 rebase

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-13 14:25:51 -08:00
Ken Dreyer
946d603695 v0.75
2014-01-13 21:07:01 +00:00
John Wilkins
90343708b3 doc: Added comment and example for SSL enablement in rgw.conf
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-01-13 12:57:02 -08:00
David Zafman
c0d92b6744 osd: Implement multiple backfill target handling
Fixes: #5858

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-13 11:39:05 -08:00
David Zafman
a657fad47a osd: Interim backfill changes
Make peer_backfill_info a map which holds a
BackfillInterval for all backfill targets.
Initially see if recover_backfill() can just backfill
the first one and mark them all finished.

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-13 11:39:05 -08:00
Sage Weil
09ae4bc2aa Merge pull request #1077 from ceph/wip-7141
DBObjectMap::clear_keys_header: use generate_new_header, not _generate_n...

Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-13 11:22:49 -08:00
Samuel Just
4c92dc6f5c DBObjectMap::clear_keys_header: use generate_new_header, not _generate_new_header
We aren't holding the header_lock here, so we need the locked version.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-01-13 11:02:46 -08:00
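
The commit message for 4c92dc6f5c relies on a naming convention in which the underscore-prefixed method expects the caller to hold the lock and the plain name takes the lock itself. A rough Python sketch of that convention follows. The `HeaderStore` class, its counter, and the return values are hypothetical; the real DBObjectMap signatures are not shown in this log.

```python
import threading


class HeaderStore:
    """Hypothetical stand-in for an object that guards header state with a lock."""

    def __init__(self) -> None:
        self.header_lock = threading.Lock()
        self.next_seq = 1

    def _generate_new_header(self) -> int:
        # Unlocked variant: the caller must already hold header_lock.
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def generate_new_header(self) -> int:
        # Locked variant: safe to call without holding header_lock.
        with self.header_lock:
            return self._generate_new_header()

    def clear_keys_header(self) -> int:
        # This path does not hold header_lock, so it calls the locked variant.
        return self.generate_new_header()
```

Calling `_generate_new_header()` directly from `clear_keys_header()` would update `next_seq` without the lock, which is the kind of mistake the commit corrects.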
Loic Dachary
93a9b686b6 erasure-code: use uintptr_t instead of long long
Checking the pointer alignment using a cast to long long raises a
warning when -Wpointer-to-int-cast is given.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-13 18:16:09 +01:00
Sage Weil
5f165ed391 Merge pull request #1075 from dachary/wip-crush
improve crushtool --build usability and documentation

Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-13 08:46:04 -08:00
Gregory Farnum
0fd6a2446f Merge pull request #1072 from ceph/wip-tier-snap
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-01-13 08:33:52 -08:00
Loic Dachary
0082d88c7e doc: format man pages with s/2013/2014/
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 18:06:06 +01:00
Loic Dachary
b4054fcc8d doc: copyright s/2013/2014/
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 18:06:06 +01:00
Loic Dachary
efbdd16380 doc: update the crushtool manual page
* add information about CEPH_ARGS
* rework the --build documentation and example
* add an Author section
* replace vi with emacs for no good reason
* cleanup whitespace

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 18:06:06 +01:00
Loic Dachary
283793ac44 doc: crushtool man page nroff format
also includes a modification from a prior patch series that was not
formatted to nroff.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 18:06:06 +01:00
Loic Dachary
d3393e9d10 crush: tests for crushtool --build
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 18:06:06 +01:00
Loic Dachary
26f7fa96b0 crush: crushtool copyright notice update
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:33 +01:00
Loic Dachary
b705e52304 crush: crushtool emacs compile helper
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:33 +01:00
Loic Dachary
d3d75a2165 crush: crushtool --build informative messages
* dump the crush tree created by --build at debug level 1.

* display a warning at debug level 1 if there is more than one root. In
  most cases it is not what the user wants and it may be confusing
  because the ruleset will only apply to the first root and have fewer
  devices under it than expected.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:33 +01:00
Loic Dachary
5b28405a85 crush: crushtool --build uses OSDMap helpers for rulesets
Instead of creating a ruleset from scratch, use the
OSDMap::build_simple_crush_rulesets helper. It is more likely to match
the user's expectations.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:33 +01:00
Loic Dachary
1368229e04 crush: print --build debug information when verbose 2
instead of verbose 0

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:32 +01:00
Loic Dachary
5b95d183c8 crush: display args on crushtool failure
When the number of args provided to --build is not a multiple of 3,
display the arguments which do not comply.

For instance the --debug_crush 0 option is not consumed by global_init
in crushtool because, unlike most ceph tools, the arguments are not
passed to global_init. As a result --debug_crush 0 becomes part of the
arguments and triggers the failure.

   crushtool --debug_crush 0 --build --num_osds 320 node straw 4
   remaining args: [--debug_crush,0,node,straw,4]
   layers must be specified with 3-tuples of (name, buckettype, size)

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:32 +01:00
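
The 3-tuple check described in 5b95d183c8 can be illustrated with a short Python sketch. The `parse_layers` function is a hypothetical name and is not part of crushtool; only the two message lines are taken from the example output quoted in the commit message.

```python
from typing import List, Tuple


def parse_layers(args: List[str]) -> List[Tuple[str, str, int]]:
    """Group leftover --build arguments into (name, buckettype, size) tuples."""
    if len(args) % 3 != 0:
        # Show the offending arguments, as in the commit's example output.
        raise ValueError(
            "remaining args: [" + ",".join(args) + "]\n"
            "layers must be specified with 3-tuples of (name, buckettype, size)"
        )
    return [
        (args[i], args[i + 1], int(args[i + 2]))
        for i in range(0, len(args), 3)
    ]


print(parse_layers(["node", "straw", "4"]))
try:
    parse_layers(["--debug_crush", "0", "node", "straw", "4"])
except ValueError as err:
    print(err)
```

With five leftover arguments the count is not a multiple of 3, so the stray `--debug_crush 0` pair is reported along with the layer arguments.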
Loic Dachary
2765f81aff crush: parse CEPH_ARGS in crushtool
The arguments are not given to global_init because the -c option would
conflict. Reading arguments from CEPH_ARGS the way other ceph tools do
is the only way to control verbosity ( via --debug_crush 0 for instance ).

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:32 +01:00
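
Reading extra options from CEPH_ARGS, as 2765f81aff describes, amounts to splitting the environment variable into words and merging them with the command line. The Python sketch below assumes shell-style splitting and assumes the environment words are placed before the command-line arguments; the `args_with_env` helper is hypothetical and does not reproduce the crushtool code.

```python
import os
import shlex
from typing import List, Mapping, Optional


def args_with_env(argv: List[str],
                  environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the words of CEPH_ARGS followed by the command-line arguments."""
    env = os.environ if environ is None else environ
    extra = shlex.split(env.get("CEPH_ARGS", ""))
    return extra + argv


print(args_with_env(["--build", "--num_osds", "320"],
                    {"CEPH_ARGS": "--debug_crush 0"}))
```

Passing the environment as a parameter keeps the helper easy to exercise without modifying the process environment.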
Loic Dachary
cf9a764957 osd: factorize build_simple_crush_map* rulesets creation
Group the rulesets created by build_simple_crush_map* into a helper:
build_simple_crush_rulesets()

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:32 +01:00
Loic Dachary
76765503fa osd: ostream is enough for build_simple*
There is no need to specialize the argument into stringstream. It is
replaced by an ostream, which makes it convenient to display errors directly to
cerr if appropriate.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-01-12 17:48:32 +01:00
Yan, Zheng
dae248f273 Merge pull request #998 from ceph/wip-omapdirfrag2
use OMAP to store dirfrags
2014-01-10 15:48:05 -08:00
Sage Weil
cec8d85853 mds: require CEPH_FEATURE_OSD_TMAP2OMAP
Require that all OSDs support TMAP2OMAP before starting the MDS.  This
avoids doing some work and then crashing with EOPNOTSUPP, and gives us
a more informative message in the logs.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-11 07:41:40 +08:00
Sage Weil
1d8429de57 osd/OSDMap: get_up_osd_features()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-11 07:40:37 +08:00
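
The feature gate in cec8d85853, built on the get_up_osd_features() helper from 1d8429de57, can be pictured as intersecting the feature masks of all up OSDs and testing one bit. The Python sketch below is an illustration under stated assumptions: the bit value, the dictionary layout, the function names, and the choice to return 0 when no OSD is up are all hypothetical and are not taken from the Ceph source.

```python
from functools import reduce
from typing import Dict

# Hypothetical bit value; the real CEPH_FEATURE_OSD_TMAP2OMAP constant
# is not given in this log.
FEATURE_OSD_TMAP2OMAP = 1 << 0


def up_osd_features(features: Dict[int, int], up: Dict[int, bool]) -> int:
    """AND together the feature masks of every OSD that is marked up."""
    masks = [mask for osd, mask in features.items() if up.get(osd, False)]
    if not masks:
        return 0  # assumption: with no up OSDs, no feature is considered supported
    return reduce(lambda a, b: a & b, masks)


def mds_may_start(features: Dict[int, int], up: Dict[int, bool]) -> bool:
    """Allow startup only if all up OSDs advertise the required feature."""
    return bool(up_osd_features(features, up) & FEATURE_OSD_TMAP2OMAP)


features = {0: 0b11, 1: 0b10, 2: 0b01}
print(mds_may_start(features, {0: True, 1: True, 2: False}))
print(mds_may_start(features, {0: True, 1: False, 2: True}))
```

Checking up front lets the daemon refuse to start with a clear log message rather than failing later with EOPNOTSUPP.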