Commit Graph

28573 Commits

Author SHA1 Message Date
Yehuda Sadeh
d8cfe80c0b Merge pull request #495 from kri5/wip-5820
rgw: rgw-admin throws an error when an invalid flag is passed

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-23 13:16:16 -07:00
Sage Weil
7372b6a7c8 Merge pull request #533 from ceph/wip-osd-healthy-tuanble
osd: add 'osd heartbeat min healthy ratio' tunable

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-23 12:45:06 -07:00
Yehuda Sadeh
057588f41a Merge pull request #535 from ceph/wip-readdir-r-sucks
Fix readdir_r invocation

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-23 12:00:30 -07:00
Sage Weil
99a2ff7da9 os: make readdir_r buffers larger
PATH_MAX isn't quite big enough.

Backport: dumpling, cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-23 11:45:35 -07:00
Sage Weil
2df66d9fa2 os: fix readdir_r buffer size
The buffer needs to be big enough, or else we'll walk all over the stack.

Backport: dumpling, cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-23 11:45:08 -07:00
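Both readdir_r commits above come down to sizing the buffer handed to readdir_r(3). The sketch below shows the common POSIX idiom. The buffer must hold the fixed part of struct dirent, plus the longest possible name (NAME_MAX), plus its terminating NUL. This is an illustration under that assumption, not the Ceph patch. `dirent_buf_size` and `list_dir` are hypothetical helpers, and newer glibc releases deprecate readdir_r in favor of readdir.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <dirent.h>
#include <memory>
#include <string>
#include <vector>

#ifndef NAME_MAX
#define NAME_MAX 255  // assumption: fallback for platforms that omit it
#endif

// Bytes needed for a struct dirent whose d_name can hold any file name:
// the fixed header, NAME_MAX characters, and the terminating NUL.
inline std::size_t dirent_buf_size() {
  std::size_t need = offsetof(struct dirent, d_name) + NAME_MAX + 1;
  return std::max(need, sizeof(struct dirent));
}

// Hypothetical helper: list the entry names in `path` with readdir_r,
// using a heap buffer of dirent_buf_size() bytes instead of a stack array.
// Returns an empty vector if the directory cannot be opened.
inline std::vector<std::string> list_dir(const char* path) {
  std::vector<std::string> names;
  DIR* dir = opendir(path);
  if (dir == nullptr) return names;

  std::unique_ptr<struct dirent, decltype(&std::free)> entry(
      static_cast<struct dirent*>(std::malloc(dirent_buf_size())), &std::free);
  struct dirent* result = nullptr;
  // readdir_r returns 0 on success; result == nullptr marks end of stream.
  while (entry && readdir_r(dir, entry.get(), &result) == 0 && result != nullptr) {
    names.emplace_back(result->d_name);
  }
  closedir(dir);
  return names;
}
```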
Joao Eduardo Luis
7a091d3161 os: KeyValueDB: expose interface to obtain estimated store size
On LevelDBStore, instead of using leveldb's GetApproximateSizes() function,
we compute the store's raw size from the contents of the store dir (i.e.,
the .sst files, .log files, etc.).  The reason behind this approach is
that GetApproximateSizes() expects us to provide a range of keys for
which to obtain an approximate size, whereas what we really want is the
size of the store, not the size of the data (besides, given the
compaction issues we've been seeing, we wonder how reliable such an
approximation would be).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-23 19:05:25 +01:00
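The approach described above estimates a store's size from the raw files in its directory rather than from key ranges. It can be sketched with C++17 std::filesystem. `estimate_store_size` is a hypothetical name. The sketch sums the sizes of all regular files directly inside the directory (.sst, .log, and anything else) and is not the actual LevelDBStore code.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>  // only needed to create sample files when trying this out
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: approximate the on-disk size of a LevelDB-style store
// by summing the sizes of the regular files in its directory (.sst, .log,
// MANIFEST, ...). Entries that error out are skipped; an unreadable or
// missing directory yields 0. Subdirectories are not descended into.
inline std::uint64_t estimate_store_size(const fs::path& dir) {
  std::uint64_t total = 0;
  std::error_code ec;
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
    std::error_code fec;
    if (!it->is_regular_file(fec) || fec) continue;
    std::uintmax_t sz = it->file_size(fec);
    if (!fec) total += sz;
  }
  return total;
}
```

For example, a directory holding a 100-byte .sst file and a 50-byte .log file yields 150.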
Sage Weil
fe5010380a mon/Paxos: fix another uncommitted value corner case
It is possible that we begin the paxos recovery with an uncommitted
value for, say, commit 100.  During last/collect we discover 100 has been
committed already.  But also, another node provides an uncommitted value
for 101 with the same pn.  Currently, we refuse to learn it, because the
pn is not strictly greater than our current uncommitted pn... even though
it is the next (last_committed+1) value that we need.

There are two possible fixes here:

 - make this a >= comparison, since we can accept newer values from the same pn.
 - discard our uncommitted value metadata when we commit the value.

Let's do both!

Fixes: #6090
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-23 10:38:53 -07:00
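Below is a toy model of the two fixes. It assumes a single uncommitted slot and is a hypothetical simplification, not Ceph's Paxos class. The field names (`last_committed`, `uncommitted_v`, `uncommitted_pn`) echo the commit message. The rules are reduced to two. First, accept an uncommitted value for last_committed+1 whose pn is >= ours. Second, clear the uncommitted metadata once that version commits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using version_t = std::uint64_t;

// Hypothetical, heavily simplified recovery state of one monitor.
struct PaxosRecoveryState {
  version_t last_committed = 0;
  version_t uncommitted_v = 0;   // version of the uncommitted value we hold
  version_t uncommitted_pn = 0;  // proposal number it was proposed under
  std::string uncommitted_value;

  // Fix 2: once version v is committed, drop uncommitted metadata for it.
  void commit(version_t v) {
    last_committed = v;
    if (uncommitted_v <= v) {
      uncommitted_v = 0;
      uncommitted_pn = 0;
      uncommitted_value.clear();
    }
  }

  // Fix 1: learn a peer's uncommitted value for last_committed+1 if its pn
  // is >= ours (the buggy check required strictly greater).
  bool learn_uncommitted(version_t v, version_t pn, const std::string& value) {
    if (v != last_committed + 1) return false;
    if (pn < uncommitted_pn) return false;
    uncommitted_v = v;
    uncommitted_pn = pn;
    uncommitted_value = value;
    return true;
  }
};
```

Either fix alone lets the scenario from the message (100 already committed, 101 offered under the same pn) make progress. Doing both also keeps stale metadata from lingering.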
Yehuda Sadeh
0373d749ce rgw: bucket meta remove: don't overwrite entry point first
Fixes: #6056

When removing a bucket metadata entry, we first unlink the bucket
and then remove the bucket entrypoint object. Originally,
when unlinking the bucket we first overwrote the bucket entrypoint
entry, marking it as 'unlinked'. However, this is not really needed,
as we're just about to remove it. The original version triggered
a bug, as we needed to propagate the new header version first (which
we didn't do, so the subsequent bucket removal failed).

Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-23 10:10:57 -07:00
Alfredo Deza
f040020fb2 ceph-disk: specify the filetype when mounting
Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-23 08:15:06 -07:00
Sage Weil
f4040238c4 doc/release-notes: v0.67.2
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-23 08:12:46 -07:00
Yehuda Sadeh
12ca952569 Merge pull request #528 from kri5/wip-radosgw-admin-help
rgw: Adds --system option help to radosgw-admin

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-23 07:17:39 -07:00
Christophe Courtaut
3a4f1ceda5 rgw: Adds --system option help to radosgw-admin
Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
2013-08-23 10:22:14 +02:00
Sage Weil
5637516a30 osd: add 'osd heartbeat min healthy ratio' tunable
This was hard-coded to 1/3; make it tunable.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 21:44:31 -07:00
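Making a hard-coded fraction tunable usually means comparing against a configured ratio instead of a constant. The sketch below is a hypothetical health check, not the OSD code. It treats the daemon as unhealthy when fewer than `min_healthy_ratio` of its heartbeat peers are up. The default of 1/3 matches the previously hard-coded value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical default matching the previously hard-coded value of 1/3.
constexpr double kDefaultMinHealthyRatio = 1.0 / 3.0;

// Returns false when fewer than min_healthy_ratio of `total` heartbeat
// peers are up. With no peers there is nothing to judge, so report healthy.
inline bool heartbeat_peers_healthy(std::size_t up, std::size_t total,
                                    double min_healthy_ratio = kDefaultMinHealthyRatio) {
  if (total == 0) return true;
  return static_cast<double>(up) >= static_cast<double>(total) * min_healthy_ratio;
}
```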
Sage Weil
b003e5fddc Merge pull request #532 from dmick/next
PGMonitor: pg dump_stuck should respect --format (plain works fine)

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-22 21:34:57 -07:00
Sandon Van Ness
40f43a028e QA: Compile fsstress if missing on machine.
Some distros lack ltp-kernel packages, and all we need is
fsstress. This modifies the shell script to download/compile
fsstress from source and copy it to the right location if it doesn't
currently exist where it is expected. It is a very small/quick
compile, and currently only SLES and Debian do not have it already.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-08-22 19:52:16 -07:00
Sandon Van Ness
4b97fcb5c1 QA: Compile fsstress if missing on machine.
Some distros lack ltp-kernel packages, and all we need is
fsstress. This modifies the shell script to download/compile
fsstress from source and copy it to the right location if it doesn't
currently exist where it is expected. It is a very small/quick
compile, and currently only SLES and Debian do not have it already.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-08-22 19:44:40 -07:00
Dan Mick
ab4e85da6a PGMonitor: pg dump_stuck should respect --format (plain works fine)
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-08-22 18:53:34 -07:00
Sage Weil
a0f3c643b6 init-ceph: behave if incompletely installed
e.g., Debian 'removed, config remains' state

Fixes: #5695
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 17:59:46 -07:00
Sage Weil
309569a6d0 mon/MonClient: release pending outgoing messages on shutdown
This fixes a small memory leak when we have messages queued for the mon
when we shut down.  It is harmless except for the valgrind leak check
noise that obscures real leaks.

Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 17:46:45 -07:00
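The leak described above is the classic case of a queue that owns messages and is dropped without being drained. The minimal sketch below assumes hypothetical `Message` and `OutgoingQueue` types, not MonClient's. Here `shutdown()` releases every queued message, and an instance counter makes the leak observable. In Ceph, messages are reference-counted, so a real fix drops references rather than calling delete. This sketch uses plain ownership for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical message type that counts live instances, standing in for
// the allocations a leak checker such as valgrind would report.
struct Message {
  static inline int live = 0;
  Message() { ++live; }
  ~Message() { --live; }
};

// Hypothetical queue of messages waiting to be sent to a monitor.
// It owns the raw pointers it holds until they are sent or released.
class OutgoingQueue {
 public:
  OutgoingQueue() = default;
  OutgoingQueue(const OutgoingQueue&) = delete;
  OutgoingQueue& operator=(const OutgoingQueue&) = delete;
  ~OutgoingQueue() { shutdown(); }

  void push(Message* m) { pending_.push_back(m); }
  std::size_t size() const { return pending_.size(); }

  // Release everything still queued; without this the messages leak.
  void shutdown() {
    while (!pending_.empty()) {
      delete pending_.front();
      pending_.pop_front();
    }
  }

 private:
  std::deque<Message*> pending_;
};
```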
Sage Weil
27b0411908 Merge remote-tracking branch 'gh/next'
2013-08-22 17:23:09 -07:00
Greg Farnum
226059e020 MOSDOpReply: set reassert_version for very old clients
Without this, I think every sufficiently old client must fail on
replay, which is very bad!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-22 15:32:48 -07:00
Sage Weil
98583b59aa yasm-wrapper: more futzing to behave on fedora 19
Handle some new arguments, and behave (return success) when the touch
target isn't specified.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 14:56:21 -07:00
Yehuda Sadeh
3d55534268 rgw: fix crash when creating new zone on init
Move the watch/notify init before the zone init,
as we might need to send a notification.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-22 14:31:06 -07:00
Gary Lowell
5c5980bc84 ceph.spec.in: remove trailing paren in previous commit
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-08-22 13:30:20 -07:00
Gary Lowell
9b667cef82 ceph.spec.in: Don't invoke debug_package macro on centos.
If the redhat-rpm-config package is installed, the debuginfo RPMs will
be built by default. The build will fail when the package is installed
and the specfile also invokes the macro.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-08-22 11:07:16 -07:00
athanatos
67f160eb5a Merge pull request #414 from dachary/wip-5510
replace ObjectContext pointers with shared_ptr

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-22 10:24:52 -07:00
Sage Weil
a74247722a Merge pull request #527 from ceph/wip-mon-fix-verbose-output
mon: remove lingering debug output

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-22 09:17:16 -07:00
Sage Weil
b8a34d6eac Merge pull request #520 from ceph/wip-crc
This is better, faster Intel-optimized code.

Reviewed-by: Yehuda Sadeh <yehuda.sadeh@inktank.com>
2013-08-22 09:16:19 -07:00
Sage Weil
02e14c7390 Makefile: move all crc code into libcrc.la
This is simpler.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:15:05 -07:00
Sage Weil
e55809acd2 crc32c: add intel optimized crc32c implementation
This is from Intel's ISA-L library and licensed under BSD 3-clause.

It needs to build with yasm, which means we go through all sorts of pain
to make this work with libtool:

 - strip out args it doesn't understand with yasm-wrapper
 - detect whether it is recent enough during configure

The code is conditional on:

 - build-time support (yasm)
 - run-time support (sse4.2)

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:15:05 -07:00
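The two conditions above (build-time and run-time support) are commonly handled with a function pointer chosen once at startup. The sketch below shows that dispatch pattern. The fallback is a bitwise software CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). The fast path is the SSE4.2 `crc32` instruction, reached through the `_mm_crc32_u8` intrinsic. The hardware path assumes GCC or Clang on x86 (`__attribute__((target))`, `__builtin_cpu_supports`). It is a byte-at-a-time illustration, not the ISA-L assembly this commit imports, and the function names are hypothetical. Both paths use the conventional initial value and final XOR of 0xFFFFFFFF, so "123456789" gives the standard check value 0xE3069283.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>  // _mm_crc32_u8 (SSE4.2)
#define HAVE_SSE42_PATH 1
#endif

// Portable bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
inline std::uint32_t crc32c_sw(const unsigned char* p, std::size_t n) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= p[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

#ifdef HAVE_SSE42_PATH
// Same checksum via the SSE4.2 crc32 instruction, one byte at a time.
// The target attribute lets this compile without -msse4.2 globally.
__attribute__((target("sse4.2")))
inline std::uint32_t crc32c_sse42(const unsigned char* p, std::size_t n) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < n; ++i) crc = _mm_crc32_u8(crc, p[i]);
  return ~crc;
}
#endif

using crc32c_fn = std::uint32_t (*)(const unsigned char*, std::size_t);

// Use hardware only if it was compiled in *and* the CPU reports SSE4.2
// at run time; otherwise fall back to the software version.
inline crc32c_fn choose_crc32c() {
#ifdef HAVE_SSE42_PATH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2")) return crc32c_sse42;
#endif
  return crc32c_sw;
}
```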
Sage Weil
f008ac427c arch: add cpu probing
For now, just a check to see if we have SSE4.2.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:14:59 -07:00
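A run-time SSE4.2 check can also be written directly against CPUID, where leaf 1 reports SSE4.2 in bit 20 of ECX. The sketch uses `__get_cpuid` from GCC/Clang's <cpuid.h> and returns false on non-x86 targets. `cpu_has_sse42` is a hypothetical name, and the probing code added by this commit may be structured differently.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical probe: true if the CPU advertises SSE4.2 (CPUID leaf 1,
// ECX bit 20). Always false on non-x86 builds.
inline bool cpu_has_sse42() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;  // leaf unsupported
  return (ecx & (1u << 20)) != 0;  // ECX bit 20 = SSE4.2
#else
  return false;
#endif
}
```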
Sage Weil
841a695527 yasm-wrapper: hide libtool insanity from yasm
libtool passes all kinds of crap to yasm that yasm does not understand.
Hide it with this ugly wrapper.  Sigh.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:14:58 -07:00
Sage Weil
6f833fe747 Merge pull request #529 from dachary/master
doc: fix erasure code formatting warnings and errors
2013-08-22 09:01:20 -07:00
Joao Eduardo Luis
55fa2e862e mon: Monitor: remove lingering debug message from f087d84b
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-22 16:46:50 +01:00
Loic Dachary
157f2227f4 doc: fix erasure code formatting warnings and errors
http://tracker.ceph.com/issues/4929 refs #4929

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 17:45:39 +02:00
Sage Weil
5a5a576e86 Merge pull request #525 from ksperis/rbdmap.init-fix
init-rbdmap: fix error on stop rbdmap

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-22 08:34:03 -07:00
Sage Weil
d70fd35595 mon/Paxos: ignore do_refresh() return value
Makes coverity happy.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 08:17:56 -07:00
Alexandre Oliva
617dc36d47 enable mds rejoin with active inodes' old parent xattrs
When the parent xattrs of active inodes that the mds attempts to open
during rejoin lack pool info (struct_v < 5), that field is filled in
with -1.  The mds then retries fetching the backtrace with a pool number
that matches the expected value, which fails; this takes the
err==-ENOENT branch, which retries pool 1.  That fetch succeeds, but
again reports pool -1, so the mds keeps bouncing between the two retry
cases forever.

This patch arranges for the mds to go along with pool -1 instead of
insisting that it be refetched, enabling it to complete recovery
instead of eating CPU, network bandwidth and the metadata OSDs'
resources like there's no tomorrow, in what AFAICT is an infinite and
very busy loop.

This is not a new problem: I've had it even before upgrading from
Cuttlefish to Dumpling, I'd just never managed to track it down, and
force-unmounting the filesystem and then restarting the mds was an
easier (if inconvenient) work-around, particularly because it always
hit when the filesystem was under active, heavy-ish use (or there
wouldn't be much reason for caps recovery ;-)

There are two issues not addressed in this patch, however.  One is
that nothing seems to proactively update the parent xattr when it is
found to be outdated, so it remains out of date forever.  Not even
renaming top-level directories causes the xattrs to be recursively
rewritten.  AFAICT that's a bug.

The other is that inodes that don't have a parent xattr (created by
even older versions of ceph) are reported as non-existing in the mds
rejoin message, because the absence of the parent xattr is signaled as
a missing inode ("failed to reconnect caps for missing inodes").  I
suppose this may cause more serious recovery problems.

I suppose a global pass over the filesystem tree updating parent
xattrs that are out-of-date would be desirable, if we find any parent
xattrs still lacking current information; it might make sense to
activate it as a background thread from the backtrace decoding
function, when it finds a parent xattr that's too out-of-date, or as a
separate client (ceph-fsck?).

Backport: dumpling, cuttlefish
Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Reviewed-by: Zheng, Yan <zheng.z.yan@intel.com>
2013-08-22 08:13:29 -07:00
Laurent Barbe
b419924b18 init-rbdmap: fix error on stop rbdmap
Avoid an error when stopping the service if many /dev/rbd* devices exist.

Signed-off-by: Laurent Barbe <laurent@ksperis.com>
2013-08-22 12:12:49 +02:00
Sage Weil
9242d01cc0 ceph-monstore-tool: shut up coverity
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-21 21:55:10 -07:00
Yan, Zheng
123f79bea8 store: fix issues reported by coverity
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-21 21:55:10 -07:00
Loic Dachary
d980f581e3 ReplicatedPG: create ObjectContext with SharedPtrRegistry
All new ObjectContext allocations are replaced with calls to
SharedPtrRegistry::lookup_or_create to ensure that they are all
registered. Because the constructor is invoked with no argument, care
is taken to always initialize the destructor_callback data member
immediately afterwards.

ReplicatedPG::get_object_context contains a redundant call to
get_snapset_context that is removed.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
bd9f73d8bc ReplicatedPG: replace object_contexts.find with object_contexts.lookup
The std::map equivalent of find is SharedPtrRegistry::lookup

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
95349c028e ReplicatedPG: add Context to cleanup the PG after an ObjectContext deletion
ReplicatedPG::C_PG_ObjectContext is added to encapsulate a
call to ReplicatedPG::object_context_destructor_callback method
which is responsible for

  * manually de-allocating the SnapSetContext of the ObjectContext if
    any. It will eventually be managed by a SharedPtrRegistry.

ReplicatedPG::C_PG_ObjectContext must be added to the destructor_callback
member of ObjectContext immediately after it is created.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
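The destructor-callback pattern described in this commit and in 7e85c6320c can be sketched as follows. `Context` here is a minimal stand-in that runs once and then frees itself. `ObjectContext`, `PG`, and `C_PG_ObjectContext` are simplified hypothetical versions, and Ceph's real classes carry much more state. The object invokes whatever callback its creator installed, once the last shared_ptr reference drops.

```cpp
#include <cassert>
#include <memory>

// Minimal stand-in for a one-shot completion callback.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  // Run the callback once, then free it.
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Simplified object context: runs destructor_callback, if set, when destroyed.
struct ObjectContext {
  Context* destructor_callback = nullptr;
  ObjectContext() = default;
  ObjectContext(const ObjectContext&) = delete;
  ObjectContext& operator=(const ObjectContext&) = delete;
  ~ObjectContext() {
    if (destructor_callback) destructor_callback->complete(0);
  }
};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

// Hypothetical PG that wants to clean up after each ObjectContext is freed.
struct PG {
  int cleanups = 0;
  // obc is mid-destruction here: treat it only as an identifier.
  void object_context_destructor_callback(ObjectContext*) { ++cleanups; }
};

// Callback that forwards to the PG, in the spirit of C_PG_ObjectContext.
struct C_PG_ObjectContext : Context {
  PG* pg;
  ObjectContext* obc;
  C_PG_ObjectContext(PG* p, ObjectContext* o) : pg(p), obc(o) {}
  void finish(int) override { pg->object_context_destructor_callback(obc); }
};
```

Usage mirrors the rule above: install the callback right after creating the object, for example `obc->destructor_callback = new C_PG_ObjectContext(&pg, obc.get());`.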
Loic Dachary
833a225008 ReplicatedPG: replace map iterators with SharedPtrRegistry::get_next
SharedPtrRegistry does not provide an iterator equivalent to

    map<hobject_t, ObjectContext*>::iterator i

It is replaced with a thread-safe get_next method, roughly used
as follows:

    pair<hobject_t, ObjectContextRef> i;
    while (object_contexts.get_next(i.first, &i))

All occurrences of the iterator are replaced with get_next style
traversal.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
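To make the get_next traversal concrete, here is a minimal registry in the same spirit. It is a hypothetical sketch, not Ceph's SharedPtrRegistry, and it uses std::shared_ptr rather than std::tr1. It keeps weak_ptrs in a std::map guarded by a std::mutex. It hands out shared_ptrs whose deleter erases the map entry. It implements `get_next(key, &next)` as "the first live entry whose key is strictly greater than `key`", so a loop like `while (get_next(i.first, &i))` walks the map without holding an iterator across calls. With std::string keys, a default-constructed key starts the walk (an entry stored under the empty key itself would be skipped). The sketch also assumes the registry outlives every pointer it hands out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

template <typename K, typename V>
class SharedPtrRegistry {
 public:
  using VPtr = std::shared_ptr<V>;

  // Return the live value for key, or an empty pointer.
  VPtr lookup(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = contents_.find(key);
    return it == contents_.end() ? VPtr() : it->second.lock();
  }

  // Return the live value for key, default-constructing one if needed.
  VPtr lookup_or_create(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = contents_.find(key);
    if (it != contents_.end()) {
      if (VPtr v = it->second.lock()) return v;
    }
    VPtr v(new V(), Remover{this, key});
    contents_[key] = v;
    return v;
  }

  // Store the first live entry with a key strictly greater than `key` in
  // *next; return false when there is none. No iterator survives the call.
  bool get_next(const K& key, std::pair<K, VPtr>* next) {
    std::pair<K, VPtr> found;
    bool ok = false;
    {
      std::lock_guard<std::mutex> l(lock_);
      for (auto it = contents_.upper_bound(key); it != contents_.end(); ++it) {
        found.second = it->second.lock();
        if (found.second) {
          found.first = it->first;
          ok = true;
          break;
        }
      }
    }
    // Overwriting *next may drop the last reference to the previous value,
    // which re-enters the registry through Remover: do it unlocked.
    if (ok) *next = std::move(found);
    return ok;
  }

 private:
  // Deleter: erase the map entry (unless it was replaced) and free the value.
  struct Remover {
    SharedPtrRegistry* reg;
    K key;
    void operator()(V* p) {
      {
        std::lock_guard<std::mutex> l(reg->lock_);
        auto it = reg->contents_.find(key);
        if (it != reg->contents_.end() && it->second.expired())
          reg->contents_.erase(it);
      }
      delete p;  // run V's destructor outside the registry lock
    }
  };

  std::mutex lock_;
  std::map<K, std::weak_ptr<V>> contents_;
};
```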
Loic Dachary
8c745944c9 ReplicatedPG: remove lookup_object_context method
Both ReplicatedPG::lookup_object_context and
ReplicatedPG::_lookup_object_context methods are provided by
SharedPtrRegistry.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
13f6807e9a ReplicatedPG: remove reference counting logic
ObjectContext manual reference counting and managing the
object_contexts object involves calls to

* obc->ref++ and obc->get()
* put_object_context and put_object_contexts
* register_object_context
* assertions on obc->registered

They are all removed because SharedPtrRegistry provides the
same service.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
ff70e764dc ReplicatedPG: ObjectContext * becomes ObjectContextRef
The map of hobject_t to ObjectContext is made a
SharedPtrRegistry owned by ReplicatedPG

    -  map<hobject_t, ObjectContext*> object_contexts;
    +  SharedPtrRegistry<hobject_t, ObjectContext> object_contexts;

All ObjectContext pointers are changed into ObjectContextRef, i.e.
shared_ptr.

In Watch.h std::tr1::shared_ptr<ObjectContext> is used instead
of ObjectContextRef because Watch.h is included before it is
defined.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Loic Dachary
7e85c6320c ReplicatedPG: ObjectContext is made compatible with SharedPtrRegistry
When creating a new object SharedPtrRegistry::lookup_or_create uses
the default ObjectContext constructor with no argument. The existing
ObjectContext constructor is modified to have no argument and the
initialization that was previously done within the constructor is done
by the caller (that only happens three times).

The ObjectContext::get method is removed: its only purpose is to
increment the ref.

The ObjectContext::registered data member is removed, along with all
the associated assert() calls.

The ObjectContext::destructor_callback data member Context is added
and called by the destructor. It will allow the caller to perform
additional cleanup, if necessary.

All ObjectContext * data members are replaced with shared_ptr.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Loic Dachary
1688fb4842 ReplicatedPG: add Mutex to protect snapset_contexts
snapset_contexts_locks is added and locked in each function where
snapset_contexts or the SnapSetContext::ref data member needs to be
accessed or modified.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
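The locking rule in this commit can be sketched as below: one mutex guards every access to the snapset_contexts map and to each SnapSetContext's ref count. The names loosely mirror the commit message. The class, the std::string key, the `size()` helper, and the use of std::mutex instead of Ceph's Mutex are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Simplified snapset context with a manual reference count.
struct SnapSetContext {
  std::string oid;
  int ref = 0;
  explicit SnapSetContext(std::string o) : oid(std::move(o)) {}
};

class SnapSetCache {
 public:
  SnapSetCache() = default;
  SnapSetCache(const SnapSetCache&) = delete;
  SnapSetCache& operator=(const SnapSetCache&) = delete;
  ~SnapSetCache() {
    for (auto& kv : snapset_contexts) delete kv.second;
  }

  // Find or create the context for oid and take a reference, under the lock.
  SnapSetContext* get_snapset_context(const std::string& oid) {
    std::lock_guard<std::mutex> l(snapset_contexts_lock);
    SnapSetContext*& ssc = snapset_contexts[oid];
    if (!ssc) ssc = new SnapSetContext(oid);
    ++ssc->ref;
    return ssc;
  }

  // Drop a reference; erase and free the context when the last one goes.
  void put_snapset_context(SnapSetContext* ssc) {
    std::lock_guard<std::mutex> l(snapset_contexts_lock);
    if (--ssc->ref == 0) {
      snapset_contexts.erase(ssc->oid);
      delete ssc;
    }
  }

  std::size_t size() {
    std::lock_guard<std::mutex> l(snapset_contexts_lock);
    return snapset_contexts.size();
  }

 private:
  std::mutex snapset_contexts_lock;
  std::map<std::string, SnapSetContext*> snapset_contexts;
};
```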