Commit Graph

28442 Commits

Author SHA1 Message Date
Sage Weil
f008ac427c arch: add cpu probing
For now, just a check to see if we have SSE4.2.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:14:59 -07:00
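A check like the one this commit describes can be sketched as follows. This is only an illustration, not the code the commit added: the names probe_sse42() and have_sse42() are hypothetical, and the sketch assumes a GCC or Clang compiler on x86, where the __builtin_cpu_supports builtin is available. On other targets it simply reports false.

```cpp
#include <cassert>

// Hypothetical helper (not from the commit): ask the CPU whether SSE4.2
// is available. Assumes GCC/Clang on x86; reports false everywhere else.
bool probe_sse42() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();  // initialise the compiler's CPU feature data
  return __builtin_cpu_supports("sse4.2") != 0;
#else
  return false;
#endif
}

// Hypothetical wrapper: probe once, then reuse the cached answer.
bool have_sse42() {
  static const bool cached = probe_sse42();
  return cached;
}
```

Callers would test have_sse42() before selecting code paths that depend on SSE4.2 instructions.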
Sage Weil
841a695527 yasm-wrapper: hide libtool insanity from yasm
libtool passes all kinds of crap to yasm that yasm does not understand.
Hide it with this ugly wrapper.  Sigh.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 09:14:58 -07:00
Sage Weil
6f833fe747 Merge pull request #529 from dachary/master
doc: fix erasure code formatting warnings and errors
2013-08-22 09:01:20 -07:00
Joao Eduardo Luis
55fa2e862e mon: Monitor: remove lingering debug message from f087d84b
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-22 16:46:50 +01:00
Loic Dachary
157f2227f4 doc: fix erasure code formatting warnings and errors
http://tracker.ceph.com/issues/4929 refs #4929

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 17:45:39 +02:00
Sage Weil
5a5a576e86 Merge pull request #525 from ksperis/rbdmap.init-fix
init-rbdmap: fix error on stop rbdmap

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-22 08:34:03 -07:00
Sage Weil
d70fd35595 mon/Paxos: ignore do_refresh() return value
Makes coverity happy.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-22 08:17:56 -07:00
Alexandre Oliva
617dc36d47 enable mds rejoin with active inodes' old parent xattrs
When the parent xattrs of active inodes that the mds attempts to open
during rejoin lack pool info (struct_v < 5), this field will be filled
in with -1, causing the mds to retry fetching a backtrace with a pool
number that matches the expected value, which fails and causes the
err==-ENOENT branch to be taken and retry pool 1, which succeeds, but
with pool -1, and so keeps on bouncing between the two retry cases
forever.

This patch arranges for the mds to go along with pool -1 instead of
insisting that it be refetched, enabling it to complete recovery
instead of eating cpu, network bandwidth and metadata osd's resources
like there's no tomorrow, in what AFAICT is an infinite and very busy
loop.

This is not a new problem: I've had it even before upgrading from
Cuttlefish to Dumpling, I'd just never managed to track it down, and
force-unmounting the filesystem and then restarting the mds was an
easier (if inconvenient) work-around, particularly because it always
hit when the filesystem was under active, heavy-ish use (or there
wouldn't be much reason for caps recovery ;-)

There are two issues not addressed in this patch, however.  One is
that nothing seems to proactively update the parent xattr when it is
found to be outdated, so it remains out of date forever.  Not even
renaming top-level directories causes the xattrs to be recursively
rewritten.  AFAICT that's a bug.

The other is that inodes that don't have a parent xattr (created by
even older versions of ceph) are reported as non-existing in the mds
rejoin message, because the absence of the parent xattr is signaled as
a missing inode ("failed to reconnect caps for missing inodes").  I
suppose this may cause more serious recovery problems.

I suppose a global pass over the filesystem tree updating parent
xattrs that are out-of-date would be desirable, if we find any parent
xattrs still lacking current information; it might make sense to
activate it as a background thread from the backtrace decoding
function, when it finds a parent xattr that's too out-of-date, or as a
separate client (ceph-fsck?).

Backport: dumpling, cuttlefish
Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Reviewed-by: Zheng, Yan <zheng.z.yan@intel.com>
2013-08-22 08:13:29 -07:00
Laurent Barbe
b419924b18 init-rbdmap: fix error on stop rbdmap
Avoid an error on stop service if many /dev/rbd* exist.

Signed-off-by: Laurent Barbe <laurent@ksperis.com>
2013-08-22 12:12:49 +02:00
Sage Weil
9242d01cc0 ceph-monstore-tool: shut up coverity
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-21 21:55:10 -07:00
Yan, Zheng
123f79bea8 store: fix issues reported by coverity
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-21 21:55:10 -07:00
Loic Dachary
d980f581e3 ReplicatedPG: create ObjectContext with SharedPtrRegistry
All new ObjectContext are replaced with calls to
SharedPtrRegistry::lookup_or_create to ensure that they are all
registered. Because the constructor is invoked with no argument, care
is taken to always initialize the destructor_callback data member
immediately afterwards.

ReplicatedPG::get_object_context contains a redundant call to
get_snapset_context that is removed.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
bd9f73d8bc ReplicatedPG: replace object_contexts.find with object_contexts.lookup
The std::map equivalent of find is SharedPtrRegistry::lookup

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
95349c028e ReplicatedPG: add Context to cleanup the PG after an ObjectContext deletion
ReplicatedPG::C_PG_ObjectContext is added to encapsulate a
call to ReplicatedPG::object_context_destructor_callback method
which is responsible for

  * manually de-allocating the SnapSetContext of the ObjectContext if
    any. It will eventually be managed by a SharedPtrRegistry.

ReplicatedPG::C_PG_ObjectContext must be added to the destructor_callback
member of ObjectContext immediately after it is created.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
833a225008 ReplicatedPG: replace map iterators with SharedPtrRegistry::get_next
SharedPtrRegistry does not provide an iterator equivalent to

    map<hobject_t, ObjectContext*>::iterator i

It is replaced with a thread safe get_next method roughly used
as follows:

    pair<hobject_t, ObjectContextRef> i;
    while (object_contexts.get_next(i.first, &i))

All occurrences of the iterator are replaced with get_next style
traversal.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
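The get_next style of traversal described in this commit can be sketched with a much simplified registry. TinyRegistry and visit_all are hypothetical names for this illustration and are not Ceph's SharedPtrRegistry; the sketch assumes get_next returns the first entry whose key is strictly greater than the key passed in, and it uses std::shared_ptr and std::mutex in place of the types Ceph used at the time.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for a registry of shared pointers.
template <typename K, typename V>
class TinyRegistry {
 public:
  using VPtr = std::shared_ptr<V>;

  void insert(const K& key, VPtr value) {
    std::lock_guard<std::mutex> guard(lock_);
    contents_[key] = std::move(value);
  }

  // Copies the first entry whose key is strictly greater than `key`
  // into *next. Returns false when no such entry exists.
  bool get_next(const K& key, std::pair<K, VPtr>* next) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = contents_.upper_bound(key);
    if (it == contents_.end()) return false;
    *next = *it;
    return true;
  }

 private:
  std::mutex lock_;
  std::map<K, VPtr> contents_;
};

// Walks the registry without holding an iterator across calls. Each step
// takes the lock only for the duration of one get_next call.
// Note: an entry stored under the default key "" would be skipped here.
std::vector<std::string> visit_all(TinyRegistry<std::string, int>& registry) {
  std::vector<std::string> seen;
  std::pair<std::string, std::shared_ptr<int>> i;
  while (registry.get_next(i.first, &i))
    seen.push_back(i.first);
  return seen;
}
```

Because the loop only remembers the last key it saw, entries may be added or removed between steps without invalidating anything the caller holds.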
Loic Dachary
8c745944c9 ReplicatedPG: remove lookup_object_context method
Both ReplicatedPG::lookup_object_context and
ReplicatedPG::_lookup_object_context methods are provided by
SharedPtrRegistry.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
13f6807e9a ReplicatedPG: remove reference counting logic
ObjectContext manual reference counting and managing the
object_contexts object involves calls to

* obc->ref++ and obc->get()
* put_object_context and put_object_contexts
* register_object_context
* assertions on obc->registered

They are all removed because SharedPtrRegistry provides the
same service.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:59 +02:00
Loic Dachary
ff70e764dc ReplicatedPG: ObjectContext * becomes ObjectContextRef
The map of hobject_t to ObjectContext is made a
SharedPtrRegistry owned by ReplicatedPG

    -  map<hobject_t, ObjectContext*> object_contexts;
    +  SharedPtrRegistry<hobject_t, ObjectContext> object_contexts;

All ObjectContext pointers are changed into ObjectContextRef, i.e.
shared_ptr.

In Watch.h std::tr1::shared_ptr<ObjectContext> is used instead
of ObjectContextRef because Watch.h is included before it is
defined.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Loic Dachary
7e85c6320c ReplicatedPG: ObjectContext is made compatible with SharedPtrRegistry
When creating a new object SharedPtrRegistry::lookup_or_create uses
the default ObjectContext constructor with no argument. The existing
ObjectContext constructor is modified to have no argument and the
initialization that was previously done within the constructor is done
by the caller (that only happens three times).

The ObjectContext::get method is removed: its only purpose is to
increment the ref.

The ObjectContext::registered data member is removed as well as all
the associated assert()

The ObjectContext::destructor_callback data member Context is added
and called by the destructor. It will allow the caller to perform
additional cleanup, if necessary.

All ObjectContext * data members are replaced with shared_ptr.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
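The pattern described in this commit (a default-constructed object handed out by lookup_or_create, initialised by the caller, with a callback run from the destructor) can be sketched as follows. Ctx and WeakRegistry are hypothetical stand-ins, not Ceph's ObjectContext or SharedPtrRegistry, and std::function is used here where the commit describes a Context data member.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for ObjectContext: default-constructible, with a
// callback that the destructor invokes if one was set.
struct Ctx {
  int value = 0;
  std::function<void()> destructor_callback;
  ~Ctx() {
    if (destructor_callback) destructor_callback();
  }
};

// Hypothetical registry that tracks objects without owning them.
template <typename K, typename V>
class WeakRegistry {
 public:
  using VPtr = std::shared_ptr<V>;

  // Returns the live object for `key`, or default-constructs a new one.
  // The caller is expected to finish initialising a fresh object.
  VPtr lookup_or_create(const K& key) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = contents_.find(key);
    if (it != contents_.end()) {
      if (VPtr live = it->second.lock()) return live;
    }
    VPtr fresh = std::make_shared<V>();
    contents_[key] = fresh;
    return fresh;
  }

  // Returns the live object for `key`, or a null pointer.
  VPtr lookup(const K& key) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = contents_.find(key);
    return it == contents_.end() ? VPtr() : it->second.lock();
  }

 private:
  std::mutex lock_;
  std::map<K, std::weak_ptr<V>> contents_;  // expired entries are not pruned
};
```

Holding only weak pointers means the registry never keeps an object alive on its own. The last shared_ptr holder triggers the destructor, and with it the callback, which is why no manual reference count is needed in this sketch.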
Loic Dachary
1688fb4842 ReplicatedPG: add Mutex to protect snapset_contexts
snapset_contexts_locks is added and locked in each function where
snapset_contexts or the SnapSetContext::ref data member needs to be
accessed or modified.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Loic Dachary
e1be37a375 PG: remove unused PG::_cond
http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Loic Dachary
be04918d44 sharedptr_registry: add a variant of get_next() and the empty() method
The SharedPtrRegistry::get_next() method with a value of type VPtr
instead of V is added because it is sometimes more convenient to not
copy the value when walking the registry. The
SharedPtrRegistry::empty() predicate method is added.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-22 02:10:58 +02:00
Josh Durgin
17859e1477 Merge branch 'next'
2013-08-21 16:29:29 -07:00
Josh Durgin
8784564669 objecter: fix keys of dump_linger_ops
The registering flag no longer exists, and registered was using the
wrong property due to a copy-paste error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-08-21 16:26:09 -07:00
Josh Durgin
38a0ca66a7 objecter: resend unfinished lingers when osdmap is no longer paused
Plain Ops that haven't finished yet need to be resent if the osdmap
transitions from full or paused to unpaused.  If these Ops are
triggered by LingerOps, they will be cancelled instead (since
should_resend = false), but the LingerOps that triggered them will not
be resent.

Fix this by checking the registered flag for all linger ops, and
resending any of them that aren't paused anymore.

Fixes: #6070
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-08-21 16:01:04 -07:00
Yehuda Sadeh
d26ba3ab03 rgw: change cache / watch-notify init sequence
Fixes: #6046
We were initializing the watch-notify (through the cache
init) before reading the zone info which was much too
early, as we didn't have the control pool name yet. Now
simplifying init/cleanup a bit, cache doesn't call watch/notify
init and cleanup directly, but rather states its need
through a virtual callback.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-21 11:13:09 -07:00
John Wilkins
d5a877867a Merge branch 'master' of https://github.com/ceph/ceph
2013-08-21 11:02:26 -07:00
John Wilkins
576dce03f0 doc: Clarified quorum requirements.
fixes: #5412

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-08-21 11:01:48 -07:00
Sage Weil
b0f4be99ea Merge pull request #524 from ceph/wip-mon-delta
mon: add 'pg dump delta' to get just the rate info

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-21 11:00:45 -07:00
John Wilkins
deb43d9463 doc: Fixed typo.
fixes: #5968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-08-21 10:56:23 -07:00
Sage Weil
f28b01d2af Merge pull request #523 from dachary/master
doc: fix erasure code formatting warnings and errors
2013-08-21 10:36:54 -07:00
Loic Dachary
bebba3c858 doc: fix erasure code formatting warnings and errors
http://tracker.ceph.com/issues/4929 refs #4929

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-08-21 18:09:03 +02:00
Sage Weil
8437304c93 build-depend on yasm
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-21 08:31:12 -07:00
Sage Weil
33783e5f4b crc32c: note intel crc code copyrights
It's a BSD 3-clause.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-21 08:31:12 -07:00
Sage Weil
6ee1591d19 crc32c: add intel baseline algorithm
This is than the sctp code but probably slower.  We'll add it anywhere
anyway just as a reference and to have a baseline for comparing performance.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-21 08:31:12 -07:00
Christophe Courtaut
552bfe5be2 vstart.sh: Adds more ENV variables to configure dev cluster
This patch adds a few ENV variables, so you can use vstart.sh
multiple times to launch multiple clusters.

CEPH_DIR -> The working directory of the cluster
CEPH_DEV_DIR -> the dev directory of the cluster
CEPH_OUT_DIR -> the output directory of the cluster
CEPH_RGW_PORT -> the default radosgw port to start with

All these new variables are set to default values if not specified,
which does not change the previous behaviour of vstart.sh.

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
2013-08-21 13:45:05 +02:00
Sage Weil
a35ab949fd Merge remote-tracking branch 'gh/next'
2013-08-20 22:40:13 -07:00
Sage Weil
2af59d5e81 ceph-disk: partprobe after creating journal partition
At least one user reports that a partprobe is needed after creating the
journal partition.  It is not clear why sgdisk is not doing it, but this
fixes ceph-disk for them, and should be harmless for other users.

Fixes: #5599
Tested-by: lurbs in #ceph
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 22:39:09 -07:00
Sage Weil
cf8dbd248b Merge remote-tracking branch 'gh/wip-6004' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-08-20 16:57:46 -07:00
Sage Weil
edf2c3449e .gitignore: ignore test-driver
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:54:20 -07:00
Sage Weil
9833e9dabe fuse: fix warning when compiled against old fuse versions
client/fuse_ll.cc: In function 'void invalidate_cb(void*, vinodeno_t, int64_t, int64_t)':
warning: client/fuse_ll.cc:540: unused variable 'fino'

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:54:10 -07:00
Sage Weil
6abae35a39 json_spirit: remove unused typedef
In file included from json_spirit/json_spirit_writer.cpp:7:0:
json_spirit/json_spirit_writer_template.h: In function 'String_type json_spirit::non_printable_to_string(unsigned int)':
json_spirit/json_spirit_writer_template.h:37:50: warning: typedef 'Char_type' locally defined but not used [-Wunused-local-typedefs]
         typedef typename String_type::value_type Char_type;

(Also, ha ha, this file uses \r\n.)

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:54:05 -07:00
Sage Weil
c9cdd19d1c gtest: add build-aux/test-driver to .gitignore
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:54:02 -07:00
Sage Weil
e8e50f60bd crc32c: remove old intel implementation
The license is not LGPL compatible.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:42:53 -07:00
Sage Weil
a286090602 common/crc32c: refactor a bit
- the generic function without the _le suffix (useless)
- use a static global so that detection only happens once
- make the structure a bit cleaner to plug in new implementations

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 16:42:53 -07:00
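The structure described in this commit (one generic entry point, an implementation chosen once, room to plug in others) might look roughly like the sketch below. All names are hypothetical and this is not Ceph's code: the only implementation provided is a portable bitwise CRC-32C, and choose_crc32c() marks the place where a CPU-specific variant would be selected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signature shared by every implementation (hypothetical).
using crc32c_fn = std::uint32_t (*)(std::uint32_t crc,
                                    const unsigned char* data,
                                    std::size_t len);

// Portable bitwise CRC-32C update (Castagnoli polynomial, reflected form
// 0x82F63B78). It applies no initial or final inversion itself.
std::uint32_t crc32c_bitwise(std::uint32_t crc, const unsigned char* data,
                             std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// Selection hook: a faster variant would be returned from here when the
// CPU supports it. This sketch always falls back to the bitwise version.
crc32c_fn choose_crc32c() {
  return crc32c_bitwise;
}

// Static global, so the selection above runs only once.
static const crc32c_fn crc32c_impl = choose_crc32c();

// Generic entry point used by callers.
std::uint32_t crc32c(std::uint32_t crc, const unsigned char* data,
                     std::size_t len) {
  return crc32c_impl(crc, data, len);
}
```

With the conventional initial value 0xFFFFFFFF and a final inversion, this update function yields the standard CRC-32C check value 0xE3069283 for the ASCII string "123456789".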
Dan Mick
0ccb9be3b6 Merge pull request #517 from dmick/wip-6049
mon/PGMap: OSD byte counts 4x too large (conversion to bytes overzealous)

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-20 12:18:43 -07:00
Sage Weil
981eda9f77 mon/Paxos: always refresh after any store_state
If we store any new state, we need to refresh the services, even if we
are still in the midst of Paxos recovery.  This is because the
subscription path will share any committed state even when paxos is
still recovering.  This prevents a race like:

 - we have maps 10..20
 - we drop out of quorum
 - we are elected leader, paxos recovery starts
 - we get one LAST with committed states that trim maps 10..15
 - we get a subscribe for map 10..20
   - we crash because 10 is no longer on disk because the PaxosService
     is out of sync with the on-disk state.

Fixes: #6045
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-20 11:27:23 -07:00
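The race in this commit message can be illustrated with a toy model. ToyPaxos and its members are hypothetical and reduce the monitor to two version ranges: what is on disk, and what the service believes is on disk. The point of the sketch is only that a refresh follows every store that wrote something, whether or not recovery has finished.

```cpp
#include <cassert>
#include <cstdint>

using version_t = std::uint64_t;

// Toy model (hypothetical): the range of map versions on disk and the
// range the service layer believes is available.
struct ToyPaxos {
  version_t disk_first = 0, disk_last = 0;
  version_t service_first = 0, service_last = 0;
  bool recovering = false;

  // Applies a committed range and reports whether anything was written.
  bool store_state(version_t new_first, version_t new_last) {
    if (new_first == disk_first && new_last == disk_last) return false;
    disk_first = new_first;
    disk_last = new_last;
    return true;
  }

  // Brings the service view back in line with the on-disk state.
  void do_refresh() {
    service_first = disk_first;
    service_last = disk_last;
  }

  // Handles a LAST message carrying committed state. The refresh is not
  // conditional on `recovering`: it runs whenever something was stored.
  void handle_last(version_t new_first, version_t new_last) {
    if (store_state(new_first, new_last)) do_refresh();
  }

  // A subscription may only be served from versions the service sees.
  bool can_serve(version_t v) const {
    return v >= service_first && v <= service_last;
  }
};
```

In the scenario from the message, maps 10..20 are on disk, a LAST trims 10..15 during recovery, and a subscribe for map 10 arrives. With the refresh in place, can_serve(10) is false and the request is not answered from a version that no longer exists on disk.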
Sage Weil
7e0848d8f8 mon/Paxos: return whether store_state stored anything
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-20 11:27:09 -07:00
Sage Weil
b9dee2285d mon/Paxos: cleanup: use do_refresh from handle_commit
This avoids duplicated code by using the helper created exactly for this
purpose.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-20 11:26:57 -07:00
Sage Weil
6ef1970340 pybind: fix Rados.conf_parse_env test
This happens after we connect, which means we get ENOSYS always.
Instead, parse_env inside the normal setup method, which has the added
benefit of being able to debug these tests.

Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-20 11:23:46 -07:00