Commit Graph

34999 Commits

Author SHA1 Message Date
Sage Weil
0e07f7f045 osd: fix theoretical use-after-free of OSDMap
In practice, the map will remain pinned for a while, but this
will make coverity happy.

*** CID 1231685:  Use after free  (USE_AFTER_FREE)
/osd/OSD.cc: 6223 in OSD::handle_osd_map(MOSDMap *)()
6217
6218           if (o->test_flag(CEPH_OSDMAP_FULL))
6219            last_marked_full = e;
6220           pinned_maps.push_back(add_map(o));
6221
6222           bufferlist fbl;
>>>     CID 1231685:  Use after free  (USE_AFTER_FREE)
>>>     Calling "encode" dereferences freed pointer "o".
6223           o->encode(fbl);
6224
6225           hobject_t fulloid = get_osdmap_pobject_name(e);
6226           t.write(coll_t::META_COLL, fulloid, 0, fbl.length(), fbl);
6227           pin_map_bl(e, fbl);
6228           continue;

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 14:51:31 -07:00
Sage Weil
44a0e3766a Merge pull request #2259 from ceph/wip-9039
Wip 9039

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-16 13:41:41 -07:00
Sage Weil
34fe7a8214 Merge pull request #2217 from ceph/wip-problem-osds
mon: 'ceph osd blocked-by' for a histogram of the peers OSDs are waiting for

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-16 13:15:10 -07:00
Sage Weil
14614e013f qa/workunits/rest/test.py: fix 'df' test to use total_used_bytes
This changed back in ee2dbdb0f5

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:06:02 -07:00
Sage Weil
ee9e1eadab Merge pull request #2271 from ceph/wip-9053
paxos: fix problem with disjoint quorum members

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-08-16 09:18:19 -07:00
Alfredo Deza
a14a700acc Merge pull request #2270 from ceph/wip-init-ceph
init-ceph: don't use bashism

Reviewed-by: Alfredo Deza <adeza@redhat.com>
2014-08-15 19:42:59 -04:00
Sage Weil
0d6d1aa7e0 init-ceph: don't use bashism
-z STRING
              the length of STRING is zero

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 16:41:43 -07:00
Alfredo Deza
dc611e864b Merge pull request #2247 from ceph/wip-ceph-disk
ceph-disk: fix various dmcrypt bugs

Reviewed-by: Alfredo Deza <adeza@redhat.com>
2014-08-15 19:40:15 -04:00
Loic Dachary
082db05c81 Merge pull request #2269 from ceph/wip-osd-mon-feature
osd: fix mon feature requirement

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-08-16 00:19:59 +02:00
Sage Weil
1d0c66ae3a Merge remote-tracking branch 'gh/next' 2014-08-15 15:01:23 -07:00
Boris Ranto
7df67a544f Fix -Wno-format and -Werror=format-security options clash
This causes a build failure in the latest Fedora builds: ceph_test_librbd_fsx adds the -Wno-format cflag, but the default AM_CFLAGS already contain -Werror=format-security. Previous releases tolerated this, but the latest Fedora rawhide no longer does. ceph_test_librbd_fsx builds fine without -Wno-format on x86_64, so there is likely no need for the flag anymore.

Signed-off-by: Boris Ranto <branto@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-15 15:00:59 -07:00
Sage Weil
ae0b9f1776 osd: fix feature requirement for mons
These features should be set on the client_messenger, not
cluster_messenger.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 14:29:11 -07:00
Sage Weil
d9e96b1708 Merge pull request #2268 from ceph/wip-9119
Wip 9119

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-15 14:11:10 -07:00
Samuel Just
0db3e51165 ReplicatedPG::maybe_handle_cache: do not forward RWORDERED reads
Even with READFORWARD, we can't forward RWORDERED reads.

Fixes: #9119
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-08-15 14:04:20 -07:00
Samuel Just
5040413054 ReplicatedPG::cancel_copy: clear cop->obc
Otherwise, an objecter callback might still be hanging
onto this reference until after the flush.

Fixes: #8894
Introduced: 589b639af7
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-08-15 14:04:20 -07:00
Sage Weil
eb589428dd Merge pull request #2264 from ceph/wip-crush-features
do not require crush features for rules that aren't being used

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-08-15 13:55:36 -07:00
Sage Weil
2f0e2951d7 unittest_osdmap: test EC rule and pool features
TODO: tiering feature bits.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 13:54:11 -07:00
Sage Weil
e4d238bbcf Merge pull request #2266 from kevincox/removewirehsark
Remove Old Wireshark Dissectors

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-15 13:41:15 -07:00
Samuel Just
cab479367a Merge pull request #2070 from somnathr/wip-sd-filestore-optimization
Wip sd filestore optimization

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-15 13:37:54 -07:00
Kevin Cox
0b2761036f Remove Old Wireshark Dissectors
Remove the two old Wireshark plugins.  They do not build and are
superseded by the dissector which is inside Wireshark.

Signed-Off-By: Kevin Cox <kevincox@kevincox.ca>
2014-08-15 15:27:13 -04:00
Sage Weil
16dadb86e0 osd: only require crush features for rules that are actually used
Often there will be a CRUSH rule present for erasure coding that uses the
new CRUSH steps or indep mode.  If these rules are not referenced by any
pool, we do not need clients to support the mapping behavior.  This is true
because the encoding has not changed; only the expected CRUSH output.
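
The rule can be sketched as follows. This assumes per-rule flags that stand in for the is_v2_rule()/is_v3_rule() checks and a map from pool id to rule id. The feature constants, `RuleInfo` and `required_crush_features` are hypothetical names for illustration, not the OSDMap code:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical feature bits; these are NOT the real CEPH_FEATURE_* values.
constexpr std::uint64_t FEATURE_CRUSH_V2 = 1ull << 0;
constexpr std::uint64_t FEATURE_CRUSH_V3 = 1ull << 1;

// Per-rule flags standing in for is_v2_rule(ruleid) / is_v3_rule(ruleid).
struct RuleInfo {
  bool v2 = false;
  bool v3 = false;
};

// Only rules referenced by at least one pool contribute required features;
// an unused rule with new CRUSH steps adds nothing.
std::uint64_t required_crush_features(const std::map<int, RuleInfo>& rules,
                                      const std::map<int, int>& pool_to_rule) {
  std::set<int> used;
  for (const auto& p : pool_to_rule)
    used.insert(p.second);

  std::uint64_t features = 0;
  for (int ruleid : used) {
    auto it = rules.find(ruleid);
    if (it == rules.end())
      continue;
    if (it->second.v2) features |= FEATURE_CRUSH_V2;
    if (it->second.v3) features |= FEATURE_CRUSH_V3;
  }
  return features;
}
```

An erasure-coding rule that uses v2/v3 steps only starts requiring the corresponding client features once some pool references it.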

Fixes: #8963
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 08:55:27 -07:00
Sage Weil
1d95486780 crush: add is_v[23]_rule(ruleid) methods
Add methods to check if a *specific* rule uses v2 or v3 features.  Refactor
the existing checks to use these.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 08:55:27 -07:00
Loic Dachary
cb4c564933 Merge pull request #2213 from dachary/wip-9025-chunk-remapping
erasure-code: chunk remapping

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-15 12:43:03 +02:00
Sage Weil
c54f1e4d66 mon/Paxos: share state and verify contiguity early in collect phase
We verify peons are contiguous and share new paxos states to catch peons
up at the end of the round.  Do this each time we (potentially) get new
states via a collect message.  This will allow peons to be pulled forward
and remain contiguous when they otherwise would not have been able to.
For example, suppose:

  mon.0 (leader)  20..30
  mon.1 (peon)    15..25
  mon.2 (peon)    28..40

If we got mon.1's collect first and mon.2's second, we would store the new
txns and then boot mon.1 out at the end, because 15..25 is not contiguous
with 28..40.  With this change, however, we share 26..30 with mon.1 when we
get its collect, and then 31..40 when we get mon.2's collect, pulling them
both into the final quorum.

It also breaks the 'catch-up' work into smaller pieces, which ought to
smooth out latency a bit.
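
A minimal sketch of the early-sharing idea follows. It models each monitor's state as a [first_committed, last_committed] range and assumes, as in the example, that assimilating a peer that is further ahead replaces the leader's range. The `Range` and `Leader` types are hypothetical and are not the Paxos.cc code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a monitor's committed paxos versions.
struct Range {
  long first;  // first_committed
  long last;   // last_committed
};

// A peon can be caught up by sharing transactions only if its versions
// overlap or are contiguous with ours.
bool contiguous(const Range& peon, const Range& leader) {
  return peon.last + 1 >= leader.first;
}

struct Leader {
  Range self;
  std::map<std::string, Range> peons;  // peons that have answered the collect

  // Handle one collect reply and share states right away.
  void handle_collect(const std::string& name, Range peer) {
    peons[name] = peer;
    // Assumption (per the example): assimilating a peer that is further
    // ahead replaces our range, which can move first_committed forward.
    if (peer.last > self.last)
      self = peer;
    // Catch up every contiguous peon now, before a later reply can move
    // first_committed past it. Setting r.last stands in for sending it
    // versions r.last+1..self.last.
    for (auto& entry : peons) {
      Range& r = entry.second;
      if (r.last < self.last && contiguous(r, self))
        r.last = self.last;
    }
  }

  // End-of-round check: is every peon still contiguous with us?
  bool all_contiguous() const {
    for (const auto& entry : peons)
      if (!contiguous(entry.second, self))
        return false;
    return true;
  }
};
```

Feeding mon.1 (15..25) and then mon.2 (28..40) into a leader at 20..30 leaves the leader at 28..40 with both peons still contiguous at the end of the round.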

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-14 16:56:14 -07:00
Sage Weil
3e5ce5f0dc mon/Paxos: verify all new peons are still contiguous at end of round
During the collect phase we verify that each peon has versions that overlap
or are contiguous with ours (and can therefore be caught up with some
series of transactions).  However, we *also* assimilate any new states we
get from those peers, and that may move our own first_committed forward
in time.  This means that an early responder might have originally been
contiguous, but a later one moved us forward, and when the round finished
they were not contiguous any more.  This leads to a crash on the peon
when they get our first begin message.

For example:

 - we have 10..20
 - first peon has 5..15
   - ok!
 - second peon has 18..30
   - we apply this state
 - we are now 18..30
 - we finish the round
   - send commit to first peon (empty; we aren't contiguous)
   - send no commit to second peon (we match)
 - we send a begin for state 31
   - first peon crashes (its lc is still 15)

Prevent this by checking at the end of the round if we are still
contiguous.  If not, bootstrap.  This is similar to the check we do above,
but reversed, to make sure *we* aren't too far ahead of *them*.
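
The end-of-round check can be sketched as below. This assumes each peon is represented only by its last committed version (lc). `peon_contiguous` and `need_bootstrap` are hypothetical helpers, not the Paxos.cc code:

```cpp
#include <cassert>
#include <vector>

// A peon can accept a begin for our next version only if we can still catch
// it up from the states we hold, i.e. its lc + 1 >= our first_committed.
bool peon_contiguous(long peon_lc, long leader_first_committed) {
  return peon_lc + 1 >= leader_first_committed;
}

// Returns true if the round must bootstrap because some peon fell behind
// after we assimilated newer state from another peer during the collect.
bool need_bootstrap(const std::vector<long>& peon_lcs,
                    long leader_first_committed) {
  for (long lc : peon_lcs)
    if (!peon_contiguous(lc, leader_first_committed))
      return true;
  return false;
}
```

In the example, the first peon (lc 15) was fine against 10..20 but not against 18..30, so the check triggers a bootstrap instead of the crash.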

Fixes: #9053
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-14 16:55:58 -07:00
Loic Dachary
5c2d2320c0 erasure-code: remap chunks if not sequential
If the remap vector is not empty, use it to figure out the sequence of
data chunks.

http://tracker.ceph.com/issues/9025 Fixes: #9025

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-15 01:07:22 +02:00
Loic Dachary
164cfe8591 erasure-code: parse function for the mapping parameter
Each D letter is a data chunk. For instance:

    _DDD_DDD

is going to parse into:

   [ 1, 2, 3, 5, 6, 7 ]

Positions 0 and 4 are not used by data chunks and do not show up in the
mapping. Implement ErasureCode::parse to support a reasonable default
for the mapping parameter.
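
The parsing rule can be sketched as follows. `parse_chunk_mapping` is a hypothetical free function, not the ErasureCode::parse signature:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each 'D' marks a data chunk; every other character marks a position
// that does not hold data and is therefore left out of the mapping.
std::vector<int> parse_chunk_mapping(const std::string& mapping) {
  std::vector<int> data_positions;
  for (std::string::size_type i = 0; i < mapping.size(); ++i) {
    if (mapping[i] == 'D')
      data_positions.push_back(static_cast<int>(i));
  }
  return data_positions;
}
```

With "_DDD_DDD" this yields [1, 2, 3, 5, 6, 7], matching the example above.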

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-15 01:07:22 +02:00
Loic Dachary
298da45c5c erasure-code: ErasureCodeInterface::get_chunk_mapping()
Add support for erasure code plugins that do not sequentially map the
chunks encoded to the corresponding index. This is mostly transparent to
the caller, except when it comes to retrieving the data chunks when
reading. For this purpose there needs to be a remapping function so the
caller has a way to figure out which chunks actually contain the data
and reorder them.
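
A minimal sketch of the reordering the caller has to do when reading follows. It assumes chunks are keyed by their encoded position, that mapping[i] is the position holding the i-th data chunk, and that an empty mapping means data chunks sit at positions 0..k-1. `reassemble_data` is hypothetical, and std::string stands in for bufferlist:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Concatenate the data chunks in logical order, looking each one up at the
// position given by the mapping (or sequentially if there is no mapping).
std::string reassemble_data(const std::map<int, std::string>& chunks,
                            const std::vector<int>& mapping,
                            int data_chunk_count) {
  std::string data;
  for (int i = 0; i < data_chunk_count; ++i) {
    int position = mapping.empty() ? i : mapping[i];
    data += chunks.at(position);  // throws if a needed chunk is missing
  }
  return data;
}
```

With the mapping [1, 2, 3, 5, 6, 7], the chunks at positions 0 and 4 are skipped and the remaining six are read back in order.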

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-15 01:07:21 +02:00
Sage Weil
8fb472995e Merge remote-tracking branch 'gh/next' 2014-08-14 16:02:22 -07:00
Somnath Roy
b24db81ea0 FileStore: Introduced a RLock instead of WLock
While calling index->collection_version, there is no need to
hold WLock at the index level. RLock should be sufficient.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:28:30 -07:00
Somnath Roy
3e7848d52b FileStore: No need to hold Index lock during omap calls
The Index lock is held during all the omap calls, which is
not necessary.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:28:30 -07:00
Somnath Roy
cfff9f6ac3 FileStore: FDCache lookup is rearranged
In lfn_open() there is no point in building the Index if the
cache lookup succeeds and the caller is not asking for the Index.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:28:30 -07:00
Somnath Roy
78d70daff4 FileStore: Index caching is introduced for performance improvement
IndexManager now has an Index cache. An Index is only created if it is not
found in the cache. Previously, each op created an Index object, and other
ops requesting the same index had to wait until the previous op was done.
After the lookup finished, the Index object was destroyed.
Creating and destroying these objects on every op was a major performance
hit, so an Index cache has been implemented to persist them. An RWLock has
been introduced in the CollectionIndex class; it is responsible for
synchronization between lookup and create.
Also, since these Index objects are persistent, there is no need to use
smart pointers, so Index is now a wrapper class around CollectionIndex*.
Users of Index are now responsible for locking it explicitly before use.
The Index object alone is sufficient for locking; there is no need to
hold IndexPath for locking. The function interfaces of lfn_open and
lfn_find are changed accordingly.
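
The caching scheme can be sketched as below. Each index is created once per collection and reused, and callers lock the index explicitly: shared for lookups, exclusive for create. `IndexCacheSketch` and `CollectionIndexSketch` are hypothetical stand-ins (C++17 std::shared_mutex in place of Ceph's RWLock), not the IndexManager code:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>

// Stand-in for CollectionIndex: each cached index carries its own RW lock.
struct CollectionIndexSketch {
  explicit CollectionIndexSketch(std::string c) : coll(std::move(c)) {}
  std::string coll;
  std::shared_mutex access_lock;  // shared: lookup, exclusive: create
};

// Indexes are built on first use and persist, instead of being created and
// destroyed by every op.
class IndexCacheSketch {
 public:
  CollectionIndexSketch* get_index(const std::string& coll) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = cache_.find(coll);
    if (it == cache_.end()) {
      it = cache_.emplace(coll, std::make_unique<CollectionIndexSketch>(coll)).first;
      ++created_;
    }
    return it->second.get();  // valid for the lifetime of the cache
  }

  int created() const {
    std::lock_guard<std::mutex> l(lock_);
    return created_;
  }

 private:
  mutable std::mutex lock_;  // protects the map itself, not the indexes
  std::map<std::string, std::unique_ptr<CollectionIndexSketch>> cache_;
  int created_ = 0;
};
```

Because the cache owns the indexes, callers can hold a plain pointer, which mirrors the move away from smart pointers described above.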

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:28:30 -07:00
Somnath Roy
b04d84db8c shared_cache: pass key (K) by const ref in interface methods
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:28:30 -07:00
Greg Farnum
95ac43f34c FileStore: remove the fdcache_lock
With the changes to the shared_cache, we no longer need the fdcache_lock
to prevent us from inserting a second fd for the same hobject into the cache.

Signed-off-by: Greg Farnum <greg@inktank.com>

Merge conflict fixed.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>

Conflicts:
	src/os/FileStore.cc
2014-08-14 15:28:30 -07:00
Greg Farnum
a9f76d4303 FDCache: implement a basic sharding of the FDCache
This is just a basic sharding. A more sophisticated implementation would
rely on something other than luck for keeping the distribution equitable.
The minimum FDCache shard size is 1.
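
Basic sharding can be sketched as follows. The key's hash picks a shard, each shard has its own lock and an equal slice of the total capacity (never less than 1), and nothing but the hash function keeps the shards balanced. `ShardedFDCacheSketch` is hypothetical; eviction is omitted:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <vector>

class ShardedFDCacheSketch {
 public:
  ShardedFDCacheSketch(std::size_t total_size, std::size_t num_shards)
      : shards_(num_shards) {
    assert(num_shards > 0);
    // Equal slice per shard, but never less than one entry.
    shard_size_ = std::max<std::size_t>(1, total_size / num_shards);
  }

  std::size_t shard_of(const std::string& oid) const {
    return std::hash<std::string>{}(oid) % shards_.size();
  }
  std::size_t shard_size() const { return shard_size_; }

  // Store an fd unless the shard is already full.
  bool add(const std::string& oid, int fd) {
    Shard& s = shards_[shard_of(oid)];
    std::lock_guard<std::mutex> l(s.lock);  // only this shard is locked
    if (s.fds.size() >= shard_size_ && !s.fds.count(oid))
      return false;
    s.fds[oid] = fd;
    return true;
  }

  bool lookup(const std::string& oid, int* fd) {
    Shard& s = shards_[shard_of(oid)];
    std::lock_guard<std::mutex> l(s.lock);
    auto it = s.fds.find(oid);
    if (it == s.fds.end())
      return false;
    *fd = it->second;
    return true;
  }

 private:
  struct Shard {
    std::mutex lock;
    std::map<std::string, int> fds;
  };
  std::vector<Shard> shards_;
  std::size_t shard_size_ = 0;
};
```

Ops on objects that hash to different shards no longer contend on a single cache lock.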

Signed-off-by: Greg Farnum <greg@inktank.com>

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 15:26:37 -07:00
Greg Farnum
4c2828ed14 shared_cache: expose prior existence when inserting an element
The LRU now handles attempts to insert multiple values for the
same key: it reports that you have done so and returns the
existing value before the insert can corrupt existing data.
The 'existed' parameter is optional; its default value is NULL.
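
The insert semantics can be sketched as below. If the key is present, the existing value is returned untouched and *existed is set; otherwise the new value is stored. `SharedCacheSketch` is a hypothetical map-based stand-in that takes std::shared_ptr values and omits LRU eviction; it is not the shared_cache interface:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

template <typename K, typename V>
class SharedCacheSketch {
 public:
  // 'existed' is optional; pass nullptr (the default) to ignore it.
  std::shared_ptr<V> add(const K& key, std::shared_ptr<V> value,
                         bool* existed = nullptr) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = map_.find(key);
    if (it != map_.end()) {
      if (existed) *existed = true;
      return it->second;  // keep the existing entry, drop the new value
    }
    if (existed) *existed = false;
    map_.emplace(key, value);
    return value;
  }

 private:
  std::mutex lock_;
  std::map<K, std::shared_ptr<V>> map_;
};
```

A caller that races another inserter gets back the winner's value and can use it instead of its own, which is what lets FileStore drop the fdcache_lock.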

Signed-off-by: Greg Farnum <greg@inktank.com>

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 14:11:12 -07:00
Sage Weil
435c6d6c4d Merge pull request #2235 from kevincox/wireshark
doc: Add documentation about Wireshark dissector.

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-14 13:50:04 -07:00
Yehuda Sadeh
a1e79dbb80 rgw_admin: add --min-rewrite-stripe-size for object rewrite
A new param to check whether an object requires restriping, by
checking whether a specific object stripe is bigger than the specified
size. By default it is set to 0, in which case the object will always be
restriped. Setting it to 4M + 1 ensures that only objects that weren't
striped before (using default settings) will be restriped.
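
A minimal sketch of the threshold test follows. It assumes "bigger than the specified size" is an at-least comparison over the object's stripe sizes, so that 4M + 1 selects exactly the stripes larger than 4 MiB. `needs_restripe` is a hypothetical helper, not the radosgw-admin code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Restripe if any stripe is at least min_stripe_size bytes; a threshold of
// 0 (the default) always restripes.
bool needs_restripe(const std::vector<std::uint64_t>& stripe_sizes,
                    std::uint64_t min_stripe_size) {
  if (min_stripe_size == 0)
    return true;
  for (std::uint64_t size : stripe_sizes)
    if (size >= min_stripe_size)
      return true;
  return false;
}
```

An object already striped into stripes of at most 4 MiB is skipped at 4M + 1, while an unstriped 10 MiB object is rewritten.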

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-14 13:45:37 -07:00
Kevin Cox
46d8c97e72 doc: Add documentation about Wireshark dissector.
Signed-Off-By: Kevin Cox <kevincox@kevincox.ca>
2014-08-14 16:42:56 -04:00
Yehuda Sadeh
6a555434ee rgw: fix compilation
RGWRadosPutObj couldn't refer to the ceph context.

Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-14 13:35:12 -07:00
Greg Farnum
f6771f2004 shared_cache: use a single lookup for lookup() too
We didn't convert this one to use iterators before.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-08-14 13:28:27 -07:00
Sage Weil
cec40dae17 qa/workunits/cephtool: verify setmaxosd doesn't let you clobber osds
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-14 13:18:07 -07:00
Anand Bhat
a1c3afb60a OSDMonitor: Do not allow OSD removal using setmaxosd
Description: Currently the setmaxosd command allows removal of OSDs by
providing a number less than the current max OSD number. This removes
OSDs abruptly, causing data loss as well as kernel panics when kernel
RBDs are involved. The fix is to refuse to remove OSDs if any OSD in the
range between the new max OSD number and the current max OSD number is
part of the cluster.
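
The check can be sketched as below. Shrinking max_osd from cur_max to new_max would drop OSD ids new_max..cur_max-1, so the command is refused if any of them still exists. `setmaxosd_allowed` and the std::set of existing ids are hypothetical stand-ins for the OSDMonitor logic:

```cpp
#include <cassert>
#include <set>

bool setmaxosd_allowed(const std::set<int>& existing_osds, int cur_max,
                       int new_max) {
  // Growing (new_max >= cur_max) drops nothing, so the loop body never runs.
  for (int id = new_max; id < cur_max; ++id)
    if (existing_osds.count(id))
      return false;  // would clobber an OSD that is still in the cluster
  return true;
}
```

Shrinking past unused ids is still permitted; only ranges that contain a live OSD are rejected.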

Fixes: #8865

Signed-off-by: Anand Bhat <anand.bhat@sandisk.com>
2014-08-14 12:58:50 -07:00
Yehuda Sadeh
16a43609ea rgw: pass set_mtime to copy_obj_data()
Sometimes we need to set the mtime when copying object data (e.g., when
we rewrite the obj).

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-14 11:38:13 -07:00
Yehuda Sadeh
800eff2482 rgw: copy_obj_data() uses atomic processor
Fixes: #9089

copy_obj_data was not using the current object write infrastructure,
which means that the end objects weren't striped.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-14 11:28:38 -07:00
Sage Weil
a8cabfa664 Merge pull request #2257 from ceph/wip-8784
rgw: call throttle_data() even if renew_state() failed

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-14 11:27:13 -07:00
Yehuda Sadeh
5d3a7e595f rgw: copy object data if target bucket is in a different pool
Fixes: #9039
Backport: firefly

The new manifest does not provide a way to put the head and the tail in
separate pools. In any case, if an object is copied between buckets in
different pools, we may really just want the object to be copied, rather
than reference counted.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-14 11:25:32 -07:00
Sage Weil
8393fdea40 Merge pull request #2251 from ceph/wip-9102
ceph-disk: linter cleanup

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-14 08:36:29 -07:00
Sage Weil
321d4defd4 Merge pull request #2255 from ceph/wip-9062
msg/PipeConnection: make methods behave on 'anon' connection

Reviewed-by: John Spray <john.spray@redhat.com>
2014-08-14 06:50:07 -07:00