5520 Commits

Author SHA1 Message Date
Sage Weil
0a0c10f0ef Merge remote-tracking branch 'gh/next'
Conflicts:
	src/os/CollectionIndex.h
2014-08-26 17:33:59 -07:00
Samuel Just
e81723a3d4 Merge pull request #2330 from ceph/wip-9211
osd/OSDMap: encode blacklist in deterministic order

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-26 12:24:05 -07:00
Sage Weil
4672e50922 osd/OSDMap: encode blacklist in deterministic order
When we use an unordered_map the encoding order is non-deterministic,
which is problematic for OSDMap.  Construct an ordered map<> on encode
and use that.  This lets us keep the hash table for lookups in the general
case.

Fixes: #9211
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-26 08:16:29 -07:00
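The idea described in 4672e50922 (keep the hash table for lookups, but copy it into an ordered map at encode time) can be sketched roughly as follows. This is a minimal illustration, not Ceph's code: `encode_sorted`, `check_deterministic`, and the `key=value;` string format are hypothetical stand-ins for the real bufferlist encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical encoder: serializes entries as "key=value;" pairs.
// Stand-in for a real wire encoding; not Ceph's ::encode().
inline std::string encode_sorted(
    const std::unordered_map<std::string, uint64_t>& blacklist) {
  // Copy into an ordered map so iteration order depends only on the keys,
  // not on hash-table layout or insertion history.
  std::map<std::string, uint64_t> ordered(blacklist.begin(), blacklist.end());
  std::string out;
  for (const auto& kv : ordered) {
    out += kv.first + "=" + std::to_string(kv.second) + ";";
  }
  return out;
}

// Usage check: the same entries inserted in a different order encode identically.
inline void check_deterministic() {
  std::unordered_map<std::string, uint64_t> a, b;
  a["x"] = 1; a["y"] = 2;
  b["y"] = 2; b["x"] = 1;
  assert(encode_sorted(a) == encode_sorted(b));
}
```

The copy costs O(n log n) per encode, which is the trade-off the commit message accepts in exchange for keeping hashed lookups in the general case.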
xinxin shu
dbe6c798bb don't update op_rw_rlatency/op_w_rlatency when rlatency is zero
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
2014-08-26 14:19:13 +08:00
xinxin shu
f3bf24687f fix wrong value of op_w_latency perf counter
Fixes: #9217

Signed-off-by: xinxin shu <xinxin.shu@intel.com>
2014-08-26 14:19:03 +08:00
John Spray
a67421a5de osd: update handle_osd_map call
I had changed the implementation in Objecter
to avoid a spurious get/put cycle in "osdc/Objecter: fix resource
management", but this guy was still doing a get() before
calling handle_osd_map.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-08-25 01:45:22 +01:00
Sage Weil
7a2ec05cc8 osd: include ETIMEDOUT in notify reply on timeout
If a notify operation times out (not all watchers ACK in time), include
an ETIMEDOUT in the final error message back to the client, so that they
know about it.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:18 +01:00
Sage Weil
c7b7bdd994 osdc/Objecter: take over ownership of OSDMap
Instead of taking a pointer to an existing OSDMap in our constructor,
allocate our own, so that we completely own it.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:02 +01:00
Sage Weil
af15f9e52d osd/OSDMap: return const string from get_pool_name
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:01 +01:00
Sage Weil
1848e99083 osd/OSDMap: make lookup_pg_pool_name const
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:01 +01:00
Sage Weil
0c7dd662b6 osd: let Objecter dispatch directly
No need for our ObjecterDispatcher wrapper, now!

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:00 +01:00
Sage Weil
09a8543812 osdc/Objecter: make Objecter a Dispatcher
Note that it's not actually doing it yet, though!

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-25 01:34:00 +01:00
Yehuda Sadeh
9b811a33d5 objecter: split objecter initialization
Separate objecter initialization to non cluster related work (e.g.,
internal data structures, other registrations), and to operations that
can initiate cluster interaction. This is so that we don't hit a rare
race where we can get called indirectly from one of the dispatcher callbacks
e.g., into handle_osd_map() when not yet being initialized.
This requires that objecter->init() should be called before
messenger->add_dispatcher_head(), and objecter->start() after it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-08-25 01:34:00 +01:00
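The ordering requirement in 9b811a33d5 (init() before add_dispatcher_head(), start() after it) can be illustrated with a toy two-phase object. `MiniObjecter` and `MiniMessenger` are hypothetical names for this sketch only; they borrow method names from the commit message but do not reproduce Ceph's Objecter or Messenger interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the init()/start() split.
class MiniObjecter {
 public:
  // Phase 1: set up local state only; no cluster interaction.
  void init() { initialized_ = true; }

  // Phase 2: may begin cluster interaction; requires init() to have run.
  void start() {
    assert(initialized_);
    started_ = true;
  }

  // Dispatcher callback. It is safe to call once init() has run,
  // even if start() has not been called yet.
  bool handle_osd_map(int epoch) {
    if (!initialized_) return false;  // the situation the split avoids
    last_epoch_ = epoch;
    return true;
  }

  bool started() const { return started_; }
  int last_epoch() const { return last_epoch_; }

 private:
  bool initialized_ = false;
  bool started_ = false;
  int last_epoch_ = 0;
};

// Hypothetical messenger that forwards map messages to registered dispatchers.
class MiniMessenger {
 public:
  void add_dispatcher_head(MiniObjecter* o) { dispatchers_.push_back(o); }

  // Simulates an incoming map message; returns true if any dispatcher took it.
  bool deliver_osd_map(int epoch) {
    bool handled = false;
    for (MiniObjecter* d : dispatchers_) {
      handled = d->handle_osd_map(epoch) || handled;
    }
    return handled;
  }

 private:
  std::vector<MiniObjecter*> dispatchers_;
};
```

With this shape, a message that arrives between registration and start() finds fully constructed internal state instead of a half-initialized object.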
Yehuda Sadeh
09af405da2 osd: adapt to new Objecter interface
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-08-25 01:33:55 +01:00
Sage Weil
6cf583c4b7 common/shared_cache: take a cct
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-22 09:04:37 -07:00
Adam Crume
07ab36f9e7 lttng: Remove tracing-specific local variables when lttng disabled
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:35 -07:00
Adam Crume
63273a282e lttng: Replace Boost dependencies with custom string code
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:32 -07:00
Adam Crume
2a11a5cc92 lttng: Disable LTTng by default, add --with-lttng configure option
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:31 -07:00
Adam Crume
772148e25d lttng: Remove 'ver' from trace in code for CEPH_OSD_OP_NOTIFY
'ver' is obsolete and variable exists only for proper deserialization

Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:28 -07:00
Adam Crume
e1e157fba2 lttng: Split up libtracepoints
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:28 -07:00
Adam Crume
e312be618f lttng: Trace ReplicatedPG::do_osd_ops
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:28 -07:00
Adam Crume
1802bc2535 lttng: Add rmw_flags to tracepoint in PG::queue_op
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:27 -07:00
Adam Crume
ae5994644c lttng: Trace OpRequest
Signed-off-by: Adam Crume <adamcrume@gmail.com>
2014-08-21 10:57:27 -07:00
Noah Watkins
33b87f9227 tracing: automake-ify tracepoint generation
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2014-08-21 10:57:27 -07:00
Noah Watkins
3ac99e3f72 lttng: add pg and osd tracepoints
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2014-08-21 10:57:27 -07:00
Pavan Rallabhandi
e45f5c2c33 TrackedOp:_dump_op_descriptor is renamed to _dump_op_descriptor_unlocked
Callers don't need to hold the lock before calling _dump_op_descriptor(), so
it is renamed to _dump_op_descriptor_unlocked() to reflect this.

Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-20 11:41:43 -07:00
Guang Yang
ad6a2be402 Implement the collection hint transaction, add a new transaction type as expected number of objects.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:10:47 +00:00
Guang Yang
7d266d1304 Add a new transaction OP (collection hint) to ObjectStore.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:08:51 +00:00
Guang Yang
da37273de7 Add a new field 'expected_num_objects' to pg_pool_t which denotes the expected number of objects on this pool.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:08:51 +00:00
Sage Weil
313e60b360 Merge pull request #2010 from ceph/wip-misplaced
osd: track misplaced objects separately from degraded objects

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-17 20:49:05 -07:00
Sage Weil
0e07f7f045 osd: fix theoretical use-after-free of OSDMap
In practice, the map will remain pinned for a while, but this
will make coverity happy.

*** CID 1231685:  Use after free  (USE_AFTER_FREE)
/osd/OSD.cc: 6223 in OSD::handle_osd_map(MOSDMap *)()
6217
6218           if (o->test_flag(CEPH_OSDMAP_FULL))
6219            last_marked_full = e;
6220           pinned_maps.push_back(add_map(o));
6221
6222           bufferlist fbl;
>>>     CID 1231685:  Use after free  (USE_AFTER_FREE)
>>>     Calling "encode" dereferences freed pointer "o".
6223           o->encode(fbl);
6224
6225           hobject_t fulloid = get_osdmap_pobject_name(e);
6226           t.write(coll_t::META_COLL, fulloid, 0, fbl.length(), fbl);
6227           pin_map_bl(e, fbl);
6228           continue;

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 14:51:31 -07:00
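The Coverity report quoted in 0e07f7f045 concerns a pointer being dereferenced after a call that may release it. One common way to address that class of report is to finish using the object before handing it over; whether the commit does exactly this is not stated in the message. The sketch below assumes hypothetical `Map`, `MapCache`, and `store_full_map` types that are not Ceph's.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical map object with a trivial encoder.
struct Map {
  int epoch;
  std::string encode() const { return "map@" + std::to_string(epoch); }
};

// Hypothetical cache that takes ownership of the maps it is given.
class MapCache {
 public:
  void add_map(std::unique_ptr<Map> m) { maps_.push_back(std::move(m)); }
  std::size_t size() const { return maps_.size(); }

 private:
  std::vector<std::unique_ptr<Map>> maps_;
};

// Encode first, then transfer ownership, so nothing touches the map
// after the cache is free to move or drop it.
inline std::string store_full_map(MapCache& cache, std::unique_ptr<Map> o) {
  assert(o);
  std::string bytes = o->encode();
  cache.add_map(std::move(o));
  return bytes;
}
```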
Sage Weil
5168907fe2 osd: track last_fullsized in pg_stat_t
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
dbc3f65046 osd: track last_undegraded pg stat
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
1907066fee osd/osd_types: add last_undegraded, last_undersized to pg_stat_t
Keep track of the last time the PG was known to not be degraded or
undersized.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:45 -07:00
Sage Weil
6d6767d34c osd/PG: track PG_STATE_UNDERSIZED separately from DEGRADED
DEGRADED means there are objects without complete redundancy; also check
for needs_recovery().

UNDERSIZED means acting set is too small.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
Sage Weil
b037e47a36 osd: add PG_STATE_UNDERSIZED
This is a distinct concept from degraded.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
Sage Weil
6c0a213436 osd/PG: account for misplaced separately from degraded
A degraded object does not have enough replicas or shards, while a
misplaced object is not stored in the correct place.  Account for them
separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
Sage Weil
a3149994e8 osd: num_objects_misplaced
Signed-off-by: Sage Weil <sage@inktank.com>
2014-08-16 13:18:53 -07:00
Sage Weil
34fe7a8214 Merge pull request #2217 from ceph/wip-problem-osds
mon: 'ceph osd blocked-by' for histogram of peers OSDs are waiting for

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-16 13:15:10 -07:00
Loic Dachary
082db05c81 Merge pull request #2269 from ceph/wip-osd-mon-feature
osd: fix mon feature requirement

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-08-16 00:19:59 +02:00
Sage Weil
1d0c66ae3a Merge remote-tracking branch 'gh/next'
2014-08-15 15:01:23 -07:00
Sage Weil
ae0b9f1776 osd: fix feature requirement for mons
These features should be set on the client_messenger, not
cluster_messenger.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 14:29:11 -07:00
Sage Weil
d9e96b1708 Merge pull request #2268 from ceph/wip-9119
Wip 9119

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-15 14:11:10 -07:00
Samuel Just
0db3e51165 ReplicatedPG::maybe_handle_cache: do not forward RWORDERED reads
Even with READFORWARD, we can't forward RWORDERED reads.

Fixes: #9119
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-08-15 14:04:20 -07:00
Samuel Just
5040413054 ReplicatedPG::cancel_copy: clear cop->obc
Otherwise, an objecter callback might still be hanging
onto this reference until after the flush.

Fixes: #8894
Introduced: 589b639af7
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-08-15 14:04:20 -07:00
Samuel Just
cab479367a Merge pull request #2070 from somnathr/wip-sd-filestore-optimization
Wip sd filestore optimization

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-15 13:37:54 -07:00
Sage Weil
16dadb86e0 osd: only require crush features for rules that are actually used
Often there will be a CRUSH rule present for erasure coding that uses the
new CRUSH steps or indep mode.  If these rules are not referenced by any
pool, we do not need clients to support the mapping behavior.  This is true
because the encoding has not changed; only the expected CRUSH output.

Fixes: #8963
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-15 08:55:27 -07:00
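The rule in 16dadb86e0 (only rules referenced by some pool contribute to the features clients must support) can be sketched as a small filter. The `Rule` and `Pool` structs, the feature bit values, and `features_for_used_rules` are hypothetical and exist only for this illustration; they are not Ceph's CRUSH or OSDMap types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical types: a rule carries the feature bits a client needs to map it,
// and a pool references exactly one rule by id.
struct Rule { uint64_t required_features; };
struct Pool { int rule_id; };

// OR together the feature bits of rules that at least one pool references.
// Rules that exist but are unused contribute nothing.
inline uint64_t features_for_used_rules(const std::map<int, Rule>& rules,
                                        const std::vector<Pool>& pools) {
  std::set<int> used;
  for (const Pool& p : pools) used.insert(p.rule_id);

  uint64_t features = 0;
  for (int id : used) {
    auto it = rules.find(id);
    if (it != rules.end()) features |= it->second.required_features;
  }
  return features;
}

// Usage check: an unused rule with a newer feature bit is ignored.
inline void check_unused_rule_ignored() {
  std::map<int, Rule> rules{{0, Rule{0x1}}, {1, Rule{0x4}}};
  std::vector<Pool> pools{Pool{0}};
  assert(features_for_used_rules(rules, pools) == 0x1);
}
```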
Loic Dachary
5c2d2320c0 erasure-code: remap chunks if not sequential
If the remap vector is not empty, use it to figure out the sequence of
data chunks.

http://tracker.ceph.com/issues/9025 Fixes: #9025

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-15 01:07:22 +02:00
Greg Farnum
4c2828ed14 shared_cache: expose prior existence when inserting an element
The LRU now handles you attempting to insert multiple values for the
same key, by telling you that you've done so and returning the
existing value before it manages to muck up existing data.
The param 'existed' is not mandatory, default value is NULL.

Signed-off-by: Greg Farnum <greg@inktank.com>

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-14 14:11:12 -07:00
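The insert behavior described in 4c2828ed14 (report a prior entry through an optional `existed` parameter and return the existing value) might look roughly like this. `MiniCache` is a hypothetical simplification with no LRU eviction or locking; it is not Ceph's shared_cache implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified cache keyed by K, holding shared pointers to V.
template <typename K, typename V>
class MiniCache {
 public:
  // Stores `value` under `key` unless the key is already present.
  // Returns whatever is stored for `key` after the call. If `existed` is
  // non-null, it is set to true when a prior entry was found and kept.
  std::shared_ptr<V> add(const K& key, std::shared_ptr<V> value,
                         bool* existed = nullptr) {
    auto result = contents_.emplace(key, std::move(value));
    if (existed) *existed = !result.second;
    return result.first->second;
  }

 private:
  std::map<K, std::shared_ptr<V>> contents_;
};

// Usage check: the second insert for the same key keeps the first value.
inline void shared_cache_example() {
  MiniCache<std::string, int> cache;
  bool existed = true;
  auto first = cache.add("a", std::make_shared<int>(1), &existed);
  assert(!existed && *first == 1);
  auto second = cache.add("a", std::make_shared<int>(2), &existed);
  assert(existed && *second == 1);
}
```

Defaulting `existed` to a null pointer keeps existing call sites unchanged, which matches the commit's note that the parameter is not mandatory.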
Sage Weil
473f4bd395 Merge remote-tracking branch 'gh/next'
2014-08-13 14:10:31 -07:00