32027 Commits

Author SHA1 Message Date
Samuel Just
a576eb3204 PG: do not serve requests until replicas have activated
There are two problems:
1) We choose the min last_update among peers with the max local-les
value as an upper bound on requests which could have been reported to
the client as committed.  We then, for ec pools, roll back to that point
to ensure that we don't inadvertently commit to an update which fewer
than K replicas actually saw.  If the primary sets local-les, accepts an
update from a client, and there is a new interval before any of the
replicas have been activated, we will end up being forced to use that
update which no other replica has seen as the new last_update.  This
will cause the object to become unfound.  We don't have this problem as
long as all active replicas agree on last_update before we accept IO.

2) Even for replicated pools, we would then immediately respond to the
request which created the primary-only update with a commit since it is
in the log and we have no outstanding repops.  If we then lose that
primary before any of the replicas in the new interval record the new
log, we will not only lose the object, but also the log entry recording
it, which will result in a lost write.

For these reasons, it seems like we need to wait for the replicas to
activate before we can process new requests, because whatever update we
select as last_update is essentially regarded as committed as soon as we
accept IO.

Fixes: #7649
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-12 10:38:17 -07:00
Samuel Just
83731a75d7 ReplicatedPG::finish_ctx: clear object_info if !obs.exists
Otherwise, we see a different object_info_t depending on whether the
transaction deleting the object clears before another op recreating it appears.
In particular, we use oi.version to set the prior_version on the log entries in
finish_ctx.  If the oi is allowed to stick around, the recreation log event will
have a prior version of the deletion event when it should have a prior version
of eversion_t().

Fixes: #7655
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-09 12:22:12 -07:00
Sage Weil
40dc3f8b2c Merge pull request #1405 from ceph/wip-7575
osd: Add hit_set_flushing to track current flushes and prevent races

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-09 12:21:35 -07:00
Danny Al-Gaaf
a7afa1453b config.cc: add debug_ prefix to subsys logging levels
Add debug_ prefix also for 'ceph --admin-daemon *.asok config show'
as already done e.g. by 'ceph-osd --show-config'.

Fixes: #7602

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-09 10:32:39 -07:00
Sage Weil
2474e5322d Merge pull request #1408 from ceph/wip-da-fix-doc
Fixes and updates for doc

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-09 09:56:18 -07:00
Danny Al-Gaaf
54ffdcc45d get-involved.rst: update information
Added #ceph-devel IRC channel, more mailing lists, wiki and planet.ceph.com.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 02:18:28 +01:00
Danny Al-Gaaf
d1a888e0f2 swift/containerops.rst: fix some typos
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 01:02:43 +01:00
Danny Al-Gaaf
93b95a2874 radosgw/troubleshooting.rst: s/ceph-osd/OSD/
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 00:58:57 +01:00
Danny Al-Gaaf
2223a372d6 radosgw/config-ref.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 00:30:49 +01:00
Danny Al-Gaaf
87618d4508 session_authentication.rst: fix some typos
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 00:19:08 +01:00
Danny Al-Gaaf
682c695898 release-process.rst: fix some typos
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 00:07:39 +01:00
Danny Al-Gaaf
72ee3389af doc: s/osd/OSD/ if not part of a command
First attempt to unify usage of OSD over rst files.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-09 00:01:40 +01:00
Danny Al-Gaaf
e666019434 doc/dev/logs.rst: fix some typos
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 23:31:11 +01:00
Danny Al-Gaaf
bbd1c4bab5 filestore-filesystem-compat.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 23:25:53 +01:00
Danny Al-Gaaf
ae123a6dd5 corpus.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 23:22:38 +01:00
Danny Al-Gaaf
cf9f017d4e config.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 23:16:24 +01:00
Danny Al-Gaaf
5aaecc7210 cephx_protocol.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 23:11:25 +01:00
Danny Al-Gaaf
2cbb0a402b architecture.rst: fix typos
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 11:27:15 +01:00
Danny Al-Gaaf
a4cbb192ab rados/operations/control.rst: fix typo
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-03-08 11:13:52 +01:00
Sage Weil
db0c37829c Merge remote-tracking branch 'gh/wip-7210' into firefly
Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-07 15:23:31 -08:00
Sage Weil
1c8c61897d qa/workunits/cephtool/test.sh: fix 'osd thrash' test
- fix the wait check for osds to come back up
- make sure they get marked back in, too

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2014-03-07 15:21:38 -08:00
Sage Weil
20754779ab Merge pull request #1403 from ceph/wip-7642
mon: fix check for primary-affinity feature bit, and fix a race in similar checks

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-03-07 15:05:30 -08:00
Sage Weil
b62f9f076a mon/OSDMonitor: fix feature check bit arithmetic
Make sure all features are present (instead of just any of them).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-07 14:44:42 -08:00
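
The difference between "any feature present" and "all features present" described in commit b62f9f076a above can be sketched as follows. This is a minimal illustration, not the code from the commit: the feature bit values and the helper names `has_any` and `has_all` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, for illustration only (not Ceph's real values).
constexpr uint64_t FEATURE_A = 1ull << 0;
constexpr uint64_t FEATURE_B = 1ull << 1;

// True if at least one bit of `required` is set in `have`.
constexpr bool has_any(uint64_t have, uint64_t required) {
  return (have & required) != 0;
}

// True only if every bit of `required` is set in `have`.
constexpr bool has_all(uint64_t have, uint64_t required) {
  return (have & required) == required;
}

// Usage: a peer with only FEATURE_A passes the "any" test but not the "all" test.
inline void feature_check_demo() {
  const uint64_t required = FEATURE_A | FEATURE_B;
  assert(has_any(FEATURE_A, required));
  assert(!has_all(FEATURE_A, required));
  assert(has_all(FEATURE_A | FEATURE_B, required));
}
```

Comparing the masked value against the full mask, rather than against zero, is what turns an "any" check into an "all" check.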
Sage Weil
21c225959d Merge pull request #1404 from ceph/wip-7652
mon: fix infinite pg create msgs for erasure pools

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-07 14:19:58 -08:00
Sage Weil
8d52fb70e1 mon/PGMap: send pg create messages to primary, not acting[0]
For erasure pools, these may not match.

In the case of #7652, this caused pg_create messages to be sent
indefinitely.  register_pg() added it to the list for acting_primary, and
when we got the (non-creating) pg stat update we removed it from the list
for acting[0].

Fixes: #7652
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-07 14:02:26 -08:00
Sage Weil
c8b34f19b3 mon/PGMonitor: improve debugging on PGMap updates slightly
Chasing #7652
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-07 13:56:31 -08:00
Sage Weil
819cce2d41 mon/OSDMonitor: make osdmap feature checks non-racy
The check for OSD features may race with the boot of an OSD that does not
have the necessary features.  Check the pending info too, and if there is
a missing feature, return -EAGAIN.  In the callers, wait on -EAGAIN.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-07 13:29:15 -08:00
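
The idea in commit 819cce2d41 above (check the pending state as well as the committed state, and ask the caller to retry when only the pending state falls short) might look roughly like this. It is a sketch under assumptions: the function names, the use of plain vectors of feature masks, and the `-EPERM` result for a committed OSD lacking the feature are all hypothetical; only the `-EAGAIN` result comes from the commit message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <vector>

// True if every OSD's feature mask contains all bits of `required`.
inline bool all_support(const std::vector<uint64_t>& osd_features,
                        uint64_t required) {
  for (uint64_t f : osd_features) {
    if ((f & required) != required) return false;
  }
  return true;
}

// Returns 0 when both the committed and the pending OSDs support `required`,
// -EAGAIN when only a pending (booting) OSD lacks it, and -EPERM (an assumed
// error code) when an already committed OSD lacks it.
inline int check_osd_features(const std::vector<uint64_t>& committed,
                              const std::vector<uint64_t>& pending,
                              uint64_t required) {
  if (!all_support(committed, required)) return -EPERM;
  if (!all_support(pending, required)) return -EAGAIN;
  return 0;
}

// Usage: a booting OSD without bit 1 makes the caller wait and retry.
inline void pending_check_demo() {
  assert(check_osd_features({3}, {1}, 3) == -EAGAIN);
}
```

A caller that receives `-EAGAIN` would wait for the pending change to commit and then run the check again.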
Sage Weil
b9bcc1590c mon/OSDMonitor: prevent set primary-affinity unless all OSDs support it
Make sure all running OSDs support the feature before we start using it
(even if the config option is on!).

Fixes: #7642
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-07 13:29:15 -08:00
Joao Eduardo Luis
38fd666ac6 qa: workunits/mon/rbd_snaps_ops.sh: ENOTSUP on snap rm from copied pool
'rados cppool' copies the contents but that doesn't make the destination
pool an unmanaged snaps pool.  Therefore, we must get an ENOTSUP when
we try to remove an unmanaged snap from a not-unmanaged pool.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-03-07 19:49:56 +00:00
Joao Eduardo Luis
c13e1b7929 mon: OSDMonitor: don't remove unmanaged snaps from not-unmanaged pools
Although we should allow creating unmanaged snaps on not-unmanaged pools,
as long as those pools don't have any managed snapshots in them, we cannot
allow removal -- because the pool will not have any unmanaged snapshots.

Fixes: 7210

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-03-07 19:49:50 +00:00
David Zafman
135c27ec74 osd: Add hit_set_flushing to track current flushes and prevent races
When flushing a HitSet, track it in the hit_set_flushing map so that
agent_load_hit_sets() doesn't try to read it too soon.

Fixes: #7575

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-03-07 11:48:21 -08:00
Sage Weil
8221a8ecba Merge pull request #1394 from ceph/wip-7610
obj_bencher: allocate contentsChars to object_size, not op_size

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 21:11:25 -08:00
Sage Weil
23db6782bb Merge pull request #1397 from ceph/wip-7638
ReplicatedPG::trim_object: use old_snaps for rollback

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 20:06:59 -08:00
Sage Weil
4a0c3a6673 Merge pull request #1398 from ceph/wip-7634
ReplicatedPG: use hobject_t for snapset_contexts map

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 20:05:19 -08:00
Samuel Just
0037ee4550 Merge pull request #1395 from ceph/wip-7637
osd: fix agent thread shutdown

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-06 19:19:12 -08:00
Sage Weil
09668a4958 osd: fix agent thread shutdown
We had an old invariant that agent_queue would have at least 1 entry in
it to simplify some other code paths, but it turns out that it is simpler
not to do that.

In particular, this was triggering a failed assertion on shutdown when we
assert that the queue is empty.

Dump offending items on shutdown if they are there, though, to catch any
future bugs.

Fixes: #7637
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-06 16:12:30 -08:00
Samuel Just
06b96ffdc8 Merge pull request #1389 from ceph/wip-firefly-misc
fix rest tests; fix COLL_MOVE_RENAME dump

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-06 15:51:40 -08:00
Sage Weil
d4b4468c88 Merge pull request #1393 from dachary/wip-7072
logrotate: copy/paste daemon list from ceph-*-all-starter.conf

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 15:51:15 -08:00
Loic Dachary
7411c3c6a4 logrotate: copy/paste daemon list from *-all-starter.conf
Each upstart/*-all-starter.conf uses the same script to find the list of
daemons and their ids. Copy it over to the corresponding logrotate.conf
script instead of using a less reliable script based on initctl list
output.

If logrotate fails to run initctl reload on a daemon, it will keep
writing to the rotated log file, even after it is deleted and until it
fills the disk. By using the exact same shell snippet as the upstart
scripts used to start the daemon, all of them will be sent the HUP
signal and reopen the log file that was just rotated.

http://tracker.ceph.com/issues/7072 fixes #7072

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-03-07 00:47:58 +01:00
Sage Weil
6f7c8c79f5 Merge pull request #1392 from ceph/wip-7632
ReplicatedPG: consistently use ctx->at_version.version for stashed objec...

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 15:34:59 -08:00
Sage Weil
57c7e19819 Merge pull request #1391 from ceph/wip-7393
ReplicatedPG: clean up num_dirty adjustments

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 15:30:59 -08:00
Samuel Just
b6872b255c ReplicatedPG::trim_object: use old_snaps for rollback
We need to rollback the old value of snaps, not the
new one.

Fixes: #7638
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-06 15:01:04 -08:00
Samuel Just
b5b67d19aa ReplicatedPG: use hobject_t for snapset_contexts map
Otherwise, two objects with different namespaces but
the same object_t will end up clobbering each other's
contexts.

Fixes: #7634
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-06 14:40:12 -08:00
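
The clobbering described in commit b5b67d19aa above can be reproduced with any map whose key ignores the namespace. The sketch below is illustrative only: `NsObject` is a hypothetical stand-in for a namespace-aware key and is not Ceph's `hobject_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical namespace-aware key: ordered by (namespace, name).
struct NsObject {
  std::string nspace;
  std::string name;
  bool operator<(const NsObject& o) const {
    return std::tie(nspace, name) < std::tie(o.nspace, o.name);
  }
};

// Keyed by name only: the second insert overwrites the first.
inline std::size_t entries_keyed_by_name() {
  std::map<std::string, int> contexts;
  contexts["foo"] = 1;  // object "foo" in namespace "a"
  contexts["foo"] = 2;  // object "foo" in namespace "b" lands on the same key
  return contexts.size();
}

// Keyed by (namespace, name): both entries survive.
inline std::size_t entries_keyed_by_ns_and_name() {
  std::map<NsObject, int> contexts;
  contexts[NsObject{"a", "foo"}] = 1;
  contexts[NsObject{"b", "foo"}] = 2;
  return contexts.size();
}

// Usage: one entry with the name-only key, two with the namespace-aware key.
inline void map_key_demo() {
  assert(entries_keyed_by_name() == 1);
  assert(entries_keyed_by_ns_and_name() == 2);
}
```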
Sage Weil
b436930779 qa/workunits/rest/test.py: do not test 'osd thrash'
This wreaks havoc on our QA because it marks osds up and down and then
immediately after that we try to scrub and some osds are still down.

Adjust the CLI test to wait for all OSDs to come back up after thrashing.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-06 13:46:10 -08:00
Sage Weil
237f0fb455 os/ObjectStore: dump COLL_MOVE_RENAME
This got missed way back in ef7cffc34f
(pre-0.71).

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-06 13:44:39 -08:00
Samuel Just
f888ab41bd ReplicatedPG: consistently use ctx->at_version.version for stashed object
Otherwise, two ops might end up using the same version number.

Fixes: #7632
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-06 12:11:31 -08:00
Samuel Just
eca7e633c8 ReplicatedPG: clean up num_dirty adjustments
Previously, a _delete_head() followed by a recreation on an object in
the same transaction would result in num_dirty being decremented in
_delete_head() without the flag being cleared.  make_writeable() would
then see exists and was_dirty and therefore not increment num_dirty
resulting in a mismatch.  Rather than trying to maintain the num_dirty
number in _delete_head(), rollback_to(), and make_writeable(), it seems
simpler to do the adjustment once in make_writeable based on undirty,
ctx->obc->obs.oi, and ctx->new_obs->oi.

Fixes: 7393
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-06 12:05:10 -08:00
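
The approach in commit eca7e633c8 above, adjusting the counter once from the before and after states instead of in several places, can be sketched as below. This is a simplified model, not the code from the commit: `ObjState` and `num_dirty_delta` are hypothetical names, and the `undirty` input mentioned in the message is left out.

```cpp
#include <cassert>

// Simplified, hypothetical view of an object's state at one point in time.
struct ObjState {
  bool exists;
  bool dirty;
};

// Change to apply to the dirty-object counter, computed once from the state
// before the transaction and the state after it.
inline int num_dirty_delta(const ObjState& before, const ObjState& after) {
  const int was_counted = (before.exists && before.dirty) ? 1 : 0;
  const int is_counted = (after.exists && after.dirty) ? 1 : 0;
  return is_counted - was_counted;
}

// Usage: deleting and recreating a dirty object within one transaction leaves
// it existing and dirty at both ends, so the counter does not move.
inline void num_dirty_demo() {
  assert(num_dirty_delta({true, true}, {true, true}) == 0);
}
```

Because only the end states are compared, intermediate steps inside the transaction cannot leave the counter out of sync.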
Samuel Just
d171418058 obj_bencher: allocate contentsChars to object_size, not op_size
Otherwise, our attempt to sanitize object_size bytes of
data.object_contents will be doomed to memory corruption.

Fixes: #7610
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-06 11:12:25 -08:00
Sage Weil
7403b23544 Merge pull request #1386 from ceph/wip-7624
ReplicatedPG: ensure clones are readable after find_object_context

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 11:01:30 -08:00
Sage Weil
cf2f3adfa6 Merge pull request #1387 from ceph/wip-7618
ReplicatedPG::wait_for_degraded_object: only recover if found

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-06 10:59:47 -08:00