Commit Graph

32979 Commits

Author SHA1 Message Date
Sage Weil
66170f394d osd/osd_types: pg_interval_t: dump primary
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-21 21:26:26 -07:00
Sage Weil
000233f732 osd: change in up set primary constitutes a peering interval change
In several places, a change in the up_primary triggers a new peering
interval, but the places that actually generate the new past intervals,
including check_new_interval(), did not enforce that.  This becomes
somewhat obvious when you see that those callers are ignoring the
up_primary output argument for pg_to_up_acting_osds().

Fix this by adding arguments to check_new_interval and fixing the callers
to pass them in properly.  Add a unit test case to verify this.

Note that the past interval struct itself does not record who the
up_primary was; possibly it should.

Fixes: #8139
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-21 21:26:25 -07:00
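
A rough sketch of the rule described in 000233f732, not the actual check_new_interval() code: a change in the up primary alone should be enough to start a new peering interval. The `Mapping` struct and `is_new_interval()` are hypothetical names, and comparing the up and acting sets and the acting primary is an assumption added for completeness.

```cpp
#include <vector>

// Hypothetical, simplified placement state of one PG in one OSDMap epoch.
struct Mapping {
  std::vector<int> up;      // OSDs CRUSH maps the PG to
  std::vector<int> acting;  // OSDs currently serving the PG
  int up_primary;
  int acting_primary;
};

// Returns true if going from old_m to new_m should start a new peering
// interval. The up_primary comparison is the point of the commit; the other
// comparisons are assumed here for illustration.
bool is_new_interval(const Mapping& old_m, const Mapping& new_m) {
  return old_m.up_primary != new_m.up_primary ||
         old_m.acting_primary != new_m.acting_primary ||
         old_m.up != new_m.up ||
         old_m.acting != new_m.acting;
}
```

With this predicate, two mappings that differ only in `up_primary` count as different intervals, which is the case the callers were previously missing.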
Sage Weil
5562e26e65 osd: use parent pgid (as appropriate) in generate_past_intervals()
Feed in the ancestor pg_t (if any) when we are looking at intervals for
previous maps that may have preceded a recent split.

Fixes: #8139
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-21 21:26:07 -07:00
Sage Weil
8fb2388d82 osd_types: pg_t: add get_ancestor() method
Give us the ancestor for when the pool had a past value for pg_num.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-17 21:17:40 -07:00
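
One way to picture the ancestor lookup in 8fb2388d82, as a sketch only: assume a PG is identified by an integer seed and that seeds are folded onto a smaller pg_num with a "stable mod", so that a child created by a split folds back onto its parent. The helper names below (`mask_for`, `stable_mod`, `ancestor_seed`) are hypothetical and the scheme is an assumption, not taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Smallest all-ones bit mask that covers the values 0..pg_num-1.
uint32_t mask_for(uint32_t pg_num) {
  assert(pg_num > 0);
  uint32_t mask = 0;
  while (mask < pg_num - 1) mask = (mask << 1) | 1;
  return mask;
}

// Like seed % pg_num, but stable across splits: seeds that do not fit under
// the full mask fall back to the next smaller power-of-two mask.
uint32_t stable_mod(uint32_t seed, uint32_t pg_num, uint32_t mask) {
  return (seed & mask) < pg_num ? (seed & mask) : (seed & (mask >> 1));
}

// Seed of the PG that held this PG's objects when the pool had old_pg_num PGs.
uint32_t ancestor_seed(uint32_t seed, uint32_t old_pg_num) {
  return stable_mod(seed, old_pg_num, mask_for(old_pg_num));
}
```

Under these assumptions, seed 13 folds back to seed 5 when the pool previously had 8 PGs, and a seed that already existed at the old pg_num is its own ancestor.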
Samuel Just
79e7db7505 Merge pull request #1688 from ceph/wip-8048
osd/ReplicatedPG: check clones for degraded

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-17 13:18:21 -07:00
Sage Weil
ac014510ff Merge pull request #1685 from ceph/wip-8132
mon: set leader commands prior to first election

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-04-17 13:18:01 -07:00
Sage Weil
3d0e80acd9 osd/ReplicatedPG: check clones for degraded
We check whether the head is degraded, and we check whether a clone is
unreadable, but in the case where we have a cache op on a degraded object,
we don't check.  That leads to an assert when the repop hits the replica
and the object is in the peer's missing set.

Fix this by adding a check on the clone when write_ordered is true.  Note
that checking write_ordered is better than whether it is a cache op because
we want to preserve write ordering even for reads that are flagged by the
client.

Fixes: #8048
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-17 13:11:54 -07:00
Sage Weil
224a0f5749 Merge pull request #1674 from ceph/wip-8086
ReplicatedPG::agent_work: skip hitset objects before getting object cont...

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-17 12:49:58 -07:00
Yehuda Sadeh
26f4d5b061 Merge pull request #1687 from ceph/wip-8130
osdc/Objecter: fix osd target for newly-homeless op

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2014-04-17 10:50:40 -07:00
Sage Weil
93c0515fd9 osdc/Objecter: fix osd target for newly-homeless op
If we recalculate the mapping and find that there is no primary, we need
to set the 'osd' field to -1.  Otherwise, the caller will try to resend
to a dead session with bad results.

This was introduced in the refactor 860d72770c.

Fixes: #8130
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-17 10:48:26 -07:00
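
The fix in 93c0515fd9 can be illustrated with a minimal sketch. The `Op` struct and `recalc_target()` are hypothetical, and treating the first acting OSD as the primary is a simplifying assumption; the only behaviour taken from the commit message is that an op with no primary must have its `osd` field set to -1.

```cpp
#include <vector>

// Hypothetical in-flight op; osd is the target OSD id, -1 means homeless.
struct Op {
  int osd;
};

// Recompute the op's target from a freshly calculated acting set.
// Returns true if the target changed and the op needs to be re-queued.
bool recalc_target(Op& op, const std::vector<int>& acting) {
  // Assumption for this sketch: the primary is the first acting OSD.
  int primary = acting.empty() ? -1 : acting.front();
  if (op.osd == primary) return false;
  op.osd = primary;  // also resets to -1 when there is no primary
  return true;
}
```

Leaving the stale OSD id in place when `acting` is empty is the failure mode the commit describes: the caller would then resend to a session that no longer exists.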
Sage Weil
fe71a12d78 Merge pull request #1684 from onlyjob/debian
spelling corrections

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-17 10:07:40 -07:00
Sage Weil
b0338ca361 Merge pull request #1671 from ceph/wip-7699
mds: Fix respawn (add path resolution)

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-17 10:05:22 -07:00
Sage Weil
3a794d5fe1 Merge pull request #1677 from ceph/wip-poolset-noblock
mon: Don't block on EAGAIN from `osd pool set`

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-17 10:03:26 -07:00
Sage Weil
881680ee93 mon: set leader commands prior to first election
If we have just started and receive a command, we currently will reply with
EINVAL because the leader commands are empty.  Note that this race is very
difficult to reach because the (old) peon needs to forward a command to
the mon while it still thinks it has quorum, and the message needs to get
sent after the leader mon has restarted and reset its connection but before
it has declared a new election.

To fix this, we should assume at startup time that our commands are
valid.  If it is an internal command that does not require quorum, that
is fine.  If it does require quorum, we will retry the command after the
election completes and we will revalidate the command then.

Fixes: #8132
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-17 09:33:44 -07:00
John Spray
40e8dbbb6b mon: EBUSY instead of EAGAIN when pgs creating
In 69321bf, EAGAIN changed behaviour to block indefinitely
rather than returning to user.  Change the return for
`osd pool set` operations that are blocked by creating PGs
to return EBUSY instead of EAGAIN, so that they are excepted
from this blocking behaviour.

Signed-off-by: John Spray <john.spray@inktank.com>
2014-04-17 15:28:22 +01:00
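
The effect of 40e8dbbb6b can be sketched as follows. Both functions are hypothetical stand-ins: one for the command handler's return code, one for a caller policy that blocks and retries only on EAGAIN, as the commit message describes.

```cpp
#include <cerrno>

// Hypothetical result of an `osd pool set` style command.
// While PGs are still being created the command is refused with -EBUSY.
int pool_set_result(bool pgs_creating) {
  return pgs_creating ? -EBUSY : 0;
}

// Hypothetical caller policy: only -EAGAIN is blocked on and retried;
// any other code is returned to the user immediately.
bool should_block_and_retry(int r) {
  return r == -EAGAIN;
}
```

Because the handler returns -EBUSY rather than -EAGAIN, the caller reports the error straight away instead of waiting indefinitely.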
Gregory Farnum
2e375b6f4e Merge pull request #1675 from guangyy/wip-bench
Make rados/rest bench work for multiple write instances without metadata conflict.

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-16 21:57:41 -07:00
Dmitry Smirnov
f22e2e9a02 spelling corrections
2014-04-17 12:43:30 +10:00
Samuel Just
75a5bd5d49 Merge pull request #1681 from ceph/wip-8043
mon/OSDMonitor: require force argument to split a cache pool

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-16 18:16:11 -07:00
Sage Weil
6d58e3c9fc Merge pull request #1682 from ceph/wip-8020
OSD: split pg stats during pg split

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-16 18:13:01 -07:00
Samuel Just
18caa1cd8f OSD: split pg stats during pg split
Fixes: #8020
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-16 18:10:02 -07:00
Samuel Just
5e4a5dc6ea osd_types::osd_stat_sum_t: fix floor for num_objects_omap
Introduced in a130a4452e
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-16 18:08:35 -07:00
David Zafman
a3d759ebdd Merge branch 'wip-8100'
Reviewed-by: Mark Nelson <mark.nelson@inktank.com>
2014-04-16 15:09:12 -07:00
David Zafman
a3d452acdf common/obj_bencher: Fix error return check from read that is negative on error
Fixed read return value in d99f1d9f68

Fixes: #8100

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-16 15:06:55 -07:00
Sage Weil
24da7d0c17 Merge pull request #1680 from ceph/wip-7786
civetweb: update subproject
2014-04-16 11:49:58 -07:00
David Zafman
4db1984c2b osd/ReplicatedPG: add missing whitespace in debug output
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-16 11:08:23 -07:00
Guang Yang
8c7a5ab861 Use string instead of char* when saving arguments for rest-bench
2014-04-16 01:28:16 +00:00
Sage Weil
015df934af mon/OSDMonitor: require force argument to split a cache pool
There are several perils when splitting a cache pool:

 - split invalidates pg stats, which disables the agent
 - a scrub must be manually triggered post-split to rebuild stats
 - the pool may fill the OSDs during that period.
 - or, the pool may end up beyond the 'full' mark and once scrub does
   complete and the agent activate we may block IO for a long time while
   we catch up with flush/evict

Make it a bit harder for users to shoot themselves in the foot.

Fixes: #8043
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-15 13:57:21 -07:00
John Spray
aa6df59e99 mds: Fix respawn (add path resolution)
Previously assumed that ceph-mds executable was in
PWD - now use /proc/self/exe to find the
executable wherever it may be.  Leave in old version
as a fallback for non-linux environments.

Also add a 'respawn' command so that it's easy to test
respawn with `ceph mds tell <id> respawn`

Fixes: #7966
2014-04-15 12:28:09 +01:00
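
The path resolution described in aa6df59e99 might look roughly like the sketch below. `resolve_exe_path()` is a hypothetical helper, not the function in the MDS code; it reads the `/proc/self/exe` symlink and returns the caller-supplied fallback when that fails, for example on non-Linux systems.

```cpp
#include <limits.h>
#include <unistd.h>
#include <string>

// Resolve the path of the running executable via /proc/self/exe.
// Returns `fallback` if the link cannot be read.
std::string resolve_exe_path(const std::string& fallback) {
  char buf[PATH_MAX];
  // readlink() does not NUL-terminate, so keep the returned length.
  ssize_t n = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
  if (n <= 0) return fallback;
  return std::string(buf, static_cast<size_t>(n));
}
```

The resolved path can then be passed to an exec call to respawn the daemon regardless of the current working directory.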
Guang Yang
308758b787 Make rados/rest bench work for multiple write instances without metadata conflict.
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
2014-04-15 07:48:37 +00:00
Yan, Zheng
908fa5edcc Merge pull request #1666 from ceph/wip-mds
Wip mds
2014-04-15 08:13:01 +08:00
Samuel Just
9f6f7d35a1 Merge pull request #1673 from ceph/wip-stress-watch
ceph_test_stress_watch: test over cache pool

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-14 16:12:31 -07:00
Samuel Just
895b6d4d58 Merge pull request #1667 from ceph/wip-8089
osd: fix dup request handling for ENOENT and cache ops

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-14 16:11:47 -07:00
Samuel Just
898ee4894e Merge pull request #1654 from ceph/wip-7940
Wip 7940

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-14 16:10:42 -07:00
Samuel Just
cab29ac19d Merge pull request #1664 from ceph/wip-8085
osd: handle misdirected pg command

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-14 16:09:50 -07:00
Samuel Just
a2323a61ce Merge pull request #1660 from ceph/wip-hitset-missing
osd: handle hitset-get on a missing hit_set object

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-14 16:07:41 -07:00
Sage Weil
37ed4b60ba ceph_test_stress_watch: test over cache pool
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-14 15:57:28 -07:00
Josh Durgin
4388d876ca Merge pull request #1661 from ceph/wip-objecter
objecter: make linger watch the correct pool/object

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-14 15:36:10 -07:00
Josh Durgin
ab4a35f75e Merge pull request #1672 from ceph/wip-strerror
Use cpp_strerror() wherever possible, and use autoconf for portability

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-14 13:57:36 -07:00
Dan Mick
d0a7632a31 Use cpp_strerror() wherever possible, and use autoconf for portability
strerror_r is not portable; on GNU libc it returns char * and sometimes
does not fill in the supplied buffer.  Use autoconf to test which
version this platform uses and adapt.

Clean up the random calls to strerror and strerror_r (along with all
their private little one-use buffers) and regularize the code to use
cpp_strerror almost everywhere.  Where changed, any negation of the
error code is also removed, since cpp_strerror() will do that.

Note: some tools were using their own calls to strerror/strerror_r, so
will now get a (%d) in their output that wasn't there before; hence
the change to test/cli/monmaptool/print-nonexistent.t

Fixes: #8041
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2014-04-14 13:07:17 -07:00
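
To illustrate the portability problem described in d0a7632a31, here is a sketch of a wrapper in the spirit of cpp_strerror(). It is not the Ceph implementation: instead of an autoconf test it uses C++ overload resolution to accept either strerror_r flavour, and `pick_msg()` and `strerror_with_code()` are hypothetical names. The "(%d)" prefix and the handling of negated error codes follow the commit message.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// XSI strerror_r returns int and fills buf.
static const char* pick_msg(int rc, const char* buf) {
  return rc == 0 ? buf : "Unknown error";
}
// GNU strerror_r returns a char* that may or may not point at buf.
static const char* pick_msg(const char* ret, const char*) {
  return ret;
}

// Format an errno value as "(code) message"; accepts err or -err.
std::string strerror_with_code(int err) {
  char buf[128] = "";
  if (err < 0) err = -err;
  // Overload resolution selects the pick_msg matching this platform.
  const char* msg = pick_msg(strerror_r(err, buf, sizeof(buf)), buf);
  return "(" + std::to_string(err) + ") " + msg;
}
```

Callers can pass a negative return code directly, so call sites no longer need their own negation or scratch buffers.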
Josh Durgin
29d83fef77 Merge pull request #1668 from ceph/wip-librados-tests
ceph_test_rados_api_*: fix build warnings and memset ranges

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-14 11:44:34 -07:00
Samuel Just
502cc61406 ReplicatedPG::agent_work: skip hitset objects before getting object context
Otherwise, we might read the attr on a hitset object we are in the
process of deleting.

Fixes: #8086
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-14 11:08:00 -07:00
Loic Dachary
64cd332e6e Merge pull request #1622 from dachary/wip-mailmap
mailmap updates

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
2014-04-14 11:52:17 +02:00
Yan, Zheng
7c17fc4ae6 mds: don't modify inode when calculating client ranges
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-14 17:31:41 +08:00
Yan, Zheng
3dd88006b0 Merge pull request #1669 from ceph/wip-client-debug
client: print inode max_size
2014-04-14 16:39:40 +08:00
Yan, Zheng
65ec24e392 client: print inode max_size
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-14 16:38:22 +08:00
Sage Weil
d6c71b7624 osd/ReplicatedPG: add missing whitespace in debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-13 21:59:23 -07:00
Sage Weil
171d5c50f6 ceph_test_rados_api_*: fix build warnings, memset ranges
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-13 21:37:31 -07:00
Sage Weil
8905e3e228 osd/ReplicatedPG: handle dup ops earlier in do_op
Currently the dup op checks happen in execute_ctx, long after we handle
cache ops or get the obc and (potentially) return ENOENT.  That means that
object deletions and cache ops both aren't properly idempotent.

This is easy to fix by moving the check earlier in do_op.

Fixes: #8089
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-13 21:33:16 -07:00
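
Why the ordering in 8905e3e228 matters can be shown with a toy model. The `PG` struct, the request id set, and `do_delete()` are all hypothetical; the sketch only demonstrates that checking for a duplicate request before looking up the object keeps a resent delete idempotent.

```cpp
#include <cerrno>
#include <set>
#include <string>

// Toy placement group: stored objects plus ids of requests already applied.
struct PG {
  std::set<std::string> objects;
  std::set<unsigned long> completed_reqids;
};

// Handle a delete request. The dup check comes first, so a resend of a
// request that already succeeded returns success again, not -ENOENT.
int do_delete(PG& pg, unsigned long reqid, const std::string& oid) {
  if (pg.completed_reqids.count(reqid)) return 0;  // duplicate: replay result
  if (pg.objects.erase(oid) == 0) return -ENOENT;  // object lookup
  pg.completed_reqids.insert(reqid);
  return 0;
}
```

If the two checks were swapped, the resent delete would find the object already gone and report -ENOENT for an operation that had in fact succeeded.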
Yan, Zheng
26659a5ae4 mds: don't issue/revoke caps before client has caps
If early reply is not allowed, MDS does not send reply to client immediately
after Locker::issue_new_caps adds new caps. So MDS can revoke the caps before
sending reply to client.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-13 20:38:04 +08:00
Yan, Zheng
bd8aa6f46e mds: do file recover after authpin inode
MDCache::do_file_recover may call Locker::eval_gather, which may change
filelock to stable state. So we should authpin the inode (for unstable
lock state) first.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-13 20:33:43 +08:00