24484 Commits

Author SHA1 Message Date
David Zafman
6a3aa2a2cc Missed adding rados_types.hpp to package
Caused by 3bd48cbbad
feature 4207 implementation

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit e1e2d5d217)
2013-03-19 13:02:00 -07:00
Sage Weil
efc4b1268e mon/Paxos: set state to RECOVERING during restart
This ensures that the paxos state is not active when the PaxosService
restart() methods run right afterwards, and that EAGAIN waiters will get
requeued appropriately.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-19 10:15:41 -07:00
Sage Weil
bee5046333 mon/PaxosService: handle non-zero return values
In 7aec13f749 we started passing non-zero
return values to these completions; now we have to deal with them
accordingly.

RetryMessage behaves just like the Monitor variant.

Propose and Committed update state but otherwise ignore non-zero
return values.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-18 23:09:51 -07:00
Sage Weil
7aec13f749 mon/PaxosService: fix proposal waiter handling
- Cancel the proposal waiters with EAGAIN on election, etc.
- Drop the wakeup helper and open-code the one caller.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-03-18 21:00:06 -07:00
Josh Durgin
35ab2a4189 Merge branch 'wip-rbd-import' into next
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-03-16 15:45:58 -07:00
Samuel Just
de8edb732e FileJournal: queue_pos \in [get_top(), header.max_size)
If queue_pos == header.max_size when we create the entry
header magic, the entry will be rejected at get_top() on
replay.

Fixes: #4436
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-15 11:07:12 -07:00
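A minimal sketch of the invariant named in the subject line above, assuming that a position which reaches the end of the journal wraps back to the first usable offset. The function and parameter names are hypothetical and only mirror the terms used in the message; this is not the FileJournal code itself.

```python
def wrap_queue_pos(queue_pos, top, max_size):
    """Return a position inside the half-open range [top, max_size).

    Assumption: positions at or past max_size wrap around and start again
    at `top`, the first offset after the journal header.
    """
    if queue_pos < top:
        raise ValueError("queue_pos must not precede top")
    span = max_size - top
    return top + (queue_pos - top) % span


# A position already inside the range is returned unchanged.
in_range = wrap_queue_pos(8192, 4096, 1 << 20)
# A position equal to max_size is never kept; it wraps back to top.
wrapped = wrap_queue_pos(1 << 20, 4096, 1 << 20)
```

The point of the half-open interval is that max_size itself is never a valid position, which matches the rejection on replay described in the message.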
Samuel Just
f1b031b3cf OSD: expand_pg_num after pg removes
Otherwise:
1) expand_pg_num removes a splitting pg entry
2) peering thread grabs pg lock and starts split
3) OSD::consume_map grabs pg lock and starts removal

At step 2), we run afoul of the assert(is_splitting)
check in split_pgs.  This way, the would-be splitting
pg is marked as removed prior to the splitting state
being updated.

Backport: bobtail
Fixes: #4449
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-15 11:07:10 -07:00
Samuel Just
8222cbc8f3 PG: ignore non MISSING pg query in ReplicaActive
1) Replica sends notify
2) Prior to processing notify, primary queues query to replica
3) Primary processes notify and activates sending MOSDPGLog
to replica.
4) Primary does do_notifies at end of process_peering_events
and sends the Query.
5) Replica sees MOSDPGLog and activates
6) Replica sees Query and asserts.

In the above case, the Replica should simply ignore the old
Query.

Fixes: #4050
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-15 11:07:07 -07:00
Sage Weil
11650c5a8c mon: only try to bump max if leader
I broke this in 4637752db6 when I
restructured this function.  Only try to increase the max if we are
the leader.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-14 21:10:46 -07:00
Gary Lowell
6f15dba931 debian/control: Fix for moved file
The ceph-mds.conf file moved from the ceph package to the
ceph-mds package.  Add replaces/breaks statements to the
control file to handle this on upgrade.

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-03-14 17:16:24 -07:00
Sage Weil
efd153e9e2 debian: add start ceph-mds-all on ceph-mds install
This ensures that when we then start individual mds instances, we can
stop ceph-mds-all and they will get stopped.  We do the same already for
ceph-all.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 41897fcba1)
2013-03-14 12:33:57 -07:00
Gary Lowell
ce7ee72792 ceph_common.sh: Fix sed regex in get_local_daemon_list
In get_local_daemon_list() the sed expression trimming the cluster
name from the host name was trimming too much if the host name
contained hyphens.

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-03-14 10:02:14 -07:00
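The kind of over-trimming described above can be illustrated as follows. The message does not show the actual sed expression, so this sketch assumes the problem was a greedy pattern that matched up to the last hyphen instead of the first; the entry name and both helper functions are hypothetical.

```python
import re


def trim_greedy(name):
    # Greedy match: removes everything up to the LAST hyphen.
    return re.sub(r"^.*-", "", name)


def trim_first_field(name):
    # Removes only the leading field, up to the FIRST hyphen.
    return re.sub(r"^[^-]*-", "", name)


name = "ceph-node-01"  # hypothetical "<cluster>-<host>" entry
print(trim_greedy(name))       # 01
print(trim_first_field(name))  # node-01
```

The same character-class idea carries over to sed, where `[^-]*-` stops at the first hyphen while `.*-` does not.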
Josh Durgin
8c774aace2 rbd: clean up do_import() a bit
Move declarations above error conditions so we can goto done almost
everywhere. Remove cpp_strerror printing, since it will be done by the
caller.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-14 09:22:58 -07:00
Josh Durgin
3091283895 rbd: remove fiemap use from import
On some kernels and filesystems fiemap can be racy and provide
incorrect data even after an fsync. Later we can use SEEK_HOLE and
SEEK_DATA, but for now just detect zero runs like we do with stdin.

Basically this adapts import from stdin to work in the case of a file
or block device, and gets rid of other cruft in the import that used
fiemap.

Fixes: #4388
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-13 17:28:35 -07:00
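Detecting zero runs without fiemap can be sketched as below, assuming the input is read in fixed-size blocks and that blocks consisting only of zero bytes are simply skipped. The function name and block size are hypothetical; this is an illustration of the idea, not the rbd import code.

```python
import io


def nonzero_extents(stream, block_size=4096):
    """Yield (offset, data) for each block that is not entirely zero."""
    offset = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # A block is a zero run when every byte in it is 0.
        if block.count(0) != len(block):
            yield offset, block
        offset += len(block)


# Three 8-byte blocks: zeros, data, zeros. Only the middle one is reported.
image = io.BytesIO(b"\0" * 8 + b"data" + b"\0" * 4 + b"\0" * 8)
extents = list(nonzero_extents(image, block_size=8))
```

Because this only inspects the bytes that were read, it works the same way for stdin, a regular file, or a block device.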
Sage Weil
4637752db6 mon: simplify assign_global_id()
Simplify the logic a bit so it is easier to follow.

Small behavior change: we will successfully allocate and return a gid that
== the max when we can't bump it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-13 15:38:48 -07:00
Joao Eduardo Luis
436e5be950 mon: AuthMonitor: don't return global_id right away if we're increasing it
This only happens on the Leader and leads to duplicate global_ids.

Fixes: #4285

Signed-off-by: Joao Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-13 15:38:43 -07:00
Joao Eduardo Luis
b99367bfb2 mon: Paxos: only finish a queued proposal if there's actually *any*
When proposing an older value learned during recovery, we don't create
a queued proposal -- we go straight through Paxos.  Therefore, when
finishing a proposal, we must be sure that we have a proposal in the queue
before dereferencing it, otherwise we will segfault.

Fixes: #4250

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-03-13 15:03:00 -07:00
Yehuda Sadeh
88725316dd rgw: set up curl with CURL_NOSIGNAL
Fixes: #4425
Backport: bobtail
Apparently, libcurl needs that in order to be thread safe. Side
effect is that if libcurl is not compiled with c-ares support,
domain name lookups are not going to time out.
Issue affected keystone.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-03-13 10:45:07 -07:00
Josh Durgin
1ac8f6abaf Merge branch 'wip-rbd-flatten-cache' into next
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-03-11 16:57:47 -07:00
Sage Weil
9eb0d91b86 debian: stop ceph-mds before uninstalling ceph-mds
Fixes: #4384
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-11 17:09:37 -07:00
Josh Durgin
46e8fc00b2 librbd: invalidate cache when flattening
The cache stores which objects don't exist. Flatten bypasses the cache
when doing its copyups, so when it is done the -ENOENT from the cache
is treated as zeroes instead of 'need to read from parent'.

Clients that have the image open need to forget about the cached
non-existent objects as well. Do this during ictx_refresh, while the
parent_lock is held exclusively, so no new reads from the parent can
happen until the updated parent metadata is visible.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-11 15:49:55 -07:00
Josh Durgin
f2a23dc0b0 ObjectCacher: add a method to clear -ENOENT caching
Clear the exists and complete flags for any objects that have exists
set to false, and force any in-flight reads to retry if they get
-ENOENT instead of generating zeros.

This is useful for getting the cache into a consistent state for rbd
after an image has been flattened, since many objects which previously
did not exist and went up to the parent to retrieve data may now exist
in the child.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-11 15:49:46 -07:00
Josh Durgin
f6f876fe51 ObjectCacher: keep track of outstanding reads on an object
Reads always use C_ReadFinish as a callback (and they are the only
user of this callback). Keep an xlist of these for each object, so
they can remove themselves as they finish. To prevent races between
requests and discard removing objects from the cache, clear the xlist in
the object destructor, so if the Object is still valid the set_item
will still be on the list.

Make the ObjectCacher constructor take an Object* instead of the pool
and object id, which are derived from the Object* anyway.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-11 15:44:32 -07:00
Sage Weil
2450db1b92 Revert "Use start-stop-daemon --chuid option to setuid to www-data."
This reverts commit a99ed038ec.

On second thought, this will require a bit more care to ensure that all
of the paths radosgw needs to read/write from have the correct permissions
in the packages and so forth.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-11 13:23:13 -07:00
Joao Eduardo Luis
a52c48d639 mon: Paxos: increase trim tolerance from 3 to 30.
This increase only means that we'll keep more versions around before we
trim.  It doesn't change the number of versions we'll keep around after
trimming (that's still as much as 'paxos_max_join_drift', i.e. 10), nor
does it change the criteria used to consider a monitor as having drifted
(same rule applies, 'paxos_max_join_drift').

This change however will enable the leader to put off trimming for a longer
period of time, giving a better chance for a monitor to join the cluster.
See, after going through the probing phase, at which point a monitor may
only be, say, 5 versions off, the same monitor may end up getting into the
quorum only to find that in-between probing and finally triggering an
election some 6 versions might have come into existence.  Before this patch,
by then the state had been trimmed and the monitor would have to bootstrap
to perform a full store sync.  With this patch in place, the monitor would
be able to sync the remaining 11 versions.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-11 12:50:43 -07:00
Joao Eduardo Luis
41987f380f mon: Paxos: bootstrap leader if he has fallen behind upon reaching collect
Fixes: #4256

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-11 12:50:24 -07:00
Sage Weil
263bff8d4b Merge pull request #100 from jaharkes/init
Fixes for RadosGW init script

Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-11 12:00:52 -07:00
Jan Harkes
a99ed038ec Use start-stop-daemon --chuid option to setuid to www-data.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
2013-03-11 12:26:02 -04:00
Jan Harkes
44f1cc5bc4 Fix radosgw actually reloading after rotating logs.
The --signal argument to Debian's start-stop-daemon doesn't
make it send a signal, but defines which signal should be sent
when --stop is specified.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
2013-03-11 12:26:02 -04:00
Sage Weil
07820f032f mon/MonMap: don't crash on dup IP in mon host
If the mon_host line has an IP twice, we shouldn't crash.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-09 22:12:00 -08:00
Dan Mick
c7aa897ce0 ceph_common.sh: add warning if 'host' contains dots
This is a common error and there's no reason the script can't
at least tell you it's a really bad idea.  One might argue it
could even successfully proactively truncate the host parameter
at the first dot, but that's a little controlling, perhaps.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-08 17:10:11 -08:00
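A check of this kind could look roughly like the sketch below. The function name and the warning text are hypothetical; as in the message, the sketch only warns and does not truncate the value on the operator's behalf.

```python
def host_warnings(host):
    """Return a list of warnings for a configured 'host' value."""
    warnings = []
    if "." in host:
        short = host.split(".", 1)[0]
        warnings.append(
            f"'host = {host}' contains dots; "
            f"the short name '{short}' is probably intended"
        )
    return warnings


for message in host_warnings("node1.example.com"):
    print("WARNING:", message)
```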
Samuel Just
31a110c5d0 Merge branch 'wip-osd-map' into next
Fixes: #4369
Backport: bobtail
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-03-08 12:45:41 -08:00
Sage Weil
881e9d850c osd: mark down connections from old peers
Close out any connection with an old peer.  This avoids a race like:

- peer marked down
- we get map, mark down the con
- they reconnect and try to send us some stuff
- we share our map to tell them they are old and dead, but leave the con
  open
...
- peer marks itself up a few times, eventually reuses the same port
- sends messages on their fresh con
- we discard because of our old con

This could cause a tight reconnect loop, but it is better than wrong
behavior.

Other possible fixes:
 - make addr nonce truly unique (augment pid in nonce)
 - make a smarter 'disposable' msgr state (bleh)

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-08 12:40:36 -08:00
Sage Weil
ba7e815a18 osd/PG: rename require_same_or_newer_map -> is_same_or_newer_map
This avoids confusion with the OSD method of the same name, and better
matches what the function tests (and does not do).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-08 12:40:36 -08:00
Yehuda Sadeh
4384e59ad0 rgw: set attrs on various list bucket xml results (swift)
Fixes: #4247
The list buckets operation was missing some attrs on the different
xml result entities. This fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-03-08 08:31:42 -08:00
Yehuda Sadeh
7cb6ee2807 formatter: add the ability to dump attrs in xml entities
xml entities may have attrs assigned to them. Add the ability
to set them. A usage example:

formatter->open_array_section_with_attrs("container",
     FormatterAttrs("name", "foo", NULL));

This will generate the following xml entity:
<container name="foo">

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-03-08 08:31:37 -08:00
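To make the effect of the usage example above concrete, here is a small stand-alone sketch of rendering an opening XML tag with attributes. It is written in Python for brevity and does not reproduce the formatter API; the helper name is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def open_section_with_attrs(name, attrs):
    """Render an opening XML tag with the given attributes."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = "".join(
        f" {key}={quoteattr(value)}" for key, value in attrs.items()
    )
    return f"<{name}{rendered}>"


print(open_section_with_attrs("container", {"name": "foo"}))
```

Escaping the attribute value matters as soon as names may contain characters such as `&` or `<`.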
Yehuda Sadeh
6669e73fa5 rgw: don't iterate through all objects when in namespace
Fixes: #4363
Backport: argonaut, bobtail
When listing objects in a namespace, don't iterate through all the
objects; only go through the ones that start with the namespace
prefix.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-03-08 06:55:03 -08:00
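The idea of visiting only the keys under a prefix can be sketched as follows, assuming the keys are available in sorted order. The key layout (`ns1/...`) and the function name are hypothetical and are not taken from rgw.

```python
from bisect import bisect_left


def list_with_prefix(sorted_keys, prefix):
    """Return the keys that start with prefix, without a full scan."""
    # Jump straight to the first key that could carry the prefix.
    index = bisect_left(sorted_keys, prefix)
    result = []
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        result.append(sorted_keys[index])
        index += 1
    # The loop stops at the first key past the prefix range.
    return result


keys = sorted(["a", "b", "ns1/x", "ns1/y", "ns2/z", "zzz"])
print(list_with_prefix(keys, "ns1/"))
```

The work done is proportional to the number of matching keys plus a binary search, rather than to the size of the whole listing.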
Sage Weil
1e2864a020 osd: increase default pg log size from 1000 -> 3000
This reduces the probability that we will fail to detect a dup op.  See
#4368.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-07 10:31:27 -08:00
Sage Weil
e19b8f5fb0 Merge remote-tracking branch 'gh/wip-log-max' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-07 09:29:44 -08:00
Josh Durgin
e6caf69cf4 config: note which options are overridden by common_preinit()
Defaults for these differ based on the context in which they're used.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-06 17:49:57 -08:00
Josh Durgin
7c208d2f8e common: reduce default in-memory logs for non-daemons
The default of 100000 can result in hundreds of MBs of extra memory
used. This was most obvious when using librbd with caching enabled,
since there was a dout(0) accidentally left in the ObjectCacher.

refs: #4352
backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-06 17:47:28 -08:00
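Why the default size matters can be seen in a small sketch, assuming the in-memory log is a bounded buffer that keeps only the most recent entries. The class and parameter names are hypothetical; memory use grows with the bound, so a smaller bound for non-daemons caps the overhead.

```python
from collections import deque


class RecentLog:
    """Keep only the most recent max_recent entries in memory."""

    def __init__(self, max_recent):
        # deque drops the oldest entry automatically once it is full.
        self._entries = deque(maxlen=max_recent)

    def add(self, entry):
        self._entries.append(entry)

    def dump(self):
        return list(self._entries)


log = RecentLog(max_recent=3)
for i in range(5):
    log.add(f"event {i}")
print(log.dump())
```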
Sage Weil
a58eec90ca init-ceph: fix run dir
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-06 17:09:58 -08:00
Sage Weil
de2c5b3fb7 osd: add ctor for clone_info
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 439d0e334d)
2013-03-06 15:13:16 -08:00
Josh Durgin
cb3ee33532 ObjectCacher: fix debug log level in split
Level 0 should never be used for this kind of debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-03-06 12:32:43 -08:00
Sage Weil
de31531da3 Merge pull request #90 from grosskur/fix-debian-libsnappy
debian: require libsnappy-dev for ceph
2013-03-06 07:54:13 -08:00
Alan Grosskurth
a319f5cb74 debian: require libsnappy-dev for ceph
Debian builds are currently broken without this requirement.
2013-03-06 02:21:12 -08:00
Gary Lowell
e694ea58c2 release-process.rst: Fix typos
Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-03-05 22:08:15 -08:00
Sage Weil
8184b68c37 Merge branch 'wip-prepare'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alexandre Marangone <alexandre.marangone@inktank.com>
Tested-by: Tamil Muthamizhan <tamil.muthamizhan@inktank.com>
2013-03-05 13:33:05 -08:00
Sage Weil
32407c994f ceph-disk-prepare: move in-use checks to the top, before zap
Move the in-use checks to the very top, before we (say) zap!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-05 13:08:26 -08:00
Sage Weil
8550e5c6ab doc/release-notes: v0.58
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-05 11:03:08 -08:00