20005 Commits

Author SHA1 Message Date
Sage Weil
03445290da msgr: move incoming queue to separate class
This extricates the incoming queue and its funky relationship with
DispatchQueue from Pipe and moves it into IncomingQueue.  There is now a
single IncomingQueue attached to each Pipe.  DispatchQueue is now no
longer tied to Pipe.

This modularizes the code a bit better (tho that is still a work in
progress) and (more importantly) will make it possible to move the
incoming messages from one pipe to another in accept().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
0dbc541695 msgr: make D_CONNECT constant non-zero, fix ms_handle_connect() callback
A while ago we inadvertently broke ms_handle_connect() callbacks because
of a check for m being non-zero in the dispatch_entry() thread.  Adjust the
enums so that they get delivered again.

This fixes hangs when, for example, the ceph tool sends a command, gets a
connection reset, and doesn't get the connect callback to resend after
reconnecting to a new monitor.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
2429556a51 msgr: fix pipe replacement assert
We may replace an existing pipe in the STANDBY state if the previous
attempt failed during accept() (see previous patches).

This might fix #1378.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
204bc594be msgr: do not try to reconnect con with CLOSED pipe
If we have a con with a closed pipe, drop the message.  For lossless
sessions, the state will be STANDBY if we should reconnect.  For lossy
sessions, we will end up with CLOSED and we *should* drop the message.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
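The send-time decision described in 204bc594be can be sketched roughly as below. This is only an illustration: the enum names, the OPEN state, and the function action_for() are hypothetical and are not taken from the Ceph messenger code; only the STANDBY and CLOSED states come from the commit message.

```cpp
// Hypothetical pipe states; only STANDBY and CLOSED are named in the commit.
enum class PipeState { OPEN, STANDBY, CLOSED };

// Hypothetical outcomes when a message is submitted on a connection.
enum class SendAction { QUEUE, RECONNECT, DROP };

// Decide what to do with an outgoing message based on the pipe state.
// Lossless sessions that should reconnect are parked in STANDBY, so a
// CLOSED pipe always means the message is dropped.
SendAction action_for(PipeState state) {
  switch (state) {
    case PipeState::OPEN:    return SendAction::QUEUE;
    case PipeState::STANDBY: return SendAction::RECONNECT;
    case PipeState::CLOSED:  return SendAction::DROP;
  }
  return SendAction::DROP;  // unreachable for valid enum values
}
```

Note that the lossy/lossless distinction does not appear as a parameter here: per the commit message, it is already reflected in which state the pipe ended up in.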
Sage Weil
e6ad6d25a5 msgr: move to STANDBY if we replace during accept and then fail
If we replace an existing pipe during accept() and then fail, move to
STANDBY so that our connection state (connect_seq, etc.) is preserved.
Otherwise, we will throw out that information and falsely trigger a
RESETSESSION on the next connection attempt.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:53:59 -07:00
Sage Weil
a1fe589209 mon: initialize quorum_features
Leaving it uninitialized could cause us to incorrectly encode new features
into the monstore that an old mon won't understand.

This is overly conservative; we probably need to persist the set of quorum
features that are supported and use those.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 16:05:16 -07:00
Samuel Just
2472034c4f OSD::do_command: unlock pg only if we had it
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-02 09:51:37 -07:00
Samuel Just
841451f2fe MOSDSubOp: set hobject_incorrect_pool in decode_payload
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-02 09:49:52 -07:00
Sage Weil
deceb709ea filestore: initialize m_filestore_do_dump
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 07:10:33 -07:00
Sage Weil
0810ab6de6 osdmap: check new pool name on rename
Ensure the new pool name doesn't already exist, both in the current and
projected map.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 19:56:07 -07:00
Sage Weil
5a93550912 osd: handle pool name changes properly
* Remove the old name from the name->id map.

Fixes: #2676
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 19:54:35 -07:00
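The two pool-rename fixes above (0810ab6de6 and 5a93550912) can be pictured with a small sketch. The PoolMap struct, its field names, and rename() are hypothetical stand-ins rather than the real OSDMap interface, and the sketch checks a single map where the commit checks both the current and the pending one.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified pool table with a forward and a reverse index.
struct PoolMap {
  std::map<std::int64_t, std::string> pool_name;  // id -> name
  std::map<std::string, std::int64_t> name_pool;  // name -> id

  // Rename a pool. Fails if the pool does not exist or if the new name is
  // already taken. On success the old name is removed from the reverse
  // index so that it no longer resolves to the pool.
  bool rename(std::int64_t id, const std::string& new_name) {
    auto it = pool_name.find(id);
    if (it == pool_name.end() || name_pool.count(new_name) != 0) {
      return false;
    }
    name_pool.erase(it->second);  // drop the stale name -> id entry
    it->second = new_name;
    name_pool[new_name] = id;
    return true;
  }
};
```

Skipping the erase step would leave the old name pointing at the pool after a rename, which is the kind of stale entry the second commit removes.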
Sage Weil
a8d7fd959d mon: 'osd pool rename <oldname> <newname>'
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 14:51:32 -07:00
Yehuda Sadeh
15ebf2028e rest-bench: mark request as complete later
We marked a request as complete in the callback, however
it might be that we're still inside S3_runall_request_context()
which means that request is not really complete yet.
Possibly fixes bug #2652.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-06-28 11:28:35 -07:00
Samuel Just
335b918dc0 DBObjectMap: clones must inherit spos from parent
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-28 09:53:22 -07:00
Samuel Just
cc1da95895 filestore: sync object_map object in lfn_remove when nlink > 1
In the following sequence:

1) create (a, 1)
2) setattr (a, 1)
3) link (a, 1), (b, 1)
4) remove (a, 1)

If we play 1-4 and then replay 1-4 again, we will end up removing
(b, 1)'s attributes since nlink for (a, 1) the second time through
is 1.  We fix this by marking spos on the object_map header for
(a, 1) when we remove (a, 1) but not the attributes.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-28 09:53:19 -07:00
Sage Weil
9d6013e0db debian: move metadata server into ceph-mds
Also adjust the recommends and depends, so that libcephfs1 and ceph-fuse
hang off of ceph-mds instead of ceph.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Sage Weil
915f321096 debian: move mount.ceph and cephfs into ceph-fs-common
Based on patches from Laszlo Boszormenyi (GCS) <gcs@debian.hu>.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Sage Weil
0d9b558f5c debian: arch linux-any
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Laszlo Boszormenyi (GCS)
89492329d1 debian: build with libnss instead of crypto++
Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
2012-06-27 20:25:13 -07:00
Sage Weil
9d7f048073 doc/config-cluster/authentication: keyring default locations, simplify key management
- keyrings have new default locations that everyone should use.
- the user key setup is vastly simplified if you use the
  'ceph auth get-or-create' command.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 17:49:23 -07:00
Joao Eduardo Luis
16d55651e8 mon: MonmapMonitor: Use default port when the one specified on 'add' is zero
Fixes a bug triggered by using the ceph tool to 'mon add' with a port set
to zero. We now default to the monitor's default port (6789) instead, and
we will fail if that port is already assigned to some other monitor.

Fixes: bug #2661

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-06-27 16:20:47 -07:00
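The behaviour described in 16d55651e8 can be sketched as follows. The Addr alias, the map layout, and add_monitor() are hypothetical and only illustrate the rule; the port number 6789 is the one given in the commit message. Rejecting a duplicate monitor name is an extra assumption of this sketch.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical address type: (ip, port).
using Addr = std::pair<std::string, std::uint16_t>;

// Default monitor port, as stated in the commit message.
constexpr std::uint16_t DEFAULT_MON_PORT = 6789;

// Add a monitor to a name -> address map. A port of zero is replaced by
// the default port. Returns the address actually registered, or nullopt
// if that address already belongs to another monitor or the name is
// already in use (the name check is an assumption of this sketch).
std::optional<Addr> add_monitor(std::map<std::string, Addr>& monmap,
                                const std::string& name, Addr addr) {
  if (addr.second == 0) {
    addr.second = DEFAULT_MON_PORT;
  }
  if (monmap.count(name) != 0) {
    return std::nullopt;
  }
  for (const auto& entry : monmap) {
    if (entry.second == addr) {
      return std::nullopt;  // address already assigned to another monitor
    }
  }
  monmap[name] = addr;
  return addr;
}
```

Substituting the default before the collision check matters: two 'mon add' calls with port zero on the same IP must collide on port 6789 rather than both succeed.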
Samuel Just
17f433aa56 OSD: disconnect_session_watches: handle race with watch disconnect
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
2012-06-27 07:10:32 -07:00
Greg Farnum
840ae24449 mon: don't tick the PaxosServices if we are currently slurping.
They aren't prepared to deal with the on-disk state being inconsistent.

Signed-off-by: Greg Farnum <greg@inktank.com>
2012-06-25 14:45:06 -07:00
Sage Weil
ef6beec992 objecter: do not feed session to op_submit()
The linger_send() method was doing this, but it is problematic because the
new Op doesn't get its pgid or acting vector set correctly.  The result is
that the request goes to the right OSD, but has the wrong pgid, and makes
the OSD complain about misdirected requests and drop it on the floor.  It
didn't affect the test results because we weren't testing whether the
watch was working in that case.

Instead, we'll just recalculate and get the same value the parent linger
op did.  Which is fine, and goes through all the usual code paths so
nothing is missed.

Also, increment num_homeless_ops before we recalc_op_target(), so that we
don't (harmlessly, but confusingly) underflow.

Fixes: #2022
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-25 14:44:30 -07:00
Samuel Just
4e45d60f6e ObjectStore::Transaction: initialize pool_override in all constructors
use_pool_override and pool_override weren't initialized in these two
constructors.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-24 13:33:36 -07:00
Samuel Just
9fcc3dee9a osd_types.cc: remove hobject_t decode asserts
These asserts were useful for ensuring that pool is passed
in in the correct places, but they prevent the encoder
testing from working.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-21 17:10:30 -07:00
Sage Weil
80649d08b9 mon: note that monmap may be reencoded later
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:30 -07:00
Sage Weil
77d836c5b8 mon: encoding new monmap using quorum feature set
It is probably unlikely that someone will expand the mon cluster with a
mixed feature set, but we know the quorum features here, so we should use
them.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:30 -07:00
Sage Weil
de5b323659 mon: conditionally encode mon features for remote mon
The only time we encode these is when forwarding messages.  Encode using
the destination's feature set.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:30 -07:00
Sage Weil
c399d903dd mon: conditionally encode PGMap[::Incremental] with quorum features
This allows a mon cluster to transition to the new encoding during a
rolling upgrade.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:30 -07:00
Sage Weil
0aaf7334a9 mon: conditionally encode auth incremental with quorum feature bits
If the quorum does not yet all have the MONENC feature, stick to the old
encoding.

It might be more polite to require a super-quorum before switching over,
and take note so that thereafter we can stick to the new encoding, but
that has more moving parts and I'm not sure it's worth the complexity.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:29 -07:00
Sage Weil
06288a9d10 mon: track intersection of quorum member features
When we form a quorum, also note the intersection of the quorum members'
feature bits.  This will inform decisions about what encodings we use.

This is an imperfect strategy because the quorum may change, and we may
have a mon with old code join in and not understand what is going on.
However, it does ensure that a majority of the members run new code, so in
the absence of other failures we can make progress.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:29 -07:00
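The idea in 06288a9d10, and the way 0aaf7334a9 uses it, can be illustrated with a minimal sketch. The feature bit values and the function names are hypothetical; only the notion of intersecting the members' feature bits and gating the new encoding on MONENC comes from the commit messages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bits for illustration; not Ceph's real values.
constexpr std::uint64_t FEATURE_BASE = 1ull << 0;
constexpr std::uint64_t FEATURE_MONENC = 1ull << 1;

// Intersect the feature masks of all quorum members: a bit survives only
// if every member advertises it.
std::uint64_t quorum_feature_intersection(
    const std::vector<std::uint64_t>& member_features) {
  assert(!member_features.empty());  // a quorum has at least one member
  std::uint64_t common = ~0ull;
  for (std::uint64_t features : member_features) {
    common &= features;
  }
  return common;
}

// Use the new encoding only when every quorum member understands it.
bool use_new_encoding(std::uint64_t quorum_features) {
  return (quorum_features & FEATURE_MONENC) != 0;
}
```

With one old monitor in the quorum, the MONENC bit drops out of the intersection and the old encoding stays in use, which is what lets a rolling upgrade proceed one monitor at a time.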
Sage Weil
2355b233ea mon: conditionally encode old monmap when peer lacks feature
This allows a rolling upgrade from 0.47.2 to 0.48.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-21 17:10:29 -07:00
Samuel Just
2fe9816305 OSD,PG,ObjectStore: handle messages with old hobject_t encoding
Messages that embed an hobject_t need to have the pool field fixed
on messages from old peers.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-21 17:10:29 -07:00
Sage Weil
448f5b02b1 logrotate: reload all upstart instances
upstart doesn't let you wildcard all instances of a given job, so we
slog through initctl list output, and reload any running daemons.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Tommi Virtanen <tv@inktank.com>
2012-06-21 12:43:00 -07:00
Sage Weil
a85a15fef4 Merge remote-tracking branch 'gh/stable' into next 2012-06-21 08:20:17 -07:00
Sage Weil
c467d9d1b2 v0.47.3 2012-06-20 10:57:41 -07:00
Sage Weil
17dcf60510 filestore: disable 'filestore fiemap' by default
We've seen this failing on both btrfs (Guido) and XFS (Oliver).  This works
around #2535.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-20 10:02:36 -07:00
Samuel Just
88c7629e04 OSD: clear_temp: split delete into many transactions
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-19 17:34:20 -07:00
Yehuda Sadeh
145d1c146b rgw: set s->header_ended before flushing formatter
Otherwise we don't account for the formatter output in s->bytes_sent.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-06-19 13:26:02 -07:00
Yehuda Sadeh
8a4e2a116b rgw: log user and not bucket owner for service operations
For operations that are done on the service (e.g., list buckets)
we need to log the user that did the operation, and not the bucket
owner.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-06-19 13:26:00 -07:00
Yehuda Sadeh
282e2260f9 rgw: initialize s->enable_usage_log
Missing initialization, we ended up not logging every operation.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-06-19 13:25:58 -07:00
Sage Weil
f3f144adf1 osd: use derr (instead of cerr) for convertfs
This will appear in the log *and* stderr (if we're running in the
foreground).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-19 10:12:40 -07:00
Sage Weil
74658dfa2f osd: close stderr on daemonize
This spams stderr in an ugly way.  Users should look at the logs.

In particular, filestore upgrades spam the console, which is unpleasant.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-19 10:11:01 -07:00
Samuel Just
4ec9633653 PG: improve find_best_info
07f853db39 is actually too conservative: it suffices to find any info
with a last_update at least as new as the oldest last_update from the
last period to go active.  An info from a previous interval is acceptable
if the last interval never reported a committed operation and thus still
has the same last_update.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-19 10:49:48 -07:00
Samuel Just
0d8970fc81 PG: reg_last_pg_scrub on pg resurrection
This may solve the unreg_last_pg_scrub assert.

see #2453.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-18 14:33:50 -07:00
Samuel Just
b0e66b70cb ceph_osd: move auto-upgrade to after fork
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-18 14:33:33 -07:00
Sage Weil
37e56e0123 filestore: make disk format upgrade warning less scary, more informative
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
2012-06-18 14:07:20 -07:00
Sage Weil
030a2e3bf4 mon: include quorum in ceph status
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-18 14:02:29 -07:00
Sage Weil
2fc2cf03b3 mon: gracefully handle slow 'ceph -w' clients
If we are sending log updates to a client (ceph -w), and they are far
enough behind to drop behind first_committed, include a friendly message
in their stream but continue.

Drop useless return value from _create_sub_incremental().  Assert that we
can read the state file.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-18 14:00:06 -07:00