Commit Graph

12351 Commits

Author SHA1 Message Date
Yehuda Sadeh
6ec622c0cf common: use ceph_armor instead of openssl based functions
also modify ceph_[un]armor to get dest buffer length
2010-12-03 19:34:37 -08:00
Yehuda Sadeh
58f3ce4a34 crypto: test for allocation failure, cleanup
2010-12-03 19:34:37 -08:00
Yehuda Sadeh
15d8bdf3bf crypto: use crypto++ for aes instead of openssl
need to implement it more efficiently, currently going through a string object
2010-12-03 19:34:37 -08:00
Sage Weil
378d13df95 osd: remove poid/soid from ScrubMap::object; clean up callers
The soid is in the key in the map; no need to store it in the value.
Update the scrub code appropriately.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-03 10:02:30 -08:00
Sage Weil
a457cbb9c2 mon: fix typo
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-03 10:02:30 -08:00
Colin Patrick McCabe
a4cc929ced make: create log directories and tmp directories
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-12-03 09:35:55 -08:00
Jim Schutt
a5297388a7 msgr: Correctly handle half-open connections.
If poll() says a socket is ready for reading, but zero bytes
are read, that means that the peer has sent a FIN.  Handle that.

One way the incorrect handling was manifesting is as follows:

Under a heavy write load, clients log many messages like this:

[19021.523192] libceph:  tid 876 timed out on osd6, will reset osd
[19021.523328] libceph:  tid 866 timed out on osd10, will reset osd
[19081.616032] libceph:  tid 841 timed out on osd0, will reset osd
[19081.616121] libceph:  tid 826 timed out on osd2, will reset osd
[19081.616176] libceph:  tid 806 timed out on osd3, will reset osd
[19081.616226] libceph:  tid 875 timed out on osd9, will reset osd
[19081.616275] libceph:  tid 834 timed out on osd12, will reset osd
[19081.616326] libceph:  tid 874 timed out on osd10, will reset osd

After the clients are done writing and the file system should
be quiet, osd hosts have a high load with many active threads:

$ ps u -C cosd
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1383  162 11.5 1456248 943224 ?      Ssl  11:31 406:59 /usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf

$ for p in `ps -C cosd -o pid --no-headers`; do grep -nH State /proc/$p/task/*/status | grep -v sleep; done
/proc/1383/task/10702/status:2:State:   R (running)
/proc/1383/task/10710/status:2:State:   R (running)
/proc/1383/task/10717/status:2:State:   R (running)
/proc/1383/task/11396/status:2:State:   R (running)
/proc/1383/task/27111/status:2:State:   R (running)
/proc/1383/task/27117/status:2:State:   R (running)
/proc/1383/task/27162/status:2:State:   R (running)
/proc/1383/task/27694/status:2:State:   R (running)
/proc/1383/task/27704/status:2:State:   R (running)
/proc/1383/task/27728/status:2:State:   R (running)

With this fix applied, a heavy load still causes many client
resets of osds, but no runaway threads result.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-03 09:10:58 -08:00
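
The half-open case described in a5297388a7 can be illustrated with a small sketch. This is not the messenger code from the commit; `ReadResult` and `read_once` are hypothetical names, and only the POSIX calls `poll()` and `read()` are assumed. It shows the rule the message states: a socket that polls readable but returns zero bytes has received a FIN and should be treated as closed rather than polled again.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Outcome of one read attempt on a socket (hypothetical enum for this sketch).
enum class ReadResult { Data, PeerClosed, Timeout, Error };

// Wait until fd is readable, then read once. A readable socket that yields
// zero bytes means the peer sent FIN; report that to the caller instead of
// going back into poll(), which would keep returning immediately.
ReadResult read_once(int fd, char *buf, size_t len, int timeout_ms, ssize_t *got) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  int r = poll(&pfd, 1, timeout_ms);
  if (r == 0) return ReadResult::Timeout;
  if (r < 0) return ReadResult::Error;   // includes EINTR; a real caller might retry

  ssize_t n = read(fd, buf, len);
  if (n > 0) {
    *got = n;
    return ReadResult::Data;
  }
  if (n == 0) return ReadResult::PeerClosed;  // orderly shutdown by the peer
  return ReadResult::Error;
}
```

A caller that receives `PeerClosed` would tear down or reset the connection; looping on `poll()` at that point is what produces a spinning thread, since the descriptor stays readable.
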
Colin Patrick McCabe
39b42b21e9 make: create /etc/ceph if it doesn't exist
make: create /etc/ceph if it doesn't exist. On uninstall, remove the
directory if it's empty. (Never remove a user's config file, though.)

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-12-02 17:35:32 -08:00
Colin Patrick McCabe
da5ab7c9a4 osd: object_info_t: decode old versions correctly
object_info_t has one constructor that initializes everything from a
bufferlist. This means that the decode function needs to give default
values to fields in object_info_t that aren't found in the bufferlist.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-12-02 16:56:48 -08:00
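
The point made in da5ab7c9a4 (a decoder must give defaults to fields that an older encoding does not carry) can be sketched as follows. The struct, the byte layout, and the version numbers here are invented for illustration and are not Ceph's actual `object_info_t` encoding or bufferlist API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for object_info_t: version 1 carried only `size`,
// version 2 appended a `lost` flag.
struct ObjInfo {
  uint64_t size;
  bool lost;
};

// Decode from a byte buffer laid out as
//   v1: [version:1][size:8, little-endian]
//   v2: [version:1][size:8, little-endian][lost:1]
ObjInfo decode_obj_info(const std::vector<uint8_t> &bl) {
  if (bl.size() < 9) throw std::runtime_error("buffer too short");
  const uint8_t version = bl[0];

  ObjInfo oi;
  oi.size = 0;
  for (int i = 0; i < 8; ++i)
    oi.size |= static_cast<uint64_t>(bl[1 + i]) << (8 * i);

  if (version >= 2) {
    if (bl.size() < 10) throw std::runtime_error("v2 buffer too short");
    oi.lost = bl[9] != 0;
  } else {
    oi.lost = false;  // field absent in v1: assign an explicit default
  }
  return oi;
}
```

Without the `else` branch, a v1 buffer would leave `lost` uninitialized, which is the kind of problem the commit message describes for a constructor that initializes everything from the buffer.
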
Greg Farnum
03eb4e7a07 man: add man page for cephfs
Add to Makefile, debian, and ceph.spec.in bits
2010-12-02 16:18:38 -08:00
Yehuda Sadeh
6518fae317 watch: some more linger fixes
2010-12-02 11:52:28 -08:00
Sage Weil
78a1462243 osd: fix log tail vs last_complete assert on replica activation
The last_complete may be below the log tail IFF we have a backlog.

Fixes 756918be3b.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 15:40:28 -08:00
Samuel Just
63fab458f6 rados_bencher.h:
bench_write and bench_seq will now wait on any write/read
	rather than the one least recently started.

	bench_write adds its pid to the BENCH_DATA object

	bench_read uses the pid in BENCH_DATA to generate the object
	names to read.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
2010-12-01 15:11:06 -08:00
Colin Patrick McCabe
0ea601ab26 Create SyslogStreambuf
SyslogStreambuf is a kind of stream buffer that allows you to output
characters from an ostream to syslog. Most standard IO streams can make
use of this Streambuf.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-12-01 15:00:23 -08:00
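
A stream buffer of the kind 0ea601ab26 describes might look roughly like the sketch below. It is not the SyslogStreambuf from the commit: the class name `LineSinkStreambuf`, the line-at-a-time buffering, and the injectable sink are assumptions made so the example can be exercised without a syslog daemon. The default sink forwards each line to the POSIX `syslog()` call.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <syslog.h>
#include <vector>

// Hypothetical line-buffering streambuf: collects characters and hands each
// complete line to a sink. The default sink forwards to syslog(3).
class LineSinkStreambuf : public std::streambuf {
 public:
  typedef std::function<void(const std::string &)> Sink;

  LineSinkStreambuf()
      : sink_([](const std::string &line) {
          syslog(LOG_INFO, "%s", line.c_str());
        }) {}
  explicit LineSinkStreambuf(Sink sink) : sink_(sink) {}

 protected:
  // No put area is configured, so every character written arrives here.
  int_type overflow(int_type ch) override {
    if (traits_type::eq_int_type(ch, traits_type::eof()))
      return traits_type::not_eof(ch);
    if (traits_type::to_char_type(ch) == '\n')
      flush_line();
    else
      line_.push_back(traits_type::to_char_type(ch));
    return ch;
  }

  // Reached via std::flush or std::endl; emits any partial line.
  int sync() override {
    if (!line_.empty()) flush_line();
    return 0;
  }

 private:
  void flush_line() {
    sink_(line_);
    line_.clear();
  }

  Sink sink_;
  std::string line_;
};
```

Attaching it to a stream is a matter of `LineSinkStreambuf sb; std::ostream log(&sb); log << "osd started" << std::endl;`. The buffer must outlive the stream that uses it.
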
Sage Weil
a3d8c52794 filestore: call lower-level do_transactions() during journal replay
We used to call apply_transactions, which avoided rejournaling anything
because the journal wasn't writeable yet, but that uses all kinds of other
machinery that relies on threads and finishers and such that aren't
appropriate or necessary when we're just replaying journaled events.

Instead, call the lower-level do_transactions() directly.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 13:48:56 -08:00
Sage Weil
9ecbc300cb filestore: do journal mode autodetect and sanity check _before_ replay
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 13:46:30 -08:00
Sage Weil
f9fa855a71 filestore: fix journal locking on trailing mode
We're already holding journal_lock due to the surrounding
op_submit_{start,finish}.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 11:05:11 -08:00
Sage Weil
0897edafc8 Merge branch 'testing' into rc
Conflicts:
	configure.ac
2010-12-01 10:20:43 -08:00
Sage Weil
cbb562089c rbd: use MIN instead of min()
Not even sure where min() was coming from, but it seems to be missing on
i386 lucid:

g++ -DHAVE_CONFIG_H -I.     -Wall -D__CEPH__ -D_FILE_OFFSET_BITS=64 -D_REENTRANT -D_THREAD_SAFE -rdynamic -g -O2 -MT rbd.o -MD -MP -MF .deps/rbd.Tpo -c -o rbd.o rbd.cc
rbd.cc: In function 'int do_import(void*, const char*, int, const char*)':
rbd.cc:837: error: no matching function for call to 'min(uint64_t&, off_t)'
make[3]: *** [rbd.o] Error 1

Reported-by: John Leach <john@johnleach.co.uk>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 10:20:24 -08:00
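
The compiler error quoted in cbb562089c is what `std::min` produces when its two arguments have different types, because the template deduces a single type from both. The commit switches to the `MIN` macro; the sketch below shows a different option, naming the template argument explicitly. The function name `bytes_this_pass` is hypothetical and does not come from rbd.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sys/types.h>

// std::min(a, b) deduces one type T from both arguments, so a call with an
// unsigned uint64_t and a signed off_t has no matching overload.
// Naming T explicitly makes both arguments the same type before comparing.
uint64_t bytes_this_pass(uint64_t remaining, off_t chunk) {
  // Assumes chunk is non-negative; a negative off_t would convert to a very
  // large unsigned value here.
  return std::min<uint64_t>(remaining, static_cast<uint64_t>(chunk));
}
```
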
Sage Weil
792b04ba1e client: connect to export targets on cap EXPORT
Also unconditionally connect on reconnect, even when there aren't any
outstanding requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 10:20:23 -08:00
Sage Weil
5bdae2af8c ceph v0.23.2
2010-12-01 10:03:26 -08:00
Sage Weil
bde0c72193 filestore: do not autodetect BTRFS_IOC_SNAP_CREATE_ASYNC until interface is finalized
Li has proposed an alternative V2 ioctl that looks nicer, so wait until
that is finalized.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 10:03:26 -08:00
Sage Weil
4592c22041 client: fix cap export handler
An EXPORT cap msg can race with a cap release; deal with that (realigning
this code with the kclient).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 09:44:26 -08:00
Laszlo Boszormenyi
15c272e8df man: fix monmaptool man page
I've found the manpage problem that I've noted before. It's about
monmaptool, the CLI says its usage is:
[--print] [--create [--clobber]] [--add name 1.2.3.4:567] [--rm name]
<mapfilename>
But the manpage states this as an example:
monmaptool --create --add 192.168.0.10:6789 --add 192.168.0.11:6789 --add
192.168.0.12:6789 --clobber monmap
This omits the 'name' after the '--add' switch, resulting in
"invalid ip:port '--add'" as an error message. The attached patch fixes this
inconsistency.

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
2010-12-01 09:24:45 -08:00
Sage Weil
6d96104e55 osd: simplify scrub sanity checks
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 16:50:41 -08:00
Sage Weil
76b55c8a12 osd: only adjust osd scrub_pending if pg was reserved
If for some reason we enter scrub() without scrub_reserved == true, don't
adjust the osd->scrubs_pending or we'll screw up the accounting.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 16:50:25 -08:00
Sage Weil
260840f563 mds: fix import_reverse re-exporting of caps
Make the import_reverse() set the pin/state before it clears them by using
the helper that sets them.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 16:38:21 -08:00
Sage Weil
fe9fad7bea v0.25~rc
2010-11-30 16:25:50 -08:00
Sage Weil
109e3f180b mds: turn off mds_bal_frag until resolve vs split/merge is fixed
See #594

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 16:25:15 -08:00
Sage Weil
f216b0200b Merge remote branch 'origin/lost' into unstable
Conflicts:
	src/osd/osd_types.h
2010-11-30 16:11:20 -08:00
Colin Patrick McCabe
0cc8d34e7f osd: refactor object_info_t constructor a bit
Create a copy constructor for object_info_t, since we often want to copy
an object_info_t and would rather not try to remember all the fields.
Drop the lost parameter from one of the other constructors, because it's
not used that much.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:49 -08:00
Colin Patrick McCabe
cee3cd51fc osd: share_pg_log: update peer_missing
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
e9ccd7eb09 osd: mark_obj_as_lost: fix oloc init, eversion
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
c29fbb12e0 osd: mark_all_unfound_as_lost: bugfix, refactor
mark_all_unfound_as_lost: just delete items from the rmissing set as we
find them, rather than using a multi-pass system.

Update info.last_update as we go so that log printouts will look correct
(the log printout function checks info.last_update)

Don't remove from missing or missing_loc in mark_obj_as_lost.
PG::missing_loc should never have the soid, and PG::missing we handle
elsewhere.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
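
The single-pass approach mentioned in c29fbb12e0 (deleting items from the rmissing set as they are found, rather than collecting them and removing them in a later pass) corresponds to the usual erase-while-iterating pattern. The container types and the `found` set below are assumptions for illustration; they are not the PG data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical shapes: rmissing maps a version number to an object name, and
// `found` holds the names that some peer can still supply.
// Walk rmissing once, erase each unfound entry as it is handled, and return
// the names that were marked lost.
std::vector<std::string> mark_unfound_as_lost(
    std::map<uint64_t, std::string> &rmissing,
    const std::set<std::string> &found) {
  std::vector<std::string> lost;
  for (auto it = rmissing.begin(); it != rmissing.end();) {
    if (found.count(it->second)) {
      ++it;                     // still recoverable: leave it in place
    } else {
      lost.push_back(it->second);
      it = rmissing.erase(it);  // erase returns the next valid iterator
    }
  }
  return lost;
}
```
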
Colin Patrick McCabe
b46f847cf9 osd: mark_obj_as_lost: don't assume we have obj
In PG::mark_obj_as_lost, we have to mark a missing object as lost. We
should not assume that we have an old version of the missing object in
the ObjectStore. If the object doesn't exist in the object store, we
have to create it so that recovery can function correctly.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
5e243f3ee8 osd: create lost2 test
This one verifies:
1. Client asks for an unfound object and gets put to sleep
2. Object gets declared lost
3. Client wakes up

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
55f7e567de osd: mark_all_unfound_as_lost: set lost attr
In mark_all_unfound_as_lost, we need to set the lost bit in the objects'
object_info_t.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
d5e6cae2f4 radostool: fix memleak in error path
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
c281e1e073 osd: mark_all_unfound_as_lost: wake waiters
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:48 -08:00
Colin Patrick McCabe
b15a97c71e test_lost: add lost1 test
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:47 -08:00
Colin Patrick McCabe
ad4e5f36d4 osd: ReplicatedPG::do_op: error on read-from-lost
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:48:47 -08:00
Colin Patrick McCabe
136dfdeb70 osd: don't mark objs as lost unless we're active
We don't have enough information to mark objects as lost until we
activate the PG. might_have_unfound isn't even built until PG::activate.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:47:09 -08:00
Sage Weil
08bd4eadd2 mds: fix resolve for surviving observers
Make all survivors participate in resolve stage, so that survivors can
properly determine the outcome of migrations to the failed node that did
not complete.

The sequence (before):
 - A starts to export /foo to B
 - C has ambiguous auth (A,B) in its subtree map
 - B journals import_start
 - B fails
...
 - B restarts
 - B sends resolves to everyone
   - does not claim /foo
 - A sends resolve _only_ to B
   - does claim /foo
 - B knows its import did not complete
 - C doesn't know anything.  Also, the maybe_resolve_finish stuff was
   totally broken because the recovery_set wasn't initialized

See new (commented out) assert in Migrator.cc to reproduce the above.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 15:43:53 -08:00
Colin Patrick McCabe
c0e60afea5 test: dump_osd_store: sort dump output
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:43:44 -08:00
Colin Patrick McCabe
e555899cd3 osd: active replicas process logs from primaries
In _process_pg_info, if the primary sends us a PG::Log, a replica should
merge that log into its own.

mark_all_unfound_as_lost / share_pg_log: don't send the whole PG::Log.
Just send the new entries that were just added when marking the objects
as lost.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:43:44 -08:00
Colin Patrick McCabe
de09422497 osd: object_info_t: add lost field
We can now permanently mark objects as lost by setting the lost bit in
their object_info_t. Rev the object_info_t struct.

get_object_context: re-arrange this so that we're always setting the
lost bit. Also avoid some unnecessary steps.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:43:44 -08:00
Colin Patrick McCabe
80f3ea10f5 Add ./ceph dump pg debug degraded_pgs_exist
./ceph dump pg debug degraded_pgs_exist returns TRUE if some pgs are
degraded; false otherwise.

tests: move start_recovery into test_common.sh.
Create recovery1 test.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:43:44 -08:00
Colin Patrick McCabe
fb4734be56 (re)add mechanism for marking objects as lost
In activate_map, we now mark objects that we know are unfindable as
lost. This relies on the might_have_unfound set introduced earlier.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-30 15:43:44 -08:00
Yehuda Sadeh
1123b5c588 osd, librados: misc fixes, linger related issues
2010-11-30 13:21:50 -08:00
Sage Weil
bf784cdb4f osd: fix object_info_t() initialization of oloc
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-30 12:57:43 -08:00