Commit Graph

19576 Commits

Author SHA1 Message Date
Sage Weil
8ec476e526 osd: do not mark pg clean until active is durable
Do not mark a PG CLEAN or set last_epoch_clean until after the PG activate
is stable on all replicas.

This effectively means that last_epoch_clean will never fall in an interval
that follows last_epoch_started's interval.  It *can* be >
last_epoch_started when it falls within the same interval.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:09 -07:00
Sage Weil
86aa07d7a9 osd: check against last_peering_reset in _activate_committed
We are checking against last_peering_reset in _activate_committed(), so we
need to pass in that value to compare against; last_peering_reset may be
greater than same_interval_since, e.g. on a replica that learns about the
PG after the initial creation epoch.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:09 -07:00
Sage Weil
9d7ec04b69 osd: tweak slow request warnings
- always include 'slow request' in the warning string
- only summarize if we warn about anything (they all may have backed off)
- be more concise

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:04 -07:00
Sage Weil
a4b42fc3ce keyring: clean up error output
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:13:41 -07:00
Sage Weil
ae0ca7be3c keyring: catch key decode errors
Return EINVAL on decoding errors.

Other decode_base64() callers are already guarded.

Fixes: #2124
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:13:28 -07:00
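
A minimal sketch of the pattern this commit describes: catch a decode failure and return EINVAL instead of letting the exception escape. `check_base64()` and `load_key()` are hypothetical stand-ins invented for illustration; they are not the keyring code or the real `decode_base64()`.

```cpp
#include <cassert>   // used by the usage check mentioned below
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a decoder that throws on malformed input.
// It only validates the alphabet; it does not actually decode anything.
void check_base64(const std::string &s) {
  for (char c : s) {
    bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
              (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    if (!ok)
      throw std::invalid_argument("malformed base64 input");
  }
}

// Hypothetical caller: guard the decode and map failures to -EINVAL.
int load_key(const std::string &encoded) {
  try {
    check_base64(encoded);
  } catch (const std::exception &) {
    return -EINVAL;  // report the bad key instead of crashing
  }
  return 0;
}
```

Under these assumptions, `load_key("AQBk")` returns 0 and `load_key("not base64!")` returns `-EINVAL`.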
Sage Weil
6812309edf debian: depend on uuid-runtime
We use uuidgen for osd creation.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:03:56 -07:00
Sage Weil
3509b039a2 safe_io: int -> ssize_t
int is 32 bits even on 64-bit archs, but ssize_t is 64 bits there.  This fixes
overflow when reading large (>2GB) extents.

Fixes: #2275
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:01:44 -07:00
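
A small sketch of why the wider type matters, assuming an LP64 platform (64-bit Linux, for example) where `int` is 32 bits and `ssize_t` is 64 bits. `total_bytes_read()` and `exceeds_int()` are hypothetical helpers, not functions from safe_io.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical helper: sum the byte counts returned by successive reads.
// With an int accumulator the total would overflow past INT_MAX (about
// 2 GiB); an ssize_t accumulator has room on LP64 platforms.
ssize_t total_bytes_read(const ssize_t *chunk_sizes, size_t n) {
  ssize_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(chunk_sizes[i] >= 0);  // negative values would be error codes
    total += chunk_sizes[i];
  }
  return total;
}

// True if a length cannot be represented in an int.
bool exceeds_int(ssize_t len) {
  return len > static_cast<ssize_t>(INT_MAX);
}
```

Three reads of 1 GiB each sum to 3221225472 bytes, which `exceeds_int()` reports as too large for an `int`.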
Sage Weil
203a7d67aa objectcacher: wait directly from writex()
This gives us access to the original ObjectExtent (useful later), and
simplifies the callers.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 20:31:01 -07:00
Sage Weil
991c93ed27 mon: fix call to get_uuid() on nonexistent osd
Didn't catch this with vstart.sh testing.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 16:02:00 -07:00
Yehuda Sadeh
150adcceb6 debian: add rules for rest-bench
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:55:34 -07:00
Yehuda Sadeh
53f642e29a rest-bench: build conditionally
added configure --with-rest-bench, and configure --with-system-libs3

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:55:25 -07:00
Yehuda Sadeh
f60444fbeb obj_bencher: changed interface
Passing bufferlist and not const bufferlist in aio_write(). We assign
it to another object which is not const, and it doesn't work too
well.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:27 -07:00
Yehuda Sadeh
d54ef1c805 rest-bench: change thread context for libs3 calls
Apparently S3_put_object() and S3_get_object() need to
run on the same thread as S3_runall_request_context() (at least
per context). So we now call them in the workqueue thread.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:27 -07:00
Yehuda Sadeh
6832231806 rest-bench: change command line arg for seconds
seconds should be a param, not a command.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
99791327cb obj_bencher: fix data encoding
There was a bug when doing a read with multiple threads, when
one of the threads was left behind; when it returned the compared
data string might have been cluttered by newer strings that
were longer.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
76a5c8993d obj_bencher: use better round robin for completion slot scan
Start where we left off last time; don't start from zero.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
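
The round-robin scan described in this commit can be sketched as follows. `SlotScanner` and its members are hypothetical names chosen for illustration and do not come from obj_bencher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slot scanner: remembers where the previous scan stopped.
class SlotScanner {
public:
  explicit SlotScanner(size_t n) : busy_(n, false), next_(0) {}

  void set_busy(size_t slot, bool busy) {
    assert(slot < busy_.size());
    busy_[slot] = busy;
  }

  // Returns the index of a free slot, or -1 if every slot is busy.
  // The scan resumes just past the slot returned last time instead of
  // restarting from slot 0.
  int find_free() {
    const size_t n = busy_.size();
    for (size_t i = 0; i < n; ++i) {
      size_t slot = (next_ + i) % n;
      if (!busy_[slot]) {
        next_ = (slot + 1) % n;
        return static_cast<int>(slot);
      }
    }
    return -1;
  }

private:
  std::vector<bool> busy_;
  size_t next_;  // where the next scan starts
};
```

Resuming from the last position spreads work across all slots; a scan that always starts at zero keeps favouring the low-numbered ones.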
Yehuda Sadeh
e2eb825602 rest-bench: reuse libs3 handle
This is necessary for keep-alive to be useful. Otherwise a new
connection will be created for each request.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
e62fd7f987 obj_bencher: fix param order
seq benchmark was broken, passed params in wrong order.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
7e96a4a862 rest-bench: use refcount for req_state life cycle
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
072c316a11 rest-bench: multiple fixes
write seems to work

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
4fe068ecf8 rest-bench: cleanups, initialization
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
6e043808b7 rest-bench: create workqueue for requests dispatching
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
13296a5453 rest_bench: cleanups, implement get and put
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
54da3e6d7c rest_bench: some more implementation
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
e6026fee15 rest_bench: initial work
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
f9d9fb6a68 rados_bencher: abstract away rados specific operations
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
0fbc3c53b9 rados_bencher -> obj_bencher
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
1a8eea853d rados_bencher: fix build
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
a17124efad rados_bencher: restructure code, create RadosBencher class
Preparing for different benchmark backend.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Yehuda Sadeh
ddb858c5db rados_bencher: restructure code (initial work)
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:26 -07:00
Sage Weil
3e260aec72 librados: call safe callback on read operation
This avoids confusion for the user who isn't sure if they should wait for
complete or safe on a read aio.  It also means that you can always wait
for safe for both reads or writes, which can simplify some code.

Dup the roundtrip functional tests to verify this works.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@inktank.com>
2012-05-04 15:26:33 -07:00
Sage Weil
edd73e2e41 crush: note that tree bucket size is tree size, not item count
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 14:51:15 -07:00
Sage Weil
4061fca545 Merge remote-tracking branch 'gh/wip-crush-forcefeed'
Reviewed-by: Sam Just <sam.just@inktank.com>
2012-05-04 14:16:49 -07:00
Sage Weil
ce60e1be21 OpRequest: ignore all ops while the oldest one is still young.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-05-04 14:15:59 -07:00
Sage Weil
f3043fee3e objectcacher: don't wait for write waiters; wait after dirtying
We do three things here:

- Wait for the dirty limit to drop _after_ writing into the cache.  This
  means that an active thread can always provide its dirty data to the
  cache for potential writing without waiting (a small win).  It's also
  helpful later... (see below, and next commit)

- Don't wait for other waiters.  If another thread is dirtying 1MB and is
  waiting for it, don't wait for them too.  This prevents two threads
  writing 1MB at a time with a limit of 1MB from serializing: both can
  dirty their 1MB and initiate a flush, and once 1/2 of that has
  flushed one of them will be allowed to proceed.

- Update the flusher to add the dirty_waiting bytes to the amount to
  write so that the OPs will indeed be parallel.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 13:12:58 -07:00
Sage Weil
f3760da4fe crush: update_item() should pass an error back to the caller
If you give it a nonsensical loc, it will fail check_item_loc() (false) and
then error out on insert_item().

Reported-by: Sam Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 12:09:28 -07:00
Sage Weil
e0a636f907 crush: improve docs/comments for check_item_loc and insert_item semantics
We don't adjust the internal hierarchy structure (currently).  This is a
bit confusing, so describe the semantics in some detail.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 11:06:27 -07:00
Sage Weil
878423f963 crush: comment and clean up checks for check_item_loc and insert_item
- drop useless cur for check_item_loc
- comment the checks we're doing so the code is understandable
- use name_exists instead of broken get_item_id != 0 check

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 11:05:34 -07:00
Sage Weil
845e2aa56d Merge branch 'wip-crush-update'
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-05-03 20:44:20 -07:00
Sage Weil
720bea4a71 Merge branch 'wip-osd-uuid'
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-05-03 20:43:54 -07:00
Sage Weil
2629474f52 global_init: do not count threads before daemonize()
We were verifying that there was only 1 thread (presumably main()) when
we call daemonize.  However, with the new logging code, we stop a thread
right before the check, and /proc apparently updates asynchronously such
that our attempt to count running threads gives us a bad answer.

Just remove this kludgey check; we'll have to catch this class of bugs
the hard way.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-05-03 20:42:21 -07:00
Sage Weil
72538c0f18 Makefile: fix $shell_scripts substitution
No spaces here, apparently!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-03 20:40:20 -07:00
Sage Weil
16461acf6a mon: simplify 'osd create <uuid>' command
Make the flow clearer for the three cases (exists, about to exist, new).

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-03 20:34:38 -07:00
Sage Weil
42f2d2fd65 crushtool: another simple test for update
If the weight doesn't change it should be a no-op.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-03 20:33:35 -07:00
Sage Weil
9772d13218 crush: document return values
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-03 20:28:27 -07:00
Sage Weil
1cd6f76420 crush: compare fixed-point weights in update_item
This is less ugly than converting the quantized value back to a float and
comparing that.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-03 20:28:21 -07:00
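
A sketch of comparing weights in fixed-point form, assuming the 16.16 representation in which 1.0 is stored as 0x10000. `to_fixed()` and `weight_changed()` are hypothetical helpers and are not taken from the CRUSH code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 16.16 fixed point, so 1.0 is stored as 0x10000.
inline uint32_t to_fixed(float w) {
  assert(w >= 0.0f);
  return static_cast<uint32_t>(w * 0x10000);
}

// Hypothetical check: quantize the requested weight once and compare
// integers, rather than converting the stored value back to a float.
inline bool weight_changed(uint32_t stored_fixed, float requested) {
  return to_fixed(requested) != stored_fixed;
}
```

Comparing in the quantized domain means two requests that round to the same stored value are treated as equal, so an update with an unchanged weight can be a no-op.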
Sage Weil
c03b852187 thread: remove get_num_threads() static
This looks in /proc to count threads.  Kludgey and no longer needed.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-05-03 20:01:02 -07:00
Sage Weil
e50932c204 global_init: do not count threads before daemonize()
We were verifying that there was only 1 thread (presumably main()) when
we call daemonize.  However, with the new logging code, we stop a thread
right before the check, and /proc apparently updates asynchronously such
that our attempt to count running threads gives us a bad answer.

Just remove this kludgey check; we'll have to catch this class of bugs
the hard way.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-05-03 20:00:36 -07:00
Sage Weil
684558ace9 crush: clean up check_item_loc() comments
Thanks Greg!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-03 18:59:02 -07:00
Joao Eduardo Luis
27d98d2419 OpRequest: only show a small set of the oldest messages, instead of all.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-05-03 15:49:51 -07:00