Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end. This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.
Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state. It's
also a bit more defensive.
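
A minimal sketch of the kind of bounds check this describes. The helper
name device_is_out() and the use of std::vector are assumptions made for
illustration, not the actual CRUSH code, and the real weight test is more
involved than a zero check:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a device id that falls outside the weight vector is
// treated as "out" instead of being used to index past the end.
bool device_is_out(const std::vector<uint32_t>& weights, int item) {
  if (item < 0 || static_cast<std::size_t>(item) >= weights.size())
    return true;               // no weight recorded for this id
  return weights[item] == 0;   // simplified: zero weight means out
}
```

With a two-entry weight vector, id 5 is reported as out rather than being
read from beyond the vector.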
Signed-off-by: Sage Weil <sage@inktank.com>
It is possible that the crush map contains device ids that do not exist as
osds. Filter them out of the CRUSH result.
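
A rough sketch of the filtering step, assuming the set of existing osd
ids is available as a std::set. filter_nonexistent() is a hypothetical
name, not the function touched by this change:

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical sketch: keep only the ids in a CRUSH result that correspond
// to existing osds, preserving the original order.
std::vector<int> filter_nonexistent(std::vector<int> result,
                                    const std::set<int>& existing) {
  result.erase(std::remove_if(result.begin(), result.end(),
                              [&](int id) { return existing.count(id) == 0; }),
               result.end());
  return result;
}
```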
Signed-off-by: Sage Weil <sage@newdream.net>
These default arguments, although handy when we just want to run the test,
just mess things up when we don't actually need them. If we don't specify
them on the CLI, we'll end up using the default ones, and that is just
annoying.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Use '--test-max-in-flight VAL' (default: 50) or check '--help' for more.
Also, allow the test to work even if we don't specify a conf file.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Give the user more control over the test. It is now possible to:
- set the sizes of the data written by the operations;
- suppress specific operations from execution;
- view the throughput;
- specify the periodicity of throughput output.
For the CLI options, '--help' should suffice.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
There was a silent bug in the activate 'acks' that go from the replica back
to the primary. Prior to 86aa07d7a91ac23074e76551c3a6db3a5736cffa, we
were passing same_interval_since to the callback, which meant that
sometimes _activate_committed() would ignore it and we wouldn't update
last_epoch_started. This was mostly invisible; the next peering event would
just, in some cases, look at more past intervals than it needed to.
In 86aa07d7a91ac23074e76551c3a6db3a5736cffa we fixed this so that the check
is correct. (We noticed because now we aren't setting the pg CLEAN flag
until after last_epoch_started is updated.) That, in turn, revealed a
similar bug that we're fixing here: the replica's last_peering_reset could
be lower than the primary's, such that the activate 'ack' info is ignored.
To fix this, simply set last_peering_reset to the current epoch when the
replica activates; this will always be greater than the primary's.
Signed-off-by: Sage Weil <sage@inktank.com>
The various rgw tools were all recompiling my_libradosgw_src files over
again. Instead build a single .a (not .la!) and link that in.
Signed-off-by: Sage Weil <sage@inktank.com>
Make ObjectCacher users specify the cache size for each ObjectCacher
instance. This avoids the confusing config namespace for the object
cache (client_oc_*), and also will make it possible to eventually have
cache sizes that vary between (say) RBD images.
- drop unused client_oc_max_sync_write
- add rbd_cache_max_size, max_dirty, target_dirty config values (these are
the defaults for each image)
We probably want to add librbd calls to specify the cache size on a
per-image basis? Alternatively, we should make it possible to share a
cache pool between multiple images in some explicit way.
Signed-off-by: Sage Weil <sage@newdream.net>
Once upon a time the caller would do this, but none of those have survived,
and this makes more sense.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the max_dirty config is 0, switch to write-thru mode, which will
explicitly flush and wait on the range we just dirtied.
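
A toy illustration of the mode switch, under the assumption that a write
first dirties a range and then decides whether to flush it. TinyCache and
its members are hypothetical and do not mirror the ObjectCacher
interface; the "wait" is modelled as a synchronous call:

```cpp
#include <cstdint>

// Hypothetical cache model: with max_dirty == 0 every write is flushed
// immediately (write-thru); otherwise the data stays dirty (write-back).
class TinyCache {
public:
  explicit TinyCache(uint64_t max_dirty) : max_dirty_(max_dirty) {}

  void write(uint64_t len) {
    dirty += len;
    if (max_dirty_ == 0)
      flush(len);  // flush the range we just dirtied before returning
  }

  void flush(uint64_t len) {
    uint64_t n = len < dirty ? len : dirty;
    dirty -= n;
    flushed += n;
  }

  uint64_t dirty = 0;    // bytes currently dirty
  uint64_t flushed = 0;  // bytes written out so far

private:
  uint64_t max_dirty_;
};
```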
Closes: #2335
Signed-off-by: Sage Weil <sage@newdream.net>
A common pattern is to search for the first buffer intersecting or
following an object offset. Use a helper for that.
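
One way such a helper can be written over a map keyed by buffer start
offset, assuming buffers do not overlap. The names Buffer, BufferMap and
first_at_or_after() are illustrative only:

```cpp
#include <cstdint>
#include <iterator>
#include <map>

struct Buffer { uint64_t length; };            // hypothetical buffer record
using BufferMap = std::map<uint64_t, Buffer>;  // keyed by start offset

// Return the first buffer that intersects or follows 'off', or end().
BufferMap::const_iterator first_at_or_after(const BufferMap& m, uint64_t off) {
  auto p = m.lower_bound(off);  // first buffer starting at or after off
  if (p != m.begin()) {
    auto prev = std::prev(p);
    if (prev->first + prev->second.length > off)
      return prev;              // the preceding buffer still covers off
  }
  return p;
}
```

The step back to the preceding entry is what distinguishes this from a
plain lower_bound(): a buffer that starts before the offset but extends
past it is still returned.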
Signed-off-by: Sage Weil <sage@newdream.net>
Add ability to flush a range of an object, or a vector of ObjectExtents. Flush
any buffers that intersect the specified range, or the entire object if len==0.
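
A simplified sketch of the range semantics, including the len==0 case.
DirtyBuffer and flush_range() are hypothetical stand-ins, and the flush
itself is reduced to clearing a flag:

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct DirtyBuffer { uint64_t length; bool dirty; };  // hypothetical record

// Flush dirty buffers intersecting [off, off+len), or every dirty buffer
// when len == 0. Returns the start offsets of the buffers that were flushed.
std::vector<uint64_t> flush_range(std::map<uint64_t, DirtyBuffer>& buffers,
                                  uint64_t off, uint64_t len) {
  std::vector<uint64_t> flushed;
  for (auto& kv : buffers) {
    const uint64_t start = kv.first;
    const uint64_t end = start + kv.second.length;
    const bool intersects = (len == 0) || (start < off + len && end > off);
    if (intersects && kv.second.dirty) {
      kv.second.dirty = false;  // stand-in for issuing the actual write
      flushed.push_back(start);
    }
  }
  return flushed;
}
```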
Signed-off-by: Sage Weil <sage@newdream.net>
Do not mark a PG CLEAN or set last_epoch_clean until after the PG activate
is stable on all replicas.
This effectively means that last_epoch_clean will never fall in an interval
that follows last_epoch_started's interval. It *can* be >
last_epoch_started when it falls within the same interval.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We are checking against last_peering_reset in _activate_committed(), so we
need to pass in that value to compare against; last_peering_reset may be
greater than same_interval_since, e.g. on a replica that learns about the
PG after the initial creation epoch.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- always include 'slow request' in the warning string
- only summarize if we warn about anything (they all may have backed off)
- be more concise
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
int is 32 bits on 64-bit archs, but ssize_t is 64 bits. This fixes overflow
when reading large (>2GB) extents.
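
To illustrate the size limit, here is a small check with fixed-width
types, where int32_t stands in for int and int64_t for ssize_t on a
typical 64-bit platform. fits_in_32_bits() is a hypothetical helper, not
part of this change:

```cpp
#include <cstdint>
#include <limits>

// Returns true if a 64-bit length can be stored in a 32-bit signed integer
// without overflowing.
bool fits_in_32_bits(int64_t len) {
  return len >= std::numeric_limits<int32_t>::min() &&
         len <= std::numeric_limits<int32_t>::max();
}
```

A 3 GB length does not fit, which is why a read of that size cannot be
reported correctly through an int return value.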
Fixes: #2275
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Pass a bufferlist rather than a const bufferlist to aio_write(). We assign
it to another object that is not const, and that doesn't work too well
with a const argument.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Apparently S3_put_object() and S3_get_object() need to
run on the same thread as S3_runall_request_context() (at least
per context), so we now call them in the workqueue thread.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>