The code in common_preinit is still there to override
these settings as appropriate.
The set_daemon_default stuff was breaking ceph-conf tests (because
you would get the client-side defaults when asking about an OSD's
settings), and md_config_t isn't properly identifying daemons
using code_env yet.
Ticket to add it back in:
http://tracker.ceph.com/issues/20627
Signed-off-by: John Spray <john.spray@redhat.com>
This was a string in the old schema, and tests
depended on that -- if we want to change its type
let's do that separately to the infrastructure changes.
Signed-off-by: John Spray <john.spray@redhat.com>
These were awkward for typing of the '1' literal vs.
the int64_t settings. The whole max() thing is also
unnecessary now, if we set proper bounds on the option
definitions.
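A minimal sketch of the idea, assuming a hypothetical BoundedOption struct
(it is not the real Option class): once the bounds are enforced when the
option is set, callers no longer need a max() against a literal.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for an option definition with bounds;
// not the real Option class from options.h.
struct BoundedOption {
  int64_t min;
  int64_t max;
  int64_t value;

  // Reject out-of-range values when the option is set, so that
  // consumers do not have to clamp the value at every use site.
  void set(int64_t v) {
    if (v < min || v > max) {
      throw std::out_of_range("value outside option bounds");
    }
    value = v;
  }
};

// std::max(1, opt.value) would not compile here: template deduction
// sees both int and int64_t. With a lower bound of 1 on the option,
// the call is unnecessary and the value can be used directly.
inline int64_t checked_value(const BoundedOption& opt) {
  return opt.value;
}
```

With min set to 1, a set(0) is rejected up front and the previous value is
kept.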
Signed-off-by: John Spray <john.spray@redhat.com>
As long as some options are being consumed
via md_config_t:: members, various users
of (unsigned) int values will get compile warnings
when they e.g. compare them with other unsigned values.
Signed-off-by: John Spray <john.spray@redhat.com>
The C++ class member fields continue to exist for
settings defined in common/legacy_config_opts.h, but
all the schema information is coming from common/options.cc
now.
The values in md_config_t::values are automatically
copied into the C++ class member fields for legacy config
options as needed.
Signed-off-by: John Spray <john.spray@redhat.com>
Previously these were all in one header and inclusions of it
got really verbose from everyone having to define SUBSYS and OPTION
macros even though they only wanted to pick out one or the other.
Also, this separates the subsys.h stuff (staying) from the legacy
config opt definitions (transitional, will go).
Signed-off-by: John Spray <john.spray@redhat.com>
These will be replaced by validate methods
on Option subclasses that need them. The code
that was in these files moved to options.[h|cc]
Signed-off-by: John Spray <john.spray@redhat.com>
It's a poor substitute for a real concurrency solution,
but for the moment carry it forward so that the options
structure can replace the list of config_option
in md_config_t.
Signed-off-by: John Spray <john.spray@redhat.com>
We can have a legacy (static field) config object
that includes fields from config_opts.h, and
then switch to using dynamic get() for newly
added options, so that we don't need to do
code generation for the new config infrastructure.
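A rough sketch of the split, assuming a hypothetical SketchConfig type and
made-up option names (none of this matches md_config_t). For brevity it only
handles int64_t values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical miniature of the idea; names are illustrative only.
struct SketchConfig {
  // Legacy option: still exposed as a static (compile-time) field.
  int64_t legacy_max_open_files = 0;

  // All options, legacy and new, live in a runtime table.
  std::map<std::string, int64_t> values;

  void set(const std::string& name, int64_t v) {
    values[name] = v;
    // Mirror the legacy option into its member field.
    if (name == "legacy_max_open_files") {
      legacy_max_open_files = v;
    }
  }

  // New options are read dynamically; no generated member is needed.
  int64_t get(const std::string& name) const {
    auto it = values.find(name);
    if (it == values.end()) {
      throw std::out_of_range("unknown option: " + name);
    }
    return it->second;
  }
};
```

Existing callers keep reading conf.legacy_max_open_files, while code that uses
a newly added option calls conf.get("new_option") instead.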
Signed-off-by: John Spray <john.spray@redhat.com>
Define schema for config options. Add a helper to generate a header fragment
to declare the types.
Unlike the old config_opts.h approach, we will not initialize values in
the header. This avoids a recompile if there is a change and also allows
us to specify different defaults and do parsing and validation at runtime.
Instead, we'll initialize values in the constructor of the containing
class.
Signed-off-by: Sage Weil <sage@redhat.com>
According to AWS S3 in this document[1], an ACL can have up to 100
grants.
If the number of grants is larger than 100, S3 returns the following:
400
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MalformedACLError</Code><Message>The XML you provided was not well-formed or did not validate against our published schema</Message><RequestId>10EC67824572C378</RequestId><HostId>AWL3NnQChs/HCfOTu5MtyEc9uzRuxpYMhmvXQry2CovCcuxO2/tMqY1zGoWOur86ipQt3v/WEiA=</HostId></Error>
Now, if the number of ACL grants in a request is larger than the maximum
allowed, rgw returns the following:
400
<?xml version="1.0" encoding="UTF-8"?><Error><Code>MalformedACLError</Code><Message>The request is rejected, because the acl grants number you requested is larger than the maximum 101 grants allowed in an acl.</Message><BucketName>222</BucketName><RequestId>tx000000000000000000017-00596b5fad-101a-default</RequestId><HostId>101a-default-default</HostId></Error>
The maximum number of ACL grants can be set in the config file with the option:
rgw_acl_grants_max_num
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
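The check can be sketched as follows. AclCheckResult and check_acl_grants are
hypothetical names, and grants_max_num stands in for the
rgw_acl_grants_max_num setting; rgw's real error handling differs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type used only for this sketch.
struct AclCheckResult {
  int http_status;
  std::string code;
};

// Reject an ACL whose grant count exceeds the configured maximum.
// grants_max_num stands in for the rgw_acl_grants_max_num option.
inline AclCheckResult check_acl_grants(std::size_t num_grants,
                                       std::size_t grants_max_num) {
  if (num_grants > grants_max_num) {
    return {400, "MalformedACLError"};
  }
  return {200, ""};
}
```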
Signed-off-by: Enming Zhang <enming.zhang@umcloud.com>
To support running in dynamic environments (like Kubernetes) the mon needs
to be able to advertise an IP address that is different from the IP address
that it listens on locally.
Added a new config option "public_bind_addr" which if set becomes the address
that the mon will bind to locally. If empty (the default) the public_addr
will be used to bind locally.
Added a new function on Messenger, set_addr, which is called by ceph-mon to set
the advertised address after doing the bind.
Also relaxed the "wrong node!" errors in AsyncMessenger and SimpleMessenger, as
it's now valid to talk to a peer whose peer_addr_of_me is different from what
we expect.
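The address selection can be sketched like this. MonAddrs and choose_mon_addrs
are hypothetical names, and treating public_addr as the advertised address is
an assumption drawn from the description above, not the actual ceph-mon code.

```cpp
#include <cassert>
#include <string>

// Hypothetical pair of addresses used only for this sketch.
struct MonAddrs {
  std::string bind_addr;        // what the mon listens on locally
  std::string advertised_addr;  // what peers are told to connect to
};

inline MonAddrs choose_mon_addrs(const std::string& public_addr,
                                 const std::string& public_bind_addr) {
  MonAddrs a;
  // If public_bind_addr is empty (the default), bind to public_addr.
  a.bind_addr = public_bind_addr.empty() ? public_addr : public_bind_addr;
  // Assumption: the advertised address is always public_addr.
  a.advertised_addr = public_addr;
  return a;
}
```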
Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
This option allows us to disable the crush smoke test when creating pools,
injecting crush maps, or making other changes. DANGER DANGER.
Signed-off-by: Sage Weil <sage@redhat.com>
It is normal for the initial cluster to lack a mgr. Wait for some
grace period before complaining about a missing mgr.
Default to 30m.
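A minimal sketch of the grace check, assuming a hypothetical helper named
should_warn_no_mgr; how the mon actually tracks the cluster age is not shown
here.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: only complain about a missing mgr once the
// cluster has existed for longer than the grace period (30m default).
inline bool should_warn_no_mgr(
    bool have_mgr,
    std::chrono::seconds cluster_age,
    std::chrono::seconds grace = std::chrono::minutes(30)) {
  return !have_mgr && cluster_age > grace;
}
```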
Signed-off-by: Sage Weil <sage@redhat.com>
This is a follow-up change to https://github.com/ceph/ceph/pull/15976
and makes the bluestore cache capacity self-adaptive for
different backends.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
common: Update the error string when res_nsearch() or res_search() fails
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Sage Weil <sweil@redhat.com>
- The key change is the type of rval,
that will call the conversion when en/decoded
- Remainder is fixes for the type change and promotions
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
In luminous we now have full support for EC pools, so it is no longer
good to keep hardcoding the default pool type to "replicated".
Introduce a configurable osd_pool_default_type for this.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Kill old mgr_modules option.
Add new mgr_initial_modules option, on the mon, for the initial cluster
mgrmap.
Add ls, enable, disable commands.
Respawn mgr if the module list changes. In the future we could enable
new modules without a full restart, but disabling probably requires (and
is best handled by) a respawn.
Signed-off-by: Sage Weil <sage@redhat.com>
* common/dns_resolve: also collect the priority from the SRV
records
* mon/MonClient: only connect to the mon with highest priority
(in context of SRV record, the lowest priority value), and prefer
the mon with higher weight.
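The selection rule can be sketched as below. SrvRecord and preferred_mons are
hypothetical names, and ordering deterministically by descending weight is a
simplification for illustration; the real MonClient may pick among equal
priorities differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type; not the structure used by dns_resolve.
struct SrvRecord {
  std::string host;
  uint16_t priority;  // lower value means higher priority
  uint16_t weight;    // higher value is preferred within a priority
};

// Keep only the records with the lowest priority value, then order
// them so that heavier records come first.
inline std::vector<SrvRecord> preferred_mons(std::vector<SrvRecord> records) {
  if (records.empty()) {
    return records;
  }
  const uint16_t best = std::min_element(
      records.begin(), records.end(),
      [](const SrvRecord& a, const SrvRecord& b) {
        return a.priority < b.priority;
      })->priority;
  records.erase(
      std::remove_if(records.begin(), records.end(),
                     [best](const SrvRecord& r) { return r.priority != best; }),
      records.end());
  std::stable_sort(records.begin(), records.end(),
                   [](const SrvRecord& a, const SrvRecord& b) {
                     return a.weight > b.weight;
                   });
  return records;
}
```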
Fixes: http://tracker.ceph.com/issues/5249
Signed-off-by: Kefu Chai <kchai@redhat.com>
It's still sort of awkward to prefix these commands
with "mgr tell" but this makes them at least
somewhat accessible to the average user.
Signed-off-by: John Spray <john.spray@redhat.com>
osd/PG: Add two new mClock implementations of the PG sharded operator queue
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Fixes the Coverity Scan Report:
CID 1412776 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)19. var_deref_model: Passing null pointer option_name to operator <<, which dereferences it.
Also addresses the review comments in this commit.
Signed-off-by: Jos Collin <jcollin@redhat.com>
Fixes the coverity scan report:
1412839 Uninitialized pointer field
CID 1412839 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)2. uninit_member: Non-static
class member array is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Jos Collin <jcollin@redhat.com>
There is a chance that other parts of the application have already loaded
the PK11 module and do not finalize it before calling common_init_finish().
Also, upon fork, the PK11 module resets its entire state, including `nsc_init`,
which the PK11 module uses to tell whether it is initialized. So the behavior
of NSS_InitContext() could be different before and after fork. That's
another reason to ignore the CKR_CRYPTOKI_ALREADY_INITIALIZED error (see
NSS_GetError()).
Fixes: http://tracker.ceph.com/issues/19741
Signed-off-by: Kefu Chai <kchai@redhat.com>
While highly unlikely in normal fs operations, draining an
entire filesystem could induce a recursion on object delete/unref,
when provoked by cohort_lru::drain().
Adjusted to use an intrusive slist rather than std::vector<T*>
after review.
Found running librgw_file_nfsns unit test.
Fixes: http://tracker.ceph.com/issues/20374
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Doing this every 100 entries could be after 100MB of reads. There's
little cost to reset this, so remove the option for configuring it.
This reduces the likelihood of crashing the osd due to too many omap
values on an object.
Fixes: http://tracker.ceph.com/issues/20375
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Create an mClock priority queue, which can in turn be used for two new
implementations of the PG shards operator queue. The first
(mClockOpClassQueue) prioritizes operations based on which class they
belong to (recovery, scrub, snaptrim, client op, osd subop). The
second (mClockClientQueue) also incorporates the client identifier, in
order to promote fairness between clients.
In addition, also remove OpQueue's remove_by_filter and all possible
associated subclass implementations and tests.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
This is only useful when bl was filled through
bufferlist::page_aligned_appender. Using this function avoids a memcpy
for contiguous ptrs.
page_aligned_appender::flush can split one ptr into two or more ptrs.
rebuild_aligned_size_and_memory can't handle this case without doing
a rebuild.
For example:
a = bl.get_page_aligned_appender(1);
a.append(3K);
a.flush();
t.claim_append(bl);
a.append(1K);
a.flush();
t.claim_append(bl);
dst.claim_append(t);
// The 3K and 1K ptrs are contiguous in memory, but they are two ptrs,
// so dst.is_aligned_size_and_memory(4096,4096) is false.
We add a new function, claim_append_piecewise(), to handle this case
specially.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Make an incompat change here with a release note since
this only affects pool creation, a rare event, and folks
who have customized their configs (also rare).
Keep it simple: a config sets the default rule, or else we pick
the first TYPE_REPLICATED pool in the crush map.
Signed-off-by: Sage Weil <sage@redhat.com>
Store a random value up to the filestore_split_rand_factor for each
collection when it is created or apply-layout-settings is run. This
should help distribute the load of splitting directories across a
longer period of time.
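A minimal sketch of the per-collection draw. draw_split_rand and
split_threshold are hypothetical names; whether the upper bound is inclusive
and how the stored value feeds into the split threshold are both assumptions,
not the FileStore implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Draw the random value once per collection (at creation time or when
// apply-layout-settings runs) and store it with the collection.
// Assumption: the range is [0, split_rand_factor], inclusive.
inline uint32_t draw_split_rand(uint32_t split_rand_factor,
                                std::mt19937& rng) {
  if (split_rand_factor == 0) {
    return 0;
  }
  std::uniform_int_distribution<uint32_t> dist(0, split_rand_factor);
  return dist(rng);
}

// Assumption for illustration: the stored value simply raises the
// collection's split threshold, so collections split at different
// object counts instead of all at once.
inline uint64_t split_threshold(uint64_t base_threshold,
                                uint32_t stored_rand) {
  return base_threshold + stored_rand;
}
```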
Fixes: http://tracker.ceph.com/issues/15835
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
osd: heartbeat with packets large enough to require working jumbo frames
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomai@xsky.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
We get periodic reports that users somehow misconfigure one of their switches
so that it drops jumbo frames, yet the servers are still passing them along. In
that case, MOSDOp messages generally don't get through because they are much
larger than the 1500-byte non-jumbo limit, but the MOSDPing messages have kept
going (as they are very small and dispatched independently, even when the
server is willing to make jumbo frames). This means peer OSDs won't mark down
the ones behind the broken switch, despite all IO hanging.
Push the MOSDPing message size over the 1500-byte limit so that anybody in
this scenario will see the OSDs stuck behind a bad switch get marked down.
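The padding step can be sketched as follows. build_ping_payload and the
min_size parameter are hypothetical; the actual padding size and how MOSDPing
carries it are not taken from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pad a ping payload with zero bytes so that it
// is at least min_size bytes long. Choosing min_size above 1500 means
// the message no longer fits in a non-jumbo frame.
inline std::vector<char> build_ping_payload(const std::vector<char>& body,
                                            std::size_t min_size) {
  std::vector<char> out = body;
  if (out.size() < min_size) {
    out.resize(min_size, 0);
  }
  return out;
}
```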
Fixes: http://tracker.ceph.com/issues/20087
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Fixed:
** CID 717210: Uninitialized members (UNINIT_CTOR)
ceph/src/common/LogEntry.h: 70 in LogEntryKey::LogEntryKey()()
Non-static class member "_hash" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Jos Collin <jcollin@redhat.com>
This patch allows us to add a batch of OSDs (e.g., from a specific host which is currently under maintenance)
to a specific nodown/noout list. OSDs on that list cannot be automatically marked down/out
and hence are excluded from data migration.
This has the same effect as the global nodown/noout flag but is more fine-grained.
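A sketch of how the per-OSD lists combine with the global flags. OsdFlags and
its members are hypothetical names, not the OSDMap representation.

```cpp
#include <cassert>
#include <set>

// Hypothetical container for the global flags and the per-OSD lists.
struct OsdFlags {
  bool global_nodown = false;
  bool global_noout = false;
  std::set<int> nodown_osds;
  std::set<int> noout_osds;

  // An OSD may be marked down automatically only if neither the
  // global flag nor its per-OSD entry forbids it.
  bool can_mark_down(int osd) const {
    return !global_nodown && nodown_osds.count(osd) == 0;
  }

  bool can_mark_out(int osd) const {
    return !global_noout && noout_osds.count(osd) == 0;
  }
};
```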
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Devote 40% to kv (rocksdb), 50% to metadata (onodes etc), 10% to data.
Note that if we don't consume the data portion (e.g., no cache hints),
the onode metadata will "borrow" that space.
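A worked sketch of the split. CacheBudget and split_cache are hypothetical
names, and modelling the borrowing as "metadata gets whatever data does not
use" is an assumption about the mechanics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical byte budgets for the three cache consumers.
struct CacheBudget {
  uint64_t kv;    // rocksdb
  uint64_t meta;  // onodes etc.
  uint64_t data;
};

// Split total bytes 40/50/10. Data is capped at its 10% share; any
// part of that share it does not use is lent to metadata.
inline CacheBudget split_cache(uint64_t total, uint64_t data_in_use) {
  const uint64_t kv = total * 40 / 100;
  const uint64_t data_cap = total * 10 / 100;
  const uint64_t data = std::min(data_in_use, data_cap);
  const uint64_t meta = total - kv - data;
  return {kv, meta, data};
}
```

For a 1000-byte cache with no data cached, this gives 400 bytes to kv and 600
to metadata; once data fills its share, metadata drops back to 500.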
Signed-off-by: Sage Weil <sage@redhat.com>
"[Clock is] Similar to NewLRUCache, but create a cache based on CLOCK
algorithm with better concurrent performance in some cases. See
[cache]/clock_cache.cc for more detail."
Signed-off-by: Sage Weil <sage@redhat.com>
- Some of the errno conversions were lost in the conversion
to the CMake config
- Rename ceph_to_host_errno to _ceph_to_hostos_errno
to indicate that the errnos are defined per OS.
This is the second part, creating the basics.
Next it needs to be glued into communication with the peers.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>