In certain cases, errors raised in mgr modules don't actually result in a
proper traceback in the mgr log; all you see is a message like "'Hello'
object has no attribute 'dneasdfasdf'", but you have no idea where that
came from, which is a complete PITA to debug.
Here's what's going on: handle_pyerror() calls PyErr_Fetch() to get
information about the error that occurred, then passes that information
back to Python's traceback.format_exception() function to get the traceback.
If we write code in an mgr module that explicitly raises an exception
(e.g.: 'raise RuntimeError("that didn't work")'), the error value returned
by PyErr_Fetch() is of type RuntimeError, and traceback.format_exception()
does the right thing. If, however, we accidentally write code that's just
broken (e.g.: 'self.dneasdfasdf += 1'), the error value returned is not
an actual exception, it's just a string. So traceback.format_exception()
freaks out with something like "'str' object has no attribute '__cause__'"
(which we don't actually ever see in the logs), which in turn dumps us in a
"catch (error_already_set const &)" block, which just prints out the
single line error string.
https://docs.python.org/3/c-api/exceptions.html#c.PyErr_NormalizeException
tells us that "Under certain circumstances, the values returned by
PyErr_Fetch() below can be “unnormalized”, meaning that *exc is a class
object but *val is not an instance of the same class." And that's exactly
the problem we're having here. We're getting a 'str', not an Exception.
Adding a call to PyErr_NormalizeException() turns the value back into a
proper Exception type and traceback.format_exception() now always does the
right thing.
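The effect of the missing normalization step can be reproduced from pure Python. The sketch below is a simplified analogue of `PyErr_NormalizeException()`, not the C-API code in handle_pyerror(); `normalize_error()` and `format_error()` are hypothetical helper names. It assumes the rules CPython applies when normalizing: a `None` value becomes `exc_type()`, a tuple becomes `exc_type(*value)`, and anything else (such as a bare str) becomes `exc_type(value)`.

```python
import traceback


def normalize_error(exc_type, value, tb):
    """Simplified Python analogue of PyErr_NormalizeException() (hypothetical).

    Turns an "unnormalized" (type, value) pair, where value may be None,
    a tuple of constructor args, or e.g. a bare str, into a proper
    exception instance of exc_type.
    """
    if isinstance(value, exc_type):
        # Already normalized; report the instance's own (possibly more
        # specific) class.
        return type(value), value, tb
    if value is None:
        value = exc_type()
    elif isinstance(value, tuple):
        value = exc_type(*value)
    else:
        value = exc_type(value)
    return exc_type, value, tb


def format_error(exc_type, value, tb):
    """Normalize first, then let traceback.format_exception() do its job."""
    exc_type, value, tb = normalize_error(exc_type, value, tb)
    return "".join(traceback.format_exception(exc_type, value, tb))


if __name__ == "__main__":
    # An unnormalized pair: the type is AttributeError, the value a bare str.
    etype, evalue = AttributeError, "'Hello' object has no attribute 'dneasdfasdf'"
    try:
        traceback.format_exception(etype, evalue, None)
    except Exception as e:
        # The exact wording of this failure differs between Python versions.
        print("without normalization:", e)
    print("with normalization:", format_error(etype, evalue, None), end="")
```

Without normalization, `traceback.format_exception()` itself raises, because it expects the value to carry exception attributes that a plain str lacks; after normalization it produces the usual `AttributeError: ...` line.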
I've also added calls to peek_pyerror() in the catch blocks, so if anything
else ever somehow causes traceback.format_exception to fail, we'll at least
have an idea of what it is in the log.
Fixes: https://tracker.ceph.com/issues/44799
Signed-off-by: Tim Serong <tserong@suse.com>
The systemd unit file is shared with non-ceph daemons, which (1) don't
need the /var/run directory, and (2) are based on a uid/gid from a
different container image, which means we can't figure out the right
ceph uid/gid from them to set the ownership properly.
Instead, put it in the unit.run file... and only for ceph daemons when
we have the uid/gid we need.
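A minimal sketch of the new rule, assuming the directory is created with `install -d` from unit.run. `CEPH_DAEMON_TYPES` and `unit_run_prelude()` are hypothetical names used for illustration, not cephadm's actual code.

```python
from typing import List, Optional

# Hypothetical: daemon types that run from the ceph image and therefore use
# the ceph uid/gid. cephadm's real list may differ.
CEPH_DAEMON_TYPES = {"mon", "mgr", "osd", "mds", "rgw", "crash"}


def unit_run_prelude(daemon_type: str, fsid: str,
                     uid: Optional[int], gid: Optional[int]) -> List[str]:
    """Shell lines to put at the top of a daemon's unit.run file (sketch).

    /var/run/ceph/<fsid> is only created for ceph daemons, and only when
    the ceph uid/gid are known, so the ownership can actually be set.
    """
    lines: List[str] = []
    if daemon_type in CEPH_DAEMON_TYPES and uid is not None and gid is not None:
        lines.append(f"install -d -m0770 -o {uid} -g {gid} /var/run/ceph/{fsid}")
    return lines
```

A non-ceph daemon such as prometheus gets no extra lines, so the shared systemd unit file no longer needs to know about /var/run at all.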
Fixes: https://tracker.ceph.com/issues/44894
Signed-off-by: Sage Weil <sage@redhat.com>
This is the first draft of ceph-iscsi support in cephadm.
There are a few gotchas when running `rbd-target-api` in a container:
1. We need both the ceph.conf and iscsi-gateway.cfg, so we needed the
ability to pass extra config. The latter is based off the spec, so now
the daemon config func API allows you to return a dict of configs:
{ 'config': '<str>',       # will be appended to the ceph.conf
  '<conf name>': '<str>',  # will be dumped in datadir/<conf name>
  ...
}
It is up to cephadm to know to bind mount each file to the right location.
The 'config' key isn't used by this patch, but it makes it possible for
specs or config funcs to append anything to ceph.conf. Maybe that's overkill.
2. We need the kernel's configfs in the container so we can configure
LIO. There is a chicken-and-egg problem: configfs isn't mounted on the
host, so there is nothing to bind mount when the container starts. So now
a check is added to the `unit.run` script, and cleanup to the
`unit.poststop` script, for daemon_type iscsi.
3. rbd-target-api is written in Python and hardcodes a few things, like
logging through `/dev/log`, which happens to be a domain socket. So
`/dev/log` also needed to be bind mounted into the container.
4. The daemon expects the keyring to be in `/etc/ceph` so this needed to
be specifically bind mounted to the correct location too.
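The four points above can be sketched as two small helpers plus two shell lines. This is an illustration under assumptions, not the patch itself: `write_daemon_configs()`, `iscsi_mounts()`, the keyring file name, and the exact configfs mount/unmount commands are hypothetical; only the `'config'` key semantics, iscsi-gateway.cfg, `/dev/log`, the `/etc/ceph` keyring location, and the configfs requirement come from the text.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical shell lines for item 2: unit.run mounts configfs on the host
# if it is not mounted yet; unit.poststop cleans it up again.
CONFIGFS_MOUNT = ("if ! grep -qs /sys/kernel/config /proc/mounts; then "
                  "mount -t configfs none /sys/kernel/config; fi")
CONFIGFS_UMOUNT = "umount /sys/kernel/config"


def write_daemon_configs(data_dir: str, configs: Dict[str, str],
                         ceph_conf: str) -> str:
    """Apply the dict returned by a daemon config func (item 1).

    The 'config' entry is appended to ceph.conf; every other entry is
    written verbatim to <data_dir>/<conf name>. Returns the new ceph.conf.
    """
    for name, content in configs.items():
        if name == "config":
            ceph_conf += content
        else:
            with open(os.path.join(data_dir, name), "w") as f:
                f.write(content)
    return ceph_conf


def iscsi_mounts(data_dir: str) -> List[Tuple[str, str]]:
    """(host path, container path) bind mounts for an iscsi daemon (sketch)."""
    return [
        # Item 1: the extra config file generated from the spec.
        (os.path.join(data_dir, "iscsi-gateway.cfg"), "/etc/ceph/iscsi-gateway.cfg"),
        # Item 2: configfs, so rbd-target-api can configure LIO.
        ("/sys/kernel/config", "/sys/kernel/config"),
        # Item 3: the syslog domain socket rbd-target-api logs through.
        ("/dev/log", "/dev/log"),
        # Item 4: the keyring, where the daemon expects it (file name assumed).
        (os.path.join(data_dir, "keyring"), "/etc/ceph/keyring"),
    ]
```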
As it currently stands, this deploys and starts the API on port 5000, so
it seems to be "working", and gateway.conf does exist in the pool. I have
yet to set up an iSCSI device, but will test that next.
The `rbd-target-api` daemon expects the SSL key and cert to have specific
names in the container, so SSL isn't working yet. However, I do have a PR
in ceph-iscsi to look for them in the mon config-key store [0].
[0] - https://github.com/ceph/ceph-iscsi/pull/173
Signed-off-by: Matthew Oliver <moliver@suse.com>
Neither src/cephadm/tox.ini nor src/pybind/mgr/tox.ini runs on older
versions of tox.
With tox 2.9.1, both fail, for different reasons.
`src/cephadm/tox.ini` fails because `skipsdist=true` only works if it's
directly under the `[tox]` section.
`src/pybind/mgr/tox.ini` fails because older versions of tox can't find
requirements.txt: they don't accept whitespace between `-r` and
`requirements.txt`.
This patch changes both tox.ini files to be backwards compatible for
those who happen to be running a slightly older version of tox.
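The two incompatibilities can be checked mechanically. The sketch below is a hypothetical lint helper built on the standard library's `configparser`, not part of tox; it only flags the two constructs described above (a `skipsdist` outside `[tox]`, and `-r requirements.txt` with a space).

```python
import configparser
import re
from typing import List


def tox_compat_problems(ini_text: str) -> List[str]:
    """Flag the two tox.ini constructs that old tox (e.g. 2.9.1) chokes on.

    Hypothetical lint helper: interpolation is disabled so tox's own
    {posargs}-style substitutions and '%' characters are left alone.
    """
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(ini_text)
    problems = []
    for section in cp.sections():
        # skipsdist is only honoured directly under [tox].
        if section != "tox" and cp.has_option(section, "skipsdist"):
            problems.append(f"[{section}]: skipsdist must be directly under [tox]")
        # Old tox can't resolve "-r requirements.txt" (note the space).
        if cp.has_option(section, "deps"):
            for line in cp.get(section, "deps").splitlines():
                if re.match(r"\s*-r\s+\S", line):
                    problems.append(f"[{section}]: no space allowed in {line.strip()!r}")
    return problems
```

Writing the dependency as `-rrequirements.txt` and moving `skipsdist = true` under `[tox]` makes both checks pass.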
Signed-off-by: Matthew Oliver <moliver@suse.com>
Allow cephadm to start up with roles like:
roles:
- - host.a
  - client.0
  - osd.0
  - osd.1
- - host.b
  - osd.2
  - osd.3
Cephadm will pick the mon names (based on host) and provision all
services by default.
The cephadm task can still provision other daemons, but it may
fight with mgr/cephadm.
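How mon names might be derived from roles like the ones above can be sketched as follows. This assumes one mon per `host.*` role, named after the host role's id (`host.a` becomes `mon.a`); the naming scheme and the `mons_from_roles()` helper are hypothetical, not taken from the cephadm task.

```python
from typing import Dict, List


def mons_from_roles(roles: List[List[str]]) -> Dict[str, str]:
    """Map each host.* role to a mon name (sketch, naming scheme assumed).

    roles is the teuthology-style list of per-node role lists.
    """
    mons: Dict[str, str] = {}
    for node in roles:
        for role in node:
            if role.startswith("host."):
                host_id = role.split(".", 1)[1]
                mons[role] = f"mon.{host_id}"
    return mons
```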
Signed-off-by: Sage Weil <sage@redhat.com>
mgr/cephadm: revert trivial_completion for nfs_add
Reviewed-by: Matthew Oliver <moliver@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
rgw: Disable prefetch of entire head object when GET request with ran…
Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
Reviewed-by: Mark Kogan <mkogan@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
ngx-bootstrap now requires BrowserAnimationsModule, so it has to be imported
in each unit test that imports ngx-bootstrap modules.
Fixes: https://tracker.ceph.com/issues/44854
Signed-off-by: Tiago Melo <tmelo@suse.com>
The following updates required code style changes:
- TSLint updated the logic of ordering imports.
- Prettier improved its logic for when to break a command chain into multiple lines.
Fixes: https://tracker.ceph.com/issues/44854
Signed-off-by: Tiago Melo <tmelo@suse.com>
Verify that min_size is recalculated when the OSD
pool size is changed.
fixes: https://tracker.ceph.com/issues/44862
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
Currently, `osd pool set size` only modifies min_size when it is above the
new size, while it should be recalculated unconditionally.
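The difference between the old and the fixed behaviour can be shown with a small model. It assumes Ceph's default rule for min_size: a non-zero `osd_pool_default_min_size` is capped at the pool size, and 0 means `size - size // 2` (a majority of replicas). The function names are hypothetical.

```python
def default_min_size(size: int, osd_pool_default_min_size: int = 0) -> int:
    """Default min_size for a replicated pool of the given size (assumed rule)."""
    if osd_pool_default_min_size:
        return min(osd_pool_default_min_size, size)
    return size - size // 2


def old_min_size_after_resize(min_size: int, new_size: int) -> int:
    # Old behaviour: min_size is only touched when it exceeds the new size.
    return min(min_size, new_size)


def new_min_size_after_resize(new_size: int, osd_pool_default_min_size: int = 0) -> int:
    # Fixed behaviour: min_size is always recomputed from the new size.
    return default_min_size(new_size, osd_pool_default_min_size)
```

Growing a pool from size 3 (min_size 2) to size 5 shows the bug: the old logic keeps min_size at 2, while recomputing yields 3.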
fixes: https://tracker.ceph.com/issues/44862
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
This happens if a client or a peer OSD drops the connection, so it's not
an error, and hence we should not print this message using
"error()".
Signed-off-by: Kefu Chai <kchai@redhat.com>
We already have a name for it: a Service Specification of
type `osd`. We don't need to introduce a new name for it.
Well, they are "DriveGroups", but users don't need to know it.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
This commit adds a call to `ceph-facts` role in the first play of this
playbook. This is needed so `ceph-validate` won't fail because of the
following error:
```
fatal: [osd0]: FAILED! => {}
MSG:
'osd_pool_default_size' is undefined
```
`osd_pool_default_size` is set in ceph-facts.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When no lockers are obtained, an ImageNotFound exception is output,
even though the image does exist. When the number of lockers is zero,
no exception should be output.
Fixes: https://tracker.ceph.com/issues/44613
Signed-off-by: zhangdaolong <zhangdaolong@fiberhome.com>
The owner of "changcheng.liu@aliyun.com" is an employee of Intel.
Update the info for upcoming statistics.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>