In contrast to ceph-osd, crimson sends CEPH_WATCH_EVENT_DISCONNECT directly
from the timeout handler and after CEPH_WATCH_EVENT_NOTIFY_COMPLETE.
This simplifies the Watch::remove() interface: callers are no longer
obliged to decide whether EVENT_DISCONNECT needs to be sent, as it
becomes an implementation detail of Watch.
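A minimal sketch of the interface change, not the actual crimson code. The `Watch` class below is a hypothetical stand-in that records events in a vector instead of sending messages, and it covers only the timeout path. The point it illustrates is that `remove()` takes no flag: whether a disconnect goes out is decided inside the class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for crimson's Watch; events are recorded, not sent.
class Watch {
  std::vector<std::string> sent_;
  bool removed_ = false;

  void send_disconnect() { sent_.push_back("DISCONNECT"); }

public:
  // Callers no longer pass a "send disconnect?" flag.
  void remove() { removed_ = true; }

  // The timeout path notifies the client itself, then removes the watch.
  void on_timeout() {
    send_disconnect();
    remove();
  }

  bool removed() const { return removed_; }
  const std::vector<std::string>& sent() const { return sent_; }
};
```

With this shape, an explicit unwatch calls `remove()` and sends nothing, while a timeout calls `on_timeout()` and records exactly one disconnect.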
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This will shortly allow us to get rid of the dependency on
`MOSDOp` on all paths of `PG::do_osd_ops_execute()`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Before this commit the method depended on `MOSDOp::get_min_epoch()`
to start an `UrgentRecovery`. However, `PG::get_osdmap_epoch()` seems
sufficient here, as the very early stages of the processing in
`ClientRequest` ensure the PG satisfies the `get_min_epoch()` requirement.
In the classical OSD the counterpart code looks as follows:
```
int PrimaryLogPG::rep_repair_primary_object(const hobject_t& soid, OpContext *ctx)
{
  // ...
  queue_peering_event(
    PGPeeringEventRef(
      std::make_shared<PGPeeringEvent>(
        get_osdmap_epoch(),
        get_osdmap_epoch(),
        PeeringState::DoRecovery())));
  return -EAGAIN;
}
```
In addition to minimizing the dependency, the commit reformats
the code around `PG::repair_object()` to fit our guidelines.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
If the execution of an `OSDOp` fails, we're left with a potentially
altered `ObjectContext`. We deal with that by reloading `obc` if
there was any modification to it. To figure this out, `has_seen_write()`
is called on `OpsExecuter`. Unfortunately, the current implementation
has the following drawbacks:
* `has_seen_write()` can be called after `std::move(ox).flush_...()`,
which is very inelegant;
* it requires capturing both `ObjectContext` and `OpsExecuter`, while
the latter already references the former;
* the header gives no explicit reason justifying
the presence of `has_seen_write()`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This commit introduces `PG::do_osd_ops_execute()`, a subset of
`PG::do_osd_ops()`; it handles the ops execution through
`OpsExecuter` and the `submit_transaction()` but it stays
independent of `MOSDOp` and `MOSDOpReply`. This trait
facilitates the `InternalClientRequest`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The reason is the unification of infrastructure between external
client requests (everything represented by the `ClientRequest`)
and internal requests.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This commit introduces an `ObjectContext`-taking variant of
`PG::with_locked_obc()`. The upcoming internal counterpart
of the `ClientRequest` is the intended user.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Taken together with "crimson/osd: use obc->get_oid() instead of passing
hobject_t around" and enriched with move-construction down
the `ObjectState` path, this should save some work on
e.g. the `std::string` instances that are part of the `hobject_t`.
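A minimal sketch of move-construction along such a path. The three structs are hypothetical, heavily reduced stand-ins for the real `hobject_t`, `ObjectState` and `ObjectContext`; the constructor signatures are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in; the real hobject_t carries more fields.
struct hobject_t {
  std::string name;
};

// Hypothetical stand-in for ObjectState.
struct ObjectState {
  hobject_t oid;
  // Take by value and move: an rvalue argument is moved, not copied.
  explicit ObjectState(hobject_t o) : oid(std::move(o)) {}
};

// Hypothetical stand-in for ObjectContext.
struct ObjectContext {
  ObjectState obs;
  explicit ObjectContext(hobject_t o) : obs(std::move(o)) {}
  // Callers read the id from the context instead of carrying their own copy.
  const hobject_t& get_oid() const { return obs.oid; }
};
```

Constructing `ObjectContext obc(std::move(oid))` then hands the string buffer down to `ObjectState` without copying it at any level.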
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This will be necessary to spawn the upcoming `InternalClientRequest`
from the `Watch`'s timeout handler.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
mgr/dashboard: fix base-href: revert it to previous approach
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
the typical use case of get_val() passes a string literal as the key.
in that case, there is no need to create a std::string, as
md_config_t::get_val() always accepts a string_view as the option name.
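a minimal sketch of the idea, assuming a much simplified config store. `Config`, `set()` and the option name used below are hypothetical and are not the Ceph API; only the string_view parameter mirrors the text above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, heavily simplified stand-in for a config store.
class Config {
  // std::less<> enables heterogeneous lookup, so find() accepts a string_view.
  std::map<std::string, std::string, std::less<>> values_;

public:
  void set(std::string key, std::string value) {
    values_[std::move(key)] = std::move(value);
  }

  // Taking a string_view means a literal key is never turned into a
  // temporary std::string just to perform the lookup.
  std::string get_val(std::string_view key) const {
    auto it = values_.find(key);
    return it == values_.end() ? std::string{} : it->second;
  }
};
```

a call such as `conf.get_val("example_option")` binds the literal directly to the string_view parameter.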
Signed-off-by: Kefu Chai <kchai@redhat.com>
use mclock_scheduler as the default scheduler
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
An attempt to `Connection::do_auth()` may finish in one of three states:
_success_, _failure_ and _cancellation_. Unfortunately, its callers
missed the third, treating cancellation like a failure. This was the root
cause of the following failure at Sepia:
```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-06_22:08:43-rados-master-distro-basic-smithi/6102605$ less ./remote/smithi204/log/ceph-osd.3.log.gz
...
WARN 2021-05-06 22:35:40,464 [shard 0] osd - ms_handle_reset
...
INFO 2021-05-06 22:35:40,465 [shard 0] monc - do_auth_single: connection closed
INFO 2021-05-06 22:35:40,465 [shard 0] ms - [osd.3(client) v2:172.21.15.204:6808/31418@57568 >> mon.? v2:172.21.15.204:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
...
ERROR 2021-05-06 22:35:40,465 [shard 0] osd - mon.osd.3 dispatch() ms_handle_reset caught exception: std::system_error (error crimson::net:3, negotiation failure)
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3909-g81233a18/rpm/el8/BUILD/ceph-17.0.0-3909-g81233a18/src/crimson/common/gated.h:36: crimson::common::Gated::dispatch(const char*, T&, Func&&) [with Func = crimson::mon::Client::ms_handle_reset(crimson::net::ConnectionRef, bool)::<lambda()>&; T = crimson::mon::Client]::<lambda(std::__exception_ptr::exception_ptr)>: Assertion `*eptr.__cxa_exception_type() == typeid(seastar::gate_closed_exception)' failed.
Aborting on shard 0.
Backtrace:
0# 0x00005618C973932F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F7BB592EB20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007F7BB3F29B09 in /lib64/libc.so.6
7# 0x00007F7BB3F37DE6 in /lib64/libc.so.6
8# 0x00005618C9FF295C in ceph-osd
9# 0x00005618C3907313 in ceph-osd
10# 0x00005618CCA2F84F in ceph-osd
11# 0x00005618CCA34D90 in ceph-osd
12# 0x00005618CCBEC9BB in ceph-osd
13# 0x00005618CC744E9A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
daemon-helper: command crashed with signal 6
```
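A minimal sketch of handling the three outcomes separately. All names below (`auth_result_t`, `next_step_t`, `on_auth_done()`) are hypothetical, and what a caller does for each outcome is an assumption made for illustration; the only point taken from the text is that cancellation gets its own branch instead of falling into the failure path.

```cpp
#include <cassert>

// Hypothetical names; the real crimson types differ.
enum class auth_result_t { success, failure, canceled };

// Hypothetical follow-up actions for a caller.
enum class next_step_t { proceed, report_failure, stop_quietly };

// Handle all three outcomes; cancellation is not reported as an error.
inline next_step_t on_auth_done(auth_result_t r) {
  switch (r) {
  case auth_result_t::success:
    return next_step_t::proceed;
  case auth_result_t::failure:
    return next_step_t::report_failure;
  case auth_result_t::canceled:
    return next_step_t::stop_quietly;
  }
  return next_step_t::stop_quietly;  // unreachable; keeps compilers quiet
}
```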
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
In `crimson/osd/main.cc` we instruct Seastar to handle `SIGHUP`.
```
// just ignore SIGHUP, we don't reread settings
seastar::engine().handle_signal(SIGHUP, [] {})
```
This goes through Seastar's signal handling infrastructure,
which is incompatible with the alien world.
```
void
reactor::signals::handle_signal(int signo, noncopyable_function<void ()>&& handler) {
    // ...
    struct sigaction sa;
    sa.sa_sigaction = [](int sig, siginfo_t *info, void *p) {
        engine()._backend->signal_received(sig, info, p);
    };
    // ...
}
```
```
extern __thread reactor* local_engine;
extern __thread size_t task_quota;
inline reactor& engine() {
    return *local_engine;
}
```
The low-level signal handler above assumes `local_engine->_backend`
is valid, which holds only for threads from Seastar's world.
Unfortunately, as we don't block `SIGHUP` for alien threads,
the kernel is free to pick one of them to run the handler,
leading to odd-looking segfaults like this one:
```
INFO 2021-04-23 07:06:57,807 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:58,753 [shard 0] ms - [osd.1(client) v2:172.21.15.100:6802/30478@51064 >> mgr.4105 v2:172.21.15.109:6800/29891] --> #7 === pg_stats(0 pgs seq 55834574872 v 0) v2 (87)
...
INFO 2021-04-23 07:06:58,813 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,753 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:06:59,753 [shard 0] osd - asok response length: 2947
INFO 2021-04-23 07:06:59,817 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,865 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:06:59,866 [shard 0] osd - asok response length: 2947
DEBUG 2021-04-23 07:07:00,020 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:07:00,020 [shard 0] osd - asok response length: 2947
INFO 2021-04-23 07:07:00,820 [shard 0] bluestore - stat
...
Backtrace:
0# 0x00005600CD0D6AAF in ceph-osd
1# FatalSignal::signaled(int) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
4# 0x00005600CD830B81 in ceph-osd
5# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
6# pthread_cond_timedwait in /lib64/libpthread.so.0
7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
8# 0x00007F5877999BA3 in /lib64/libstdc++.so.6
9# 0x00007F5877C7414A in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
daemon-helper: command crashed with signal 11
```
Ultimately, it turned out the thread returned from a syscall (`futex`)
and started executing the `SIGHUP` handler's code, in which a null
pointer dereference happened.
This patch blocks `SIGHUP` for all threads spawned by `AlienStore`.
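A minimal sketch of blocking `SIGHUP` in a freshly spawned thread with the POSIX `pthread_sigmask()` call. The helper names (`block_sighup()`, `sighup_blocked()`, `run_worker_with_sighup_blocked()`) are chosen for illustration and this is not the `AlienStore` code itself; the worker here only checks its own mask and exits.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Block SIGHUP for the calling thread; returns true on success.
inline bool block_sighup() {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGHUP);
  return pthread_sigmask(SIG_BLOCK, &mask, nullptr) == 0;
}

// Report whether SIGHUP is currently blocked in the calling thread.
inline bool sighup_blocked() {
  sigset_t current;
  sigemptyset(&current);
  // A null "set" argument only queries the mask; nothing is changed.
  pthread_sigmask(SIG_BLOCK, nullptr, &current);
  return sigismember(&current, SIGHUP) == 1;
}

// Spawn a worker that blocks SIGHUP before doing anything else.
inline bool run_worker_with_sighup_blocked() {
  bool blocked = false;
  std::thread worker([&blocked] {
    block_sighup();
    blocked = sighup_blocked();
  });
  worker.join();
  return blocked;
}
```

Because the signal mask is per thread, a thread that blocks `SIGHUP` this way is no longer a candidate for running the process-wide handler.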
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
to silence the warning from clang. it fails to figure out that this is
actually used, and complains that this is captured but not used.
Signed-off-by: Kefu Chai <kchai@redhat.com>