While we don't allow setting quotas on the global
filesystem root, the inode that is the root from the
client mount's point of view may still have a quota.
Signed-off-by: John Spray <john.spray@redhat.com>
Generalise the path traversal into check_quota_condition
and then call that from each of file_exceeded, bytes_exceeded,
bytes_approaching with the appropriate lambda function.
Motivated by needing to fix the path traversal and wanting to
fix it in only one place.
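A minimal sketch of the shape of this refactor; the function names check_quota_condition, file_exceeded and bytes_exceeded come from the commit, but the types and fields here are illustrative stand-ins, not the actual CephFS client code:

```cpp
#include <cassert>
#include <functional>

// Illustrative stand-ins for the real CephFS client types.
struct QuotaInfo { long max_files = 0; long max_bytes = 0; };
struct Inode {
  QuotaInfo quota;
  long nfiles = 0, nbytes = 0;
  Inode *parent = nullptr;   // nullptr at the client mount root
};

// One shared path traversal: walk from `in` toward the mount root and
// apply `cond` to every ancestor with a quota set, including the root
// of the mount itself.
static bool check_quota_condition(Inode *in,
    const std::function<bool(const Inode&)> &cond) {
  for (Inode *cur = in; cur; cur = cur->parent) {
    if (cur->quota.max_files || cur->quota.max_bytes) {
      if (cond(*cur))
        return true;
    }
  }
  return false;
}

// Each caller supplies only its own predicate as a lambda.
static bool file_exceeded(Inode *in) {
  return check_quota_condition(in, [](const Inode &i) {
    return i.quota.max_files && i.nfiles >= i.quota.max_files;
  });
}

static bool bytes_exceeded(Inode *in) {
  return check_quota_condition(in, [](const Inode &i) {
    return i.quota.max_bytes && i.nbytes >= i.quota.max_bytes;
  });
}
```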
Signed-off-by: John Spray <john.spray@redhat.com>
See the note in _op_submit ("after giving up session lock it can be freed
at any time by response handler"): the op pointer used by
_op_submit_with_budget may be wild (dangling) after the call to _op_submit.
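A condensed sketch of the bug pattern; the class and members here only model the Objecter behavior described above (the real response handler runs asynchronously), and the fix is the same: copy whatever is needed from the op before submitting it.

```cpp
#include <cassert>

// Illustrative model: once _op_submit() gives up the session lock, the
// response handler may free `op`, so any later dereference of `op` in
// the caller is a use-after-free.
struct Op { long tid; };

struct Objecter {
  long last_submitted_tid = 0;

  void _op_submit(Op *op) {
    last_submitted_tid = op->tid;
    delete op;              // models the response handler freeing op
  }

  long _op_submit_with_budget(Op *op) {
    long tid = op->tid;     // capture *before* op can be freed
    _op_submit(op);
    // Dereferencing op->tid here would be a wild-pointer read.
    return tid;
  }
};
```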
Fixes: #13208
Signed-off-by: Ruifeng Yang <yangruifeng.09209@h3c.com>
1. We need to return ENOSPC *before* we apply our side-effects to the obc
cache in finish_ctx().
2. Consider object count changes too, not just bytes.
3. Consider cluster full flag, not just the pool flag.
4. Reply only if FULL_TRY; silently drop ops that were sent despite the
full flag.
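The four points above can be condensed into one decision function. This is an illustrative sketch, not the real PrimaryLogPG logic; the flag name CEPH_OSD_FLAG_FULL_TRY is real, but the types here are stand-ins:

```cpp
#include <cassert>

enum class FullAction { PROCEED, REPLY_ENOSPC, DROP_SILENTLY };

struct FullCheckInput {
  bool cluster_full;      // cluster-wide full flag, not just the pool's
  bool pool_full;         // per-pool full flag
  long delta_bytes;       // net byte change the op would cause
  long delta_objects;     // net object-count change, considered too
  bool op_has_full_try;   // client set CEPH_OSD_FLAG_FULL_TRY
};

// Decide *before* applying side effects to the obc cache in finish_ctx().
static FullAction check_full(const FullCheckInput &in) {
  const bool full  = in.cluster_full || in.pool_full;
  const bool grows = in.delta_bytes > 0 || in.delta_objects > 0;
  if (!full || !grows)
    return FullAction::PROCEED;
  // Only FULL_TRY ops get an explicit ENOSPC reply; anything else was
  // sent despite the full flag and is silently dropped.
  return in.op_has_full_try ? FullAction::REPLY_ENOSPC
                            : FullAction::DROP_SILENTLY;
}
```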
Fixes: #13098
Signed-off-by: Sage Weil <sage@redhat.com>
This is buggy (we reset the pool epoch each time it remains full instead
of only on the initial transition from non-full to full) and now unused.
Signed-off-by: Sage Weil <sage@redhat.com>
1. The current pool_last_map_marked_full tracking is buggy.
2. We need to recheck this each time we consider the op, not just when it
is received off the wire. Otherwise, we might get a message, queue it
for some reason, get a map indicating the cluster or pool is full, and
then requeue and process the op instead of discarding it.
3. For now, silently drop ops when failsafe check fails. This will lead to
stalled client IO. This needs a more robust fix.
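Point 2 can be sketched as follows; names and structure are illustrative only. The key is that the full check runs every time an op is (re)considered against the current map, so an op queued before the map flipped to full is still discarded at dequeue time:

```cpp
#include <cassert>
#include <deque>

struct QueuedOp { bool grows; };   // would the op consume space?

struct OpQueue {
  bool map_full = false;           // updated whenever a new map arrives
  std::deque<QueuedOp> q;
  int processed = 0, dropped = 0;

  // Re-evaluate fullness here, not only at receive time.
  void consider(const QueuedOp &op) {
    if (map_full && op.grows) { ++dropped; return; }
    ++processed;
  }

  void requeue_and_process() {
    while (!q.empty()) { consider(q.front()); q.pop_front(); }
  }
};
```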
Signed-off-by: Sage Weil <sage@redhat.com>
This currently only applies to the MDS. Eventually we can remove the
OSD MDS checks once we are confident all MDS instances are new enough
to set this flag.
Signed-off-by: Sage Weil <sage@redhat.com>
FULL_TRY = try to do the op; return ENOSPC if it results in a net increase
in space. The client must cope.
FULL_FORCE = do the op even if it consumes space. The MDS will use this.
We should restrict this based on the OSD cap (* vs w, probably).
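The intended semantics of the two flags can be sketched as a small decision table; this is illustrative, not the wire-protocol encoding, and the cap-based restriction on FULL_FORCE is left out:

```cpp
#include <cassert>

enum class FullFlag { NONE, FULL_TRY, FULL_FORCE };

// Returns true if the op may proceed while the cluster/pool is full.
// For FULL_TRY, a net space increase means the op fails with ENOSPC;
// FULL_FORCE (the MDS path) always proceeds.
static bool may_proceed_when_full(FullFlag f, bool net_increase) {
  switch (f) {
  case FullFlag::FULL_FORCE: return true;
  case FullFlag::FULL_TRY:   return !net_increase;  // else ENOSPC
  default:                   return false;
  }
}
```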
Signed-off-by: Sage Weil <sage@redhat.com>
Hammer is sloppy about the hobject_t's it uses for the scrub bounds in that
the pool isn't set. (Hammer FileStore doesn't care, but post-hammer is
much more careful about this sort of thing.)
Compensate by setting the pool on any scrub messages we receive.
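The compensation amounts to something like the following sketch; the struct is a trivial stand-in for the real hobject_t, and the sentinel value used for "unset" here is illustrative:

```cpp
#include <cassert>

struct ScrubBound { long pool = -1; };   // stand-in for hobject_t

// On receiving scrub bounds from a (possibly hammer) peer, fill in the
// pool field if the sender left it unset.
static void fix_scrub_bound(ScrubBound &h, long pg_pool) {
  if (h.pool == -1)
    h.pool = pg_pool;
}
```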
Signed-off-by: Sage Weil <sage@redhat.com>
These are relatively expensive (we grab the full map from the mon) so we
should avoid duplicating our requests.
Track which requests are in flight. Only send a new request when new
maps are asked for. Resend requests when there is a new mon session.
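A sketch of the dedup logic described above, with illustrative names: remember which full-map epochs are already in flight, send only the first request per epoch, and replay the outstanding set on a new mon session:

```cpp
#include <cassert>
#include <set>

struct FullMapRequester {
  std::set<long> in_flight;   // epochs we have already asked for
  int sent = 0;

  void want_full_map(long epoch) {
    if (in_flight.insert(epoch).second)
      send_request(epoch);    // only the first request per epoch
  }

  void on_new_mon_session() {
    for (long e : in_flight)
      send_request(e);        // resend everything still outstanding
  }

  void send_request(long /*epoch*/) { ++sent; }
};
```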
Signed-off-by: Sage Weil <sage@redhat.com>
Rare outside of vstart clusters, but if someone ever had one of
these events in their journal and tried to update to the latest
Ceph, they would end up with a bogus expire_pos on the
reformatted events.
Signed-off-by: John Spray <john.spray@redhat.com>
When running make distdir=ceph-9.0.3-1870-gfd861bb dist, a few files
have names longer than 99 characters and are discarded, which then causes
the resulting tarball to be incomplete:
tar: ceph-9.0.3-1870-gfd861bb/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_internal.cc: file name is too long (max 99); not dumped
tar: ceph-9.0.3-1870-gfd861bb/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_internal.h: file name is too long (max 99); not dumped
Use the tar-ustar format instead of the legacy v7
format (http://www.gnu.org/software/automake/manual/automake.html#Options). It
is unlikely that machines with a C++11 compiler also have an antique tar
binary that supports only v7.
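Assuming the usual automake setup, the change is a one-line option in configure.ac (it could equally go in AUTOMAKE_OPTIONS in a Makefile.am):

```
AM_INIT_AUTOMAKE([tar-ustar])
```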
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The only field actually relevant from this structure is .begin, which
duplicates the information in hit_set_start_stamp less well. The problem
is that the starting point of the currently open hit set is ephemeral
state which shouldn't go into the pg_info_t structure.
This also caused 13185 since pg_info_t.hit_set.current_info gets default
constructed with use_gmt = true regardless of the pool setting. This
becomes a problem in hit_set_persist since the oid is generated using
the pool setting, rather than the use_gmt value in current_info which
is placed into the history list. That discrepancy then causes a crash
in hit set trim. There would also be a related bug if the pool setting
is changed between when current_info is constructed and when it is
written out.
Since current_info isn't actually useful, I'm removing it so that we
don't later rely on invalid fields.
Fixes: #13185
Signed-off-by: Samuel Just <sjust@redhat.com>
Currently we already do a small write when the *first* election in
a round happens (to update the election epoch). If the backend
happens to fail while we are already in the midst of elections,
however, we may continue to call elections without verifying we
are still writeable.
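A sketch of the check described above; the types are illustrative models of the monitor's store and elector, not the real classes. The point is that the writeability check happens on every election attempt, not just the first in a round:

```cpp
#include <cassert>

struct MonStore { bool writeable = true; };

struct Elector {
  MonStore &store;
  long epoch = 0;
  bool aborted = false;

  explicit Elector(MonStore &s) : store(s) {}

  void call_election() {
    // Verify the backend is still writeable before the small write
    // that bumps the election epoch (the real code would assert/abort).
    if (!store.writeable) { aborted = true; return; }
    ++epoch;
  }
};
```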
Signed-off-by: Sage Weil <sage@redhat.com>
Do this globally instead of relying on the zillion mon callers to
check the error code. There are no cases where we want to
tolerate a commit failure.
Fixes: #13089
Signed-off-by: Sage Weil <sage@redhat.com>