Instead of hard-coding a check in ceph.conf and some reasonable
defaults, defer this work to ceph-crush-location, and allow users to
specify their own hook with alternative logic.
This can be helpful in a number of cases, like:
- rack (or other) information included in hostname and easily parsed
out by a hook (see the sketch after this list)
- multiple types of devices in each host, resulting in 'parallel'
crush trees (e.g., one for hdd, one for ssd)
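For the rack-in-hostname case, the hook can be a tiny standalone
program. The sketch below is purely illustrative and is not part of
this change: it assumes a hostname of the form <name>-<rack>, and it
assumes the hook reports its CRUSH location as key=value pairs on a
single line of stdout (the same convention ceph-crush-location uses).

    #include <cstdio>
    #include <string>
    #include <unistd.h>

    // Hypothetical hook: derive the rack from a hostname of the form
    // "<name>-<rack>" and print the CRUSH location on stdout.
    int main() {
      char buf[256] = {0};
      gethostname(buf, sizeof(buf) - 1);
      std::string host(buf);
      std::string rack = "unknown";
      size_t dash = host.rfind('-');
      if (dash != std::string::npos && dash + 1 < host.size())
        rack = host.substr(dash + 1);
      printf("root=default rack=%s host=%s\n", rack.c_str(), host.c_str());
      return 0;
    }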
Signed-off-by: Sage Weil <sage@inktank.com>
At some point in the dumpling cycle I separated the map stage from the
send stage. We can send the creates any time we have a non-zero osdmap
epoch, and are in good shape as long as we do the map step after the
osdmap is loaded (hence the post_paxos_update).
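As a sketch of that ordering (hypothetical free functions standing in
for the real monitor code; only the two stages from the description
above are assumed):

    // Stand-ins for the two stages described above.
    void map_pg_creates();   // map stage: needs the freshly loaded osdmap
    void send_pg_creates();  // send stage: safe whenever the osdmap epoch is non-zero

    // What the post_paxos_update path should do, per the description above.
    void post_paxos_update_sketch(unsigned osdmap_epoch) {
      if (osdmap_epoch == 0)
        return;              // no osdmap yet: nothing to map or send
      map_pg_creates();
      send_pg_creates();     // the send step the running-monitor path was missing
    }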
Some background:
We originally introduced the map-but-don't-send behavior in a2fe0137,
at which point all was well because we only called it on ceph-mon
startup. Later, this turned into post_paxos_update in e635c478, at
which point it was called by a running monitor, but we didn't add a
matching send_pg_creates() call. This is where this bug stems from.
This particular path is responsible for the stalled test referenced in
bug #6673.
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
If the last item in the directory is a remote link and the corresponding
inode is not in cache, the readdir reply will not contain the last item.
But the iterator 'it' is equal to dir->end() in this case, which causes
the 'end' flag of the readdir reply to be set to true.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
We can't adjust last_backfill to object x until x has been fully
backfilled. pending_backfill_updates contains all backfills that have
started but have not yet been reflected in pinfo.last_backfill.
backfills_in_flight contains those backfills which have not yet
completed. Thus, we can adjust last_backfill to the largest entry in
pending_backfill_updates that is not in backfills_in_flight.
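A self-contained sketch of that rule, with stand-in types in place of
hobject_t and the real stat structures (names are illustrative only):

    #include <map>
    #include <set>
    #include <string>

    using obj_t = std::string;   // stand-in for hobject_t
    struct stat_delta {};        // stand-in for the per-object stat update

    // Advance last_backfill across the completed prefix of
    // pending_backfill_updates, stopping at the first entry that is
    // still in backfills_in_flight.
    obj_t advance_last_backfill(
        const std::map<obj_t, stat_delta> &pending_backfill_updates,
        const std::set<obj_t> &backfills_in_flight,
        obj_t last_backfill) {
      for (const auto &p : pending_backfill_updates) {
        if (backfills_in_flight.count(p.first))
          break;                  // not fully backfilled; cannot advance past it
        last_backfill = p.first;  // fully backfilled; safe to reflect in pinfo
      }
      return last_backfill;
    }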
Signed-off-by: Samuel Just <sam.just@inktank.com>
Subsequent updates to that object need to have their stats added
to the backfill info stats atomically with the last_backfill
update.
Signed-off-by: Samuel Just <sam.just@inktank.com>
last_backfill_started reflects what pinfo.last_backfill will be
once all currently outstanding backfills complete. backfill_pos
was tricky since we couldn't correctly initialize it without
doing the first backfill scan pair.
In recover_backfill, we rescan from last_backfill_started rather
than from backfill_pos. This ensures that we capture all clones
created between last_backfill_started and what previously had been
backfill_pos without special handling in make_writeable. The main
downside is that we will tend to "rescan" last_backfill_started.
Signed-off-by: Samuel Just <sam.just@inktank.com>
If the monitor is not currently available, this crush update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up. Instead, make it time out after 10 seconds and then
abort startup. This prevents startup of an OSD if we failed to update
the CRUSH position for some reason.
In fact, do not start up the OSD if the CRUSH update fails for any
reason--not just a timeout!
Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
We need to flush the sequencer to ensure that all Contexts which hold
ObjectContextRefs have been run or deleted.
C_ReplicatedBackend_OnPullComplete, however, gets queued in a second
work queue in order to avoid performing expensive push related reads
in the FileStore finisher.
Rather than keep the object contexts around, we instead put off
removing the object from the pulling map until the callback fires,
and read the object context out of the pulling map. This way the
ObjectContextRef will be cleaned up along with the rest of the
pulling map in on_change.
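A sketch of that shape with stand-in types (names are illustrative,
not the actual backend code):

    #include <map>
    #include <memory>
    #include <string>

    struct ObjectContext {};
    using ObjectContextRef = std::shared_ptr<ObjectContext>;
    using obj_t = std::string;     // stand-in for hobject_t

    struct PullInfo {
      ObjectContextRef obc;        // kept in the pulling map, not in the queued Context
    };
    std::map<obj_t, PullInfo> pulling;

    // Runs when the second work queue finally gets to the pull-complete callback.
    void on_pull_complete(const obj_t &oid) {
      auto it = pulling.find(oid);
      if (it == pulling.end())
        return;                               // on_change already cleaned this up
      ObjectContextRef obc = it->second.obc;  // read the obc back out of the map
      // ... do the push-related work that needs obc ...
      (void)obc;
      pulling.erase(it);                      // removal deferred until the callback fires
    }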
Signed-off-by: Samuel Just <sam.just@inktank.com>
If we are writing to backfill_pos and create a clone, we end
up failing to send the transaction creating the clone to the
backfill peer. This is fine as long as we end up backfilling
the clone. To that end, we simply add the clone to
backfill_info and adjust backfill_pos accordingly. This is less
brittle than the waiting_for_backfill_pos mechanism since it
works even if we wait between that check and issuing the repop,
which can happen for copy_from.
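A rough sketch of the idea (stand-in types and names, not the actual
ReplicatedPG code):

    #include <map>
    #include <string>

    using obj_t = std::string;     // stand-in for hobject_t
    struct object_stat {};         // stand-in for the clone's stats

    struct BackfillIntervalSketch {
      std::map<obj_t, object_stat> objects;
    };

    // A write at backfill_pos has created a clone whose creating
    // transaction was not sent to the backfill peer; record the clone
    // in backfill_info and adjust backfill_pos so the scan still
    // visits it.
    void note_new_clone(BackfillIntervalSketch &backfill_info,
                        obj_t &backfill_pos,
                        const obj_t &clone,
                        const object_stat &stats) {
      backfill_info.objects[clone] = stats;
      if (clone < backfill_pos)
        backfill_pos = clone;
    }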
Signed-off-by: Samuel Just <sam.just@inktank.com>
We also modify recovering to hold a reference to the recovering obc
in order to ensure that our backfill_read_lock doesn't outlive the
obc.
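The shape of that change, sketched with stand-in types (the container
name comes from the message; the element types are assumptions):

    #include <map>
    #include <memory>
    #include <string>

    struct ObjectContext {};
    using ObjectContextRef = std::shared_ptr<ObjectContext>;
    using obj_t = std::string;   // stand-in for hobject_t

    // recovering now carries the obc itself, so the reference (and the
    // backfill_read_lock taken through it) lives at least as long as
    // the recovery of that object.
    std::map<obj_t, ObjectContextRef> recovering;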
ReplicatedPG::op_applied no longer clears repop->obc since we need
it to live until the op is finally cleaned up. This is fine since
repop->obc is now an ObjectContextRef and can clean itself up.
Signed-off-by: Samuel Just <sam.just@inktank.com>
This way, if execute_ctx is rerun on the same OpContext, we
won't erroneously reuse a stale snapset/object_info.
Signed-off-by: Samuel Just <sam.just@inktank.com>