101047 Commits

Author SHA1 Message Date
Yao Zongyou
a140dba8c8 common: remove unused _STR and STRINGIFY macro
Signed-off-by: Yao Zongyou <yaozongyou@vip.qq.com>
2019-08-12 20:49:30 +08:00
Jan Fajerski
84498bfcb4
Merge pull request #29547 from jan--f/c-v-always-log-to-stdout
ceph-volume: never log to stdout, use stderr instead
2019-08-12 14:13:19 +02:00
Kefu Chai
aed743aee4
Merge pull request #29076 from tchaikov/wip-crimson-rep-op
crimson/osd: implement replicated write

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2019-08-12 18:51:46 +08:00
Kefu Chai
9759005452
Merge pull request #29378 from cyx1231st/rfc-seastar-msgr-lossless-master
crimson/net: lossless policy for v2 protocol

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:50:40 +08:00
Kefu Chai
4f7d7769c0 crimson/osd: log if the dest of send_to_osd() is not valid
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Kefu Chai
f95c69d0df crimson/osd: do not return "void"
to avoid potential confusions.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Kefu Chai
11ac2af314 crimson/osd: write pg info after done with peering
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Kefu Chai
a7a52efffd crimson/osd: handle MOSDRepOp
* add a `RepRequest` operation which is blocked by `ConnectionPipeline`
  and `PGPipeline`. these two pipelines are modeled after their
  counterparts of `ClientRequest`.
* add these two blockers to `PG` and `OSDConnectionPriv` accordingly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Kefu Chai
4add5fd47b crimson/osd: replicate transaction to peers
* handle `MOSDRepOpReply` message in osd, and pass it all the way down
  to `PGBackend`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Kefu Chai
e26ba3aeed crimson/osd: implement PeeringListener::on_activate()
see also f7b55ec144

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 18:01:46 +08:00
Yingxin Cheng
acca474339 crimson/net: minor logging cleanup
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
5491bc48ae crimson/net: throw bad_peer_address when reconnect address doesn't match
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
c41c44b2e9 crimson/net: REPLACING state to resolve racing and retain session
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
d1cd196981 crimson/net: handle fault for READY, CONNECTING and ACCEPTING
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
d75c9e884a crimson/net: WAIT state and backoff for client
Client goes to WAIT state when it is delayed to reconnect, or wants to
be replaced by a newly established socket.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
8ea10a6a75 crimson/net: SERVER_WAIT state for accepting server
Server waits for the peer client to close the socket at SERVER_WAIT.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:45 +08:00
Yingxin Cheng
f8053d08ee crimson/net: STANDBY state for lossless server or peer
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:38 +08:00
Yingxin Cheng
dd59586ef0 crimson/net: allow REPLACING state wait for protocol exit
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:22:23 +08:00
Yingxin Cheng
49a08e8bc3 crimson/net: send AckFrame for lossless policy
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:18:15 +08:00
Yingxin Cheng
6cacf1f7b2 crimson/net: maintain the sent queue for lossless policy
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:17:43 +08:00
Yingxin Cheng
492263962c crimson/net: reset write state with reset_write()
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:52 +08:00
Yingxin Cheng
babc9c24fd crimson/net: allow connecting state reentrant
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:52 +08:00
Yingxin Cheng
675a50326c crimson/net: reset handshake status when connecting/accepting
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:52 +08:00
Yingxin Cheng
b7c7dc0b26 crimson/net: pending_q to store the pending(sending) messages
We cannot leave the pending messages in the out_q, because with lossless
policy, they can be partially sent and even acknowledged.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:52 +08:00
Yingxin Cheng
4fa1c4c07d crimson/net: wait_write_exit() to wait for writer stopped
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:51 +08:00
Yingxin Cheng
b3f1e56d6c crimson/net: is_queued() to check if there's any pending writes
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:51 +08:00
Yingxin Cheng
04f8a35d79 crimson/net: fix variables for stateful connection
server_cookie, client_cookie, connect_seq and global_seq are identifiers
of a stateful connection.

We already have some related implementations, but they are stub code
from implementing the lossy policy and cannot work properly.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 17:02:42 +08:00
Kefu Chai
4913510173 qa/tasks/cbt.py: use "git --depth 1" for faster clone
we don't need the full history for performing the test.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 16:58:12 +08:00
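
A shallow clone of the kind this commit describes can be sketched as follows. This is only an illustration, not the code in qa/tasks/cbt.py: the helper name `shallow_clone_args`, the repository URL and the destination directory are all hypothetical. The only real interface used is the `--depth` option of `git clone`, which truncates history to the given number of commits.

```python
def shallow_clone_args(repo_url, dest, branch=None):
    """Build an argument list for a history-free clone.

    Hypothetical helper for illustration; it only assembles the command
    and does not run git.
    """
    args = ["git", "clone", "--depth", "1"]
    if branch is not None:
        # --branch selects which branch tip the shallow clone starts from.
        args += ["--branch", branch]
    args += [repo_url, dest]
    return args


# Placeholder URL and directory, assumed for the example.
cmd = shallow_clone_args("https://example.invalid/cbt.git", "cbt")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.check_call` when an actual clone is wanted.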
Jeegn Chen
3bfb5c2621 osd: support osd_scrub_extended_sleep
1. always take osd_scrub_sleep for manually initiated
   scrubs
2. when scrub_time_permit() returns true for scheduled
   ones, the existing osd_scrub_sleep is used
3. when scrub_time_permit() returns false for scheduled
   ones, there may be 2 scenarios
   3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
       let's take osd_scrub_sleep
   3.2 otherwise, let's take osd_scrub_extended_sleep

Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
2019-08-12 16:54:36 +08:00
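
The selection rules listed in the commit message above can be restated as a small function. This is a sketch of the rules as written, not the OSD implementation: the function name `scrub_sleep_seconds` and its boolean parameters are assumptions made for the example, and only `osd_scrub_sleep` and `osd_scrub_extended_sleep` come from the message.

```python
def scrub_sleep_seconds(manual, time_permit,
                        osd_scrub_sleep, osd_scrub_extended_sleep):
    """Pick the sleep between scrub chunks.

    manual       -- True for a manually initiated scrub (assumed flag)
    time_permit  -- result of scrub_time_permit() for a scheduled scrub
    """
    if manual:
        # Rule 1: manual scrubs always use osd_scrub_sleep.
        return osd_scrub_sleep
    if time_permit:
        # Rule 2: scheduled scrub inside the permitted window.
        return osd_scrub_sleep
    # Rule 3: scheduled scrub outside the permitted window.
    # 3.1 and 3.2 together amount to taking the larger of the two values.
    return max(osd_scrub_sleep, osd_scrub_extended_sleep)


print(scrub_sleep_seconds(False, False, 0.1, 5.0))
```

Rules 3.1 and 3.2 collapse into a single `max()` because the extended sleep only wins when it is strictly larger than the regular one.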
Yingxin Cheng
469a9cda73 crimson/net: clean up, exsiting_conn and existing_proto
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 16:34:45 +08:00
Yingxin Cheng
014a662b20 crimson/net: next_step_t for explicit decision of next state
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 16:34:45 +08:00
Yingxin Cheng
b41af731b4 crimson/net: introduce 3 ways to abort the active protocol state
* abort_in_fault(): a fault is happening and needs to be handled.
* abort_protocol(): abort the current protocol state due to preemptive
                    state change.
* abort_in_close(): close this connection and abort the current protocol
                    state due to some fatal error.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2019-08-12 16:34:38 +08:00
Kefu Chai
2e2414b3df ceph-objectstore-tool: update-mon-db: do not fail if incmap is missing
there is a chance that we could use an OSD which does not have the incmap
of a certain epoch for rebuilding the monstore, since an OSD does not read
and store the incmap if the MOSDMap message already has the fullmap of
that epoch, and if an OSD does not have the previous fullmap, the monitor
will just send it the fullmap. so it's not unusual that an OSD has
a fullmap of some epoch without the corresponding incmap.

Fixes: https://tracker.ceph.com/issues/41177
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-08-12 13:06:01 +08:00
Matt Benjamin
801d2f0449
Merge pull request #28157 from Kriechi/docs-rgw-ldap
docs: improve rgw ldap auth options
2019-08-11 20:45:29 -04:00
Tianshan Qu
bc82637f54 rgw: fix list bucket with delimiter wrongly skipping some special keys
list with delimiter will skip subfile with directory + after_delim_s,
but the code wrongly adds after_delim_s to the next marker regardless of whether it has a directory

Fixes: http://tracker.ceph.com/issues/40905

Signed-off-by: Tianshan Qu <tianshan@xsky.com>
2019-08-12 00:19:54 +08:00
Yuval Lifshitz
929c062ae9 rgw: don't throw when accept errors are happening on frontend
Signed-off-by: Yuval Lifshitz <yuvalif@yahoo.com>
2019-08-11 10:06:05 +03:00
Josh Durgin
3f18ed55aa
Merge pull request #28227 from sseshasa/monCachePriority
mon/OSDMonitor: Use generic priority cache tuner for mon caches

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-08-09 14:23:39 -07:00
Casey Bodley
bc45261470
Merge pull request #29540 from cbodley/wip-rgw-user-rename
rgw: followup for 'user rename'

Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2019-08-09 16:57:25 -04:00
Sage Weil
9346d3c3bc os/bluestore: do not set osd_memory_target default from cgroup limit
On the aarch64 box I'm testing, this gives us a value of
7378697629483768832, which is not what we want.

I think we are better off relying on this limit being explicitly set via
environment variables (POD_* by kubernetes/rook) or via the command line.

This partially reverts 5c6b533697, but not
all of it, since we want to keep the option itself, as it is now used by
common/config.cc when dealing with the POD_MEMORY_LIMIT env var.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 12:25:59 -05:00
Casey Bodley
13f1595335
Merge pull request #29558 from theanalyst/rgw-cache-lock
rgw: fix unlock of shared lock in RGWCache

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2019-08-09 13:03:35 -04:00
Sage Weil
377fdb1484 os/bluestore: refuse to mkfs or mount if osd_max_object_size >= MAX_OBJECT_SIZE
BlueStore has its own object size limit (2^32-1).  Make sure the cluster
limit is below that or refuse to mkfs or mount.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 10:57:14 -05:00
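
The guard described in this commit can be illustrated with a minimal check. It is a sketch under stated assumptions, not BlueStore code: the function name `check_object_size_limit` and the use of an exception are invented for the example, while the limit of 2^32-1 and the `>=` comparison are taken from the commit subject and message.

```python
# Limit stated in the commit message: 2^32 - 1 bytes.
MAX_OBJECT_SIZE = 2**32 - 1


def check_object_size_limit(osd_max_object_size):
    """Refuse to proceed when the cluster limit is not below the store limit.

    Hypothetical stand-in for the mkfs/mount-time check.
    """
    if osd_max_object_size >= MAX_OBJECT_SIZE:
        raise ValueError(
            "osd_max_object_size %d >= 0x%x; refusing to mkfs or mount"
            % (osd_max_object_size, MAX_OBJECT_SIZE)
        )


check_object_size_limit(128 * 1024 * 1024)  # an example value below the limit
```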
Abhishek Lekshmanan
2b6dbe31c8 rgw: fix unlock of shared lock in RGWCache
similar to https://github.com/ceph/ceph/pull/29538/ we unlock a shared_lock with
unlock causing a crash. Also scope the single line if statements to make the
code more concise

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
2019-08-09 17:54:22 +02:00
Sage Weil
f011c13547 Merge PR #29292 into master
* refs/pull/29292/head:
	os/bluestore: warn on no per-pool omap
	os/bluestore: fsck: warning (not error) by default on no per-pool omap
	os/bluestore: fsck: int64_t for error count
	os/bluestore: default size of 1 TB for testing
	os/bluestore: behave if we *do* set PGMETA and PERPOOL flags
	os/bluestore: do not set both PGMETA_OMAP and PERPOOL_OMAP
	os/bluestore: fsck: only generate 1 error per omap_head
	os/bluestore: make fsck repair convert to per-pool omap
	os/bluestore: teach fsck to tolerate per-pool omap
	os/bluestore: ondisk format change to 3 for per-pool omap
	mon/PGMap: add data/omap breakouts for 'df detail' view
	osd/osd_types: separate get_{user,allocated}_bytes() into data and omap variants
	mon/PGMap: fix stored_raw calculation
	mon/PGMap: add in actual omap usage into per-pool stats
	osd: report per-pool omap support via store_statfs_t
	os/bluestore: set per_pool_omap key on mkfs
	osd/osd_types: count per-pool omap capable OSDs
	os/bluestore: report omap_allocated per-pool
	os/bluestore: add pool prefix to omap keys
	kv/KeyValueDB: take key_prefix for estimate_prefix_size()
	os/bluestore: fix manual omap key manipulation to use Onode::get_omap_key()
	os/bluestore: make omap key helpers Onode methods
	os/bluestore: add Onode::get_omap_prefix() helper
	os/bluestore: change _do_omap_clear() args

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-08-09 10:40:45 -05:00
Alfredo Deza
8363d89a4d
Merge pull request #29528 from tchaikov/wip-build-doc-with-python3
admin/build-doc: use python3

Reviewed-by: Alfredo Deza <adeza@redhat.com>
2019-08-09 11:17:19 -04:00
Sage Weil
9426974195 os/bluestore/BlueFS: fix device_migrate_to_* to handle varying alloc sizes
The previous implementation moved extents individually.  This caused
problems when moving an extent with a small alloc_size that wasn't
a multiple of the target device's alloc_size.

Instead, identify files with extents that need to be moved, and then read
the file in its entirety and rewrite it in its entirety.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 10:10:12 -05:00
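
The alignment problem this commit describes can be shown with plain arithmetic. The extent lengths and the two allocation sizes below are made-up example values, and the helper names are hypothetical; the sketch does not mirror the BlueFS data structures. It only shows why individual extents may not fit the target device's allocation unit while a whole-file rewrite always can.

```python
def round_up(length, alloc_size):
    """Round length up to the next multiple of alloc_size."""
    return -(-length // alloc_size) * alloc_size


def misaligned_extents(extent_lengths, target_alloc_size):
    """Extents whose length is not a multiple of the target alloc size."""
    return [n for n in extent_lengths if n % target_alloc_size != 0]


# Assumed example: a file laid out in 4 KiB units on the source device,
# moving to a device that allocates in 64 KiB units.
extents = [4096, 8192, 4096]
target_alloc_size = 65536

# Moving extent by extent: none of these lengths fits the target unit.
print(misaligned_extents(extents, target_alloc_size))

# Rewriting the whole file: one allocation sized for the total length.
print(round_up(sum(extents), target_alloc_size))
```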
Sage Weil
e8b5a458c3 os/bluestore/BlueFS: apply shared_alloc_size to shared device
Keep an alloc_size vector so that we have this value handy at all times.
Allow bluestore to fetch this value directly instead of looking at the
bluefs_* config options since this encapsulates things a bit better, and
also isn't vulnerable to the config setting changing at runtime.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 10:10:12 -05:00
Abhishek Lekshmanan
fac4ab71fb rgw: url decode PutUserPolicy params
Since these are sent as a part of a POST request which is usually urlencoded,
the json parser would later report invalid json for documents containing whitespace

Fixes: https://tracker.ceph.com/issues/41189
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
2019-08-09 16:57:25 +02:00
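
Why the decode step matters can be seen with the Python standard library. This is an illustration of the general problem, not the RGW code path: the policy document and the helper name `parse_policy_param` are assumptions made for the example.

```python
import json
from urllib.parse import quote, unquote


def parse_policy_param(raw_value):
    """Percent-decode a request parameter, then parse it as JSON.

    Hypothetical helper; stands in for decoding before the JSON parser runs.
    """
    return json.loads(unquote(raw_value))


# Example policy document containing whitespace (assumed content).
policy = '{"Version": "2012-10-17", "Statement": []}'
encoded = quote(policy)  # what a urlencoded request would carry

try:
    json.loads(encoded)  # parsing the still-encoded value fails
except ValueError as err:
    print("undecoded value rejected:", err)

print(parse_policy_param(encoded))
```

Form-encoded bodies may also represent spaces as `+`; `urllib.parse.unquote_plus` handles that variant.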
Sage Weil
39db4d7c4b os/bluestore/KernelDevice: print aio error extent in hex
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 09:23:25 -05:00
Matt Benjamin
a7b29647fd
Merge pull request #29560 from linuxbox2/wip-rgwf-advance
rgw_file: don't deadlock in advance_mtime()
2019-08-09 10:01:03 -04:00
Sage Weil
b8501164ef os/bluestore: warn on no per-pool omap
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-09 08:21:18 -05:00