From 8d4e1e44c7cdde5b91f53edcaf32b770a0208743 Mon Sep 17 00:00:00 2001
From: Chunsong Feng
Date: Mon, 13 Dec 2021 10:29:52 +0000
Subject: [PATCH] doc/dev: add dpdkstack doc

Add a description of DPDKStack development, debugging, optimization,
and future plans.

Fixes: https://tracker.ceph.com/issues/53432
Signed-off-by: Chunsong Feng
Reviewed-by: luo rixin
Reviewed-by: Han Fengzhe
---
 doc/dev/dpdk.rst | 110 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 doc/dev/dpdk.rst

diff --git a/doc/dev/dpdk.rst b/doc/dev/dpdk.rst
new file mode 100644
index 00000000000..79403ef0eb3
--- /dev/null
+++ b/doc/dev/dpdk.rst
@@ -0,0 +1,110 @@
+=========================
+Ceph messenger DPDKStack
+=========================
+
+Compiling DPDKStack
+===================
+Ceph DPDKStack is not compiled by default, so you need to recompile Ceph with
+the DPDKStack component enabled. Install ``dpdk`` and ``dpdk-devel``, then
+compile with::
+
+  do_cmake.sh -DWITH_DPDK=ON
+
+Setting the DPDK Network Adapter
+================================
+Most mainstream NICs support SR-IOV and can be virtualized into multiple VF
+NICs. Each OSD uses some dedicated VF NICs through DPDK, while the mon, mgr,
+and clients use the PF NICs through the POSIX protocol stack.
+
+Load the drivers on which DPDK depends::
+
+  modprobe vfio
+  modprobe vfio_pci
+
+Configure hugepages, e.g. via ``sysctl``::
+
+  vm.nr_hugepages = xxx
+
+Configure the number of VFs based on the number of OSDs::
+
+  echo $numvfs > /sys/class/net/$port/device/sriov_numvfs
+
+Bind the VF NICs to the DPDK-compatible driver::
+
+  dpdk-devbind.py -b vfio-pci 0000:xx:yy.z
+
+Configuring OSD DPDKStack
+=========================
+The DPDK RTE initialization process requires root privileges, so the ceph user
+must be granted root privileges and write access to the ``/var/run`` folder,
+for example by modifying ``/etc/passwd``::
+
+  ceph:x:0:0:Ceph storage service:/var/lib/ceph:/bin/false:/var/run
+
+The OSD selects its NICs with ``ms_dpdk_devs_allowlist``.
+
+1. Configure a single NIC::
+
+     ms_dpdk_devs_allowlist=-a 0000:7d:01.0
+
+   or::
+
+     ms_dpdk_devs_allowlist=--allow=0000:7d:01.0
+
+2. Configure the bond network adapter::
+
+     ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6
+
+The DPDK-related configuration items are as follows::
+
+  [osd]
+  ms_type=async+dpdk
+  ms_async_op_threads=1
+
+  ms_dpdk_port_id=0
+  ms_dpdk_gateway_ipv4_addr=172.19.36.1
+  ms_dpdk_netmask_ipv4_addr=255.255.255.0
+  ms_dpdk_hugepages=/dev/hugepages
+  ms_dpdk_hw_flow_control=false
+  ms_dpdk_lro=false
+  ms_dpdk_enable_tso=false
+  ms_dpdk_hw_queue_weight=1
+  ms_dpdk_memory_channel=2
+  ms_dpdk_debug_allow_loopback=true
+
+  [osd.x]
+  ms_dpdk_coremask=0xf0
+  ms_dpdk_host_ipv4_addr=172.19.36.51
+  public_addr=172.19.36.51
+  cluster_addr=172.19.36.51
+  ms_dpdk_devs_allowlist=--allow=0000:7d:01.1
+
+Debug and Optimization
+======================
+Locate faults based on the logs, adjusting the relevant log subsystems to a
+proper level::
+
+  debug_dpdk=xx
+  debug_ms=xx
+
+If the log contains a large number of retransmitted messages, reduce the value
+of ``ms_dpdk_tcp_wmem``.
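+
+For example, a debugging session might start from a ``ceph.conf`` fragment
+like the one below. This is only a sketch: the numeric values are illustrative
+assumptions, not recommendations, and the right ``ms_dpdk_tcp_wmem`` value
+depends on the workload and the memory available to the DPDK stack::
+
+  [osd]
+  # verbose messenger and DPDK logging while investigating
+  debug_ms = 10
+  debug_dpdk = 10
+  # shrink the TCP send buffer if the log shows many retransmitted messages
+  # (illustrative value only)
+  ms_dpdk_tcp_wmem = 65536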
+
+Run the ``perf dump`` command to view DPDKStack statistics::
+
+  ceph daemon osd.$i perf dump | grep dpdk
+
+If ``dpdk_device_receive_nombuf_errors`` keeps increasing, check whether the
+throttle limits are being hit::
+
+  ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
+  ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"
+
+If the throttle limits are exceeded, increase the throttle thresholds or
+disable throttling.
+
+Check whether the network adapter is faulty or abnormal. Run the following
+commands to obtain the network adapter status and statistics::
+
+  ceph daemon osd.$i show_pmd_stats
+  ceph daemon osd.$i show_pmd_xstats
+
+With some DPDK versions (e.g. dpdk-20.11-3.el8.aarch64) or some NICs, TSO is
+abnormal; try disabling TSO::
+
+  ms_dpdk_enable_tso=false
+
+If the VF NICs support multiple queues, more NIC queues can be allocated to a
+single core to improve performance::
+
+  ms_dpdk_hw_queues_per_qp=4
+
+Status and Future Work
+======================
+Compared with the POSIX stack in the multi-concurrency test, DPDKStack has the
+same 4K random-write performance, improves 8K random-write performance by 28%,
+and is unstable with 1 MB packets. In the single-latency test, the 4K and 8K
+random-write latency is reduced by 15% (the lower the latency, the better).
+
+At a high level, our future work plan is:
+
+* OSD multiple network support (public network and cluster network): the
+  public and cluster network adapters can be configured. When connecting or
+  listening, the public or cluster network adapter is selected based on the
+  IP address. During msgr-worker initialization, both the public and cluster
+  network adapters are initialized and two DPDKQueuePairs are created.
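+
+As a rough illustration of that plan, such an OSD configuration might end up
+looking like the sketch below. This is hypothetical: DPDKStack does not yet
+support separate public and cluster networks, and the second allowlisted VF
+and the cluster-network address are placeholders::
+
+  [osd.x]
+  # hypothetical future layout: one allowlisted VF per network
+  ms_dpdk_devs_allowlist=--allow=0000:7d:01.1 --allow=0000:7d:02.1
+  public_addr=172.19.36.51
+  cluster_addr=172.19.37.51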