5. MLX5 vDPA driver
The MLX5 vDPA (vhost data path acceleration) driver library (librte_pmd_mlx5_vdpa) provides support for Mellanox ConnectX-6, Mellanox ConnectX-6 Dx and Mellanox BlueField families of 10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in SR-IOV context.
Due to external dependencies, this driver is disabled in the default
configuration of the “make” build. It can be enabled with
CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y or by using the “meson” build system,
which detects the dependencies.
For security and robustness, this driver only deals with virtual memory addresses. The way the kernel handles resource allocation, combined with hardware that can handle virtual memory addresses directly, ensures that DPDK applications cannot access random physical memory (or memory that does not belong to the current process).
The PMD can use libibverbs and libmlx5 to access the device firmware or to access the hardware components directly. There are different levels of objects and bypassing abilities to get the best performance:
- Verbs is a complete high-level generic API
- Direct Verbs is a device-specific API
- DevX allows access to firmware objects
- Direct Rules manages flow steering at the low-level hardware layer
Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked against libibverbs.
A Mellanox mlx5 PCI device can be probed by either the net/mlx5 driver or the
vdpa/mlx5 driver, but not by both at the same time. Hence, the user should
select the driver with the class parameter in the device argument list.
By default, the mlx5 device will be probed by the net/mlx5 driver.
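As an illustration of the class selection described above, the following minimal sketch builds a device argument string that requests the vDPA driver. It assumes the usual comma-separated `key=value` form of device arguments; the helper name `vdpa_device_arg` and the PCI address are hypothetical and used only for illustration.

```python
def vdpa_device_arg(pci_address: str) -> str:
    """Build a device argument asking the vdpa/mlx5 driver to probe the device."""
    # Device arguments are assumed to be comma-separated key=value pairs
    # appended to the device identifier.
    return f"{pci_address},class=vdpa"


# Hypothetical PCI address, used only for illustration.
print(vdpa_device_arg("0000:03:00.2"))
```

Leaving out the class key keeps the default behaviour, in which the net/mlx5 driver probes the device.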
5.2. Supported NICs
- Mellanox® ConnectX®-6 200G MCX654106A-HCAT (2x200G)
- Mellanox® ConnectX®-6 Dx EN 25G MCX621102AN-ADAT (2x25G)
- Mellanox® ConnectX®-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
- Mellanox® ConnectX®-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
- Mellanox® BlueField SmartNIC 25G MBF1M332A-ASCAT (2x25G)
- Mellanox OFED version: 5.0. See the MLX5 poll mode driver guide for more Mellanox OFED details.
5.3.1. Compilation options
These options can be modified in the
Toggle compilation of librte_pmd_mlx5 itself.
Build the PMD with additional code to make it loadable without hard dependencies on libibverbs or libmlx5, which may not be installed on the target system.
In this mode, their presence is still required for it to run properly; however, their absence won’t prevent a DPDK application from starting (with
CONFIG_RTE_BUILD_SHARED_LIB disabled) and they won’t show up as missing with
It works by moving these dependencies to a purpose-built rdma-core “glue” plug-in which must either be installed in a directory whose name is based on
-glue if set, or in a standard location for the dynamic linker (e.g.
/lib) if left to the default empty string (
This option has no performance impact.
Embed the static flavor of the dependencies libibverbs and libmlx5 in the PMD shared library or the executable static binary.
For BlueField, target should be set to
CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD and set
RTE_CACHE_LINE_SIZE to 64. The default armv8a configuration of the make and
meson builds sets it to 128, which degrades performance.
5.3.2. Run-time configuration
- ethtool operations on related kernel interfaces also affect the PMD.
5.3.2.1. Driver options
The class parameter selects the class of the driver that should probe the device. Use vdpa for the mlx5 vDPA driver.
- 0, Completion queue scheduling is managed by a timer thread which automatically adjusts its delays to the incoming traffic rate.
- 1, Completion queue scheduling is managed by a timer thread with a fixed delay.
- 2, Completion queue scheduling is managed by interrupts. Each CQ burst arms the CQ in order to get an interrupt event on the next traffic burst.
- Default mode is 0.
Per-mode microseconds parameter, relevant only for event modes 0 and 1:
- 0, A nonzero value sets the timer step in microseconds. The timer thread changes its dynamic delay in steps of this value. Default value is 1us.
- 1, A nonzero value sets the fixed timer delay in microseconds. Default value is 100us.
A nonzero value defines the traffic-off time, in seconds, after which the driver moves to no-traffic mode. In this mode the timer events are stopped and interrupts are configured on the device to notify the driver of traffic. Default value is 2s.
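The event modes and defaults described above can be summarized in a small sketch. This is not the driver's implementation. It only models how the documented defaults might be resolved. The names `EventConfig` and `resolve_event_config` are hypothetical, and treating a zero value as "use the default" is an assumption drawn from the "nonzero value" wording.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults quoted from the text above.
DEFAULT_EVENT_MODE = 0
DEFAULT_EVENT_US = {0: 1, 1: 100}  # event mode -> microseconds
DEFAULT_NO_TRAFFIC_S = 2           # seconds


@dataclass(frozen=True)
class EventConfig:
    mode: int
    event_us: Optional[int]  # None in mode 2, which uses interrupts, not a timer
    no_traffic_s: int


def resolve_event_config(mode: int = DEFAULT_EVENT_MODE,
                         event_us: int = 0,
                         no_traffic_s: int = 0) -> EventConfig:
    """Resolve the effective settings (assumption: zero means 'use the default')."""
    if mode not in (0, 1, 2):
        raise ValueError(f"unknown event mode: {mode}")
    if mode == 2:
        # The microseconds parameter is relevant only for modes 0 and 1.
        timer_us = None
    else:
        timer_us = event_us if event_us > 0 else DEFAULT_EVENT_US[mode]
    # The text does not say whether the no-traffic time applies in mode 2;
    # it is passed through unchanged here.
    traffic_s = no_traffic_s if no_traffic_s > 0 else DEFAULT_NO_TRAFFIC_S
    return EventConfig(mode, timer_us, traffic_s)


print(resolve_event_config())
print(resolve_event_config(mode=1, event_us=50))
```

In mode 0 the microseconds value is the step by which the timer thread adjusts its delay, while in mode 1 it is the fixed delay itself, so the same field carries a different meaning depending on the mode.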