5. MLX5 vDPA driver

The MLX5 vDPA (vhost data path acceleration) driver library (librte_vdpa_mlx5) provides support for Mellanox ConnectX-6, Mellanox ConnectX-6 Dx and Mellanox BlueField families of 10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in SR-IOV context.

Note

This driver is enabled automatically when using the meson build system, which detects its dependencies.
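
For reference, this is what a typical native build looks like when meson is left to detect rdma-core (libibverbs/libmlx5) and enable the driver; the build directory name is only an example:

  meson setup build
  ninja -C build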

5.1. Design

For security and robustness reasons, this driver only deals with virtual memory addresses. The way resource allocations are handled by the kernel, combined with hardware specifications that allow handling virtual memory addresses directly, ensures that DPDK applications cannot access random physical memory (or memory that does not belong to the current process).

The PMD can use libibverbs and libmlx5 to access the device firmware or the hardware components directly. There are different levels of objects and bypassing abilities to get the best performance:

  • Verbs is a complete high-level generic API

  • Direct Verbs is a device-specific API

  • DevX allows access to firmware objects

  • Direct Rules manages flow steering at the low-level hardware layer

Enabling librte_vdpa_mlx5 causes DPDK applications to be linked against libibverbs.

A Mellanox mlx5 PCI device can be probed by either the net/mlx5 driver or the vdpa/mlx5 driver, but not by both in parallel. Hence, the user should select the driver with the class parameter in the device argument list. By default, the mlx5 device is probed by the net/mlx5 driver.

5.2. Supported NICs

  • Mellanox® ConnectX®-6 200G MCX654106A-HCAT (2x200G)

  • Mellanox® ConnectX®-6 Dx EN 25G MCX621102AN-ADAT (2x25G)

  • Mellanox® ConnectX®-6 Dx EN 100G MCX623106AN-CDAT (2x100G)

  • Mellanox® ConnectX®-6 Dx EN 200G MCX623105AN-VDAT (1x200G)

  • Mellanox® BlueField SmartNIC 25G MBF1M332A-ASCAT (2x25G)

5.3. Prerequisites

5.3.1. Compilation option

The meson option ibverbs_link is shared by default, but it can be configured to the following values (see the configuration example after this list):

  • dlopen

    Build the PMD with additional code to make it loadable without hard dependencies on libibverbs or libmlx5, which may not be installed on the target system.

    In this mode, their presence is still required for it to run properly; however, their absence won’t prevent a DPDK application from starting (with DPDK shared build disabled), and they won’t show up as missing with ldd(1).

    It works by moving these dependencies to a purpose-built rdma-core “glue” plug-in which must be installed in a directory whose name is based on RTE_EAL_PMD_PATH suffixed with -glue.

    This option has no performance impact.

  • static

    Embed the static flavor of the dependencies libibverbs and libmlx5 in the PMD shared library or in the static executable binary.
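
For illustration, assuming a build from the DPDK source tree, the option can be selected at configuration time as follows (the build directory name is only an example):

  meson setup build -Dibverbs_link=dlopen
  ninja -C build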

Note

The default armv8a configuration of the meson build sets RTE_CACHE_LINE_SIZE to 128, which brings performance degradation.

5.3.2. Run-time configuration

  • ethtool operations on related kernel interfaces also affect the PMD.

5.3.2.1. Driver options

  • class parameter [string]

    Select the class of the driver that should probe the device; use vdpa for the mlx5 vDPA driver.

  • event_mode parameter [int]

    • 0, Completion queue scheduling will be managed by a timer thread which automatically adjusts its delays to the incoming traffic rate.

    • 1, Completion queue scheduling will be managed by a timer thread with fixed delay time.

    • 2, Completion queue scheduling will be managed by interrupts. Each CQ burst arms the CQ in order to get an interrupt event in the next traffic burst.

    • Default mode is 1.

  • event_us parameter [int]

    Per-mode microseconds parameter, relevant only for event modes 0 and 1:

    • 0, A nonzero value sets the timer step in microseconds. The timer thread changes its dynamic delay in steps of this value. Default value is 1 us.

    • 1, A value that sets the fixed timer delay in microseconds. Default value is 0 us.

  • no_traffic_time parameter [int]

    A nonzero value defines the traffic-off time, in polling cycle time units, that moves the driver to no-traffic mode. In this mode, polling is stopped and interrupts are configured on the device in order to notify the driver about traffic. Default value is 16.

  • event_core parameter [int]

    CPU core number to set the polling thread affinity to; defaults to the control plane CPU.

  • hw_latency_mode parameter [int]

    The completion queue moderation mode:

    • 0, HW default.

    • 1, Latency is counted from the first packet completion report.

    • 2, Latency is counted from the last packet completion.

  • hw_max_latency_us parameter [int]

    • 1 - 4095, The maximum time in microseconds that a packet completion report can be delayed.

    • 0, HW default.

  • hw_max_pending_comp parameter [int]

    • 1 - 65535, The maximum number of pending packet completions in an HW queue.

    • 0, HW default.

5.3.2.2. Devargs example

  • PCI devargs:

    -a 0000:03:00.2,class=vdpa
    
  • Auxiliary devargs:

    -a auxiliary:mlx5_core.sf.2,class=vdpa
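
  • Devargs combining several driver options (an illustrative sketch; the parameter values are examples only):

    -a 0000:03:00.2,class=vdpa,event_mode=0,event_us=2,no_traffic_time=32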
    

5.3.2.3. Error handling

Upon potential hardware errors, the mlx5 PMD tries to recover; if recovery fails 3 times within 3 seconds, the virtq is put in the disabled state. The user should check the log for error information, or query the vDPA statistics counters to know the error type and count.
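
For example, with the vdpa example application shipped with DPDK, the virtq statistics counters can be queried from its interactive prompt; the device ID and queue ID below are assumptions for illustration:

  stats 0 0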