7. Odyssey ODM DMA Device Driver

The odm DMA device driver provides a poll-mode driver (PMD) for the Marvell Odyssey DMA hardware accelerator block found in the Odyssey SoC. The block supports only memory-to-memory DMA transfers.

The ODM DMA device supports up to 32 queues and 16 VFs.

7.1. Device Setup

The ODM DMA device is initialized by the kernel PF driver, which is part of the Marvell software packages for Odyssey.

The kernel module can be inserted as in the example below:

sudo insmod odyssey_odm.ko

The ODM DMA device supports up to 16 VFs, which can be created through sysfs:

echo 16 | sudo tee /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs

The above command creates 16 VFs with 2 queues each.

The dpdk-devbind.py script, included with DPDK, can be used to show the presence of supported hardware. Running dpdk-devbind.py --status-dev dma will show all the Odyssey ODM DMA devices.

7.1.1. Devices using VFIO drivers

The HW devices to be used will need to be bound to a user-space IO driver. The dpdk-devbind.py script can be used to view the state of the devices and to bind them to a suitable DPDK-supported driver, such as vfio-pci. For example:

dpdk-devbind.py -b vfio-pci 0000:08:00.1
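When all VFs of a PF need to be bound, their PCI addresses can be read from the virtfnN symlinks that the kernel creates under the PF's sysfs directory once SR-IOV is enabled. The following is a minimal sketch, assuming the PF address 0000:08:00.0 used in the examples above. The helper name list_vfs is hypothetical and not part of DPDK, and the bind command itself is left commented out.

```shell
# List the PCI addresses of all VFs below a PF sysfs directory.
# list_vfs is a hypothetical helper, not part of DPDK.
list_vfs() {
    pf_dir=$1
    for link in "$pf_dir"/virtfn*; do
        # Skip the unexpanded pattern when no VFs exist.
        [ -L "$link" ] || continue
        # Each virtfnN symlink points at the VF device directory,
        # whose name is the VF's PCI address.
        basename "$(readlink "$link")"
    done
}

# Assumed PF location, taken from the examples in this guide.
PF_DIR=/sys/bus/pci/devices/0000:08:00.0

# Bind all VFs to vfio-pci (requires root):
# dpdk-devbind.py -b vfio-pci $(list_vfs "$PF_DIR")
```

The helper only reads sysfs, so it can be run without root to check which addresses would be passed to dpdk-devbind.py.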

7.1.2. Device Probing and Initialization

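To use the devices from an application, use the dmadev API. A device is configured with rte_dma_configure(), and its virtual DMA channels are set up with rte_dma_vchan_setup().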

Once configured, the device can then be made ready for use by calling the rte_dma_start() API.

7.1.3. Performing Data Copies

Refer to the Enqueue / Dequeue API section of the dmadev library documentation for details on operation enqueue and submission API usage.

7.1.4. Performance Tuning Parameters

To achieve higher performance, the DMA device needs to be tuned using parameters of the PF kernel driver.

The kernel PF driver exposes the following options via the devlink interface for performance tuning.

eng_sel

The ODM DMA device has 2 internal engines. The engine-to-queue mapping is decided by a hardware register, which can be configured as below:

/sbin/devlink dev param set pci/0000:08:00.0 name eng_sel value 3435973836 cmode runtime

Each bit in the register corresponds to one queue, and each queue is associated with one engine. If the bit corresponding to a queue is 0, engine 0 is picked; if it is 1, engine 1 is picked.

In the above command, the register value is set to 1100 1100 1100 1100 1100 1100 1100 1100 (0xCCCCCCCC, or 3435973836 in decimal), which alternates engines between consecutive VFs (assuming the system has 16 VFs with 2 queues each).
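The eng_sel value for such a layout can be derived rather than written by hand. The sketch below rebuilds the value from the example, under two assumptions that the text implies but does not state explicitly: bit N of the register selects the engine for queue N (bit 0 being the least significant), and VF k owns queues 2k and 2k+1. The variable names are illustrative only.

```shell
# Build an eng_sel value that alternates engines between consecutive VFs.
# Assumptions: bit N of the register selects the engine for queue N
# (bit 0 is the least significant), and VF k owns queues 2k and 2k+1.
NUM_VFS=16
QUEUES_PER_VF=2

eng_sel=0
vf=0
while [ "$vf" -lt "$NUM_VFS" ]; do
    engine=$((vf % 2))    # even VFs use engine 0, odd VFs use engine 1
    q=0
    while [ "$q" -lt "$QUEUES_PER_VF" ]; do
        bit=$((vf * QUEUES_PER_VF + q))
        # Set the queue's bit only when engine 1 is selected.
        eng_sel=$((eng_sel | (engine << bit)))
        q=$((q + 1))
    done
    vf=$((vf + 1))
done

# Prints: eng_sel = 3435973836 (0xCCCCCCCC)
printf 'eng_sel = %s (0x%X)\n' "$eng_sel" "$eng_sel"
```

Changing the expression assigned to engine gives other distributions; for example, selecting the engine from the queue index instead of the VF index would place the two queues of each VF on different engines.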

max_load_request

Specifies the maximum number of outstanding load requests on the internal bus. Values can range from 1 to 512. Set it to 512 for the maximum number of requests in flight:

/sbin/devlink dev param set pci/0000:08:00.0 name max_load_request value 512 cmode runtime
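When the value comes from a script or a configuration file, it may be worth checking it against the documented range before handing it to devlink. The following is a minimal sketch; max_load_request_cmd is a hypothetical helper that only prints the command, and the PF address is the one assumed throughout this guide.

```shell
# Print the devlink command for a max_load_request value, rejecting values
# outside the documented 1..512 range. max_load_request_cmd is a
# hypothetical helper; it does not run devlink itself.
max_load_request_cmd() {
    value=$1
    # Reject empty or non-numeric input before the numeric comparison.
    case $value in
        ''|*[!0-9]*) echo "not a number: $value" >&2; return 1 ;;
    esac
    if [ "$value" -lt 1 ] || [ "$value" -gt 512 ]; then
        echo "max_load_request must be between 1 and 512" >&2
        return 1
    fi
    echo "/sbin/devlink dev param set pci/0000:08:00.0 name max_load_request value $value cmode runtime"
}

max_load_request_cmd 512
```

The printed line can be reviewed first and then executed with root privileges.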