9. Machine Learning (ML) Device Library

The Machine Learning (ML) Device library provides a Machine Learning device framework for the management and provisioning of hardware and software ML poll mode drivers. It defines an API which supports a number of ML operations, including device handling and inference processing. ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

Fig. 9.13 Work flow of inference on MLDEV

ML Device: A hardware or software-based implementation of ML device API for running inferences using a pre-trained ML model.
ML Model: An ML model is an algorithm trained over a dataset. A model consists of the procedure/algorithm and data/pattern required to make predictions on live data. Once the model has been created and trained outside of the DPDK scope, it can be loaded via rte_ml_model_load() and then started using the rte_ml_model_start() API function. rte_ml_model_params_update() can be used to update model parameters such as weights and bias without unloading the model using rte_ml_model_unload().
ML Inference: ML inference is the process of feeding data to the model via the rte_ml_enqueue_burst() API function and using the rte_ml_dequeue_burst() API function to get the calculated outputs / predictions from the started model.

9.1. Design Principles

The MLDEV library follows the same basic principles as those used in DPDK’s Ethernet Device framework and the Crypto framework. It provides a generic Machine Learning device framework which supports both physical (hardware) and virtual (software) ML devices, as well as an ML API to manage and configure ML devices. The API also supports performing ML inference operations through an ML poll mode driver.

9.2. Device Operations

9.2.1. Device Creation

Physical ML devices are discovered during the PCI probe/enumeration, through the EAL functions which are executed at DPDK initialization, based on their PCI device identifier, each unique PCI BDF (bus/bridge, device, function). Physical ML devices, like other physical devices in DPDK, can be allowed or blocked using the EAL command line options.

9.2.2. Device Identification

Each device, whether virtual or physical, is uniquely designated by two identifiers:

A unique device index used to designate the ML device in all functions exported by the MLDEV API.
A device name used to designate the ML device in console messages, for administration or debugging purposes.

9.2.3. Device Features and Capabilities

ML devices may support different feature sets. The rte_ml_dev_info_get() API returns the device information, including the features supported by the PMD.

9.2.4. Device Configuration

The configuration of each ML device includes the following operations:

Allocation of resources, including hardware resources if a physical device.
Resetting the device into a well-known default state.
Initialization of statistics counters.

The rte_ml_dev_configure() API is used to configure an ML device.

int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The rte_ml_dev_config structure is used to pass the configuration parameters for the ML device, for example the number of queue pairs, the maximum number of models, the maximum size of a model and so on.
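As a rough illustration of how reported capabilities can feed into the configuration, the sketch below clamps the limits an application asks for to what a device reports. The demo_dev_info and demo_dev_config structures and the demo_build_config() helper are hypothetical stand-ins defined here so that the example compiles on its own; they are not the DPDK structures. In a real application the limits would come from rte_ml_dev_info_get() and the result would be passed to rte_ml_dev_configure().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the limits a device reports. */
struct demo_dev_info {
	uint16_t max_models;
	uint16_t max_queue_pairs;
};

/* Hypothetical stand-in for the configuration an application requests. */
struct demo_dev_config {
	uint16_t nb_models;
	uint16_t nb_queue_pairs;
};

static uint16_t
demo_min_u16(uint16_t a, uint16_t b)
{
	return a < b ? a : b;
}

/*
 * Fill cfg with the requested values, clamped to the device limits.
 * Returns 0 on success, or -1 if either side asks for or offers
 * zero models or zero queue pairs.
 */
static int
demo_build_config(const struct demo_dev_info *info, uint16_t want_models,
		  uint16_t want_qps, struct demo_dev_config *cfg)
{
	assert(info != NULL && cfg != NULL);

	if (info->max_models == 0 || info->max_queue_pairs == 0 ||
	    want_models == 0 || want_qps == 0)
		return -1;

	cfg->nb_models = demo_min_u16(want_models, info->max_models);
	cfg->nb_queue_pairs = demo_min_u16(want_qps, info->max_queue_pairs);
	return 0;
}
```

Clamping rather than failing is only one possible policy; an application that needs an exact number of queue pairs may prefer to treat a shortfall as an error.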

9.2.5. Configuration of Queue Pairs

Each ML device can be configured with a number of queue pairs. Each queue pair is configured using rte_ml_dev_queue_pair_setup().

9.2.6. Logical Cores, Memory and Queue Pair Relationships

Multiple logical cores should never share the same queue pair for enqueuing operations or dequeueing operations on the same ML device since this would require global locks and hinder performance.
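One simple way to respect this rule is to give every worker core its own queue pair index. The helper below is a minimal sketch of that idea; demo_assign_queue_pairs() is a hypothetical name, and a real application would derive the worker list from DPDK lcore IDs and the number of queue pairs from the device configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assign a distinct queue pair to each worker: worker i gets queue pair i.
 * Returns 0 on success, or -1 if there are more workers than queue pairs,
 * in which case sharing would be unavoidable and qp_ids is left untouched.
 */
static int
demo_assign_queue_pairs(uint16_t nb_workers, uint16_t nb_queue_pairs,
			uint16_t *qp_ids)
{
	assert(qp_ids != NULL);

	if (nb_workers > nb_queue_pairs)
		return -1;

	for (uint16_t i = 0; i < nb_workers; i++)
		qp_ids[i] = i;

	return 0;
}
```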

9.2.7. Configuration of Machine Learning models

Pre-trained ML models that are built using external ML compilers / training frameworks are used to perform inference operations. These models are configured on an ML device in a two-stage process: loading the model on the ML device, and starting the model so that it accepts inference operations. Inference operations can be queued for a model only when the model is in the started state. The model load stage assigns a Model ID, which is unique for the model in a driver’s context. The Model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done through the rte_ml_model_load() and rte_ml_model_start() functions.

Similarly, stopping and unloading are done through the rte_ml_model_stop() and rte_ml_model_unload() functions.

The stop and unload functions release the resources allocated for the models. Inference tasks cannot be queued for a model that is stopped.

Detailed information related to the model can be retrieved from the driver using the function rte_ml_model_info_get(). Model information is accessible to the application through the rte_ml_model_info structure. The information available to the user includes details related to the inputs and outputs, and the maximum batch size supported by the model.

The user can optionally update model parameters such as weights and bias, without unloading the model, through the rte_ml_model_params_update() function. A model must be in the stopped state for its parameters to be updated. After a parameter update, the model has to be started again before inference requests can be enqueued.
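The ordering rules described in this section can be summarized as a small state machine. The sketch below is illustrative only: the demo_* enums and demo_model_apply() are hypothetical names that model the rules in plain C and do not call into the library. It also assumes that a started model must be stopped before it can be unloaded, which follows the stop-then-unload order described above but is not stated as a hard requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model states used only in this sketch. */
enum demo_model_state {
	DEMO_MODEL_UNLOADED,
	DEMO_MODEL_LOADED,	/* loaded but stopped */
	DEMO_MODEL_STARTED,
};

/* Hypothetical actions, one per operation discussed in the text. */
enum demo_model_action {
	DEMO_ACT_LOAD,
	DEMO_ACT_START,
	DEMO_ACT_STOP,
	DEMO_ACT_UNLOAD,
	DEMO_ACT_PARAMS_UPDATE,
	DEMO_ACT_ENQUEUE,
};

/*
 * Apply an action to a model. Returns true and updates *state if the
 * action is allowed in the current state, false otherwise.
 */
static bool
demo_model_apply(enum demo_model_state *state, enum demo_model_action act)
{
	assert(state != NULL);

	switch (act) {
	case DEMO_ACT_LOAD:
		if (*state != DEMO_MODEL_UNLOADED)
			return false;
		*state = DEMO_MODEL_LOADED;
		return true;
	case DEMO_ACT_START:
		if (*state != DEMO_MODEL_LOADED)
			return false;
		*state = DEMO_MODEL_STARTED;
		return true;
	case DEMO_ACT_STOP:
		if (*state != DEMO_MODEL_STARTED)
			return false;
		*state = DEMO_MODEL_LOADED;
		return true;
	case DEMO_ACT_UNLOAD:
		/* Assumption: a model is stopped before it is unloaded. */
		if (*state != DEMO_MODEL_LOADED)
			return false;
		*state = DEMO_MODEL_UNLOADED;
		return true;
	case DEMO_ACT_PARAMS_UPDATE:
		/* Parameters may only change while the model is stopped. */
		return *state == DEMO_MODEL_LOADED;
	case DEMO_ACT_ENQUEUE:
		/* Inference is accepted only by a started model. */
		return *state == DEMO_MODEL_STARTED;
	}
	return false;
}
```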

9.2.8. Enqueue / Dequeue

The burst enqueue API uses an ML device identifier and a queue pair identifier to specify the device queue pair on which to schedule the processing. The nb_ops parameter is the number of operations to process, which are supplied in the ops array of rte_ml_op structures. The enqueue function returns the number of operations it enqueued for processing; a return value equal to nb_ops means that all operations have been enqueued.

The dequeue API uses the same format as the enqueue API, but the nb_ops and ops parameters are now used to specify the maximum number of processed operations the user wishes to retrieve and the location in which to store them. The API call returns the actual number of processed operations returned; this can never be larger than nb_ops.

rte_ml_op provides the required information to the driver to queue an ML inference task. The ML op specifies the model to be used and the number of batches to be executed in the inference task. Input and output buffer information is specified through the structure rte_ml_buff_seg, which supports segmented data. Input is provided through rte_ml_op::input and output through rte_ml_op::output. The data pointed to by each op should not be released until that op has been dequeued.
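Because an enqueue call may accept fewer operations than requested, callers typically loop until everything has been submitted and drain completions to make room. The sketch below illustrates that pattern against a tiny in-memory queue. Everything prefixed with demo_ is a hypothetical stand-in written for this example: demo_enqueue_burst() and demo_dequeue_burst() only mimic the return-value contract described above, they complete operations instantly, and they are not the rte_ml_enqueue_burst() and rte_ml_dequeue_burst() functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 4

/* Hypothetical stand-in for an inference operation. */
struct demo_op {
	int id;
};

/* Hypothetical stand-in for a queue pair: a small FIFO of op pointers. */
struct demo_qp {
	struct demo_op *slots[DEMO_RING_SIZE];
	uint16_t head;	/* next slot to dequeue */
	uint16_t count;	/* ops currently queued */
};

/* Enqueue up to nb_ops ops; returns how many were actually accepted. */
static uint16_t
demo_enqueue_burst(struct demo_qp *qp, struct demo_op **ops, uint16_t nb_ops)
{
	uint16_t n = 0;

	while (n < nb_ops && qp->count < DEMO_RING_SIZE) {
		qp->slots[(qp->head + qp->count) % DEMO_RING_SIZE] = ops[n++];
		qp->count++;
	}
	return n;
}

/* Dequeue up to nb_ops ops; never returns more than nb_ops. */
static uint16_t
demo_dequeue_burst(struct demo_qp *qp, struct demo_op **ops, uint16_t nb_ops)
{
	uint16_t n = 0;

	while (n < nb_ops && qp->count > 0) {
		ops[n++] = qp->slots[qp->head];
		qp->head = (uint16_t)((qp->head + 1) % DEMO_RING_SIZE);
		qp->count--;
	}
	return n;
}

/*
 * Submit every op, dequeuing completions into done[] whenever the queue
 * is full, then drain what is left. Returns the number of completions.
 */
static uint16_t
demo_submit_all(struct demo_qp *qp, struct demo_op **ops, uint16_t nb_ops,
		struct demo_op **done)
{
	uint16_t sent = 0, got = 0;

	assert(qp != NULL && ops != NULL && done != NULL);

	while (sent < nb_ops) {
		uint16_t n = demo_enqueue_burst(qp, ops + sent,
						(uint16_t)(nb_ops - sent));
		sent = (uint16_t)(sent + n);
		if (n == 0)	/* queue full: make room */
			got = (uint16_t)(got + demo_dequeue_burst(qp,
					done + got, (uint16_t)(nb_ops - got)));
	}
	while (got < nb_ops)
		got = (uint16_t)(got + demo_dequeue_burst(qp, done + got,
					(uint16_t)(nb_ops - got)));
	return got;
}
```

With a real device the completions arrive asynchronously, so the final drain loop would normally poll with some form of timeout or back-off rather than spin unconditionally.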

9.2.9. Quantize and Dequantize

Performing inference operations with lower precision types can improve the throughput and efficiency of inference execution, with a minimal loss of accuracy that stays within tolerance limits. Quantization is the process of converting data from a higher precision type to a lower precision type, and dequantization is the reverse conversion. The ML library provides the functions rte_ml_io_quantize() and rte_ml_io_dequantize() to enable these data type conversions. The user needs to provide the addresses of the quantized and dequantized data buffers to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be of the type dtype provided by the rte_ml_model_info::input and the data is converted to qtype provided by the rte_ml_model_info::input.

For dequantization, the quantized data is assumed to be of the type qtype provided by the rte_ml_model_info::output and the data is converted to dtype provided by the rte_ml_model_info::output.
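To make the idea of converting between a higher precision dtype and a lower precision qtype concrete, the following is a minimal sketch of symmetric scale-based quantization between float and int8_t. It is a generic textbook scheme, not a description of what rte_ml_io_quantize() or rte_ml_io_dequantize() do internally; the actual conversion, including how any scale is chosen, is up to the driver. The demo_* function names and the scale parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Quantize n float values to int8: q = clamp(round(x / scale), -128, 127).
 * Rounding is half away from zero, done by adding +/-0.5 before truncation.
 */
static void
demo_quantize_i8(const float *in, int8_t *out, size_t n, float scale)
{
	assert(in != NULL && out != NULL && scale > 0.0f);

	for (size_t i = 0; i < n; i++) {
		float v = in[i] / scale;

		v += (v >= 0.0f) ? 0.5f : -0.5f;
		if (v > 127.0f)
			v = 127.0f;
		if (v < -128.0f)
			v = -128.0f;
		out[i] = (int8_t)v;	/* truncates toward zero */
	}
}

/* Dequantize n int8 values back to float: x = q * scale. */
static void
demo_dequantize_i8(const int8_t *in, float *out, size_t n, float scale)
{
	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < n; i++)
		out[i] = (float)in[i] * scale;
}
```

Values outside the representable range saturate, which is where the loss of accuracy mentioned above comes from, together with rounding.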

The size of the buffers required for the input and output can be calculated using the functions rte_ml_io_input_size_get() and rte_ml_io_output_size_get(). These functions provide the buffer sizes for both quantized and dequantized data for the given number of batches.