23. Machine Learning Device Library
The MLDEV library provides a Machine Learning device framework for the management and provisioning of hardware and software ML poll mode drivers. It defines an API which supports a number of ML operations, including device handling and inference processing. ML model creation and training are outside the scope of this library.
The ML framework is built on the following model:
- ML Device
A hardware or software-based implementation of the ML device API for running inferences using a pre-trained ML model.
- ML Model
An ML model is an algorithm trained over a dataset. A model consists of the procedure/algorithm and the data/pattern required to make predictions on live data. Once the model is created and trained outside of the DPDK scope, it can be loaded via the rte_ml_model_load() API function and then started using the rte_ml_model_start() API function. The rte_ml_model_params_update() function can be used to update model parameters such as weights and bias without unloading the model using rte_ml_model_unload().
- ML Inference
ML inference is the process of feeding data to the model via the rte_ml_enqueue_burst() API function and using the rte_ml_dequeue_burst() API function to get the calculated outputs / predictions from the started model.
23.1. Design Principles
The MLDEV library follows the same basic principles as those used in DPDK’s Ethernet Device framework and the Crypto framework. The MLDEV framework provides a generic Machine Learning device framework which supports both physical (hardware) and virtual (software) ML devices, as well as an ML API to manage and configure ML devices. The API also supports performing ML inference operations through an ML poll mode driver.
23.2. Device Operations
23.2.1. Device Creation
Physical ML devices are discovered during the PCI probe/enumeration performed by the EAL functions executed at DPDK initialization, based on their PCI device identifier, that is, each unique PCI BDF (bus/bridge, device, function). Physical ML devices, like other physical devices in DPDK, can be allowed or blocked using the EAL command line options.
23.2.2. Device Identification
Each device, whether virtual or physical, is uniquely designated by two identifiers:
- A unique device index used to designate the ML device in all functions exported by the MLDEV API.
- A device name used to designate the ML device in console messages, for administration or debugging purposes.
23.2.3. Device Features and Capabilities
ML devices may support different feature sets. To query the features supported by a PMD, use the rte_ml_dev_info_get() API, which returns information about the device and its supported features.
23.2.4. Device Configuration
The configuration of each ML device includes the following operations:
- Allocation of resources, including hardware resources for a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.
The rte_ml_dev_configure() API is used to configure an ML device.
int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);
The rte_ml_dev_config structure is used to pass the configuration parameters for the ML device, for example the number of queue pairs, the maximum number of models, the maximum size of a model, and so on.
23.2.5. Configuration of Queue Pairs
Each ML device can be configured with a number of queue pairs. Each queue pair is configured using rte_ml_dev_queue_pair_setup().
23.2.6. Logical Cores, Memory and Queue Pair Relationships
Multiple logical cores should never share the same queue pair for enqueuing or dequeuing operations on the same ML device, since this would require global locks and hinder performance.
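The rule above can be sketched as a one-to-one mapping from worker cores to queue pairs. This is a minimal illustration only: toy_qp_for_worker() and its parameters are hypothetical names, not part of the MLDEV API, and the sketch assumes workers are numbered from zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK function): give each worker its own
 * queue pair so that no queue pair is shared between logical cores.
 * Returns the queue pair id for the worker, or -1 when there are fewer
 * queue pairs than workers and exclusive ownership is impossible.
 */
static int toy_qp_for_worker(uint16_t worker_idx, uint16_t nb_workers,
			     uint16_t nb_queue_pairs)
{
	assert(worker_idx < nb_workers);
	if (nb_workers > nb_queue_pairs)
		return -1; /* sharing would need locking; refuse instead */
	return (int)worker_idx;
}
```

With this mapping, an application would configure at least as many queue pairs as it has inference workers, and each worker would pass only its own queue pair id to the enqueue and dequeue calls.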
23.2.7. Configuration of Machine Learning models
Pre-trained ML models that are built using external ML compiler / training frameworks are used to perform inference operations. These models are configured on an ML device in a two-stage process: loading the model on the ML device, and starting the model so that it accepts inference operations. Inference operations can be queued for a model only when the model is in the started state. The model load stage assigns a Model ID, which is unique for the model in a driver’s context. The Model ID is used during all subsequent slow-path and fast-path operations.
Model loading and starting are done through the rte_ml_model_load() and rte_ml_model_start() functions. Similarly, stopping and unloading are done through the rte_ml_model_stop() and rte_ml_model_unload() functions.
The stop and unload functions release the resources allocated for the model. Inference tasks cannot be queued for a model that is stopped.
Detailed information related to the model can be retrieved from the driver using the rte_ml_model_info_get() function. Model information is accessible to the application through the rte_ml_model_info structure. The information available to the user includes details of the inputs and outputs, and the maximum batch size supported by the model.
The user can optionally update model parameters such as weights and bias, without unloading the model, through the rte_ml_model_params_update() function. A model must be in the stopped state to update its parameters. The model has to be started again before inference requests can be enqueued after a parameter update.
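The ordering rules in this section can be summarised as a small state machine. The sketch below is a toy model, not driver code: every toy_* name is hypothetical, the error codes are arbitrary, and the requirement to stop a model before unloading it is an assumption made for the illustration rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states, for illustration only; not DPDK types. */
enum toy_model_state {
	TOY_MODEL_UNLOADED,
	TOY_MODEL_STOPPED, /* loaded but not started, or stopped after a start */
	TOY_MODEL_STARTED,
};

struct toy_model {
	enum toy_model_state state;
};

/* Stage 1: load. Only a model that is not loaded yet can be loaded. */
static int toy_model_load(struct toy_model *m)
{
	assert(m != NULL);
	if (m->state != TOY_MODEL_UNLOADED)
		return -EBUSY;
	m->state = TOY_MODEL_STOPPED;
	return 0;
}

/* Stage 2: start. Only a loaded, stopped model can be started. */
static int toy_model_start(struct toy_model *m)
{
	assert(m != NULL);
	if (m->state != TOY_MODEL_STOPPED)
		return -EINVAL;
	m->state = TOY_MODEL_STARTED;
	return 0;
}

static int toy_model_stop(struct toy_model *m)
{
	assert(m != NULL);
	if (m->state != TOY_MODEL_STARTED)
		return -EINVAL;
	m->state = TOY_MODEL_STOPPED;
	return 0;
}

/* Assumption of this sketch: a started model must be stopped first. */
static int toy_model_unload(struct toy_model *m)
{
	assert(m != NULL);
	if (m->state != TOY_MODEL_STOPPED)
		return -EBUSY;
	m->state = TOY_MODEL_UNLOADED;
	return 0;
}

/* Parameters may be updated only while the model is stopped. */
static int toy_model_params_update(const struct toy_model *m)
{
	assert(m != NULL);
	return m->state == TOY_MODEL_STOPPED ? 0 : -EINVAL;
}

/* Inference can be queued only for a started model. */
static int toy_model_can_enqueue(const struct toy_model *m)
{
	assert(m != NULL);
	return m->state == TOY_MODEL_STARTED;
}
```

In this model, a parameter update on a running model follows the sequence stop, update, start, after which inference requests can be enqueued again.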
23.2.8. Enqueue / Dequeue
The burst enqueue API uses an ML device identifier and a queue pair identifier to specify the device queue pair to schedule the processing on. The nb_ops parameter is the number of operations to process, which are supplied in the ops array of rte_ml_op structures. The enqueue function returns the number of operations it enqueued for processing; a return value equal to nb_ops means that all operations have been enqueued.
The dequeue API uses the same format as the enqueue API, but the nb_ops and ops parameters are now used to specify the maximum number of processed operations the user wishes to retrieve and the location in which to store them. The API call returns the actual number of processed operations returned; this can never be larger than nb_ops.
The rte_ml_op structure provides the information the driver requires to queue an ML inference task. The ML op specifies the model to be used and the number of batches to be executed in the inference task. Input and output buffer information is specified through the rte_ml_buff_seg structure, which supports segmented data. Input is provided through rte_ml_op::input and output through rte_ml_op::output. The data pointed to by each op must not be released until that op has been dequeued.
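The burst return-value semantics described in this section can be illustrated with a toy queue pair. The sketch below does not call the MLDEV API: the toy_* names, the fixed queue depth, and the assumption that an operation is complete as soon as it is queued are all hypothetical simplifications. It shows that an enqueue may accept fewer operations than requested, that a dequeue never returns more than nb_ops, and that the caller resubmits the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QP_DEPTH 4 /* hypothetical queue depth */

/* Hypothetical stand-in for an inference operation. */
struct toy_op {
	int id;
};

/* Toy single-threaded FIFO standing in for one queue pair. */
struct toy_qp {
	struct toy_op *slots[TOY_QP_DEPTH];
	uint16_t head;  /* next slot to dequeue */
	uint16_t count; /* operations currently queued */
};

/* Accept up to nb_ops operations; return how many were actually queued. */
static uint16_t toy_enqueue_burst(struct toy_qp *qp, struct toy_op **ops,
				  uint16_t nb_ops)
{
	uint16_t n = 0;

	assert(qp != NULL && ops != NULL);
	while (n < nb_ops && qp->count < TOY_QP_DEPTH) {
		qp->slots[(qp->head + qp->count) % TOY_QP_DEPTH] = ops[n++];
		qp->count++;
	}
	return n;
}

/* Store up to nb_ops finished operations in ops[]; never more than nb_ops. */
static uint16_t toy_dequeue_burst(struct toy_qp *qp, struct toy_op **ops,
				  uint16_t nb_ops)
{
	uint16_t n = 0;

	assert(qp != NULL && ops != NULL);
	while (n < nb_ops && qp->count > 0) {
		ops[n++] = qp->slots[qp->head];
		qp->head = (qp->head + 1) % TOY_QP_DEPTH;
		qp->count--;
	}
	return n;
}

/*
 * Submit every op and collect every result, resubmitting whatever a
 * partial enqueue left behind. The ops stay valid until dequeued.
 */
static uint16_t toy_process_all(struct toy_qp *qp, struct toy_op **ops,
				uint16_t nb_ops, struct toy_op **done)
{
	uint16_t sent = 0, recvd = 0;

	while (recvd < nb_ops) {
		sent += toy_enqueue_burst(qp, &ops[sent], nb_ops - sent);
		recvd += toy_dequeue_burst(qp, &done[recvd], nb_ops - recvd);
	}
	return recvd;
}
```

A real application would follow the same pattern with the device's own enqueue and dequeue burst functions, typically polling the dequeue side until the operations it submitted have been returned.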
23.2.9. Quantize and Dequantize
Performing inference operations with lower precision types can improve the throughput and efficiency of inference execution, with a minimal loss of accuracy that stays within tolerance limits. Quantization is the process of converting data from a higher precision type to a lower precision type, and dequantization is the reverse conversion. The ML library provides the rte_ml_io_quantize() and rte_ml_io_dequantize() functions to enable data type conversions. The user needs to provide the addresses of the quantized and dequantized data buffers to the functions, along with the number of batches in the buffers.
For quantization, the dequantized data is assumed to be of the type dtype provided by rte_ml_model_info::input, and the data is converted to the type qtype provided by rte_ml_model_info::input.
For dequantization, the quantized data is assumed to be of the type qtype provided by rte_ml_model_info::output, and the data is converted to the type dtype provided by rte_ml_model_info::output.
The sizes of the buffers required for the input and output can be calculated using the rte_ml_io_input_size_get() and rte_ml_io_output_size_get() functions. These functions return the buffer sizes for both quantized and dequantized data for the given number of batches.
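To make the idea of precision conversion concrete, the sketch below converts between float32 and int8 using a single scale factor. It is an illustration of the general technique only, under stated assumptions: the toy_* functions and the scale parameter are hypothetical, inputs are assumed to be finite, and the conversion actually performed by rte_ml_io_quantize() and rte_ml_io_dequantize() depends on the dtype and qtype of the model and on the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Quantize n float32 values to int8: q = round(x / scale), clamped to
 * the int8 range [-128, 127]. Values outside the range saturate.
 */
static void toy_quantize_f32_to_i8(const float *dbuf, int8_t *qbuf,
				   size_t n, float scale)
{
	assert(dbuf != NULL && qbuf != NULL && scale > 0.0f);
	for (size_t i = 0; i < n; i++) {
		float v = dbuf[i] / scale;

		/* Clamp before converting so the cast is always in range. */
		if (v > 127.0f)
			v = 127.0f;
		if (v < -128.0f)
			v = -128.0f;
		/* Round half away from zero without needing libm. */
		qbuf[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
	}
}

/* Dequantize n int8 values back to float32: x = q * scale. */
static void toy_dequantize_i8_to_f32(const int8_t *qbuf, float *dbuf,
				     size_t n, float scale)
{
	assert(qbuf != NULL && dbuf != NULL && scale > 0.0f);
	for (size_t i = 0; i < n; i++)
		dbuf[i] = (float)qbuf[i] * scale;
}
```

In this toy case the quantized buffer needs n bytes while the dequantized buffer needs 4 * n bytes, which is why the two buffer sizes are reported separately. Values that are not exact multiples of the scale are recovered only approximately, and values beyond the representable range saturate.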