.. SPDX-License-Identifier: BSD-3-Clause Copyright (c) 2022 Marvell. Machine Learning Device Library =============================== The MLDEV library provides a Machine Learning device framework for the management and provisioning of hardware and software ML poll mode drivers, defining an API which support a number of ML operations including device handling and inference processing. The ML model creation and training is outside of the scope of this library. The ML framework is built on the following model: .. _figure_mldev_work_flow: .. figure:: img/mldev_flow.* Work flow of inference on MLDEV ML Device A hardware or software-based implementation of ML device API for running inferences using a pre-trained ML model. ML Model An ML model is an algorithm trained over a dataset. A model consists of procedure/algorithm and data/pattern required to make predictions on live data. Once the model is created and trained outside of the DPDK scope, the model can be loaded via ``rte_ml_model_load()`` and then start it using ``rte_ml_model_start()`` API function. The ``rte_ml_model_params_update()`` can be used to update the model parameters such as weights and bias without unloading the model using ``rte_ml_model_unload()``. ML Inference ML inference is the process of feeding data to the model via ``rte_ml_enqueue_burst()`` API function and use ``rte_ml_dequeue_burst()`` API function to get the calculated outputs / predictions from the started model. Design Principles ----------------- The MLDEV library follows the same basic principles as those used in DPDK's Ethernet Device framework and the Crypto framework. The MLDEV framework provides a generic Machine Learning device framework which supports both physical (hardware) and virtual (software) ML devices as well as an ML API to manage and configure ML devices. The API also supports performing ML inference operations through ML poll mode driver. Device Operations ----------------- Device Creation ~~~~~~~~~~~~~~~ Physical ML devices are discovered during the PCI probe/enumeration, through the EAL functions which are executed at DPDK initialization, based on their PCI device identifier, each unique PCI BDF (bus/bridge, device, function). ML physical devices, like other physical devices in DPDK can be allowed or blocked using the EAL command line options. Device Identification ~~~~~~~~~~~~~~~~~~~~~ Each device, whether virtual or physical is uniquely designated by two identifiers: - A unique device index used to designate the ML device in all functions exported by the MLDEV API. - A device name used to designate the ML device in console messages, for administration or debugging purposes. Device Features and Capabilities ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ML devices may support different feature set. In order to get the supported PMD feature ``rte_ml_dev_info_get()`` API which return the info of the device and its supported features. Device Configuration ~~~~~~~~~~~~~~~~~~~~ The configuration of each ML device includes the following operations: - Allocation of resources, including hardware resources if a physical device. - Resetting the device into a well-known default state. - Initialization of statistics counters. The ``rte_ml_dev_configure()`` API is used to configure a ML device. .. code-block:: c int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg); The ``rte_ml_dev_config`` structure is used to pass the configuration parameters for the ML device, for example number of queue pairs, maximum number of models, maximum size of model and so on. Configuration of Queue Pairs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Each ML device can be configured with number of queue pairs. Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()`` Logical Cores, Memory and Queues Pair Relationships ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Multiple logical cores should never share the same queue pair for enqueuing operations or dequeueing operations on the same ML device since this would require global locks and hinder performance. Configuration of Machine Learning models ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Pre-trained ML models that are built using external ML compiler / training frameworks are used to perform inference operations. These models are configured on an ML device in a two-stage process that includes loading the model on an ML device, and starting the model to accept inference operations. Inference operations can be queued for a model only when the model is in started state. Model load stage assigns a Model ID, which is unique for the model in a driver's context. Model ID is used during all subsequent slow-path and fast-path operations. Model loading and start is done through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions. Similarly stop and unloading are done through ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions. Stop and unload functions would release the resources allocated for the models. Inference tasks cannot be queued for a model that is stopped. Detailed information related to the model can be retrieved from the driver using the function ``rte_ml_model_info_get()``. Model information is accessible to the application through the ``rte_ml_model_info`` structure. Information available to the user would include the details related to the inputs and outputs, and the maximum batch size supported by the model. User can optionally update the model parameters such as weights and bias, without unloading the model, through the ``rte_ml_model_params_update()`` function. A model should be in stopped state to update the parameters. Model has to be started in order to enqueue inference requests after parameters update. Enqueue / Dequeue ~~~~~~~~~~~~~~~~~ The burst enqueue API uses a ML device identifier and a queue pair identifier to specify the device queue pair to schedule the processing on. The ``nb_ops`` parameter is the number of operations to process which are supplied in the ``ops`` array of ``rte_ml_op`` structures. The enqueue function returns the number of operations it enqueued for processing, a return value equal to ``nb_ops`` means that all packets have been enqueued. The dequeue API uses the same format as the enqueue API of processed but the ``nb_ops`` and ``ops`` parameters are now used to specify the max processed operations the user wishes to retrieve and the location in which to store them. The API call returns the actual number of processed operations returned; this can never be larger than ``nb_ops``. ``rte_ml_op`` provides the required information to the driver to queue an ML inference task. ML op specifies the model to be used and the number of batches to be executed in the inference task. Input and output buffer information is specified through the structure ``rte_ml_buff_seg``, which supports segmented data. Input is provided through the ``rte_ml_op::input`` and output through ``rte_ml_op::output``. Data pointed in each op, should not be released until the dequeue of that op. Quantize and Dequantize ~~~~~~~~~~~~~~~~~~~~~~~ Inference operations performed with lower precision types would improve the throughput and efficiency of the inference execution with a minimal loss of accuracy, which is within the tolerance limits. Quantization and dequantization is the process of converting data from a higher precision type to a lower precision type and vice-versa. ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()`` to enable data type conversions. User needs to provide the address of the quantized and dequantized data buffers to the functions, along the number of the batches in the buffers. For quantization, the dequantized data is assumed to be of the type ``dtype`` provided by the ``rte_ml_model_info::input`` and the data is converted to ``qtype`` provided by the ``rte_ml_model_info::input``. For dequantization, the quantized data is assumed to be of the type ``qtype`` provided by the ``rte_ml_model_info::output`` and the data is converted to ``dtype`` provided by the ``rte_ml_model_info::output``. Size of the buffers required for the input and output can be calculated using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``. These functions would get the buffer sizes for both quantized and dequantized data for the given number of batches.