55. TAP Poll Mode Driver
The TAP Poll Mode Driver (PMD) is a virtual device for injecting packets to be processed by the Linux kernel. This PMD is useful when writing a DPDK application that offloads network functionality (such as tunneling) from the kernel.
From the kernel point of view, the TAP device looks like a regular network interface.
The network device can be managed by standard tools such as the ip and ethtool commands.
It is also possible to use existing packet tools such as wireshark or tcpdump.
From the DPDK application, the TAP device looks like a DPDK ethdev. Packets are sent and received in L2 (Ethernet) format. The standard DPDK APIs to query for information and statistics and to send/receive packets work as expected.
55.1. Requirements
The TAP PMD requires kernel support for multiple queues in the TAP device as well as the multi-queue multiq and incoming ingress queue disciplines.
These are standard kernel features in most Linux distributions.
55.2. Arguments
TAP devices are created with the command line --vdev=net_tap0 option.
This option may be specified more than once by repeating it with a different net_tapX device.
By default, the Linux interfaces are named dtap0, dtap1, etc.
The interface name can be specified by adding the iface=foo0 argument, for example:
--vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...
Normally the PMD will generate a random MAC address.
If a static address is desired instead, the mac=fixed argument can be used:
--vdev=net_tap0,mac=fixed
With the fixed option, the MAC address has the form 02:’d’:’t’:’a’:’p’:[00-FF]: the first octets are fixed and the last octet is the interface number.
To specify a particular MAC address instead, pass it to mac in the conventional representation.
The string is converted byte by byte to hex; the result is a MAC address such as 02:64:74:61:70:11.
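The byte conversion described above can be illustrated with a minimal sketch. The helper name fixed_tap_mac is hypothetical and not part of DPDK; the sketch only reproduces the documented layout (0x02, then the ASCII codes of "dtap", then one final octet) and makes no claim about how the driver itself chooses the final octet.

```python
def fixed_tap_mac(last_octet: int) -> str:
    """Format 02:'d':'t':'a':'p':<last_octet> as a conventional MAC string.

    Hypothetical helper for illustration only; not a DPDK API.
    """
    if not 0 <= last_octet <= 0xFF:
        raise ValueError("last octet must fit in one byte")
    # 0x02 marks a locally administered unicast address; the ASCII codes
    # of "dtap" (0x64, 0x74, 0x61, 0x70) supply the next four octets.
    octets = bytes([0x02]) + b"dtap" + bytes([last_octet])
    return ":".join(f"{o:02x}" for o in octets)


print(fixed_tap_mac(0x11))  # prints 02:64:74:61:70:11
```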
It is possible to specify a remote netdevice to capture packets from by adding remote=foo1, for example:
--vdev=net_tap,iface=tap0,remote=foo1
If a remote is set, the tap MAC address will be set to match the remote one just after netdevice creation. Using TC rules, traffic from the remote netdevice will be redirected to the tap. If the tap is in promiscuous mode, then all packets will be redirected. In allmulti mode, all multicast packets will be redirected.
Using the remote feature is especially useful for capturing traffic from a netdevice that has no support in DPDK. It is possible to add explicit rte_flow rules on the tap PMD to capture specific traffic (see the Flow API support section for examples).
Normally, when the DPDK application exits, the TAP device is marked down and removed. This behavior can be overridden with the persist flag, for example:
--vdev=net_tap0,iface=tap0,persist ...
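When an application is launched from a script, the arguments described in this section can be assembled programmatically. The following is a minimal sketch; the function tap_vdev_arg is a hypothetical helper, not part of DPDK, and it only concatenates the iface, mac, remote and persist arguments named above without validating them.

```python
def tap_vdev_arg(index, iface=None, mac=None, remote=None, persist=False):
    """Build one --vdev=net_tapX,... argument (hypothetical helper)."""
    parts = [f"net_tap{index}"]
    if iface is not None:
        parts.append(f"iface={iface}")    # Linux interface name
    if mac is not None:
        parts.append(f"mac={mac}")        # "fixed" or an explicit address
    if remote is not None:
        parts.append(f"remote={remote}")  # netdevice to capture from
    if persist:
        parts.append("persist")           # keep the device after exit
    return "--vdev=" + ",".join(parts)


print(tap_vdev_arg(0, iface="tap0", persist=True))
print(tap_vdev_arg(1, iface="foo1", mac="fixed"))
```

Each returned string is a single command line argument, so several devices can be requested by calling the helper once per index.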
55.3. TUN devices
The TAP device can be used as an L3 tunnel-only device (TUN). This type of device does not include the Ethernet (L2) header; all packets are sent and received as IP packets.
TUN devices are created with the command line arguments --vdev=net_tunX, where X stands for a unique id, for example:
--vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...
Unlike the TAP PMD, the TUN PMD does not support user arguments such as MAC or remote. The default interface name is dtunX, where X stands for a unique id.
55.4. Flow API support
The TAP PMD supports major flow API pattern items and actions.
55.4.1. Requirements
Flow support in the TAP driver requires Linux kernel support for the flow-based traffic control filter flower.
This was added in the Linux 4.3 kernel.
The implementation of the RSS action uses an eBPF module that requires additional libraries and tools.
Building the RSS support requires the clang compiler to compile the C code to the BPF target; bpftool to convert the compiled BPF object to a header file; and libbpf to load the eBPF action into the kernel.
Supported match items:
eth: src and dst (with variable masks), and eth_type (0xffff mask).
vlan: vid, pcp, but not eid. (requires kernel 4.9)
ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
udp/tcp: src and dst port (0xffff mask).
Supported actions:
DROP
QUEUE
PASSTHRU
RSS
It is generally not possible to provide a “last” item. However, if the “last” item, once masked, is identical to the masked spec, then it is supported.
Only IPv4/6 and MAC addresses can use a variable mask. All other items need a full mask (exact match).
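The two rules above (the condition on “last”, and the requirement for a full mask outside IPv4/6 and MAC addresses) can be written as simple bitwise checks. This is a sketch for illustration only: the function names and the boolean allows_variable_mask parameter are hypothetical and do not mirror the driver's internal code.

```python
import ipaddress


def last_is_supported(spec: int, last: int, mask: int) -> bool:
    """A "last" value is acceptable only if it equals spec once masked."""
    return (spec & mask) == (last & mask)


def mask_is_supported(mask: int, width_bits: int, allows_variable_mask: bool) -> bool:
    """IPv4/6 and MAC fields may use any mask; other fields need all ones."""
    full = (1 << width_bits) - 1
    return allows_variable_mask or mask == full


# IPv4 destination with a /24 mask: spec and last fall in the same masked range.
spec = int(ipaddress.IPv4Address("192.0.2.1"))
last = int(ipaddress.IPv4Address("192.0.2.200"))
mask = int(ipaddress.IPv4Address("255.255.255.0"))
print(last_is_supported(spec, last, mask))           # True

# UDP port range 80-90 with a full mask: a real range, so not supported.
print(last_is_supported(80, 90, 0xFFFF))             # False

# A partial mask on a port is rejected; on an IPv4 address it is allowed.
print(mask_is_supported(0xFF00, 16, False))          # False
print(mask_is_supported(mask, 32, True))             # True
```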
As rules are translated to TC, it is possible to show them with something like:
tc -s filter show dev dtap1 parent 1:
55.4.2. Examples of testpmd flow rules
Drop packets for destination IP 192.0.2.1:
testpmd> flow create 0 priority 1 ingress pattern eth / ipv4 dst is 192.0.2.1 \
/ end actions drop / end
Ensure packets from a given MAC address are received on queue 2:
testpmd> flow create 0 priority 2 ingress pattern eth src is 06:05:04:03:02:01 \
/ end actions queue index 2 / end
Drop UDP packets in vlan 3:
testpmd> flow create 0 priority 3 ingress pattern eth / vlan vid is 3 / \
ipv4 proto is 17 / end actions drop / end
Distribute IPv4 TCP packets sent to a given MAC address over queues 0-3 using RSS:
testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
/ ipv4 / tcp / end actions rss queues 0 1 2 3 end / end
55.5. Multi-process sharing
It is possible to attach an existing TAP device in a secondary process, by declaring it as a vdev with the same name as in the primary process, and without any parameter.
The port attached in a secondary process will give access to the statistics and the queues. Therefore it can be used for monitoring or Rx/Tx processing.
The IPC synchronization of Rx/Tx queues is currently limited:
Maximum 8 queues shared
Synchronized on probing, but not on later port update
55.6. RSS specifics
Without flow rules, packet distribution in TAP is done by the kernel, which has a default flow-based distribution. When flow rules are used to distribute packets across a set of queues, an eBPF program is used to calculate the RSS hash based on the Toeplitz algorithm with the given key.
The hash is calculated for IPv4 and IPv6, over src/dst addresses (8 or 32 bytes for IPv4 or IPv6 respectively) and optionally the src/dst TCP/UDP ports (4 bytes).
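To make the input layout concrete, the following sketch computes a Toeplitz hash in plain Python over the tuple described above: source and destination addresses, optionally followed by source and destination ports. It is an illustration under stated assumptions, not the eBPF program used by the driver. The names toeplitz_hash, rss_input and EXAMPLE_KEY are hypothetical, the key is an arbitrary 40-byte placeholder rather than any default key, and the field order (source before destination) is assumed.

```python
import ipaddress
import struct

# Arbitrary 40-byte placeholder key (assumption); a real application
# would supply the key configured in its RSS action.
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # For input bit p, XOR in the 32-bit key window starting at bit p.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input(src, dst, sport=None, dport=None) -> bytes:
    """Concatenate addresses (8 or 32 bytes) and optional ports (4 bytes)."""
    data = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


l3 = rss_input("192.0.2.1", "192.0.2.2")
l4 = rss_input("192.0.2.1", "192.0.2.2", 1024, 80)
print(len(l3), len(l4))  # prints 8 12
print(hex(toeplitz_hash(EXAMPLE_KEY, l3)))
print(hex(toeplitz_hash(EXAMPLE_KEY, l4)))
```

Because the hash is an XOR of key windows selected by the set bits of the input, it is linear over XOR: hashing the XOR of two inputs of equal length gives the XOR of their hashes. This property is convenient for sanity-checking an implementation.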
55.7. Limitations
Since the TAP device uses a file descriptor to talk to the kernel, the same number of queues must be specified for receive and transmit.
The RSS algorithm only supports L3 or L4 functions. It does not support finer-grained selections (for example: only IPv6 packets with extension headers).