269. Performance-thread performance Tests

The Performance-Thread results are produced using l3fwd-thread application.

For more information about Performance Thread sameple applicaton please refer to link: http://doc.dpdk.org/guides/sample_app_ug/performance_thread.html

269.1. Prerequisites

  1. Hardware requirements:

    • For each CPU socket, each memory channel should be populated with at least 1x DIMM

    • Board is populated with 4x 1GbE or 10GbE ports. Special PCIe restrictions may be required for performance. For example, the following requirements should be met for Intel 82599 (Niantic) NICs:

      • NICs are plugged into PCIe Gen2 or Gen3 slots
      • For PCIe Gen2 slots, the number of lanes should be 8x or higher
      • A single port from each NIC should be used, so for 4x ports, 4x NICs should be used
    • NIC ports connected to traffic generator. It is assumed that the NIC ports P0, P1 (as identified by the DPDK application) are connected to the traffic generator ports TG0, TG1. The application-side port mask of NIC ports P0, P1 is noted as PORTMASK in this section.

  2. BIOS requirements:

    • Intel Hyper-Threading Technology is ENABLED
    • Hardware Prefetcher is DISABLED
    • Adjacent Cache Line Prefetch is DISABLED
    • Direct Cache Access is DISABLED
  3. Linux kernel requirements:

    • Linux kernel has the following features enabled: huge page support, UIO, HPET
    • Appropriate number of huge pages are reserved at kernel boot time
    • The IDs of the hardware threads (logical cores) per each CPU socket can be determined by parsing the file /proc/cpuinfo. The naming convention for the logical cores is: C{x.y.z} = hyper-thread z of physical core y of CPU socket x, with typical values of x = 0 .. 3, y = 0 .. 7, z = 0 .. 1. Logical cores C{0.0.1} and C{0.0.1} should be avoided while executing the test, as they are used by the Linux kernel for running regular processes.

4. The application options to be used in below test cases are listed as well as the general description.:

 ./build/l3fwd-thread [EAL options] -- \
     -p PORTMASK [-P]
     --rx(port,queue,lcore,thread)[,(port,queue,lcore,thread)]
     --tx(lcore,thread)[,(lcore,thread)]
     [--enable-jumbo] [--max-pkt-len PKTLEN]]
     [--no-lthreads]

For other options please refer to URL memtioned above for detail explanation.
  1. Traffic generator requirements

The flows need to be configured and started by the traffic generator:

Flow Traffic Gen. Port MAC Src. Address MAC Dst. Address IPV4 Src. Address IPV4 Dest. Address
1 TG0 Random MAC DUT Port0 Mac Random IP 2.1.1.1
2 TG1 Random Mac DUT port1 Mac Random IP 1.1.1.1
  1. Test results table.

Frame sizes should be configured from 64,128,256,512,1024,2000, etc

Frame Size S/C/T Throughput(Mpps) Line Rate(%)
64      
128      
256      
512      
1024      
2000      

269.2. Test Case: one_lcore_per_pcore performance

  1. Launch app with descriptor parameters:

    ./examples/performance-thread/l3fwd-thread/x86_64-native-linuxapp-gcc/l3fwd-thread \
     -c ff -n 4 -- -P -p 3 --enable-jumbo --max-pkt-len 2500  \
     --rx="(0,0,0,0)(1,0,0,0)" --tx="(1,0)" --no-lthread
    

    (Note: option “–stat-lcore” is not enabled in the automation scripts)

  2. Send traffic and verify performance.

  3. Repeat above tests with below command lines respectively

# Command Line
1
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(1,0,1,1) –tx=”(2,0)(3,1) –no-lthread
2
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,1,1)(1,0,2,2)(1,1,3,3)” –tx=”(4,0)(5,1)(6,2)(7,3)” –no-lthread
3
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,1,1)(0,2,2,2)(0,3,3,3)(1,0,4,4)(1,1,5,5)(1,2,6,6)(1,3,7,7)” –tx=”(8,0)(9,1)(10,2)(11,3)(12,4)(13,5)(14,6)(15,7)” –no-lthread
  1. Check test results and full out the above result table.

269.3. Test Case: n_lcore_per_pcore performance

  1. Launch app with descriptor parameters:

    ./examples/performance-thread/l3fwd-thread/x86_64-native-linuxapp-gcc/l3fwd-thread \
     --lcores="2,(0-1)@0" -- -P -p 3 --enable-jumbo --max-pkt-len 2500 \
     --rx="(0,0,0,0)(1,0,0,0)" --tx="(1,0)"
    

    (Note: option “–stat-lcore” is not enabled in the automation scripts)

  2. Send traffic and verify performance both directional and bi-directional

  3. Repeat above tests with below command lines respectively

# Command Line
1
./l3fwd-thread -n 4 –lcores=”(0-3)@0,4” – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(1,0,1,1) –tx=”(2,0)(3,1) –no-lthread
2
./l3fwd-thread -n 4 –lcores=”(0-7)@0,8” – -P -p 3-P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,1,1)(1,0,2,2)(1,1,3,3)” –tx=”(4,0)(5,1)(6,2)(7,3)” –no-lthread
3
./l3fwd-thread -n 4 –lcores=”(0-15)@0,16” – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,1,1)(0,2,2,2)(0,3,3,3)(1,0,4,4)(1,1,5,5)(1,2,6,6)(1,3,7,7)” –tx=”(8,0)(9,1)(10,2)(11,3)(12,4)(13,5)(14,6)(15,7)” –no-lthread
  1. Check test results and full out the above result table.

269.4. Test Case: n_lthread_per_pcore performance

  1. Launch app with descriptor parameters:

    ./examples/performance-thread/l3fwd-thread/x86_64-native-linuxapp-gcc/l3fwd-thread \
     -c ff -n 4 -- -P -p 3 --enable-jumbo --max-pkt-len 2500 \
     ----tx="(0,0)" --tx="(0,0)"
    

    (Note: option “–stat-lcore” is not enabled in the automation scripts)

  2. Send traffic and verify performance both directional and bi-directional

  3. Repeat above tests with below command lines respectively

# Command Line
1
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(1,0,1,1) –tx=”(0,0),(0,1)”
2
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,0,1)(1,0,0,2)(1,1,0,3)” –tx=”(0,0)(0,1)(0,2)(0,3)”
3
./l3fwd-thread -c ff -n 4 – -P -p 3 –enable-jumbo –max-pkt-len 2500
–rx=”(0,0,0,0)(0,1,0,1)(0,2,0,2)(0,3,0,3)(1,0,0,4)(1,1,0,5)(1,2,0,6)(1,3,0,7)” –tx=”(0,0)(0,1)(0,2)(0,3)(0,4)(0,5)(0,6)(0,7)”
  1. Check test results and full out the above result table.