Workgroup: Benchmarking Methodology Working Group
Internet-Draft: draft-ietf-bmwg-containerized-infra-03
Published: November 2024
Intended Status: Informational
Expires: 8 May 2025
Authors: N. Tran, Soongsil University
         S. Rao, The Linux Foundation
         J. Lee, Soongsil University
         Y. Kim, Soongsil University

Considerations for Benchmarking Network Performance in Containerized Infrastructures

Abstract

Recently, the Benchmarking Methodology Working Group has extended its laboratory characterization from Physical Network Functions (PNFs) to Virtual Network Functions (VNFs). As network function implementations move from virtual machine-based to container-based, the system configurations and deployment scenarios used for benchmarking are partially determined by how resource allocation and networking technologies are specified for containerized network functions. This document describes additional considerations for benchmarking network performance when network functions are containerized and run on general-purpose hardware.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 May 2025.

1. Introduction

The Benchmarking Methodology Working Group (BMWG) has recently expanded its benchmarking scope from Physical Network Functions (PNFs) running on dedicated hardware systems to Network Function Virtualization (NFV) infrastructure and Virtualized Network Functions (VNFs). [RFC8172] describes considerations for configuring NFV infrastructure and benchmarking metrics, and [RFC8204] gives guidelines for benchmarking the virtual switch that connects VNFs in the Open Platform for NFV (OPNFV).

Recently, NFV infrastructure has evolved to include a lightweight virtualized platform called containerized infrastructure. Most benchmarking methodologies and configuration parameters specified in [RFC8172] and [RFC8204] can be applied equally to benchmarking container networking. However, major architectural differences between virtual machine (VM)-based and container-based infrastructure introduce additional considerations.

In terms of virtualization method, Containerized Network Functions (CNFs) are virtualized using host Operating System (OS) virtualization rather than the hypervisor-based hardware virtualization used in VM-based infrastructure. Unlike VMs, containers do not have separate virtual hardware and kernels. CNFs share the same kernel space on the same host, while their resources are logically isolated in different namespaces. Hence, benchmarking container network performance might require different resource configuration settings.

In terms of networking, a Container Network Interface (CNI) plugin is required to route traffic between containers, which are isolated in different network namespaces. When a pod or container is first instantiated, it has no network. A container network plugin inserts a network interface into the isolated container network namespace and performs the other tasks necessary to connect the host and container network namespaces. It then allocates an IP address to the interface and configures routes consistent with the IP Address Management (IPAM) plugin. Different CNI plugins use different networking technologies to implement this connection. Based on the plugins' networking technologies, and on how packets are processed and/or accelerated via the kernel space and/or the user space of the host, these plugins can be categorized into different container networking models. These models should be considered when benchmarking container network performance.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Terminology

This document uses the terminology described in [RFC8172], [RFC8204], and [ETSI-TST-009].

In addition, given the proliferation and popularity of Kubernetes as a container orchestration platform, this document uses Kubernetes terminology for general containerized infrastructure.

A pod is the basic and smallest unit of orchestration and management; it can host multiple containers, with shared storage and network resources. Generally, each CNF is deployed as a container in a single pod. In this document, the terms container and pod are used interchangeably.

A Container Network Interface (CNI) plugin is the component that dynamically creates and configures networking for containers.

4. Scope

The primary scope of this document is to fill the gaps in BMWG's previous NFV benchmarking considerations ([RFC8172] and [RFC8204]) when they are applied to containerized NFV infrastructure. The first gap is the different network models/topologies configured by container network interfaces (especially the extended Berkeley Packet Filter model, which was not mentioned in the previous documents). The other gap is resource configuration for containers. This document investigates these gaps as additional benchmarking considerations for NFV infrastructure.

Note that, apart from these unique characteristics, the benchmarking tests and assessment methodologies defined in the above-mentioned RFCs can be applied equally to containerized infrastructure from a generic NFV point of view.

5. Benchmarking Considerations

5.1. Networking Models

Compared with VNF benchmarking, the selected CNI plugin is an important software detail parameter for containerized infrastructure benchmarking. Different CNI plugins configure different network architectures for CNFs in terms of network interfaces, virtual switch usage, and packet acceleration techniques. This section categorizes container networking models based on CNI plugin characteristics.

Note that the CNI plugins mentioned in each category are notable examples; other current CNI plugins fall into one of the categories described in this section.

To ensure the repeatability of the container network setup when benchmarking each networking model, Kubernetes is recommended as the container orchestration platform, because CNI plugins are available for all of the models mentioned in this document. Apart from installing the corresponding CNI plugin, underlay network configuration of the Device Under Test/System Under Test (DUT/SUT) might also be required, depending on the networking model type. Each networking model sub-section below mentions these details.

5.1.1. Normal non-Acceleration Networking Model

  +------------------------------------------------------------------+
  | User Space                                                       |
  | +-------------+                                +-------------+   |
  | |     CNF     |                                |     CNF     |   |
  | | +---------+ |                                | +---------+ |   |
  | | | fwd app | |                                | | fwd app | |   |
  | | +-----^---+ |                                | +-----^---+ |   |
  | |       |     |                                |       |     |   |
  | | +---- v---+ |                                | +-----v---+ |   |
  | | |  Linux  | |                                | |  Linux  | |   |
  | | | nw stack| |                                | | nw stack| |   |
  | | +-----^---+ |                                | +-----^---+ |   |
  | |       |     |                                |       |     |   |
  | |   +---v---+ |                                |   +---v---+ |   |
  | +---| veth  |-+                                +---| veth  |-+   |
  |     +---^---+                                      +---^---+     |
  |         |                                              |         |
  |         |     +----------------------------------+     |         |
  |         |     |                                  |     |         |
  |         |     |  Networking Controller / Agent   |     |         |
  |         |     |                                  |     |         |
  |         |     +-----------------^^---------------+     |         |
  ----------|-----------------------||---------------------|----------
  |     +---v---+                   ||                 +---v---+     |
  |  +--|  veth |-------------------vv-----------------|  veth |--+  |
  |  |  +-------+     Switching/Routing Component      +-------+  |  |
  |  |         (Kernel Routing Table, OVS Kernel Datapath,        |  |
  |  |         Linux Bridge, MACVLAN/IPVLAN sub-interfaces)       |  |
  |  |                                                            |  |
  |  +-------------------------------^----------------------------+  |
  |                                  |                               |
  | Kernel Space         +-----------v----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+

Figure 1: Example architecture of the Normal non-Acceleration Networking Model

Figure 1 shows the Normal non-Acceleration networking model. This networking model is normally deployed in general container infrastructure environments. A single CNI plugin is used to configure the container network. The CNF has a network namespace separate from the host. A virtual Ethernet (veth) interface pair is configured by the CNI plugin to create a tunnel connecting these two separate network namespaces. The veth on the CNF side attaches to the vanilla Linux network stack inside the CNF. The veth on the host side can attach to the switching/routing component in the host kernel space.

The kernel switching/routing component differs depending on the implemented CNI plugin. In the case of Calico, veth interfaces are attached point-to-point to the host namespace, and the kernel routing table is used for routing between containers. For Flannel, it is the Linux bridge. In the case of MACVLAN/IPVLAN, it is the corresponding virtual sub-interfaces. For dynamic networking configuration, forwarding policies can be pushed by the controller/agent located in user space. In the case of Open vSwitch (OVS) [OVS] configured with the kernel datapath, the first packet of a 'non-matching' flow can be sent to the user-space networking controller/agent (ovs-vswitchd) for a dynamic forwarding decision.

In general, data packets must be processed in the host kernel's network stack before being transferred to the CNF running in user space. Both pod-to-external and pod-to-pod traffic are processed in kernel space. In addition, data packets traverse the network stack inside the CNF between the application and the CNF's veth interface. This design makes networking performance worse than that of the other networking models, which utilize the packet acceleration techniques described in the sections below. Notable CNI plugins that follow this kernel-space model are listed below:

o Docker Network [Docker-network], Flannel Network [Flannel], Calico [Calico], OVS (Open vSwitch) [OVS], OVN (Open Virtual Network) [OVN], MACVLAN, IPVLAN

This Normal non-Acceleration networking model is the basic and default container networking model in general container deployment use cases. It can be set up by applying the corresponding YAML configuration file of the chosen CNI plugin to the containerized cluster.
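
As an illustration of this model, a minimal per-node CNI configuration using the reference bridge and host-local IPAM plugins is sketched below in the CNI's native JSON form; the network name, bridge name, and subnet are placeholder values. With CNIs such as Flannel or Calico, an equivalent per-node configuration under /etc/cni/net.d/ is generated automatically when their YAML manifests are applied.

   {
     "cniVersion": "0.3.1",
     "name": "benchmark-net",
     "type": "bridge",
     "bridge": "cni0",
     "isGateway": true,
     "ipMasq": true,
     "ipam": {
       "type": "host-local",
       "subnet": "10.244.0.0/16"
     }
   }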

5.1.2. Acceleration Networking Models

Acceleration networking models include different types of container networking models that apply different kinds of packet acceleration techniques to address the packet processing performance bottleneck of the kernel networking stack. Compared with the Normal non-Acceleration networking model, which requires only a single CNI plugin, acceleration networking models normally require at least two CNI plugins to configure two network interfaces for each CNF. In general, one default CNF network interface is configured by a normal non-acceleration CNI plugin for IP address management and control plane traffic such as CNF management. The other network interfaces are configured by CNI plugins that support packet acceleration techniques; these accelerated interfaces are used for high-performance application traffic transmission. Multus CNI is a special CNI plugin that is required to attach multiple network interfaces to a CNF. The default CNI plugin for IPAM and the chosen CNI plugin for packet acceleration should be defined in the Multus CNI configuration.
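
As a minimal sketch of this multi-interface setup (assuming Multus is installed and the cluster-default CNI handles IPAM and management traffic), the pod below requests one additional accelerated interface. The NetworkAttachmentDefinition name "accel-net" is a placeholder that would wrap the chosen acceleration CNI, and the image name is illustrative.

   # CNF pod with a secondary, accelerated interface attached via Multus.
   # eth0 is created by the cluster-default CNI; net1 is created by the
   # CNI referenced in the "accel-net" NetworkAttachmentDefinition.
   apiVersion: v1
   kind: Pod
   metadata:
     name: cnf-under-test
     annotations:
       k8s.v1.cni.cncf.io/networks: accel-net
   spec:
     containers:
     - name: fwd-app
       image: example/fwd-app:latest   # placeholder image
       command: ["sleep", "infinity"]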

The architectural and characteristic differences among acceleration networking models, caused by the different packet acceleration techniques they implement, are discussed below. For simplicity and clarity, the example architecture figures of the different acceleration networking models only illustrate the accelerated application datapath corresponding to the CNF network interface of the CNI plugin that enables packet acceleration. The CNF management datapath corresponding to the default CNI plugin is omitted.

5.1.2.1. User-space Acceleration Model
  +------------------------------------------------------------------+
  | User Space                                                       |
  |   +---------------+                          +---------------+   |
  |   |      CNF     +|-------+          +-------|+     CNF      |   |
  |   |              |default |          |default |              |   |
  |   |  +---------+ |CNI eth |          |CNI eth | +---------+  |   |
  |   |  | fwd app | +|-------+          +-------|+ | fwd app |  |   |
  |   |  +----|----+  |                          |  +----|----+  |   |
  |   | +-----|-----+ |    +-----------------+   | +-----|-----+ |   |
  |   | |  virtio   | |    |    Networking   |   | |  virtio   | |   |
  |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
  |     +-----^-----+      +-------^^--------+     +-----^-----+     |
  |           |                    ||                    |           |
  |           |                    ||                    |           |
  |     +-----v-----+              ||              +-----v-----+     |
  |     | vhost-user|              ||              | vhost-user|     |
  |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
  |  |  +-----------+                              +-----------+  |  |
  |  |             User space DPDK-supported vSwitch              |  |
  |  |                      +--------------+                      |  |
  |  +----------------------|      PMD     |----------------------+  |
  |                         |              |                         |
  |                         +-------^------+                         |
  ----------------------------------|---------------------------------
  |                                 |                                |
  |                                 |                                |
  |                                 |                                |
  | Kernel Space         +----------V-----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+

Figure 2: Example architecture of the User-Space Acceleration Model

Figure 2 shows the User-space Acceleration model, in which data packets from the physical network port bypass the network stack in kernel space and are delivered directly to the vSwitch running in user space. This model is commonly considered a Data Plane Acceleration (DPA) technology since it can achieve higher packet processing rates than kernel-space networking, whose packet throughput is limited. To create this user-space acceleration networking model, the user-space vSwitch is required to support the Data Plane Development Kit (DPDK) libraries. DPDK enables the user-space vSwitch to use Poll Mode Drivers (PMDs) to poll incoming packets from the NIC queues and transfer them directly to the user-space vSwitch.

The Userspace CNI [userspace-cni] is required to create interfaces for packet transfer between the user-space vSwitch and pods. This CNI plugin creates shared-memory interfaces that improve packet transfer performance between the vSwitch and pods. The two common shared-memory interface kinds are vhost-user and memif. In the case of vhost-user, the CNI plugin creates a virtio PMD in the pod and links it with a vhost-user port on the DPDK-based vSwitch. In the case of memif, the CNI plugin creates a memif PMD in the pod and links it with a memif port on the DPDK-based vSwitch.

User-space Acceleration models are listed below based on the currently available DPDK-based user-space vSwitches:

o OVS-DPDK [ovs-dpdk], VPP [vpp]

To set up the User-space Acceleration model, mapping between NIC ports, vSwitch ports, and pod interfaces is required. For packet transfer between the NIC and the vSwitch, the DPDK libraries and a DPDK-based user-space vSwitch need to be installed. Then, the NIC ports selected for the user-space acceleration network need to be bound to the vSwitch's DPDK PMD by using a DPDK-compatible driver such as VFIO or UIO. For packet transfer between the vSwitch and pods, vhost-user/memif ports need to be added to the vSwitch via its port configuration, and traffic routing paths between the NIC polling PMD ports and these vhost-user/memif ports should be configured in the vSwitch. Then, the Userspace CNI should be installed and configured to map the pods' virtio/memif interfaces to the vSwitch's vhost-user/memif ports.
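
A sketch of a Userspace CNI configuration for an OVS-DPDK vhost-user interface is shown below; it would typically be embedded in a Multus NetworkAttachmentDefinition. The field names and values, in particular the bridge name and subnet, are assumptions drawn from the Userspace CNI documentation and should be verified against [userspace-cni] for the installed version.

   {
     "cniVersion": "0.3.1",
     "type": "userspace",
     "name": "userspace-ovs-net",
     "host": {
       "engine": "ovs-dpdk",
       "iftype": "vhostuser",
       "netType": "bridge",
       "vhost": { "mode": "client" },
       "bridge": { "bridgeName": "br-dpdk0" }
     },
     "container": {
       "engine": "ovs-dpdk",
       "iftype": "vhostuser",
       "netType": "interface",
       "vhost": { "mode": "server" }
     },
     "ipam": {
       "type": "host-local",
       "subnet": "10.56.217.0/24"
     }
   }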

5.1.2.2. eBPF Acceleration Model
  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +----------------+                     +----------------+     |
  |    |       CNF      |                     |       CNF      |     |
  |    | +------------+ |                     | +------------+ |     |
  |    | |  fwd app   | |                     | |  fwd app   | |     |
  |    | +-----^------+ |                     | +------^-----+ |     |
  |    |       |        |                     |        |       |     |
  |    | +-----v------+ |                     | +------v-----+ |     |
  |    | |   Linux    | |                     | |    Linux   | |     |
  |    | |  nw stack  | |                     | |  nw stack  | |     |
  |    | +-----^------+ |                     | +------^-----+ |     |
  |    |       |        |                     |        |       |     |
  |    | +-----v------+ |                     | +------v-----+ |     |
  |    +-|    eth     |-+                     +-|     eth    |-+     |
  |      +-----^------+                         +------^-----+       |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |      +-----v-------+                        +-----v-------+      |
  |      |  +------+   |                        |  +------+   |      |
  |      |  | eBPF |   |                        |  | eBPF |   |      |
  |      |  +------+   |                        |  +------+   |      |
  |      | veth tc hook|                        | veth tc hook|      |
  |      +-----^-------+                        +------^------+      |
  |            |                                       |             |
  |            |   +-------------------------------+   |             |
  |            |   |                               |   |             |
  |            |   |       Networking Stack        |   |             |
  |            |   |                               |   |             |
  |            |   +-------------------------------+   |             |
  |      +-----v-------+                        +------v------+      |
  |      |  +------+   |                        |  +------+   |      |
  |      |  | eBPF |   |                        |  | eBPF |   |      |
  |      |  +------+   |                        |  +------+   |      |
  |      | veth tc hook|                        | veth tc hook|      |
  |      +-------------+                        +-------------+      |
  |      |     OR      |                        |     OR      |      |
  |    +-|-------------|------------------------|-------------|--+   |
  |    | +-------------+                        +-------------+  |   |
  |    | |  +------+   |                        |  +------+   |  |   |
  |    | |  | eBPF |   |         NIC Driver     |  | eBPF |   |  |   |
  |    | |  +------+   |                        |  +------+   |  |   |
  |    | |  XDP hook   |                        |  XDP hook   |  |   |
  |    | +-------------+                        +------------ +  |   |
  |    +---------------------------^-----------------------------+   |
  |                                |                                 |
  | Kernel Space          +--------v--------+                        |
  +-----------------------|       NIC       |------------------------+
                          +-----------------+
Figure 3: Example architecture of the eBPF Acceleration Model - non-AF_XDP
  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +-----------------+                    +-----------------+    |
  |    |       CNF      +|-------+    +-------|+      CNF       |    |
  |    |   +---------+  |default |    |default |  +---------+   |    |
  |    |   | fwd app |  |CNI veth|    |CNI veth|  | fwd app |   |    |
  |    |   +---|-----+  +|-------+    +-------|+  +----|----+   |    |
  |    | +-----|-------+ |  +--------------+  | +------|------+ |    |
  |    +-|  CNDP port  |-+  |   CNDP APIs  |  +-|  CNDP Port  |-+    |
  |      +-----^-------+    +--------------+    +------^------+      |
  |            |                                       |             |
  |      +-----v-------+                        +------v------+      |
  -------|    AF_XDP   |------------------------|    AF_XDP   |------|
  |      |    socket   |                        |    socket   |      |
  |      +-----^-------+                        +-----^-------+      |
  |            |                                       |             |
  |            |   +-------------------------------+   |             |
  |            |   |                               |   |             |
  |            |   |       Networking Stack        |   |             |
  |            |   |                               |   |             |
  |            |   +-------------------------------+   |             |
  |            |                                       |             |
  |    +-------|---------------------------------------|--------+    |
  |    | +-----|------+                           +----|-------+|    |
  |    | |  +--v---+  |                           |  +-v----+  ||    |
  |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  ||    |
  |    | |  +------+  |                           |  +------+  ||    |
  |    | |  XDP hook  |                           |  XDP hook  ||    |
  |    | +-----^------+                           +----^-------+|    |
  |    +-------|-------------------^-------------------|--------+    |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |            +---------+                   +---------+             |
  |               +------|-------------------|----------+            |
  |               | +----v-------+       +----v-------+ |            |
  |               | |   netdev   |       |   netdev   | |            |
  |               | |     OR     |       |     OR     | |            |
  |               | | sub/virtual|       | sub/virtual| |            |
  |               | |  function  |       |  function  | |            |
  | Kernel Space  | +------------+  NIC  +------------+ |            |
  +---------------|                                     |------------+
                  +-------------------------------------+

Figure 4: Example architecture of the eBPF Acceleration Model - using AF_XDP supported CNI
  +------------------------------------------------------------------+
  | User Space                                                       |
  |   +---------------+                          +---------------+   |
  |   |      CNF     +|-------+          +-------|+    CNF       |   |
  |   |  +---------+ |default |          |default | +---------+  |   |
  |   |  | fwd app | |CNI veth|          |CNI veth| | fwd app |  |   |
  |   |  +----|----+ +|-------+          +-------|+ +----|----+  |   |
  |   | +-----|-----+ |    +-----------------+   | +-----|-----+ |   |
  |   | |  virtio   | |    |    Networking   |   | |  virtio   |-|   |
  |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
  |     +-----^-----+      +-------^^--------+     +-----^-----+     |
  |           |                    ||                    |           |
  |           |                    ||                    |           |
  |     +-----v-----+              ||              +-----v-----+     |
  |     | vhost-user|              ||              | vhost-user|     |
  |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
  |  |  +-----^-----+                              +-----^-----+  |  |
  |  |        |    User space DPDK-supported vSwitch     |        |  |
  |  |  +-----v-----+                              +-----v-----+  |  |
  |  +--|AF_XDP PMD |------------------------------|AF_XDP PMD |--+  |
  |     +-----^-----+                              +-----^-----+     |
  |           |                                          |           |
  |     +-----v-----+                              +-----v-----+     |
  ------|   AF_XDP  |------------------------------|   AF_XDP  |-----|
  |     |   socket  |                              |   socket  |     |
  |     +-----^----+                               +-----^-----+     |
  |           |                                          |           |
  |           |    +-------------------------------+     |           |
  |           |    |                               |     |           |
  |           |    |       Networking Stack        |     |           |
  |           |    |                               |     |           |
  |           |    +-------------------------------+     |           |
  |           |                                          |           |
  |    +------|------------------------------------------|--------+  |
  |    | +----|-------+                           +------|-----+  |  |
  |    | |  +-v----+  |                           |  +---v--+  |  |  |
  |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  |  |  |
  |    | |  +------+  |                           |  +------+  |  |  |
  |    | |  XDP hook  |                           |  XDP hook  |  |  |
  |    | +------------+                           +------------+  |  |
  |    +----------------------------^-----------------------------+  |
  |                                 |                                |
  ----------------------------------|---------------------------------
  |                                 |                                |
  | Kernel Space         +----------v-----------+                    |
  +----------------------|          NIC         |--------------------+
                         +----------------------+
Figure 5: Example architecture of the eBPF Acceleration Model - using AF_XDP supported vSwitch

The eBPF Acceleration model leverages the extended Berkeley Packet Filter (eBPF) technology [eBPF] to achieve high-performance packet processing. eBPF enables the execution of sandboxed programs inside abstract virtual machines within the Linux kernel without changing the kernel source code or loading kernel modules. To accelerate data plane performance, eBPF programs are attached to different BPF hooks inside the Linux kernel stack.

eXpress Data Path (XDP) and Traffic Control Ingress/Egress (tc) are the eBPF hook types used by different eBPF acceleration CNI plugins. XDP is the hook in the NIC driver; it is the earliest point in the networking stack to which a BPF program can be attached. tc is the hook at the network interface on a container's incoming/outgoing packet path. An eBPF program is triggered to process a packet when the packet arrives at these locations.

On the egress datapath side, whenever a packet exits the pod, it first goes through the pod's veth interface. The packet's next destination then depends on the CNI plugin chosen to create the container network. If the chosen CNI plugin is a non-AF_XDP-based CNI, the packet is received by the eBPF program running at the veth interface's tc hook. If the chosen CNI plugin is an AF_XDP-supported CNI, the packet is received by an AF_XDP socket [AFXDP]. The AF_XDP socket is a Linux socket type that provides a short path between user space and the XDP hook at the networking driver. The eBPF program at the XDP hook redirects packets from the NIC to the AF_XDP socket instead of the kernel networking stack, and packets are transferred between user space and the AF_XDP socket via a shared memory buffer. Once the egress packet arrives at the AF_XDP socket or tc hook, it is forwarded directly to the NIC.

On the ingress datapath side, eBPF programs at the XDP hook/tc hook pick up packets from the NIC network devices (NIC ports). In the case of the AF_XDP CNI plugin [afxdp-cni], there are two operation modes: "primary" and "cdq". In "primary" mode, NIC physical functions (PFs) and virtual functions (VFs) can be allocated directly to pods, while in "cdq" mode, NIC network devices can be efficiently partitioned into subfunctions. From there, packets are delivered directly to the veth interface pair or to the AF_XDP socket (whether they pass through an AF_XDP socket depends on the chosen CNI plugin), bypassing kernel network-layer processing such as iptables. In the case of the Cilium CNI [Cilium], the context switch into the pod network namespace can also be bypassed.

Notable eBPF Acceleration models can be classified into the three categories below. Their corresponding model architectures are shown in Figure 3, Figure 4, and Figure 5, respectively.

o non-AF_XDP: eBPF supported CNI such as Calico [Calico], Cilium [Cilium]

o using AF_XDP supported CNI: AF_XDP K8s plugin [afxdp-cni] used by Cloud Native Data Plane project [CNDP]

o using AF_XDP supported vSwitch: OVS-DPDK [ovs-dpdk] and VPP [vpp] are the vSwitches that have AF_XDP device driver support. Userspace CNI [userspace-cni] is used to enable container networking via these vSwitches.

To set up these kinds of eBPF Acceleration networking models, the corresponding CNI plugins of each model kind need to be installed and configured to map the pod interfaces to the NIC ports. In the case of using a user-space vSwitch, the AF_XDP-supported version of the vSwitch needs to be installed, and the NIC ports can be bound to the vSwitch's AF_XDP PMD via the vSwitch's port configuration. Packet transfer between the pods and the vSwitch is then configured via the Userspace CNI.
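
For the AF_XDP-supported CNI case, a sketch of a NetworkAttachmentDefinition that ties the AF_XDP CNI to a device-plugin resource pool is shown below. The resource name "afxdp/benchmarkPool" and the fields inside the embedded CNI configuration are assumptions for illustration only; the exact schema and pool naming should be taken from the AF_XDP plugins for Kubernetes documentation [afxdp-cni].

   # Secondary network backed by AF_XDP; the device plugin advertises the
   # selected NIC netdevs/subfunctions as the "afxdp/benchmarkPool" resource.
   apiVersion: "k8s.cni.cncf.io/v1"
   kind: NetworkAttachmentDefinition
   metadata:
     name: afxdp-net
     annotations:
       k8s.v1.cni.cncf.io/resourceName: afxdp/benchmarkPool   # assumed pool name
   spec:
     config: '{
       "cniVersion": "0.3.1",
       "type": "afxdp",
       "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
     }'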

The container network performance of the Cilium project is reported by the project itself in [cilium-benchmark]. AF_XDP performance and its comparison against DPDK are reported in [intel-AFXDP] and [LPC18-DPDK-AFXDP], respectively.

5.1.2.3. Smart-NIC Acceleration Model
  +------------------------------------------------------------------+
  | User Space                                                       |
  |    +-----------------+                    +-----------------+    |
  |    |       CNF      +|-------+    +-------|+       CNF      |    |
  |    |   +---------+  |default |    |default |  +---------+   |    |
  |    |   | fwd app |  |CNI veth|    |CNI veth|  | fwd app |   |    |
  |    |   +---|-----+  +|-------+    +-------|+  +----|----+   |    |
  |    | +-----|-------+ |                    | +------|------+ |    |
  |    +-|  vf driver  |-+                    +-|  vf driver  |-+    |
  |      +-----^-------+                        +------^------+      |
  |            |                                       |             |
  -------------|---------------------------------------|--------------
  |            +---------+                   +---------+             |
  |               +------|-------------------|------+                |
  |               | +----v-----+       +-----v----+ |                |
  |               | | virtual  |       | virtual  | |                |
  |               | | function |  NIC  | function | |                |
  | Kernel Space  | +----^-----+       +-----^----+ |                |
  +---------------|      |                   |      |----------------+
                  | +----v-------------------v----+ |
                  | |   NIC Classify and Queue    | |
                  | +-----------------------------+ |
                  +---------------------------------+
Figure 6: Examples of Smart-NIC Acceleration Model

Figure 6 shows the Smart-NIC acceleration model, which utilizes packet acceleration support features of the Smart-NIC card.

Single Root I/O Virtualization (SR-IOV) is a packet acceleration technique supported by Smart-NIC cards from several vendors. SR-IOV is an extension of the PCIe specification that enables multiple partitions running simultaneously within a system to share PCIe devices. On the NIC, there are virtual replicas of PCI functions known as virtual functions (VFs), and each of them can be connected directly to a container's network interface. Using SR-IOV, data packets from the external network bypass both kernel and user space of the host and are forwarded directly to the container's virtual network interface. The SR-IOV network device plugin for Kubernetes [SR-IOV] is recommended for creating the special VF-driver-controlled interface in each container.

Smart-NICs from a few vendors also support the XDP eBPF packet acceleration technique. This keeps the same eBPF Acceleration networking model; the only difference is that the XDP eBPF program runs on the Smart-NIC card instead of on the host machine, which frees the host machine's CPU resources for application workloads.

To set up the SR-IOV acceleration container network, an SR-IOV capable NIC and BIOS support for creating the NIC's VFs are required. After VF creation, the VFs need to be bound to the appropriate driver: the kernel VF driver for CNFs that use the kernel network stack, or a DPDK-compatible driver such as VFIO for DPDK-based CNFs. Then, pods can be configured to use these VFs via the SR-IOV network device plugin configuration. In the case of the eBPF/XDP offloading container network, Cilium is the required CNI plugin, and its eBPF offloading feature should be properly configured.
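
A minimal sketch of the SR-IOV device plugin and CNI usage in Kubernetes is shown below. The resource name "intel.com/intel_sriov_netdevice" is an example that depends on the SR-IOV device plugin's resource configuration on the SUT, and the subnet and image are placeholders.

   # NetworkAttachmentDefinition that binds the SR-IOV CNI to a VF resource
   # pool advertised by the SR-IOV network device plugin.
   apiVersion: "k8s.cni.cncf.io/v1"
   kind: NetworkAttachmentDefinition
   metadata:
     name: sriov-net
     annotations:
       k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
   spec:
     config: '{
       "cniVersion": "0.3.1",
       "type": "sriov",
       "ipam": { "type": "host-local", "subnet": "10.56.218.0/24" }
     }'
   ---
   # CNF pod requesting one VF from that pool on its secondary interface.
   apiVersion: v1
   kind: Pod
   metadata:
     name: sriov-cnf
     annotations:
       k8s.v1.cni.cncf.io/networks: sriov-net
   spec:
     containers:
     - name: fwd-app
       image: example/fwd-app:latest   # placeholder image
       resources:
         requests:
           intel.com/intel_sriov_netdevice: "1"
         limits:
           intel.com/intel_sriov_netdevice: "1"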

It is worth noting that Smart-NIC features are also available in a new type of programmable networking device called the "any" Processing Unit (xPU), which can be a Data Processing Unit (DPU), Infrastructure Processing Unit (IPU), etc. An xPU can include Smart-NIC features, CPU cores, and acceleration engines, and can offload, accelerate, and isolate network and data processing tasks from the host machine's CPU resources. When using an xPU for the container network, control plane functions (e.g., Kubernetes kube-proxy) can be offloaded to this device, and the accelerated datapath between the CNF and the xPU data plane can be realized via different techniques such as SR-IOV or its enhanced Scalable IOV version. Currently, xPU-based container networking is new and has not been widely implemented or documented. An xPU-supported CNI plugin and an xPU network operator might be required to configure an xPU-based container network (e.g., [Bluefield]).

5.1.2.4. Model Combination
  +------------------------------------------------------------------+
  | User Space                                                       |
  | +--------------------+                    +--------------------+ |
  | |         CNF       +|-------+    +-------|+        CNF        | |
  | |     +---------+   |default |    |default |    +---------+    | |
  | |     | fwd app |   |CNI veth|    |CNI veth|    | fwd app |    | |
  | |     +-|-----|-+   +|-------+    +-------|+    +-|-----|-+    | |
  | | +-----|+  +-|----+ |                    | +-----|+  +-|----+ | |
  | +-|  vf  |--|virtio|-+                    +-|  vf  |--|virtio|-+ |
  |   |driver|  |/memif|                        |driver|  |/memif|   |
  |   +---^--+  +---^--+                        +--^---+  +---^--+   |
  |       |         |                              |          |      |
  |       |         |                              |          |      |
  |       |     +---v--------+             +-------v----+     |      |
  |       |     | vhost-user |             | vhost-user |     |      |
  |       |  +--|  / memif   |-------------|  / memif   |--+  |      |
  |       |  |  +------------+             +------------+  |  |      |
  |       |  |      User space DPDK-supported vSwitch      |  |      |
  |       |  +---------------------------------------------+  |      |
  |       |                                                   |      |
  --------|---------------------------------------------------|-------
  |       +-----------+                         +-------------+      |
  |              +----|-------------------------|---+                |
  |              |+---v--+                  +---v--+|                |
  |              ||  vf  |                  |  vf  ||                |
  |              |+------+                  +------+|                |
  | Kernel Space |                                  |                |
  +--------------|                NIC               |----------------+
                 +----------------------------------+
Figure 7: Examples of Model Combination deployment

Figure 7 shows the networking model that combines the User-space Acceleration model and the Smart-NIC Acceleration model. This model is frequently considered in service function chaining scenarios where two different types of traffic flows are present: North/South traffic and East/West traffic.

North/South traffic is traffic between the NIC and CNFs, while East/West traffic is traffic between CNFs inside the same host machine. An example of a packet acceleration technique combination is to use a user-space vSwitch model such as OVS-DPDK or VPP for East/West traffic and a Smart-NIC model such as SR-IOV for North/South traffic. The SR-IOV datapath can avoid a possible performance bottleneck in the North/South direction that might be caused by the user-space vSwitch, while the user-space vSwitch keeps East/West traffic transmission within user space only.

To set up this combined networking model, Multus CNI should be properly configured to enable the corresponding CNI plugin for each pod interface, as sketched below. For example, the interface for North/South traffic is configured using the SR-IOV network device plugin, while the interface for East/West traffic is configured using the Userspace CNI plugin.
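
A minimal sketch of the pod-side configuration is shown below, reusing the placeholder NetworkAttachmentDefinition names from the previous sub-sections ("sriov-net" and "userspace-ovs-net"); Multus attaches one interface per listed network in addition to the default CNI interface.

   # CNF pod combining an SR-IOV interface (North/South) with a
   # vhost-user/memif interface (East/West) via Multus.
   apiVersion: v1
   kind: Pod
   metadata:
     name: chained-cnf
     annotations:
       k8s.v1.cni.cncf.io/networks: sriov-net, userspace-ovs-net
   spec:
     containers:
     - name: fwd-app
       image: example/fwd-app:latest   # placeholder image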

The throughput advantages of these different networking models with different traffic direction cases are reported in [Intel-SRIOV-NFV].

5.2. Resources Configuration

The resource configuration considerations listed here apply not only to the CNF but also to the other components of a containerized System Under Test (SUT). A containerized SUT is composed of NICs, possible cables between hosts, the kernel and/or vSwitch, and CNFs.

5.2.1. CPU Isolation / NUMA Affinity

CPU pinning provides benefits such as maximizing cache utilization, eliminating operating-system thread scheduling overhead, and coordinating network I/O by guaranteeing resources. One example of CPU pinning technology in containerized infrastructure is the CPU Manager for Kubernetes (CMK) [CMK]. This technology was proven effective in avoiding the "noisy neighbor" problem, as shown in [Intel-EPA]. Moreover, the benefits of CPU isolation techniques are not limited to the "noisy neighbor" problem: different CNFs also neighbor each other, and they neighbor the vSwitch if one is used.

Non-Uniform Memory Access (NUMA) affects the speed at which different CPU cores access different memory regions. CPU cores can access the shared memory of their own NUMA node locally, which is faster than remotely accessing memory on a different NUMA node. In a container network, packet forwarding is processed through the NIC, the CNF, and possibly a vSwitch, depending on the chosen networking model. The NIC's NUMA node alignment can be checked via the PCI device's node affinity, while specific CPU cores can be assigned directly to the CNF and the vSwitch via their configuration settings. Network performance can change depending on whether the physical network interface, vSwitch, and CNF are attached to the same NUMA node. Benchmarking experience of cross-NUMA performance impacts is reported in [cross-NUMA-vineperf]. Those tests cover three scenarios depending on the location of the traffic generator and traffic endpoint, and the following was verified:

o The performance degradation caused by a single NUMA node serving multiple interfaces is worse than the degradation caused by crossing NUMA nodes.

o Performance is worse when a CNF shares CPUs across NUMA nodes.

Note that CPU pinning and NUMA affinity configuration considerations might also apply to VM-based VNFs. As mentioned above, dedicated CPU cores of a specific NUMA node can be assigned to the VNF and the vSwitch via their own configurations, the NIC's NUMA node can be checked from the PCI device information, and the host's NUMA nodes can be assigned to virtual machines by specifying the chosen nodes in the VM settings.

For this consideration, the additional configuration parameters that should be considered for containerized infrastructure benchmarking are listed below; an example Kubernetes configuration is sketched after the list:

- Selected CPU Isolation level

- NUMA cores allocation to pod
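
As a minimal sketch (assuming the kubelet on the SUT runs the static CPU Manager policy, and that NUMA alignment is handled by the Topology Manager, CMK, or an equivalent mechanism), a pod in the Guaranteed QoS class with integer CPU requests receives dedicated cores; the values and image below are illustrative.

   # Guaranteed QoS pod: requests equal limits and the CPU count is an
   # integer, so the static CPU Manager policy pins the container to
   # dedicated CPU cores.
   apiVersion: v1
   kind: Pod
   metadata:
     name: pinned-cnf
   spec:
     containers:
     - name: fwd-app
       image: example/fwd-app:latest   # placeholder image
       resources:
         requests:
           cpu: "4"
           memory: 2Gi
         limits:
           cpu: "4"
           memory: 2Gi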

5.2.2. Pod Hugepages

Hugepages configure a large memory page size to reduce the Translation Lookaside Buffer (TLB) miss rate and increase application performance. This improves the performance of the logical/virtual-to-physical address lookups performed by the CPU's memory management unit, and therefore overall system performance. In containerized infrastructure, the container is isolated at the application level, and administrators can set hugepages at a more granular level (e.g., Kubernetes allows containers to use 2 MB or 1 GB hugepages). Moreover, these pages are dedicated to the application and not shared with other processes, so the application can use them more efficiently. From a network benchmarking point of view, however, the impact on general packet processing can be relatively negligible, and it may be necessary to measure the impact at the application level as well. In the case of DPDK applications, hugepages were verified to improve network performance, as reported in [Intel-EPA], because packet handling runs inside the application.

For this consideration, the additional configuration parameters that should be considered for containerized infrastructure benchmarking are listed below; an example Kubernetes configuration is sketched after the list:

- Pod's hugepage size
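
A minimal sketch of a pod hugepage request in Kubernetes is shown below; the sizes and image are illustrative, and hugepage requests must be accompanied by cpu/memory requests. A DPDK-based CNF would consume the pages, for example, through the mounted hugetlbfs volume.

   # Pod requesting 1 GiB of 2 MiB hugepages for a DPDK-based CNF.
   apiVersion: v1
   kind: Pod
   metadata:
     name: hugepage-cnf
   spec:
     containers:
     - name: fwd-app
       image: example/fwd-app:latest   # placeholder image
       volumeMounts:
       - mountPath: /dev/hugepages
         name: hugepage
       resources:
         requests:
           hugepages-2Mi: 1Gi
           memory: 1Gi
           cpu: "2"
         limits:
           hugepages-2Mi: 1Gi
           memory: 1Gi
           cpu: "2"
     volumes:
     - name: hugepage
       emptyDir:
         medium: HugePages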

5.2.3. Pod CPU Cores and Memory Allocation

Different resource allocation choices may impact container network performance. These include different CPU core and RAM allocations to pods, and different CPU core allocations to the Poll Mode Driver and the vSwitch. Benchmarking experience from [ViNePERF], published in [GLOBECOM-21-benchmarking-kubernetes], verified that:

o 2 CPUs per pod is insufficient for all packet frame sizes. With large frame sizes (over 1024 bytes), increasing the CPUs per pod significantly increases throughput. Different RAM allocations to pods also cause different throughput results.

o Not assigning dedicated CPU cores to the DPDK PMD causes significant performance drops.

o Increasing the CPU core allocation to the OVS-DPDK vSwitch does not affect its performance. However, increasing the CPU core allocation to the VPP vSwitch results in better latency.

In addition, for the User-space Acceleration model, which uses PMDs to poll packets into the user-space vSwitch, assigning dedicated CPU cores to the PMD's Rx queues might improve network performance.

For this consideration, the additional configuration parameters that should be considered for containerized infrastructure benchmarking are:

- Pod's CPU cores allocation

- Pod's RAM allocation

5.2.4. AF_XDP Configuration

AF_XDP can operate in two packet polling modes: busy polling and non-busy polling. In busy polling mode, AF_XDP uses the same CPU cores for the application and for packet Rx/Tx processing, whereas non-busy polling mode uses different CPU cores for these two tasks. The chosen AF_XDP mode, and the CPU core configuration for the application and packet processing tasks in the non-busy polling case, might affect benchmarking performance [LPC18-DPDK-AFXDP].

For this consideration, the additional configuration parameters that should be considered for containerized infrastructure benchmarking are:

- AF_XDP busy polling mode

- Number of CPU cores allocated in AF_XDP non-busy polling mode

5.2.5. Service Function Chaining

When considering benchmarking for containerized and VM-based infrastructure and network functions, benchmarking scenarios may contain various operational use cases. Traditional black-box benchmarking focuses on measuring the in-out performance of packets from physical network ports, since the hardware is tightly coupled with its function and only a single function runs on its dedicated hardware. However, in the NFV environment, the physical network port is commonly connected to multiple CNFs (e.g., the multiple PVP test setup architectures described in [ETSI-TST-009]) rather than dedicated to a single CNF. This scenario is called Service Function Chaining. Therefore, benchmarking scenarios should reflect operational considerations such as the number of CNFs, or network services defined by a set of VNFs, in a single host. [service-density] proposed a way of measuring the performance of multiple NFV service instances at varied service density on a single host, which is one example of these operational benchmarking aspects. Another aspect that should be considered when benchmarking service function chaining scenarios is the use of different network acceleration technologies; network performance differences may occur because of the different traffic patterns produced by the provided acceleration method.

For this consideration, the additional configuration parameters that should be considered for containerized infrastructure benchmarking are:

- Number of CNFs/pod

- Selected CNI Plugin

5.2.6. Other Considerations

Apart from the single-host test scenario, the multi-host scenario should also be considered in container network benchmarking, where container services are deployed across different servers. To provide network connectivity for CNFs between different server nodes, inter-node networking is required. According to [ETSI-NFV-IFA-038], there are several technologies to enable inter-node networking: overlay technologies using a tunnel endpoint (e.g., VXLAN, IP-in-IP), routing using the Border Gateway Protocol (BGP), a layer 2 underlay, a direct network using a dedicated NIC for each pod, or a load balancer using the LoadBalancer service type in Kubernetes. The different protocols used by these technologies may cause performance differences in container networking.

6. IANA Considerations

This document does not require any IANA actions.

7. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization of a DUT/SUT using controlled stimuli in a laboratory environment with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis and relies solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

8. References

8.1. Informative References

[AFXDP]
"AF_XDP", <https://www.kernel.org/doc/html/v4.19/networking/af_xdp.html>.
[afxdp-cni]
"AF_XDP Plugins for Kubernetes", <https://github.com/intel/afxdp-plugins-for-kubernetes>.
[Bluefield]
Red Hat, "OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs", <https://access.redhat.com/articles/6804281>.
[Calico]
"Project Calico", <https://docs.projectcalico.org/>.
[Cilium]
"Cilium Documentation", <https://docs.cilium.io/en/stable/>.
[cilium-benchmark]
Cilium, "CNI Benchmark: Understanding Cilium Network Performance", <https://cilium.io/blog/2021/05/11/cni-benchmark>.
[CMK]
Intel, "CPU Manager for Kubernetes", <https://github.com/intel/CPU-Manager-for-Kubernetes>.
[CNDP]
"CNDP - Cloud Native Data Plane", <https://cndp.io/>.
[cross-NUMA-vineperf]
Anuket Project, "Cross-NUMA performance measurements with VSPERF", <https://wiki.anuket.io/display/HOME/Cross-NUMA+performance+measurements+with+VSPERF>.
[Docker-network]
"Docker, Libnetwork design", <https://github.com/docker/libnetwork/>.
[eBPF]
"eBPF, extended Berkeley Packet Filter", <https://www.iovisor.org/technology/ebpf>.
[ETSI-NFV-IFA-038]
ETSI, "Network Functions Virtualisation (NFV) Release 4; Architectural Framework; Report on network connectivity for container-based VNF".
[ETSI-TST-009]
ETSI, "Network Functions Virtualisation (NFV) Release 3; Testing; Specification of Networking Benchmarks and Measurement Methods for NFVI".
[Flannel]
"flannel 0.10.0 Documentation", <https://coreos.com/flannel/>.
[GLOBECOM-21-benchmarking-kubernetes]
Sridhar, R., Paganelli, F., and A. Morton, "Benchmarking Kubernetes Container-Networking for Telco Usecases".
[intel-AFXDP]
Karlsson, M., "AF_XDP Sockets: High Performance Networking for Cloud-Native Networking Technology Guide".
[Intel-EPA]
Intel, "Enhanced Platform Awareness in Kubernetes", <https://builders.intel.com/docs/networkbuilders/enhanced-platform-awareness-feature-brief.pdf>.
[Intel-SRIOV-NFV]
Patrick, K. and J. Brian, "SR-IOV for NFV Solutions Practical Considerations and Thoughts".
[LPC18-DPDK-AFXDP]
Karlsson, M. and B. Topel, "The Path to DPDK Speeds for AF_XDP".
[OVN]
"How to use Open Virtual Networking with Kubernetes", <https://github.com/ovn-org/ovn-kubernetes>.
[OVS]
"Open vSwitch", <https://www.openvswitch.org/>.
[ovs-dpdk]
"Open vSwitch with DPDK", <http://docs.openvswitch.org/en/latest/intro/install/dpdk/>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8172]
Morton, A., "Considerations for Benchmarking Virtual Network Functions and Their Infrastructure", RFC 8172, July 2017, <https://www.rfc-editor.org/rfc/rfc8172>.
[RFC8204]
Tahhan, M., O'Mahony, B., and A. Morton, "Benchmarking Virtual Switches in the Open Platform for NFV (OPNFV)", RFC 8204, September 2017, <https://www.rfc-editor.org/rfc/rfc8204>.
[service-density]
Konstantynowicz, M. and P. Mikus, "NFV Service Density Benchmarking", Work in Progress, <https://tools.ietf.org/html/draft-mkonstan-nf-service-density-00>.
[SR-IOV]
"SRIOV for Container-networking", <https://github.com/intel/sriov-cni>.
[userspace-cni]
Intel, "Userspace CNI Network Plugin", <https://github.com/intel/userspace-cni-network-plugin>.
[ViNePERF]
Anuket Project, "Virtual Network Performance for Telco NFV", <https://wiki.anuket.io/display/HOME/ViNePERF>.
[vpp]
"VPP with Containers", <https://fdio-vpp.readthedocs.io/en/latest/usecases/containers.html>.

Appendix A. Change Log (to be removed by RFC Editor before publication)

A.1. Since draft-ietf-bmwg-containerized-infra-01

Addressed comments and feedback from Maryam Tahhan, author of the related RFC 8204:

Updated general description for all accelerated networking models: using Multus CNI to enable multiple CNI configurations for hybrid container networking stack.

Updated illustration of the default CNI network interface in all accelerated networking models.

Updated illustration of the separated networking stack inside CNF in normal non-acceleration networking model.

Made minor corrections to the descriptions of the AF_XDP and eBPF datapaths.

Added mention of xPU networking devices in the Smart-NIC acceleration model.

Added AF_XDP busy/non-busy polling mode resource configuration consideration.

A.2. Since draft-ietf-bmwg-containerized-infra-00

Minor editorial changes and nits correction.

A.3. Since draft-dcn-bmwg-containerized-infra-13

Updated environment setup repeatability guidance for all mentioned container networking models.

A.4. Since draft-dcn-bmwg-containerized-infra-12

Updated scope to clearly specify the gaps of related RFCs.

A.5. Since draft-dcn-bmwg-containerized-infra-11

Merged Containerized infrastructure overview into Introduction section

Added Scope section which briefly explains the draft contribution in a clear way.

Mentioned the additional benchmarking configuration parameters for containerized infrastructure benchmarking in each Benchmarking Consideration sub-sections.

Removed Benchmarking Experiences Appendixes

A.6. Since draft-dcn-bmwg-containerized-infra-10

Updated Benchmarking Experience appendixes with latest results from Hackathon events.

Re-organized the Benchmarking Experience appendixes to match the proposed benchmarking considerations inside the draft (Networking Models and Resources Configuration)

Minor enhancements to the Introduction and Resources Configuration consideration sections, such as a general description of container network plugins and a note on which resource configurations can also apply to VM-based VNFs.

A.7. Since draft-dcn-bmwg-containerized-infra-09

Removed Additional Deployment Scenarios (Section 4.1 of version 09). We agreed with reviews from ViNePERF that the performance difference between with-VM and without-VM scenarios is negligible.

Removed Additional Configuration Parameters (Section 4.2 of version 09). We agreed with reviews from ViNePERF that these parameters are explained in the Performance Impacts/Resources Configuration section.

Following ViNePERF's suggestion to categorize the networking models based on how they accelerate network performance, renamed the titles of Sections 4.3.1 and 4.3.2 of version 09 from Kernel-space vSwitch model and User-space vSwitch model to Kernel-space non-Acceleration model and User-space Acceleration model, and updated the corresponding explanation of the Kernel-space non-Acceleration model.

ViNePERF suggested replacing the general architecture of the eBPF Acceleration model with three separate architectures for the three different eBPF Acceleration models: non-AF_XDP, using an AF_XDP-supported CNI, and using a user-space vSwitch that supports an AF_XDP PMD. Updated the corresponding explanation of the eBPF Acceleration model.

Renamed Performance Impacts section (section 4.4 of version 09) to Resources Configuration.

We agreed with ViNePERF reviews to add the "CPU Cores and Memory Allocation" consideration to the Resources Configuration section

A.8. Since draft-dcn-bmwg-containerized-infra-08

Added new Section 4. Benchmarking Considerations. Previous Section 4. Networking Models in Containerized Infrastructure was moved into this new Section 4 as a subsection

Re-organized the Additional Deployment Scenarios for containerized network benchmarking contents from Section 3. Containerized Infrastructure Overview to the new Section 4. Benchmarking Considerations as the Additional Deployment Scenarios subsection

Added new Additional Configuration Parameters subsection to the new Section 4. Benchmarking Considerations

Moved previous Section 5. Performance Impacts into new Section 4. Benchmarking Considerations as the Deployment settings impact on network performance section

Updated eBPF Acceleration Model with AF_XDP deployment option

Enhanced Abstract and Introduction's description about the draft's motivation and contribution.

A.9. Since draft-dcn-bmwg-containerized-infra-07

Added eBPF Acceleration Model in Section 4. Networking Models in Containerized Infrastructure

Added Model Combination in Section 4. Networking Models in Containerized Infrastructure

Added Service Function Chaining in Section 5. Performance Impacts

Added Troubleshooting and Results for SRIOV-DPDK Benchmarking Experience

A.10. Since draft-dcn-bmwg-containerized-infra-06

Added Benchmarking Experience of Multi-pod Test

A.11. Since draft-dcn-bmwg-containerized-infra-05

Removed Section 3. Benchmarking Considerations, Removed Section 4. Benchmarking Scenarios for the Containerized Infrastructure

Added new Section 3. Containerized Infrastructure Overview, Added new Section 4. Networking Models in Containerized Infrastructure. Added new Section 5. Performance Impacts

Re-organized Subsection Comparison with the VM-based Infrastructure of previous Section 3. Benchmarking Considerations and previous Section 4.Benchmarking Scenarios for the Containerized Infrastructure to new Section 3. Containerized Infrastructure Overview

Re-organized Subsection Container Networking Classification of previous Section 3. Benchmarking Considerations to new Section 4. Networking Models in Containerized Infrastructure. Kernel-space vSwitch models and User-space vSwitch models were presented as separate subsections in this new Section 4.

Re-organized Subsection Resources Considerations of previous Section 3. Benchmarking Considerations to new Section 5. Performance Impacts as 2 separate subsections CPU Isolation / NUMA Affinity and Hugepages. Previous Section 5. Additional Considerations was moved into this new Section 5 as the Additional Considerations subsection.

Moved Benchmarking Experience contents to Appendix

A.12. Since draft-dcn-bmwg-containerized-infra-04

Added Benchmarking Experience of SRIOV-DPDK.

A.13. Since draft-dcn-bmwg-containerized-infra-03

Added Benchmarking Experience of Contiv-VPP.

A.14. Since draft-dcn-bmwg-containerized-infra-02

Editorial changes only.

A.15. Since draft-dcn-bmwg-containerized-infra-01

Editorial changes only.

A.16. Since draft-dcn-bmwg-containerized-infra-00

Added Container Networking Classification in Section 3. Benchmarking Considerations (Kernel Space network model and User Space network model).

Added Resources Considerations in Section 3. Benchmarking Considerations (Hugepage, NUMA, RX/TX Multiple-Queue).

Renamed Section 4. Test Scenarios to Benchmarking Scenarios for the Containerized Infrastructure, and added 2 additional scenarios, BMP2VMP and VMP2VMP.

Added Additional Consideration as new Section 5.

Contributors

Kyoungjae Sun - ETRI - Republic of Korea

Email: [email protected]

Hyunsik Yang - Interdigital - United States of America

Email: [email protected]

Acknowledgments

The authors would like to thank Al Morton and Maryam Tahhan for their valuable comments and reviews for this work.

Authors' Addresses

Minh-Ngoc Tran
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea
Phone: +82 28200841
Sridhar Rao
The Linux Foundation
B801, Renaissance Temple Bells, Yeshwantpur
Bangalore 560022
India
Jangwon Lee
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea
Younghan Kim
Soongsil University
369, Sangdo-ro, Dongjak-gu
Seoul
06978
Republic of Korea