Internet-Draft | Computing Information Description in CAT | July 2024 |
Du, et al. | Expires 7 January 2025 |
This document describes the considerations and requirements for the computing information that needs to be notified to the network in Computing-Aware Traffic Steering (CATS).¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Computing-Aware Traffic Steering (CATS) is proposed to support steering the traffic among different service sites according to both the real-time network and computing resource status as mentioned in [I-D.ietf-cats-usecases-requirements]. It requires the network to be aware of computing resource information and select a service instance based on the joint metric of computing and networking.¶
To generate steering strategies, the computing capability needs to be modeled. Compared with network capability, computing capability is more complex to measure. For instance, it is hard to predict how long a specific computing task will take on different computing resources, because the processing time is hard to calculate and is influenced by the whole internal environment of the computing nodes. Nevertheless, some indicators have been used to describe the computing capability of hardware and computing services, as mentioned in Appendix A.¶
Based on related work and the demands of CATS traffic steering, this document analyzes the types of computing resources and tasks, and provides the factors to be considered when modeling and evaluating the computing resource capability. The detailed modeling of computing resources is out of the scope of this document.¶
This document makes use of the following terms:¶
Modeling provides a general method to evaluate the capabilities of computing resources. For CATS, modeling-based computing resource representation is the basis for subsequent traffic steering. In addition, different applications may refine the general modeling methods to establish a set of models that conform to their own characteristics, and so generate corresponding representation methods. Moreover, in order to use computing resource status more efficiently and to protect privacy, modeling for the further representation of resource information needs to support the necessary simplification and obfuscation. However, there are difficulties in modeling computing resources.¶
Heterogeneous computing resources have different characteristics. For example, CPUs usually handle serial processing and are the most widely used. GPUs usually handle parallel computing, such as rendering for display tasks, and are widely used in artificial intelligence and neural networks. FPGAs and ASICs are usually used to handle domain-specific computing tasks. These basic computing chips are built into different device types, for example, standard servers, AI servers, and all-in-one machines. These computing devices have multi-dimensional and hierarchical resources, such as cache, storage, and communication, and these dimensions affect each other and further affect the overall level of computing capability. Moreover, these computing resources may be further virtualized to provide on-demand cloud services, which makes the modeling even harder.¶
Modeling computing resources also depends on service types. For example, video services and distributed transaction systems may need higher computing capabilities measured in queries per second (QPS), while AI inference systems may need higher computing capabilities measured in tokens per second (TPS), and video or object recognition may need higher computing capabilities measured in frames per second (FPS). Computing capabilities have different meanings for different applications. Moreover, different computing tasks require different kinds of computation and precision, such as integer calculation, floating-point calculation, and hash calculation.¶
Computing resource modeling is used in two procedures. The first is service deployment, and the second is traffic steering, of which the latter is more related to the CATS work. However, service deployment is a precondition of CATS, because it enables the assumption that the service can be accessed in multiple places.¶
In the service deployment procedure, a control or management device, either in the CATS domain or in the computing domain, can collect the computing information and make the service deployment decisions. As this procedure is not strictly real time, the device can collect more information about the service points. Much existing work can be reused here, such as the mechanisms used in data centers.¶
In the traffic steering procedure, a limited set of metrics can be used to trigger a change of the policy for the service on the path, so that a quick response to changes in the computing status can be ensured.¶
With the modeling mechanism based on the CATS-defined format, the decision point can collect more information to support both service deployment and traffic steering. By contrast, the mechanism based on the application-defined method is more suitable for CATS, because only the necessary metrics need to be notified into the network, that is, the CATS domain. The detailed requirements of metric definition can be found in Section 5.¶
Figure 1 shows the case of modeling based on the CATS-defined format. CATS provides the modeling format to the computing domain, which uses it to evaluate its computing resource capability and returns the result through a unified interface that defines the properties to be notified to CATS. CATS can then select the specific service instance based on the computing resource and network resource status.¶
In this case, the CATS domain and the computing domain have a relatively loose boundary, because the CATS service and the computing resources belong to the same provider. CATS can be aware of the computing resources to a greater or lesser extent, depending on the privacy-preserving demands of the computing domain. The exposed computing capability includes the static information on the category or level of the computing node and the dynamic capability information of the computing node.¶
Based on the static information, visualization functions can be implemented on the management plane to provide a global view of computing resources, which can also help the deployment of applications by taking into account the overall distribution of computing and network resources. Based on the dynamic information, CATS can steer category-based application traffic using the unified modeling format and interface.¶
Figure 2 shows the case of modeling based on the application-defined method. The computing resource of a specific application evaluates its computing capability by itself and then notifies CATS of the result, which might be an index of the real-time computing level. CATS then selects the specific service instance based on the computing index.¶
In this case, the CATS domain and the computing domain have a strict boundary, because the CATS service and the computing resources belong to different providers. CATS is only aware of the computing resource index defined by the application and does not know the real status of the computing domain, so the traffic steering is potentially controlled by the application itself. If CATS is authorized by the application, it can also steer traffic based on the network status.¶
To support a computing service in CATS, the comprehensive service performance of a service instance needs to be evaluated. This performance is influenced by the coordination of chips, storage, network, platform software, etc. That is to say, the service support capabilities are influenced by multidimensional factors. After the capability values are generated, they are notified to the decision point in the network to influence the traffic steering. However, the decision point in the network, for example the Ingress node, only cares about how to use the capability values for traffic steering, not about how the capability values are generated.¶
From the perspective of the services, an evaluation system is needed to generate one or more capability values. To achieve the best load balancing (LB) result, different services or service types may evaluate the capability in different ways. However, this is out of the scope of this document.¶
From the perspective of the decision point in the network, it only needs to understand how to use the values and to implement the related policy. This document mainly discusses this aspect.¶
It is assumed that the same service can be provided in multiple places in CATS. It is common for different service instances to have different kinds of computing resources and different utilization rates of those resources.¶
In CATS, the decision point, which should be a node in the network, should be aware of the network status and the computing status, and accordingly choose a proper service point for the client.¶
A general process to steer the CATS traffic is described below. The CATS packets carry the service ID, which is announced by the different service points, as their destination address.¶
Firstly, the service points need to collect the specific computing information that needs to be sent into the network, following a uniform format so that the decision point can understand it. In this step, only the necessary computing information needs to be considered, so as to avoid exposing too much information about the service points.¶
Secondly, the service instances send the computing information into the network by some means, and update it periodically or on demand.¶
Thirdly, the decision point receives the computing information, and makes a decision for the specific service related to the service ID. Hence, the route for the service ID on the Ingress is established or updated.¶
Fourthly, the traffic for the service ID that reaches the Ingress node is identified and steered according to the policy established in the third step.¶
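The four steps above can be sketched as follows. This is a minimal illustration rather than a protocol definition: the `ComputingReport` fields, the `DecisionPoint` class, and the lowest-computing-delay policy are all assumptions introduced here and are not defined by this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComputingReport:
    """Hypothetical uniform format for the first and second steps."""
    service_id: str          # service ID (used as the destination address)
    service_point: str       # identifier of the reporting service point
    computing_delay_ms: int  # only the necessary information is exposed


@dataclass
class DecisionPoint:
    """Hypothetical decision point, e.g., located on the Ingress node."""
    reports: dict = field(default_factory=dict)  # service_id -> {point: report}
    routes: dict = field(default_factory=dict)   # service_id -> chosen point

    def receive(self, report: ComputingReport) -> None:
        # Third step: store the latest report and (re)compute the route.
        per_service = self.reports.setdefault(report.service_id, {})
        per_service[report.service_point] = report
        best = min(per_service.values(), key=lambda r: r.computing_delay_ms)
        self.routes[report.service_id] = best.service_point

    def steer(self, service_id: str) -> str:
        # Fourth step: traffic for the service ID follows the installed route.
        return self.routes[service_id]


dp = DecisionPoint()
dp.receive(ComputingReport("svc-1", "site-A", 20))
dp.receive(ComputingReport("svc-1", "site-B", 12))
dp.receive(ComputingReport("svc-1", "site-A", 8))  # a later update from site-A
chosen = dp.steer("svc-1")
```

In this sketch the route is recomputed on every received report; an implementation could equally batch updates or apply damping, which this document does not specify.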
In fact, what to send, how to send it, and the optimization objective of the policy are all related to the design of the computing resource modeling in CATS, and they influence each other. Some requirements are listed below.¶
The optimization objective of the policy in the decision point may vary. For example, it may be the lowest total latency, i.e., the sum of the network delay and the computing delay, or it may be a better overall load balancing result, in which case service points that can support more clients are preferred.¶
The update frequency of the computing metrics may vary. Some metrics are more dynamic, and some are relatively static.¶
The way the computing metrics are notified may vary. Depending on its update frequency, different ways may be chosen to update a metric.¶
Metric merging process should be supported when multiple service instances are behind the same Egress.¶
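This document does not define how metrics are merged. As one possible illustration, the sketch below assumes that an Egress advertises a single merged metric in which the capabilities of the instances add up and the computing delay is the capability-weighted average. The `InstanceMetric` type, the `merge_metrics` function, and the merging rule itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceMetric:
    computing_delay_ms: float  # average processing delay of the instance
    capability: int            # e.g., simultaneous sessions supported


def merge_metrics(instances):
    """Merge per-instance metrics into one metric for the Egress.

    Assumed rule (not defined by this document): capabilities add up,
    and the delay is the capability-weighted average.
    """
    if not instances:
        raise ValueError("no service instances behind this Egress")
    total_capability = sum(m.capability for m in instances)
    if total_capability == 0:
        # Fall back to a plain average when no capability is reported.
        delay = sum(m.computing_delay_ms for m in instances) / len(instances)
    else:
        delay = sum(m.computing_delay_ms * m.capability
                    for m in instances) / total_capability
    return InstanceMetric(delay, total_capability)


merged = merge_metrics([InstanceMetric(10.0, 100), InstanceMetric(20.0, 300)])
```

Other rules, such as advertising the minimum delay among the instances, are equally plausible; the choice depends on the optimization objective of the decision point.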
The target of CATS mainly concerns service point selection and traffic steering at Layer 3, which do not need all the computing information of the service points. Hence, the work on computing resource modeling in CATS can start with simple cases. Some design principles can be considered.¶
Simplicity: The computing metrics in CATS SHOULD be few and simple, so as to avoid exposing too much information of the service points.¶
Scalability: The computing metrics in CATS SHOULD be evolvable for future extensions.¶
Interoperability: The computing metrics in CATS SHOULD be vendor-independent, and OS-independent.¶
Stability: The computing metrics SHOULD NOT incur too much overhead in protocol design, and they can be stabilized for use.¶
Accuracy: The computing metrics SHOULD be effective for path selection decision making, and the accuracy SHOULD be guaranteed.¶
Various metrics can be considered in CATS, and perhaps different services would need different metrics. However, we can start with simple cases.¶
In CATS, a straightforward intent is to minimize the total delay in the network domain and the computing domain. Thus, a starting point for metric design in CATS is to consider only the delay information. In this case, the decision point can collect the network delay and the computing delay, and select the optimal service point accordingly. The advantage of this method is that it is simple and easy to start with; moreover, the network metric and the computing metric have the same unit of measure. The network delay can be the latency between the Ingress node and the Egress node in the network. The computing delay can be generated by the server and means “the estimate of the duration of my processing of request”. It is usually an average value for the service request. The optimization objective of traffic steering in this scenario is the minimal total delay for the client.¶
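The delay-only selection described above can be sketched as follows. The function name, the data layout, and the example values are illustrative assumptions; both delays are expressed in milliseconds so that they can be added directly.

```python
def select_min_total_delay(candidates):
    """Return the service point with the smallest network + computing delay.

    `candidates` maps a service point name to a tuple
    (network_delay_ms, computing_delay_ms).
    """
    if not candidates:
        raise ValueError("no candidate service points")
    return min(candidates, key=lambda name: sum(candidates[name]))


candidates = {
    "site-A": (5, 30),   # close to the Ingress, but slow to process
    "site-B": (12, 10),  # farther away, but faster to process
    "site-C": (25, 4),
}
best = select_min_total_delay(candidates)
```

Here neither the closest site nor the fastest site is selected, because only the sum of the two delays matters to the client.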
Another metric that can be considered is the server capability. For example, one server can support 100 simultaneous sessions and another can support 10,000 simultaneous sessions. The value can be generated by the server when the service instance is deployed. The metric can work alone. In this scenario, the decision point can perform load balancing according to the server capability. For example, the decision process can be load balancing after pruning the service points with poor network latency metrics. The metric can also work together with the computing delay metric. For example, in this scenario, the service points with poor total latency metrics can be pruned before the load balancing.¶
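The prune-then-balance process described above can be sketched as follows. The latency threshold, the proportional weighting rule, and all names are assumptions made for illustration; the latency value may be either the network latency or the total latency, depending on which of the two scenarios is used.

```python
def capability_weights(points, max_latency_ms):
    """Prune by latency, then weight the survivors by capability.

    `points` maps a name to (latency_ms, capability).
    Returns a dict of name -> share of new sessions (shares sum to 1.0).
    """
    kept = {name: cap for name, (lat, cap) in points.items()
            if lat <= max_latency_ms}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {name: cap / total for name, cap in kept.items()}


points = {
    "site-A": (10, 100),
    "site-B": (15, 300),
    "site-C": (80, 10000),  # large, but pruned because of poor latency
}
weights = capability_weights(points, max_latency_ms=50)
```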
In the future, other metrics, which may be more dynamic, can also be considered. Besides, for other optimization objectives, other metrics can be considered, even metrics about energy consumption. However, in these cases, the decision point needs to consider more dimensions of metrics. A suggestion is to first make sure that the service point is available, which means that it can still accept more sessions, and then to select an optimal target service point according to the optimization objective.¶
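The suggested two-stage selection (availability first, then the optimization objective) can be sketched as follows. The field names, such as `active_sessions` and `energy_per_request_j`, are hypothetical and are used only to show that the same filter can be combined with different objectives.

```python
def select(points, objective):
    """First keep the available points, then optimize the given objective.

    `points` is a list of dicts; `objective` maps a point to a cost
    (lower is better). Returns None if no point is available.
    """
    available = [p for p in points
                 if p["active_sessions"] < p["max_sessions"]]
    if not available:
        return None
    return min(available, key=objective)


points = [
    {"name": "site-A", "active_sessions": 100, "max_sessions": 100,
     "total_delay_ms": 10, "energy_per_request_j": 2.0},
    {"name": "site-B", "active_sessions": 40, "max_sessions": 100,
     "total_delay_ms": 25, "energy_per_request_j": 0.5},
    {"name": "site-C", "active_sessions": 10, "max_sessions": 50,
     "total_delay_ms": 18, "energy_per_request_j": 1.5},
]
by_delay = select(points, lambda p: p["total_delay_ms"])
by_energy = select(points, lambda p: p["energy_per_request_j"])
```

In this example, site-A has the lowest delay but cannot accept more sessions, so it is excluded before either objective is evaluated.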
To enable basic cooperation in CATS, one default computing metric or a set of default computing metrics needs to be notified into the network. All CATS Ingresses need to understand the default metrics and trigger the same or similar operations inside the router, i.e., the default policies. The detailed procedures inside the Ingresses are vendor-specific.¶
By comparison, other metrics are optional, although they may achieve a better or more preferred LB result than the default ones. If the Ingress receives additional metrics and understands them, it can use them to update the default forwarding policy for the routes of the anycast IP.¶
There are two kinds of forwarding treatments on the Ingress. Although they are implementations inside the equipment, a general description of them is given here, because they are related to the default metric selection.¶
The first one is that the Ingress deploys several routes for the anycast IP, but only one of them is active, and the others are set to inactive and kept as backups. The second one is that the Ingress can have multiple active routes for the anycast IP, and each route has a dedicated weight, so that load balancing can be done within the Ingress.¶
The advantage of the first one is that it can select a best service instance for the client according to the network and computing status. However, its disadvantage is that the Ingress will forward all the new clients to a single service point before the policy is updated, which will potentially cause the service point to become busy. For the second one, it may achieve a better LB result.¶
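The two forwarding treatments can be contrasted with the following sketch. It is not a description of any router implementation: the cost values, the weights, and the per-flow random selection are assumptions chosen to make the difference visible.

```python
import random


def active_backup(routes):
    """Treatment 1: only the best route is active; the rest are backups.

    `routes` maps a next hop (Egress) to a cost, lower is better.
    """
    ordered = sorted(routes, key=routes.get)
    return {"active": ordered[0], "backup": ordered[1:]}


def weighted_pick(weights, rng):
    """Treatment 2: all routes are active; pick one per new flow by weight."""
    hops = list(weights)
    return rng.choices(hops, weights=[weights[h] for h in hops], k=1)[0]


costs = {"egress-1": 22, "egress-2": 35, "egress-3": 29}
table = active_backup(costs)

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
weights = {"egress-1": 3, "egress-2": 1}
picks = [weighted_pick(weights, rng) for _ in range(1000)]
share = picks.count("egress-1") / len(picks)  # expected to be close to 0.75
```

With the first treatment, every new client goes to `egress-1` until the policy is updated; with the second, new clients are spread across the active routes in proportion to the weights.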
An initial proposal for the default metrics of the default policies is to always send the two metrics described earlier, i.e., the computing delay and the server capability. At least one of them should be valid. A computing delay or server capability field whose bits are all set to zero is considered invalid, and other values are considered valid. Meanwhile, a computing delay or server capability field whose bits are all set to one indicates that the service point is temporarily busy, and the Ingress should not send new clients to that service point. Alternatively, another simple metric can be added to indicate whether the service point is busy. However, this metric is relatively more dynamic than the former two.¶
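The all-zero and all-one conventions can be sketched as follows. The 16-bit field width is an assumption, since this document does not define the size of the metric fields, and the function names are hypothetical.

```python
FIELD_BITS = 16                  # assumed width, not defined by this document
ALL_ZERO = 0
ALL_ONE = (1 << FIELD_BITS) - 1


def classify(value):
    """Classify one metric field as 'invalid', 'busy', or 'valid'."""
    if not 0 <= value <= ALL_ONE:
        raise ValueError("value does not fit in the field")
    if value == ALL_ZERO:
        return "invalid"
    if value == ALL_ONE:
        return "busy"
    return "valid"


def usable(computing_delay, server_capability):
    """Return True if new clients may be sent to the service point."""
    states = {classify(computing_delay), classify(server_capability)}
    if "busy" in states:
        return False          # temporarily busy: no new clients
    return "valid" in states  # at least one metric must be valid
```

For example, `usable(25, 0)` accepts a service point that reports only a computing delay, while `usable(0xFFFF, 100)` rejects a service point that signals that it is busy.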
The modeling of the network resource is optional and depends on how the service instance and the network path are selected. For applications that care about both network and computing resources, the CATS service provider also needs to consider the modeling of network and computing together.¶
The network structure can be represented as a graph, where the nodes represent the network devices and the edges represent the network links. The modeling should evaluate the individual nodes, the network links, and the end-to-end (E2E) performance.¶
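As an illustration of the graph representation, the sketch below models the network as a dictionary of nodes with per-link latencies and derives an E2E latency with Dijkstra's algorithm. The topology, the node names, and the use of latency as the only link attribute are assumptions made for this example.

```python
import heapq


def e2e_latency(graph, src, dst):
    """Return the lowest end-to-end latency from src to dst (Dijkstra).

    `graph` maps a node to {neighbor: link_latency_ms}.
    Returns None if dst is unreachable.
    """
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dst:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, latency in graph.get(node, {}).items():
            candidate = dist + latency
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


graph = {
    "ingress": {"r1": 2, "r2": 5},
    "r1": {"r2": 1, "egress": 9},
    "r2": {"egress": 3},
    "egress": {},
}
latency = e2e_latency(graph, "ingress", "egress")
```

Per-node attributes, such as the forwarding delay of a device, could be added to the same structure if the evaluation of individual nodes is required.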
When both the computing and network status are considered at the same time, a comprehensive model of computing and network might be used, for example, one that measures all the resources in a unified dimension, such as latency or reliability.¶
If there is no strict demand to consider them at the same time, the computing status can be considered first and then the network status. For instance, CATS could select the service instance first and then mark an identifier for the network's own path selection. In this situation, network modeling is less necessary. Existing mechanisms on the control plane or the management plane of the network can be used to obtain the network metrics.¶
An application always has its own demands for network and computing resources. For instance, HD video requires high bandwidth, and a PC game requires a better GPU and more memory. The application is identified in the network by its Service Identifier, which can indicate its demands to a certain degree.¶
The modeling of the application demand is optional and depends on whether the application can tell its demands to the network, and on what it can tell. Once CATS knows the application's demands, there should be a mapping between the application demand and the modeling of the computing and/or network resources.¶
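One possible shape of such a mapping is sketched below. The demand profiles, the resource attribute names, and the threshold values are all hypothetical; this document does not define how application demands are expressed.

```python
# Hypothetical demand profiles, keyed by Service Identifier.
DEMANDS = {
    "hd-video": {"bandwidth_mbps": 25},
    "pc-game": {"gpu_memory_gb": 8, "memory_gb": 16},
}


def satisfies(service_id, resources):
    """Return True if `resources` meets every demand of the service."""
    demand = DEMANDS.get(service_id, {})
    return all(resources.get(key, 0) >= minimum
               for key, minimum in demand.items())


site = {"bandwidth_mbps": 100, "gpu_memory_gb": 4, "memory_gb": 32}
video_ok = satisfies("hd-video", site)
game_ok = satisfies("pc-game", site)
```

In this sketch, a service with no known demand profile is treated as satisfied by any service point, which is one possible default among several.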
TBD.¶
The authors would like to thank Adrian Farrel, Joel Halpern, Tony Li, Thomas Fossati, Dirk Trossen, and Linda Dunbar for their valuable suggestions on this document.¶
The following people have substantially contributed to this document:¶
Yuexia Fu, China Mobile, [email protected]
Jing Wang, China Mobile, [email protected]
Peng Liu, China Mobile, [email protected]
Wenjing Li, Beijing University of Posts and Telecommunications, [email protected]
Lanlan Rui, Beijing University of Posts and Telecommunications, [email protected]¶
Some related work has been proposed to measure and evaluate the computing capability, which could be the basis of computing capability modeling.¶
[cloud-network-edge] proposed to allocate and adjust corresponding resources to users according to the demands of computing, storage and network resources.¶
[heterogeneous-multicore-architectures] proposed to design heterogeneous multi-core architectures according to different customizations, such as CPU microprocessors with ultra-low power consumption and high code density, low-power microprocessors with an FPU, and high-performance application processors with FPU and MMU support based on a fully out-of-order, multi-issue architecture.¶
[ARM-based] proposed the cluster scheduling model that is combined with GPU virtualization and designed a hierarchical cluster resource management framework, which can make the heterogeneous CPU-GPU cluster be effectively used.¶
Hardware and cloud service providers have also disclosed their parameter indicators for computing services:¶
[One-api] provides a collection of programming languages and cross-architecture libraries to be compatible with heterogeneous computing resources, including CPU, GPU, FPGA, and others. [Amazon] uses computing resource parameters when evaluating performance, including the average CPU utilization, the average number of bytes received and sent out, and average application load balancer. Alibaba Cloud [Aliyun] gives indicators including vCPU, memory, local storage, basic and burst network bandwidth capacity, network packet receiving and sending capability, etc., when providing cloud server services. [Tencent-cloud] uses vCPU, memory (GB), network receiving and sending (PPS), number of queues, intranet bandwidth capacity (Gbps), clock frequency, etc.¶