Internet-Draft | Network Management Agent Concept | October 2024 |
Zhao, et al. | Expires 23 April 2025 | [Page] |
With the development of Artificial Intelligence (AI) technology, large models have shown significant advantages and great potential in recognition, understanding, decision-making, and generation. These capabilities match the intelligent network management requirements of Autonomous Networks [TMF-IG1230] and Intent-Based Networking [RFC9315], making large models one of the potential technologies for driving high-level autonomous networks. When introducing AI into network management, how to integrate AI technology and how to define its relationship with existing network management entities (such as network controllers) are the focus of research and standardization.¶
This document presents the concept of an AI-based network management agent (NMA), provides the basic definition and reference architecture of the NMA, discusses the relationship between the NMA and traditional network controllers or other network management entities by exploring the deployment modes of the NMA, and proposes the common processing flow and typical application scenarios of the NMA.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this document takes place on the Network Management Operations Working Group mailing list ([email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/.¶
Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-nmop/draft-ietf-nmop-digital-map-concept.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 23 April 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
As the types of operator services become increasingly diverse, the complexity and difficulty of network operations and maintenance continue to grow. On one hand, new service scenarios such as industrial internet, vehicle-road collaboration, and 5GtoB for vertical industries are constantly emerging, and customer services like Extended Reality (XR), Virtual Reality (VR), and smart home are becoming more abundant, with a continuous increase in network access volume. On the other hand, with the popularization of 5G and gigabit optical networks, operators face a situation where networks from 2G to 5G coexist. Network protocols and characteristics vary across network domains, which further increases the difficulty and complexity of network operations and maintenance. Relying solely on traditional manual methods can no longer meet these demands. The level of network intelligence has become a key factor directly affecting network performance and user experience. Against this backdrop, enhancing the level of network intelligence and creating Autonomous Networks (AN) [TMF-IG1230] have become a global consensus among operators, with mainstream operators releasing goals and plans to achieve Level 4 (L4) autonomous networks by 2025.¶
L4+ AN sets higher requirements on intent, decision-making, analysis, perception, and execution. Artificial Intelligence (AI) large model technology has shown significant advantages and great potential in identification, understanding, decision-making, and generation. Its technical features include multimodal fusion perception, more user-friendly human-computer interaction and knowledge Q&A, and content generation. These features match the new requirements of Level 4 Autonomous Networks well, and large model technology has already become one of the core technologies for achieving high-level autonomous networks.¶
After AI is introduced, several key issues remain unclear and need to be discussed: how AI is applied and deployed, how it relates to the existing network management and control systems, and in what form it can assist network management.¶
Therefore, this document proposes the concept of an AI-based network management agent (NMA), defines the reference architecture of the NMA, and discusses the application modes, general task processing flow, interface requirements, and typical application scenarios of the NMA.¶
AI: Artificial Intelligence¶
LLM: Large Language Model¶
NMA: Network Management Agent; in this document, an AI-based network management agent¶
The document defines the following terms:¶
Network Management Agent (NMA): A network management entity with autonomous task processing capabilities, which is built on AI algorithms or AI models and has task intent [RFC9315] perception, planning, decision-making, and execution capabilities. It can understand the input operation intent through an AI model, call other functional components of the control system or external interfaces to complete task processing, and return the processing results. For different application scenarios, the NMA can be subdivided into multiple scenario-oriented agents.¶
NMA instance: An instantiated agent application that can automatically perform certain network management tasks for a specific network management scenario. For different application scenarios, there can be multiple scenario-oriented agent instances (much like apps on a phone), each of which is called an “NMA instance” for short in this document.¶
The network management agent (NMA) is a new network management entity with autonomous task processing capabilities, which is encapsulated based on AI algorithms or AI models, and has task intent perception, task planning, decision-making, and execution capabilities. It can understand the input operation intent through AI models, call other functional components of the control system or external interfaces to complete task processing, and return processing results automatically.¶
Based on traditional AI agent concepts and frameworks, this document presents the architecture of an AI agent for network management, as shown in Figure 1.¶
The NMA consists of two main layers. The specific components and functions of each layer are as follows:¶
The Base Layer includes AI-based basic services and a knowledge and memory subsystem.¶
AI-based basic services: Provide a unified intelligent agent engine framework, build common capabilities for interactive intelligence, and simplify application development. They integrate Large Language Models (LLMs), knowledge retrieval, API invocation, and so on, to orchestrate the full process from intent understanding and task planning to tool invocation and task execution.¶
Knowledge and memory subsystem: Provides unified search across multiple types of local knowledge bases (vector knowledge base, system online help, operation and maintenance data logs), and combines this with an LLM to complete knowledge fusion and extraction, which improves the accuracy of downstream tasks (knowledge Q&A, task planning, etc.). It realizes knowledge injection and integrated retrieval. The knowledge base and knowledge retrieval capabilities can be deployed inside or outside the NMA according to actual needs.¶
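The unified search over several knowledge bases described above can be illustrated with a minimal Python sketch. It is not part of the NMA definition: the names `Passage`, `tokenize`, and `unified_search` are hypothetical, the sample knowledge base contents are invented for illustration, and simple word overlap is used as a stand-in for vector retrieval and LLM-based knowledge fusion.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # name of the knowledge base the text came from
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation removed.

    Assumption: word overlap stands in for embedding similarity.
    """
    words = (w.strip(".,:;()").lower() for w in text.split())
    return {w for w in words if w}


def unified_search(query: str,
                   knowledge_bases: dict[str, list[str]],
                   top_k: int = 2) -> list[Passage]:
    """Rank passages from every knowledge base by word overlap with the query."""
    query_words = tokenize(query)
    scored = []
    for source, passages in knowledge_bases.items():
        for text in passages:
            overlap = len(query_words & tokenize(text))
            if overlap:
                scored.append((overlap, Passage(source, text)))
    # Highest overlap first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


# Hypothetical contents for the three knowledge base types named in the text.
knowledge_bases = {
    "vector_kb": ["Optical power degradation on a link usually indicates "
                  "fiber aging or a dirty connector."],
    "online_help": ["Use the topology view to locate the affected link."],
    "om_logs": ["2024-10-01 alarm: optical power low on link A-B."],
}

results = unified_search("optical power low on link A-B", knowledge_bases)
for passage in results:
    print(passage.source, "->", passage.text)
```

In a real deployment the retrieved passages would be passed to the LLM as context for knowledge Q&A or task planning; that step is omitted here.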
The Instance Layer includes agent instances and instance management functions.¶
Agent instances: These are the instantiated agent applications that can automatically perform certain network management tasks for specific network management scenarios. When an agent instance receives a request from the user, it can leverage the capabilities of the Base Layer to address complex tasks in various network operation and maintenance scenarios. It understands the task intent, plans and decomposes sub-goals, acquires and distributes information, and flexibly schedules AI models and invokes related function APIs to complete the execution of specific tasks, and then feeds the execution results back to the user.¶
For different application scenarios, there can be multiple scenario-oriented agent instances (much like apps on a phone), each of which is called an “NMA instance” for short in this document. For the network planning, construction, maintenance, optimization, and operation scenarios, the main NMA instances could include:¶
Network Fault Handling Instance: This instance can be created by pre-training a specific AI model on network troubleshooting guidance documents, network equipment product documents, and other materials. The instance can capture the fault handling experience of experts and realize fault impact analysis, root cause self-diagnosis, and self-repair of network faults by orchestrating and calling models or network control APIs. It also interfaces with the list (work order) dispatching system to close the work order loop automatically.¶
Network Planning Instance: This instance can use the capabilities of AI large models to understand the network planning intent (user intent, business development goals, network construction plans, etc.) through LLM technology, and to analyze and forecast the current network resource usage (traffic, performance, user scale, resource utilization, etc.) in order to output planning schemes.¶
Network Optimization Instance: This instance understands the network optimization goal expressed in natural language and converts the optimization intent into network optimization constraint rules, such as network load thresholds and service route optimization strategies. It can use traffic prediction models to predict the future traffic and bandwidth utilization of the entire network and automatically generate prediction results for resources, hidden dangers, performance, traffic, and so on. Based on these predictions, it can automatically generate optimization strategies and perform traffic pre-diversion, autonomous decision-making, and automatic execution, in order to achieve goals such as dynamic energy saving of equipment and optimal traffic distribution across the entire network.¶
Intelligent Assistant Instance: This instance can provide open Q&A capability based on an LLM, offering dialogue-style operation and maintenance. Users can input fault descriptions or resource names in natural language in a single step, and the instance automatically performs intent recognition and querying, which improves the efficiency of knowledge queries, fault reporting, and maintenance support.¶
Instance management: Implements basic management capabilities such as registration of agent instances, lifecycle management, operation monitoring, and log auditing. It also provides manual takeover switch control, and offers platform support for agent collaboration and integrated evolution.¶
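As a rough illustration of the instance management functions listed above (registration, lifecycle management, log auditing, and the manual takeover switch), the following Python sketch models them as a small in-memory registry. All class, method, and state names are hypothetical and are not defined by this document; operation monitoring is reduced to reading the recorded state.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    REGISTERED = "registered"
    RUNNING = "running"
    STOPPED = "stopped"


@dataclass
class InstanceRecord:
    name: str
    state: State = State.REGISTERED
    manual_takeover: bool = False  # True: a human operator is in control


@dataclass
class InstanceManager:
    instances: dict[str, InstanceRecord] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)

    def register(self, name: str) -> InstanceRecord:
        """Registration: add a new agent instance to the registry."""
        if name in self.instances:
            raise ValueError(f"instance {name!r} is already registered")
        record = InstanceRecord(name)
        self.instances[name] = record
        self.audit_log.append(f"register {name}")
        return record

    def start(self, name: str) -> None:
        """Lifecycle management: move an instance to the running state."""
        self.instances[name].state = State.RUNNING
        self.audit_log.append(f"start {name}")

    def stop(self, name: str) -> None:
        """Lifecycle management: move an instance to the stopped state."""
        self.instances[name].state = State.STOPPED
        self.audit_log.append(f"stop {name}")

    def set_manual_takeover(self, name: str, enabled: bool) -> None:
        """Manual takeover switch: hand control to, or back from, an operator."""
        self.instances[name].manual_takeover = enabled
        self.audit_log.append(f"manual_takeover {name} {enabled}")

    def may_act(self, name: str) -> bool:
        """An instance may act autonomously only if running and not taken over."""
        record = self.instances[name]
        return record.state is State.RUNNING and not record.manual_takeover


manager = InstanceManager()
manager.register("fault-handling")
manager.start("fault-handling")
print(manager.may_act("fault-handling"))   # autonomous operation allowed
manager.set_manual_takeover("fault-handling", True)
print(manager.may_act("fault-handling"))   # operator has taken over
```

Every state change is appended to `audit_log`, which is the simplest possible form of the log auditing capability; a real system would persist and protect these records.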
The NMA can be part of an existing SDN-based network controller or be an independent system. Correspondingly, there are two deployment modes for the NMA relative to the original network control system (or controller), as shown in Figure 2.¶
As shown in Figure 2(a), the NMA is deployed independently of the original network management and control system (MCS). The NMA and the MCS are independent systems. A new east-west interface needs to be added between the NMA and the MCS for capability calling and result feedback. This interface can be called “MCS_A_I”. In this deployment mode, the MCS uses its southbound interface (SBI) to interact with the physical network, while an intelligent southbound interface (abbreviated as “I_SBI”) needs to be added between the NMA and the underlying physical network.¶
As shown in Figure 2(b), the NMA is integrated and deployed with the original network management and control system (MCS), and the NMA serves as a function of the MCS. The NMA interacts with the original function modules through an internal interface (abbreviated as “I_I”). The enhanced MCS interacts with the underlying physical network through an extended SBI (abbreviated as “E_SBI”).¶
The specific functional requirements and information model definitions of the interfaces mentioned above will be discussed in a future version of this document.¶
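The two deployment modes and the interfaces each one involves can be summarized as a small data structure. The Python sketch below only restates the interface names and endpoints given above; the type names `DeploymentMode` and `Interface`, the `is_new` flag, and the function `interfaces_for` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentMode(Enum):
    INDEPENDENT = "independent"  # Figure 2(a): NMA and MCS are separate systems
    INTEGRATED = "integrated"    # Figure 2(b): NMA is a function of the MCS


@dataclass(frozen=True)
class Interface:
    name: str
    endpoints: tuple[str, str]
    is_new: bool  # True if the mode adds or extends this interface


def interfaces_for(mode: DeploymentMode) -> list[Interface]:
    """Return the interfaces involved in the given deployment mode."""
    if mode is DeploymentMode.INDEPENDENT:
        return [
            Interface("MCS_A_I", ("NMA", "MCS"), True),
            Interface("SBI", ("MCS", "physical network"), False),
            Interface("I_SBI", ("NMA", "physical network"), True),
        ]
    return [
        Interface("I_I", ("NMA", "MCS function modules"), True),
        Interface("E_SBI", ("enhanced MCS", "physical network"), True),
    ]


for mode in DeploymentMode:
    names = ", ".join(i.name for i in interfaces_for(mode))
    print(f"{mode.value}: {names}")
```

The listing makes the trade-off visible: the independent mode leaves the existing SBI untouched but adds two new interfaces, while the integrated mode adds an internal interface and extends the SBI.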
The AI model embedded in the NMA serves as the interface for user input. An NMA instance uses the large model to clarify the problem over multiple rounds of interaction, analyze and locate the problem, generate plans, and invoke interfaces or tools to handle it. This completes closed-loop processing of the problem and builds end-to-end problem handling assistance capabilities.¶
The common processing flow of an NMA instance is shown in Figure 3. The processing steps include:¶
1) User/Network Management Task Input: Input the user’s task information through multiple rounds of natural language interaction.¶
2) Intent Analysis: Analyze the user’s task intent through AI model reasoning provided by the AI-based basic services within the NMA.¶
3) Task Decomposition: Split the task into detailed operations to be performed, based on the analyzed intent of the task.¶
4) Tool/API Invocation: Call the corresponding tool or function API to complete the execution of each operation listed in step 3). The toolkit refers to the collection of all tools that can be used directly to manage and operate physical networks, which can include management functions from the existing MCS, the EMS, or other standalone management tools. The toolkit can include a service configuration API/tool, alarm monitoring API/tool, performance monitoring API/tool, network optimization API/tool, topology management API/tool, etc.¶
5) Process Encapsulation: Encapsulate each execution step. According to the order or dependency of the operations, package the individual operation results into the execution result of the entire task.¶
6) Execution Result Analysis: Analyze the task processing results and return them to the user.¶
Through the above processing flow, the NMA can achieve closed-loop automated processing of tasks and construct end-to-end intelligent network maintenance assistance capabilities. For example, in the intelligent troubleshooting scenario, the NMA can identify the cause of a fault and call the corresponding interfaces to handle it, such as creating a troubleshooting order and automatically initiating rerouting, optical power optimization, or other troubleshooting operations. It can then automatically verify the progress of the order execution and report the troubleshooting results after the order is completed.¶
The introduction of the NMA can improve the level of intelligent operation and maintenance of the network, thus promoting the continuous evolution of communication networks towards higher levels of autonomy.¶
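The six processing steps can be sketched as a simple pipeline, using the troubleshooting scenario as the running example. This is a toy illustration under stated assumptions, not a definitive implementation: the tool functions, the `TOOLKIT` and `PLANS` tables, and all function names are hypothetical, and keyword matching stands in for the AI model reasoning that the NMA would actually use for intent analysis and task decomposition.

```python
from typing import Callable


# Hypothetical toolkit: plain functions standing in for MCS/EMS APIs.
def query_alarms(link: str) -> str:
    return f"alarm found on {link}"


def create_order(link: str) -> str:
    return f"troubleshooting order created for {link}"


def reroute(link: str) -> str:
    return f"traffic rerouted away from {link}"


TOOLKIT: dict[str, Callable[[str], str]] = {
    "query_alarms": query_alarms,
    "create_order": create_order,
    "reroute": reroute,
}

# Hypothetical plans: intent -> ordered tool names. In the NMA this
# decomposition would be produced by the AI model, not a fixed table.
PLANS: dict[str, list[str]] = {
    "troubleshoot": ["query_alarms", "create_order", "reroute"],
    "query": ["query_alarms"],
}


def analyze_intent(dialogue: list[str]) -> tuple[str, str]:
    """Step 2: keyword matching stands in for AI model reasoning."""
    text = " ".join(dialogue).lower()
    intent = "troubleshoot" if "fault" in text else "query"
    # Toy convention: the last word of the last turn names the target link.
    target = dialogue[-1].split()[-1]
    return intent, target


def decompose(intent: str) -> list[str]:
    """Step 3: split the task into ordered operations."""
    return list(PLANS[intent])


def invoke_tools(operations: list[str], target: str) -> list[tuple[str, str]]:
    """Step 4: call the tool that corresponds to each operation."""
    return [(op, TOOLKIT[op](target)) for op in operations]


def encapsulate(results: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Step 5: package per-operation results into one task result."""
    return {"steps": [op for op, _ in results],
            "outputs": [out for _, out in results]}


def analyze_result(task_result: dict[str, list[str]]) -> str:
    """Step 6: summarize the task result for the user."""
    count = len(task_result["steps"])
    return f"{count} operations completed: " + "; ".join(task_result["outputs"])


def handle_task(dialogue: list[str]) -> str:
    """Step 1 supplies the multi-round dialogue; steps 2 to 6 follow in order."""
    intent, target = analyze_intent(dialogue)
    operations = decompose(intent)
    results = invoke_tools(operations, target)
    return analyze_result(encapsulate(results))


print(handle_task(["There is a fault on the network",
                   "the affected link is A-B"]))
```

Keeping each step as a separate function mirrors the flow in the text and makes it straightforward to replace the keyword matcher or the fixed plan table with model-driven components later.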
Typical applications of the NMA in networks can cover both network operation and maintenance processes and operation processes:¶
Intelligent planning and construction: such as broadband installation, resource/capacity planning, intelligent acceptance, site selection, etc.¶
Operation processes, including intelligent question answering, customer service assistant, automatic classification of user complaints, customer retention, product recommendation, automatic flow of work orders, anti-fraud monitoring and identification, intelligent marketing, and other value-added services. This part is outside the scope of this document.¶
The first applications of the NMA in the live network should be chosen by comprehensively considering scenarios with strong demand, feasible technology, and a good input-output ratio. These scenarios should also provide sufficient data for AI pre-training during the construction of the NMA instance, complete data annotations, and a high fault tolerance rate. Based on these considerations, the broadband installation and maintenance assistant, fault diagnosis, and the operation and maintenance assistant may become the first application scenarios.¶
This document has no requests for IANA action.¶