Internet-Draft Network Management Agent Concept October 2024
Zhao, et al. Expires 23 April 2025 [Page]
Workgroup:
Network Management Operations
Internet-Draft:
draft-zhao-nmop-network-management-agent-00
Published:
Intended Status:
Informational
Expires:
Authors:
X. Zhao
CAICT
Y. Xu
CAICT
C. Yu
Huawei
H. Meng
CAICT
Y. Fu
CAICT

AI based Network Management Agent (NMA): Concepts and Architecture

Abstract

With the development of Artificial Intelligence (AI) technology, large models have shown significant advantages and great potential in recognition, understanding, decision-making, and generation. These capabilities match the intelligent network management requirements of Autonomous Networks [TMF-IG1230] and Intent-Based Networking [RFC9315], which makes large models a potential driving technology for high-level autonomous networks. When AI is introduced into network management, the focus of research and standardization is how to integrate AI technology and how it relates to existing network management entities (such as network controllers).

This document presents the concept of the AI based network management agent (NMA), provides the basic definition and reference architecture of the NMA, discusses the relationship between the NMA and a traditional network controller or other network management entities by exploring the deployment modes of the NMA, and proposes the common processing flow and typical application scenarios of the NMA.

Discussion Venues

This note is to be removed before publishing as an RFC.

Discussion of this document takes place on the Network Management Operations Working Group mailing list ([email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/.

Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-nmop/draft-ietf-nmop-digital-map-concept.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 23 April 2025.

1. Introduction

As the types of operator services become increasingly diverse, the complexity and difficulty of network operations and maintenance continue to grow. On one hand, new service scenarios such as industrial internet, vehicle-road collaboration, and 5GtoB for vertical industries are constantly emerging, and customer services like Extended Reality (XR), Virtual Reality (VR), and smart home are becoming more abundant, with a continuous increase in network access volume. On the other hand, with the popularization of 5G and gigabit optical networks, operators' networks are facing a situation where networks from 2G to 5G coexist. The network protocols and characteristics vary across different network domains, leading to a continuous increase in the difficulty and complexity of network operations and maintenance. Relying solely on traditional manual operations and maintenance methods can no longer meet the increasingly complex network operations and maintenance demands. The level of network intelligence has become a key factor directly affecting network performance and user experience. Against this backdrop, enhancing the level of network intelligence and creating Autonomous Networks (AN)[TMF-IG1230] has become a global consensus among operators, with mainstream operators releasing goals and plans to achieve Level 4 (L4) autonomous networks by 2025.

L4+ AN sets higher requirements for intent, decision-making, analysis, perception, and execution. Artificial Intelligence (AI) large model technology has shown significant advantages and great potential in identification, understanding, decision-making, and generation. Its technical features include multimodal fusion perception, more user-friendly human-computer interaction and knowledge Q&A, and content generation. These features match the new requirements of Level 4 Autonomous Networks well, and large models have become one of the core driving technologies for achieving high-level autonomous networks.

After AI is introduced, several questions remain open: how AI should be applied and deployed, how it relates to the existing network management and control systems, and in what form it can assist network management. These are key issues that need to be discussed.

Therefore, this document proposes the concept of AI-based network management agent (NMA), defines the reference architecture of NMA, and discusses the application mode, general task processing flow, interface requirements, and typical application scenarios of NMA.

2. Terminology

2.1. Acronyms and Abbreviations

AI: Artificial Intelligence

LLM: Large Language Model

NMA: Network Management Agent, refers to AI based network management agent

2.2. Definitions

The document defines the following terms:

Network Management Agent (NMA):

A network management entity with autonomous task processing capabilities, which is encapsulated based on AI algorithms or AI models and has task intent [RFC9315] perception, planning, decision-making, and execution capabilities. It can understand the input operation intent through an AI model, call other functional components of the control system or external interfaces to complete task processing, and return the processing results. For different application scenarios, the NMA can be subdivided into multiple scenario-oriented agents.

NMA Instance:

An instantiated agent application that can automatically perform certain network management tasks for a specific network management scenario. For different application scenarios, there can be multiple scenario-oriented agent instances (like apps on a phone), each of which is called an “NMA instance” for short in this document.

3. Reference architecture of AI based network management agent (NMA)

The network management agent (NMA) is a new network management entity with autonomous task processing capabilities, which is encapsulated based on AI algorithms or AI models, and has task intent perception, task planning, decision-making, and execution capabilities. It can understand the input operation intent through AI models, call other functional components of the control system or external interfaces to complete task processing, and return processing results automatically.

3.1. Function Architecture of NMA

Based on the traditional AI agent concepts and frameworks, this document presents the architecture of AI agent for network management as shown in Figure 1.

+-----------------------------------------------------------------+
|             AI based network management agent(NMA)              |
|                                                                 |
| +------------------------Instance layer-----------------------+ |
| | +-------------------------+     +-------------------------+ | |
| | |     Agent Instances     |     |                         | | |
| | |     (NMA instance)      |     |                         | | |
| | | +---------------------+ |     |                         | | |
| | | |   Fault Treatment   | |     |                         | | |
| | | +---------------------+ |     |                         | | |
| | | +---------------------+ |     |            NMA          | | |
| | | |   Network Planning  | |     |         Instance        | | |
| | | +---------------------+ |<--->|        Management       | | |
| | | +---------------------+ |     |                         | | |
| | | | Network Optimization| |     |                         | | |
| | | +---------------------+ |     |                         | | |
| | | +---------------------+ |     |                         | | |
| | | |        ......       | |     |                         | | |
| | | +---------------------+ |     |                         | | |
| | +-------------------------+     +-------------------------+ | |
| +------------------------------^------------------------------+ |
|                                |                                |
|                                v                                |
| +-------------------------Base layer--------------------------+ |
| | +-------------------------+     +-------------------------+ | |
| | | AI based Basic services |     |   Knowledge and memory  | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | |  Intent Management  | |     | |    Knowledge Base   | | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | |    Task Planning    | |     | | Knowledge Retrieval | | | |
| | | +---------------------+ |<--->| +---------------------+ | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | |   Task Execution    | |     | |  Memory Management  | | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | | |  Tool Invocation    | |     | |   Memory Retrieval  | | | |
| | | +---------------------+ |     | +---------------------+ | | |
| | +-------------------------+     +-------------------------+ | |
| +-------------------------------------------------------------+ |
+-----------------------------------------------------------------+

Figure 1: Key Elements of AI based network management agent (NMA)

The NMA consists of two main layers. The specific components and functions of each layer are as follows:

3.1.1. Base Layer

The Base Layer includes AI based basic services and a knowledge and memory subsystem.

AI based Basic Services:

Provide a unified agent engine framework and common interactive-intelligence capabilities that simplify application development. They integrate Large Language Models (LLMs), knowledge retrieval, API invocation, etc., to orchestrate the full process from intent understanding and task planning to tool invocation and task execution.
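
As a non-normative illustration of the tool invocation part of the basic services, the following Python sketch registers callable tools under names and dispatches a call to them. The names ToolRegistry and query_alarms, and the alarm data returned, are hypothetical and are not defined by this document.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical name-to-callable registry for tools and function APIs."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Return a decorator that stores the decorated function under `name`."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = func
            return func
        return decorator

    def invoke(self, name: str, **kwargs: Any) -> Any:
        """Call a registered tool by name; unknown names raise KeyError."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("query_alarms")
def query_alarms(device: str) -> List[Dict[str, str]]:
    # Placeholder data; a real tool would call an alarm monitoring API.
    return [{"device": device, "alarm": "LINK_DOWN"}]


alarms = registry.invoke("query_alarms", device="r1")
```

A planning component could then refer to tools purely by name, which keeps the planner independent of how each tool is implemented.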

Knowledge and Memory Subsystem:

Provides unified search across multiple types of local knowledge bases (vector knowledge base, system online help, operation and maintenance data logs), uses the LLM to complete knowledge fusion and extraction, and thereby improves the accuracy of downstream tasks (knowledge Q&A, task planning, etc.). It also realizes knowledge injection and integrated retrieval. The knowledge base and knowledge retrieval capabilities can be deployed inside or outside the NMA according to actual needs.
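
The idea of one search entry point over several knowledge bases can be sketched as follows. This is a minimal, non-normative Python example: the word-overlap score is only a stand-in for vector or LLM-assisted retrieval, and the names Hit, unified_search, and the sample knowledge bases are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    source: str  # which knowledge base the text came from
    text: str
    score: int   # number of query words found in the text


def unified_search(query: str, bases: Dict[str, List[str]], top_k: int = 3) -> List[Hit]:
    """Search every knowledge base and merge the hits into one ranked list."""
    terms = set(query.lower().split())
    hits: List[Hit] = []
    for source, documents in bases.items():
        for text in documents:
            score = len(terms & set(text.lower().split()))
            if score > 0:
                hits.append(Hit(source, text, score))
    # Highest score first; the sort is stable, so ties keep insertion order.
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:top_k]


# Hypothetical sample content for two of the knowledge base types.
bases = {
    "online_help": ["How to clear a link down alarm on a router"],
    "om_logs": [
        "2024-10-01 router r1 link down on port 3",
        "2024-10-02 backup completed",
    ],
}
results = unified_search("link down alarm", bases)
```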

3.1.2. Instance Layer

The Instance Layer includes agent instances and instance management functions.

Agent Instances:

Agent instances are instantiated agent applications that can automatically perform certain network management tasks for specific network management scenarios. When an agent instance receives a request from the user, it leverages the capabilities of the base layer to address complex tasks in various network operation and maintenance scenarios. It understands the task intent, plans and decomposes sub-goals, acquires and distributes information, flexibly schedules AI models, and invokes the related function APIs to complete the execution of specific tasks. It then feeds the execution results back to the user.

For different application scenarios, there can be multiple scenario-oriented agent instances (like apps on a phone), each of which is called an “NMA instance” for short in this document. For the network planning, construction, maintenance, optimization, and operation scenarios, the main NMA instances could include:

  • Network Fault Handling Instance: This instance can be created by pre-training a specific AI model on network troubleshooting guidance documents, network equipment product documents, and other materials. The instance can capture the fault handling experience of experts and realize fault impact analysis, root cause self-diagnosis, and self-repair of network faults by orchestrating and calling models or network control APIs. It also interfaces with the work order dispatching system to achieve closed-loop handling of work orders, etc.

  • Network Planning Instance: The instance can use the capabilities of AI large models to understand the network planning intent (user intent, business development goals, network construction plans, etc.) through LLM technology, and to analyze and forecast the current network resource usage (traffic, performance, user scale, resource utilization, etc.) in order to output planning schemes.

  • Network Optimization Instance: The instance understands the network optimization goal expressed in natural language and converts the optimization intent into network optimization constraint rules, such as network load thresholds and service route optimization strategies. It can use traffic prediction models to predict the future traffic and bandwidth utilization of the entire network and to generate prediction results for resources, hidden risks, performance, traffic, etc. Based on the prediction results, it can automatically generate optimization strategies and perform traffic pre-diversion, autonomous decision-making, and automatic execution, to achieve goals such as dynamic energy saving of equipment and optimal traffic across the entire network.

  • Intelligent Assistant Instance: This instance can have open Q&A capability based on an LLM, providing dialogue-style Q&A for operation and maintenance. Users can "one-click" input fault descriptions or resource names in natural language, and the instance will automatically perform intent recognition and query to significantly improve the efficiency of knowledge queries, fault reporting, and maintenance support.
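
For the Network Optimization Instance, the step from a constraint rule and predicted utilization to a pre-diversion decision can be illustrated with a small, non-normative Python sketch. It assumes the rule has already been derived from the optimization intent (the LLM-based conversion is not shown); the names LoadRule and links_to_divert, and the link identifiers and values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoadRule:
    """A constraint rule of the 'network load threshold' kind."""
    max_utilization: float  # fraction of link capacity, 0.0 to 1.0


def links_to_divert(predicted: Dict[str, float], rule: LoadRule) -> List[str]:
    """Return the links whose predicted utilization exceeds the threshold."""
    return sorted(
        link for link, utilization in predicted.items()
        if utilization > rule.max_utilization
    )


# Hypothetical output of a traffic prediction model for the next period.
predicted_utilization = {"A-B": 0.92, "B-C": 0.40, "A-C": 0.81}
candidates = links_to_divert(predicted_utilization, LoadRule(max_utilization=0.8))
```

In this sketch the decision is a simple threshold comparison; an actual instance would feed the candidate links into strategy generation and execution.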

Instance Management:

Implements basic management capabilities such as registration of agent instances, lifecycle management, operation monitoring, and log auditing. It also provides manual takeover switch control and platform support for agent collaboration and integrated evolution.
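
The management capabilities listed above can be pictured with the following non-normative Python sketch, which keeps a registry of instances, a lifecycle state per instance, an audit log, and a manual takeover switch. The class names, state names, and log format are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    REGISTERED = "registered"
    RUNNING = "running"
    STOPPED = "stopped"


@dataclass
class InstanceRecord:
    name: str
    state: State = State.REGISTERED
    manual_takeover: bool = False  # True means a human operator is in control


@dataclass
class InstanceManager:
    instances: Dict[str, InstanceRecord] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def register(self, name: str) -> None:
        self.instances[name] = InstanceRecord(name)
        self.audit_log.append(f"register {name}")

    def set_state(self, name: str, state: State) -> None:
        self.instances[name].state = state
        self.audit_log.append(f"state {name} {state.value}")

    def set_manual_takeover(self, name: str, enabled: bool) -> None:
        self.instances[name].manual_takeover = enabled
        self.audit_log.append(f"takeover {name} {enabled}")

    def may_act(self, name: str) -> bool:
        """An instance may act only while running and not taken over."""
        record = self.instances[name]
        return record.state is State.RUNNING and not record.manual_takeover


manager = InstanceManager()
manager.register("fault_handling")
manager.set_state("fault_handling", State.RUNNING)
allowed_before = manager.may_act("fault_handling")
manager.set_manual_takeover("fault_handling", True)
allowed_after = manager.may_act("fault_handling")
```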

3.2. Deployment mode of NMA

The NMA can be part of an existing SDN-based network controller or be an independent system. Correspondingly, there are two deployment modes for the NMA relative to the original network control system (or controller), as shown in Figure 2.


+-----------------------------+         +--------------------+
|                             |         |                    |
| Original Network Management <-MCS_A_I-> Network Management |
|   and Control System(MCS)   |         |    Agent(NMA)      |
|                             |         |                    |
+--------------^--------------+         +----------^---------+
               |                                   |
    Southbound Interface(SBI)           Intelligent SBI(I_SBI)
               |                                   |
+--------------v-----------------------------------v---------+
|                        Physical Network                    |
+------------------------------------------------------------+
                              (a)

+------------------------------------------------------------+
|        Network Management and Control System(MCS)          |
|                                                            |
|  +--------------------+           +--------------------+   |
|  | Original Function  <--Internal-> Network management |   |
|  |      Modules       | Interface |      Agent(NMA)    |   |
|  +--------------------+   (I_I)   +--------------------+   |
|                                                            |
+------------------------------^-----------------------------+
                               |
                      Extended SBI(E_SBI)
                               |
+------------------------------v-----------------------------+
|                       Physical Network                     |
+------------------------------------------------------------+
                              (b)

Figure 2: Deployment mode of network management agent (NMA)

Independent deployment mode:

As shown in Figure 2(a), the NMA is deployed independently of the original network management and control system (MCS); the NMA and the MCS are separate systems. A new east-west interface, called “MCS_A_I” in this document, needs to be added between the NMA and the MCS for capability invocation and result feedback. In this deployment mode, the MCS uses the southbound interface (SBI) to interact with the physical network, while an intelligent southbound interface (abbreviated as “I_SBI”) needs to be added between the NMA and the underlying physical network.

Integrated deployment mode:

As shown in Figure 2(b), the NMA is integrated and deployed with the original network management and control system (MCS), and the NMA serves as a function of the MCS. The NMA interacts with the original function modules through an internal interface (abbreviated as “I_I”). The enhanced MCS interacts with the underlying physical network through an extended SBI (abbreviated as “E_SBI”).

The specific functional requirements and information model definitions of the interfaces mentioned above will be discussed in a future version of this document.
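
To summarize which interface applies in each deployment mode, the following non-normative Python sketch encodes Figure 2 as a lookup table. The interface names come from this document; the key names, the function interface_for, and the reading that an integrated NMA reaches the network through the E_SBI of the enhanced MCS are assumptions of the sketch.

```python
from enum import Enum
from typing import Dict


class DeploymentMode(Enum):
    INDEPENDENT = "independent"  # Figure 2(a)
    INTEGRATED = "integrated"    # Figure 2(b)


# Interface used for each kind of interaction, per deployment mode.
INTERFACES: Dict[DeploymentMode, Dict[str, str]] = {
    DeploymentMode.INDEPENDENT: {
        "nma_to_mcs": "MCS_A_I",    # new east-west interface
        "nma_to_network": "I_SBI",  # intelligent southbound interface
        "mcs_to_network": "SBI",    # existing southbound interface
    },
    DeploymentMode.INTEGRATED: {
        "nma_to_mcs": "I_I",        # internal interface to original modules
        "nma_to_network": "E_SBI",  # assumed: via the enhanced MCS
        "mcs_to_network": "E_SBI",  # extended southbound interface
    },
}


def interface_for(mode: DeploymentMode, interaction: str) -> str:
    """Look up the interface name for an interaction in a deployment mode."""
    return INTERFACES[mode][interaction]


independent_path = interface_for(DeploymentMode.INDEPENDENT, "nma_to_network")
integrated_path = interface_for(DeploymentMode.INTEGRATED, "nma_to_mcs")
```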

4. Common processing flow of NMA

The AI model embedded in the NMA serves as the interface for user information input. An NMA instance uses the large model to clarify the problem through multiple rounds of interaction, analyze and locate it, generate plans, and invoke interfaces or tools to handle it. This completes closed-loop processing of the problem and builds end-to-end problem handling assistance capabilities.

          User/Network
+-----> Management Task
|               |
|               v
|       Intent Analysis <-------+            +-- Service Configuration
|               |               |            |         API/Tool
|               |               v            |
|               |       Model Reasoning      |      Alarm Monitor
|               |               ^            |         API/Tool
|               v               |            |
|       Task Decomposition <----+            |   Performance Monitor
|               |                            |         API/Tool
|               v                            |
|      Tool/API Invocation-----> Toolkit ----+   Network Optimization
|               |                  |  ^      |         API/Tool
|               v                  |  |      |
|     Process Encapsulation        |  |      |   Topology Management
|               |                  |  |      |         API/Tool
|               v                  |  |      |
+---Executive Result Analysis      |  |      +-- other APIs/Tools
                                   |  |
                                   |  |
                                   |  |
                                   |  |
           +-----------------------v--+-----------------------------+
           |                   Physical Network                     |
           +--------------------------------------------------------+

Figure 3: Common processing flow of NMA

The common processing flow of an NMA instance is shown in Figure 3. The processing steps include:

  1. User/Network Management Task Input: Input the user’s task information through multiple rounds of natural language interaction.

  2. Intent Analysis: Analyze the user's task intent through AI model reasoning provided by the AI based basic services within the NMA.

  3. Task Decomposition: Split the task into detailed operations to be performed, based on the analyzed intent of the task.

  4. Tool/API Invocation: Call the corresponding tool or function API to execute each operation listed in step 3. The toolkit refers to the collection of all tools that can be used directly to manage and operate physical networks, which can include management functions from the existing MCS, the EMS, or other standalone management tools. The toolkit can include service configuration API/Tool, alarm monitor API/Tool, performance monitor API/Tool, network optimization API/Tool, topology management API/Tool, etc.

  5. Process Encapsulation: Encapsulate each execution step. According to the order or dependencies of the operations, package the individual operation results into the execution result of the entire task.

  6. Executive Result Analysis: Analyze the task processing results and return them to the user.
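
The steps above can be sketched end to end in a few lines of Python. This is a non-normative illustration under simplifying assumptions: intent analysis is a keyword check standing in for model reasoning, the toolkit entries return fixed strings instead of calling real APIs, and the dependency order of operations is resolved with the standard library graphlib module (Python 3.9 or later). All function, tool, and intent names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toolkit: operation name -> callable returning a result string.
TOOLKIT: Dict[str, Callable[[], str]] = {
    "query_alarms": lambda: "alarm LINK_DOWN on r1",
    "reroute_service": lambda: "service rerouted",
    "verify_service": lambda: "service healthy",
}


def analyze_intent(task_text: str) -> str:
    """Step 2: stand-in for AI model reasoning."""
    return "fault_handling" if "fault" in task_text.lower() else "unknown"


def decompose(intent: str) -> Dict[str, Set[str]]:
    """Step 3: map each operation to the operations it depends on."""
    if intent == "fault_handling":
        return {
            "query_alarms": set(),
            "reroute_service": {"query_alarms"},
            "verify_service": {"reroute_service"},
        }
    return {}


def run_task(task_text: str) -> List[Tuple[str, str]]:
    """Steps 2 to 5: analyze, decompose, invoke tools in dependency order."""
    plan = decompose(analyze_intent(task_text))
    results: List[Tuple[str, str]] = []
    for operation in TopologicalSorter(plan).static_order():
        results.append((operation, TOOLKIT[operation]()))
    return results


def summarize(results: List[Tuple[str, str]]) -> str:
    """Step 6: condense the packaged results for the user."""
    if not results:
        return "no operations executed"
    return f"{len(results)} operations executed, final result: {results[-1][1]}"


outcome = run_task("Handle the fault on r1")
report = summarize(outcome)
```

Expressing the plan as a dependency graph rather than a fixed list lets independent operations be reordered or parallelized later without changing the decomposition step.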

Through the above processing flow, the NMA can achieve closed-loop automated processing of tasks and construct end-to-end intelligent network maintenance assistance capabilities. For example, in the intelligent troubleshooting scenario, the NMA can identify the cause of a fault and call the corresponding interfaces to handle it, for instance by creating a troubleshooting order and automatically initiating rerouting, optical power optimization, or other troubleshooting operations. It can then automatically verify the progress of the order execution and provide feedback on the troubleshooting results after the order is completed.

The introduction of the NMA can effectively improve the level of intelligent operation and maintenance of the network, thus promoting the continuous evolution of communication networks towards higher-level self-intelligence.

6. Typical application scenarios after introducing NMA

Typical applications of the NMA can cover both network management and maintenance processes and network operation processes:

Network management and maintenance scenarios, including:

  • Intelligent planning and construction: such as broadband installation, resource/capacity planning, intelligent acceptance, site selection, etc.

  • Intelligent maintenance: such as intelligent fault diagnosis, quality analysis, operation and maintenance/cutover assistant, broadband maintenance assistant, etc.

  • Intelligent optimization: such as route optimization, coverage optimization, topology optimization, intelligent energy saving, etc.

Network operation scenarios, including:

Intelligent question and answer, customer service assistant, automatic classification of user complaints, customer retention, product recommendation, automatic flow of work orders, anti-fraud monitoring and identification, intelligent marketing, and other value-added services. This part is outside the scope of this document.

The starting point for applying the NMA in the live network should be chosen by comprehensively considering scenarios with strong demand, feasible technology, and a good input-output ratio. Such scenarios should also provide sufficient data for AI pre-training during the construction of NMA instances, complete data annotations, and a high tolerance for faults. Based on the above considerations, the broadband installation and maintenance assistant, fault diagnosis, and the operation and maintenance assistant may become the first application scenarios.

7. Security Considerations

TBD.

8. IANA Considerations

This document has no requests for IANA action.

9. References

9.2. Informative References

[I-D.irtf-nmrg-ai-challenges]
François, J., Clemm, A., Papadimitriou, D., Fernandes, S., and S. Schneider, "Research Challenges in Coupling Artificial Intelligence and Network Management", Work in Progress, Internet-Draft, draft-irtf-nmrg-ai-challenges-03, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-ai-challenges-03>.
[I-D.kdj-nmrg-ibn-usecases]
Yao, K., Chen, D., Jeong, J., Wu, Q., Yang, C., and L. Contreras, "Use Cases and Practices for Intent-Based Networking", Work in Progress, Internet-Draft, draft-kdj-nmrg-ibn-usecases-01, <https://datatracker.ietf.org/doc/html/draft-kdj-nmrg-ibn-usecases-01>.
[RFC7575]
Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, <https://www.rfc-editor.org/rfc/rfc7575>.
[RFC7576]
Jiang, S., Carpenter, B., and M. Behringer, "General Gap Analysis for Autonomic Networking", RFC 7576, DOI 10.17487/RFC7576, <https://www.rfc-editor.org/rfc/rfc7576>.
[RFC9222]
Carpenter, B. E., Ciavaglia, L., Jiang, S., and P. Peloso, "Guidelines for Autonomic Service Agents", RFC 9222, DOI 10.17487/RFC9222, <https://www.rfc-editor.org/rfc/rfc9222>.
[RFC9315]
Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", RFC 9315, DOI 10.17487/RFC9315, <https://www.rfc-editor.org/rfc/rfc9315>.
[TMF-IG1230]
Machwe, A., Milham, D., O’Sullivan, J., Clemm, A., and J. Niemöller, "Autonomous Networks Technical Architecture", TMF IG1230.

Authors' Addresses

Xing Zhao
CAICT
Beijing
China

Yunbin Xu
CAICT
Beijing
China

Chaode Yu
Huawei
China

Haijun Meng
CAICT
Beijing
China

Yipeng Fu
CAICT
Beijing
China