Internet-Draft                ODES Gap Analysis                July 2024
Zhao, et al.                Expires 6 January 2025
This document provides a gap analysis of online data express delivery services, intended to inform the design and development of such services.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
With the rapid development of diverse computing capabilities such as general-purpose, intelligent, and super computing, the volume and complexity of data processing have exploded, particularly in scenarios such as cloud disaster recovery, astronomical computation, gene sequencing, autonomous driving, and film and television production. Cloud service providers such as AWS, Azure, and Alibaba Cloud have introduced data migration solutions (Snowball, Data Box, and Data Transport, respectively). However, most of these large-scale data transfers still rely on the offline shipment of hard drives, which is cumbersome, time-consuming, and carries high security risks.¶
Assuming an end-to-end long-distance 100 Gbps network link sustained at high bandwidth utilization, about 1 PB of data can be transmitted in one day, which meets most online transmission needs for massive data volumes. Compared with offline data migration, online delivery offers advantages in efficiency and security. Online data delivery can replace offline transportation for data migration, accelerate data circulation, free computing from geographical restrictions, empower various industries, and drive the development of the digital economy and society.¶
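As a rough sanity check of the 1 PB/day figure (a back-of-the-envelope calculation assuming a fully utilized link and ignoring protocol overhead):¶

   100\ \mathrm{Gbit/s} \times 86400\ \mathrm{s/day}
     = 8.64\ \mathrm{Pbit/day} = 1.08\ \mathrm{PB/day}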
For motivating scenarios, please refer to the companion document "Use Cases and Problem Statement of Data Express Service".¶
The speed of data transmission depends on both the transmission bandwidth on the network side and the performance of the protocol stack on the end side. Currently, core-network bandwidth generally reaches more than 10 Gbps, and some networks support more than 100 Gbps. However, end-to-end single-stream transmission bandwidth over the wide area network is generally below 1 Gbps (most mainstream transmission protocols use TCP, whose single-stream performance is typically within 50 Mbps). Even existing techniques such as multi-stream concurrency and network offloading struggle to achieve high-throughput (100 Gbps or higher) data transmission.¶
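One structural obstacle for a single stream is the bandwidth-delay product (BDP): the sender must keep a full BDP of data in flight to saturate the path. The figures below are illustrative WAN values, not measurements from this document:¶

   \mathrm{BDP} = 10\ \mathrm{Gbit/s} \times 50\ \mathrm{ms}
     = 500\ \mathrm{Mbit} = 62.5\ \mathrm{MB}

A sustained 62.5 MB window is far beyond common default socket-buffer sizes, and any loss within so large a window triggers a deep multiplicative backoff.¶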
Traditional TCP congestion control and packet-loss retransmission techniques struggle to meet the performance requirements of high-throughput transmission in wide area networks. In recent years, many new high-throughput TCP variants and TCP acceleration devices have emerged. These improved TCP protocols mainly focus on dynamically adjusting the congestion window and adding congestion detection signals, but they do not fundamentally address the limitations of the AIMD mechanism with respect to high throughput. In wide area networks, packet loss caused by physical-media errors or sudden traffic bursts is unavoidable and cannot be ignored. As packet loss and delay increase, the end-to-end throughput of these improved TCP variants decreases significantly: in a network environment with 0.1% packet loss and an RTT of 10 ms, single-stream throughput falls below 50 Mbps [FASP].¶
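That figure is consistent with the widely used Mathis approximation for steady-state AIMD throughput (shown here as a sanity check; the calculation is not taken from [FASP] itself). With MSS = 1460 bytes, RTT = 10 ms, and loss probability p = 0.001:¶

   \mathrm{Throughput} \lesssim \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}
     = \frac{1460\ \mathrm{B}}{10\ \mathrm{ms}} \times \frac{1.22}{\sqrt{0.001}}
     \approx 5.6\ \mathrm{MB/s} \approx 45\ \mathrm{Mbps},
   \qquad C = \sqrt{3/2} \approx 1.22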
TCP's reliability mechanisms can reduce network throughput and increase average latency. Because modifying TCP itself is complex, in recent years both academia and industry have designed new transmission schemes on top of UDP, using UDP as an alternative to TCP and implementing reliability at the application layer; UDT and QUIC are examples of such schemes.¶
Most of these UDP-based transmission schemes retransmit lost packets in some fashion, but they neither account for the available bandwidth nor guard against congestion collapse, and they can also crowd out competing TCP traffic. UDT moves data reliably over UDP with a more aggressive sending mechanism and a dynamic AIMD congestion-avoidance algorithm, implementing loss retransmission through a NACK mechanism. With tuned parameters this approach outperforms TCP in certain scenarios, but in typical wide area networks UDT's transmission performance is lower than TCP's. UDT's aggressive sending mechanism also easily leads to rate oscillation and packet loss, which not only hurts its own throughput but also affects other traffic in the network.¶
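To make the NACK idea concrete, the following is a minimal loopback sketch of NACK-based loss recovery over UDP. It illustrates the general technique only, not UDT's actual wire format or implementation; the 4-byte sequence header, the simulated loss of packet 4, and the timeouts are assumptions of the example.¶

   import socket
   import struct

   # Minimal loopback demo of NACK-based loss recovery over UDP. The 4-byte
   # big-endian sequence header and the simulated loss are assumptions of
   # this sketch, not UDT's wire format.
   HDR = struct.Struct("!I")

   recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # data in, NACKs out
   recv_sock.bind(("127.0.0.1", 0))
   nack_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # sender's NACK inbox
   nack_sock.bind(("127.0.0.1", 0))
   send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # data out, retransmits

   data_addr, nack_addr = recv_sock.getsockname(), nack_sock.getsockname()
   sent = {}  # sender-side retransmission buffer: seq -> payload

   # Sender: transmit 10 numbered packets, deliberately "losing" packet 4.
   for seq in range(10):
       sent[seq] = f"chunk-{seq}".encode()
       if seq != 4:
           send_sock.sendto(HDR.pack(seq) + sent[seq], data_addr)

   expected, buffered, delivered, nacked = 0, {}, [], set()

   def drain():
       """Read pending datagrams, NACK sequence gaps, deliver in-order data."""
       global expected
       recv_sock.settimeout(0.2)
       try:
           while True:
               pkt, _ = recv_sock.recvfrom(2048)
               seq = HDR.unpack_from(pkt)[0]
               if seq < expected:
                   continue  # duplicate of already-delivered data
               buffered[seq] = pkt[HDR.size:]
               # Any hole below the highest sequence seen is a loss: NACK it once.
               for missing in range(expected, seq):
                   if missing not in buffered and missing not in nacked:
                       nacked.add(missing)
                       recv_sock.sendto(HDR.pack(missing), nack_addr)
               while expected in buffered:  # deliver the in-order prefix
                   delivered.append(buffered.pop(expected))
                   expected += 1
       except socket.timeout:
           pass

   drain()  # receiver sees the gap at seq 4 and emits a NACK

   # Sender: retransmit whatever the receiver reported missing.
   nack_sock.settimeout(0.2)
   try:
       while True:
           req, _ = nack_sock.recvfrom(HDR.size)
           seq = HDR.unpack_from(req)[0]
           send_sock.sendto(HDR.pack(seq) + sent[seq], data_addr)
   except socket.timeout:
       pass

   drain()  # the retransmitted packet fills the gap
   assert delivered == [f"chunk-{i}".encode() for i in range(10)]
   print("delivered", len(delivered), "chunks in order")

Real protocols of this kind additionally need timer-driven NACK retransmission, flow control, and congestion control; the sketch shows only the gap-detection and retransmission core.¶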
QUIC, built on UDP, defines a new protocol stack optimized for the interactive characteristics of HTTP/3, with improvements in connection establishment, connection migration, multi-stream multiplexing, congestion control, and forward error correction. However, over long fat pipes and in complex network environments, QUIC cannot compete with TCP in sustained throughput [QUIC(k)].¶
Aspera FASP is another new transmission protocol designed on top of UDP. It completely decouples reliability and rate control from data transmission, quickly adjusts the sending rate by periodically probing the queuing delay in the network, and employs an application-layer retransmission mechanism to ensure reliability and high bandwidth utilization. However, FASP cannot be used by applications that require in-order byte-stream delivery, and its throughput is also bounded by disk I/O, the file system, CPU scheduling, and other host-side factors.¶
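The delay-probing idea can be sketched as a simple control loop. This is a toy model of delay-based rate control in general, not Aspera's proprietary algorithm; the target delay, gain, and rate bounds are illustrative assumptions.¶

   # Toy delay-based rate controller: steer the sending rate so that the
   # measured queuing delay (RTT sample minus base RTT) converges to a small
   # target. All constants are illustrative assumptions, not FASP parameters.

   TARGET_DELAY_MS = 5.0            # desired standing queue at the bottleneck
   GAIN = 0.5                       # reaction strength to the delay error
   MIN_RATE, MAX_RATE = 1e6, 100e9  # clamp between 1 Mbps and 100 Gbps

   def next_rate(rate_bps: float, base_rtt_ms: float, rtt_ms: float) -> float:
       """One control step: shrink the rate when queuing delay exceeds the
       target, grow it when the path looks underutilized."""
       queuing_delay = max(rtt_ms - base_rtt_ms, 0.0)   # delay-probe result
       error = (TARGET_DELAY_MS - queuing_delay) / TARGET_DELAY_MS
       factor = max(0.5, 1.0 + GAIN * error)            # halve at most per step
       return min(max(rate_bps * factor, MIN_RATE), MAX_RATE)

   # An idle path (no queuing) lets the rate ramp up multiplicatively ...
   rate = 1e9
   for _ in range(5):
       rate = next_rate(rate, base_rtt_ms=10.0, rtt_ms=10.0)
   print(f"after ramp-up: {rate / 1e9:.2f} Gbps")  # 1.5x per step -> ~7.6 Gbps

   # ... while a growing queue (17 ms RTT => 7 ms queuing, 2 ms over target)
   # pulls the rate back down toward the path's capacity.
   rate = next_rate(rate, base_rtt_ms=10.0, rtt_ms=17.0)
   print(f"after backoff: {rate / 1e9:.2f} Gbps")  # 0.8x -> ~6.1 Gbps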
RDMA uses techniques such as zero-copy, kernel bypass, and CPU offloading to move transport protocol processing onto the network card, allowing user-space applications to read and write remote host memory directly. This avoids data copies and context switches, achieving high throughput, low latency, and low CPU consumption. RDMA has three technical paths: InfiniBand, RDMA over Converged Ethernet (RoCE), and the Internet Wide Area RDMA Protocol (iWARP). RoCE has two versions, RoCEv1 and RoCEv2. RoCEv2 is widely used in data center networks, mainly in high-performance storage, high-performance computing, and similar scenarios, owing to its compatibility with traditional TCP/IP networks and its ease of deployment and management.¶
Although RDMA offers very high transmission performance, with some vendors reaching 400 Gbps, current RDMA technology generally requires a lossless network and cannot be used over general wide area networks. When the packet loss rate exceeds 0.01%, RDMA throughput drops sharply (commodity RoCE NICs typically recover losses with go-back-N retransmission, which discards an entire window of in-flight data per loss); this loss sensitivity is the root cause of why existing RDMA cannot operate in wide area networks.¶
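A rough go-back-N model (an illustration constructed for this analysis, not a measurement) shows why such small loss rates matter. If every lost packet forces retransmission of the roughly W packets in flight behind it, the useful fraction of transmissions is about¶

   \frac{\mathrm{goodput}}{\mathrm{throughput}} \approx \frac{1}{1 + pW}

so at p = 10^{-4} and a WAN-scale window of W = 10^{4} packets, roughly half of all transmissions are wasted, before accounting for the pauses introduced by each recovery round.¶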
TBD.¶
The following people have substantially contributed to this document:¶
Zongpeng Du, [email protected]¶
Kehan Yao, [email protected]¶
TBD.¶