The growing complexity of large-scale scientific projects has increased reliance on cross-facility scientific workflows, which draw on resources and expertise from multiple institutions and geographic locations to accelerate scientific discovery. These workflows often require transmitting very large volumes of scientific data over wide-area networks. Although high-speed networks such as ESnet and transfer services such as Globus have improved data mobility, several challenges remain: the sheer volume of data can overwhelm network bandwidth; widely used transport protocols such as TCP suffer from inefficiencies due to retransmissions triggered by packet loss; and existing fault-tolerance mechanisms such as erasure coding introduce substantial overhead. In this paper, we propose JANUS, a resilient and adaptable data transmission approach designed for cross-facility scientific workflows. Unlike traditional TCP-based methods, JANUS leverages UDP, integrates erasure coding for fault tolerance, and combines it with error-bounded lossy compression to reduce overhead. This novel design allows users to balance data transmission time against accuracy, optimizing transfer performance for specific scientific requirements. Additionally, JANUS dynamically adjusts erasure coding parameters in response to real-time network conditions, keeping data transfers efficient even in fluctuating environments. We develop optimization models for determining ideal configurations and implement adaptive data transfer protocols to enhance reliability. Through extensive simulations and real-network experiments, we demonstrate that JANUS significantly improves transfer efficiency while maintaining data fidelity.
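To make the idea of adapting erasure coding parameters to network conditions concrete, the following is a minimal sketch and not the optimization model developed in the paper. It assumes a (k, m) erasure code in which any k of the k + m fragments suffice to rebuild the data, and it assumes packets are lost independently with a measured loss rate, which real wide-area networks may not satisfy. The function names, the 0.999 recovery target, and the parity cap are hypothetical choices made for illustration.

```python
from math import comb


def recovery_probability(k: int, m: int, loss_rate: float) -> float:
    """Probability that at most m of the k + m fragments are lost.

    Assumption: each fragment is lost independently with probability
    loss_rate (a simplification of real network behavior).
    """
    n = k + m
    return sum(
        comb(n, lost) * loss_rate**lost * (1.0 - loss_rate) ** (n - lost)
        for lost in range(m + 1)
    )


def min_parity_fragments(
    k: int, loss_rate: float, target: float = 0.999, max_parity: int = 64
) -> int:
    """Smallest parity count m whose recovery probability meets the target.

    Hypothetical helper: a sender could call this whenever its loss-rate
    estimate changes, then re-encode subsequent data with the new m.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    for m in range(max_parity + 1):
        if recovery_probability(k, m, loss_rate) >= target:
            return m
    raise ValueError("target not reachable within max_parity fragments")


if __name__ == "__main__":
    # Show how the required redundancy grows with the observed loss rate.
    for p in (0.0, 0.01, 0.05):
        print(f"loss={p:.2f} -> parity fragments: {min_parity_fragments(8, p)}")
```

Under these assumptions, a higher observed loss rate leads to more parity fragments and therefore more bytes on the wire, which is the overhead that the abstract proposes to offset with error-bounded lossy compression.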