Last-mile access networks are often the dominant bottlenecks for Internet applications, creating demand for data-generation approaches that are both realistic and reusable. Meeting this goal requires five properties: fidelity (capturing real network behaviors), controllability (systematic variation of network conditions), diversity (coverage of heterogeneous network behaviors), composability (construction of complex scenarios from simpler elements), and replicability (consistent outcomes across runs). Existing approaches satisfy only a subset of these requirements. This paper introduces NETREPLICA, a programmable substrate for last-mile data generation that achieves all five. NETREPLICA decomposes bottlenecks into static attributes (capacity, base latency, buffer size, shaping and active queue management policies) and dynamic attributes derived from passive traces. It introduces Cross-Traffic Profiles (CTPs) that transform passive production traces into reusable, parameterizable building blocks. By trimming, scaling, and recombining CTPs, NETREPLICA generates realistic yet tunable conditions, replaying non-reactive cross traffic alongside reactive application workloads and enabling reproducible construction of heterogeneous scenarios. In a case study on adaptive bitrate streaming, models trained with NETREPLICA-generated traces reduced transmission-time prediction error by up to 47% in challenging slow-path domains (>=400 ms RTT, <=6 Mbps throughput) compared to models trained solely on production traces -- demonstrating the utility of NETREPLICA-generated data. Overall, NETREPLICA represents a first step toward a fully programmable data-generation substrate for networking.
翻译:暂无翻译