This paper introduces a novel framework for high-accuracy outdoor user equipment (UE) positioning that applies a conditional generative diffusion model directly to high-dimensional massive MIMO channel state information (CSI). Traditional fingerprinting methods struggle to scale to large, dynamic outdoor environments and require dense, impractical data surveys. To overcome these limitations, our approach learns a direct mapping from raw uplink Sounding Reference Signal (SRS) fingerprints to continuous geographic coordinates. We demonstrate that our DiffLoc framework achieves unprecedented sub-centimeter precision, with our best model (DiffLoc-CT) delivering 0.5 cm fusion accuracy and 1-2 cm single base station (BS) accuracy in a realistic, ray-traced Tokyo urban macro-cell environment. This represents an order-of-magnitude improvement over existing methods, including supervised regression approaches (over 10 m error) and grid-based fusion (3 m error). Our consistency training approach reduces inference time from 200 steps to just 2 steps while maintaining exceptional accuracy even for high-speed users (15-25 m/s) and unseen user trajectories, demonstrating the practical feasibility of our framework for real-time 6G applications.
翻译:暂无翻译