Applications of Synthetic Financial Data in Portfolio and Risk Modeling

From arXiv (preprint, 14 pages). This study examines generative models (TimeGAN and VAE) for creating synthetic financial data to support portfolio construction, trading analysis, and risk modeling.

Synthetic financial data offers a practical way to address the privacy and accessibility challenges that limit research in quantitative finance. This paper examines the use of generative models, in particular TimeGAN and Variational Autoencoders (VAEs), for creating synthetic return series that support portfolio construction, trading analysis, and risk modeling. Using historical daily returns from the S&P 500 as a benchmark, we generate synthetic datasets under comparable market conditions and evaluate them using statistical similarity metrics, temporal structure tests, and downstream financial tasks. The study shows that TimeGAN produces synthetic data whose distributional shape, volatility patterns, and autocorrelation behaviour are close to those observed in real returns. When the synthetic datasets are used for mean-variance portfolio optimization, the resulting portfolio weights, Sharpe ratios, and risk levels remain close to those obtained from real data. The VAE trains more stably but tends to smooth extreme market movements, which affects risk estimation. Overall, the analysis supports using synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation, particularly when the generative model captures temporal dynamics. Synthetic data therefore provides a privacy-preserving, cost-effective, and reproducible tool for financial experimentation and model development.
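The abstract names three families of checks (distributional similarity, temporal structure, and downstream portfolio/risk tasks) without giving formulas. The sketch below shows one plausible way to compute such checks with NumPy, assuming real and synthetic returns are both T × N arrays of daily returns. The specific metrics (two-sample KS statistic, excess kurtosis, autocorrelation of squared returns, max-Sharpe mean-variance weights with a zero risk-free rate, 1% historical VaR) and all helper names are illustrative choices, not the paper's. TimeGAN and the VAE are not implemented here; the demo uses a Gaussian panel as a hypothetical stand-in for a generator that smooths tails.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    x, y = np.sort(np.ravel(x)), np.sort(np.ravel(y))
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def autocorr(x, lag=1):
    """Sample autocorrelation of a 1-D series at the given lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a Gaussian); fat tails give > 0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)


def tangency_weights(returns):
    """Max-Sharpe mean-variance weights with risk-free rate 0: w ∝ Σ^{-1} μ, summing to 1.
    Assumes 1ᵀΣ^{-1}μ > 0, i.e. a positive-return tangency portfolio exists."""
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()


def portfolio_report(returns, alpha=0.01, periods_per_year=252):
    """Weights, annualized Sharpe ratio, and one-day historical VaR of the tangency portfolio."""
    w = tangency_weights(returns)
    port = returns @ w
    sharpe = np.sqrt(periods_per_year) * port.mean() / port.std(ddof=1)
    var = -np.quantile(port, alpha)  # loss exceeded on roughly alpha of the days
    return {"weights": w, "sharpe": float(sharpe), "var": float(var)}


def compare(real, synth, lags=(1, 5)):
    """Side-by-side fidelity checks for T x N real and synthetic return panels."""
    r, s = real[:, 0], synth[:, 0]  # per-asset checks shown on the first asset only
    out = {
        "ks": ks_statistic(r, s),
        "kurtosis": (excess_kurtosis(r), excess_kurtosis(s)),
        # ACF of squared returns is a common proxy for volatility clustering
        "acf_sq": {k: (autocorr(r**2, k), autocorr(s**2, k)) for k in lags},
    }
    pr, ps = portfolio_report(real), portfolio_report(synth)
    out["weight_gap"] = float(np.abs(pr["weights"] - ps["weights"]).sum())
    out["sharpe"] = (pr["sharpe"], ps["sharpe"])
    out["var"] = (pr["var"], ps["var"])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 2000, 4
    # Hypothetical stand-ins: fat-tailed "real" returns and a Gaussian "synthetic"
    # panel with the same mean and covariance (mimicking a generator that smooths tails).
    mu = np.array([4e-4, 3e-4, 5e-4, 2e-4])
    cov = 1e-4 * (0.5 * np.eye(N) + 0.5)  # 1% daily vol, pairwise correlation 0.5
    chol = np.linalg.cholesky(cov)
    t = rng.standard_t(df=4, size=(T, N)) / np.sqrt(2.0)  # Student-t(4) rescaled to unit variance
    real = mu + t @ chol.T
    synth = rng.multivariate_normal(mu, cov, size=T)
    for key, val in compare(real, synth).items():
        print(key, val)
```

Because the weight and Sharpe comparisons depend only on the first two moments, they can look similar even when the tails differ; excess kurtosis and a low-alpha VaR are more sensitive to the kind of tail smoothing the abstract attributes to the VAE.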

翻译：合成金融数据为解决量化金融研究中因隐私与可获取性限制而面临的挑战提供了实用途径。本文研究了生成模型——特别是TimeGAN与变分自编码器（VAE）——在生成用于支持投资组合构建、交易分析与风险建模的合成收益率序列中的应用。以标普500指数的历史日收益率作为基准，我们在可比市场条件下生成合成数据集，并通过统计相似性度量、时序结构检验以及下游金融任务对其进行评估。研究表明，TimeGAN生成的合成数据在分布形态、波动率模式与自相关行为方面均接近真实收益率观测值。当应用于均值-方差投资组合优化时，基于该合成数据集得到的投资组合权重、夏普比率与风险水平均与真实数据所得结果保持相近。VAE虽能提供更稳定的训练过程，但倾向于平滑极端市场波动，从而影响风险估计。最终，分析支持将合成数据集作为真实金融数据的替代用于投资组合分析与风险模拟，尤其是在模型能够捕捉时序动态特征的情况下。因此，合成数据为金融实验与模型开发提供了一种兼具隐私保护、成本效益与可复现性的工具。