A Meta-Analysis of Solar Forecasting Based on Skill Score

We conduct the first comprehensive meta-analysis of deterministic solar forecasting based on skill score, screening 1,447 papers from Google Scholar and reviewing the full texts of 320 papers for data extraction. We built a database of 4,687 data points and analyzed it with multivariate adaptive regression spline modelling, partial dependence plots, and linear regression, quantifying the marginal impact of ten factors on skill score. The analysis reveals non-linearity and complex interactions between the variables in the database. Forecast horizon has a central impact and dominates the impacts of the other factors, so solar forecasts should be analyzed separately for each horizon. Climate zone variables are statistically significantly correlated with skill score. Regarding inputs, historical data and spatio-temporal information are highly helpful. For intra-day forecasts, sky and satellite images are the most important inputs. For day-ahead forecasts, numerical weather predictions and locally measured meteorological data are highly effective. We compared all forecast models. Ensemble-hybrid models achieve the most accurate forecasts for all horizons. Hybrid models are superior for intra-hour forecasts, while image-based methods are the most effective for intra-day forecasts. More training data can improve skill score; however, over-fitting is observed when the training period is too long (more than 2,000 days). Solar forecast accuracy has improved substantially, especially in recent years, with more improvement for intra-hour and intra-day forecasts than for day-ahead forecasts. Because the analysis controls for the key differences between forecasts, including location variables, our findings can be applied globally.
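The abstract does not define skill score. As a minimal sketch, assuming the common RMSE-based definition relative to a reference forecast (here a persistence forecast), it can be computed as 1 - RMSE(model) / RMSE(reference). The function names and the irradiance values below are hypothetical and purely illustrative; they are not taken from the paper's database.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def skill_score(forecast, reference, observed):
    """Assumed definition: 1 - RMSE(forecast) / RMSE(reference).

    A score of 1 means a perfect forecast, 0 means no improvement over
    the reference, and negative values mean worse than the reference.
    """
    ref_error = rmse(reference, observed)
    if ref_error == 0:
        raise ValueError("reference forecast is perfect; skill score undefined")
    return 1.0 - rmse(forecast, observed) / ref_error


# Hypothetical irradiance values in W/m^2 (illustrative only).
observed = [400.0, 500.0, 600.0, 500.0]
persistence = [300.0, 400.0, 500.0, 600.0]  # assumed reference forecast
model = [420.0, 480.0, 590.0, 530.0]        # assumed model forecast

score = skill_score(model, persistence, observed)
print(f"{score:.3f}")  # prints 0.788
```

Because the score is normalized by the reference error, it allows forecasts from different sites and horizons to be compared on one scale, which is what makes a pooled meta-analysis of this kind feasible.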
