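Overview: Machine learning is the study of algorithms that learn from data and experience. It is applied across a wide range of areas, from medicine to advertising, from military to pedestrian. CIML is a set of introductory materials covering most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.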
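About the author: Hal Daumé III is a professor who has held the Perotto Professorship; he is now with the machine learning group at Microsoft Research NYC. His research focuses on natural language processing.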
Accurate and reliable forecasting of total cloud cover (TCC) is vital for many areas such as astronomy, energy demand and production, or agriculture. Most meteorological centres issue ensemble forecasts of TCC; however, these forecasts are often uncalibrated and exhibit worse forecast skill than ensemble forecasts of other weather variables. Hence, some form of post-processing is strongly required to improve predictive performance. As TCC observations are usually reported on a discrete scale taking just nine different values called oktas, statistical calibration of TCC ensemble forecasts can be considered a classification problem with outputs given by the probabilities of the oktas. This is a classical area where machine learning methods are applied. We investigate the performance of post-processing using multilayer perceptron (MLP) neural networks, gradient boosting machines (GBM) and random forest (RF) methods. Based on the European Centre for Medium-Range Weather Forecasts global TCC ensemble forecasts for 2002-2014, we compare these approaches with the proportional odds logistic regression (POLR) and multiclass logistic regression (MLR) models, as well as the raw TCC ensemble forecasts. We further assess whether improvements in forecast skill can be obtained by incorporating ensemble forecasts of precipitation as an additional predictor. Compared to the raw ensemble, all calibration methods result in a significant improvement in forecast skill. RF models provide the smallest increase in predictive performance, while MLP, POLR and GBM approaches perform best. The use of precipitation forecast data leads to further improvements in forecast skill and, except for very short lead times, the extended MLP model shows the best overall performance.
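To make the classification framing concrete, the following is a minimal sketch of how a raw ensemble might be turned into a probability vector over the nine oktas, which is the output format the calibration methods are compared against. It rests on assumptions not stated in the abstract: ensemble members are taken to be cloud-cover fractions in [0, 1], each is mapped to the nearest eighth, and the function names `to_okta` and `raw_okta_probabilities` are hypothetical.

```python
from collections import Counter

N_OKTAS = 9  # okta values 0, 1, ..., 8


def to_okta(tcc_fraction):
    """Map a cloud-cover fraction in [0, 1] to the nearest okta.

    Rounding to the nearest eighth is an assumption made for illustration;
    the abstract does not specify the discretization rule.
    """
    if not 0.0 <= tcc_fraction <= 1.0:
        raise ValueError("cloud-cover fraction must lie in [0, 1]")
    return min(8, int(tcc_fraction * 8 + 0.5))


def raw_okta_probabilities(members):
    """Relative frequency of each okta among the ensemble members."""
    if not members:
        raise ValueError("ensemble must contain at least one member")
    counts = Counter(to_okta(m) for m in members)
    return [counts.get(k, 0) / len(members) for k in range(N_OKTAS)]


# Hypothetical five-member ensemble of cloud-cover fractions.
probs = raw_okta_probabilities([0.0, 0.1, 0.5, 0.5, 1.0])
```

A post-processing model such as an MLP or POLR would replace this relative-frequency estimate with a learned mapping from ensemble-derived predictors to the same nine-class probability vector.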