** This paper presents the first general (supervised) statistical learning framework for point processes in general spaces. Our approach combines two new concepts, which we define in the paper: i) bivariate innovations, which are measures of discrepancy/prediction accuracy between two point processes, and ii) point process cross-validation (CV), which we define here through point process thinning. The general idea is to carry out the fitting by predicting CV-generated validation sets using the corresponding training sets; the prediction error, which we minimise, is measured by means of bivariate innovations. Having established various theoretical properties of our bivariate innovations, we study in detail the case where the CV procedure is obtained through independent thinning, and we apply our statistical learning methodology to three typical spatial statistical settings, namely parametric intensity estimation, non-parametric intensity estimation and Papangelou conditional intensity fitting. Aside from deriving theoretical properties related to these cases, in each of them we numerically show that our statistical learning approach outperforms the state of the art in terms of mean (integrated) squared error. **
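As a hedged illustration of the CV idea, independent thinning retains each point of an observed pattern with probability p to form a training set, with the complement forming the validation set. The sketch below uses a made-up point pattern and retention probability; it illustrates the thinning step only, not the paper's full fitting procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed point pattern: 100 points in the unit square.
points = rng.uniform(0.0, 1.0, size=(100, 2))

def independent_thinning(points, p, rng):
    """Split a point pattern into training/validation sets by retaining
    each point independently with probability p (independent thinning)."""
    keep = rng.uniform(size=len(points)) < p
    return points[keep], points[~keep]

train, valid = independent_thinning(points, p=0.8, rng=rng)
```

Fitting would then select the parameter value whose prediction of `valid` from `train`, scored by a bivariate innovation measure, is most accurate.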

Processing is an open-source programming language and the name of its accompanying integrated development environment (IDE). Processing is used in the electronic arts and visual design communities to teach the fundamentals of programming, and it has been used in a large number of new-media and interactive art works.

** The missing data issue is ubiquitous in health studies. Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic, but it has been less studied. Existing literature focuses on parametric regression techniques that provide direct parameter estimates of the regression model. In practice, parametric regression models are often sub-optimal for variable selection because they are susceptible to misspecification. Machine learning methods considerably weaken the parametric assumptions and increase modeling flexibility, but they do not provide a variable importance measure as naturally defined as the covariate effects native to parametric models. We investigate a general variable selection approach when both the covariates and outcomes can be missing at random and have general missing data patterns. This approach exploits the flexibility of machine learning modeling techniques and bootstrap imputation, and it is amenable to nonparametric methods in which the covariate effects are not directly available. We conduct extensive simulations investigating the practical operating characteristics of the proposed variable selection approach when combined with four tree-based machine learning methods, XGBoost, Random Forests, Bayesian Additive Regression Trees (BART) and Conditional Random Forests, and two commonly used parametric methods, lasso and backward stepwise selection. Numerical results show that XGBoost and BART have the overall best performance across various settings. Guidance for choosing methods appropriate to the structure of the analysis data at hand is discussed. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome with data from the Study of Women's Health Across the Nation. **
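The bootstrap-imputation selection idea can be sketched as follows: resample the data, impute within each resample, fit a flexible model, record which variables look important, and keep those chosen in a large fraction of resamples. The sketch below is purely illustrative: it uses simulated data, mean imputation as a placeholder for a proper imputation model, absolute correlation as a stand-in for a tree-based importance measure, and made-up thresholds:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)     # only covariate 0 matters
X[rng.uniform(size=X.shape) < 0.1] = np.nan  # covariates missing at random

def impute_mean(X):
    # Placeholder imputation: column means (a real analysis would use a
    # principled imputation model instead).
    col_means = np.nanmean(X, axis=0)
    Xi = X.copy()
    rows, cols = np.where(np.isnan(Xi))
    Xi[rows, cols] = col_means[cols]
    return Xi

B = 20                                     # number of bootstrap imputations
counts = np.zeros(p)
for b in range(B):
    idx = rng.integers(0, n, n)            # bootstrap resample
    Xb = impute_mean(X[idx])               # impute within the resample
    # Stand-in importance: |correlation| with the outcome; a tree-based
    # model's feature importances would be used in practice.
    imp = np.abs(np.array([np.corrcoef(Xb[:, j], y[idx])[0, 1]
                           for j in range(p)]))
    counts[imp >= 0.3] += 1                # hypothetical importance threshold
selected = np.where(counts / B >= 0.5)[0]  # kept in >= 50% of resamples
```

The aggregation across resamples is what makes the scheme agnostic to the underlying learner: any method that yields a ranking of covariates can be plugged into the importance step.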

** Recent advances in computational methods for intractable models have made network data increasingly amenable to statistical analysis. Exponential random graph models (ERGMs) have emerged as one of the main families of models capable of capturing the complex dependence structure of network data in a wide range of applied contexts. The Bergm package for R has become a popular package for carrying out Bayesian parameter inference, missing data imputation, model selection and goodness-of-fit diagnostics for ERGMs. Over the last few years, the package has been considerably improved in terms of efficiency by adopting some of the state-of-the-art Bayesian computational methods for doubly-intractable distributions. Recently, version 5 of the package was made available on CRAN after a substantial makeover, which has made it more accessible and easier to use for practitioners. New functions include data augmentation procedures based on the approximate exchange algorithm for dealing with missing data, and adjusted pseudo-likelihood and pseudo-posterior procedures, which allow for fast approximate inference of the ERGM parameter posterior and model evidence for networks on several thousand nodes. **

** Qualitative and quantitative approaches to reasoning about uncertainty can lead to different logical systems for formalizing such reasoning, even when the language for expressing uncertainty is the same. In the case of reasoning about relative likelihood, with statements of the form $\varphi\succsim\psi$ expressing that $\varphi$ is at least as likely as $\psi$, a standard qualitative approach using preordered preferential structures yields a dramatically different logical system than a quantitative approach using probability measures. In fact, the standard preferential approach validates principles of reasoning that are incorrect from a probabilistic point of view. However, in this paper we show that a natural modification of the preferential approach yields exactly the same logical system as a probabilistic approach--not using single probability measures, but rather sets of probability measures. Thus, the same preferential structures used in the study of non-monotonic logics and belief revision may be used in the study of comparative probabilistic reasoning based on imprecise probabilities. **
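One natural way to lift the relation "at least as likely" to a set of probability measures is to require P(φ) ≥ P(ψ) for every measure in the set. The toy sketch below uses made-up worlds and measures and illustrates only this generic lifting, not the paper's exact semantics:

```python
def at_least_as_likely(phi, psi, measures):
    """phi is at least as likely as psi when every probability measure
    in the set assigns phi at least as much mass as psi."""
    return all(sum(P[w] for w in phi) >= sum(P[w] for w in psi)
               for P in measures)

# Two measures over three possible worlds (illustrative values).
measures = [{"a": 0.5, "b": 0.3, "c": 0.2},
            {"a": 0.2, "b": 0.5, "c": 0.3}]
```

Under this lifting, {a} and {b} are incomparable because the two measures disagree, while {a, b} is at least as likely as {c} under both measures; such incomparability is exactly what distinguishes sets of measures from a single probability measure.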

** A composite likelihood is a non-genuine likelihood function that allows inference on limited aspects of a model, such as marginal or conditional distributions. Composite likelihoods are not proper likelihoods and therefore require calibration for their use in inference, from both a frequentist and a Bayesian perspective. The maximizer of the composite likelihood can serve as an estimator, and its variance is assessed by means of a suitably defined sandwich matrix. In the Bayesian setting, the composite likelihood can be adjusted by means of magnitude and curvature methods. Magnitude methods raise the likelihood to a constant power, while curvature methods evaluate the likelihood at a different point by translating, rescaling and rotating the parameter vector. Some authors argue that curvature methods are more reliable in general, but others have proved that magnitude methods are sufficient to recover, for instance, the null distribution of a test statistic. We propose a simple calibration for the marginal posterior distribution of a scalar parameter of interest which is invariant to monotonic and smooth transformations. This can be enough, for instance, in medical statistics, where a single scalar effect measure is often the target. **
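For a scalar parameter, a standard magnitude adjustment raises the composite likelihood to the power k = H/J, where H is the negative Hessian of the composite log-likelihood and J the variability of the composite score; the sandwich variance is J/H². The sketch below applies this to an independence composite likelihood for correlated Gaussian pairs (simulated, illustrative data, not the paper's proposal):

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 500, 0.7
# Correlated pairs with common mean 0; the independence composite
# likelihood wrongly treats the two coordinates as independent.
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

mu_hat = Y.mean()                     # composite maximum likelihood estimate
H = 2.0 * n                           # negative Hessian (two terms per pair)
scores = (Y - mu_hat).sum(axis=1)     # per-pair composite score at mu_hat
J = (scores ** 2).sum()               # variability (squared-score) estimate
k = H / J                             # magnitude adjustment: use cl(mu)**k
var_sandwich = J / H ** 2             # Godambe/sandwich variance of mu_hat
```

In this example the population value of k is 1/(1 + rho) ≈ 0.59, so the unadjusted composite posterior would be overconfident by roughly that factor.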

** Recent research has proposed neural architectures for solving combinatorial problems in structured output spaces. In many such problems, there may exist multiple solutions for a given input, e.g. a partially filled Sudoku puzzle may have many completions satisfying all constraints. Further, we are often interested in finding any one of the possible solutions, without any preference between them. Existing approaches completely ignore this solution multiplicity. In this paper, we argue that being oblivious to the presence of multiple solutions can severely hamper training. Our contribution is twofold. First, we formally define the task of learning one-of-many solutions for combinatorial problems in structured output spaces, which is applicable to several problems of interest such as N-Queens and Sudoku. Second, we present a generic learning framework that adapts an existing prediction network for a combinatorial problem to handle solution multiplicity. Our framework uses a selection module, whose goal is to dynamically determine, for every input, the solution that is most effective for training the network parameters in any given learning iteration. We propose an RL-based approach to jointly train the selection module with the prediction network. Experiments on three different domains, and using two different prediction networks, demonstrate that our framework significantly improves accuracy in our setting, obtaining gains of up to 21 points over the baselines. **
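A simple stand-in for the selection module is greedy min-loss selection: score the network's output against every valid solution and train towards the closest one. The numpy sketch below uses a made-up two-cell instance and a plain cross-entropy loss; the paper instead learns the selector jointly with the network via RL:

```python
import numpy as np

def cross_entropy(logits, target):
    # logits: (cells, classes); target: (cells,) integer class labels
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(target)), target].mean()

def select_target(logits, solutions):
    """Pick the valid solution the current prediction is closest to;
    training then backpropagates only that solution's loss."""
    losses = [cross_entropy(logits, s) for s in solutions]
    best = int(np.argmin(losses))
    return solutions[best], losses[best]

# Toy instance with two valid solutions; the network currently leans
# towards the first one, so greedy selection picks it.
solutions = [np.array([0, 1]), np.array([1, 0])]
logits = np.array([[2.0, 0.0], [0.0, 2.0]])
target, loss = select_target(logits, solutions)
```

Greedy selection avoids penalising the network for producing a valid solution that merely differs from an arbitrary canonical target, which is the failure mode of multiplicity-oblivious training.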

** This paper surveys the machine learning literature and presents machine learning methods as optimization models. Such models can benefit from advances in numerical optimization techniques, which have already played a distinctive role in several machine learning settings. In particular, mathematical optimization models are presented for commonly used machine learning approaches for regression, classification, clustering, and deep neural networks, as well as for new and emerging applications in machine teaching and empirical model learning. The strengths and the shortcomings of these models are discussed and potential research directions are highlighted. **

** We propose an Active Learning approach to image segmentation that exploits geometric priors to streamline the annotation process. We demonstrate this for both background-foreground and multi-class segmentation tasks in 2D images and 3D image volumes. Our approach combines geometric smoothness priors in the image space with more traditional uncertainty measures to estimate which pixels or voxels are most in need of annotation. For multi-class settings, we additionally introduce two novel criteria for uncertainty. In the 3D case, we use the resulting uncertainty measure to show the annotator voxels lying on the same planar patch, which makes batch annotation much easier than if they were randomly distributed in the volume. The planar patch is found using a branch-and-bound algorithm that finds a patch with the most informative instances. We evaluate our approach on Electron Microscopy and Magnetic Resonance image volumes, as well as on regular images of horses and faces. We demonstrate a substantial performance increase over state-of-the-art approaches. **
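The per-pixel uncertainty can be sketched as the Shannon entropy of the predicted class probabilities, optionally blended with its neighbourhood average as a crude stand-in for the geometric smoothness prior. The weights and combination rule below are made up; the paper's actual combination of geometric priors and uncertainty measures is more involved:

```python
import numpy as np

def entropy_uncertainty(probs):
    """Per-pixel entropy of class probabilities; high values flag the
    pixels most in need of annotation."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def smoothed_uncertainty(probs, weight=0.5):
    """Blend each pixel's entropy with its 4-neighbourhood average
    (hypothetical geometric prior: annotation effort should focus on
    coherent uncertain regions rather than isolated pixels)."""
    u = entropy_uncertainty(probs)
    pad = np.pad(u, 1, mode="edge")
    neigh = (pad[:-2, 1:-1] + pad[2:, 1:-1]
             + pad[1:-1, :-2] + pad[1:-1, 2:]) / 4.0
    return (1 - weight) * u + weight * neigh
```

An isolated uncertain pixel surrounded by confident predictions is down-weighted by the smoothing, reflecting the intuition that annotation is most valuable on coherent ambiguous regions.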