** We explore the limitations of and best practices for using black-box variational inference to estimate posterior summaries of the model parameters. By taking an importance sampling perspective, we are able to explain and empirically demonstrate: 1) why the intuitions about the behavior of approximate families and divergences for low-dimensional posteriors fail for higher-dimensional posteriors, 2) how we can diagnose the pre-asymptotic reliability of variational inference in practice by examining the behavior of the density ratios (i.e., importance weights), 3) why the choice of variational objective is not as relevant for higher-dimensional posteriors, and 4) why, although flexible variational families can provide some benefits in higher dimensions, they also introduce additional optimization challenges. Based on these findings, for high-dimensional posteriors we recommend using the exclusive KL divergence that is most stable and easiest to optimize, and then focusing on improving the variational family or using model parameter transformations to make the posterior more similar to the approximating family. Our results also show that in low to moderate dimensions, heavy-tailed variational families and mass-covering divergences can increase the chances that the approximation can be improved by importance sampling. **

** In modern contexts, some types of data are observed in high-resolution, essentially continuously in time. Such data units are best described as taking values in a space of functions. Subject units carrying the observations may have intrinsic relations among themselves, and are best described by the nodes of a large graph. It is often sensible to think that the underlying signals in these functional observations vary smoothly over the graph, in that neighboring nodes have similar underlying signals. This qualitative information allows borrowing of strength over neighboring nodes and consequently leads to more accurate inference. In this paper, we consider a model with Gaussian functional observations and adopt a Bayesian approach to smoothing over the nodes of the graph. We characterize the minimax rate of estimation in terms of the regularity of the signals and their variation across nodes quantified in terms of the graph Laplacian. We show that an appropriate prior constructed from the graph Laplacian can attain the minimax bound, while using a mixture prior, the minimax rate up to a logarithmic factor can be attained simultaneously for all possible values of functional and graphical smoothness. We also show that in the fixed smoothness setting, an optimal sized credible region has arbitrarily high frequentist coverage. A simulation experiment demonstrates that the method performs better than potential competing methods like the random forest. The method is also applied to a dataset on daily temperatures measured at several weather stations in the US state of North Carolina. **

数据库
·

** Given two relations containing multiple measurements - possibly with uncertainties - our objective is to find which sets of attributes from the first have a corresponding set on the second, using exclusively a sample of the data. This approach could be used even when the associated metadata is damaged, missing or incomplete, or when the volume is too big for exact methods. This problem is similar to the search of Inclusion Dependencies (IND), a type of rule over two relations asserting that for a set of attributes X from the first, every combination of values appears on a set Y from the second. Existing IND can be found exploiting the existence of a partial order relation called specialization. However, this relation is based on set theory, requiring the values to be directly comparable. Statistical tests are an intuitive possible replacement, but it has not been studied how would they affect the underlying assumptions. In this paper we formally review the effect that a statistical approach has over the inference rules applied to IND discovery. Our results confirm the intuitive thought that statistical tests can be used, but not in a directly equivalent manner. We provide a workable alternative based on a "hierarchy of null hypotheses", allowing for the automatic discovery of multi-dimensional equally distributed sets of attributes. **

** Models with a large number of latent variables are often used to fully utilize the information in big or complex data. However, they can be difficult to estimate using standard approaches, and variational inference methods are a popular alternative. Key to the success of these is the selection of an approximation to the target density that is accurate, tractable and fast to calibrate using optimization methods. Most existing choices can be inaccurate or slow to calibrate when there are many latent variables. Here, we propose a family of tractable variational approximations that are more accurate and faster to calibrate for this case. It combines a parsimonious parametric approximation for the parameter posterior, with the exact conditional posterior of the latent variables. We derive a simplified expression for the re-parameterization gradient of the variational lower bound, which is the main ingredient of efficient optimization algorithms used to implement variational estimation. To do so only requires the ability to generate exactly or approximately from the conditional posterior of the latent variables, rather than to compute its density. We illustrate using two complex contemporary econometric examples. The first is a nonlinear multivariate state space model for U.S. macroeconomic variables. The second is a random coefficients tobit model applied to two million sales by 20,000 individuals in a large consumer panel from a marketing study. In both cases, we show that our approximating family is considerably more accurate than mean field or structured Gaussian approximations, and faster than Markov chain Monte Carlo. Last, we show how to implement data sub-sampling in variational inference for our approximation, which can lead to a further reduction in computation time. MATLAB code implementing the method for our examples is included in supplementary material. **

** False negative errors are of major concern in applications where missing a high proportion of true signals may cause serious consequences. False negative control, however, raises a bottleneck challenge in high-dimensional inference when signals are not identifiable at individual levels. We develop a new analytic framework to regulate false negative errors under measures tailored towards modern applications with high-dimensional data. A new method is proposed in realistic settings with arbitrary covariance dependence between variables. We explicate the joint effects of covariance dependence and signal sparsity on the new method and interpret the results using a phase diagram. It shows that signals that are not individually identifiable can be effectively retained by the proposed method without incurring excessive false positives. Simulation studies are conducted to compare the new method with several existing methods. The new method outperforms the others in adapting to a user-specified false negative control level. We apply the new method to analyze an fMRI dataset to locate voxels that are functionally relevant to saccadic eye movements. The new method exhibits a nice balance in retaining signal voxels and avoiding excessive noise voxels. **

** Category recommendation for users on an e-Commerce platform is an important task as it dictates the flow of traffic through the website. It is therefore important to surface precise and diverse category recommendations to aid the users' journey through the platform and to help them discover new groups of items. An often understated part in category recommendation is users' proclivity to repeat purchases. The structure of this temporal behavior can be harvested for better category recommendations and in this work, we attempt to harness this through variational inference. Further, to enhance the variational inference based optimization, we initialize the optimizer at better starting points through the well known Metapath2Vec algorithm. We demonstrate our results on two real-world datasets and show that our model outperforms standard baseline methods. **

** Amortized inference has led to efficient approximate inference for large datasets. The quality of posterior inference is largely determined by two factors: a) the ability of the variational distribution to model the true posterior and b) the capacity of the recognition network to generalize inference over all datapoints. We analyze approximate inference in variational autoencoders in terms of these factors. We find that suboptimal inference is often due to amortizing inference rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation. **

** Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility. **