We analyze hypothesis tests via classical results on large deviations for the case of two different Hölder Gibbs probabilities. The main difference from the classical hypothesis tests of Decision Theory is that here the two measures under consideration are mutually singular. We analyze the classical Neyman-Pearson test and show that it is optimal: as the sample size goes to infinity, it becomes exponentially better than alternative tests. We also consider the Min-Max test and a certain type of Bayesian hypothesis test. We treat these tests in the log-likelihood framework using several tools of Thermodynamic Formalism. Versions of Stein's Lemma and of the Chernoff information are also presented.
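The log-likelihood framework can be illustrated on the simplest Gibbs measures, stationary Markov chains on a finite alphabet (Gibbs measures for locally constant potentials). Two such chains with different transition matrices are mutually singular, yet a Neyman-Pearson test can still be run on finite samples by thresholding the log-likelihood ratio. The sketch below assumes strictly positive 2x2 transition matrices `P` (null hypothesis) and `Q` (alternative). The function names are hypothetical and this is not the paper's construction. It also computes the relative entropy rate, which is the per-symbol limit of the normalized log-likelihood ratio under `Q`.

```python
import numpy as np

def stationary(T):
    """Stationary distribution of an irreducible transition matrix T."""
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def sample_path(T, n, rng):
    """Sample n states of the stationary Markov chain with transition matrix T."""
    pi = stationary(T)
    x = np.empty(n, dtype=int)
    x[0] = rng.choice(len(pi), p=pi)
    cum = np.cumsum(T, axis=1)
    u = rng.random(n)
    for t in range(1, n):
        # First state whose cumulative probability exceeds u[t].
        x[t] = min(int(np.searchsorted(cum[x[t - 1]], u[t], side="right")), len(pi) - 1)
    return x

def log_likelihood_ratio(x, P, Q):
    """log of the Q-likelihood over the P-likelihood of the observed path."""
    piP, piQ = stationary(P), stationary(Q)
    llr = np.log(piQ[x[0]]) - np.log(piP[x[0]])
    llr += np.sum(np.log(Q[x[:-1], x[1:]]) - np.log(P[x[:-1], x[1:]]))
    return float(llr)

def neyman_pearson_reject_h0(x, P, Q, threshold):
    """Reject H0 (data from P) when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(x, P, Q) > threshold

def entropy_rate(Q, P):
    """Relative entropy rate h(Q||P) = sum_i piQ_i sum_j Q_ij log(Q_ij / P_ij)."""
    piQ = stationary(Q)
    return float(np.sum(piQ[:, None] * Q * np.log(Q / P)))
```

On a long path drawn from `Q`, `log_likelihood_ratio(x, P, Q) / len(x)` should be close to `entropy_rate(Q, P)`. This linear growth of the statistic is what drives the exponential error decay discussed above.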
We study the problem of outlier-robust high-dimensional mean estimation under a finite covariance assumption and, more broadly, under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large subset of the inliers that satisfies this condition. As a corollary, a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, yield optimal-error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a further consequence of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.
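To make "iterative filtering" concrete, here is a minimal deterministic filter. While the empirical covariance has a large top eigenvalue, it removes the point that deviates most along the corresponding eigenvector. This is a simplified sketch, not the analyzed algorithm. It assumes inliers whose covariance is bounded by the identity, and the stopping threshold `cov_bound=1.5` and the removal cap are hypothetical choices.

```python
import numpy as np

def filtered_mean(X, cov_bound=1.5, max_removals=None):
    """Simplified deterministic filter for robust mean estimation (sketch).

    Assumes the inlier covariance is bounded by the identity; cov_bound is a
    hypothetical stopping threshold, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if max_removals is None:
        max_removals = n // 2
    active = np.arange(n)
    for _ in range(max_removals + 1):
        Y = X[active]
        mu = Y.mean(axis=0)
        centered = Y - mu
        cov = centered.T @ centered / len(Y)
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
        if eigvals[-1] <= cov_bound:             # spectrum looks "stable": stop
            break
        v = eigvecs[:, -1]                       # direction of largest variance
        scores = (centered @ v) ** 2             # squared deviation along v
        active = np.delete(active, np.argmax(scores))
    return X[active].mean(axis=0)
```

With 10% of the points placed far away in one direction, the plain sample mean is pulled noticeably, whereas the filter strips the outliers first because their scores along the top eigenvector dominate.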
In his seminal work on recording quantum queries [Crypto 2019], Zhandry studied interactions between quantum query algorithms and the quantum oracle corresponding to random functions. Zhandry presented a framework for interpreting various states in the quantum space of the oracle that can be used to provide security proofs in quantum cryptography. In this paper, we introduce a similar interpretation for the case where the oracle corresponds to random permutations instead of random functions. Because random functions and random permutations both play a central role in security proofs, we hope that the present framework will find applications in quantum cryptography. Additionally, we show how this framework can be used to prove that the success probability of a k-query quantum algorithm that attempts to invert a random N-element permutation is at most O(k^2/N).
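For context on the O(k^2/N) bound, the standard matching algorithm is Grover search. A permutation has exactly one preimage of a target, so k Grover iterations (one oracle query each) succeed with probability sin^2((2k+1)θ), where sin θ = 1/sqrt(N), and this grows like k^2/N. The state-vector sketch below is a swapped-in illustration of the bound's tightness, not the paper's permutation-recording framework. The function name `grover_invert` is hypothetical.

```python
import numpy as np

def grover_invert(perm, target, k):
    """Success probability of k Grover iterations searching for perm^{-1}(target)."""
    perm = np.asarray(perm)
    N = perm.size
    marked = perm == target                 # exactly one marked input for a permutation
    state = np.full(N, 1.0 / np.sqrt(N))    # uniform superposition over inputs
    for _ in range(k):
        state = np.where(marked, -state, state)  # one oracle query: phase flip
        state = 2.0 * state.mean() - state       # inversion about the mean
    return float(np.sum(state[marked] ** 2))
```

Because sin^2(x) <= x^2 and arcsin(y) <= (π/2)y on [0, 1], the success probability is at most (π^2/4)(2k+1)^2/N, consistent with an O(k^2/N) upper bound.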
We present an intimate connection among the following fields: (a) distributed local algorithms, coming from computer science; (b) finitary factors of iid processes, coming from the analysis of randomized processes; (c) descriptive combinatorics, coming from combinatorics and measure theory. In particular, we study locally checkable labellings in grid graphs from all three perspectives. Most of our results concern perspective (b), where we prove time hierarchy theorems akin to those known in field (a) [Chang, Pettie FOCS 2017]. This approach, which borrows techniques from fields (a) and (c), implies a number of results about the possible complexities of finitary factor solutions. Among others, it answers three open questions of [Holroyd et al. Annals of Prob. 2017] and the more general question of [Brandt et al. PODC 2017], who asked for a formal connection between fields (a) and (b). More broadly, we hope that our treatment will help to view all three perspectives as part of a common theory of locality, following the insightful paper of [Bernshteyn 2020+].
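A locally checkable labelling is one whose validity can be verified by every node looking only at a bounded-radius neighbourhood. The sketch below checks a radius-1 constraint on an n x m torus grid and uses proper vertex colouring as the example constraint. The helper names are hypothetical, and the sketch illustrates the definition only, not the paper's hierarchy results.

```python
from typing import Callable, List, Sequence, Tuple

Labelling = Sequence[Sequence[int]]
LocalRule = Callable[[int, Tuple[int, int, int, int]], bool]

def violations(labels: Labelling, local_ok: LocalRule) -> List[Tuple[int, int]]:
    """Nodes of the torus grid whose radius-1 neighbourhood breaks the local rule."""
    n, m = len(labels), len(labels[0])
    bad = []
    for i in range(n):
        for j in range(m):
            nbrs = (labels[(i - 1) % n][j], labels[(i + 1) % n][j],
                    labels[i][(j - 1) % m], labels[i][(j + 1) % m])
            if not local_ok(labels[i][j], nbrs):   # each node checks only its own ball
                bad.append((i, j))
    return bad

def proper_colouring(centre: int, nbrs: Tuple[int, int, int, int]) -> bool:
    """Local rule: a node's colour differs from all four neighbours."""
    return all(centre != c for c in nbrs)

def grid(n: int, m: int, f: Callable[[int, int], int]) -> List[List[int]]:
    """Build an n x m labelling from a function of the coordinates."""
    return [[f(i, j) for j in range(m)] for i in range(n)]
```

For example, the checkerboard `(i + j) % 2` is a valid 2-colouring of the 4 x 4 torus but fails on the 3 x 3 torus, where `(i + j) % 3` works instead. A labelling is globally valid exactly when no node reports a local violation.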
We propose and compare methods for the analysis of extreme events in complex systems governed by PDEs that involve random parameters, in situations where we are interested in quantifying the probability that a scalar function of the system's solution exceeds a threshold. If the threshold is large, this probability is small and its accurate estimation is challenging. To tackle this difficulty, we blend theoretical results from large deviation theory (LDT) with numerical tools from PDE-constrained optimization. Our methods first compute parameters that minimize the LDT rate function over the set of parameters leading to extreme events, using adjoint methods to compute the gradient of this rate function. The minimizers give information about the mechanism of the extreme events as well as estimates of their probability. We then propose a series of methods to refine these estimates, either via importance sampling or via geometric approximation of the extreme event sets. Results are formulated for general parameter distributions, and detailed expressions are provided for Gaussian distributions. We give theoretical and numerical arguments showing that the performance of our methods is insensitive to the extremeness of the events of interest. We illustrate our approach by quantifying the probability of extreme tsunami events on shore. Tsunamis are typically caused by a sudden, unpredictable change of the ocean floor elevation during an earthquake. We model this change as a random process that takes into account the underlying physics, and we use the one-dimensional shallow water equations to model tsunamis numerically. In the context of this example, we compare our methods for extreme event probability estimation and identify which type of ocean floor elevation change leads to the largest tsunamis on shore.
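For standard Gaussian parameters θ, the LDT rate function is I(θ) = |θ|^2/2. Its minimizer over {F(θ) >= z} (the "design point" θ*) can then serve as the mean of an importance-sampling proposal. The sketch below assumes a toy linear observable F(θ) = a·θ in place of the PDE solve. In that case θ* has a closed form and the exact probability is known, so the estimate can be checked. The function names are hypothetical, and for a real PDE θ* would come from the adjoint-based optimization described above.

```python
import math
import numpy as np

def linear_design_point(a, z):
    """Minimizer of |theta|^2/2 subject to a.theta = z (toy linear observable)."""
    a = np.asarray(a, dtype=float)
    return z * a / a.dot(a)

def ldt_is_estimate(F, z, theta_star, n_samples, rng):
    """Importance sampling for P(F(theta) >= z), theta ~ N(0, I), proposal N(theta*, I)."""
    theta = theta_star + rng.standard_normal((n_samples, theta_star.size))
    hits = F(theta) >= z
    # Likelihood ratio N(0, I) / N(theta*, I) = exp(-theta.theta* + |theta*|^2 / 2).
    log_w = -theta @ theta_star + 0.5 * theta_star @ theta_star
    return float(np.mean(hits * np.exp(log_w)))

def ldt_rate(theta_star):
    """Rate I(theta*) = |theta*|^2 / 2; exp(-I) is only a log-asymptotic estimate."""
    return 0.5 * float(theta_star @ theta_star)
```

For `a = [1, 2]` and `z = 8` the exact probability is about 1.7e-4, and 200,000 samples from the shifted proposal recover it to within a few percent. Plain Monte Carlo with the same budget would see only a few dozen hits.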
The non-linearity of a Boolean function indicates how far it is from any linear function. Although there are several strong results on identifying a linear function and on distinguishing one from a sufficiently non-linear function, we found a surprising lack of work on computing the non-linearity of a function. The non-linearity is related to the Walsh coefficient with the largest absolute value; however, the naive approach of picking the maximum after constructing the Walsh spectrum requires $\Theta(2^n)$ queries to an $n$-bit function. We improve on this by designing highly efficient quantum and randomised algorithms that approximate the non-linearity up to an additive error, denoted $\lambda$, with query complexities that depend polynomially on $\lambda$. We prove lower bounds showing that these complexities are not far from optimal. The number of queries made by our randomised algorithm is linear in $n$, already an exponential improvement, and the number of queries made by our quantum algorithm is, surprisingly, independent of $n$. Our randomised algorithm navigates the Walsh coefficients in the style of Goldreich-Levin, and our quantum algorithm uses a combination of Deutsch-Jozsa, amplitude amplification and amplitude estimation to improve upon the existing quantum versions of the Goldreich-Levin technique.
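The naive baseline mentioned above can be made explicit. With Walsh coefficients $W_f(a) = \sum_x (-1)^{f(x) \oplus a \cdot x}$, the non-linearity is $\mathrm{nl}(f) = 2^{n-1} - \tfrac12 \max_a |W_f(a)|$. The sketch below computes the full spectrum with the fast Walsh-Hadamard transform from a complete truth table (index = input as an integer), so it reads all $2^n$ values. It is the exact baseline, not the randomised or quantum approximation algorithms.

```python
def walsh_spectrum(truth_table):
    """Walsh coefficients of f from its 2^n-entry truth table (fast Walsh-Hadamard)."""
    w = [1 - 2 * (b & 1) for b in truth_table]   # (-1)^{f(x)}
    size = len(w)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for j in range(start, start + h):
                a, b = w[j], w[j + h]
                w[j], w[j + h] = a + b, a - b    # butterfly step
        h *= 2
    return w

def nonlinearity(truth_table):
    """Exact non-linearity: 2^(n-1) - max|W_f| / 2 (queries every input)."""
    return len(truth_table) // 2 - max(abs(c) for c in walsh_spectrum(truth_table)) // 2
```

Sanity checks: a linear function such as x0 XOR x2 has non-linearity 0, the 2-bit AND has non-linearity 1, and the 4-bit bent function (x0 AND x1) XOR (x2 AND x3) attains the maximum value 6.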
Coding theory in which the alphabet is identified with the elements of a ring or a module has become an important research topic over the last 30 years. Such codes over rings have found important applications, and many interesting mathematical problems are related to this line of research. It is well established that generalizing the algebraic structure to rings also calls for generalizing the underlying metric beyond the Hamming weight used in traditional coding theory over finite fields. This paper introduces a new weight, called the overweight, which can be seen as a generalization of the Lee weight on the integers modulo $4$. For this new weight we provide versions of a number of well-known bounds, such as a Plotkin bound, a sphere-packing bound, and a Gilbert-Varshamov bound. A further highlight is the proof of a Johnson bound for the homogeneous weight on a general finite Frobenius ring.
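As a reference point for the weight being generalized, the sketch below covers only the classical Lee weight on $\mathbb{Z}_4$ (0, 1, 2, 1 for 0, 1, 2, 3), not the overweight, whose definition is not reproduced here. The Gray map 0 -> 00, 1 -> 01, 2 -> 11, 3 -> 10 turns Lee weight into Hamming weight. The per-coordinate weight enumerator is $1 + 2z + z^2 = (1+z)^2$, so a Lee ball of radius $t$ in $\mathbb{Z}_4^n$ has $\sum_{i \le t} \binom{2n}{i}$ points, which gives a sphere-packing bound. All function names are hypothetical.

```python
from itertools import product
from math import comb

GRAY = {0: (0, 1 - 1), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # Z4 -> F2^2

def lee_weight(word, m=4):
    """Lee weight of a word over Z_m: sum of min(x, m - x)."""
    return sum(min(x % m, (-x) % m) for x in word)

def gray_image(word):
    """Binary image of a Z4 word under the Gray map."""
    return tuple(b for x in word for b in GRAY[x % 4])

def lee_ball_volume(n, t):
    """Points of Z4^n with Lee weight <= t, via (1 + z)^(2n)."""
    return sum(comb(2 * n, i) for i in range(t + 1))

def lee_ball_volume_bruteforce(n, t):
    """Direct count, for cross-checking small cases."""
    return sum(1 for w in product(range(4), repeat=n) if lee_weight(w) <= t)

def lee_sphere_packing_bound(n, d):
    """Upper bound on |C| for a Z4 code of length n and minimum Lee distance d."""
    t = (d - 1) // 2
    return 4 ** n // lee_ball_volume(n, t)
```

For example, the word (1, 2, 3, 0) has Lee weight 4, and so does its Gray image. A length-3 code with minimum Lee distance 3 has at most 64 // 7 = 9 codewords.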
The autoregressive tempered fractionally integrated moving average model with stable innovations modifies the power-law kernel of the fractionally integrated time series model by adding an exponential tempering factor. The tempered model is stationary and can exhibit semi-long-range dependence. This paper develops the basic theory of the tempered time series model, including its dependence structure and parameter estimation.
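The tempering acts directly on the moving-average weights. Expanding $(1 - e^{-\lambda}B)^{-d}$ gives $\psi_j = \frac{\Gamma(j+d)}{\Gamma(d)\, j!} e^{-\lambda j}$, that is, the untempered power-law weights multiplied by $e^{-\lambda j}$. The sketch below computes these weights by recursion and simulates a truncated tempered fractionally integrated series (no AR/MA part). It assumes Gaussian innovations, which are the α = 2 case of the stable family, rather than general stable noise. The truncation length `n_terms` and the function names are hypothetical.

```python
import numpy as np

def tempered_ma_coeffs(d, lam, n_terms):
    """MA(inf) weights of (1 - e^{-lam} B)^{-d}, truncated to n_terms."""
    psi = np.empty(n_terms)
    psi[0] = 1.0
    for j in range(1, n_terms):
        # psi_j / psi_{j-1} = (j - 1 + d) / j, times one extra tempering factor
        psi[j] = psi[j - 1] * (j - 1 + d) / j * np.exp(-lam)
    return psi

def simulate_tempered_fi(d, lam, n, rng, n_terms=2000):
    """Simulate n values X_t = sum_j psi_j eps_{t-j} with Gaussian innovations."""
    psi = tempered_ma_coeffs(d, lam, n_terms)
    eps = rng.standard_normal(n + n_terms - 1)
    return np.convolve(eps, psi, mode="valid")   # length n
```

With `lam = 0` the weights reduce to the fractionally integrated ones (1, d, d(d+1)/2, ...). For `lam > 0` they decay exponentially, which is why the autocorrelation can look long-range over moderate lags and still be summable.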
Typically, in two- or three-arm randomized clinical trials, a few (two or three) correlated multiple primary endpoints are considered. In addition to closed testing procedures based on different global tests, two max(maxT) tests are compared in a simulation study with respect to any-pairs, all-pairs and individual power.
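The three power notions can be illustrated with a plain single-step maxT test for one treatment-versus-control comparison on k correlated endpoints. The sketch assumes multivariate normal z-statistics with known equicorrelation `corr`, a one-sided level `alpha`, and a Monte Carlo null critical value. It does not reproduce the specific max(maxT) tests compared in the study, and the function name and parameters are hypothetical.

```python
import numpy as np

def maxt_power(delta, corr, alpha=0.025, n_sim=200_000, seed=0):
    """Monte Carlo critical value and powers of a one-sided single-step maxT test."""
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta, dtype=float)     # standardized effects per endpoint
    k = delta.size
    R = np.full((k, k), corr)
    np.fill_diagonal(R, 1.0)
    L = np.linalg.cholesky(R)                  # requires corr < 1
    z0 = rng.standard_normal((n_sim, k)) @ L.T
    crit = np.quantile(z0.max(axis=1), 1 - alpha)   # controls FWER under H0
    z1 = rng.standard_normal((n_sim, k)) @ L.T + delta
    reject = z1 > crit
    return {
        "crit": float(crit),
        "any": float(reject.any(axis=1).mean()),   # at least one endpoint rejected
        "all": float(reject.all(axis=1).mean()),   # every endpoint rejected
        "individual": reject.mean(axis=0),         # per-endpoint power
    }
```

For two independent endpoints at one-sided 2.5%, the critical value solves Φ(c)^2 = 0.975, which gives c ≈ 2.24, and by construction any-pairs power >= individual power >= all-pairs power.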
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decision. Classical estimators overlook confounding effects, and hence their estimation error can be large. We therefore propose an alternative approach to constructing the estimators that greatly reduces this error. Through a combination of theoretical analysis and numerical testing, the proposed estimators are shown to be unbiased, consistent, and robust. Moreover, we compare the classical and proposed estimators in terms of their power to estimate the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is substantial when the causal effects are accounted for correctly.
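Confounding can be seen in a toy setting where a binary borrower segment `x` influences both the credit decision `t` and the repayment `y`. The sketch compares the naive difference in means with a stratified (backdoor-adjusted) estimate of the average effect of `t`. This is a standard textbook adjustment swapped in for illustration, not the estimators proposed in the paper, and the simulated data-generating process is hypothetical.

```python
import numpy as np

def simulate_lending(n, rng, effect=1.0):
    """Hypothetical data: segment x drives both the credit decision t and repayment y."""
    x = rng.integers(0, 2, size=n)
    p_treat = np.where(x == 1, 0.8, 0.2)          # decision depends on the confounder
    t = (rng.random(n) < p_treat).astype(int)
    y = effect * t + 3.0 * x + rng.standard_normal(n)
    return x, t, y

def naive_and_adjusted_effect(x, t, y):
    """Naive difference in means vs. effect stratified on the discrete confounder x."""
    naive = y[t == 1].mean() - y[t == 0].mean()
    adjusted = 0.0
    for v in np.unique(x):
        s = x == v
        diff = y[s & (t == 1)].mean() - y[s & (t == 0)].mean()
        adjusted += s.mean() * diff                # weight by P(x = v)
    return float(naive), float(adjusted)
```

Here the true effect is 1.0, but the naive estimate converges to 1 + 3(0.8 - 0.2) = 2.8, because treated borrowers are disproportionately from the high-repayment segment. The stratified estimate recovers 1.0.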
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality and achieves (in some scenarios) faster rates of convergence than empirical risk minimization by automatically balancing bias and variance. We give corroborating empirical evidence showing that, in practice, the estimator indeed trades off variance against absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
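The link between robustness and a variance penalty can be checked numerically. The sketch assumes the uncertainty set is a χ²-divergence ball around the empirical distribution, $\{p : \frac{1}{2n}\sum_i (n p_i - 1)^2 \le \rho/n\}$; the paper's exact formulation may differ. When the maximizing weights stay nonnegative, the worst-case expected loss equals $\bar\ell + \sqrt{2\rho\,\widehat{\mathrm{Var}}(\ell)/n}$, with the closed-form maximizer given below. Function names are hypothetical.

```python
import numpy as np

def chi2_robust_risk(losses, rho):
    """Worst-case mean loss over a chi-square ball of radius rho/n (closed form)."""
    l = np.asarray(losses, dtype=float)
    n = l.size
    centered = l - l.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        return float(l.mean()), np.full(n, 1.0 / n)
    # Tilt the uniform weights toward high losses, as far as the ball allows.
    p = 1.0 / n + np.sqrt(2.0 * rho) / n * centered / norm
    if np.any(p < 0):
        raise ValueError("closed form needs nonnegative weights; use a convex solver")
    return float(p @ l), p

def variance_penalized(losses, rho):
    """Empirical mean plus sqrt(2 * rho * Var / n), with the 1/n variance."""
    l = np.asarray(losses, dtype=float)
    return float(l.mean() + np.sqrt(2.0 * rho * l.var() / l.size))
```

For losses (1, 2, 3, 4) and ρ = 0.5, both quantities equal 2.5 + sqrt(0.3125) ≈ 3.059. The robust objective is a supremum of linear functions of the losses, so it is convex in the model parameters whenever the losses are, unlike the variance penalty written directly.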