Knowledge of Wi-Fi networks helps to guide future engineering and spectrum policy decisions. However, because Wi-Fi operates in unlicensed spectrum, the deployment of Wi-Fi Access Points is undocumented, leaving researchers to make educated guesses about the prevalence of these assets from remotely collected or passively sensed measurements. One commonly used method, known as `wardriving`, uses a vehicle to collect geospatial statistical data on wireless networks to inform mobile computing and network security research. Surprisingly, there has been very little examination of the statistical issues with wardriving data, despite the many analyses published in the literature using this approach. In this paper, a sample of publicly collected wardriving data is compared to a predictive model for Wi-Fi Access Points. The results demonstrate several statistical issues which future wardriving studies must account for, including selection bias, sample representativeness, and the modifiable areal unit problem.
Evaluating the reliability of intelligent physical systems against rare safety-critical events poses a huge testing burden for real-world applications. Simulation provides a useful platform to evaluate the extremal risks of these systems before their deployment. Importance Sampling (IS), while proven powerful for rare-event simulation, faces challenges in handling learning-based systems: their black-box nature fundamentally undermines its efficiency guarantee, which can lead to under-estimation that goes diagnostically undetected. We propose a framework called Deep Probabilistic Accelerated Evaluation (Deep-PrAE) to design statistically guaranteed IS by converting black-box samplers, which are versatile but may lack guarantees, into samplers with what we call a relaxed efficiency certificate that allows accurate estimation of bounds on the safety-critical event probability. We present the theory of Deep-PrAE, which combines the dominating point concept with rare-event set learning via deep neural network classifiers, and demonstrate its effectiveness in numerical examples including the safety-testing of an intelligent driving algorithm.
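To make the core mechanism concrete, the following is a minimal, self-contained sketch of importance sampling for a rare-event probability, not the Deep-PrAE framework itself: the threshold `t`, the Gaussian model, and the exponentially tilted proposal are all illustrative choices, but the reweighting identity is the standard one IS relies on.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
t = 4.0        # rare-event threshold: estimate p = P(X > t) for X ~ N(0, 1)
n = 100_000

# Naive Monte Carlo: at this sample size almost no draws hit the rare set.
naive = float((rng.standard_normal(n) > t).mean())

# Importance sampling with the exponentially tilted proposal N(t, 1):
# samples concentrate near the rare-event boundary, and each draw is
# reweighted by the likelihood ratio phi(y) / phi_t(y) = exp(t**2/2 - t*y).
y = rng.standard_normal(n) + t
is_est = float(np.mean(np.exp(t**2 / 2 - t * y) * (y > t)))

p_true = 0.5 * math.erfc(t / math.sqrt(2))  # exact Gaussian tail, for reference
```

With a well-chosen proposal the IS estimate attains a small relative error where naive Monte Carlo sees only a handful of hits; the paper's concern is precisely that for black-box systems no such well-chosen proposal is available a priori, motivating certificates on bounds instead.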
Because of the widespread diffusion of computational systems of a stochastic nature, probabilistic model checking has become an important area of research in recent years. However, despite its great success, standard probabilistic model checking suffers from the limitation of requiring a sharp specification of the probabilities governing the model's behaviour. Imprecise probability theory offers a natural way to overcome this limitation through a sensitivity analysis with respect to the values of these parameters. However, only extensions based on discrete-time imprecise Markov chains have been considered so far for such a robust approach to model checking. Here we present a further extension based on imprecise Markov reward models. In particular, we derive efficient algorithms to compute lower and upper bounds on the expected cumulative reward and on probabilistic bounded rewards, building on existing results for imprecise Markov chains. These ideas are finally tested on a real case study involving the spend-down costs of geriatric medicine departments.
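The flavor of such bounds can be sketched with a toy imprecise Markov reward model (the two-state chain, interval bounds, and rewards below are invented for illustration): robust value iteration picks, at each step, the transition distribution inside the interval-valued credal set that minimises (or maximises) the expected continuation value.

```python
import numpy as np

# Toy imprecise Markov reward model: 2 states, interval transition probabilities.
# Each row distribution must satisfy lo[i][j] <= P[i][j] <= hi[i][j] and sum to 1.
lo = np.array([[0.6, 0.3], [0.1, 0.8]])
hi = np.array([[0.7, 0.4], [0.2, 0.9]])
reward = np.array([1.0, 0.0])  # reward collected per visit to each state

def extreme_row(lo_row, hi_row, values, maximize=False):
    # Pick the distribution in the interval row set with extreme expected
    # next-step value: start at the lower bounds, then greedily push the
    # remaining mass toward the smallest (or largest) continuation values.
    order = np.argsort(values)
    if maximize:
        order = order[::-1]
    p = lo_row.copy()
    slack = 1.0 - p.sum()
    for j in order:
        give = min(hi_row[j] - p[j], slack)
        p[j] += give
        slack -= give
    return p

def cumulative_reward_bound(horizon, maximize=False):
    v = np.zeros(2)
    for _ in range(horizon):
        v = reward + np.array(
            [extreme_row(lo[i], hi[i], v, maximize) @ v for i in range(2)]
        )
    return v

lower = cumulative_reward_bound(20)
upper = cumulative_reward_bound(20, maximize=True)
```

The greedy inner step solves the linear program over the interval credal set exactly; the lower and upper value functions bracket the expected cumulative reward of every precise chain consistent with the intervals.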
A fully Bayesian approach to the estimation of vaccine efficacy is presented, which improves on the currently used exact method conditional on the total number of cases. As an example, we reconsider the statistical sections of the BioNTech/Pfizer protocol, which in 2020 led to the first approved anti-COVID-19 vaccine.
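A minimal sketch of the conditional Bayesian calculation, using the publicly reported phase-3 case split (8 of 170 confirmed cases in the vaccine arm) and the Beta(0.700102, 1) prior stated in the protocol; the Monte Carlo posterior summary here is illustrative, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Publicly reported BioNTech/Pfizer phase-3 readout (1:1 randomisation assumed):
cases_vaccine, cases_total = 8, 170

# theta = P(a case comes from the vaccine arm). Beta prior + binomial
# likelihood gives a Beta posterior in closed form.
a = 0.700102 + cases_vaccine
b = 1.0 + (cases_total - cases_vaccine)
theta = rng.beta(a, b, size=200_000)

# Vaccine efficacy as a function of theta (equal person-time in both arms):
# VE = 1 - theta/(1-theta) = (1 - 2*theta) / (1 - theta).
ve = (1 - 2 * theta) / (1 - theta)
ve_mean = float(ve.mean())
ve_ci = np.quantile(ve, [0.025, 0.975])
```

The posterior mean lands near the headline ~95% efficacy figure, and the quantiles give a credible interval directly on the VE scale.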
Policy responses to COVID-19, particularly those related to non-pharmaceutical interventions, are unprecedented in scale and scope. Epidemiologists are more involved in policy decisions and evidence generation than ever before. However, policy impact evaluations always require a complex combination of circumstance, study design, data, statistics, and analysis. Beyond the issues that are faced for any policy, evaluation of COVID-19 policies is complicated by additional challenges related to infectious disease dynamics and lags, lack of direct observation of key outcomes, and a multiplicity of interventions occurring on an accelerated time scale. The methods needed for policy-level impact evaluation are not often used or taught in epidemiology, and differ in important ways that may not be obvious. The volume, speed, and methodological complications of policy evaluations can make it difficult for decision-makers and researchers to synthesize and evaluate the strength of evidence in COVID-19 health policy papers. In this paper, we (1) introduce the basic suite of policy impact evaluation designs for observational data, including cross-sectional analyses, pre/post, interrupted time-series, and difference-in-differences analysis, (2) demonstrate key ways in which the requirements and assumptions underlying these designs are often violated in the context of COVID-19, and (3) provide decision-makers and reviewers a conceptual and graphical guide to identifying these key violations. The overall goal of this paper is to help epidemiologists, policy-makers, journal editors, journalists, researchers, and other research consumers understand and weigh the strengths and limitations of evidence that is essential to decision-making.
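Of the designs listed, difference-in-differences is the easiest to show in miniature. The sketch below uses entirely simulated data with a known effect and a shared time trend (all numbers are invented); it illustrates the parallel-trends logic, the very assumption the paper warns is often violated for COVID-19 policies.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
true_effect = -3.0  # the (simulated) policy lowers the outcome by 3 units

# Two groups, two periods, with a common +2 time trend in both groups.
pre_treated = 20 + rng.normal(0, 1, n)
post_treated = 22 + true_effect + rng.normal(0, 1, n)  # trend + policy effect
pre_control = 15 + rng.normal(0, 1, n)
post_control = 17 + rng.normal(0, 1, n)                # trend only

# DiD differences out both the fixed group gap and the shared trend,
# leaving (an estimate of) the policy effect.
did = (post_treated.mean() - pre_treated.mean()) - (
    post_control.mean() - pre_control.mean()
)
```

If the control group's trend had differed from the treated group's (as with staggered, overlapping COVID-19 interventions), the same arithmetic would silently attribute that divergence to the policy.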
Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability of FedV to multiple types of ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer with respect to state-of-the-art approaches.
Copula models are flexible tools to represent complex structures of dependence for multivariate random variables. According to Sklar's theorem (Sklar, 1959), any d-dimensional absolutely continuous density can be uniquely represented as the product of the marginal distributions and a copula function which captures the dependence structure among the vector components. In real data applications, the interest of the analyses often lies in specific functionals of the dependence, which quantify aspects of it in a few numerical values. A broad literature exists on such functionals; however, extensions to include covariates are still limited. This is mainly due to the lack of unbiased estimators of the copula function, especially when one does not have enough information to select the copula model. Recent advances in computational methodologies and algorithms have allowed inference in the presence of complicated likelihood functions, especially in the Bayesian approach, whose methods, despite being computationally intensive, allow us to better evaluate the uncertainty of the estimates. In this work, we present several Bayesian methods to approximate the posterior distribution of functionals of the dependence, using nonparametric models which avoid the selection of the copula function. These methods are compared in simulation studies and in two realistic applications, from civil engineering and astrophysics.
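A classic example of such a functional is Kendall's tau, which depends on the copula alone via tau = 4 E[C(U, V)] - 1 and is therefore invariant to monotone transformations of the margins. The sketch below (simulated Gaussian-copula data with an arbitrary rho = 0.6; a frequentist plug-in estimate, not one of the paper's Bayesian nonparametric methods) shows a copula-model-free estimate of this functional.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 400, 0.6

# Simulate dependent pairs through a Gaussian copula, then distort one
# margin with a monotone transform: tau is unaffected.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x, y = z[:, 0], np.exp(z[:, 1])

def kendall_tau(x, y):
    # Concordant minus discordant pairs, over all ordered pairs (i, j), i != j.
    m = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / (m * (m - 1)))

tau_hat = kendall_tau(x, y)
tau_true = 2 / np.pi * np.arcsin(rho)  # Gaussian-copula identity
```

The empirical estimate requires no choice of copula family, which is exactly the setting the paper targets; its Bayesian nonparametric machinery additionally delivers a posterior distribution for such functionals, possibly indexed by covariates.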
Boom cranes are among the most common material handling systems due to their simple design. Some boom cranes also have an auxiliary jib connected to the boom with a flexible joint to enhance the maneuverability and increase the workspace of the crane. Such boom cranes are commonly called knuckle boom cranes. Due to their underactuated properties, it is fairly challenging to control knuckle boom cranes. To the best of our knowledge, only a few techniques are present in the literature to control this type of crane, and they rely on approximate models. In this paper we present for the first time a complete mathematical model for this crane where it is possible to control the three rotations of the crane (known as luff, slew, and jib movement) and the cable length. One of the main challenges in controlling this system is reducing the load oscillations effectively. In this paper we propose a nonlinear control based on energy considerations, capable of guiding the crane to desired set points while effectively reducing load oscillations. The corresponding stability and convergence analysis is carried out using LaSalle's invariance principle. Simulation results are provided to demonstrate the effectiveness and feasibility of the proposed method.
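The energy-based idea can be illustrated on a much simpler system than the knuckle boom crane: a single pendulum whose pivot acceleration is the control input. The gains, dimensions, and control law below are hypothetical and only demonstrate the general mechanism (choose the input so the swing energy is non-increasing, then invoke LaSalle's principle); they are not the paper's controller.

```python
import numpy as np

# Hypothetical parameters: gravity, cable length, control gain, time step.
g, L, k, dt = 9.81, 10.0, 1.0, 0.001
theta, omega = 0.3, 0.0  # initial swing of about 17 degrees, at rest

for _ in range(int(60 / dt)):  # 60 s of simulated time, semi-implicit Euler
    a = k * L * omega * np.cos(theta)  # pivot acceleration (control input)
    alpha = -(g / L) * np.sin(theta) - (a / L) * np.cos(theta)
    omega += alpha * dt
    theta += omega * dt

# Along trajectories, E = 0.5*(L*omega)**2 + g*L*(1 - cos(theta)) satisfies
# dE/dt = -k * (L*omega*cos(theta))**2 <= 0, so the swing energy only
# decreases; LaSalle's invariance principle then gives convergence to the
# downward equilibrium.
```

The crane problem is harder because luff, slew, jib, and hoist dynamics couple, but the Lyapunov/LaSalle structure of the argument is the same.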
Automation of cranes can have a direct impact on the productivity of construction projects. In this paper, we focus on the control of one of the most used cranes, the boom crane. Tower cranes and overhead cranes have been widely studied in the literature, whereas the control of boom cranes has been investigated in only a few works. Typically, these works rely on simple models built on many simplifying assumptions (e.g. a fixed-length cable, or treating certain dynamics as uncoupled). A first result of this paper is to present a fairly complete nonlinear dynamic model of a boom crane taking into account all coupling dynamics, where the only simplifying assumption is that the cable is considered rigid. The boom crane involves pitching and rotational movements, which generate complicated centrifugal forces and, consequently, highly nonlinear equations of motion. On the basis of this model, a control law is developed that performs position control of the crane while actively damping the oscillations of the load. The effectiveness of the approach has been tested in simulation with realistic physical parameters, including in the presence of wind disturbances.
CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy guarantees such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the U.S. Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprising over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania, spanning over 47,000 combinations of cause-of-death and demographic variables such as age, race, sex, and county of residence, and demonstrate the proposed framework's ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.
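The general principle, that publicly informed ranges improve utility at no extra privacy cost, can be sketched with a standard mechanism rather than the paper's Poisson approach: below, a two-sided geometric (discrete Laplace) mechanism privatises a small count, and clamping its output to a range inferred from public data is pure post-processing, so it preserves epsilon-differential privacy while removing impossible values. The count, range, and budget are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def geometric_mechanism(count, epsilon):
    # Two-sided geometric noise for an integer count with sensitivity 1:
    # the difference of two Geometric(1 - exp(-epsilon)) draws has pmf
    # proportional to exp(-epsilon * |k|), giving epsilon-DP.
    p = 1 - np.exp(-epsilon)
    return count + int(rng.geometric(p)) - int(rng.geometric(p))

true_count = 3   # a small county-level death count (illustrative)
lo, hi = 0, 25   # range inferred from public population/mortality data

eps = 0.1
raw = np.array([geometric_mechanism(true_count, eps) for _ in range(5000)])

# Clamping is post-processing: no additional privacy budget is spent, yet
# impossible releases (e.g. negative deaths) are eliminated.
clamped = np.clip(raw, lo, hi)

mae_raw = float(np.abs(raw - true_count).mean())
mae_clamped = float(np.abs(clamped - true_count).mean())
```

Since the true count lies inside the informed range, clamping can only move each noisy release closer to it, so the mean absolute error strictly improves at small budgets; the paper pushes further by baking the informed ranges into the mechanism itself.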
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.