Three principles of data science: predictability, computability, and stability (PCS)

We propose the predictability, computability, and stability (PCS) framework to extract reproducible knowledge from data that can guide scientific hypothesis generation and experimental design. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments predictability and computability (PC) with an overarching stability principle, which substantially expands traditional statistical uncertainty considerations. In particular, stability assesses how results vary with respect to choices (or perturbations) made across the data science life cycle, including problem formulation, pre-processing, modeling (data and algorithm perturbations), and exploratory data analysis (EDA) before and after modeling. Furthermore, we develop PCS inference to investigate the stability of data results and to identify when models are consistent with relatively simple phenomena. In high-dimensional sparse linear model simulations, we compare PCS inference with existing methods such as selective inference; PCS inference consistently outperforms them in terms of ROC curves over a wide range of simulation settings. Finally, we propose PCS documentation based on R Markdown, IPython, or Jupyter Notebook, with publicly available, reproducible code and narratives to back up the human choices made throughout an analysis. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo.
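The stability principle described above can be illustrated with a small sketch. This is not the PCS inference procedure from the paper: it is a minimal illustration, under stated assumptions, of one kind of data perturbation (bootstrap resampling) applied to feature selection in a sparse linear model. The simulation settings, the correlation-based selection rule, the 0.9 frequency threshold, and all function names are hypothetical choices made for this example, and the predictability screen that PCS applies before assessing stability is omitted for brevity.

```python
import random
import statistics


def simulate(n, p, active, noise_sd, rng):
    """Assumed toy model: y is the sum of the active features plus Gaussian noise."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[j] for j in active) + rng.gauss(0, noise_sd) for row in X]
    return X, y


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def select_top_k(X, y, k):
    """Stand-in selection rule: keep the k features most correlated with y."""
    p = len(X[0])
    scores = [abs(correlation([row[j] for row in X], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_stability(X, y, k, n_perturb, rng):
    """Fraction of bootstrap perturbations in which each feature is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_perturb):
        # Data perturbation: resample the rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j in select_top_k(Xb, yb, k):
            counts[j] += 1
    return [c / n_perturb for c in counts]


rng = random.Random(0)
active = [0, 1, 2]  # indices of the truly relevant features (assumed)
X, y = simulate(n=200, p=20, active=active, noise_sd=1.0, rng=rng)
freq = selection_stability(X, y, k=3, n_perturb=50, rng=rng)

# Report features that are selected in at least 90% of the perturbations.
stable = [j for j, f in enumerate(freq) if f >= 0.9]
print("stable features:", stable)
```

A feature that is selected only under some resamples would be flagged as unstable here. A fuller analysis in the spirit of the abstract would also perturb the algorithm and the pre-processing choices, not only the data.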
