Small and medium-sized enterprises (SMEs) represent 99.9% of U.S. businesses yet remain systematically excluded from AI because their operational scale does not match the data requirements of modern machine learning. This paper introduces SmallML, a Bayesian transfer learning framework that achieves enterprise-level prediction accuracy with datasets as small as 50-200 observations. We develop a three-layer architecture integrating transfer learning, hierarchical Bayesian modeling, and conformal prediction. Layer 1 extracts informative priors from 22,673 public records using a SHAP-based procedure that transfers knowledge from gradient boosting to logistic regression. Layer 2 implements hierarchical pooling across J = 5-50 SMEs with adaptive shrinkage, balancing population-level patterns against entity-specific characteristics. Layer 3 provides conformal prediction sets with the finite-sample coverage guarantee P(y ∈ C(x)) ≥ 1 - α, giving distribution-free uncertainty quantification. Validation on customer churn data demonstrates 96.7% ± 4.2% AUC with 100 observations per business, a 24.2 percentage point improvement over independent logistic regression (72.5% ± 8.1%), with p < 0.000001. Conformal prediction achieves 92% empirical coverage at a 90% target. Training completes in 33 minutes on standard CPU hardware. By enabling enterprise-grade predictions for 33 million U.S. SMEs previously excluded from machine learning, SmallML addresses a critical gap in AI democratization.

Keywords: Bayesian transfer learning, hierarchical models, conformal prediction, small-data analytics, SME machine learning
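To make the coverage guarantee P(y ∈ C(x)) ≥ 1 - α concrete, the following is a minimal sketch of split conformal prediction for a binary churn classifier. It is an illustration under stated assumptions, not the SmallML implementation: the abstract does not specify the nonconformity score, so the common choice 1 - p̂(y | x) is assumed here, the function names `conformal_threshold` and `prediction_sets` are hypothetical, and the data are synthetic stand-ins for a fitted model's predicted probabilities.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the score threshold q_hat from a held-out calibration set.

    cal_probs:  (n, K) array of predicted class probabilities.
    cal_labels: (n,) array of integer class labels.
    Assumed nonconformity score: 1 - predicted probability of the true label.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, q_hat):
    """Boolean (m, K) mask; entry [i, y] is True when label y is in C(x_i)."""
    return (1.0 - test_probs) <= q_hat

# Synthetic demonstration (assumed data, not the paper's churn dataset).
rng = np.random.default_rng(0)

def simulate(n):
    p_churn = rng.uniform(0.0, 1.0, size=n)           # stand-in model output
    labels = (rng.uniform(size=n) < p_churn).astype(int)
    probs = np.column_stack([1.0 - p_churn, p_churn])  # columns: stay, churn
    return probs, labels

cal_probs, cal_labels = simulate(100)      # 100 calibration observations
test_probs, test_labels = simulate(2000)

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, q_hat)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
avg_set_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_set_size:.2f}")
```

The guarantee is marginal over the draw of calibration and test data, so with only 100 calibration points the empirical coverage on any single split can land a few points above or below the 90% target. The guarantee itself requires only exchangeability between calibration and test observations, not a well-calibrated model.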