Traditional boxplots are widely used to summarize and visualize the distribution of numerical data, yet they have significant limitations when applied to skewed or heavy-tailed distributions. They often misclassify outliers through swamping (flagging typical observations as outliers) or masking (failing to detect true outliers). This paper addresses these limitations by systematically evaluating several alternative boxplots specifically designed to accommodate distributional asymmetry. We introduce ggskewboxplots, an R package that integrates multiple robust and skewness-aware boxplot variants, providing a unified and user-friendly framework for exploratory data analysis. Using extensive Monte Carlo simulations under controlled skewness and kurtosis conditions, implemented via the mosaic approach based on the Skewed Exponential Power distribution, we assess the sensitivity and specificity of each method. Simulation results indicate that classical Tukey-style boxplots are highly prone to swamping and masking, whereas robust skewness-adjusted variants, particularly those leveraging quartile-based skewness measures or medcouple-based adjustments, achieve substantially better performance. These findings offer practical guidance for selecting reliable boxplot methods in applied settings and demonstrate how the ggskewboxplots package facilitates accessible, distribution-aware visualizations within the familiar ggplot2 workflow.
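To make the swamping problem concrete, the following minimal sketch compares the classical Tukey fences with a simple quartile-based asymmetric rule on a right-skewed sample that contains no contamination at all. It is an illustration under stated assumptions, not the ggskewboxplots implementation and not the simulation design of the paper: the sample is a deterministic grid of exponential quantiles rather than a draw from the Skewed Exponential Power distribution, the asymmetric rule uses semi-interquartile ranges with a multiplier of 3, and the function names `tukey_fences`, `semi_iqr_fences`, and `count_flagged` are hypothetical.

```python
import math
import statistics


def tukey_fences(data, k=1.5):
    """Classical fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def semi_iqr_fences(data, k=3.0):
    """Asymmetric fences built from the two semi-interquartile ranges.

    Assumption: this is one simple quartile-based rule chosen for
    illustration; it is not claimed to match any variant in the package.
    """
    q1, med, q3 = statistics.quantiles(data, n=4)
    return q1 - k * (med - q1), q3 + k * (q3 - med)


def count_flagged(data, fences):
    """Number of observations falling outside the (lower, upper) fences."""
    lower, upper = fences
    return sum(1 for x in data if x < lower or x > upper)


# Deterministic right-skewed sample: quantiles of the unit exponential
# distribution on an evenly spaced probability grid (no true outliers).
n = 1000
sample = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

tukey_flags = count_flagged(sample, tukey_fences(sample))
semi_iqr_flags = count_flagged(sample, semi_iqr_fences(sample))

print(f"Tukey fences flag {tukey_flags} of {n} clean observations")
print(f"Semi-IQR fences flag {semi_iqr_flags} of {n} clean observations")
```

Every point here comes from the same distribution, so any flagged observation is an instance of swamping. For a unit exponential, the upper Tukey fence cuts off a theoretical tail probability of about 4.8 percent, compared with roughly 0.7 percent for a normal distribution. The asymmetric rule stretches the upper fence along the longer tail and so flags fewer clean points.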