What Makes a Popular Academic AI Repository?

Many AI researchers publish the code, data and other resources that accompany their papers in GitHub repositories. In this paper, we refer to these repositories as academic AI repositories. Our preliminary study shows that highly cited papers are more likely to have popular academic AI repositories (and vice versa). Hence, we perform an empirical study of academic AI repositories to highlight, for AI researchers, the good software engineering practices of the popular ones. We collect 1,149 academic AI repositories, label the top 20% of repositories by number of stars as popular, and label the bottom 70% as unpopular. The remaining 10% of repositories serve as a gap between the popular and unpopular groups. We propose 21 features to characterize the software engineering practices of academic AI repositories. Our experimental results show that popular and unpopular academic AI repositories differ statistically significantly in 11 of the studied features, indicating that the two groups have significantly different software engineering practices. Furthermore, we find that the number of links to other GitHub repositories in the README file, the number of images in the README file and the inclusion of a license are the most important features for differentiating the two groups. Our dataset and code are publicly available to share with the community.
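The labeling scheme described in the abstract (top 20% by stars as popular, bottom 70% as unpopular, the rest as a gap) can be illustrated with a minimal sketch. The function name `label_by_stars`, the rounding of group sizes, and the handling of ties (input order) are assumptions made for illustration; the abstract does not specify them, and this is not the authors' released code.

```python
from typing import Dict


def label_by_stars(
    stars: Dict[str, int], top: float = 0.20, bottom: float = 0.70
) -> Dict[str, str]:
    """Label repositories as 'popular', 'unpopular', or 'gap' by star rank.

    Assumptions (not stated in the abstract): group sizes are rounded to the
    nearest integer, and ties in star count keep their input order.
    """
    if top < 0 or bottom < 0 or top + bottom > 1:
        raise ValueError("top and bottom must be non-negative and sum to at most 1")

    # Rank repository names from most to fewest stars (sorted() is stable).
    ranked = sorted(stars, key=lambda name: stars[name], reverse=True)
    n = len(ranked)
    n_top = round(n * top)
    n_bottom = round(n * bottom)

    labels: Dict[str, str] = {}
    for rank, name in enumerate(ranked):
        if rank < n_top:
            labels[name] = "popular"
        elif rank >= n - n_bottom:
            labels[name] = "unpopular"
        else:
            labels[name] = "gap"  # buffer between the two groups
    return labels


if __name__ == "__main__":
    # Hypothetical star counts, for illustration only.
    example = {"repo-a": 950, "repo-b": 40, "repo-c": 7, "repo-d": 310, "repo-e": 2}
    for name, label in label_by_stars(example).items():
        print(name, label)
```

Leaving a gap between the two groups keeps repositories with nearly identical star counts from landing on opposite sides of the popular/unpopular boundary.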

