Grey literature is essential to software engineering research because it captures practices and decisions that rarely appear in academic venues. However, collecting and assessing it at scale remains difficult: its sources, formats, and APIs are heterogeneous, which impedes reproducible, large-scale synthesis. To address this issue, we present GLiSE, a prompt-driven tool that turns a research topic prompt into platform-specific queries, gathers results from common software engineering web sources (GitHub, Stack Overflow) and Google Search, and uses embedding-based semantic classifiers to filter and rank the results by relevance. GLiSE is designed for reproducibility: all settings are configuration-based, and every generated query is accessible. In this paper, we (i) present the GLiSE tool, (ii) provide a curated dataset of software engineering grey-literature search results classified by semantic relevance to their originating search intent, and (iii) conduct an empirical study of the tool's usability.
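The filter-and-rank step described above can be illustrated with a minimal sketch. It is not GLiSE's implementation: the abstract does not specify the embedding model, the classifier, or any threshold, so `embed` is a toy token-count stand-in for a real embedding model, and `filter_and_rank` and its `threshold` parameter are hypothetical names chosen for illustration only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts.

    Assumption: a real pipeline would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_and_rank(intent: str, results: list[str], threshold: float = 0.1):
    """Drop results scoring below the threshold, then sort by score, highest first.

    The threshold value is hypothetical, not taken from the paper.
    """
    intent_vec = embed(intent)
    scored = [(cosine(intent_vec, embed(r)), r) for r in results]
    kept = [(score, r) for score, r in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Calling `filter_and_rank` with a search intent and a list of result titles returns `(score, title)` pairs, with unrelated titles removed and the most similar title first.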