OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption and as XML through a web service interface. CSV is generated internally as an intermediate format to facilitate statistical computations. To interlink the OpenAIRE data with related data on the Web, we aim to export them as Linked Open Data (LOD). The LOD export must integrate into the overall data processing workflow, in which derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of three ways of creating LOD: a MapReduce job on top of HBase, a mapping of the intermediate CSV files, and a mapping of the XML output.
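To make the CSV-based option more concrete, the following is a minimal sketch of mapping CSV rows to RDF triples in N-Triples syntax. It is not the conversion used in OpenAIRE: the namespace `http://example.org/openaire/`, the column names `id` and `title`, and the helper names `escape_literal` and `csv_to_ntriples` are all hypothetical and chosen only for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and column names, for illustration only;
# they are not the actual OpenAIRE schema or URI scheme.
BASE = "http://example.org/openaire/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(value: str) -> str:
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Emit two triples per CSV row: an rdf:type triple and a title triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so it is safe inside a URI.
        subject = f"<{BASE}project/{quote(row['id'], safe='')}>"
        title = escape_literal(row["title"])
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}Project> .")
        triples.append(f'{subject} <{DCTERMS_TITLE}> "{title}" .')
    return triples


if __name__ == "__main__":
    sample = "id,title\n123,Open Data Project\n"
    for line in csv_to_ntriples(sample):
        print(line)
```

Because each row is mapped independently, a conversion of this shape can be rerun on every daily regeneration of the CSV files. How it compares in speed with the MapReduce and XML-based options is exactly what the evaluation measures.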