In data warehousing, Extract-Transform-Load (ETL) regularly extracts data from source systems into a central data warehouse to support business decision-making. Data from transaction processing systems are characterized by frequent insertions, updates, and deletions. It is challenging for ETL to propagate these changes to the data warehouse and to maintain the change history. Moreover, ETL jobs typically run sequentially when processing data with dependencies, which is not optimal, e.g., when processing early-arriving data. In this paper, we propose a two-level data staging ETL method for handling transaction data. The proposed method detects changes in the data from transaction processing systems, identifies the corresponding operation code for each change, and uses two staging databases to facilitate data processing in an ETL process. The proposed ETL provides a "one-stop" method for processing fast-changing, slowly-changing, and early-arriving data.
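The change-detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how changes are detected, so this example assumes a simple comparison of two snapshots keyed by primary key. The function name `detect_changes` and the operation codes `"I"`, `"U"`, and `"D"` are hypothetical labels chosen for illustration, and the two staging databases are not modeled here.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]


def detect_changes(previous: Dict[int, Row], current: Dict[int, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key.

    Returns (op_code, key, row) tuples, where op_code is a hypothetical
    label: "I" for insert, "U" for update, "D" for delete.
    """
    changes: List[Change] = []
    # Rows present now: either new (insert) or modified (update).
    for key, row in current.items():
        if key not in previous:
            changes.append(("I", key, row))
        elif row != previous[key]:
            changes.append(("U", key, row))
    # Rows that existed before but are gone now (delete).
    for key, row in previous.items():
        if key not in current:
            changes.append(("D", key, row))
    return changes


# Assumed sample data, not taken from the paper.
previous = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Rome"},
}
current = {
    1: {"name": "Ann", "city": "Bergen"},
    3: {"name": "Cy", "city": "Lima"},
}
changes = detect_changes(previous, current)
```

In this sketch, key 1 is reported as an update, key 3 as an insert, and key 2 as a delete. A real source system might instead expose changes through logs or triggers, which this example does not attempt to represent.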