Canonical data models (CDMs) have gained traction as a pattern for data integration in streaming pipelines that extract, transform and load data (ETL). CDMs are particularly useful for integrating microservice systems (Villaca et al., 2020; Oliveira et al., 2019). However, the transformation to a CDM is complex (Lemcke, 2012). In this paper, we present a new solution based on a dynamic mapping matrix (DMM). The DMM has been implemented in an app called Message ETL (METL). METL is the key part of a new ETL streaming pipeline at EOS, which is part of the Otto Group, the second-largest e-commerce provider in Europe. The pipeline is based on Kafka streams. METL transforms Kafka messages, each of which contains a set of data objects described by one extracting schema. It transforms each of these n' different messages into m' outgoing messages. Each outgoing message contains a subset of the incoming data objects but describes them with a different schema, namely a CDM schema. For the mapping, METL requires a matrix that consists of m' × n' mapping blocks. This raises three problems: the sparsity of the matrix, the adaptation of the matrix to changes in schema versions, and time efficiency. We solve these problems by block partitioning, sub-matrix formation and pattern generalization. In this process, we derive permutation matrices. We show that they can be used for automated updates, for parallel computation in near real time, and for compaction. The permutation matrices form the dynamic mapping matrix. For the solution, we draw on research into matrix partitioning (Quinn, 2004) and dynamic networks (Haase et al., 2021).
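To make the idea of a sparse, block-partitioned mapping matrix more concrete, the following minimal Python sketch stores only the non-empty blocks of an m' × n' block matrix and applies one column of blocks to an incoming message. It is an illustration under stated assumptions, not the METL implementation: the schema names, the attribute names, and the `MAPPING` and `transform` identifiers are all hypothetical, and each block is modelled simply as a selection and renaming of incoming attributes.

```python
from typing import Any, Dict, Tuple

# Hypothetical illustration; none of these names come from METL.
# A block maps a CDM attribute name to the source attribute it is read from.
# Each block therefore selects and reorders attributes of one incoming schema.
Block = Dict[str, str]

# Sparse storage of the m' x n' block matrix: only non-empty blocks are kept,
# keyed by (outgoing CDM schema, incoming extracting schema).
MAPPING: Dict[Tuple[str, str], Block] = {
    ("cdm.Customer", "crm.person.v1"): {"customerId": "id", "fullName": "name"},
    ("cdm.Address", "crm.person.v1"): {"customerId": "id", "city": "town"},
    ("cdm.Customer", "billing.account.v2"): {"customerId": "cust_no"},
}


def transform(source_schema: str, payload: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Apply every non-empty block in the column that belongs to source_schema.

    Returns one outgoing message per CDM schema that has a block for this
    source schema. Attributes missing from the payload are skipped.
    """
    outgoing: Dict[str, Dict[str, Any]] = {}
    for (cdm_schema, src_schema), block in MAPPING.items():
        if src_schema != source_schema:
            continue  # block belongs to a different column of the matrix
        outgoing[cdm_schema] = {
            target: payload[source]
            for target, source in block.items()
            if source in payload
        }
    return outgoing


if __name__ == "__main__":
    message = {"id": 7, "name": "Ada", "town": "Berlin", "extra": 1}
    for schema, body in transform("crm.person.v1", message).items():
        print(schema, body)
```

Because the blocks are independent of one another, they could in principle be evaluated in parallel, and a schema version change would only touch the blocks in the affected row or column. The sketch does not attempt to show either of these.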