Methods for Heterogeneity Detection During Multi-Dimensional Data Mart Integration
Author: Geno Stefanov
Abstract
The paper focuses on detecting heterogeneities between multi-dimensional data marts. Decision-making often requires data that resides in multiple, independently developed data marts. In addition to the elements of the ER data model, the multi-dimensional model introduces two entities: dimensions and facts. Accordingly, two groups of heterogeneities have been identified: dimension heterogeneities and fact heterogeneities. The former arise from differences in the dimensions' hierarchies, their members, the names of those members, their levels, and the dimensions themselves. The latter occur when facts in different data marts have different names, values (inconsistent measures), formats, or even scales. The paper therefore examines and classifies the heterogeneities that can occur during the integration of independently developed data marts, and it proposes and discusses four methods for heterogeneity detection: a method for metadata extraction, a method for detecting schema-instance heterogeneities, a method for detecting heterogeneities among dimensions, and a method for detecting heterogeneities among facts. The paper concludes with a discussion of the advantages of the proposed methods for heterogeneity detection during data mart integration.
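The abstract names the kinds of dimension and fact heterogeneity but does not show how they might be detected. The Python sketch below is only an illustration and is not the paper's four methods. It assumes that metadata has already been extracted from each data mart into hypothetical `Dimension` and `Fact` records, then compares the records pairwise and flags the heterogeneity kinds listed above. Two rules are assumptions made for the example: member names that match after case and whitespace normalisation count as a naming heterogeneity, and fact values are compared only after both are converted to a common scale.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Dimension:
    # Hypothetical metadata record for one dimension of a data mart.
    name: str
    levels: list[str]                                            # hierarchy, finest level first
    members: dict[str, set[str]] = field(default_factory=dict)   # level -> member names


@dataclass
class Fact:
    # Hypothetical metadata record for one fact (measure) of a data mart.
    name: str
    data_type: str                                               # storage format, e.g. "decimal(10,2)"
    scale: float                                                 # unit multiplier, e.g. 1000 for "thousands"
    values: dict[tuple, float] = field(default_factory=dict)     # cell coordinates -> measure


def _normalise(names: set[str]) -> set[str]:
    # Assumed naming rule: ignore case and surrounding whitespace.
    return {n.strip().lower() for n in names}


def dimension_heterogeneities(a: Dimension, b: Dimension) -> list[str]:
    found = []
    if a.name != b.name:
        found.append("dimension name")
    if a.levels != b.levels:
        # The same levels in a different order indicate a hierarchy conflict.
        found.append("hierarchy" if set(a.levels) == set(b.levels) else "levels")
    # Compare members only on levels that both dimensions share.
    for level in (lvl for lvl in a.levels if lvl in b.levels):
        ma, mb = a.members.get(level, set()), b.members.get(level, set())
        if ma == mb:
            continue
        # Equal after normalisation: the names differ, not the members themselves.
        kind = "member names" if _normalise(ma) == _normalise(mb) else "members"
        found.append(f"{kind} of level {level!r}")
    return found


def fact_heterogeneities(a: Fact, b: Fact) -> list[str]:
    found = []
    if a.name != b.name:
        found.append("fact name")
    if a.data_type != b.data_type:
        found.append("format")
    if a.scale != b.scale:
        found.append("scale")
    # Compare shared cells after converting both measures to a common scale.
    shared = a.values.keys() & b.values.keys()
    if any(not math.isclose(a.values[k] * a.scale, b.values[k] * b.scale) for k in shared):
        found.append("values (inconsistent measures)")
    return found


# Usage with made-up example records.
store_a = Dimension("Store", ["store", "city"], {"city": {"Sofia", "Plovdiv"}})
store_b = Dimension("Store", ["store", "city", "country"], {"city": {"SOFIA", "Plovdiv"}})
print(dimension_heterogeneities(store_a, store_b))
# ['levels', "member names of level 'city'"]

sales_a = Fact("sales", "decimal(10,2)", 1.0, {("2023", "Sofia"): 1500.0})
sales_b = Fact("revenue", "decimal(12,0)", 1000.0, {("2023", "Sofia"): 1.5})
print(fact_heterogeneities(sales_a, sales_b))
# ['fact name', 'format', 'scale']
```

In the second example the stored values 1500.0 and 1.5 do not count as inconsistent, because they agree once each is multiplied by its scale. This is why the sketch separates scale heterogeneity from value heterogeneity.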