Many applications today need information from diverse data sources, in which related data may be represented quite differently. In one common scenario, a DBA wants to add data from a new source to an existing warehouse. The data in the new source may not match the existing warehouse schema. The new data may also be partially redundant with that in the existing warehouse, or formatted differently. Other applications may need to integrate data more dynamically, in response to user queries. Even applications using data from a single source often want to present it in a form other than that it is stored in. For example, a user may want to publish some information using a particular XML DTD, though the data is not stored in that form. In each of these scenarios, one or more data sets must be mapped into a single target representation. Needed transformations may include schema transformations (changing the structure of the data) [BLN86, RR98] and data transformation and cleansing (changing the the format and vocabulary of the data and eliminating or at least reducing duplicates and errors) [Val, ETI, ME97, HS95]. In each area, there is a broad range of possible transformations, from simple to complex. Schema and data transformation have typically been studied separately. We believe they need to be handled together via a uniform mechanism. Database middleware systems [PGMW95, TRV96, ACPS96, Bon95] integrate data from multiple sources. To be effective, such systems must provide one or more integrated schemas, and must be able to transform data from different sources to answer queries against these schema. The power of their query engines and their ability to connect to several information sources makes them a natural base for doing more complex transformations as well. In this paper, we look at database middleware systems as tranformation engines, and discuss when and how data is transformed to provide users with the information they need.
Download Full PDF Version (Non-Commercial Use)