Data Warehouse Architecture – Concept and Models

According to William Inmon, data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of the management’s decision-making process. Data warehouse is a database containing data that usually represents the business history of an organization. This historical data is used for analysis that supports business decisions at many levels, from strategic planning to performance evaluation of a discrete organizational unit. It provides an effective integration of operational databases into an environment that enables strategic use of data. These technologies include relational and MDDB management systems, client/server architecture, meta-data modelling and repositories, graphical user interface and much more.

The emergence of cross discipline domain such as knowledge management in finance, health and e-commerce have proved that vast amount of data need to be analyzed. The evolution of data in data warehouse can provide multiple dataset dimensions to solve various problems. Thus, critical decision making process of this dataset needs suitable data warehouse model.

The main proponents of data warehouse are William Inmon and Ralph Kimball. But they have different perspectives on data warehouse in term of design and architecture. Inmon defined data warehouse as a dependent data mart structure while Kimball defined data warehouse as a bus based data mart structure. 

A data warehouse is a read-only data source where end-users are not allowed to change the values or data elements. Inmon’s data warehouse architecture strategy is different from Kimball’s. Inmon’s data warehouse model splits data marts as a copy and distributed as an interface between data warehouse and end users. Kimball’s views data warehouse as a unions of data marts. The data warehouse is the collections of data marts combine into one central repository. 

Although Inmon and Kimball have a different design view of data warehouse, they do agree on successful implementation of data warehouse that depends on an effective collection of operational data and validation of data mart. The role of database staging and ETL (extract, transform and loading) processes on data are inevitable components in both researchers data warehouse design. Both believed that dependent data warehouse architecture is necessary to fulfil the requirement of enterprise end users in term of preciseness, timing and data relevancy

William Inmon and Ralph Kimball Data Warehouse Differences:

 

William Inmon

Ralph Kimball

Paradigm

Inmon’s Paradigm: An enterprise has one data warehouse, and data marts source their information from the data warehouse. Information is stored in 3rd normal form.

Kimball’s Paradigm: Data warehouse is the collection of heterogeneous data marts within the enterprise. Information is always stored in the dimensional model.

Architecture

Architecture: Using TOP-DOWN approach

Architecture: Using Bottom-up approach

Concept

Data’s integration from various systems to centralized repository

Concept of dimensional modelling (Bridging between Relational and multidimensional DB)

Design

The design pattern dependent on 3rd normalization form, purpose is for data granularity.

Data’s marts are connected in a bus structure. Data’s marts are the “union” of data warehouse. This approach is known also as Virtual Data Warehouse.

ETL Methods

Data’s extraction from operational data sources. Data are feed in staging database area. Data are then transformed, integrate, and consolidate and transfer to Operational Data Store database. Data are then load to data mart.

Data extracted from legacy system and then consolidated and verified in staging database. Data feed into ODS and more data us added/updated. Operational Data Store contains fresh copy data that is integrated and transformed to the data mart structure.

Data mart

Data Marts are available as a subset of the data warehouse.

Data Marts can be placed at different at different servers or in geographical locations.

Data Warehouse Architecture

Data warehouse architecture can be viewed in many perspectives. A successful data warehouse system depends on database staging process which derives data from different integrated Online Transactional Processing (OLTP) system. In this case, ETL process plays a crucial role to make database staging process workable. Five data warehouse architecture that are common in use are;

  1. Independent Data Marts – Independent data marts also known as localized or small scale data warehouse. It is mainly used by departments, divisions of company to provide individual operational databases. This type of data mart is simple yet consists of different form that was derived from multiple design structures from various inconsistent database designs. Thus, it complicates cross data mart analysis. Since every organizational units tend to build their own database which operates as independent data mart, it is best used as an ad-hoc data warehouse and also to be use as a prototype before building a real data warehouse.
  2. Data Mart Bus Architecture – The design and architecture of data warehouse with unions of data marts which are known as the bus architecture or virtual data warehouse. Bus architecture allows data marts not only located in one server but it can be also being located on different server. This allows the data warehouse to functions more in virtual mode and combined all data marts and process as one data warehouse.
  3. Hub-and-spoke Architecture – The hub is the central server taking care of information exchange and the spoke handle data transformation for all regional operation data stores. Hub and spoke mainly focused on building a scalable and maintainable infrastructure for data warehouse.
  4. Centralized Data Warehouse Architecture – Central data warehouse architecture build based on hub-and-spoke architecture but without the dependent data mart component. This architecture copies and stores heterogeneous operational and external data to a single and consistent data warehouse. This architecture has only one data model which are consistent and complete from all data sources. Central data warehouse should consist of database staging or known as operational data store as an intermediate stage for operational processing of data integration before transform into the data warehouse.
  5. Federated Architecture – Federated data warehouse is an integration of multiple heterogeneous data marts, database staging or operational data store, combination of analytical application and reporting systems. The concept of federated focus on integrated framework to make data warehouse more reliable. Federated data warehouse are a practical approach as it focus on higher reliability and provide excellent value.

Hub and spoke and centralized data warehouse architectures are similar. Hub and spoke is faster and easier to implement because no data mart are required. For centralized data warehouse architecture scored higher than hub and spoke as for urgency needs for relatively fast implementation approach.

Leave a Reply

Your email address will not be published. Required fields are marked *