OLAP Council (1997) define Online Analytical Processing (OLAP) as a group of decision support system that facilitate fast, consistent and interactive access of information that has been reformulate, transformed and summarized from relational dataset mainly from data warehouse into Multi-Dimensional Databases (MDDB) which allow optimal data retrieval and for performing trend analysis. OLAP is an important concept for strategic database analysis. OLAP have the ability to analyze large amount of data for the extraction of valuable information. Analytical development can be of business, education or medical sectors. The technologies of data warehouse, OLAP, and analyzing tools support that ability.
Online Analytical Processing (OLAP) enable discovering pattern and relationship contain in business activity by query tons of data from multiple database source systems at one time. Processing database information using OLAP required an OLAP server to organize and transformed and builds MDDB. MDDB are then separated by cubes for client OLAP tools to perform data analysis which aim to discover new pattern relationship between the cubes.
The fact is that Online Analytical Processing (OLAP) and data warehouse are complementary technology which blends together. Data warehouse stores and manages data while OLAP transforms data warehouse datasets into strategic information. OLAP function ranges from basic navigation and browsing (often known as “slice and dice”), to calculations and also serious analysis such as time series and complex modelling. As decision-makers implement more advanced OLAP capabilities, they move from basic data access to creation of information and to discovering of new knowledge.
ONLINE ANALYTICAL PROCESSING (OLAP) ARCHITECTURE
In comparison to data warehouse which usually based on relational technology, OLAP uses a multidimensional view to aggregate data to provide rapid access to strategic information for analysis. There are three type of Online Analytical Processing (OLAP) architecture based on the method in which they store multi-dimensional data and perform analysis operations on that dataset. The categories are multidimensional OLAP (MOLAP), relational OLAP (ROLAP) and hybrid OLAP (HOLAP).
- In MOLAP, datasets are stored and summarized in a multidimensional cube. The MOLAP architecture can perform faster than ROLAP and HOLAP (C). MOLAP cubes designed and build for rapid data retrieval to enhance efficient slicing and dicing operations. MOLAP can perform complex calculations which have been pre-generated after cube creation. MOLAP processing is restricted to initial cube that was created and are not bound to any additional replication of cube.
- In ROLAP, data and aggregations are stored in relational database tables to provide the OLAP slicing and dicing functionalities. ROLAP are the slowest among the OLAP flavors. ROLAP relies on data manipulating directly in the relational database to give the manifestation of conventional OLAP’s slicing and dicing functionality. Basically, each slicing and dicing action is equivalent to adding a “WHERE” clause in the SQL statement. ROLAP can manage large amounts of data and ROLAP do not have any limitations for data size. ROLAP can influence the intrinsic functionality in a relational database. ROLAP are slow in performance because each ROLAP activity are essentially a SQL query or multiple SQL queries in the relational database. The query time and number of SQL statements executed measures by its complexity of the SQL statements and can be a bottleneck if the underlying dataset size is large. ROLAP essentially depends on SQL statements generation to query the relational database and do not cater all needs which make ROLAP technology conventionally limited by what SQL functionality can offer.
- HOLAP combine the technologies of MOLAP and ROLAP. Data are stored in ROLAP relational database tables and the aggregations are stored in MOLAP cube. HOLAP can drill down from multidimensional cube into the underlying relational database data. To acquire summary type of information, HOLAP leverages cube technology for faster performance. Whereas to retrieve detail type of information, HOLAP can drill down from the cube into the underlying relational data.
In Online Analytical Processing (OLAP) architectures (MOLAP, ROLAP and HOLAP), the datasets are stored in a multidimensional format as it involves the creation of multidimensional blocks called data cubes. The cube in OLAP architecture may have three axes (dimensions), or more. Each axis (dimension) represents a logical category of data. One axis may for example represent the geographic location of the data, while others may indicate a state of time or a specific school. Each of the categories can be broken down into successive levels and it is possible to drill up or down between the levels.
Online Analytical Processing (OLAP) partitions are normally stored in an OLAP server, with the relational database frequently stored on a separate server from OLAP server. OLAP server must query across the network whenever it needs to access the relational tables to resolve a query. The impact of querying across the network depends on the performance characteristics of the network itself. Even when the relational database is placed on the same server as OLAP server, inter-process calls and the associated context switching are required to retrieve relational data. With a OLAP partition, calls to the relational database, whether local or over the network, do not occur during querying.
ONLINE ANALYTICAL PROCESSING (OLAP)
Online Analytical Processing (OLAP) functionality offers dynamic multidimensional analysis supporting end users with analytical activities includes calculations and modelling applied across dimensions, trend analysis over time periods, slicing subsets for on-screen viewing, drilling to deeper levels of records, OLAP is implemented in a multi-user client/server environment and provide reliably fast response to queries, in spite of database size and complexity. OLAP facilitate the end user integrate enterprise information through relative, customized viewing, analysis of historical and present data in various “what-if” data model scenario. This is achieved through use of an OLAP Server.
OLAP functionality is provided by an OLAP server. OLAP server design and data structure are optimized for fast information retrieval in any course and flexible calculation and transformation of unprocessed data. The OLAP server may either actually carry out the processed multidimensional information to distribute consistent and fast response times to end users, or it may fill its data structures in real time from relational databases, or offer a choice of both.
Essentially, OLAP create information in cube form which allows more composite analysis compares to relational database. OLAP analysis techniques employ ‘slice and dice’ and ‘drilling’ methods to segregate data into loads of information depending on given parameters. Slice is identifying a single value for one or more variable which is non-subset of multidimensional array. Whereas dice function is application of slice function on more than two dimensions of multidimensional cubes. Drilling function allows end user to traverse between condensed data to most precise data unit.
MULTIDIMENSIONAL DATABASE SCHEMA
The base of every data warehouse system is a relational database build using a dimensional model. Dimensional model consists of fact and dimension tables which are described as star schema or snowflake schema. A schema is a collection of database objects, tables, views and indexes.
In designing data models for data warehouse, the most commonly used schema types are star schema and snowflake schema. In the star schema design, fact table sits in the middle and is connected to other surrounding dimension tables like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Most data warehouses use a star schema to represent the multidimensional data model. The database consists of a single fact table and a single table for each dimension. Each tuple in the fact table consists of a pointer or foreign key to each of the dimensions that provide its multidimensional coordinates, and stores the numeric measures for those coordinates. A tuple consist of a unit of data extracted from cube in a range of member from one or more dimension tables. Each dimension table consists of columns that correspond to attributes of the dimension.
Star schemas do not explicitly provide support for attribute hierarchies which are not suitable for architecture such as MOLAP which require lots of hierarchies of dimension tables for efficient drilling of datasets.
Snowflake schemas provide a refinement of star schemas where the dimensional hierarchy is explicitly represented by normalizing the dimension tables. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables.
In addition to the fact and dimension tables, data warehouses store selected summary tables containing pre-aggregated data. In the simplest cases, the pre-aggregated data corresponds to aggregating the fact table on one or more selected dimensions. Such pre-aggregated summary data can be represented in the database in at least two ways. Whether to use star or a snowflake mainly depends on business needs.
As OLAP technology taking prominent place in data warehouse industry, there should be a suitable assessment tool to evaluate it. E.F. Codd not only invented OLAP but also provided a set of procedures which are known as the ‘Twelve Rules’ for OLAP product ability assessment:
Multidimensional conceptual view. OLAP operates with CUBEs of data that represent multidimensional construct of data. Event though the name implies three dimensional data, the number of possible dimensions is practically unlimited.
Transparency. OLAP systems should be part of an open system that supports heterogeneous data sources.
Accessibility. The OLAP should present the user with a single logical schema of the data.
Consistent reporting performance. Performance should not degrade as the number of dimensions in the model increases.
Client/server architecture. Should be based on open, modular systems.
Generic dimensionality. Not limited to 3-D and not biased toward any particular dimension. A function applied to one dimension should also be able to be applied to another.
Dynamic sparse-matrix handling. Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.
Multiuser support. OLAP systems should support more than one user at the time.
Unrestricted cross-dimensional operations. Similar to rule of generic dimensionality; all dimensions are created equal, and operations across data dimensions should not restrict relationships between cells.
Intuitive data manipulation. Ideally, users shouldn’t have to use menus or perform complex multiple-step operations when an intuitive drag-and-drop action will do.
Flexible reporting. Save a tree. Users should be able to print just what they need, and any changes to the underlying financial model should be automatically reflected in reports.
Unlimited dimensional and aggregation levels. The OLAP cube can be built with unlimited dimensions, and aggregation of the contained data also does not have practical limits.
Codd twelve rules of OLAP provide us an essential tool to verify the OLAP functions and OLAP models used are able to produce desired result. A good OLAP system should also support a complete database management tools as a utility for integrated centralized tool to permit database management to perform distribution of databases within the enterprise. OLAP ability to perform drilling mechanism within the MDDB allows the functionality of drill down right to the source or root of the detail record level. This implies that OLAP tool permit a smooth changeover from the MDDB to the detail record level of the source relational database. OLAP systems also must support incremental database refreshes. This is an important feature as to prevent stability issues on operations and usability problems when the size of the database increases.
OLTP and OLAP
The design of OLAP for multidimensional cube is entirely different compare to OLTP (Online Transactional Processing) for database. OLTP is implemented into relational database to support daily processing in an organization. OLTP system main function is to capture data into computers. OLTP allow effective data manipulation and storage of data for daily operational resulting in huge quantity of transactional data. Organizations build multiple OLTP systems to handle huge quantities of daily operations transactional data can in short period of time.
OLAP is designed for data access and analysis to support managerial user strategic decision making process. OLAP technology focuses on aggregating datasets into multidimensional view without hindering the system performance. OLTP systems is defined as a “Customer oriented” and OLAP is a “market oriented”. Major differences between OLTP and OLAP systems are shown below.
|Characteristics||Can handle large numbers of small online transactions.||Handles large volumes of data.|
|Query||Simple queries, such as Insert, Delete, and Update information.||Complex queries which require aggregations.|
|Database Design||Normal, with many tables.||Usually with fewer tables and can include star or snowflake schemas.|
|Method||Uses traditional DBMS.||Uses data warehouses.|
|Sources||The OLTP itself and respective transactions correspond to the sources of data.||The various OLTP databases become the data sources for OLAP.|
|Data Quality||Huge effort to ensure the data is ACID-compliant.||The data may not be as organized, but what really matters is the capacity to navigate through the dimensions of the data.|
|Functionality||Online database which modifies a system by controlling and running essential business tasks in real time.||Online database query management system that allows users to discover hidden insights, plan, support decisions, and solve problems.|
|Speed||Typically very fast processing.||Depends on the amount of data. Creating indexes can enhance query speed.|
|Backup and Recovery||Regular backups are vital to ensure the business keeps running since data loss can lead to monetary loss and legal issues.||Requires backup from time to time, and lost data can be reloaded from OLTP database when needed.|
It is complicated to merge OLAP and OLTP into one centralized database system. The dimensional data design model used in OLAP is much more effective for querying than the relational database query used in OLTP system. OLAP may use one central database as data source and OLTP used different data source from different database sites. The dimensional design of OLAP is not suitable for OLTP system, mainly due to redundancy and the loss of referential integrity of the data. Organization chooses to have two separate information systems, one OLTP and one OLAP system.
We can conclude that the purpose of OLTP systems is to get data into computers, whereas the purpose of OLAP is to get data or information out of computers.