Business intelligence is used for reporting purpose, to show the performance of any business using different dimensions like time, region, product etc.
Reporting requires data to be presented in a manner so that decision making can be done on that. For this purpose concept of data warehousing has been introduced.
The data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems to a target data warehouse is called data warehousing.
Design model for Data warehouse:
Generally two models are followed for storing data in data warehouse.
• Normalized approach – In this approach of storing data, the dimensional tables are normalized usually in 3NF. This approach is called snowflaking.
The advantage of this schema :
• Loading data from OLTP(source database) is easy .
Disadvantages of this schema :
• join data from different sources into meaningful information and then
• access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.
• De normalized or Dimensional approach : In this approach the data are divided into dimensions and facts. The fact table is used to store measures to be analyzed while dimension tables are used to store the context on which facts in fact table are analyzed.
Advantages of dimensional approach :
• Complex query processing can be done efficiently.
• The schema can closely depict the business model.
• It incorporates the concept of slowly changing dimensions.
Difference between dimensional model and relational model:
• An E-R diagram (used in OLTP or transactional system) has highly normalized model (Even at a logical level), whereas dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.
• An E-R diagram is a complex maze of hundreds of entities linked with each other, whereas the Dimensional model has logical grouped set of star-schemas.
• The E-R diagram is split as per the entities. A dimension model is split as per the dimensions and facts.
• In an E-R diagram all attributes for an entity including textual as well as numeric, belong to the entity table. Whereas a 'dimension' entity in dimension model has mostly the textual attributes, and the 'fact' entity has mostly numeric attributes.
Dimensional approach of data modeling leads to a design approach called OLAP (Online Analytical Processing). The core of OLAP system is an OLAP cube. It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database.
Data warehousing uses the OLAP design approach for data modeling. An OLAP uses a technique of OLAP cube for modeling and building a decision support system.
To understand an OLAP cube, lets take an example.
A company has got a project to warehouse all the sales data of different soft drinks company. The data will be used to create report of sales according to region, time and product of sale.
Here the data from analytical system will be extracted and stored in a data warehouse. The data warehouse will be designed using an OLAP cube as shown below.
Here, there are three dimensions on which our data warehouse reporting will be done. Time, Geography and Product.
If we have to find out the sales of a product class in a particular year for a particular region, we will move to that product class on product axis. Then move to that particular year on time axis and then for a particular geography, move in direction of geography axis. This will lead us to a cube that will give the value of sales we require. This multidimensional approach of warehouse design will help us drill to sales related data for any product class, for any year, month, week based in any geography.
This OLAP cube now has to be implemented on any data storage system. An OLAP designed on a relational database system is called as a ROLAP (Relational OLAP).
For a ROLAP system the data modeling schema followed to implement this multidimensional structure is STAR schema.
In te next post we will see how to model a star schema from a ER model schema..