Chapter 9 – Decision Support Systems, Data Warehousing and Data Mining – Introduction to Database Management Systems

Chapter 9






Decision Support Systems, Data Warehousing and Data Mining

Modern computing is largely about decision-making. The success of organisations depends on how correct and timely the decisions are. Data warehousing and data mining are powerful tools to this effect.

Chapter Highlights

  • Data and Information
  • Discussion of Decision Support Systems (DSS)
  • Detailed Coverage of Data Warehousing
  • Introduction to Data Mining
  • Concepts in Online Analytical Processing (OLAP)

9.1 INFORMATION AND DECISION MAKING

9.1.1 Data and Information

Data means raw facts.

Data can be of numerous types, such as:

  • Mark sheets of students
  • Pay-slips of employees
  • File of purchase and sales receipts

When we process data to achieve meaningful results, it becomes
information.

For example, the data items listed above can become information of some sort, as follows:

  • Result summary of an examination
  • Payroll report of a month
  • Purchase and sales register

Thus, information can be defined as data acquired by informing or by being informed. It is knowledge derived from data, study, experience, and research. Information is not raw facts whereas data is.

The relation between data and information is shown in Fig. 9.1.

Fig. 9.1 Data and information

9.1.2 Need for Information

The secret of any successful organisation lies in how much information it has, and how it uses that information to achieve its objectives. Any organisation aims at delivering a product, or providing a service. Of course, some organisations may be non-profit making, in which case their focus is on providing free aid, medical help, or education to deprived people. For any organisation to be successful, profit making or otherwise, the flow of information is vital. This flow of information may or may not be connected directly to the usage of computers. However, it is important that all the concerned people in an organisation have access to the right information that they seek.

Organisations are usually made up of groups, such as Administration, Finance, Payroll, Projects, Human Resources, as so on. Within each group there is usually a hierarchy of people, such as the managers, the middle management the clerical cadre and so on. An organisation chart depicts these levels and their functions. It also tells us about the reporting hierarchy and, thus indicates who holds the power of taking decisions. Usually, the decision-making process consists of aspects shown in Fig. 9.2.

Fig. 9.2 Levels of management and the decision-making process

Let us discuss this in brief.

  • Top management: The top management is concerned with strategic decisions. For instance, the Chief Executive Officer (CEO) or the President of a company deals with the long-term strategic decisions, which are likely to affect the organisation in the medium to long term. For example, the CEO may decide that the organisation needs to tap the US market in the next three years, or that the sales in the next five years should double.
  • Middle management: The middle management takes tactical decisions. Division managers, Section managers, Sales managers, HR managers, are all middle managers. They take decisions for the implementation of the strategic decisions taken by the top management.
  • Supervisory management: The supervisory managers take operational decisions. The focus of these managers is mainly the supervision of staff, monitoring of day-to-day progress of events and taking of corrective actions as and when needed.

Another way of representing the management structure is with the help of the management triangle, which shows the hierarchy of people in an organisation. This is shown in Fig. 9.3.

Fig. 9.3 Management triangle

9.1.3 Quality of Information

In order to take decisions at various levels, the managers in an organisation need different kinds of information. The different types of information can be classified as shown in Fig. 9.4.

Fig. 9.4 Types of information

The types of information are discussed in brief on next page.

  • Unstructured information: This type of information is summarised, may be outdated, subjective in nature and related to future events. It deals with a number of areas and covers activities inside and outside of an organisation.
  • Semi-structured information: This type of information is a combination of the other two types of information.
  • Structured information: This type of information is up-to-date, detailed, not subjective and concerned with past events. It deals with a small range of facts and deals with the internal details of an organisation.

9.1.4 Value of Timely Information

Information is used to take decisions, convey and evaluate actions, find results and exchange ideas. For information to be effective, data must be processed as soon as it is available and the information must be communicated to those who need it immediately.

Information is always in great demand. However, the main problems related to quality of information are accuracy and timely availability. In today's global marketplace, this need is even more acute. Corporations require information for taking decisions and extending relationships with customers, suppliers and partners.

The information used by organisations comes in from various different systems and sources. It is stored and organised in different databases. In this regard, data warehousing has rapidly become a novel and popular business application. Builders of data warehouses consider their systems to be key components of their IT strategy and architecture. There has been tremendous progress in terms of the hardware and software for developing products and services that specifically target the data warehousing market.

A data warehouse is the organised data that is situated after and outside the normal operational systems.

In simple terms data warehousing refers to the systematic manner of storing historical data, so that it can be retrieved and used effectively and speedily. The term data warehouse itself suggests that here data is stored just as physical warehouses store goods and materials. This is significant because it indicates that data is as important as the physical goods and services that it helps to produce, and produce better.

Decision Support Systems (DSS) are used to create knowledge bases, which are useful in taking decisions about organisational needs.

9.1.5 Historical Data

As we know, the primary focus of a computer-based system is the currently operational systems based on transactions and the data that they process. Normally, in any operational system, there are master databases such as customer database, supplier database, product database, and so on. Transactions such as orders, receipts, invoices, payments and returns update these databases continuously. The idea is illustrated in Fig. 9.5.

Fig. 9.5 Transactions update master databases

The DBMS ensures the integrity of these databases through rollbacks and other processes. In operational databases, not only are updated master databases maintained but also transactions, at least for some time for reporting and analysis. This is exactly where the problems related to data storage begin. How much data should be kept current and how much data should be archived, and when? It is not practical to keep data in the operational systems for long periods of time when not needed. For achieving this, a structure was designed for archiving data that the operational system has processed. The basic requirements of the running and archive systems are totally different. The running systems need performance, whereas the archive systems need flexibility and also bigger scope. It is not advisable to hamper the currently running systems with loads of archive data.

All early business data processing was done using mainframe computers. Although this has changed over the years, more than 70 per cent of business data processing is still done by mainframe systems. The most important reason behind this is that these systems are robust and fast and can store large amounts of data even if they are bulky and not so user-friendly. Therefore, the programs running on them for decades (embodying vast business knowledge and rules) are incredibly difficult to carry or port to a new platform or application. Such systems are called legacy systems. The data in such systems, when archived, goes on large tapes to remote data centers. An organisation needs many reports from the archive data. Thus, the programming effort for producing these reports is huge.

Of late, the increasing popularity of the personal computer has introduced many new tools/programs for business data analysis. With this, the gap between a computer programmer and the end user has reduced to some extent. Business analysts have many of the tools required (e.g. spreadsheets, advanced word processors) for data analysis and their graphic representation. In fact, due to this, many managers developed their own Decision Support Systems (DSS) based on these tools.

However, such data was usually created independently without any reference to the huge number of transactions residing on the main system; or if it was derived from it, the reports were not timely as it took a long time to process the large amount of archived and current data to produce the result of the query. Due to this, many chose the former approach. As a consequence, hundreds of different unconnected, and sometimes, inconsistent, systems arose on the desktops of business managers. A problem with this scheme is that it makes data fragmented and oriented towards very specific needs. Each user obtains only the information that he needs. Due to the lack of standardisation, the data fragments tend to be incapable of addressing the requirements of multiple users. Data warehousing addresses this issue in an effective manner, so that the individual pieces of knowledge and intelligence no longer remain isolated and scattered.

Data processing was a very common term since the 1960s. Electronic Data Processing (EDP) departments were set up to take care of data processing needs of businesses, military and other institutions. Over the years, this definition was changed to Information Technology (IT). Now, the term EDP is near-obsolete, although a few organisations still like to use it for their computing operational needs.

9.2 WHAT IS A DATA WAREHOUSE?

There are many definitions of a data warehouse, for example the one we saw in the introduction. Going one step further, we can say that a data warehouse is a systematic organisation of data that originates either inside an organisation or outside of it; and which is needed for historical purposes. Examples of data that can go into a data warehouse are: details of orders of a particular product for the last two years and a public database that contains the sales information of all competitors.

The idea is to organise the data in such a way that the summaries of historical data are kept along with the detailed transactions; such that a majority of the high-level management queries can be resolved quickly. Thus, a data warehouse forms the backbone of decision support systems. An important point is that the data stored in data warehouses is not for changes, but for drawing long-term conclusions. This is the main difference between an ordinary operational DBMS and a data warehouse. Any operational transaction (order, invoice, cash recipient etc.) updates a regular operational DBMS. However, data-warehouse is updated only at certain intervals and only summaries of transactions may be produced.

For faster response time and flexibility, the data warehouse may be organised around the concept of Multidimensional DBMS or MDBMS instead of Relational DBMS or RDBMS. This is another way in which it differs from ordinary DBMS. In data warehousing, normalisation of data is not important even if it enhances speed or responsiveness in answering a query; whereas in DBMS, normalisation is done.

Let us illustrate the difference between operational data and a data warehouse with the help of a diagram as shown in Fig. 9.6.

Fig. 9.6 Operational data and a data warehouse

As the figure shows, data warehousing involve a coordinated, planned and periodic copying of data, from various sources both inside and outside an organisation, into an environment that is optimised for analysis and processing of that data. Clearly, the main reason for data warehousing is the inability of the traditional computer systems to provide an organised and systematic access to data. If traditional systems could take care of the problem, data warehousing would not exist in the first place! If you could keep all the data in high-performance disks in the form of databases and retrieve it as and when needed across groups and locations, there would be no need to duplicate it! But as we saw earlier, this is not the case and therefore, data warehousing has emerged as a necessary step in managing historical data.

With the use of data warehousing, companies can identify sales trends, patterns of customer or vendor behaviour or product failure far more quickly than it was possible earlier. Today, data warehousing products that use the Internet as a means of sourcing and delivering data are also available.

9.3 DATA WAREHOUSING CONCEPTS

Having looked at the traditional use of archive data and explored some of the factors paving the way for the evolution of data warehouses, we will now turn our attention to studying the key characteristics of a data warehouse.

  • Warehousing data outside the operational systems

    The whole idea behind data warehousing is that the data stored for business analysis can be accessed most efficiently if it is separated from the data in the running systems. The reasons for separating the current data from analysis data have not changed significantly with the evolution of data warehousing systems. These reasons are related to performance issues and ease of maintenance. It is desired that the current data be accessible faster and with lesser efforts as compared to historical data. Therefore, the two should be kept separate.

  • Integrating data from more than one operational systems

    As discussed, for data warehousing systems to be successful, data from different operational systems can be combined. When the data needs to be brought together from more than one application, it is expected that this integration be done at a place independent of the source applications. Before the evolution of structured data warehouses, analysts would perhaps combine data extracted from more than one operational system into a single spreadsheet or a database. The data warehouse may very effectively combine data from multiple source applications such as sales, marketing, finance and production. Many large data warehouse architectures allow for the source applications to be integrated into the data warehouse on an incremental basis. The main reason for combining data from multiple source applications is the ability to cross-reference data from these applications. For example, the sales and orders data might be combined with the forecasts and the targets to do a variety of analyses.

    Imagine a company with seasonal sales. The management may want to know the orders backlog, actual sales and sales forecast for the next month (current/operational data) along with the sales data of the same month for the last 10 years (historical data). In this case, if we summarise the historical/archive data that the management is likely to require in the future for analysis, admit it in the data warehouse and integrate it along with the operational data (i.e. link it to records based on keys such as product code, customer etc.), it will help. But, for this, the data warehouse analysts have to analyse the management's current and future requirements for decision support reports/queries in order to be able to structure and design the data in the data warehouse.

  • Differences between transaction and analysis processes

    The primary reason for separating data for business analysis from operational data is performance degradation on the operational system. This degradation can happen due to the analysis processes. Operational systems always require high performance and quick response time. Operational systems are designed for acceptable performance for pre-defined transactions. For instance, an order processing system might specify the number of active order taken and the average number of orders in each hour. Even though many of the queries and reports that are run against a data warehouse are predefined, it is nearly impossible to accurately predict the activity against a data warehouse.

    This is why, at the cost of data redundancy/duplication, data warehouse is built with some operational data and some historical summarised data for fast reports/queries.

  • Data is mostly non-volatile

    Another important attribute of the data in a data warehouse system is: it is brought to the warehouse after it has become permanent. This indicates there are no modifications to be made in the data after it is in the data warehouse. This saves data warehousing systems the burden of maintaining transaction and database integrity (rollback etc.) and enhances speed.

  • Data saved for longer periods than in transaction systems

    Data from most operational systems is archived after it becomes inactive. For example, an order may become inactive after a set period from the fulfillment of the order; or a bank account may become inactive after it has been closed for a period of time. Large amounts of inactive data mixed with operational data can significantly degrade the performance of any transaction. Since the data warehouses are designed to be archives for the operational data, they are saved for a very long period.

Thus, in comparison to operational systems, data warehousing systems contain data that is non-volatile, historical and can tolerate with slower responses. These distinctions are very important and are summarised in Fig. 9.7.

Data warehousing means maintaining older data, and making it available as and when needed in the best possible format and access mechanism. It has gained immense popularity with the advent of Customer Relationship Management (CRM) systems.

In short, the separation of operational data from analysis data is the most fundamental data warehousing concept. Not only is the data stored in a structured manner outside the operational system, but businesses today are also allocating considerable resources to build data warehouses.

Fig. 9.7 Transfer of data from operational systems to a data warehouse

  • Using a data warehouse for better business processes

    Having understood what a data warehouse is and its construction at a broad level, let us review the processes supported by a data warehouse. Some of the processes involved in a data warehouse are predefined and not much different from traditional data analysis activity. However, other activities such as multi-dimensional data analysis and information visualisation were simply not available in traditional analysis tools and methods. They have been developed recently and form a very important part of a data warehouse system. This has been shown with the help of a figure later in the chapter.

  • Tools to be used against the data warehouse

    One of the objectives in the creation of a data warehouse is to make it as flexible and as open as possible. It is not desirable to spend too much in terms of software and training for using a data warehouse. The data warehouse should be accessible by most end-user tools and platforms. However, it is not possible to make every feature of the data warehouse available from every end-user tool.

    Primitive tools such as simple querying facilities built into common spreadsheet programs may be sufficient for a user who only needs to quickly refer the data warehouse. Other users may require more powerful multi-dimensional tools for analysis. Data warehouse administrators need to identify the tools that are supported for access to the data warehouse and the features that they offer. A user can start with a primitive, low-level tool that he is already familiar with and, after becoming familiar with the data warehouse, he may be able to justify the cost and effort involved in acquiring more sophisticated tool.

  • Standard reports and queries

    Many data warehouse users need to access a set of standard reports and queries. It is preferable to periodically produce a set of standard reports that are needed by different users, with the help of an automatic process. When these users need a particular report, they can just access the report that has already been produced by the data warehouse system rather than running it themselves. This is more relevant if the reports take a long time to prepare.

    Such a facility would require report server software. It will essentially periodically produce several reports automatically and index them. It is likely that these reports can be accessed only by using the client program for that server system. When the client makes a request for a specific report, the server software can do the index search and send the report if ready, or else can produce it and then send it. This facility would need to work with or even be a subset of the desired data warehouse access tool mentioned in the previous section. Many end-user query and analysis tools now include server software that can be run with the data warehouse to produce reports and query results. Keeping up with technology developments, these tools are now providing a web interface for accessing reports. The data warehouse users and administrators need to consider any reports that are likely to become standard reports for the data warehouse on a continual basis.

  • Queries against summary tables

    The summary reports in a data warehouse can form the basis of analysis. Simple filtering and summation against the summary views can be a major analysis activity against a data warehouse. These summary views contain predefined standard business analysis, based on certain rules. For instance, the product summary view may be the basis for a very large number of queries. Different users could choose different products and the time periods for product sales queries. These queries provide quick response and they very easy to build.

  • Data mining

    Often, the reports and queries run on the summary data are sufficient to answer quite a few what business questions. A drilling down into the real data provides answers to why and how queries. This process is called data mining.

    A data-mining user starts with summary data and drills down into the detailed data looking for arguments to prove or disprove a hypothesis. Data mining tools are evolving rapidly to satisfy the need to understand the behaviour of business units such as customers and products. These tools will go through the heaps of detailed data to find some correlations - again multidimensional - such as if a customer is from New York, then he is likely to place orders worth $500 or above for three successive months starting August (probability = 80%). It is up to the management to validate them or take them seriously. Data mining can use a lot of modern techniques such as neural networks. to find these correlations and find some method to madness in huge archive data.

  • Interface with other data warehouses

    Chances are that a data warehouse system is interfaced with other applications that use it as the source of operational system data. Also, a data warehouse may provide data to other data warehouses or smaller data warehouses called data marts.

    The operational system that interfaces with the data warehouse often become increasingly stable and powerful over a period of rime. As the data warehouse evolves into a reliable source of data that has been consistently moved from the current operational systems, many applications find that a single interface with the data warehouse is much easier and more functional than multiple interfaces with the operational applications. However, much of the operational state information is not carried over to the data warehouse. Thus, data warehouse is obviously not a replacement option for the operational systems. This is shown in Fig. 9.8.

Fig. 9.8 Systems interacting with a data warehouse

The figure shows the analysis processes that run against a data warehouse. Although, to a large extent, the activity against data warehouses is primitive querying, reporting and analysis, sophistication of analysis at the high end is fast evolving. More importantly, all analysis done at a data warehouse is easier and cost-effective as compared to the earlier methods. This simplicity is still the main attraction of data warehousing systems.

9.4 DATA WAREHOUSING APPROACHES

A data warehouse can be simple or it can be complex and the approach to creating a warehouse is determined accordingly. Three major categories of approaches are: enterprise data warehouse, data mart and operational data store. This is shown in Fig. 9.9. These approaches differ in size and complexity. Let us discuss them one by one.

Fig. 9.9 Data warehousing approaches

9.4.1 Enterprise Data Warehouse

An enterprise data warehouse provides a consistent and broad-level view of the entire organisation. The access mechanisms and standards used across the various departments are very similar as a result of this integrated approach. Clearly, the scope and size of an enterprise data warehouse are much bigger than individual data marts. Consequently, these are much more complex to build and maintain. However, they offer a very systematic, structured and centralised view of the data to the entire organisation.

An enterprise data warehouse is not restricted to a department or subject area. It takes years to build and costs many times more than a data mart. In comparison with a data mart, an enterprise data warehouse is updated less frequently (say weekly). The number of users of an enterprise data warehouse could be in the order of hundreds or even thousands.

9.4.2 Data Marts

A data mart is a subset of a data warehouse. Rather than looking at the global picture, a data mart is restricted to a particular subject area. A data mart is usually created specifically for a department or a business process to enable better decision-making. Therefore, there can be a number of data marts, each focusing on a subset of the overall business. People often need quick solutions to improve their departments or focus areas. An enterprise data warehouse for the whole enterprise usually takes a long time to build. Rather than waiting for it to be created, data marts are preferred by these departments. The challenge in this approach is the consistency and the interaction in the future.

Data marts are of two types as shown in Fig. 9.10.

Fig. 9.10 Types of data marts

9.4.2.1 Dependent data mart In this case, the data mart does not have an existence of its own in terms of extracting the data. The data in a dependent data mart is actually brought from the enterprise data warehouse. Hence, it is a subset of the latter. The enterprise data warehouse feeds data to the smaller, focused data marts. This is shown in Fig. 9.11(a). You need to build the enterprise data warehouse first and then extract data marts for better performance, access and management.

Fig. 9.11 Types of data mart

9.4.2.2 Independent data mart An independent data mart does not depend on an enterprise data warehouse. Instead, it derives the data from transactional systems directly and operates independently of the enterprise data warehouse. This is shown inFig. 9.11(b). It must be pointed out that integrating many such independent data warehouses to form a single enterprise data warehouse is not easy, unless it is planned accordingly right at the beginning.

Data marts are useful for focusing on specific areas. For example, when a company wants to improve sales, it might be useful to have a data mart in the sales and marketing department to analyse the sales figures and trends. That could form the basis for further decision-making.

Data marts are smaller and hence simpler to deploy. However, there is always a danger-data marts can grow quickly and can create independent islands of information - a problem that data warehousing itself originally aims to solve! To avoid this, logical integration of data marts into decision-support systems must be planned beforehand. Remember that this is different from trying to physically combine many data marts into an enterprise data warehouse. We are not talking about physical integration here. It is the overall application integration that is being referred to.

9.4.3 Operational Data Stores

The concept of operational data store (ODS) is relatively new. It is used to provide time-sensitive decision-support and operational reporting. Usually, an ODS is built from an operational system and is restricted in the form of subject area. An ODS contains integrated and subject-related integrated data from transactional systems. The data itself may be in a summary or detailed form. As data is modified in an operational system, a copy is sent to ODS. Therefore, in contrast to a data warehouse, data in an ODS changes on an ad-hoc and frequent basis. Data warehouses are subjected to less frequent and systematic changes.

An ODS combines the characteristics of data warehousing systems with that of operational systems. It combines data from one or more operational databases into a central storage from where users can access the latest information in an integrated fashion. An ODS usually contains current, detailed and operational data that can be accessed by many users simultaneously. The data in ODS is mainly used for operational rather than decision-making purposes.

Why have an ODS when you already have an operational system? The reasons are many. As we have seen, the operational systems need the highest performance. Therefore, they are generally not suitable for reporting purposes. Instead, they can provide inputs to ODS systems. ODS is an integrated view of data, rather than the separate views offered by operational systems. Furthermore, an ODS can act as an intermediate storage between an operational system and a data warehouse by holding data before it is sent to the data warehouse.

9.5 ONLINE ANALYTICAL PROCESSING (OLAP)

Data warehousing can be understood better by comparing the normal On-Line Transaction Processing (OLTP) data management activities to the On-Line Analytical Processing (OLAP) activities. Whereas OLTP systems are used for supporting day-to-day business operations, OLAP systems are decision-support systems. OLTP applications capture primitive data about business and are mostly transaction-oriented. On the other hand, OLAP systems need access to operational as well as historical data. Thus, OLAP applications tell an organisation what needs to be done, by analysing data and producing decision-support reports used by business managers. Also, OLTP applications help the organisation to actually act on these decisions! In many ways, OLTP applications are the basic ingredients of the Information Technology (IT) department of an organisation. On top are OLAP applications which provide various add-ons for quick decision-making on customer-care and services, increasing revenues, and so on. Clearly, OLAP applications need a data warehouse for these purposes.

Traditionally, creating, using and managing decision-support systems required a number of specialised tools and databases. However, there were several limitations, such as:

  • Lack of analytical sophistication
  • Database layout limitations
  • Inability to handle or process large amounts of data

For overcoming these limitations, organisations made use of many technology solutions, such as data warehousing/data mining and OLAP tools.

OLAP applications provide ways and means to analyse complex data, based on natural and intuitive set of business rules and dimensions, such as the profit figures per market, territory and product. Additionally, OLAP insulates a common user from the technical details of data storage and organisation. Thus, even non-technical users can use OLAP to perform analysis based on the available data. OLAP applications provide for complicated computations, which include sophisticated time-series analysis and also random ad-hoc interactive analysis. Example of the latter is a sales manager starting with a low-performing market and tracing the cause of the problem all the way down to a small feature in one product.

Decision support systems usually have limitations in performing detailed analysis due to one or more of the following reasons:

OLAP is a generic term for a group of decision support applications. Data is usually entered into a relational database through an online system that supports the notion of transactions. Once it is in the database, data is offloaded, reformatted and accessed in special ways to improve the execution and answer complex queries, which is not possible otherwise.

OLAP is categorised into four categories as shown in Fig. 9.12.

9.5.1 Desktop OLAP

Desktop OLAP is also called as client-side OLAP. Examples of this category are products that send data from a server to a client desktop computer. The data processing is mainly client-centric. It is used for end-user reporting and analysis. Small data warehouses are the best bet for desktop OLAP.

Online Analytical Processing (OLAP) is different from Online Transaction Processing (OLTP). OLAP is more closely related with decision-making, whereas OLTP is associated with day-to-day operations of computer systems which may or may not be related to decision-making.

Fig. 9.12 Types of OLAP

9.5.2 Relational OLAP (ROLAP)

The Relational OLAP stresses on query execution and processing inside the source database from where data was picked up. In ROLAP, records are stored as rows and columns. This gives a flat view of the data as shown inFig. 9.13. The figure shows the number of tourists in a season at different places, preferring different lodging types. This might be useful in certain cases. However, for a more detailed analysis, this may not be sufficient.

Fig. 9.13 A Relational data visualisation using ROLAP

9.5.3 Multidimensional OLAP (MOLAP)

Multidimensional OLAP is a specialised server-side database. It takes data from relational databases of transactional environments and stores it in a unique format. This allows faster query execution. Data at summary level is usually stored, and thus offers multiple views or dimensions of the original data. Especially useful in forecasting, financial planning and budgeting, the main characteristics of MOLAP are fast access to summary data and data analysis combined with data updates.

Let us contrast MOLAP from ROLAP diagrammatically. Contrast Fig. 9.14 with the earlier one to get a multidimensional view of the same data (the data are different, the main aim is to show the differences between ROLAP and MOLAP).

Fig. 9.14 Example of MOLAP

The name multidimensional comes from the fact that the data view is not just two-dimensional, but in the form of hypercubes. For example, the product data can be analysed against time and different locations-clearly a case for multi-dimensional analysis. This can be done even with relational databases, but not without compromising on access speed and complexity, as explained below.

A specialised Multidimensional Database Management System (MDBMS) is used in MOLAP. The internal design of an MDBMS differs from that of a conventional Relational Database Management System (RDBMS). MDBMS is more suitable for data warehousing applications and offers better access performance. While an RDBMS is optimised for data updates, an MDBMS is optimised for data access. A standard RDBMS joins two tables at a time. However, if you want to join three tables, two separate steps are required: first join A and B to form an intermediate table, say T; and then join table T with C. As a generalised convention, joining n tables needs (n-1) joins. However, using what is called as a star join, an MDBMS can join up to eight tables at one go. This improves performance tremendously.

Other reasons that make MDBMS perform much better in comparison to RDBMS are attributed to the fact that an MDBMS is read-only. Therefore, no transaction support, locking and log maintenance are required.

9.5.4 Hybrid OLAP

Hybrid OLAP is a combination of MOLAP and ROLAP. Such requirements are felt with relation to products that need to support both the client-side and server-side processing within a database. A typical case for hybrid OLAP is data center or a network-based processing of operational data stores.

For each of these categories, ready-made products are available in the market, which perform the OLAP job.

Client-side OLAP

Data mart

Data warehouse

Dependent data mart

Enterprise data warehouse

Hypercube

Information

Master database

Multidimensional OLAP (MOLAP)

Online Analytical Transaction Processing (OLAP)

Operational Data Store (ODS)

Port

Semistructured information

Strategic decisions

Tactical decisions

Data

Data mining

Decision Support Systems (DSS)

Desktop OLAP

Hybrid OLAP

Independent data mart

Legacy systems

Multidimensional Database Management System (MDBMS)

Neural networks

Online Transaction Processing (OLTP)

Operational decisions

Relational OLAP

Star join

Structured information

Unstructured information

  • Information is necessary for the successful running of any organisation.
  • People at various levels, including the managerial staff, need information to carry out their day-to-day activities.
  • Raw facts are data, which, when processed, become information.
  • Information can be structured, semistructured, or unstructured.
  • A data warehouse is the organised data that is situated after and outside the normal operational systems.
  • A data warehouse is a systematic organisation of data that originates either inside an organisation or outside of it, and which is needed for historical purposes.
  • The main characteristics of a data warehouse system are: it is outside the operational systems, it integrates data from more than one operational system, it is different from transaction-based systems and has longer life.
  • A data-mining system starts with summary data and drills down into the detailed data looking for arguments to prove or disprove a hypothesis.
  • Three major categories of approaches are generally made in the context of data warehouses: enterprise data warehouse, data mart and operational data store.
  • An enterprise data warehouse provides a consistent and broad-level view of the entire organisation.
  • A data mart is a subset of a data warehouse. Rather than looking at the global picture, a data mart is restricted to a particular subject area.
  • The operational data store (ODS) is used to provide time-sensitive decision-support and operational reporting.
  • The data in a dependent data mart is actually brought from the enterprise data warehouse.
  • An independent data mart does not depend on an enterprise data warehouse. Instead, it derives the data from transactional systems directly and operates independently of the enterprise data warehouse.
  • Online Analytical Processing (OLAP) applications tell an organisation what needs to be done by analysing data and producing decision-support reports used by business managers.
  • OLAP can be classified into Desktop OLAP, Relational OLAP, Multidimensional OLAP and Hybrid OLAP.

  1. Data is the same as information.
  2. Data is raw facts.
  3. Information is unprocessed data.
  4. Senior management needs data.
  5. Data warehouse deals with live data.
  6. Data mining and data marts mean the same thing.
  7. An independent data mart does not depend on an enterprise data warehouse.
  8. ODS stands for Online Decisions and Support.
  9. OLTP and OLAP mean different things.
  10. MOLAP is the same as hybrid OLAP.

1. Raw facts are called as __________.
(a) data
(b) information
(c) entropy
(d) none of the above
2. Unprocessed information is ________.
(a) data
(b) detail
(c) entropy
(d) none of the above
3. Senior management takes _______.
(a) strategic decisions
(b) tactical decisions
(c) operational decisions
(d) all decisions
4. Middle management takes _______.
(a) strategic decisions
(b) tactical decisions
(c) operational decisions
(d) all decisions
5. Supervisory management takes _______.
(a) strategic decisions
b) tactical decisions
(c) operational decisions
(d) all decisions
6. Data warehouse deals with _______.
(a) past data
(b) live data
(c) data of the future
(d) None of them
7. _______ deals with the entire organisation.
(a) Data mine
(b) Data mart
(c) Data mart
(d) Enterprise data warehouse
8. _______ is a subset of a data warehouse.
(a) File
(b) Table
(c) Data mart
(d) Data mine
9. In a _______ we start with summary data and then move into detailed data.
(a) data mine
(b) data mart
(c) data mine
(d) enterprise data warehouse
10. _______ is a combination of _________ and ________.
(a) Hybrid OLAP, ROLAP, MOLAP
(b) MOLAP, ROLAP, desktop OLAP
(c) ROLAP, desktop OLAP, hybrid OLAP
(d) ROLAP, MOLAP, hybrid OLAP

  1. Discuss the terms data and information.
  2. Why is information useful to organisations?
  3. Discuss the levels of management and the information that they need.
  4. What is a data warehouse? What are its types?
  5. Describe the term data mart.
  6. What are dependent and independent data marts?
  7. What is data mining?
  8. What is OLAP?
  9. What is a MOLAP?
  10. Discuss the term ODS.

  1. What is Management Information System (MIS)?
  2. How are forecasts related to information?
  3. Why has past data gained so much of prominence in the last few years?
  4. Study any one data-warehousing project.
  5. Which RDBMS products support data warehouse concepts?
  6. Is it easier to perform data mining by using a file system as compared to a DBMS? Why?
  7. In which fields is MOLAP more useful?
  8. Find information about data warehousing tools.
  9. Is data warehousing related to Enterprise Application Integration (EAI)? Investigate.
  10. Study the concept of information overload.