Data warehouses should be fast and accessible. Data modelling is the way to ensure this, says Ralph Kimball
Dimensional modelling is a technique for data warehouse design that has become popular because it is good at handling end-user queries. Despite its widespread acceptance, however, some misperceptions persist in the industry.
These false assertions are a distraction, especially when you want to align your team around common best practices. If folks in your organisation continually lob criticisms about dimensional modelling, their perceptions may be clouded by some common misunderstandings. This list of myths may help: it is extracted from The Data Warehouse Toolkit – The Definitive Guide to Dimensional Modelling by Ralph Kimball and Margy Ross.
Myth 1: Dimensional Models are Only for Summary Data
This first myth is frequently the root cause of ill-designed dimensional models. Because you can’t possibly predict all the questions asked by business users, you need to provide them with queryable access to the most detailed data so they can roll it up based on the business question. Data at the lowest level of detail is practically impervious to surprises or changes. Summary data should complement the granular detail, providing improved performance for common queries, but it should never replace the details.
A corollary to this first myth is that only a limited amount of historical data should be stored in dimensional structures. Nothing about a dimensional model prohibits storing substantial history. The amount of history available in dimensional models should be driven solely by the business’s requirements.
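The value of keeping the atomic grain can be sketched concretely. The following is a minimal illustration using SQLite, with hypothetical table and column names (`sales_fact`, `store_key`, and so on) invented for the example: because the fact table holds one row per product per store per day, the same rows can be rolled up along whichever dimension a new business question demands.

```python
import sqlite3

# Hypothetical retail example: a fact table stored at the lowest grain
# (one row per product per store per day), as Myth 1 advises.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    date_key    TEXT,
    store_key   TEXT,
    product_key TEXT,
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO sales_fact VALUES
    ('2024-01-01', 'S1', 'P1', 2, 20.0),
    ('2024-01-01', 'S1', 'P2', 1, 15.0),
    ('2024-01-01', 'S2', 'P1', 3, 30.0),
    ('2024-01-02', 'S1', 'P1', 1, 10.0);
""")

# Because the grain is atomic, users can roll the data up by store...
by_store = conn.execute(
    "SELECT store_key, SUM(amount) FROM sales_fact "
    "GROUP BY store_key ORDER BY store_key"
).fetchall()

# ...or by product, from the very same rows. A table pre-summarised by
# store could never have answered this second question.
by_product = conn.execute(
    "SELECT product_key, SUM(amount) FROM sales_fact "
    "GROUP BY product_key ORDER BY product_key"
).fetchall()

print(by_store)    # [('S1', 45.0), ('S2', 30.0)]
print(by_product)  # [('P1', 60.0), ('P2', 15.0)]
```

A summary table by store would have discarded the product detail for good; the granular fact table supports both roll-ups, plus any others the business invents later.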
Myth 2: Dimensional Models are Departmental, Not Enterprise
Rather than drawing boundaries based on organisational departments, dimensional models should be organised around business processes, such as orders, invoices, and service calls. Multiple business functions often want to analyse the same metrics resulting from a single business process. Multiple extracts of the same source data that create multiple, inconsistent analytic databases should be avoided.
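The process-centric organisation can be sketched as follows. This is a minimal SQLite illustration with hypothetical names (`orders_fact`, `product_dim`): a single fact table models the orders process once, and different business functions simply aggregate different measures from the same rows, rather than each department extracting its own copy of the source data.

```python
import sqlite3

# Hypothetical sketch: one fact table per business process (orders),
# not one extract per department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key TEXT PRIMARY KEY, category TEXT);
CREATE TABLE orders_fact (
    order_id INTEGER, product_key TEXT, units INTEGER, revenue REAL
);
INSERT INTO product_dim VALUES ('P1', 'Hardware'), ('P2', 'Software');
INSERT INTO orders_fact VALUES
    (1, 'P1', 2, 200.0),
    (2, 'P2', 1, 99.0),
    (3, 'P1', 1, 100.0);
""")

# Finance's view of the orders process: revenue by category.
revenue = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM orders_fact f JOIN product_dim d ON f.product_key = d.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()

# Fulfilment's view of the same process: units shipped by category.
units = conn.execute("""
    SELECT d.category, SUM(f.units)
    FROM orders_fact f JOIN product_dim d ON f.product_key = d.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()

print(revenue)  # [('Hardware', 300.0), ('Software', 99.0)]
print(units)    # [('Hardware', 3), ('Software', 1)]
```

Because both functions query the one orders fact table, their numbers reconcile by construction; duplicate departmental extracts are where inconsistencies creep in.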
Myth 3: Dimensional Models are Not Scalable
Dimensional models are extremely scalable. Fact tables frequently have billions of rows; fact tables containing two trillion rows have been reported. The database vendors have wholeheartedly embraced DW/BI and continue to incorporate capabilities into their products to optimise dimensional models’ scalability and performance.
Both normalised and dimensional models contain the same information and data relationships; the logical content is identical. Every data relationship expressed in one model can be accurately expressed in the other. Both normalised and dimensional models can answer exactly the same questions, albeit with varying difficulty.
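The equivalence can be demonstrated directly. In this minimal SQLite sketch (all table and column names are hypothetical), the same product information is stored twice: once normalised, with the category split into its own table, and once as a flattened dimension. Both schemas answer the same question and return the same result; the dimensional query simply needs one fewer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised form: category factored into its own table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_norm (product_id INTEGER PRIMARY KEY, category_id INTEGER);
-- Dimensional form: one flattened product dimension.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category_name TEXT);
-- Shared fact rows.
CREATE TABLE sale (product_id INTEGER, amount REAL);
INSERT INTO category VALUES (1, 'Books'), (2, 'Games');
INSERT INTO product_norm VALUES (10, 1), (11, 2);
INSERT INTO product_dim VALUES (10, 'Books'), (11, 'Games');
INSERT INTO sale VALUES (10, 12.0), (11, 30.0), (10, 8.0);
""")

# Sales by category, asked of the normalised schema (two joins).
q_norm = conn.execute("""
    SELECT c.name, SUM(s.amount) FROM sale s
    JOIN product_norm p ON s.product_id = p.product_id
    JOIN category c ON p.category_id = c.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# The same question, asked of the dimensional schema (one join).
q_dim = conn.execute("""
    SELECT d.category_name, SUM(s.amount) FROM sale s
    JOIN product_dim d ON s.product_id = d.product_id
    GROUP BY d.category_name ORDER BY d.category_name
""").fetchall()

print(q_norm)  # [('Books', 20.0), ('Games', 30.0)]
print(q_dim)   # identical result from the dimensional schema
```

Same logical content, same answer; the difference is purely in how much work the query (and the business user writing it) must do.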
Myth 4: Dimensional Models are Only for Predictable Usage
Dimensional models should not be designed by focusing on predefined reports or analyses; the design should centre on measurement processes. Obviously, it’s important to consider the BI application’s filtering and labelling requirements. But you shouldn’t design for a top ten list of reports in a vacuum because this list is bound to change, making the dimensional model a moving target. The key is to focus on the organisation’s measurement events, which are typically stable, unlike analyses that are constantly evolving.
A related corollary is that dimensional models aren’t responsive to changing business needs. On the contrary, because of their symmetry, dimensional structures are extremely flexible and adaptive to change. The secret to query flexibility is building fact tables at the most granular level. Dimensional models that deliver only summary data are bound to be problematic; users run into analytic brick walls when they try to drill down into details not available in the summary tables. Developers also run into brick walls because they can’t easily accommodate new dimensions, attributes, or facts with these prematurely summarised tables. The correct starting point for your dimensional models is to express data at the lowest detail possible for maximum flexibility and extensibility. Remember, when you presuppose the business question, you’ll likely pre-summarise the data, which can be fatal in the long run.
As the architect Mies van der Rohe is credited with saying, “God is in the details.” Delivering dimensional models populated with the most detailed data possible ensures maximum flexibility and extensibility. Delivering anything less in your dimensional models undermines the foundation necessary for robust business intelligence.
Myth 5: Dimensional Models Cannot be Integrated
Dimensional models most certainly can be integrated if they conform to the enterprise data warehouse bus architecture. Conformed dimensions are built and maintained as centralised, persistent master data in the ETL system and then reused across dimensional models to enable data integration and ensure semantic consistency. Data integration depends on standardised labels, values, and definitions. It is hard work to reach organisational consensus and then implement the corresponding ETL rules, but you can’t dodge the effort, regardless of whether you’re populating normalised or dimensional models.
Presentation area databases that don’t adhere to the bus architecture with shared conformed dimensions lead to standalone solutions. You can’t hold dimensional modelling responsible for organisations’ failure to embrace one of its fundamental tenets.
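Conformed dimensions can be sketched in the same minimal SQLite style (again with hypothetical names such as `customer_dim`): one dimension table is built once and reused by fact tables from two different business processes, so metrics from separate processes line up on identical labels — the "drill-across" integration the bus architecture enables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed customer dimension, maintained once in the ETL system...
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, region TEXT);
-- ...reused by fact tables from two separate business processes.
CREATE TABLE orders_fact  (customer_key INTEGER, revenue REAL);
CREATE TABLE service_fact (customer_key INTEGER, calls INTEGER);
INSERT INTO customer_dim VALUES (1, 'North'), (2, 'South');
INSERT INTO orders_fact  VALUES (1, 100.0), (2, 50.0), (1, 25.0);
INSERT INTO service_fact VALUES (1, 3), (2, 1);
""")

# Drill-across report: because both facts share the same conformed
# dimension, revenue and service calls align on the same region labels.
report = conn.execute("""
    SELECT d.region, o.revenue, s.calls
    FROM customer_dim d
    JOIN (SELECT customer_key, SUM(revenue) AS revenue
          FROM orders_fact GROUP BY customer_key) o
      ON d.customer_key = o.customer_key
    JOIN (SELECT customer_key, SUM(calls) AS calls
          FROM service_fact GROUP BY customer_key) s
      ON d.customer_key = s.customer_key
    ORDER BY d.region
""").fetchall()

print(report)  # [('North', 125.0, 3), ('South', 50.0, 1)]
```

Had each process maintained its own private customer table, with its own keys and labels, no such report could be assembled without the reconciliation effort the bus architecture is designed to avoid.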
Ralph Kimball is one of the pioneers of data warehousing, and his Kimball methodology, or dimensional modelling, is a standard in decision support. This is an extract from The Data Warehouse Toolkit – The Definitive Guide to Dimensional Modelling by Ralph Kimball and Margy Ross.