QlikView has rightly positioned itself as an extremely fast and user freindly analytical interface for decision support. Qlikview sits comfortably on top of a DW or can even be deployed as an ETL to Analysis (all in one tool). The debates are many as to wheather or not QlikView is sufficient, or even adequate, as an ETL and data storage tool and I will not be fuelling that debate any more or less in this post.
What is, however, apparent is that there are a hive of best practices from the DW and ETL industry that can be reused and applied directly in QlikView projects. In this first post in a forthcoming series on QlikView and Data Warehousing I will be examining how Kimballs Dimensional Modelling can be applied in Qlikview projects.
In short, Kimballs Dimensional Modelling can be condensed into these steps:
• Select the business process:
Business process events generate or capture performance metrics that translate into facts in a fact table. Most fact tables focus on the results of a single business process. Choosing the process is important because it deﬁnes a speciﬁc design target and allows the grain, dimensions, and facts to be declared
• Declare the grain:
Atomic grain refers to the lowest level at which data is captured by a given business process. this is especially important as granularity will determine EXACTLY what your fact tables will present. Deciding the grain for your fact tables is largely dependant upon the source data. Deciding the source data is therefore a necessary step when declaring the grain.
• 1. Decide on your sources of data.
• 2. Declare the grain of the fact table (preferably at the most atomic level).
• Identify the dimensions:
• Add dimensions for “everything you know” about the grain.
• Identify the facts:
• Add numeric measured facts true to the grain.
Consider now a typical QlikView project and the steps normally associated with it:
• Identify and limit the problem to be solved
• Acces the raw data. (E)
• Transform the raw data (T)
• Build the visualisation (L)
Steps 2,3 and 4 are typically regarded as the ETL within the development process. The extraction of the raw data is followed by a transformation which results in the loading and visualisation of the data.
Introducing Kimballs dimensional modelling into our QlikView Development process would yield the following:
1. Identify and limit the problem to be solved.
2. Declare the grain.
3. Identify the dimensions.
4. Identify the facts.
5. Acces the raw data.
6. Transform the raw data.
7. Build the visualisation.
What then does Kimballs Dimensional Modelling give to QlikView and why should it be incorporated into the QlikView Development process? Well, firstly, Kimballs Dimensional modelling is a De Facto standard when building large EDW (Enterprise Data Warehouses) in the world today and as such is a modelling method with a huge number of references and success stories. Secondly, and prehaps more importantly, is the fact that all QlikView applications are entirely dependant upon the quality of the data model. Thirdly, the Kimball model gives our datamodel a robustness and authority, thus making it easier for Developers to justify and motivate design chioces in the modelling stage.
How then should Developers approach the Kimball model, and how is it used to enforce and strengthen the QlikView datamodel? Using Kimballs Dimensional modelling is not difficult but requires an understanding for what denormalization means, what Business keys are and what Surrogate keys are. Furtermore, it is an advantage for Developers to have an an understanding of Slowly Changing Dimensions and how these are implemented.
When starting the Modelling process I prefer to condcut a workshop with the Business Users involved in the process I am attempting to visualize. The result of this workshop will give a high level Conceptual Model of the facts and dimensions within the area to be modelled. When dealing with Business Users I tend to avoid using terms such as measures, facts and dimensions. Instead, I will encourage the users to talk about the values or figures that they are interested in and how they usually look at these figures. I find Xmind to be a particulary efficient tool when conducting user workshops.
The Conceptual model gives us the basis of our facts and dimensions. In order to be able to finalize the structure of our Dimension and fact tables we will need access to the Source Systems. Once this is established we can define our Business keys and our remaining Dimension attibutes.
Of particular use in this stage of the modelling process is the Kimball toolkit, in esscense an Excel sheet where the dimension and fact tables can exactly be defined.
Step one in the defining of dimension tables is to correctly establish the Buiness key for the table. A Business key or natural key is an index which identifies uniqueness of a row based on columns that exist naturally in a table according to business rules. For example, Business keys are Customer code in a Customer table. The Primary key in the dimension is a Surrogate, or Technical key. The Surrogate key is simply an integer (1,2,…n) for each row in the dimension table. The Surrogate key will be the link between the fact table and the dimension table.
Once the Business key is ascertained remaining attributes in the dimension table are defined and described. For each column we will define how Nulls are handled and we will define the Display name for the column. This is of particular interest to the end users as the display name column dictates how the data will be labelled in the final application. We will even define the SCD Type for the attributes in the Dimension table. I will save SCD (Slowly Changing Dimension) for a later post. For now it is sufficient to say that the SCD dictates how we will manage changes to the dimension table.
When the dimension tables are built we will also take into consideration the data type for each attribute. For text attributes the char size will be defined as will the data format type for date attibutes. Default values will also be defined.
In a similar way we will define the original source data types, the table they came from, or file, and the original attribute name.
Of particular interest in the dimension tables are the attributes for auditing and tracking changes. Audit keys are used to track the incoming data thereby making it easier to distinguish What was loaded and when. It is a common Practice to define Audit keys for Extraction, insert and update.
Audit keys can easily be defined in the QlikView Load script. By simply using the time function derived from the Now() function We can obtain a integer value.
(Click to view a larger picture)
Note that the Now() function utilises the timer mode. We pass 0 into Now() to ensure a New value for each auditkey. By passing 0 into the Now() function We get a value each time the script is executed.
For in depth information about timer mode settings I recommend the following post by Matt Fryer: qlikview-funtions-today-and-now
The row is current attribute indicates the status of the current record (true , false) and the start and end dates dictate when the row became valid and when it expired.
The completed excel sheet is our metadata for the data model. We can use This meta data to comment Fields and or tables in our Load script.
The column definitions in the Load script follow the instructions of the metadata, allowing us to correctly manage Null values and providing the correct formatting to the read data. As datatypes do not exist in QlikView, providing the right format is a good practice to optimize the loading of data into RAM. A particularly Good post on how QlikView stores data in memory can be read here: symbol-tables-and-bit-stuffed-pointers
As I have previously explained, the Kimball dimensional model gives us a way to involve end users and encourages the Developer to think about the Data model and how the data from the source systems can be modelled in facts and dimensions. By using the dimensional modelling toolkit, we can involve end users and encourage them to think about how the data will be displayed. Furtermore, the toolkit gives us excellent MetaData which we can either use to comment fields and tables in the QlikView script or load as a separate entity into our Data model.
In this post we have covered an introduction into Kimballs dimensional modelling and how we can use these principles in QlikView projects. In the next post we will look at Slowly changing dimensions and how we can impement these in Qlikview.