In my previous post we discussed Kimball's dimensional modelling and how it provides a structure and methodology for the QlikView data modelling process. Understanding the Kimball model is relatively straightforward, but it requires the QlikView developer to be well versed in topics such as star schemas, slowly changing dimensions, dimensions, facts and audit keys. In this blog post I will be discussing Slowly Changing Dimensions Type 1, which is the simplest SCD type and certainly the most common.
Before we delve into SCD logic and how we implement it in QlikView, I will take a quick look at new rows and how these are managed.
Consider first a source table, which we use to populate our dimensions. Before a new row is added to the database we can see that we have 19 rows.
We add a new row.
When we run the extract job the changes can clearly be seen in the QVD viewer (the QVD viewer is not part of the default QlikView installation but can be fetched free of charge here).
When loading the dimension table we first need to establish any differences between the source table and the rows in the dimension table. We will use our business key (ID in the extract table) to find these changes. The business key will have been defined in the modelling stage and is an index that identifies the uniqueness of a row based on columns that exist naturally in a table according to business rules. We name our business keys with the convention BK<keyname>.
A simple QlikView script will find the rowcount of the dimension table in its current state. We will assign the counted value to a local variable, vL.MaxRecVersionNo, for later use. Note that the variable name prefixes follow the QDF naming convention. I will cover QDF in detail in a later post; for now it is sufficient to say that the vG prefix denotes a global variable whereas vL denotes a local variable.
A similar script will give us the rowcount of the extracted data. Again, the counted value will be stored in a variable, vL.Counter.
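A sketch of the two rowcount steps might look like this; the QVD file names and the vG.QvdPath variable are assumptions for this example:

```
// Read the rowcounts from the QVD headers without loading the tables.
// QvdNoOfRecords() only parses the XML header of the QVD, so this is fast.

// Current number of rows in the dimension table
LET vL.MaxRecVersionNo = QvdNoOfRecords('$(vG.QvdPath)Dim_Player.qvd');

// Number of rows in the extracted data
LET vL.Counter = QvdNoOfRecords('$(vG.QvdPath)Ext_Player.qvd');
```

Using QvdNoOfRecords() avoids loading the full tables just to count their rows.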
We now have two assigned variables, vL.Counter and vL.MaxRecVersionNo. We will use these variables in the conditional syntax for the sub calls: if the counter variable is greater than the total number of rows in the current dimension, the sub will be called.
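The conditional sub call can be sketched as follows; the sub name AppendNewRows is a placeholder for this example:

```
// Call the append sub only when the extract contains more rows
// than the current dimension table
IF $(vL.Counter) > $(vL.MaxRecVersionNo) THEN
    CALL AppendNewRows;
END IF
```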
Appending to an existing QVD file is relatively straightforward. The source data is loaded and formatted in accordance with the metadata from the modelling process. Observe that we use the variable vL.MaxRecVersionNo to filter out the rows that are new.
The next step when appending rows to an existing QVD is simply to concatenate the existing data onto the new data.
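The whole append step could be sketched like this; the field names and file paths are assumptions based on the example model:

```
SUB AppendNewRows

    // Load only the new rows from the extract, filtered on the stored rowcount
    Dim_Player:
    LOAD
        ID      AS BK_PlayerID,
        Name,
        Adress
    FROM [$(vG.QvdPath)Ext_Player.qvd] (qvd)
    WHERE RecNo() > $(vL.MaxRecVersionNo);

    // Concatenate the existing dimension data onto the new rows
    CONCATENATE (Dim_Player)
    LOAD * FROM [$(vG.QvdPath)Dim_Player.qvd] (qvd);

    STORE Dim_Player INTO [$(vG.QvdPath)Dim_Player.qvd] (qvd);
    DROP TABLE Dim_Player;

END SUB
```

Note that the WHERE clause means the QVD load will not be optimized, which is an acceptable trade-off for a small dimension table.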
Slowly changing dimensions (SCD)
A common scenario in EDW (Enterprise Data Warehouse) environments is how changes to dimension tables are managed. There are a number of definitions for managing SCDs, referred to as Types, ranging from 0 to 7. We will not cover all of these types but will focus on the most common, Type 1 and Type 2. Type 2 will be covered in another post. For a detailed account of SCDs, please read here.
In essence, SCD logic refers to the capturing of slow and unpredictable changes to our dimensions. Consider the following scenario: you have a patient database with records of all your patients. Your patient data is associated with the patient journal. When the data in your patient register changes, for example the address, postal area or name, you will most likely want to record these changes in order to preserve a history of your patients. This is what is called Slowly Changing Dimensions, and it is a challenge for all BI vendors.
In my data model I have defined a dimension table with the name Dim_Player. During the modelling stage I established that I needed to monitor changes to players' addresses. I did not need to preserve history for any addresses, but I wanted to know that a change had taken place. This type of change is called SCD Type 1 and is possibly one of the most common SCD cases.
In the QVD viewer we can see the current state of the dimension table. Assume that we are going to change the address details for Player 19. We can see that the current address value for this player is Hemvägen.
We now make the change to the source table, changing the Adress field to Granitvägen.
After we have extracted the data we can see that the new address for Player 19 has been imported.
The question now is how these changes are incorporated into the dimension table. The main part of the solution is the QlikView Hash function. We use this function first when we extract the raw data, applying the hash function to the columns we want to check for changes.
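The extract load with the hash column might look like this; Hash128() is one of QlikView's hash functions, and the source file name and path variables are assumptions:

```
// Hash the columns we want to monitor for changes into a single SCD column
Ext_Player:
LOAD
    ID,
    Name,
    Adress,
    Hash128(Adress) AS Ext_SCD1   // add more fields here to monitor them too
FROM [$(vG.SourcePath)Player.csv]
(txt, utf8, embedded labels, delimiter is ',');

STORE Ext_Player INTO [$(vG.QvdPath)Ext_Player.qvd] (qvd);
```

If several columns should be monitored, they are all passed to the same Hash128() call so that a change in any one of them produces a new hash value.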
If we check the contents of the csv file we can easily see the values that have been generated for the hash column, in this case Ext_SCD1.
When we loaded the dimension table a similar SCD file was generated to keep track of the current SCD values in the dimension table. We will now compare the hash values of these two files to find any changes. First we create a mapping table using the SCD file generated from the extract.
We then load the SCD file created from the current dimension table.
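These two loads can be sketched as follows; the SCD file names are assumptions for this example:

```
// Mapping table from the extract's SCD file: business key -> new hash value
Map_Ext_SCD1:
MAPPING LOAD
    ID,
    Ext_SCD1
FROM [$(vG.QvdPath)Ext_SCD1.qvd] (qvd);

// SCD file generated from the current dimension table
Dim_SCD:
LOAD
    BK_PlayerID,
    Dest_SCD1
FROM [$(vG.QvdPath)Dest_SCD1.qvd] (qvd);
```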
A quick review of the dimension table SCD file (the column has been named Dest_SCD1) reveals that player ID 19 has been changed, and as such we expect to see a change in this row when we reload the dimension.
We now map the two SCD tables with the help of the business key.
In order to manage the changes correctly we compare the two SCD values and mark them accordingly.
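The mapping and marking can be done in one pass; this sketch assumes the tables from the previous steps, with Map_Ext_SCD1 as the mapping table and Dim_SCD holding the dimension's current hash values:

```
// Map the extract hash onto each dimension row via the business key
// and flag the rows where the two hash values differ
SCD_Compare:
LOAD
    BK_PlayerID,
    Dest_SCD1,
    ApplyMap('Map_Ext_SCD1', BK_PlayerID, Dest_SCD1) AS Ext_SCD1,
    If(ApplyMap('Map_Ext_SCD1', BK_PlayerID, Dest_SCD1) <> Dest_SCD1,
       1, 0)                                         AS IsChanged
RESIDENT Dim_SCD;

DROP TABLE Dim_SCD;
```

Using Dest_SCD1 as the default value in ApplyMap() means that business keys missing from the extract are treated as unchanged.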
We find a count of the changed rows, which will be used in the conditional syntax for calling the SCD1 load sub.
The value of vL.ChangedRows is passed as a condition into the SCD1 subroutine. If the value is greater than 0, the sub will be called.
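The count and the conditional call can be sketched as follows; the flagged comparison table SCD_Compare and the sub name LoadSCD1 are placeholders for this example:

```
// Count the rows flagged as changed
TempCount:
LOAD Count(BK_PlayerID) AS ChangedRows
RESIDENT SCD_Compare
WHERE IsChanged = 1;

LET vL.ChangedRows = Peek('ChangedRows', 0, 'TempCount');
DROP TABLE TempCount;

// Only run the SCD1 load when at least one row has changed
IF $(vL.ChangedRows) > 0 THEN
    CALL LoadSCD1;
END IF
```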
We load the Business key and the SCD value for the changed row.
We use an inner join on the extracted data to find the exact row that has been changed. This is the only row that will be loaded.
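A sketch of these two steps, assuming the comparison table SCD_Compare from earlier; the inner join picks out exactly the extract rows that match a changed business key:

```
// Business keys and new SCD values of the changed rows only
Changed_Rows:
LOAD
    BK_PlayerID,
    Ext_SCD1 AS SCD1
RESIDENT SCD_Compare
WHERE IsChanged = 1;

// Inner join keeps only the extract rows matching a changed business key
INNER JOIN (Changed_Rows)
LOAD
    ID AS BK_PlayerID,
    Name,
    Adress
FROM [$(vG.QvdPath)Ext_Player.qvd] (qvd);
```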
The next step is to recreate the dimension table from the resident table just created. We hardcode a value for the RowChangeReason column.
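Rebuilding the dimension from the resident table could look like this; it assumes the changed rows sit in a resident table Changed_Rows, and the RowChangeReason text and the vL.UpdateAuditKey variable are illustrative:

```
// Changed rows, with a hardcoded change reason and the update audit key
Dim_Player:
NOCONCATENATE LOAD
    BK_PlayerID,
    Name,
    Adress,
    SCD1,
    'SCD1 update'          AS RowChangeReason,
    '$(vL.UpdateAuditKey)' AS UpdateAuditKey
RESIDENT Changed_Rows;

// Keep all unchanged rows from the current dimension table
CONCATENATE (Dim_Player)
LOAD * FROM [$(vG.QvdPath)Dim_Player.qvd] (qvd)
WHERE NOT Exists(BK_PlayerID);

STORE Dim_Player INTO [$(vG.QvdPath)Dim_Player.qvd] (qvd);
```

The Exists() test works because the changed business keys are already loaded into the data model, so the old versions of those rows are filtered out.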
After loading we can clearly see the changes that have been made. The audit keys help us to keep track of the changes.
Tracking changes in a data warehouse is a common practice and gives us the ability to easily see what was changed and when. We use three types of audit key to track the main processes during the ETL stage: extract, insert and update. The only requirement for the audit key is that it is unique for each time the job is run. A simple way of achieving this is by retrieving the time from the Now() function.
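Generating the audit keys is then a one-liner at the start of each job; the variable names below are illustrative:

```
// One audit key per ETL process, unique for each run of the job.
// Num(Now()) returns the timestamp as a serial number with fractional seconds.
LET vL.ExtractAuditKey = Num(Now());   // set at the start of the extract job
LET vL.InsertAuditKey  = Num(Now());   // set at the start of the insert job
LET vL.UpdateAuditKey  = Num(Now());   // set at the start of the update job
```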
This concludes my second post about QlikView and Kimball's dimensional modelling. In the next post I will be looking at SCD2 and how these dimension types are managed in QlikView.