Self Service Data Prep with Lavastorm Analytics

In this blog series we will be taking a look at the Lavastorm Analytics Engine. Lavastorm is categorized as Self Service Data Preparation software and is much more than an out of the box ETL tool. At first glance it is easy to compare the product to counterparts such as SSIS or SAS Institute's DI Studio. However, when one looks a bit deeper it becomes apparent that Lavastorm is a much more powerful and flexible tool than these two. Lavastorm positions itself as an Analytical Platform, the plumbing behind visual front end tools like Qlik Sense, Tableau or Spotfire. Lavastorm provides the foundations and building blocks to build and maintain powerful data flow pipelines that transform and manipulate data before publishing the end result in a vendor specific format.

Self Service Data Preparation as a term is relatively new and is closely associated with the Self Service BI wave that is dominating the Data Visualization market right now. Products like Qlik Sense and Power BI give the user excellent tools to build a dashboard or report which they can then distribute to both desktop and mobile devices. What these products do NOT give the user are easy to manage preparation tools with which to build and maintain complex data flows. This is essentially what Self Service Data Preparation vendors are targeting: the white space between the raw data and the final visualisation.

Getting the Software

Getting the software and licence key was a breeze, as was installation. Simply go to the Lavastorm website (see Links below), fill out the application and you will receive an email with your licence and install instructions. My philosophy when evaluating software is first and foremost learning by doing. Great software, for me, is software that does not require the user to spend hours researching the product. The user interface should be intuitive, enticing and instructive. The user should be able to start using the product based on his/her prior experience with similar software or on his/her area of expertise.

After the very simple install procedure was complete I was able to start building and modelling complex lineages. Drag and drop functionality and a sparse canvas to work on are the main features of the Lavastorm desktop environment, the BRE. The concept of Self Service is very apparent in the GUI design and feature choice. The software invites you to explore the node library through visual cues placed at optimal points on the screen. The human eye has a tendency to explore the corners of the screen, and in the bottom left hand corner you can find the nodes cleverly categorized and arranged for easy access.

[Screenshot: node categories in the bottom left hand corner]

Whilst Self Service is the focal point of the product, advanced users are also catered for with interfaces to Java, Python and R. These features are easily accessed under the category Interfaces and Adapters.

[Screenshot: the Interfaces and Adapters category]

Lavastorm BRE (Business Rules Engine)

The canvas area, or worksheets, in Lavastorm consists of graphs and libraries. Graphs are essentially the workflows in your solution whereas libraries are your global nodes. Libraries store nodes which can be configured and reused across the solution. The Library view is accessed on the top left hand side of the current worksheet. Once in the library, pre-compiled adapters can be custom configured for reuse across your solution.

[Screenshot: the Library view]

The available libraries are visible on the left hand side of the pane. Click on Lavastorm or All to gain access to the pre-compiled nodes, which you can simply drag into your own (local) library.

[Screenshot: the available libraries pane]

Once you have dragged a node onto your Local library pane, you will be prompted to provide a name for it. The DB Query node is a typical node that will be reused across a solution. By dragging this node into your local library pane and configuring it with the necessary credentials you will be able to reuse it as often as you like in your solution.

[Screenshot: naming a node in the local library]

I was very impressed by the range and grouping of the nodes in Lavastorm. The product really does enable access to a vast array of sources and formats and provides some very interesting custom transformations. What I was particularly impressed with were the specific output nodes for QlikView, Tableau and Spotfire. With these output nodes Lavastorm puts itself in a very good position to provide backend support for these predominantly front end tools.

[Screenshot: output nodes for QlikView, Tableau and Spotfire]

Testing

A quick test of the product proved to be a very satisfying experience. I wanted to test a common ETL scenario by importing some raw data, transforming it and loading it as structured data in the qvx format (for use in QlikView). I set up an ODBC System DSN on my machine and configured it against my SQL Server 2012 installation. I placed a DB_Query node in my local library, configured it with the necessary authentication settings and ran it against one of my databases. The configuration windows in the DB_Query node are relatively straightforward. The SQL query syntax is written in the SQL Query window and the Connection window refers to the authentication settings for the node. The Optional window is where I experimented most, as it was here I could try out Lavastorm's own scripting language, BRAIN. I found BRAIN very easy and appealing to use. The rules are few and the syntax is very straightforward. I have only spent a couple of hours with BRAIN, and there is heaps to learn, but in that short space of time I have grasped the fundamentals of the keywords Emit, Rename, Override and Exclude.
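
For reference, the same database read can be sketched outside Lavastorm in a few lines of Python. This is not how the DB_Query node works internally, just a hedged analogue of the ODBC setup described above; the DSN name and query are invented for illustration.

# A rough analogue of the DB_Query step: read from SQL Server via an
# ODBC System DSN. The DSN name and table are made up for illustration.
import pandas as pd
import pyodbc

conn = pyodbc.connect("DSN=MySqlServer2012;Trusted_Connection=yes;")
df = pd.read_sql("SELECT * FROM dbo.Customers", conn)
conn.close()
print(df.head())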

Building a Dimension table

In the screenshot below I have used BRAIN to format the input, create a couple of hash values as checksums and omit some of the inbound fields. The syntax for manipulating fields in BRAIN is very simple:

FieldName.Function1().Function2().Function3().

[Screenshot: BRAIN script in the DB_Query node]
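
For readers more at home in Python, this chained field syntax has a close analogue in pandas. The snippet below is not BRAIN, just a sketch of the same idea with invented column names.

# Not BRAIN syntax: a pandas analogue of chaining functions on a field,
# here trimming whitespace and upper-casing a column in one expression.
import pandas as pd

df = pd.DataFrame({"CustomerName": ["  alice ", "bob  "]})
df["CustomerName"] = df["CustomerName"].str.strip().str.upper()
print(df)  # CustomerName is now "ALICE", "BOB"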

The surrogate key for the table has been created with the RecordNumber() function. I have added 1 to this number as I do not want a surrogate key of 0. The hash values have been created in two steps. The first step was to create a concatenation of the required values. The assigned field name for this new string was then sent into the hashcode() function to create the final checksum value. I found that I had to do this in two steps as the hashcode() function only accepts one argument. There may be other ways to achieve this in BRAIN with one function; if not, it would be a nice feature if hashcode() could accept several arguments.

[Screenshot: surrogate key and checksum fields]
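
To make the two-step checksum concrete, here is a minimal sketch of the equivalent logic in Python. The column names are invented, and hashlib's md5 merely stands in for Lavastorm's hashcode(), which may well use a different algorithm.

# A sketch of the surrogate key and two-step checksum described above.
# hashlib.md5 stands in for hashcode(); columns are invented.
import hashlib
import pandas as pd

dim = pd.DataFrame({"CustomerName": ["Alice", "Bob"],
                    "City": ["Oslo", "Malmo"]})

# Surrogate key: record number plus 1, so no key is ever 0.
dim.insert(0, "CustomerKey", range(1, len(dim) + 1))

# Step 1: concatenate the required fields into one string.
concat = dim["CustomerName"] + "|" + dim["City"]
# Step 2: hash the concatenated string to produce the checksum.
dim["RowChecksum"] = concat.map(lambda s: hashlib.md5(s.encode()).hexdigest())
print(dim)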

The next step is to load the output into QlikView. From the output categories in the graph view, pull a QlikView output node onto the canvas and configure the output window with an appropriate filename. I configured the window for exception behavior by asking the node to raise an error if the file exists. All output in Lavastorm is clearly visible by clicking on the output indicator on each node.

[Screenshot: QlikView output node configuration]
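
The exception behavior is essentially a write-or-fail rule. A minimal Python sketch of the same idea, writing csv rather than qvx and with an invented filename:

# Write-or-fail output: raise an error if the target file already exists.
from pathlib import Path
import pandas as pd

def write_output(df: pd.DataFrame, path: str) -> None:
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite {target}")
    df.to_csv(target, index=False)

write_output(pd.DataFrame({"a": [1, 2]}), "dim_customer.csv")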

The completed data flow for the Dimension table is very simple. We read the data from the database, apply some filter logic and output the data to QlikView (qvx) and csv.

[Screenshot: the completed Dimension table data flow]

Building a Fact table

After the Dimension table was created I wanted to build the Fact table. Two database reads and an inner join were required to get the data. This could of course have been achieved with one database read and the appropriate join syntax, but I wanted to test Lavastorm's correlation nodes.

[Screenshot: two database reads feeding an Inner Join node]

I used the Inner Join node for my transformation and quickly found that the inbound fields are required to be sorted. Fortunately this is easily achievable in Lavastorm with the radio button settings SortLeftInput and SortRightInput. It is important that both of these are set to true.

[Screenshot: Inner Join node sort settings]
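
As a rough pandas analogue of what the node is doing, both inputs can be sorted on the join key before the inner join, mirroring SortLeftInput and SortRightInput set to true. The column names and data are invented.

# Sort both inputs on the join key, then inner join, mirroring the
# SortLeftInput / SortRightInput settings. Data is invented.
import pandas as pd

left = pd.DataFrame({"OrderId": [3, 1, 2], "Amount": [30, 10, 20]})
right = pd.DataFrame({"OrderId": [2, 3, 1], "Customer": ["B", "C", "A"]})

left_sorted = left.sort_values("OrderId")
right_sorted = right.sort_values("OrderId")

fact = left_sorted.merge(right_sorted, on="OrderId", how="inner")
print(fact)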

The output rows are declared and formatted in the Output window of the Inner Join node.

[Screenshot: the Inner Join Output window]

When loading the Fact table we want to make sure that the Business keys exist in the Dimension tables. To do this we use the Lookup function to exactly match the incoming Business keys to the Business keys in the current Dimension table. We repeat the lookup procedure for all Business key references in the Fact table. We can clearly see that the Lookup transformation returns 54 rows, exactly matching the rows in the current Dimension table.

[Screenshot: the Lookup transformation returning 54 rows]

The input keys, or Lookup values, are defined on the General window.

[Screenshot: the Lookup General window]

The Output window is used to define the output from the Lookup. We exclude the reference to the matching key value in order to retain uniqueness for the lookup value.

[Screenshot: the Lookup Output window]
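
The whole lookup step boils down to keeping only Fact rows whose Business keys exist in the Dimension, then dropping the matching key column. A hedged pandas sketch, with invented table and column names:

# Validate business keys against the dimension, then exclude the
# matching key column so only the surrogate key is carried forward.
import pandas as pd

dim = pd.DataFrame({"CustomerKey": [1, 2], "CustomerBK": ["C1", "C2"]})
fact = pd.DataFrame({"CustomerBK": ["C1", "C2", "C1"],
                     "Amount": [10, 20, 30]})

# Inner lookup: rows with no matching business key are filtered out.
fact = fact.merge(dim, on="CustomerBK", how="inner")

# Drop the business key reference, keeping CustomerKey.
fact = fact.drop(columns=["CustomerBK"])
print(fact)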

The final step in the Fact table data flow is to load the data. For this we use a QlikView output node. The node is very easily configured by simply denoting a file path and an exception rule in case the file exists.

[Screenshot: the Fact table output node]

The final data flow for the Fact table is slightly more complex and includes a couple of Lookups and filters before outputting the result to QlikView (qvx) format.

[Screenshot: the completed Fact table data flow]

Viewing the final output in Qlik Sense is very straightforward. Simply create a connection to the folder where the qvx files reside, fetch the qvx files and read them into Qlik Sense.

[Screenshot: the final output in Qlik Sense]

Overall I spent a couple of hours with Lavastorm and was able to read a data source, create a Dimension table with change logic, and create and load a Fact table. I had no prior knowledge of the product and was nonetheless able to start using it instantly and to good effect. There is plenty to test and experiment with in Lavastorm, and in the next post I will be looking at some of the more powerful correlation nodes like X-ref and Deep X-ref.

Links

http://www.lavastorm.com/

