Implementation of Datawarehouse with Kimball Methodology

Author: Moises Inestroza

The business intelligence is an array of methodologies, software and technologies which provide inside and well-structured information to companies and organizations that works as support for taking decisions.

It allows to get data related to entry into new markets, product offering, removal of information islands, financial controls, cost optimization, production planning, customer profile analysis, products profitability and so on, to analyze them in order to turn them into knowledge.

In practice, the business intelligence is a determinant factor to succeed and its implementation wouldn’t be so easy if we didn’t have an instrument as the Kimball Methodology, for the construction of Datawarehouse circumscribed to the company scope in an integrated and steady way but variable over time.

As a result of our experience on the Datamart design for datawarehouse system based on the Kimball Methodology we have rescued some notes and suggestions we will present in the following article which we consider relevant for those who want to implement and explore solutions of that type:

  • Once the process that will be designed a OLAP solution (database oriented to analytic processing) is selected, it is necessary to perform an analysis and create a OLTP diagram (database oriented to transaction processing) of each implicated model. To understand clear enough the dynamic of these data by verifying through consulting scripts, It will be crucial to make decisions of OLAP model. The diagram must embody all the structures to consider, that is to say, all the information that is wanted to remove to datawarehouse and its relations.
  • To identify correctly the “Grain” where the information is, in other words, the minimal detail level to get a certain measure or identify certain event that has occurred.
  • To carry out an analysis to create a OLAP model which specifies the dimension tables and resulting events, preferably proposed in a “Star Topology”. In this data model all the information must come together from different identified sources.
  • It is advisable to name the tables with the appropriate prefix according to each table OLAP, (DIM,FACT) nature. The names of the DIM tables (dimensions) have to end up in singular, as for the FACT tables (facts) in plural.

Importance of OLAP (DIM, FACT) tables

As we know, OLAP model is made up of dimension and fact tables. The dimension tables (DIM) are tables that contain attributes or fields which are used to restrict and gather stored data in a fact table (FACT).

For those small catalogs that represent little or no frequency over time (countries, category). These ETL logic is basic. It consists on cleaning and reloading the data from the source. Theoretically, these types are known as SCD type 1(Slow Change Dimension)

We wish to keep a versioning of dimensions in order to have a tracking of changes in values in certain ETL logic entity. The current registration has to be compared in the DWH to the OLTP current registration source. For example, in the case of a customer who has a Civil Status attribute which at certain point is “Single” but later his/her token gets updated to “Married”. We have to keep both customer’s versions, just in case we want to analyze his/her transactions later by comparing his/her behavior when he/she was Single or Married.

Upon finding any difference in some of his/her attributes, a new registration is made in the DWH dimension with a new subrogated key and feeding 4 additional control attributes:

VERSION, CURRENT_VERSION (Flag with 0 for No and 1 for Yes), DATE_SINCE (Timestamp date since the new version is valid).

  • It is necessary to create subrogated keys for all the dimensions, that is to say a new key, a unique correlated value different from the natural key which comes from the registration in the OLTP source system that is also kept.
  • In the fact table must appear all the events or transactions that happen such as sales, payments, purchases, logs and so on. It is advisable to include from each linked dimension, not only your subrogated key but also your natural key since at certain point, the registration in the dimension may have changed the version so there have occurred transactions linked to the previous version and the current version. By having a natural key all the facts linked to certain entity could be identified more easily without needing to travel to the dimension.
  • The whole DWH table must have a timestamp which allows to determine what date, time, minute and second certain register was created in the database. This is very useful when for supporting and monitoring as well as audit purposes.
  • In every dimension default registrations must be created for those cases where the information is not available, does not apply or simply does not exist so distinguish those scenarios in the facts. To this effect it has to be inserted into each dimension and must be used with subrogated keys and natural keys, values that mustn’t match with the real keys values, for example: (0)value for the case N/D , Not defined, the value minus one( -1) for the case N/A, Not applicable) and the value minus two (-2) to be determined. (TBD)
  • It is necessary to be very aware of how to handle the Null cases, suggesting to assess the following alternatives: to leave the Null value in numeric variables type amount, counting and fees since the aggregation functions already know how to handle the Null cases so if it is changed to 0 value, it won’t be possible to determine which are the null cases or which are the true cases with (0). If there is a Null in the alphanumeric fields, leave it ‘N/A. As for the foreign id dim there mustn’t be left any Null value. It has to be aiming at the default registration of 0,-1,-2.
  • To have a “Time” dimension with date pre-calculation already made so this way saving those exploiting the DWH information from this processing task. The time dimension could have the following information:
  • Month, year, first day of the month

  • Full date including the name of the day.

  • The short name of the day, the week day in numbers.

  • The short name corresponding to the day of the week.

  • Weekend flag.

  • Flag whether it is a workable day or not, that is to say if they are working days.

  • The day of the month in number.

  • The day of the year in number.

  • The week of the month in number.

  • The number of week of the month.

  • The week of the year.

  • The number of days the month has.

  • The end of the month date.

  • The month of the year.

  • The name of the month of the year.

  • The name of the shortest month of the year.

  • The combination of both year and month in numbers.

  • The year quarter which corresponds a certain date.

  • The year quarter name which correspond to a certain date.

  • The year quarter termination date.

  • The year semester which corresponds to a certain date.

  • Flag if it is month-end closing.

Having outlined the notes  at the present article, we hope we have helped you to have a better understanding and significance of a datawarehouse functioning in order to be corporate successful. Finally we put Go Consultores to you orders where any of our teamwork will be glad to present you our service.