Open Sauce


or Size Matters continued…

Earlier, I wrote a blog post named Size Matters, in which I tried to offer a very average scoping estimate of the phases of a DWBI implementation, based on my experiences. I did this because a project manager asked for it. I did it very hesitantly, anticipating some controversy, and controversy did happen. When I exposed the estimates to the TDWI LinkedIn audience, I got a lot of different opinions and widely varying estimates, depending on point of view. Naturally. Estimates depend on the requirements, or what you’re trying to achieve at any specific moment. One size doesn’t fit all. Some even claimed to have a secret sauce that estimates the scope, depending on the number of source systems and other factors.

This got me thinking about coming up with an “open sauce” (a pun on “secret sauce” and “open source”) to help project managers scope their DWBI efforts. There are several aspects that can add complexity to a DWBI project. One is the complexity of the ETL layer. Another is the complexity of the BI layer. Of course, the design of the data warehouse also contributes to the scope. So do QC and UAT. The requirements gathering effort itself can be quite complex, depending on the state of the organization’s understanding of what it needs in terms of managing its performance or analyzing its data.

There is no easy way to do this, and no single formula. However, I can think of an approach that works well in most situations. You will have to customize it to your situation, of course. This applies to the first iteration, as well as subsequent iterations. The assumption is that the technology platforms are already chosen – picking the right technology is an entirely different topic for another upcoming blog post.

Requirements come first

You may not immediately think of requirements gathering as constituting the bulk of your effort. However, you may be surprised. Certainly it is the most critical step in the methodology. Clear requirements definition is ultimately going to make or break your DWBI effort. This is not only because you yourself are trying to understand what you are trying to build. It is also about sending a message to the business side of the organization – a message that states that it is their needs that are driving the DWBI effort. It is too easy to fall into the trap of a data driven approach – an effort driven by the data sources in the organization and their integration – a kind of “build it and they will come” approach. The risk of failure here is higher, because it will be harder to prove to the business side, whose support is critical to the DWBI effort, that the effort will ultimately be of value to them. Unless the whole organization, including the business side, thinks along the lines of a data driven approach, that approach can result in failure, and one needs to be careful about blindly approaching it that way.

In DWBI solutions, requirements are most useful if they are in the form of business questions (or analysis questions) such as, “Am I achieving my objective of Sales Growth, Year over Year? Can I look at it by Customer, by Product Lines drilling into Product Types and Products, and by Sales Region drilling down into Customers and Customer Sites?” This gives you a clear indication of the KPIs/metrics/measures being used to measure performance against objectives in the organization. It also tells you about the context around the measurement – the dimensions, and the hierarchical structures built around their attributes. When you first discuss this with your users, you may start with the analysis areas or major objectives at a high level, and gradually break it down into a list of detailed business questions at the right level of granularity. By the end of the requirements gathering effort, you should be able to create a list of KPIs/metrics/measures with definitions, and a list of dimensions and hierarchies, preferably with the attribute-level detail as well.
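The decomposition above can be sketched as a simple data structure. This is only an illustrative sketch – the class and element names are mine, not from any particular tool or methodology:

```python
from dataclasses import dataclass

@dataclass
class Hierarchy:
    dimension: str   # e.g. "Product"
    levels: list     # drill path, top to bottom

@dataclass
class BusinessQuestion:
    text: str
    measures: list     # KPIs/metrics/measures used to answer it
    hierarchies: list  # dimensional context with drill paths

# The Sales Growth question from the text, decomposed:
bq = BusinessQuestion(
    text="Am I achieving my objective of Sales Growth, Year over Year?",
    measures=["Sales Growth YoY"],
    hierarchies=[
        Hierarchy("Customer", ["Customer"]),
        Hierarchy("Product", ["Product Line", "Product Type", "Product"]),
        Hierarchy("Sales Region", ["Region", "Customer", "Customer Site"]),
    ],
)

# Rolling all questions up like this yields the two deliverables of the
# requirements phase: a measure list and a dimension/hierarchy list.
dimensions = sorted({h.dimension for h in bq.hierarchies})
```

Collecting these structures across all business questions gives you the consolidated lists of measures, dimensions, and hierarchies that the rest of the scoping exercise depends on.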

Part of requirements gathering is also a discussion around the visualizations that the user will see, and the type of interactivity. Visualization examples are dashboards and reports, and each may include multiple charts, graphs and lists. Without going overboard, you should be able to list the key visualizations and describe them, if not graphically, at least in words. Interactivity can mean (a) prompts changing the context by applying various filtering criteria to the data, (b) OLAP-style drill-down and drill-across behavior, or (c) drill-throughs into other reports. Having a documented basis for the visualizations and interactivity, which is then sent back to the users for feedback and validation, will save you a lot of grief later. Keep your BI Architect handy during this exercise, as there is a danger of proposing something entirely too “blue sky” to be supported by the technology of choice.

To accurately scope the requirements gathering effort, spend a little bit of time (a week or so) gathering some high-level information, before you commit to a fixed deadline. This information should allow you to scope the detailed requirements effort itself. It should include the number of user groups and individual users (CEO, CIO, CFO, COO) you would have to interview. Also, try to gauge the current state of strategic thinking in the organization. Are objectives clearly defined? Does the business generally agree on the KPIs/metrics/measures? Depending on the maturity of organizational thinking with respect to performance management, the requirements gathering scope could vary greatly. If the organization is at the very early stages of maturity in its thinking, and this is the first time a DWBI effort is being rolled out, it could take 6 weeks to 3 months (sometimes more) to get all the groups to agree upon a consistent set of definitions of KPIs/metrics/measures, dimensions, hierarchies, and visualizations – as well as the prioritization of requirements, an area where each group’s sense of its own importance can become a major complicating factor. However, requirements gathering is the most critical activity in your DWBI effort, and rushing it will cause you nothing but pain later – even failure. If there is pressure to start development (perhaps because a team is waiting for you to fire the starter pistol), one tactic is to gradually define areas of clarity in the requirements, for which development can begin. Defend the need to produce a clear requirements document with gusto. It might be the most important thing you do to ensure the success of your DWBI effort.
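The scoping logic above can be sketched as a small estimator. The multipliers here are my own illustrative assumptions, not validated figures – you would calibrate them to your organization:

```python
def requirements_weeks(user_groups, maturity):
    """Ballpark weeks for detailed requirements gathering.

    maturity: 'early'      - first DWBI effort, little agreement on KPIs
              'developing' - some shared definitions exist
              'mature'     - objectives and measures largely agreed

    The base and per-group figures are assumptions chosen so that an
    immature organization lands in the 6-weeks-to-3-months range.
    """
    base = {"early": 6.0, "developing": 4.0, "mature": 2.0}[maturity]
    # Each additional user group adds interviews plus negotiation
    # over shared definitions and prioritization.
    per_group = {"early": 1.5, "developing": 1.0, "mature": 0.5}[maturity]
    return base + per_group * user_groups

# An early-maturity organization with 6 user groups:
estimate = requirements_weeks(6, "early")  # 6 + 1.5 * 6 = 15 weeks
```

The point of the sketch is not the specific numbers but the shape: maturity sets the floor, and every additional group you must reconcile adds effort on top.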

Requirements Translate to Conceptual BI Model

Once requirements have been clearly defined, you should be able to translate the detailed business questions into a conceptual BI model. A BI model is a conceptual representation of the information elements – the metrics, dimensions, and hierarchies – and their inter-relationships. It should also represent the visualizations of information, including dashboards, reports, and cubes (OLAP structures) that users would see. Occasionally, specialized scorecarding interfaces need to be considered as well, for monitoring performance metrics.

You need to develop a clear vision of the key BI information elements and visualizations that you will need to support the requirements. Note that these artifacts often come in collections representing a complex BI system, such as a set of reports or dashboards. Typically, many business questions will be supported by a collection of reports and dashboards. It’s a many-to-many relationship. Talk to an experienced BI practitioner in the technology platform of choice for this high-level understanding of the BI system that you would need.

Pay special attention to the C-level executives’ and Group Leaders’ vision for a dashboard. These are the people navigating the ship. They need the right feedback mechanism to ensure that their journey is in the right direction, making adjustments if needed.

If you spent adequate time on the requirements gathering, the conceptual BI model development should not take a lot of time. Plan to spend 2 weeks on each analysis area, such as HR, Sales, or Finance, depending on complexity. You will need to adjust this ballpark estimate to your exact situation, of course. Be sure to send the Conceptual BI model back to your users for feedback.

When your conceptual BI model is finalized, you should be able to get a list of reports from it, and discuss ballpark estimates with your BI Architect. It is difficult to suggest a ballpark without knowing the complexity of the report and the technology platform. It is best this be left to your specific situation.

Major Foundation Entities determine ETL effort

Now that you have defined your Conceptual BI Model, you are ready to design your DW data model, which will also help you estimate your ETL effort. Start with the major entities that you would need. This is the bulk of your effort. Think of the major dimensional entities, such as Customer, Product, Vendor and Account. Then think of the fact entities, such as Sales Transactions, Cost Details, Receivables Transactions, and Payables Transactions. Finally, list the small lookup tables and the Calendar dimension at this stage. Each is minimal effort, but the total may add up.

Now, for every one of the major entities, think about the number of sources you have to go after and the data quality or integration issues that might occur. Determine whether such data quality or integration issues need to be handled in the DW ETL layer, or whether there are other systems in place, such as MDM or an ODS, to address this. Each of the major entities listed above can result in multiple man-months of effort, depending upon the number of data sources and the complexity of the rules. For example, if you have to deduplicate customer records in ETL, or reconcile products and product hierarchies across organizations, each of these could itself take man-months. It is better to crack such problems in a piecemeal manner, iteratively. In other words, solve your Customer or Product integration issue in your operational system first, and then use a single integrated master data source as the source for the corresponding dimension in your DW. Also note that, usually, in the case of major entities, you will have to adopt a Changed Data Capture (CDC) approach to extract the data from source systems. Depending on the level of complexity, assign 2-4 weeks for each major entity in your data model. Simple lookups should not take more than half a day each. Again, these are ballparks and need to be adjusted to your specific situation.
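Those per-entity ballparks can be rolled up mechanically. The 2-4 week range for major entities and the half-day for lookups come from the text above; how I scale within that range for source count and rule complexity is my own assumption:

```python
def etl_entity_weeks(entity_type, num_sources, complex_rules=False):
    """Ballpark ETL weeks for one foundation entity.

    Major entities (dimensions and facts) get 2-4 weeks; a simple
    lookup gets roughly half a day. The source-count and rule-complexity
    adjustments are illustrative assumptions, not a standard.
    """
    if entity_type == "lookup":
        return 0.1  # roughly half a day
    weeks = 2.0                              # single clean source, CDC in place
    weeks += 0.5 * max(0, num_sources - 1)   # each extra source to reconcile
    if complex_rules:                        # e.g. customer dedup, hierarchy reconciliation
        weeks += 1.0
    return min(weeks, 4.0)                   # cap at the high end of the range

# (name, entity_type, num_sources, complex_rules)
entities = [
    ("Customer", "dimension", 3, True),
    ("Product", "dimension", 2, True),
    ("Sales Transactions", "fact", 1, False),
    ("Calendar", "lookup", 1, False),
]
total = sum(etl_entity_weeks(t, n, c) for _, t, n, c in entities)
```

Running the list through the estimator and summing gives the foundation ETL ballpark that the next section builds on.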

BI Needs Performance Support

Next, relate the BI artifacts to your foundation entities. Think about any additional data transformations you may need to support the BI artifacts, which might be implemented as summary data structures using ETL. These will make your BI simpler. Otherwise, you can think about implementing these transformations in BI, but that complicates the BI layer, and can potentially result in performance issues, especially if the BI technology platform is a pure SQL generator (one that dynamically generates SQL at the time of the report request and sends it to the database). List the number of summary tables you need (you can group these into data marts, based upon the analysis area). The ETL logic for summary tables is typically not that complex; it is their number that influences the overall scope. Talk to your ETL resource one more time to get a ballpark of the summary layer development. A simple summary should not take more than a few days. A straightforward Group By and Sum should take a day or two. There are alternatives to summary tables as well, including OLAP databases (cubes) and materialized views. The level of effort will depend on the implementation choice. A single OLAP structure can be equivalent to a number of summary tables. A disadvantage of OLAP structures is that they are less scalable than database tables. List your proposed BI artifacts and discuss them with your BI resource to get ballpark estimates.
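The “Group By and Sum” case mentioned above is the simplest summary-table ETL. As a minimal sketch of what that logic does (the fact columns and values are made up):

```python
from collections import defaultdict

# Detail-level fact rows: (product_line, region, sales_amount)
fact_rows = [
    ("Widgets", "East", 100.0),
    ("Widgets", "East", 250.0),
    ("Widgets", "West", 75.0),
    ("Gadgets", "East", 300.0),
]

# The summary-table ETL: group by the dimensions the reports need,
# sum the measure, and load the result into a much smaller table
# that the BI layer can query without scanning the detail.
summary = defaultdict(float)
for product_line, region, amount in fact_rows:
    summary[(product_line, region)] += amount

summary_table = sorted(summary.items())
```

In a real implementation this would be a SQL `GROUP BY` (or an ELT step) rather than row-at-a-time Python, but the shape of the transformation is the same, which is why such summaries are cheap to build and the count of them, not their individual complexity, drives the scope.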

A DW design that is well aligned with business requirements results in a simpler BI layer, which is a good thing. You want most of your business rules to be captured in your ETL logic, and your metrics and other derivations to live in your data model. This is a much more open (SQL-accessible) architecture and results in better sharing of the derivations across the enterprise. Also, less complex reports mean better performance, especially if the reporting technology is based upon dynamic SQL generation against a relational database. Of course, data caching, a proprietary OLAP or columnar database, or a data warehouse appliance will help alleviate reporting performance problems.

Once you have the major entities listed and have understood how many sources each would draw upon and what data quality or integration issues you might face, you can begin to conceive what the level of effort might be for the foundation ETL. If you have a specific technology platform in mind, such as Informatica, Ab Initio, or DataStage, you should bring in an experienced senior ETL developer or architect in that technology and discuss the complexities for each entity at a high level. An experienced developer or architect should be able to give you a ballpark figure. An inexperienced or extremely detail-oriented person will usually want a lot of details, which is not helpful at this stage.

Now that you have the ballpark estimates for each entity, you have a ballpark for the ETL effort for the DW foundational data model. This is typically the bulk of your effort. Next comes the BI effort, which may include some summary data mart development to support the BI layer – that requires some ETL as well, as discussed above.

All’s Well that Ends Well

Finally, take your ballpark estimates for ETL and BI development, and apply a percentage to them to roughly estimate QC. The level of QC depends on the level of unit testing being done during development, and can equal or exceed the development effort. Furthermore, data quality in a DWBI effort requires particular focus, and you should have dedicated individuals on this activity. Data quality issues, or a lack of trust in the quality of data in the DW, are well known as a DW killer.
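As a final sketch, the QC percentage can be applied like this. The default ratio is an assumption to tune per project, not a recommendation:

```python
def qc_weeks(etl_weeks, bi_weeks, qc_ratio=0.5):
    """QC effort as a fraction of development effort.

    With light unit testing during development, QC can equal (1.0)
    or exceed (>1.0) the development effort; the 0.5 default assumes
    solid unit testing was done as part of development.
    """
    return (etl_weeks + bi_weeks) * qc_ratio

# With 10 weeks of ETL and 4 weeks of BI development:
with_unit_testing = qc_weeks(10, 4)                 # 7.0 weeks
without_unit_testing = qc_weeks(10, 4, qc_ratio=1.0)  # 14.0 weeks
```

Even this crude formula makes the trade-off visible: skimping on unit testing during development does not save time, it just moves the effort into QC.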

Think about a strategy for UAT as well. Its scope depends heavily on the number of users the solution is being rolled out to, and on the role of each user (report user, report author, etc.). Consider user training needs as well. Exposing a new DWBI effort to the organization without some initial guidance usually does not go well.

Feedback

This was the longest post I have ever written. It started out with a casual entry and developed into a multi-weekend effort. I hope this is useful to the DWBI community.  If you want some of these areas to be further elaborated, by all means, let me know. Good luck with your DWBI efforts!

3 Comments

  1. I am sorry for my brash remarks, but I am always amazed at how all data is thrown into massive garbage bins, and then advanced technologies are used to segregate different pieces of data from the bin for aggregation and analysis.

    In my view, enormous investments in DWBI become inevitable because of the current RDBMS architecture, which separates data from its contexts. By adopting an alternate strategy of storing data along with all its contexts, it would be possible to obviate the need for elaborate infrastructure and procedures for KM, BI, and data mining.

    1. MAJ, could you please explain? I am interested in finding out more about the basis behind your opinion. I am assuming your opinion is that a DW should not be required as part of the analytical framework in an organization. Can you tell me why, in detail?
      Thanks, Shubho

  2. Totally agree with you. But, you know, very few people follow this approach; instead, they pay lip service to the so-called CMM and unnecessary “quality” processes on paper.
