Size Matters


A project manager asked me the following question – are there templates available to size the various phases of a DW/BI project? I tried to come up with one, but realized its futility. There are too many variables. I have listed the phases and some guidance to scoping each of them, but assigning even rough estimates to any of them is going to be too much of a SWAG even for someone as dangerously eager to incur the risk of random flak from strangers as myself.  I did it anyway. I’ll regret this.

Her question pertained to an Agile methodology (repeated iterations) but I think that aspect is immaterial to this conversation.

The phases are –

Requirements Gathering, of course

Design – Data Model, ETL Design, Report Design, OLAP Design

Development – ETL, Reporting, OLAP

Testing

Release – UAT if needed

Typically, and I offer this with a BIG pinch of salt, requirements gathering should take the least amount of time, followed by design and testing. Development should be the most time consuming activity. Having worked on a variety of platforms and designs, I have found, generally speaking and in broad terms, that ETL development seems to be, most of the time, more effort intensive than Report development. However, again, it depends on many factors, such as how well the design aligns with the requirements. If the data warehouse design directly supported the requirements by having the appropriate metrics and dimensions in place, report development should be more straightforward, but, again, that is a very generalized statement.

Having made the appropriate CYAs, I hesitantly offer the following rampant generalizations for a breakdown.

Requirements – 10%

Design – 10%

ETL Development – 30%

Report Development  – 20%

Testing – 20%

Release – 10%

In a 10 day Agile cycle, this translates to 1 day of requirements, 1 day of design, 3 days of ETL development, 2 days of report development, 2 days of testing overall, and 1 day for packaging and releasing the deliverable.

Other factors that influence the variability of estimates are the exact requirements, of course, the technology platform used (some make it harder to write reports or easier to write ETL), the experience level of resources, etc.

That is the best I could do. I think probably the best advice is to have someone in the room who has gone through the process a few times.

13 Comments

  1. I am a experience proffessional of 25+ IT experience and 10+ years as Architect

    since the availabity of various tools for ETL and reporting, I will suggest following break up

    Requirements – 30%

    Design – 20%

    ETL Development – 10%

    Report Development – 10%

    Testing – 20%

    Release – 10%

    In a 10 day Agile cycle, this translates to 3 day of requirements, 2 day of design, 1 days of ETL development, 1 days of report development, 2 days of testing overall, and 1 day for packaging and releasing the deliverable.

    regards

    Pankaj Parikh

  2. I was interested in your post because I like to take a very similar approach, althought not formally adopting the agile method. The cycle varies from 1 to 4 weeks (on an existing EDW) to apply increments of functionality. The difference is that I try to break down independent projects for EDW/DB and for Access-Layer/Report. The lag allows for some maturation time to ensure that any issues are resolved before exposed to users.

    I usually deal with large data and highly interrelated structures, particularly where aggregations are concerned, and sometimes I separate the effort into three interrelated projects, i.e. base data with ETL, derived data with scripts/procs, and finally access and reporting. A break between establishing the data infrastructure and building the access layer views and reports is advantageous. Of course the analysis and design may require some common vision, but this is not hard to factor in.

    Regards,
    Michael

    1. Michael,
      I appreciate your comments. They are very pertinent to this type of conversation. They would be useful to someone looking for guidance for scoping their own efforts.
      Regards,
      Shubho

  3. For the requirements gathering part, I have published quite a few articles on the Requirements Management, hope you’ll get a chance to read some.

    IMO, requirements gathering should not be a trivial task, as this is the root of most (if not all) future problems in the project.

    1. Thanks. I did not mean to trivialize the requirements gathering phase, but was trying to come up with a rough figure for iterative DW cycles. I think this post did earn me a lot of flak in the end, as I thought. Thanks for your comments. I’ll be sure to check your site for the information on requirements gathering.

  4. Shubho,
    Excellent article!

    I agree with the Pankaj that the requirements gathering is, and should be, the most time intensive process. Also, despite of the majority of the BI/IT/IS community’s feeling to the contrary, I believe that requirements gathering/fine-tuning can go on through the most of the development cycle, until the designated “buffer” time to allow for the stable delivery.

    V

  5. I agree with the comments that requirements gathering should take more time, but it seems to be more like 10% rather than the required 30% in most projects for some reason due to lack of understanding of requirements for requirements (if that makes sense).

    I do believe that the ETL development component is under estimated in all previous posts. From experience, I have seen that that the ETL development and ETL SIT is the most time consuming aspect of any DW project. Reason is that often we deal with differing sources and therefore routines for historical data versus the daily/weekly/monthly etc delta loads. I have seen the ETL development (including SIT) taking up to about 50% of the project time. This includes the reconciliations back to source systems etc (often hugely under estimated).

    UAT is another aspect that is under esimated. Proper UAT of any DW/BI solution is time consuming.

    H.

  6. I will be indebted if someone pls comment on this .. I have an interview very soon..

    if we follow the following for data mart sizing and effort is it correct ?.. please comment asap ..

    This we can assume

    2 resource (one db and one ETL ) 4-6 weeks for Reqmt … i.e. say 60 mdays ..

    dev effort ( including documentation ,physical model , detaild design , chnage data capture , unit testing ,
    metadata rules , os scripting etc etc ..)

    Type 1 dim = 15-20 m days avg 17 madys
    Type 2 dim = 15-25 m days avg 20 m days
    Type 3 dim = 15-25 m days avg 20 m days

    fatct table = 15-25 m days avg 20 m days ( partitions , indexes , fk’s etc etc )
    aggegate fact tables = 15-25 mdays avg 20 m days

    Now suppose we have 10 type2 dimensions, 5 fact and 5 aggegates ( all avg it) comes to around 300 mdays

    so total till now 300+60=360 m days ..

    add 3 resources for performance testing ( statspack, explain plan , tkprof , benchmarking if possible , )=70mdays

    total till now =360+70 =430 m days

    add 3 resources for testing( integration /system etc ) 4-6 weeks say around 80 m days

    total now 430+80= 510 m days

    around 70 mdays for initial data loading and testing so 510+70=580 m days

    add 3 resources for UAT 4-6 weeks around 70 m days = 580+70=650 m days

    add QC 5-10 % to it .. say 40 m days = 650+40=690 m days .. round it to 700 ..

    add 10-15 % for project management say 100 days = 700+100=800 mdays ..

    add 50 days buffer = 800+50=850 mdays ..
    is this approach correct .. pls tell .. i have a interview very soon ..
    i have good understanding for the technical part in DW but unfortunately never have a chance to work in any project..
    please reply ….i will be indebted…

    1. Sasanka,
      These are very generalized numbers and cannot be applied to every situation. Without knowing the requirements, it would be risky. One cannot say with any degree whether the numbers are wrong or right. Your numbers sound reasonable for facts or dimensions of moderate complexity, but there are so many factors.
      The first step is to gather the requirements of the project. A business analyst or project manager is more appropriate as the primary resource for that. This person will need to draw upon a DW Architect and a BI Architect for a high-level understanding of implementation complexity. The requirements itself can take 6-8 weeks. If the technology platform is pre-decided, the detailed planning for the effort can be done at that stage. I would push back for a ballpark estimate until the requirements have been gathered. Look at my later post, Open Sauce, for an overview of the planning process. This post was too generalized to be useful.
      Good luck on your interview.
      Thanks,
      Shubho

  7. Shubho Thanks for the reply .
    I totally agree with you.

    There are other factors like capacity planning, HW/SW configuration , no of sources , volume of data , complex transformation logic , procedure for change data capture etc etc …

    i assumed the client already had internally did their own analysis and as an app service provider we will just implement this DW. that is why i gave only 60 m days for requirment analysis .

    The characteristic of the system as follows

    No of source is only one .
    Change data capture is decided and it is through oracle async auto log mode.
    informatica , oracle 10g , external table, mat view for aggr will be used
    data volume is moderate and it will be trickle load and 500 mb -1.5 gb of data each day.
    the data transformation is not that complex …

    what i wanted to confirm is that if the requirment and initial analysis phases are completed , can i follow my approach for rest of the estimate without going into FP based or any other apporach. Whether some one will call my approach crap.

    I read your open sauce , i liked it and from that i took some idea for the earlier post.

    will be glad if you can reply. Thanks in advance.

    1. Sasanka,
      There is nothing wrong with your approach, though it seems a little on the conservative side though. 850 man days for an application with 5 facts, 10 (Type II) dimensions and 5 aggregates (simple mat views) seems too much. That translates to a 10 member team occupied for 4.5 months. That doesn’t sound right, especially if it is one data source, the CDC portion is already handled on the Oracle side, there are no data quality issues you have mentioned and the summary tables are simple materialized views. My rough in-my-head estimate comes out to about 5 X 20 + 10 X 10 + 5 X 5 = 225 for development, unit testing and documentation, 4 man weeks for Requirements at the most and 2-4 weeks of final QC and UAT, totally up to about 300 man days, or a team of 4 roughly working for 3.5 months. 2 ETL developers, 1 DW Architect, 1 QC. If you had to include BI, which wasn’t included in your estimates, you might add one more BI developer to the team.
      Hope that helps,
      Shubho

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s