I attended a round table on the Impact of Big Data on Business Intelligence. The event was organized by NewVantage, a Boston-based management consultancy, and included Wayne Eckerson, formerly the well-known face of TDWI.
Invitees included leaders of Boston financial and other firms, and I went along at the invitation of Wayne, whom I knew from contributing to the LinkedIn TDWI group when he was moderating it. I finally met him in person, as well as Jens Meyer of First Marblehead, whom I had come to know through LinkedIn years ago.
Participants included people like myself who aren’t really experienced in Big Data technologies and practices, as well as people who are. They were a really interesting group, I must say, and they made many good points and offered useful insights. It got me thinking and prompted me to write a blog post after two years. How time flies.
Big data has been disruptive to the EDW and business intelligence, in my experience as part of an IT organization. This has happened because business groups have been going to Big Data solution vendors directly, bypassing IT. IT simply did not have the answer to some of the business’s big data needs and, honestly, does not seem to have the vision to understand how big data fits. As a result, business groups that are becoming more aware of such externally provided solutions, and that do not like partnering with IT in the first place, are developing partnerships with Big Data vendors directly, involving IT only to provide EDW data to those solutions. This seemed like an IT failure at first, but it came to feel like the right thing to do, and these cloud solutions ended up as part of the overall solution, an extension of the IT-provided EDW and BI. I took this insight with me to the discussion, and my focus was to get answers to a couple of questions:
- I was aware of a skills gap within IT that kept it from meeting the Big Data need. What were these skill sets? I knew about the lack of the predictive skill set within a typical IT organization, but were there others?
- While I recognized that technologies such as Hadoop allowed querying against vast amounts of data in file-based systems in place (without replication), I saw a gap or barrier between that data and the data that traditionally lay in the EDW. How could that barrier be overcome, and data or analysis from the Big Data side be seamlessly merged with data from the EDW?
After a two-hour discussion and, as I mentioned, some good insights, I arrived at the following answers to my questions.
- There is a skills gap within IT and within BI teams, and it isn’t just the predictive skill set, as I had thought. Some of the newer technologies don’t come with the user-friendly interfaces that traditional DBMS and BI tools have. You need programmers. This explains why the newer BI positions require Java, C++ or Python skills.
- IT is truly falling behind because it cannot think beyond its traditional view, and is still caught up in its arcane processes and its need to resist and avoid risk. A top-down leadership transformation is needed for IT to think differently. While on this topic, an interesting gentleman named Tarek Abu-Jaber from Harvard Pilgrim Healthcare made some seemingly innocuous but, to me, visionary statements that can be summed up as, “Why Not?” Why can’t IT solve the problem of creating architectures that would allow unstructured and streamed data to be analyzed as easily as, and in conjunction with, structured data from the DW? IT needs to start recognizing that unless it can provide solutions to the interesting problems faced by analysts today, it will become part of a stolid old bloc that everyone sees as the mainframe of analytics: a dependable workhorse, but essentially uncool and a has-been.
- There is a need for an architecture that overcomes the barrier between file-based systems and the EDW, between structured and unstructured data, and so on. Individual solutions exist in each of these silos, but I am not seeing anything that merges the two sides seamlessly. Wayne briefly mentioned a Data Lake approach, and technology that creates virtual tables on the file systems, that might address this, but I think it’s still in its early stages and probably not as seamless as it seems in theory. Also, there are bound to be performance hurdles that need to be overcome. IT is usually good at architecture and should lead the charge on this, but… (see the previous point)
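As a rough illustration of what merging the two sides might look like, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the `RAW_EVENTS` file contents, the `dim_customer` table, and the function names. It uses an in-memory SQLite database as a stand-in for the EDW, and it copies the file data into a temporary table, whereas the virtual-table technology mentioned above would presumably query the files where they sit.

```python
import csv
import io
import sqlite3

# Hypothetical "big data side": raw clickstream events as they might sit in a file.
RAW_EVENTS = """customer_id,page
1,pricing
1,checkout
2,pricing
"""


def load_file_as_table(conn, name, text):
    """Expose file contents to SQL by copying them into a temporary table.

    A real data lake engine would read the file in place; copying keeps
    this sketch small and self-contained.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(f"CREATE TEMP TABLE {name} (customer_id INTEGER, page TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (:customer_id, :page)", rows)


def merged_page_views():
    """Join file-based events with a warehouse-style dimension table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical "EDW side": a conformed customer dimension.
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT)")
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(1, "retail"), (2, "institutional")],
    )
    load_file_as_table(conn, "events", RAW_EVENTS)
    result = conn.execute(
        """
        SELECT c.segment, COUNT(*) AS page_views
        FROM dim_customer AS c
        JOIN events AS e ON e.customer_id = c.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(merged_page_views())
```

The point of the sketch is only that, once both sides are reachable from one query engine, the join itself is ordinary SQL. The hard parts (scale, performance, schema drift in the files) are exactly what it leaves out.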
As a sidebar, I pursued another topic that has interested me for years but on which I have not seen any advances: the ease of building a DW.
The speed with which IT produces analytic solutions was a hot topic. There are two aspects to this. One has to do with process. IT loves process and has a dire need to ensure every step is always followed ad nauseam. The other has to do with the technology. There are too many disparate layers in the stack, and it takes too much time to build logic into the solution. The result? Weeks to months to produce a solution rather than days. The process aspect can be addressed with Agile methodologies, but this can be a right-angle turn for companies not used to that approach. The technology aspect I alluded to in a post written years ago, where I talked about vendors that build car parts instead of providing us with cars that can be started right away. Changing a single field name in an EDW database can take days, because the change can impact many ETL processes and reports. When will we see a vendor solution that is whole, all the way from source system to BI, with no need for separate staging, fact and dimension tables, a metadata layer, and finally a visualization layer? Everything would be simplified so that the data is taken from source directly to visualization, with no layers in between, and with the ability to scale.
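To show why a one-field rename ripples so far, here is a small Python sketch of an impact analysis over a lineage map. The field, ETL job, and report names are all hypothetical, and a real EDW would have to pull this lineage from its metadata, which is often the hard part.

```python
from collections import deque

# Hypothetical lineage: each key feeds the objects listed as its value.
LINEAGE = {
    "edw.customer.cust_nm": ["etl.load_dim_customer", "etl.load_sales_fact"],
    "etl.load_dim_customer": ["report.customer_360"],
    "etl.load_sales_fact": ["report.weekly_sales", "report.customer_360"],
}


def impacted_by(field, lineage):
    """Return every downstream object reachable from `field`, breadth-first."""
    seen = []
    queue = deque(lineage.get(field, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already recorded via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


if __name__ == "__main__":
    for obj in impacted_by("edw.customer.cust_nm", LINEAGE):
        print(obj)
```

Even in this toy map, renaming one field touches two ETL jobs and two reports, each of which has to be changed, tested and released.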
There were other topics of discussion as well, such as Data Governance and the creation of a new role of Chief Data Officer.
All in all, a good day for pontificating and pondering. I feel like a newBi(e) in BI yet again, like I did 16 years ago, and it is an exciting feeling.