David Corrigan continued his big data discussion with a second podcast that focused on IBM’s big data platform. (Catch up on the first post, “What is Big Data?”)
As Corrigan explained, “Big data is not just an individual technology, implemented within a silo. It is a set of technologies that need to be integrated with each other and integrated with enterprise capabilities, meaning business processes, applications, relational databases, etc.”
So what’s involved in IBM’s big data platform?
Corrigan stressed, “The big data platform is what makes the data available in whatever format it needs to be: as streaming information, unstructured or semi-structured data, highly structured information.” Big data encompasses a wide range of technologies, including analytics, Hadoop, stream computing, information integration, data warehousing and more, depending on your needs and goals.
Specifically, the Hadoop-based InfoSphere BigInsights helps manage any variety of information and large volumes of data. InfoSphere Streams tackles stream computing and analytics, analyzing information in motion.
These technologies must work together. As Corrigan explained, “Because big data is not a silo, data must be integrated with a big data platform, and the insights generated from that platform need to be integrated with the other systems in your enterprise.”
But a big data platform must go further and offer the ability to “discover, profile, integrate and transform information to and from the big data platform.” It should also support information governance, data quality, privacy, security and the lifecycle of information.
A focus on analytics
Treating big data as a platform has enabled a shift in thinking. Corrigan explained, “It’s more than just a moniker that we’ve slapped on it. It’s a way we’re approaching development… We see this as a development platform for a new class of analytic capabilities.”
Based on several conversations with customers, Corrigan noted, “Many organizations are getting very excited about this new class of analytic applications to analyze social media or customer sentiment or content or predictive analytics.”
He continued, “The ability to analyze data – be it a variety of data, data in motion, structured data at rest in a data warehouse – we really see this as a platform for building the next generation of analytic applications.” Bringing analytics to all the engines allows data and insights to be integrated across the entire enterprise rather than siloed in the source system.
Visualizing big data
Corrigan cited the new and growing role of data scientists as essential for understanding big data. Data scientists not only understand the data itself and how to analyze it, but are tasked with tying the data to a business purpose or objective.
Hence, “The ability to bring big data together and then to visualize it for the data scientist to test hypotheses” is key. The IBM big data platform enables this through “something very familiar to most business users: the spreadsheet.” This spreadsheet-style interface enables data scientists and other users to look at potential correlations and test hypotheses.
Corrigan also discussed the evolving skillset needed to tackle big data technologies and offered several tips for choosing how to start your big data initiative. Learn more in the podcast, or visit
To effectively compete in today’s changing world, it is essential that companies leverage innovative technology to differentiate from competitors. Learn how you can do that and more in the Smarter Computing Analyst Paper from Hurwitz and Associates.