Stream processing: analyze data in real time for a more agile response


Editor’s Note: We continue our Smarter Computing Breakthroughs series this week with a post on Stream Processing from Nagui Halim, IBM Fellow and Director, Chief Architect of Big Data in IBM Software Group. The Breakthroughs Series will introduce you to key technological developments IBM has advanced to strengthen our integrated portfolio of systems, software and services – technologies that are often the unsung drivers behind the IT infrastructure that enables a smarter planet. You can find links to previous Breakthroughs posts at the bottom of this post.

Stream processing takes a different approach — and delivers much faster insights

Big data and analytics solutions are dominant subjects in the IT world right now, and it’s easy to see why. By detecting and acting on previously unseen trends, patterns and information of all kinds — in areas ranging from customer demand to infrastructure performance — organizations can create many compelling forms of new value.

One of the most exciting elements of the data/analytics story, though, concerns a new class of analytics tools and how they can tackle a new type of workload: stream processing.

Stream processing (if implemented effectively) can empower organizations to get insights from data far more quickly, and in far more ways, than ever before — even in cases where insights are needed in real time, or very close to it.

To understand the difference between stream processing and traditional analytics approaches, begin with the way the raw data is handled. In a conventional analytics architecture, the data is presumed to be stored in a repository (such as a relational database, or on a larger scale, a data warehouse). The organization then runs queries against that data: How many of X were sold to customers of Y demographic? What kinds of inventory challenges has X site experienced, following Y marketing campaign, compared to other sites in the last five years?

Stream processing represents a fundamental break from that paradigm. Instead of collecting and storing the data first, which creates a significant delay, you run analytics against the data as soon as it becomes available to the organization.
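To make the contrast concrete, here is a minimal Python sketch. It is not InfoSphere Streams code. The function names and the sample readings are hypothetical, chosen only to show the difference between querying stored data and updating an answer as each event arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(stored: List[float]) -> float:
    """Store-then-query: every value must be collected before the query runs."""
    return sum(stored) / len(stored)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Stream processing: refresh the answer each time a new event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        # An up-to-date result is available immediately, with no stored history.
        yield total / count


# Hypothetical sample data standing in for a live feed.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_average(readings))            # 25.0
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

The batch version produces one answer once all the data is in place. The streaming version keeps only a running count and total, so it can report a current answer after every event, even on a feed that never ends.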

An incredible range of new use cases for analytics

That change opens up a wide range of new capabilities, and they are already beginning to transform the way organizations think about data and what they can do with it.

One obvious example is social media. As new products or services are released and the public responds with tweets, Facebook wall posts, LiveJournal blog entries and in other ways, this data, all of it valuable to the organization, can be assessed in something very close to real time to understand how satisfied (or unsatisfied) those customers really are.

And because the world is generating more data, of more types, than ever before, stream processing will be applied in more and more ways going forward. Other use cases include:

Stock trading – new data is created in massive volumes every second; stream-based analysis yields much faster, and thus much more useful, intelligence.

Network/infrastructure performance – consider telecommunications organizations that have tremendously complex infrastructures, spanning both wired and wireless platforms. Isolating technical issues to root causes, and resolving those causes rapidly, is both enormously difficult and quite essential.

Government and law enforcement – one obvious application would be facial recognition applied to real-time video sources, used to identify known terrorists in public environments such as airports. Because the video has to be analyzed as it arrives, this task is impractical with traditional store-then-query analytics tools.

Energy – as electrical grids become smarter, and smart meters are increasingly deployed en masse, energy analysis conducted via stream processing can reveal new opportunities to reduce power consumption and yet maintain necessary services and performance levels, helping entire nations “go green” in a smart, well-informed way.

Security – one example would be the analysis of botnet activity. When malware achieves control of target systems on a mass scale, and then orchestrates malicious campaigns such as a Denial of Service attack aimed at a particular government or organizational website, stream processing can help establish just where the attacks are coming from, and how they are being carried out, as they happen.
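Several of these use cases come down to the same pattern: keep a short sliding window of recent events and react when something in that window crosses a threshold. The Python sketch below illustrates this for the Denial of Service example. It is an illustration only, not InfoSphere Streams code, and the event format, function name, window length and threshold are all assumptions.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical event format: (timestamp in seconds, source address).
Event = Tuple[float, str]


def heavy_sources(
    events: Iterable[Event], window: float, threshold: int
) -> Iterator[Tuple[float, str, int]]:
    """Yield (time, source, count) whenever a source reaches `threshold`
    requests within the last `window` seconds."""
    recent: Deque[Event] = deque()
    counts: Counter = Counter()
    for ts, src in events:
        recent.append((ts, src))
        counts[src] += 1
        # Evict events that have fallen out of the time window.
        while recent and recent[0][0] <= ts - window:
            _, old_src = recent.popleft()
            counts[old_src] -= 1
            if counts[old_src] == 0:
                del counts[old_src]
        if counts[src] >= threshold:
            yield ts, src, counts[src]


# Hypothetical traffic: source "a" sends three requests within ten seconds.
traffic = [(0.0, "a"), (0.5, "b"), (1.0, "a"), (1.5, "a"), (12.0, "a")]
for alert in heavy_sources(traffic, window=10.0, threshold=3):
    print(alert)  # (1.5, 'a', 3)
```

Only the events inside the window are held in memory, so the alert is raised while the traffic is still arriving rather than after it has been stored and queried.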

IBM InfoSphere Streams: An optimized, exceptionally versatile stream processing platform

You might think that IBM, as a leader in big data analytics and IT solutions generally, would be interested in stream processing — and you’d be right. At IBM, we’ve been chartered with creating technologies designed not just to enable the concept of stream processing, but also to apply it in as many ways as possible, to maximize the value it can create for our clients and for the world.

InfoSphere Streams is the product that IBM built to host these advanced solutions, and it combines powerful programming expressivity with very high performance. As an example, Streams applications in production at one client are handling ten billion messages a day (roughly 116,000 messages per second on average) with sub-second latencies. The implication is that sophisticated applications built on the Streams platform can analyze very large data volumes and create actionable results on a continuous basis in very little time.

But it’s also important to understand just how versatile this solution is — a consequence of the fact that IBM gave the development team the freedom to build the best possible solution, from the ground up, in the pursuit of holistic excellence.

InfoSphere Streams, as a result, is much more than just another analytics tool. It is better understood as a platform and execution environment for stream processing analytics — one that can support as many different forms of stream processing as will be needed, and can combine their results to yield exceptionally sophisticated insights.

For instance, imagine leveraging social media, video, audio and stock data simultaneously, not just to understand how customer/analyst opinions are changing following a major product launch, but also to project the impact on the organization’s quarterly results and market capitalization. This is the sort of task that just a few years ago would have been nearly unimaginable for analytics solutions, but which InfoSphere Streams can make a practical reality.

Another example: the City of Dublin has already used InfoSphere Streams to get a much clearer view of traffic patterns. That view is used both to provide travel time estimates for Dublin citizens every day and to help guide future urban development for a better outcome in years to come.

This illustrates how stream processing has both immediate and long-term potential, and organizations are rapidly embracing both as they become aware of the possibilities.

Previous Smarter Computing Breakthroughs posts:

  1. System Interconnect Improvements: Don’t Forget the Network
  2. Middleware Optimized Systems
  3. Information Integration, Pt. 1
  4. Information Integration, Pt. 2
  5. Unified Management
  6. Data Security
  7. Image Management
  8. Cross-Platform Virtualization
  9. SMP Interconnect Fabric — Simpler, Faster, Smarter Scalability
  10. Intelligent Threads: Tuning the POWER 7 Processor to your Workloads