Keep your friends close and your data center closer


The technology industry has a problem. Disk drives, devices used for over 50 years to store and retrieve digital information, move data too slowly. Companies regularly use 3 terabyte disk drives, each roughly equal to the capacity of about 100 iPads, but the drives can only move data at 50 to 100 megabytes per second. Many organizations need to analyze data at 100 gigabytes per second, a difference of about three orders of magnitude.

A disk drive holds so much data today that sending the full contents of one of those 3 TB drives over a typical cable internet connection would take more than a month!
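The figures above can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope calculation using the numbers quoted in this post; it assumes decimal units (1 TB = 10^12 bytes) and a 30-day month, and the function names are made up for illustration.

```python
import math

# Figures quoted in the post; decimal units are an assumption.
DRIVE_BYTES = 3 * 10**12          # one 3 TB drive
TARGET_BYTES_PER_S = 100 * 10**9  # 100 GB/s analysis rate

def hours_to_read(capacity_bytes, mb_per_s):
    """Hours needed to stream a whole drive at a given MB/s rate."""
    return capacity_bytes / (mb_per_s * 10**6) / 3600

def drives_needed(target_bytes_per_s, mb_per_s):
    """Drives that must be read in parallel to reach the target rate."""
    return math.ceil(target_bytes_per_s / (mb_per_s * 10**6))

def implied_link_mbit_per_s(capacity_bytes, days):
    """Link speed at which sending the drive takes exactly `days` days."""
    return capacity_bytes * 8 / (days * 86400) / 10**6

for rate in (50, 100):
    print(f"{rate} MB/s: {hours_to_read(DRIVE_BYTES, rate):.1f} h per drive, "
          f"{drives_needed(TARGET_BYTES_PER_S, rate)} drives for 100 GB/s")
print(f"One month per drive implies about "
      f"{implied_link_mbit_per_s(DRIVE_BYTES, 30):.1f} Mbit/s")
```

In other words, reaching 100 gigabytes per second takes on the order of a thousand or more drives working in parallel, which is why the data has to be spread across many disks.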

This is becoming more of a problem in the era of Big Data as data grows rapidly in quantity and importance. Companies need to get to their data faster.

To solve this issue, companies typically spread their data across multiple drives to speed up access, a practice known as a redundant array of independent disks, or RAID. However, this can lead to delays and, if a disk fails, potential data loss. With the number of disk drives (50,000+) typically deployed at large customer locations, even the highest level of redundancy available, RAID 6, is not enough for data protection. At that scale some array is almost always rebuilding a failed drive, and traditional RAID technology is having trouble keeping up.
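To make the idea of redundancy concrete, here is a minimal sketch of single-parity striping in the style of RAID 5. It is a simplification chosen for illustration: RAID 6 keeps two independent parity blocks per stripe, and nothing here reflects how any IBM product is implemented. All function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per disk)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Recompute the block on a failed disk from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data disks
stripe = make_stripe(data)           # fourth disk holds parity
recovered = rebuild(stripe, 1)       # pretend disk 1 failed
print(recovered == data[1])
```

Rebuilding means reading every surviving block in the stripe, which is why rebuilds on large drives are slow and why a second failure during a rebuild is the case that extra parity is meant to cover.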

Anticipating the data access problem over a decade ago, IBM Research developed the IBM General Parallel File System (GPFS™) to help businesses cope with the exploding growth of data, transactions and digitally-aware devices. Since then, the advanced file management software platform has been used together with some of the largest and most sophisticated supercomputers in the world as well as for general purpose file serving and sharing/archiving environments. In fact, over a dozen IBM products and offerings including the IBM SONAS and IBM Storwize V7000 Unified storage systems are built on GPFS.

Recently, through the DARPA-funded PERCS project, IBM Research in Almaden, California extended GPFS to develop GPFS Native RAID (GNR), a software layer beneath GPFS that interacts directly with the disk drives themselves. Modern servers have more than enough processing power to manage the disks directly, so GPFS Native RAID eliminates the need for expensive external RAID arrays. This essentially cuts out the middleman and makes data more quickly available for analysis. This capability has been available in the IBM Power 775 server for well over a year and is managing scores of petabytes of data worldwide.

Today, IBM is announcing the IBM System x GPFS Storage Server, which makes GPFS and GNR available with IBM System x® servers and, therefore, more generally available. It runs on standard hardware without the need for water cooling or other enhancements and can be attached to any GPFS compute cluster. The Storage Server can run in any datacenter and even in most office environments. It will be delivered as a complete, integrated storage solution consisting of servers, solid state drives (SSDs), disks and software for the IBM Intelligent Cluster.

Prior technology solutions have been aimed at organizations with petabytes of data. Now, IBM is applying this innovation to its commercial, off-the-shelf servers. By cutting out the need for stand-alone storage, the new GPFS Storage Server can cut costs while delivering higher performance and complete protection against data corruption and loss.

For example, the Juelich Supercomputing Centre (JSC) at the German research center Forschungszentrum Juelich will use the IBM System x GPFS Storage Server instead of a large storage array, connecting it to the IBM Blue Gene/Q-based “JUQUEEN” supercomputer. JUQUEEN was yesterday ranked as the fastest supercomputer in Europe, according to the TOP500 list of the world's fastest supercomputers. It is made available via a peer review process for use by scientists at the Research Centre Jülich, universities and research laboratories in Germany and Europe, as well as industrial users.

For modern installations like JUQUEEN, GNR is needed to effectively handle storage for 100,000-disk class petascale systems that will experience disk failures on a daily basis. It was a key component of IBM Research’s ability to assemble 200,000 hard drives into a single storage cluster of 120 petabytes (120 million gigabytes) back in 2011. This mammoth data repository could store one trillion files or two billion hours of MP3 music.
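A rough calculation shows why failures become a daily event at this scale. The sketch below assumes a 2% annual failure rate per drive, which is an illustrative figure and not one given in this post; the drive counts and the 120 petabyte total are the numbers quoted above, with decimal units assumed.

```python
# ASSUMPTION: 2% of drives fail per year. This rate is illustrative only.
ASSUMED_ANNUAL_FAILURE_RATE = 0.02

def expected_failures_per_day(num_disks, annual_failure_rate):
    """Average number of drive failures per day across the whole system."""
    return num_disks * annual_failure_rate / 365

def avg_capacity_gb(total_petabytes, num_drives):
    """Average capacity per drive in GB (1 PB = 10**6 GB)."""
    return total_petabytes * 10**6 / num_drives

per_day = expected_failures_per_day(100_000, ASSUMED_ANNUAL_FAILURE_RATE)
print(f"100,000 disks: about {per_day:.1f} failures per day")
print(f"120 PB over 200,000 drives: {avg_capacity_gb(120, 200_000):.0f} GB each")
```

Even with a much lower assumed failure rate, a system of this size would still see a failed drive most days, so rebuilds have to be treated as normal operation rather than as an exception.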

GNR also makes fundamental use of SSDs to do what they do best: temporarily store small blocks of data and maintain system logs. SSD technology is built into the new System x GPFS Storage Server as an integral part of its design, rather than just an add-on.

This is where the industry is headed. Instead of exclusively writing data to disk, GPFS will help keep data right where it is needed. To play on the old saying, keep your friends close and your important data closer.


