Posted on October 05, 2009 
Industry experts say that unstructured data such as digital images, video and audio now represent more than half of all stored information and will continue to grow faster than traditional transaction-based and file-oriented storage for the foreseeable future. That is causing organizations to rethink the way they store and retrieve information.
Conventional storage architectures are chiefly designed for online transaction processing (OLTP) databases and view all storage as either files or blocks of data. In today’s Web 2.0 world, however, data can’t always be described in terms of rows and columns. In a typical RAID storage architecture, files such as images and videos that have no internal data structure must be transformed into data files so that storage hardware and software can deal with them. It is a time- and compute-intensive process.
Storage Volatility
Complicating things from a storage perspective is the unpredictability of “new media” applications. Digital video sites, online image repositories and social-networking sites tend to produce highly viral content that can lead to sudden, extreme data growth.
“In recent years, rich media, video content and larger graphics have driven exponential growth in demand for disk space, and it is impossible to tell where it is going to lead,” said Steve Rogers, VP of Engineering & Strategic Planning, Jeskell. “Who knows when the next Facebook or YouTube is going to take off and cause a huge uptick in unstructured data flowing into your organization?
“Customers need the ability to quickly and fluidly scale performance and capacity to accommodate rapid data growth, while maintaining the ability to efficiently access older data. That is a difficult proposition in traditional OLTP-based storage architectures. Adding capacity is easy enough, but that often winds up degrading performance and management.”
A Revolutionary Approach
With its XIV Storage System, IBM has developed a next-generation storage solution that addresses this marketplace shift by managing storage not just as data and files but as massive content repositories. Moreover, the XIV system represents a radical departure from the multi-tiered approach to storage that is common in most data centers today.
With the XIV system, IBM has introduced a single, Tier-1 storage layer while promising lower costs, better performance and easier management than multi-tiered storage. The key is a clustered block storage architecture that delivers such massive scalability that there is no need to migrate data between tiers as requirements or importance change. By giving all data and applications Tier-1 performance, reliability, availability and features, the XIV reduces complexity and provides a better overall level of service.
“The main thing about clustered block storage in general and the XIV system in particular is that it is designed to simplify storage in virtually every way,” said Rogers. “It scales on the fly without impacting host I/O operations, provides automatic load balancing without any special management tuning or planning, and it is self-healing in the event of a disk failure. That all adds up to tremendously low total cost of ownership.”
Chunky Data
Clustered storage is not a particularly new concept; it has been available for file storage for nearly a decade. The clustered block storage architecture is a much newer approach. These solutions reach deep into stored data to optimize granular “chunks” of data in order to tune performance, increase utilization, and deliver any number of advanced storage features more efficiently.
The XIV system uses multiple block-level disk arrays, or nodes, that are powered by multiple clustered controllers and feature sophisticated virtualization and data management software. Thin provisioning supports scalability by allowing a logical volume to be defined with a size larger than the physical capacity currently backing it. The system then allocates physical capacity automatically as it is needed, instead of wasting capacity in anticipation of growth. When additional nodes are needed, they are simply added to the cluster, which automatically puts the new nodes into service.
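To make the thin provisioning idea concrete, here is a minimal Python sketch. It is a toy model, not IBM's implementation: the class names `ThinPool` and `ThinVolume`, the notion of a generic "allocation unit," and the capacity figures are all assumptions chosen for illustration. It shows a volume whose logical size exceeds the physical pool, with physical space consumed only on the first write to each unit.

```python
class ThinPool:
    """Shared pool of physical allocation units (hypothetical model)."""

    def __init__(self, physical_units):
        self.physical_units = physical_units
        self.used_units = 0

    def add_capacity(self, units):
        # Models adding a node to the cluster: the pool simply grows.
        self.physical_units += units

    def allocate(self):
        if self.used_units >= self.physical_units:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_units += 1


class ThinVolume:
    """Volume whose logical size may exceed the pool's physical size."""

    def __init__(self, pool, logical_units):
        self.pool = pool
        self.logical_units = logical_units
        self.mapped = {}  # logical unit index -> data

    def write(self, unit_index, data):
        if not 0 <= unit_index < self.logical_units:
            raise IndexError("write beyond logical volume size")
        if unit_index not in self.mapped:
            # Physical space is consumed only on the first write to a unit.
            self.pool.allocate()
        self.mapped[unit_index] = data


pool = ThinPool(physical_units=4)              # assumed physical capacity
volume = ThinVolume(pool, logical_units=100)   # logical size is far larger
for i in range(3):
    volume.write(i, b"x")
volume.write(0, b"y")  # rewriting an existing unit allocates nothing new
print(pool.used_units)  # 3
```

In this model, running out of physical units raises an error until `add_capacity` is called, which loosely mirrors the article's point that capacity is added when needed, not reserved in advance.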
XIV packages all stored content into 1MB chunks that are spread across the entire system to reduce disk workload and improve overall performance. This so-called “RAID-X” architecture does not require an administrator to manually rebuild a failed drive; the system simply heals itself by redistributing data. The contents of a 1TB drive can be rebuilt in about 40 minutes.
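The following Python sketch illustrates why spreading chunks across the whole system can shorten a rebuild. It rests on several assumptions that the article does not state: each chunk is kept as two copies on distinct drives, placement is pseudo-random, the system has 24 drives, the failed drive was full, and 1TB is counted as 1,000,000MB. The function names are hypothetical. It is a rough model of the idea, not a description of XIV's actual algorithm.

```python
import random
from collections import Counter


def place_chunks(num_chunks, drives, seed=0):
    """Give each chunk two copies on distinct drives (assumed mirroring)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(drives, 2)) for _ in range(num_chunks)]


def rebuild_sources(placement, failed):
    """Count, per surviving drive, the chunks it must supply after a failure."""
    load = Counter()
    for a, b in placement:
        if a == failed:
            load[b] += 1
        elif b == failed:
            load[a] += 1
    return load


drives = list(range(24))                  # hypothetical drive count
placement = place_chunks(10_000, drives)  # 10,000 chunks of 1MB each
load = rebuild_sources(placement, failed=0)
print(len(load), "surviving drives share the rebuild work")

# Arithmetic behind the quoted figure: 1TB restored in about 40 minutes.
aggregate_mb_per_s = 1_000_000 / (40 * 60)
per_drive_mb_per_s = aggregate_mb_per_s / (len(drives) - 1)
print(round(aggregate_mb_per_s), "MB/s aggregate")
print(round(per_drive_mb_per_s, 1), "MB/s per surviving drive")
```

Under these assumptions the rebuild needs roughly 417MB/s in aggregate, but because the surviving copies are scattered across all remaining drives, each drive contributes only a small fraction of that rate. A conventional rebuild onto a single spare drive would instead be limited by that one drive's write speed.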
Redundancy and More
Using standard, off-the-shelf hardware components, the massively parallel architecture delivers unprecedented data protection and availability: active-active N+1 redundancy of all disk drives, modules, switches and UPS units, as well as concurrent multi-path host connectivity. XIV also sports other high-end features such as unlimited snapshots, I/O load balancing and automatic configuration.
IBM also recently announced asynchronous mirroring for the XIV system, which enables remote disaster recovery over unlimited distances and enhances the platform’s business continuity capabilities. In addition, new quad-core processors deliver performance improvements of up to 30 percent over previous versions of the system, IBM says.
IBM will continue to offer high-speed disk arrays such as the IBM System Storage DS8000 and DS5000 for organizations that still rely primarily on OLTP applications. In fact, the DS8000 is still considered IBM’s flagship mainframe array and recently received a host of important upgrades, including automated thin provisioning capabilities and innovative self-encrypting drives to improve security. But for organizations that are feeling the pinch of rapid and extreme growth of unstructured data, XIV is a dynamic, future-proof system that is capable of growing to meet unpredictable needs.