Posted on January 12, 2009 
As if ballooning volumes of business-critical data didn’t pose enough of a data protection challenge, widespread adoption of virtualization has made data backup even more complex. Traditional backup products are simply not designed for physical servers that may host as many as 128 virtual machines (VMs). Organizations need new solutions optimized for virtualized environments.
“All virtual machines share a single physical host with finite resources that generally are fully allocated to maximize application performance,” said Vince Conroy, CTO, FusionStorm. “As a result, I/O- and network-intensive backup processes place a strain on the minimal system resources that are available. Backups need to be reengineered with these constraints in mind.”
According to Conroy, data de-duplication is a key element of the VM-optimized backup environment. Also known as global compression, commonality factoring and single-instance storage, data de-duplication eliminates redundant copies of data to reduce storage costs and shrink backup and recovery times. It also makes wide-area backup an operational reality: because only de-duplicated data moves across the WAN, organizations can replicate vital data offsite without high bandwidth costs or the risks of physically transporting media.
“Data de-duplication has come to the forefront as a way to reduce the amount of disk space required for backups and to support bandwidth-constrained branch office backup,” Conroy said. “Virtualization is the next frontier for this technology. Minimizing the amount of data to be backed up minimizes the load on virtualized systems — it’s that simple. As an added benefit, data de-duplication supports green IT initiatives by reducing space and power requirements within the storage infrastructure.”
One Out of Many
Data de-duplication can run in-line, as data is being backed up, or as a post-process after the backup has completed. Although it is computationally intensive, in-line de-duplication reduces storage capacity requirements because only globally unique blocks of data are ever written to the backup disk. Post-process de-duplication, by contrast, must first land the full backup on disk before redundant blocks can be removed.
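The in-line versus post-process trade-off can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: fixed-size 4 KB chunks, SHA-1 fingerprints, and hypothetical helper names (`inline_stored_bytes`, `post_process_staging_bytes`). Commercial products, Avamar included, may segment and fingerprint data differently.

```python
import hashlib
from typing import List

# Hypothetical fixed chunk size; real products may use other (or variable) sizes.
CHUNK_SIZE = 4096


def chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a byte stream into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_stored_bytes(data: bytes) -> int:
    """In-line: each chunk is fingerprinted before it is written, so a chunk
    whose fingerprint is already known is never written again."""
    seen = set()
    written = 0
    for chunk in chunks(data):
        fp = hashlib.sha1(chunk).digest()  # assumed fingerprint function
        if fp not in seen:
            seen.add(fp)
            written += len(chunk)
    return written


def post_process_staging_bytes(data: bytes) -> int:
    """Post-process: the whole backup must land on disk before redundant
    chunks can be removed, so staging needs at least the full logical size."""
    return len(data)


if __name__ == "__main__":
    # Eight identical "A" chunks plus two identical "B" chunks.
    backup = b"A" * CHUNK_SIZE * 8 + b"B" * CHUNK_SIZE * 2
    print(inline_stored_bytes(backup))         # 8192
    print(post_process_staging_bytes(backup))  # 40960
```

Both approaches end up keeping the same unique chunks; the difference is whether the redundant data ever touches the backup disk at all.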
“Storage-based data de-duplication — also known as single-instance store — can dramatically reduce the amount of disk space required for backups,” Conroy said. “It makes disk-based backup more cost-effective, eliminating backup tapes and mitigating the risk associated with shipping tapes offsite.”
Source-based data de-duplication further optimizes the backup environment. Data is “fingerprinted” at the source before it is backed up to disk, so only new or changed data segments are sent across the wire, reducing network load by up to 95 percent.
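Source-side fingerprinting can be sketched as a two-step exchange: the client sends fingerprints, the server answers with the ones it lacks, and only those chunks travel. This is a minimal sketch assuming fixed 4 KB chunks and SHA-1 fingerprints; `BackupServer` and `source_side_backup` are hypothetical names, not Avamar’s API or wire protocol.

```python
import hashlib
from typing import Dict, List


def fingerprint(chunk: bytes) -> bytes:
    # SHA-1 is used here only as an illustrative fingerprint function.
    return hashlib.sha1(chunk).digest()


class BackupServer:
    """Hypothetical backup target that keeps one copy of each chunk."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}

    def missing(self, fps: List[bytes]) -> List[bytes]:
        """Return the fingerprints this server does not hold yet."""
        return [fp for fp in dict.fromkeys(fps) if fp not in self.chunks]

    def upload(self, payload: Dict[bytes, bytes]) -> None:
        self.chunks.update(payload)


def source_side_backup(data: bytes, server: BackupServer, size: int = 4096) -> int:
    """Fingerprint chunks at the source and send only chunks the server lacks.
    Returns chunk-payload bytes sent (the fingerprint exchange is ignored)."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    by_fp = {fingerprint(p): p for p in pieces}
    need = server.missing(list(by_fp))
    payload = {fp: by_fp[fp] for fp in need}
    server.upload(payload)
    return sum(len(p) for p in payload.values())


if __name__ == "__main__":
    server = BackupServer()
    day1 = b"".join(bytes([i]) * 4096 for i in range(10))  # ten distinct chunks
    print(source_side_backup(day1, server))  # 40960
    day2 = day1[:-4096] + b"\xff" * 4096      # only the last chunk changed
    print(source_side_backup(day2, server))  # 4096
```

On the second run only one of ten chunks crosses the wire, which is the effect the paragraph describes; real savings depend on how much data actually changes between backups.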
“These technologies combine to create the VM-optimized backup environment. First, only unique data segments are backed up, with 20-byte identifiers pointing to duplicate instances. Then only new data segments are identified and backed up,” said Conroy.
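Conroy’s “20-byte identifiers” can be illustrated with a short sketch. Twenty bytes is the size of a SHA-1 digest, so this sketch assumes SHA-1; the article does not name the hash Avamar uses, and `store_backup` and `restore` are hypothetical helpers, not Avamar functions. Each backup is kept as an ordered list of identifiers, so a duplicate segment costs a 20-byte reference rather than another full copy.

```python
import hashlib
from typing import Dict, List


def store_backup(data: bytes, store: Dict[bytes, bytes], size: int = 4096) -> List[bytes]:
    """Write only unique chunks into `store` and return the backup's recipe:
    an ordered list of 20-byte identifiers, one per chunk. Repeated chunks
    become repeated references instead of repeated copies."""
    recipe = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        ident = hashlib.sha1(chunk).digest()  # 20-byte identifier (assumed SHA-1)
        store.setdefault(ident, chunk)        # keep the first copy only
        recipe.append(ident)
    return recipe


def restore(recipe: List[bytes], store: Dict[bytes, bytes]) -> bytes:
    """Rebuild the original stream by following each identifier in order."""
    return b"".join(store[ident] for ident in recipe)


if __name__ == "__main__":
    store: Dict[bytes, bytes] = {}
    data = b"VMDK" * 1024 * 6  # six identical 4 KB chunks
    recipe = store_backup(data, store)
    print(len(recipe), len(store), len(recipe[0]))  # 6 1 20
    assert restore(recipe, store) == data
```

Here six logical chunks occupy one stored chunk plus 120 bytes of identifiers.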
Unique Solution
EMC has long been an innovator in data de-duplication. EMC Avamar backup and recovery software brings substantial de-duplication benefits to virtualized environments, reducing the backup window by as much as 90 percent. EMC recently enhanced this offering with the introduction of EMC Avamar Virtual Edition for VMware Infrastructure. This fully virtualized deployment model combines the gains of data de-duplication with the server and storage efficiencies of VMware Infrastructure.
In addition, EMC introduced the EMC Avamar Data Store, a complete solution consisting of EMC Avamar software running on pre-configured EMC-certified hardware. Together these offerings simplify and expand the deployment options for EMC Avamar’s patented global de-duplication software to manage backup growth for remote offices, branch offices, data center LANs and VMware environments.
“The innovative Avamar Virtual Edition for VMware Infrastructure and Avamar Data Store extend EMC’s leading data de-duplication software capabilities and transform the way customers evaluate and approach data protection,” said Conroy. “VMware customers can deploy EMC Avamar software within the virtual infrastructure and eliminate the need for dedicated backup server infrastructure, while dramatically speeding backups and simplifying backup management.”
Working Together
Avamar Virtual Edition complements EMC’s integration of Avamar software with VMware Consolidated Backup (VCB), which provides fast, efficient backup within and across virtual machines. It lets customers deploy Avamar’s de-duplication technology easily, effectively and repeatably on VMware ESX Server hosts. It also enables replication between Avamar virtual machines, or from Avamar virtual machines to the new Avamar Data Store or to standard Avamar servers. It supports up to 1TB of de-duplicated backup capacity (which, under a typical backup schedule, would require approximately 37TB of traditional tape or disk storage) and can leverage VMware’s shared server and storage infrastructure to further lower cost and simplify IT management.
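The capacity claim for Avamar Virtual Edition implies a rough reduction ratio. The arithmetic below uses only the article’s own numbers; the real ratio depends on the data and on the “typical backup schedule,” which the article does not define.

```latex
\text{reduction ratio} \approx \frac{37\,\text{TB}}{1\,\text{TB}} = 37{:}1,
\qquad
\text{space saved} \approx 1 - \frac{1}{37} \approx 0.973 \;(\text{about } 97\%)
```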
Avamar Data Store is available in two models — a scalable multi-node model and a single-node model. The multi-node Avamar Data Store is ideal for deployment in the data center where backup data is being consolidated from multiple remote locations or to protect VMware environments and LAN-attached servers. The single-node model is designed for deployments in distributed or remote offices that require faster, local recovery performance. Both models support replication, either from the remote office to the data center for consolidation, or between data centers for disaster recovery purposes.
“When customers consolidate servers through virtualization, they're also consolidating backup streams. Avamar not only supports VCB but fully leverages the efficiency of VMware Infrastructure to create a VM-optimized backup environment,” said Conroy. “EMC Avamar delivers on the promise of data de-duplication to decrease storage and bandwidth requirements and reduce the burden on virtualized servers.”