Backup Problems That Can Affect Recoverability
May 2008
Storage vendors like to talk about backup, but storage managers are quick to point out, "Backups are worthless; restores are priceless." Unfortunately, no matter how diligent you are in designing and implementing your backup strategy, relying solely on information from backup hardware and software can result in significant delays, or even outright failures, when it comes time to recover. Backups that are reported as successful may still fail to meet the data-consistency or policy requirements of the applications that use the data. Verifying application recoverability and confirming that you can meet business availability requirements are essential.
NetApp has helped numerous IT teams across a wide range of industries do a full assessment—from applications to infrastructure—that takes into consideration all elements that can impact backups. These assessments have identified a variety of commonly occurring "hidden" problems that can cause major issues if a disaster or failure occurs.
A Comprehensive Backup and Recoverability Assessment Methodology
Typical backup assessments look only at the backup environment to determine whether or not key backup hardware and software are operating correctly. These evaluations don't examine anything upstream from the backup process. A more thorough assessment may be necessary to establish true confidence in the recoverability of your environment. This approach should include (in order):
1. Analysis of application alignment down to the infrastructure level: mapping how applications, databases, and file systems relate to servers, networks, and storage, so that every dependency on storage and backup is identified.
2. Careful evaluation of the nuts and bolts of backup and replication, including:
- Database journaling and transaction logging
- Snapshot schedules
- Replication schedules
- Analysis of the backup catalog, backup schedules, etc.
3. A determination of recovery objectives and needs. Ideally this involves a formal interview process with line-of-business owners, application owners, IT staff, and anyone else who has information about a specific application, covering recovery needs, objectives, business impact, available downtime, and so on. Every discussion should be broken down by application.
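The dependency mapping in step 1 amounts to walking from each application down to the storage it relies on and checking whether anything on that path is unprotected. The sketch below is a minimal illustration under stated assumptions: the inventory dictionaries, application names, and volume names are all hypothetical, and a real assessment would populate them from discovery tools and interviews rather than hard-coding them.

```python
# Hypothetical inventory from step 1: application -> data sets it depends on
# (databases, logs, file systems), and data set -> storage volume.
app_datasets = {
    "orders": ["orders_db", "orders_archive_logs"],
    "crm": ["crm_fs"],
    "reporting": ["reporting_db"],
}
dataset_volume = {
    "orders_db": "vol1",
    "orders_archive_logs": "vol2",
    "crm_fs": "vol3",
    # "reporting_db" was never mapped to a volume.
}
# Volumes that some backup, Snapshot, or replication policy actually covers.
protected_volumes = {"vol1", "vol3"}


def unprotected_dependencies(app_datasets, dataset_volume, protected_volumes):
    """Return {application: [(data set, volume or None), ...]} for every data set
    that is unmapped or sits on a volume with no protection policy."""
    gaps = {}
    for app, datasets in app_datasets.items():
        missing = []
        for ds in datasets:
            vol = dataset_volume.get(ds)  # None means the mapping is unknown
            if vol is None or vol not in protected_volumes:
                missing.append((ds, vol))
        if missing:
            gaps[app] = missing
    return gaps


print(unprotected_dependencies(app_datasets, dataset_volume, protected_volumes))
# {'orders': [('orders_archive_logs', 'vol2')], 'reporting': [('reporting_db', None)]}
```

Treating an unmapped data set as a gap is a deliberate choice here: an unknown dependency is exactly the kind of "hidden" problem a backup-only assessment would miss.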
With this information in hand, you can determine whether the desired recovery point objective (RPO) and recovery time objective (RTO) can be met for each application and begin to identify potential problems. It may also be valuable to look at the recovery capacity objective (RCO, a relatively new concept) to understand how much storage is needed to recover.
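Comparing interview targets against what the schedules can actually deliver is simple arithmetic once the inputs are collected. The following is a minimal sketch, assuming periodic copies that start every `copy_interval_min` minutes and become usable on the target `copy_transfer_min` minutes later, a restore time approximated as data size divided by an assumed throughput, and an RCO-style check that only asks whether enough free capacity exists to restore the data. The class, field names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryProfile:
    name: str
    rpo_target_min: float        # acceptable data loss, from the interviews
    rto_target_min: float        # acceptable downtime, from the interviews
    copy_interval_min: float     # how often a Snapshot/replica copy starts
    copy_transfer_min: float     # how long until that copy is usable on the target
    data_gb: float               # data that must be restored
    restore_gb_per_min: float    # assumed sustained restore throughput
    free_capacity_gb: float      # capacity available where recovery will run


def assess(p: RecoveryProfile) -> dict:
    # A failure just before a copy finishes landing leaves only the previous
    # copy, which is interval + transfer minutes old (assumes transfer <= interval).
    worst_rpo = p.copy_interval_min + p.copy_transfer_min
    # Simplification: restore time is size / throughput, ignoring other steps.
    est_rto = p.data_gb / p.restore_gb_per_min
    return {
        "worst_rpo_min": worst_rpo,
        "est_rto_min": est_rto,
        "rpo_ok": worst_rpo <= p.rpo_target_min,
        "rto_ok": est_rto <= p.rto_target_min,
        # RCO-style check: is there room to restore the data at all?
        "rco_ok": p.data_gb <= p.free_capacity_gb,
    }


orders = RecoveryProfile("orders", rpo_target_min=60, rto_target_min=120,
                         copy_interval_min=60, copy_transfer_min=15,
                         data_gb=500, restore_gb_per_min=5, free_capacity_gb=800)
print(assess(orders))
# {'worst_rpo_min': 75, 'est_rto_min': 100.0, 'rpo_ok': False, 'rto_ok': True, 'rco_ok': True}
```

The example shows a common trap: an hourly schedule looks like it satisfies a one-hour RPO, but once transfer time is counted the worst case is 75 minutes.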
Recovery Challenge: Meet Strict RPO and RTO Objectives
Hidden Problem: Scheduling Conflicts Increase Recovery Risk
In most organizations, the company sets universal goals for the amount of acceptable data loss (RPO) and how long a given application can be down (RTO), and the storage team designs a backup plan to achieve these goals. While the plan may look solid on paper, in many large organizations the IT team supports a variety of applications and environments and may find it difficult to be 100% confident in their ability to meet these targets.
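One way to surface scheduling conflicts during an assessment is to compare the time windows of replication jobs from different environments. This is a minimal sketch, assuming each job runs in a fixed daily window expressed in minutes after midnight that does not cross midnight; the job names, environments, and times are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Window(NamedTuple):
    job: str
    env: str      # e.g. "production" or "test/dev"
    start: int    # minutes after midnight
    end: int      # minutes after midnight; assumed not to cross midnight


def cross_env_conflicts(windows):
    """Return (job, job) pairs from different environments whose windows overlap."""
    conflicts = []
    for a, b in combinations(windows, 2):
        # Half-open intervals [start, end) overlap iff each starts before the other ends.
        if a.env != b.env and a.start < b.end and b.start < a.end:
            conflicts.append((a.job, b.job))
    return conflicts


schedule = [
    Window("prod_db_mirror", "production", 60, 120),
    Window("prod_fs_mirror", "production", 180, 240),
    Window("dev_clone_refresh", "test/dev", 90, 150),
]
print(cross_env_conflicts(schedule))  # [('prod_db_mirror', 'dev_clone_refresh')]
```

Only cross-environment pairs are flagged because the concern here is test/dev work competing with production replication; dropping the `a.env != b.env` test would report every overlap.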
Solution
The solution in this case was to separate the replication processes for the production and test/dev environments and use information gathered about each application to synchronize replication schedules with important events (such as archive log completion) to create a consistency point during each replication cycle. This ensures that replication of production data will not be affected by test/dev and that the target volumes will always contain the data needed for recovery to a given point in time.
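Synchronizing replication with archive log completion can also be audited after the fact: every replication cycle should capture at least one archive log that completed since the previous cycle started. The check below is a minimal sketch, assuming both archive log completion times and replication start times are available as numbers on the same clock (minutes here); the function name and sample times are hypothetical.

```python
from bisect import bisect_right


def cycles_missing_consistency_point(log_completions, replication_starts):
    """Return replication start times for which no archive log completed
    after the previous replication started and at or before this one."""
    logs = sorted(log_completions)
    missing = []
    prev = float("-inf")
    for start in sorted(replication_starts):
        # Count logs completed in the interval (prev, start].
        count = bisect_right(logs, start) - bisect_right(logs, prev)
        if count == 0:
            missing.append(start)
        prev = start
    return missing


print(cycles_missing_consistency_point([55, 115, 175], [60, 120, 150, 180]))  # [150]
```

In this sample the cycle starting at minute 150 has no new archive log behind it, so the target volume would hold no new consistency point from that cycle.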
An additional outcome of this analysis was the determination that, given the fast cycle time of test/dev environments, there was no need to back up test/dev data to tape. This decision saved significant tape resources and time.
Products such as NetApp SnapDrive® also provide the basis for OS-aware Snapshot copies. SnapDrive scripting capabilities can readily be combined with VMware ESX Server API-based scripting. Many customers have used these capabilities to write scripts that create consistent Snapshot copies and SnapMirror copies for disaster recovery. These "copied" virtual machines can be mounted in standby VMware environments to confirm consistency and availability.
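Scripts of this kind generally follow the same order of operations: quiesce the application or guest, take the Snapshot copy, resume I/O, and only then replicate the copy. The sketch below shows that ordering only. The four callables are hypothetical hooks standing in for whatever SnapDrive, SnapMirror, or VMware tooling a site uses; no real product API is called or implied.

```python
from typing import Callable


def consistent_copy(quiesce: Callable[[], None],
                    take_snapshot: Callable[[], str],
                    resume: Callable[[], None],
                    replicate: Callable[[str], None]) -> str:
    """Quiesce, snapshot, resume, then replicate the snapshot.

    All four callables are hypothetical hooks supplied by the caller.
    """
    quiesce()
    try:
        snap = take_snapshot()  # the copy is taken while writes are paused
    finally:
        resume()                # always resume, even if the snapshot fails
    replicate(snap)             # ship the consistent copy to the DR site
    return snap


# Usage with stand-in hooks that just record the order of operations.
steps = []


def fake_snapshot():
    steps.append("snapshot")
    return "snap_0001"


consistent_copy(lambda: steps.append("quiesce"), fake_snapshot,
                lambda: steps.append("resume"),
                lambda s: steps.append(f"replicate {s}"))
print(steps)  # ['quiesce', 'snapshot', 'resume', 'replicate snap_0001']
```

Resuming in a `finally` clause keeps a failed snapshot from leaving the application frozen, and replicating after resume keeps the quiesce window as short as possible.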
The above article is an excerpt from a NetApp Tech OnTap article, October 2007 issue. To read the complete article, please visit www.netapp.com/go/techontap.
To find out how NetApp products can help protect your organization, please contact John Uchaker at CDW Berbee at 513-677-4119. CDW Berbee, drawing on strategic partnerships with Cisco, IBM and Microsoft and the far-reaching experience of its hundreds of engineers, has assisted clients with a full range of technology solutions. For other information, please visit www.berbee.com.