Hand in hand Database Design and Data Backup, Recovery continue…
Recovery Strategies
The recovery strategy will be built around the backup strategy. Any recovery situation naturally implies that some failure has occurred. The recovery action will depend on what that failure was. As such, a recovery strategy will consist of a set of failure scenarios and their resolution. The aim of this section is to categorize the main potential failures. How each failure is dealt with specifically is of course RDBMS-dependent. Whatever software you choose, the recovery steps for the failure scenarios below need to be fully documented:
- instance failure
- media failure
- loss or damage of tablespace or data file
- loss or damage of a table
- loss or damage of a redo log file
- loss or damage of archive log file
- loss or damage of control file
- failure during data movement
- other scenarios
The recovery plan for each of the scenarios should take into account the use of any operations that are not logged. For example, if a table is created with the create table as select statement with logging of journal information switched off, recovery of the tablespace with that table will need to take this into account. If the recovery is from a backup taken before the table was created, there will be no roll- forward information to create and repopulate the table. The operation that created the table will need to be rerun. If, on the other hand, the backup was taken after the table was created, then the table will exist and, more importantly, any changes made after the original create statement can be rolled forward using journal files.
There are a number of data movement scenarios that need to be covered. Some of the more common data movements are:
- data load into staging tables
- movement from staging to fact table
- partition roll-up into larger partitions
- creation of aggregations
A plan needs to be prepared and documented for each of these situations. The plan must include a description of the processes involved, including details of tables used, views that need to be created, and so on. This is important, because it enables anyone dealing with a failure of the process to track what has occurred. It also gives the system management and database administration teams a fighting chance of finding and fixing a problem.
Any other scenarios, such as rolling out bad data, will require DBA evaluation on a case-by-case basis. If necessary, external help, such as the hardware or RDBMS support help desks, can be called, to help resolve the problem. It is worth recording the phone numbers of these organizations in the recovery documentation.
Testing the Strategy
The oft-forgotten cousin of the backup recovery world is testing. All too often, backup strategies are developed and never tested. In such a complex environment as a data warehouse the tendency is to put backup testing to one side while everything else is being tested. The general outcome is that backup never gets tested, or at best is tested only minimally.
It is not sufficient simply to check that your backup scripts are backing up the intended files. All the failure scenarios identified above need to be explored, with the suggested recovery path being implemented in full. This will add at least a week to the project test plan, and possibly as much as a month, depending on the size and complexity of the data warehouse.
Testing does not end when the data warehouse goes live. Warehouses are evolving environments, and will typically start out smaller than their ultimate size. The backup and recovery tests need to be carried out on a regular basis. We suggest that the full test suite should be run twice a year, or at the very least annually. As the system itself is live, and many of the tests will be destructive, people are often wary of running these tests. The way to deal with this is to have a test and development environment that matches the live environment.
That is not to say that the test environment has to be full-scale and exactly like the live environment. It does, however, have to be large enough to allow full-scale testing to be performed. The architecture of the system should always be the same as the live system. For example, if the live system is a cluster of SMP machines, it is better for the test machine to be a cluster of smaller machines, rather than a larger standalone machine.
When running the regular backup tests, it is advisable to avoid performing the tests at busy times, such as end of year. Try to position the tests to be run at low periods in the business year. Part of the benefit of the tests is the knowledge of the system and the software being used that operational staff gain. For this reason it is important that the tests be scheduled to maximize the coverage of the systems and operations people.
Possibly related posts: (automatically generated)
Hand in hand Database Design and Data Backup, Recovery continue…
- Hand in hand Database Design and Data Backup, Recovery
- Using the New Wizards in SQL Server 2000
- Improving Security
- Website Hosting Sever, some Pitfalls you need to avoid part 1
- Ten Scamps of a Data Warehousing Failure continue...
- Keeping Multiple Database Applications in Sync with Replication
- Anatomy of the Online Store
- Managing the Windows 2000 WINS Server continue…
- Scripting Database Objects
- What Is Contained in the System Catalog?
- October 2nd

They’ll remove your cabinet doors, drawer fronts and hardware, made cabinetry and accessories, and cover all other exposed surfaces with matching laminate or vinyl veneer. … Cabinet Hardware
Scalable web server solutions with failover Yes No User, group, and object level security optionally, use Windows authentication of users… … Enterprise Server