Oft-forgotten Data Disaster Recovery
What exactly constitutes a disaster? We define a disaster as a major site loss, usually meaning that the site has been destroyed or damaged beyond immediate repair. The system will then be either beyond repair in situ, or unavailable for an extended period. It calls for judgment to decide how long an outage must last before it stops being a simple outage and becomes a disaster. It is advisable to agree on the criteria for making that judgment before any such situation occurs, as a decision made in the middle of a crisis is likely to be a poor one.
As most data warehouses are not operational systems, let alone mission critical, it is rare to have a disaster recovery plan for a warehouse. Add the cost of a disaster recovery site and the hardware required to keep such a large system running, and many companies decide not to bother with disaster planning.
The cost of not having a disaster plan can be enormous. In data warehouses with daily loads, a disaster will mean a long-term outage. If this outage lasts more than a few days, it may become impossible to catch up with the load process. This leaves a hole in the database, and can have the effect of statistically invalidating a whole period’s data. For example, the current month may become invalid, and if year-to-date aggregates are kept, or year-on-year comparisons are important, a whole year’s worth of data may become effectively invalid.
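To make the catch-up problem concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that each day's load takes a fixed number of whole hours and that a fixed batch window is available each day. The function name and the figures are hypothetical and not taken from any particular warehouse.

```python
def days_to_catch_up(outage_days, load_hours, window_hours):
    """Estimate the days needed to clear a load backlog after an outage.

    Assumes whole-hour figures, a constant load time per day of data, and
    that each new day's load must still be run during the catch-up period.
    Returns None when there is no spare window, so the backlog never clears.
    """
    spare_hours = window_hours - load_hours   # time left after today's own load
    if spare_hours <= 0:
        return None
    backlog_hours = outage_days * load_hours  # work missed during the outage
    return -(-backlog_hours // spare_hours)   # integer ceiling division
```

Under these assumptions, a 6-hour load in an 8-hour window leaves only 2 spare hours a day, so a five-day outage takes 15 days to work off. When the load already fills the window, the function returns None, which corresponds to the permanent hole in the data described above.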
For this and other reasons it is important to plan for at least minimal functionality disaster recovery. The obvious plan is to use the test and development environment as the fallback server. This requires that it be sited some distance from the main system, so that a single event is unlikely to take out both.
When planning disaster recovery, identify the minimal system required. Decide which components of the data warehouse environment and application are required. It is important to check whether the chosen components of the warehouse have any dependencies on other parts. Continue with this process until the minimum set of components that both satisfies the requirements and can run independently has been identified.
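The process of following dependencies until nothing new turns up can be sketched as a simple closure over a dependency map. This is only an illustration: the component names in the usage note and the shape of the `depends_on` mapping are assumptions, and a real inventory would have to be gathered from the warehouse itself.

```python
def minimal_component_set(required, depends_on):
    """Return the required components plus everything they depend on.

    required   -- iterable of component names the business has asked for
    depends_on -- dict mapping a component name to the names it depends on
                  (hypothetical structure; components with no entry are
                  treated as having no dependencies)
    """
    needed = set()
    pending = list(required)
    while pending:
        component = pending.pop()
        if component in needed:
            continue                      # already handled; also guards against cycles
        needed.add(component)
        pending.extend(depends_on.get(component, ()))
    return needed
```

For example, if a hypothetical `sales_reports` component depends on `sales_facts`, which in turn depends on `daily_load` and `customer_dim`, then asking for `sales_reports` alone yields all four, while unrelated components are left out of the fallback system.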
Having decided to design or develop a minimal fallback system, or a full-blown disaster recovery plan, the next step is to list the requirements. Recovering from disaster requires the following:
- replacement/standby machine,
- sufficient tape and disk capacity,
- communication links to users,
- communication links to data sources,
- copies of all relevant pieces of software,
- backup of database,
- application-aware systems administration and operations staff.
As discussed above, the replacement or standby machine does not have to be as large as the main machine, but it must have sufficient capacity to run the minimal system. It must also have sufficient capacity and power to allow the recovery to take place in a meaningful time-frame.
The disaster system must have enough tape capacity to perform the recovery in a reasonable time-scale. Having sufficient disk space to run the minimum independent system is not always enough. To allow the initial recovery to happen quickly, extra disk capacity may be required. For example, it may be required to get files loaded for recovery, while accepting feed files onto the system. This will allow processing of new data to begin as soon as that part of the application is available.
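A rough way to check whether the tape capacity is adequate is to estimate the restore time from the volume of data and the drive throughput. The sketch below assumes that all drives run in parallel at a steady rate, which is optimistic, and every name and figure in it is hypothetical.

```python
import math

def restore_hours(data_gb, drives, gb_per_hour_per_drive):
    """Hours to restore data_gb, assuming all drives stream in parallel."""
    return data_gb / (drives * gb_per_hour_per_drive)

def drives_needed(data_gb, gb_per_hour_per_drive, target_hours):
    """Smallest number of drives that meets the target restore time."""
    return math.ceil(data_gb / (gb_per_hour_per_drive * target_hours))
```

With these assumptions, restoring 2000 GB on four drives at 50 GB per hour each takes 10 hours, and meeting an 8-hour target would need five drives. Real restores rarely sustain the rated throughput, so figures like these are best treated as a lower bound and confirmed by a test recovery.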
If the system is to be accessible to users, clearly the communication links they need to access the machine have to be in place. It is important to ensure that any links used have sufficient bandwidth and capacity. This is particularly important if the links used are already in use by other systems. There is no point in making the disaster system available to the users if they cannot use it.
To continue the daily load process requires that the links to the source systems be in place. The same requirements apply here as to user links. These links may be as simple as a tape drive, if the feed files can be shipped on tape. Note, however, that if the delivery mechanism changes, it must be ensured that the source system can support that change. Another consideration is whether a source system was also destroyed by the same disaster. If so, does the disaster system for that source system support what is required?
If the disaster system is not in constant standby mode, for example if it is being used for another less important system, it will need the relevant software installed. All of this software is probably available from backups. Even so, copies of the software should be available locally, ready to install just in case. Note that this may require upgrading of the operating system, if the system is not on the same release. Where possible, on the disaster system, go to a combination of versions and releases that has been tested in normal running.
The database will have to be restored from backup, and possibly rolled forward. One point to note is that, unless replication is being used, it will be impossible to bring the database right up to date. At the very least, the current online redo log file will be missing, and it is probable that a lot more will be missing. All of the missing work will therefore need to be redone, so access to backups of the feed system data will be required. Some RDBMSs have support for hot standby databases. This allows a copy of the database to be kept in recovery mode, with archive log files being applied as they arrive on the disaster machine.
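Working out which loads have to be redone amounts to comparing the point the restored database reached with a record of when each feed was loaded. The following is a minimal sketch of that comparison, assuming such a load record exists outside the lost database. The function name, the tuple layout, and the batch names are hypothetical.

```python
from datetime import datetime

def feeds_to_replay(recovered_to, feed_batches):
    """List feed batches loaded after the recovery point, oldest first.

    recovered_to -- datetime the restored database was rolled forward to
    feed_batches -- iterable of (batch_name, loaded_at) tuples taken from a
                    load log kept off the main system (an assumption)
    """
    missing = [(name, loaded_at) for name, loaded_at in feed_batches
               if loaded_at > recovered_to]
    missing.sort(key=lambda batch: batch[1])   # replay in original load order
    return [name for name, _ in missing]
```

Any batch loaded after the recovery point must be fetched from the feed system backups and loaded again, in the order it was originally applied.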
Systems administration and operations staff who are application aware will be required. Bear in mind that, during a disaster scenario, key staff may also be lost. The plan needs to cope with this, and staff at the disaster site will need to be trained in the running of the application.
Finally, test the disaster recovery plan. It would be nothing less than a miracle for a disaster recovery plan to work first time. It needs to be thoroughly tested, preferably on an ongoing basis. As with backup, recovery testing helps to familiarize staff with procedures, and will also highlight anything that has been overlooked.