Hand in hand Database Design and Data Backup, Recovery
Effect on Database Design
There is a major interaction between the backup strategy and the database design. The two go hand in hand. Simple things such as the partitioning strategy chosen for the fact data can have an enormous effect on the backup requirements. Design of the database and backup have to be performed at the same time.
It is all too common to see the design of the backup strategy being done as an afterthought. This is a mistake, because it can lead to major cost and time overruns. Data warehouse systems are such large complex systems that the backup has to be an integral part of the whole system.
If the backup strategy is left until later it is more than likely that you will discover the hardware is insufficient to support the overnight window. Remember that the backup is likely to consume a considerable amount of CPU power while it is running, and it can take several hours to complete the nightly backup. As this is only part of the whole overnight processing that has to be completed, it is easy to see how you can end up in a situation where the overnight processing cannot be completed in the allowed time-frame.
To mitigate these sorts of problems you need to design the whole data warehouse system in a unified fashion. It is particularly important to manage the design of the backup, the database and the overnight processing together.
Design Strategies
When considering the backup strategy, one of your main aims should be to reduce the amount of data that has to be backed up on a regular basis. This approach will give you the best chance of bringing down the effect of backup on the overnight processing. We shall concentrate on backup strategies that are designed around daily data loads. There are data warehouse systems that are not loaded every day, but at greater intervals. Typically these will be loaded weekly at week end or monthly at month end. These cases can be thought of as special cases of the daily load. All the same issues apply to these cases; you just have more data to load and manage, and typically a weekend rather than an overnight period to fit it in to.
Read-only tablespaces are one of the main weapons in the battle to reduce the amount of data that needs to be backed up. Having the ability to mark certain areas of the database as being read-only is vital. Making a tablespace read-only means the tablespace cannot be changed even accidentally, without bringing it back to a read—write state. Therefore, the table-space can be backed up once and stored somewhere safe, and it does not have to be backed up again.
Actually, it is recommended that read-only tablespaces be backed up twice to protect against tape media failure. Clearly the second backup should be taken to a different tape from the first. Ideally it should also be made to a different tape device, and if possible a different controller should be used.
The use of read-only tablespaces needs to be considered in conjunction with your database design strategy. Where possible, unchanging data should be isolated in separate tablespaces so that they can be marked Read Only. When partitioning the large objects such as the fact table it is important to use a partitioning scheme that allows the bulk of the partitions to be marked as read-only.
One key decision that has to be made is the size of partition to be used. The size of the regular load partition affects the backup strategy, because that dictates the size of the regular backup.
Another way of reducing the regular backup requirements is to reduce the amount of journaling or redo generated. This is possible with some RDBMSs, because they allow you to turn off logging with certain operations. These options will typically be allowed only for large singular operations. For example, data load, index creation and create table as select are some of the operations that can often be performed without logging of journal information. Clearly, these operations cannot be recovered in the normal sense and therefore must be easily repeatable. This is important to ensure that these operations do not place the data warehouse at risk
Automation of backups
Given the data volumes being handled and the likely complexity of any viable backup strategy, it is important to automate as much of the backup process as possible. For example, by the use of stacker or silo technology, large backups can be accomplished without operator intervention. This minimizes the chance of error, and avoids any potential delays.
It is equally important to automate the scripts that run the regular backups. These scripts need to be sophisticated enough to report any problems and to either deal with them directly or allow the job to be restarted once the problem has been fixed.
The backup routines will be managed from the warehouse manager. The backup routines should be further integrated into the system management routines and handled by whatever system management and scheduling software is being used.
Possibly related posts: (automatically generated)
Hand in hand Database Design and Data Backup, Recovery
- Hand in hand Database Design and Data Backup, Recovery continue...
- Improving Security
- Client/Server must Know
- Using the New Wizards in SQL Server 2000
- Introduction to Web 2.0 Website Patterns
- Network Access Control Databases
- Website Hosting Sever, some Pitfalls you need to avoid part 1
- Sharing Data, Database Design and Data Backup
- Oft-forgotten Data Disaster Recovery
- Network and Servers Technical Compare continue...
- October 2nd

I am always on the go, creating Websites, Web graphics, photo sites and multimedia creation…CDs & DVDs. … Rudimentary Site
Customer will not use, or allow others to use, Customer s home computer as a web server, FTP server, file server or game server or to run any other server applications or to provide network or host services to others via Charter s network. … Innovative Web Applications