Sharing Data, Database Design and Data Backup

Posted by arlene

Since a distributed system will have, as a given, the fact that data should be shared between users the decision to share an entity such as a relational table is not a decision that is taken by the designer. However, underneath the fact that data is shared are a number of design decisions which can have a major effect on performance. The two main decisions are where to situate the tables that make up a relational application and what locking strategy to adopt. The first has, to a certain extent, been dealt with in the previous section; it is worth looking at the second in a little detail.

Most of the decisions about locking will be made with respect to the database management system that is used. In general such systems allow locking at one or more sizes: page locks, table locks, database locks and row locks.

A row lock prevents a number of programs accessing the row of a relational table, a page lock will lock the area of file memory which a table or part of a table will be stored in, a table lock restricts access to the whole of a relational table and a database lock will control access to the whole collection of tables which make up a database.

Living the Web 2.0Locks are defined as a property of a table or database when the database designer defines the structure of the database. Many of the decisions about what sort of locking to adopt are common sense ones, for example when a table is going to be the target of a bulk update it is better for efficiency reasons to use a table lock and when data is just being read by online transactions and updated by batch processes to use the largest lock size in order to minimise the amount of locking that occurs.

Although many locking decisions are straightforward, some locking issues are much more subtle. For instance, the relationship between deadlock occurrence, the locking strategy adopted and performance can be a subtle one, where trade-offs haveto be considered. For example, locking at the row level will enable more concurrency to take place at the cost of increased deadlocking where resources have to be expended in order to monitor and release deadlocks; conversely locking at the table level can lead to a major reduction in deadlock occurrence, but at the expense of efficient concurrent operations.

There are a number of ways of designing a system to minimise the effect of deadlocks. The first is to monitor where the deadlocks are occurring and then modify any database code which creates this situation. For example, you may find that changing the order in which tables are accessed will remove some deadlock occurrences. The second is to spread the occurrence of database locks more evenly across the tables.

Another way of minimising distributed deadlocks is to carry out data replication. Already I have detailed how this technique can be used to locate data close to users; it is also quite a powerful technique for minimising locks since, if there are n clients accessing a replicated set of d databases, then the probability that a lock will occur will be the order of n/ d of the single database example.

Another strategy which is applied post-design is to experiment with the deadlock break interval. Many database systems do not employ devices such as wait-for graphs in order to detect deadlocks before they happen. What they do is regularly examine locks which have been around for some time and release those which have been in existence for a period exceeding the deadlock break interval. Many database systems allow this interval to be set by the database administrator when the database is specified; in my experience varying the deadlock break interval can have a drastic effect on performance for many database designs.

Another factor which many database systems allow to be varied is the isolation level of a program. There can be as many as four isolation levels which a DBMS will allow the database administrator to choose. They are:

  • Dirty read. This is where a transaction can read data which has been modified but the changes that have occurred have not been committed.
  • Committed read. Here a transaction is not allowed to read dirty data and overwrite another transaction’s dirty data.
  • Cursor stability. Here a row beinag read by one transaction cannot be changed by another transaction.
  • Repeatable read. All items are locked until a commit has been executed.

As you proceed down the list of bullet points above the strength of the isolation increases; this will increase the number of locks and hence the greater the chance of deadlock occurring and performance dropping. The design principle here should be that the isolation level chosen should be the weakest consistent with the application data integrity and the demands of the application.

For example, if you have an application which reports on data in a stored database and where 100 per cent accuracy in the data is not required, for example a transaction which reported on some averaged amount such as sales over the past n months, then such an application can be specified as having a dirty read isolation level.

Possibly related posts: (automatically generated)
Sharing Data, Database Design and Data Backup

6 Responses to “Sharing Data, Database Design and Data Backup”

  1. Any information collected by us concerning an identified or identifiable natural person (”Personal Dataquest;) will be used concerning the registration of your domain name(s) and to this Agreement and as required or permitted by ICANN or an applicable registry policy. … Free Domain Name

  2. CA ARC serve Backup for Windows offers excellent data protection for distributed servers, databases, applications and multiple clients including Windows, NetWare, Linux, Mac OS X and UNIX environments. … Data Protection

  3. #39&I; m a web rookie, and I was able to use your tools and upload our site in a matter of minutes. … Free Templates

  4. , the web’s leading online file storage and sharing service, is a young, backed start-up in Palo Alto, providing online sharing to millions of users as well as to partners and developers through our public API. … Providing Web Hosting

  5. We may also use a service that collects data remotely by using web beacons or tags embedded in our site’s content. … Network Solutions

  6. Besides those cute pictures of your kids and grandkids you’ll probably find cash, credit cards, driver s license, Social Security card, health insurance card, library card, discount cards, membership cards, telephone calling card, and important phone numbers. … Calling Australia

Leave a Reply

LogoAlexa CounterFeedBurner Counter