Previous PageTable Of ContentsNext Page


2. Advantages of a DBMS

2.1 Early Methods of Computer Data Storage

2.2 Database Management Systems

2.3 Data Models

2.4 Data Independence

2.5 High-Level Language Support

2.6 Transaction Management

2.7 Integrity Control

2.8 Access Control

2.9 Recovery Methods


2. Advantages of a DBMS

One of the most common tasks for computer systems today is storing and managing data. This data can be any information you want to store about some subject. It can include facts, figures, pictures or almost anything else. In general, any collection of information about some particular subject is a database.

Before the widespread use of computers, this information was stored on paper in file folders and filing cabinets. To retrieve some information, you would go to a file cabinet, take a file, and look at the information in the folder. As the collection of information got larger, it became more and more difficult to retrieve data in a timely manner. Even remembering where to find the information became harder.

2.1 Early Methods of Computer Data Storage

When people first began using computers to store information, they stored it in files with a specific, known format. They had to remember where the files were located, and they had to know how to find the data in the file. This was not much different than storing it in filing cabinets.

Also, to get the information from the files, you had to write a special program to retrieve the data. Once this program was written, you could retrieve the data very quickly. However, if you decided to change the way your data was stored, or you wanted to look for different data, you had to write a new program.

These programs were usually written by programmers in a company's information systems department. As the amount of information available on the computer grew, the requests for new and different ways to look at the data also increased. Large backlogs of user requests for programs became more common, and there were sometimes delays of weeks or months for the programs required to view specific data. This led to the need for a system of storing data where the data and how it was stored was independent of the methods used for retrieving it.

2.2 Database Management Systems

One of the most common tasks for computer systems today is storing and managing data. This data can be any information you want to store about some subject. It can include facts, figures, images, or almost anything else. In general, any collection of information about some particular subject is a database.

The information in a database is managed by a Database Management System, or DBMS. A DBMS is basically a computer record-keeping system, whose purpose is to maintain information and make that information available on demand. There are several important features that make a DBMS powerful and flexible enough that one underlies virtually all information systems today. A typical DBMS is expected to provide a number of specific functions:

  • Data model-the DBMS must have some means of representing the data in a way that you can easily understand. The data model is actually a mathematical abstraction which ensures all the data present in the database is available to you, and through which you view the data.

  • Data independence-the DBMS should insulate you from any changes in the physical storage structure of the database. A request for specific information should return the correct results, even if the physical storage structure of the database has changed.

  • High-level language support-the information in the database should be accessible by using a high-level language. This language should allows you to define, access, and manipulate data without having to know what the physical storage structure of the database looks like.

  • Transaction management-the DBMS should provide some method to ensure that multiple transactions on the same data do not interfere with each other. This allows multiple people to use the database simultaneously.

  • Integrity control-the DBMS should guarantee that data in the database does not have invalid values, and that there is no inconsistency in related data. This prevents you from accidentally entering invalid data or performing operations that can violate some dependency between data.

  • Access control-the DBMS should provide facilities for protecting the security and privacy of your data from unauthorized users. This prevents unauthorized users from gaining access to the database, and prevents authorized users from viewing sensitive data that they do not need to see.

  • Recovery methods-the DBMS should provide a method for backing up and restoring the data in the database, in the event of some type of system failure.

  • Most of the features of a DBMS listed above provide several benefits over the older file-based computer storage systems they replaced. These features and their benefits are explained in detail below.

2.3 Data Models

The way data is physically stored on computers probably has little or no significance to you. All of the data may be stored as simple binary numbers that span several files or even several disks. The DBMS uses a data model to represent this stored data in a way that is meaningful and easy for you to understand.

The data model is a mathematical abstraction of the data that provides structures and access techniques for the data. This allows the data to be retrieved and manipulated by users and application programs without the need to worry about where the data is located or how it is actually stored.

There have been several popular data models over the years, but the relational data model is currently the most popular and most widely used. This is also the data model used by DBMaker. The relational data model presents information to the user in the familiar form of tables with rows and columns. Each row contains data on one subject or item, and each column contains attributes of these subjects or items, such as name, size, quantity, etc.

2.4 Data Independence

One of the biggest advantages of a DBMS is data independence. Data independence allows changes to the structure of a database, without requiring application programs or users to make any changes in the way they access the data. There are two kinds of data independence: physical and logical.

Physical Data Independence

As mentioned previously, early file-based systems stored all of their information in files with a specific format. To retrieve data from the files in this situation, a programmer who knows the format of those files has to write a program. If there is any change to the structure of the data, the program has to be changed so it can read the information from the new structure in the proper order. If you want to look at the data in a file in a different way, a new program has to be written. The organization of the data and the access techniques for retrieving that data are built into the application logic and code. This type of system is said to be data dependent.

In a DBMS the physical structure of the database may be changed without affecting application programs or altering the users view of the data. These changes may affect the speed or efficiency of application programs, but the user programs should not have to be altered. This is possible because the DBMS uses the abstraction provided by the data model to make the physical structure of the database transparent to both users and application programs. The data is translated from the way it is physically stored and accessed on disk to the representation and access techniques used by the outside world, or the logical view.

If the physical structure changes, the DBMS is aware of these changes and still provides the same logical view as before the changes. Because the logical view presented to the outside world remains constant, application programs and user interactions based on the logical view of the data do not have to be altered to provide for the change in the physical structure. Therefore the DBMS is said to have physical data independence.

Logical Data Independence

Sometimes it is necessary to make a change in the logical structure of the data. This may result from the desire to add more information to a database, for example. As long as the logical view of the existing data is not altered, this should have no effect on user interactions or application programs. This is because the data model allows the use of abstract characteristics such as names to access data instead of the physical characteristics used in a file based system. Since adding data will not alter these abstract characteristics for a particular data item, no changes in access methods and techniques are required. Existing programs and user queries will run unaffected, and will only have to be modified if the new data must be used. This is known as logical data independence.

2.5 High-Level Language Support

Most databases usually include the capability to use some type of high-level query language. These high-level languages allow a user to define, access and manipulate data without having to reference the physical storage structure of the database.

Without support for a high-level query language, the information in a database could only be accessed by writing a program that uses the application program interface (API) of the database, or another similarly low-level method. This low-level access method is very useful for creating user applications that automate common and repetitive tasks, but it does not allow any easy way to do one-time ad-hoc (unplanned, unexpected) queries. Every time someone wanted to do an ad-hoc query, they would have to write a program to perform the query. This would involve a significant amount of effort and training on the part of users, or would greatly increase the workload for application programmers just for doing a one-time query.

The inclusion of a high-level language makes performing ad-hoc queries a relatively simple task. Most high-level languages supported by databases use an English-like syntax that makes them very easy to learn. Ease of use does not mean these high-level query languages are limited, however. They are usually very powerful, and are able to perform any and all functions required of the database. DBMaker, like most other Database Management Systems, uses Structured Query Language (SQL). SQL is the de-facto standard query language used in the industry today.

2.6 Transaction Management

Database management systems are designed to store a large amount of information and provide access for a large number of simultaneous users. Because these users may be performing operations on the same data simultaneously, some type of transaction management is required to ensure that the correct data is written to the database.

What is a Transaction?

A transaction is traditionally defined as a logical unit of work. What this really means is that a transaction is one or more operations on a database that must be completed together to leave the database in a consistent state. It is easy to see how a single operation on a database can be a transaction. It is self-contained and must either complete successfully (and change the data) or fail (and leave the data unchanged).

It is not as clear how multiple operations can make up a single transaction. This is best illustrated by a simple example. Suppose you store two different kinds of information in your database: records of shipments sent to customers and records of the items currently in stock (both including quantity of items). When an item is shipped to a customer, it is added to the shipments list. This is one operation on the database. However, the quantity of the item shipped must also be subtracted from the items currently in stock.

If both of these are not completed together, the database will be in an inconsistent state. Either the quantity of items in stock will be too high (items shipped, but not subtracted from items in stock) or too low (items subtracted from items in stock, but not shipped). Both of these operations together make up a single transaction, and both must complete successfully or both must fail.

If a transaction completes successfully and changes the data, we say the transaction has been committed. If it fails and leaves the data unchanged, we say it has been rolled back.

Concurrency Control

With the large amount of information being stored on computers, it does not make sense to allow only a single user to access a database at one time. There are usually many people that require access to the same data at the same time, and delays may result in decreased productivity and the loss of valuable time. As a result, databases usually allow concurrent access. This allows multiple users to access the database simultaneously.

This is not a problem if the users are accessing different data, but can become a problem when they operate on the same data. When two user transactions operate on the same data without any coordination, the results become unpredictable. Some transactions may read obsolete data, or modifications that were apparently completed successfully may be lost.

To prevent this transactions are serialized. This means two transactions that are executed concurrently will give the same results as if they were performed one after the other, and each user can access the database as if there were no other users concurrently accessing the database. In order to do this, sometimes a transaction must wait for another transaction to finish using the same data item in the database. This usually occurs when a transaction tries to read or modify a data item simultaneously with another transaction that modifies the data. If the second transaction was allowed to proceed without coordination, the results can be unpredictable.

This can be illustrated with a simple example. Suppose one transaction (and remember, a transaction can contain multiple operations) modifies a data item and then continues on to perform other operations. While the other operations are being performed, a second transaction modifies the same data item and also continues. Before either transaction can be committed, the first transaction encounters an error and is rolled back. The DBMS returns the database to the state it was in before the transaction occurred, and gives the data item its original value. But because the second transaction has not yet been committed, the value it placed in the data item is lost.

To allow transactions to be serialized and prevent uncoordinated access to the database, some form of concurrency control is required. One form of concurrency control often used by a DBMS is locks.

The Lock Concept

In its simplest form, a lock on a data item allows a DBMS to guarantee a transaction will have exclusive access to that data item. No other transaction can perform operations on that item while it is locked. However, in a typical multi-user DBMS, this is not always a practical approach. Instead a more complex model is used where there are:

  • different types of locks, such as share and exclusive locks

  • different levels of locks, such as row, page and table locks

Types of Locks

A share lock allows multiple transactions to access a data item simultaneously, but with one restriction: the transactions cannot modify the data item. This may occur when multiple transactions want to read the value of a data item, but will not change it. In this case multiple accesses are acceptable because they will not interfere with each other.

When a transaction wants to modify a data item, allowing other transactions to read or modify the data item at the same time can lead to inconsistencies in the database. When a transaction wants to modify a data item, an exclusive lock is used to prevent other transactions from accessing the data. This allows the transaction to continue with its other operations certain that the data item will remain in a stable state for the duration of its lifetime.

A DBMS may also use different levels of locks, although the reasons behind this are more for performance issues than concurrency control. In a relational DBMS, the smallest data item that can usually have a lock placed on it is the row. These rows are grouped together into pages, which are further grouped into tables. In a DBMS that supports it, pages and tables may also be locked as a single item, and this also locks all of the smaller data items contained within it.

Lock Escalation

If a transaction has to access most or all of the rows in a page, the time required to acquire all of the locks and resources used to keep track of the locked items will be quite high if each row is locked individually. Using a page lock in this case will reduce the time and resources used, but at the cost of concurrency for other transactions. If a second transaction wants to acquire a lock on one of the rows in the same page that the first has locked, it will now be unable to do so. However, this is usually offset by the gain in performance.

A similar situation occurs when a transactions accesses most or all of the pages in a table, where using a table lock instead of a page lock will decrease the time and resources used, again at the expense of concurrency. The level of lock used on a specific data item can usually be set manually in the transaction. In a DBMS that supports it, a lock on an object may automatically change to the next higher level when the DBMS determines that performance will be improved while still maintaining an acceptable level of concurrency. This is known as automatic lock escalation.

2.7 Integrity Control

People naturally assume the data in a database is accurate. Indeed, this is the primary reason database systems have evolved: to provide the ability to retrieve accurate data in a timely manner. To ensure this is true, a DBMS must have some form of integrity control. Basically integrity control ensures that the data in a database is consistent and valid.

Inconsistency can result when there is redundancy in the database, such as when the same data exists in two separate places in the database and the DBMS is not aware of the duplication. It would then be possible for a transaction to update one and only one of the two entries. After that the DBMS could possibly supply incorrect or contradictory information to its users, and is clearly in an inconsistent state. A DBMS with integrity control will generate an error if this occurs in a properly designed database.

It is also possible to retain the redundancy in the database, as long as it is controlled. In this case the DBMS is aware of both entries, and any change to one will cause the DBMS to update the other one as well. This is generally referred to as a cascading update. You should note that while it is possible for the database to temporarily be in an inconsistent state while the update is proceeding, the DBMS will make this data unavailable to users until the update is finished.

Ensuring data is valid is also an important function of the database. A payroll database that shows an employee worked 400 hours in a week instead of 40 clearly contains invalid data. There is no way for the DBMS to determine this value is invalid itself, but a DBMS with the proper integrity control functions can allow the Database Administrator to define and implement integrity constraints. These integrity constraints will check to ensure the data is valid whenever a transaction tries to add or modify data in the database.

2.8 Access Control

The centralized and multi-user nature of a DBMS requires that some form of security control is in place, both to prevent unauthorized access and to limit access for authorized users. Security control can generally be divided into two areas, user authorization and transaction authorization.

User Authorization

User authorization helps to protect a database against unauthorized use, usually by requiring that a user enter a user name and a password to gain entry to the system. The password is usually known only to the user and the DBMS, and is protected by the DBMS at least as well as the data in the database. However, it should be noted this user name and password scheme can not guarantee the security of the database. It does not prevent you from choosing a password that is easy to guess (like the name of a spouse or pet) or from recording your password in an accessible location (like on the front of your computer!).

Transaction Authorization

Generally, not all users are given the same access rights to different databases or different parts of the same database. In some cases, sensitive data such as employee salaries should only be accessible to those users who need it. In other cases, some users may only require the ability to read some data items, where other users require the ability to both read and update the data.

A Point-Of-Sale (POS) system is a good example of the second case: clerks working in a store might need read access for the price of an item, but should not be able to change the price. Employees at the head office may need to read and update the data, in order to enter new prices for the item.

Transaction authorization helps to protect a database against an authorized user trying to access a data item they do not have permission to access (this may occur either intentionally or unintentionally). The DBMS usually keeps a record of what rights have been granted to users on all of the data objects in the database, and checks these rights every time a user transaction tries to access the database. If the user does not have the proper rights to a data item, the transaction will not be allowed. It is the responsibility of the Database Administrator to explicitly grant the rights assigned to each user.

2.9 Recovery Methods

Although we would like to think otherwise, any DBMS may be the victim of a software or hardware failure. These failures can generally be divided into two types, known as system failures and media failures. After a failure occurs the DBMS should have some method to recover the information that was entered into the database. In fact, this is one of the main advantages a DBMS has over the file-based systems they replaced.

Recovery from System Failures

A system failure is a failure of the volatile storage of the computer system. Volatile storage is the term used for the main memory in a computer system. A system failure may be caused by a power failure, a program/operating system crash, or some other reason. The most common method of protecting against system failures is the use of a transaction journal or transaction log.

The transaction journal is a history of all changes made to the database. Since the exact state of a transaction in progress cannot be reliably determined in the event of a system failure, it cannot continue to completion when the system restarts. The DBMS uses the transaction journal to undo all changes that have already been written to disk for all transactions that terminated abnormally.

Similarly, it is possible for a transaction to have completed before the system failure, without having all changes it made to the data written to disk. (The data may still be stored in the DBMS system buffers at the time of the failure.) In this case, the DBMS uses the transaction journal to redo or rollover all the transactions that were completed, but not yet written to disk.

Recovery from Media Failures

Media failure is a failure of the stable storage or secondary storage of the computer system. These terms are usually used to refer to the disk storage system of a computer system. Media failures are usually caused by physical trauma to the disk itself, such as a head crash, fire, earthquake, or exposure to vibration or g-forces outside its physical operating limits. When a media failure occurs, there is nothing that can prevent the loss of data on the affected disk. However, the database can be successfully restored if the database provides archiving or data mirroring operations.

Archiving allows you to back up your database at periodic intervals; every night for example. This allows you to save a backup copy of every file that has changed since the last backup. When a media failure occurs, you can use these backup copies to reconstruct your database up to the point in time of the backup. It is important to note that all changes made since the last backup will be lost. This type of archiving is suitable for some database systems, but is not robust enough for critical applications such as electronic banking or airline reservation systems, where every effort must be made to ensure no data is lost in the event of a media failure.

Data mirroring involves continuously creating an archive copy of the entire database. To do this, a copy of the entire database at a single point in time and a duplicate transaction journal are created. Then any changes made to the database are written to both logs simultaneously, in effect creating two copies of the database. If a media failure occurs between the time the two logs are written, then only that part that was written in one log and not the other will not survive. But since there is a record of it in one log, an error message can be sent to the user notifying them of the loss during recovery.

Obviously, when using either of these methods, you want to store the backup copy in a location where it and the original database will not be destroyed at the same time.

Previous PageTop Of PageTable Of ContentsNext Page

Copyright 2002 SYSCOM Computer Engineering Co. All rights reserved.