|
  
2.
Advantages of a DBMS
2.1
Early Methods of Computer Data Storage
2.2
Database Management Systems
2.3
Data Models
2.4
Data Independence
2.5
High-Level Language Support
2.6
Transaction Management
2.7
Integrity Control
2.8
Access Control
2.9
Recovery Methods
2.
Advantages of a DBMS
One of
the most common tasks for computer systems today is storing and managing
data. This data can be any information you want to store about some
subject. It can include facts, figures, pictures or almost anything
else. In general, any collection of information about some particular
subject is a database.
Before
the widespread use of computers, this information was stored on paper
in file folders and filing cabinets. To retrieve some information, you
would go to a file cabinet, take a file, and look at the information
in the folder. As the collection of information got larger, it became
more and more difficult to retrieve data in a timely manner. Even remembering
where to find the information became harder.
2.1
Early Methods of Computer Data Storage
When people
first began using computers to store information, they stored it in
files with a specific, known format. They had to remember where the
files were located, and they had to know how to find the data in the
file. This was not much different than storing it in filing cabinets.
Also,
to get the information from the files, you had to write a special program
to retrieve the data. Once this program was written, you could retrieve
the data very quickly. However, if you decided to change the way your
data was stored, or you wanted to look for different data, you had to
write a new program.
These
programs were usually written by programmers in a company's information
systems department. As the amount of information available on the computer
grew, the requests for new and different ways to look at the data also
increased. Large backlogs of user requests for programs became more
common, and there were sometimes delays of weeks or months for the programs
required to view specific data. This led to the need for a system of
storing data where the data and how it was stored was independent of
the methods used for retrieving it.
2.2
Database Management Systems
One of
the most common tasks for computer systems today is storing and managing
data. This data can be any information you want to store about some
subject. It can include facts, figures, images, or almost anything else.
In general, any collection of information about some particular subject
is a database.
The information
in a database is managed by a Database Management System, or DBMS. A
DBMS is basically a computer record-keeping system, whose purpose is
to maintain information and make that information available on demand.
There are several important features that make a DBMS powerful and flexible
enough that one underlies virtually all information systems today. A
typical DBMS is expected to provide a number of specific functions:
-
Data model-the DBMS must have some means of representing the
data in a way that you can easily understand. The data model is
actually a mathematical abstraction which ensures all the data present
in the database is available to you, and through which you view
the data.
-
Data independence-the DBMS should insulate you from any changes
in the physical storage structure of the database. A request for
specific information should return the correct results, even if
the physical storage structure of the database has changed.
-
High-level language support-the information in the database
should be accessible by using a high-level language. This language
should allows you to define, access, and manipulate data without
having to know what the physical storage structure of the database
looks like.
-
Transaction management-the DBMS should provide some method to
ensure that multiple transactions on the same data do not interfere
with each other. This allows multiple people to use the database
simultaneously.
-
Integrity control-the DBMS should guarantee that data in the
database does not have invalid values, and that there is no inconsistency
in related data. This prevents you from accidentally entering invalid
data or performing operations that can violate some dependency between
data.
-
Access control-the DBMS should provide facilities for protecting
the security and privacy of your data from unauthorized users. This
prevents unauthorized users from gaining access to the database,
and prevents authorized users from viewing sensitive data that they
do not need to see.
-
Recovery methods-the DBMS should provide a method for backing
up and restoring the data in the database, in the event of some
type of system failure.
-
Most
of the features of a DBMS listed above provide several benefits
over the older file-based computer storage systems they replaced.
These features and their benefits are explained in detail below.
2.3
Data Models
The way
data is physically stored on computers probably has little or no significance
to you. All of the data may be stored as simple binary numbers that
span several files or even several disks. The DBMS uses a data model
to represent this stored data in a way that is meaningful and easy for
you to understand.
The data
model is a mathematical abstraction of the data that provides structures
and access techniques for the data. This allows the data to be retrieved
and manipulated by users and application programs without the need to
worry about where the data is located or how it is actually stored.
There
have been several popular data models over the years, but the relational
data model is currently the most popular and most widely used. This
is also the data model used by DBMaker. The relational data model presents
information to the user in the familiar form of tables with rows and
columns. Each row contains data on one subject or item, and each column
contains attributes of these subjects or items, such as name, size,
quantity, etc.
2.4
Data Independence
One of
the biggest advantages of a DBMS is data independence. Data independence
allows changes to the structure of a database, without requiring application
programs or users to make any changes in the way they access the data.
There are two kinds of data independence: physical and logical.
Physical
Data Independence
As mentioned
previously, early file-based systems stored all of their information
in files with a specific format. To retrieve data from the files in
this situation, a programmer who knows the format of those files has
to write a program. If there is any change to the structure of the data,
the program has to be changed so it can read the information from the
new structure in the proper order. If you want to look at the data in
a file in a different way, a new program has to be written. The organization
of the data and the access techniques for retrieving that data are built
into the application logic and code. This type of system is said to
be data dependent.
In a
DBMS the physical structure of the database may be changed without affecting
application programs or altering the users view of the data. These changes
may affect the speed or efficiency of application programs, but the
user programs should not have to be altered. This is possible because
the DBMS uses the abstraction provided by the data model to make the
physical structure of the database transparent to both users and application
programs. The data is translated from the way it is physically stored
and accessed on disk to the representation and access techniques used
by the outside world, or the logical view.
If the
physical structure changes, the DBMS is aware of these changes and still
provides the same logical view as before the changes. Because the logical
view presented to the outside world remains constant, application programs
and user interactions based on the logical view of the data do not have
to be altered to provide for the change in the physical structure. Therefore
the DBMS is said to have physical data independence.
Logical
Data Independence
Sometimes
it is necessary to make a change in the logical structure of the data.
This may result from the desire to add more information to a database,
for example. As long as the logical view of the existing data
is not altered, this should have no effect on user interactions or application
programs. This is because the data model allows the use of abstract
characteristics such as names to access data instead of the physical
characteristics used in a file based system. Since adding data will
not alter these abstract characteristics for a particular data item,
no changes in access methods and techniques are required. Existing programs
and user queries will run unaffected, and will only have to be modified
if the new data must be used. This is known as logical data independence.
2.5
High-Level Language Support
Most databases
usually include the capability to use some type of high-level query
language. These high-level languages allow a user to define, access
and manipulate data without having to reference the physical storage
structure of the database.
Without
support for a high-level query language, the information in a database
could only be accessed by writing a program that uses the application
program interface (API) of the database, or another similarly low-level
method. This low-level access method is very useful for creating user
applications that automate common and repetitive tasks, but it does
not allow any easy way to do one-time ad-hoc (unplanned, unexpected)
queries. Every time someone wanted to do an ad-hoc query, they would
have to write a program to perform the query. This would involve a significant
amount of effort and training on the part of users, or would greatly
increase the workload for application programmers just for doing a one-time
query.
The inclusion
of a high-level language makes performing ad-hoc queries a relatively
simple task. Most high-level languages supported by databases use an
English-like syntax that makes them very easy to learn. Ease of use
does not mean these high-level query languages are limited, however.
They are usually very powerful, and are able to perform any and all
functions required of the database. DBMaker, like most other Database
Management Systems, uses Structured Query Language (SQL). SQL
is the de-facto standard query language used in the industry today.
2.6
Transaction Management
Database
management systems are designed to store a large amount of information
and provide access for a large number of simultaneous users. Because
these users may be performing operations on the same data simultaneously,
some type of transaction management is required to ensure that the correct
data is written to the database.
What
is a Transaction?
A transaction
is traditionally defined as a logical unit of work. What this
really means is that a transaction is one or more operations
on a database that must be completed together to leave the database
in a consistent state. It is easy to see how a single operation on a
database can be a transaction. It is self-contained and must either
complete successfully (and change the data) or fail (and leave the data
unchanged).
It is
not as clear how multiple operations can make up a single transaction.
This is best illustrated by a simple example. Suppose you store two
different kinds of information in your database: records of shipments
sent to customers and records of the items currently in stock (both
including quantity of items). When an item is shipped to a customer,
it is added to the shipments list. This is one operation on the database.
However, the quantity of the item shipped must also be subtracted from
the items currently in stock.
If both
of these are not completed together, the database will be in an inconsistent
state. Either the quantity of items in stock will be too high (items
shipped, but not subtracted from items in stock) or too low (items subtracted
from items in stock, but not shipped). Both of these operations together
make up a single transaction, and both must complete successfully or
both must fail.
If a transaction
completes successfully and changes the data, we say the transaction
has been committed. If it fails and leaves the data unchanged,
we say it has been rolled back.
Concurrency
Control
With the
large amount of information being stored on computers, it does not make
sense to allow only a single user to access a database at one time.
There are usually many people that require access to the same data at
the same time, and delays may result in decreased productivity and the
loss of valuable time. As a result, databases usually allow concurrent
access. This allows multiple users to access the database simultaneously.
This is
not a problem if the users are accessing different data, but can become
a problem when they operate on the same data. When two user transactions
operate on the same data without any coordination, the results become
unpredictable. Some transactions may read obsolete data, or modifications
that were apparently completed successfully may be lost.
To prevent
this transactions are serialized. This means two transactions
that are executed concurrently will give the same results as if they
were performed one after the other, and each user can access the database
as if there were no other users concurrently accessing the database.
In order to do this, sometimes a transaction must wait for another transaction
to finish using the same data item in the database. This usually occurs
when a transaction tries to read or modify a data item simultaneously
with another transaction that modifies the data. If the second transaction
was allowed to proceed without coordination, the results can be unpredictable.
This can
be illustrated with a simple example. Suppose one transaction (and remember,
a transaction can contain multiple operations) modifies a data item
and then continues on to perform other operations. While the other operations
are being performed, a second transaction modifies the same data item
and also continues. Before either transaction can be committed, the
first transaction encounters an error and is rolled back. The DBMS returns
the database to the state it was in before the transaction occurred,
and gives the data item its original value. But because the second transaction
has not yet been committed, the value it placed in the data item
is lost.
To allow
transactions to be serialized and prevent uncoordinated access to the
database, some form of concurrency control is required. One form
of concurrency control often used by a DBMS is locks.
The
Lock Concept
In its
simplest form, a lock on a data item allows a DBMS to guarantee a transaction
will have exclusive access to that data item. No other transaction can
perform operations on that item while it is locked. However, in a typical
multi-user DBMS, this is not always a practical approach. Instead a
more complex model is used where there are:
Types
of Locks
A share
lock allows multiple transactions to access a data item simultaneously,
but with one restriction: the transactions cannot modify the data item.
This may occur when multiple transactions want to read the value of
a data item, but will not change it. In this case multiple accesses
are acceptable because they will not interfere with each other.
When a
transaction wants to modify a data item, allowing other transactions
to read or modify the data item at the same time can lead to inconsistencies
in the database. When a transaction wants to modify a data item, an
exclusive lock is used to prevent other transactions from accessing
the data. This allows the transaction to continue with its other operations
certain that the data item will remain in a stable state for the duration
of its lifetime.
A DBMS
may also use different levels of locks, although the reasons behind
this are more for performance issues than concurrency control. In a
relational DBMS, the smallest data item that can usually have a lock
placed on it is the row. These rows are grouped together into pages,
which are further grouped into tables. In a DBMS that supports it, pages
and tables may also be locked as a single item, and this also locks
all of the smaller data items contained within it.
Lock
Escalation
If a transaction
has to access most or all of the rows in a page, the time required to
acquire all of the locks and resources used to keep track of the locked
items will be quite high if each row is locked individually. Using a
page lock in this case will reduce the time and resources used, but
at the cost of concurrency for other transactions. If a second transaction
wants to acquire a lock on one of the rows in the same page that the
first has locked, it will now be unable to do so. However, this is usually
offset by the gain in performance.
A similar
situation occurs when a transactions accesses most or all of the pages
in a table, where using a table lock instead of a page lock will decrease
the time and resources used, again at the expense of concurrency. The
level of lock used on a specific data item can usually be set manually
in the transaction. In a DBMS that supports it, a lock on an object
may automatically change to the next higher level when the DBMS determines
that performance will be improved while still maintaining an acceptable
level of concurrency. This is known as automatic lock escalation.
2.7
Integrity Control
People
naturally assume the data in a database is accurate. Indeed, this is
the primary reason database systems have evolved: to provide the ability
to retrieve accurate data in a timely manner. To ensure this is true,
a DBMS must have some form of integrity control. Basically integrity
control ensures that the data in a database is consistent and
valid.
Inconsistency
can result when there is redundancy in the database, such as when the
same data exists in two separate places in the database and the DBMS
is not aware of the duplication. It would then be possible for a transaction
to update one and only one of the two entries. After that the DBMS could
possibly supply incorrect or contradictory information to its users,
and is clearly in an inconsistent state. A DBMS with integrity control
will generate an error if this occurs in a properly designed database.
It is also
possible to retain the redundancy in the database, as long as it is
controlled. In this case the DBMS is aware of both entries, and any
change to one will cause the DBMS to update the other one as well. This
is generally referred to as a cascading update. You should note that
while it is possible for the database to temporarily be in an inconsistent
state while the update is proceeding, the DBMS will make this data unavailable
to users until the update is finished.
Ensuring
data is valid is also an important function of the database. A payroll
database that shows an employee worked 400 hours in a week instead of
40 clearly contains invalid data. There is no way for the DBMS to determine
this value is invalid itself, but a DBMS with the proper integrity control
functions can allow the Database Administrator to define and implement
integrity constraints. These integrity constraints will check
to ensure the data is valid whenever a transaction tries to add or modify
data in the database.
2.8
Access Control
The centralized
and multi-user nature of a DBMS requires that some form of security
control is in place, both to prevent unauthorized access and to
limit access for authorized users. Security control can generally be
divided into two areas, user authorization and transaction
authorization.
User
Authorization
User authorization
helps to protect a database against unauthorized use, usually by requiring
that a user enter a user name and a password to gain entry to the system.
The password is usually known only to the user and the DBMS, and is
protected by the DBMS at least as well as the data in the database.
However, it should be noted this user name and password scheme can not
guarantee the security of the database. It does not prevent you
from choosing a password that is easy to guess (like the name of a spouse
or pet) or from recording your password in an accessible location (like
on the front of your computer!).
Transaction
Authorization
Generally,
not all users are given the same access rights to different databases
or different parts of the same database. In some cases, sensitive data
such as employee salaries should only be accessible to those users who
need it. In other cases, some users may only require the ability to
read some data items, where other users require the ability to both
read and update the data.
A Point-Of-Sale
(POS) system is a good example of the second case: clerks working in
a store might need read access for the price of an item, but should
not be able to change the price. Employees at the head office may need
to read and update the data, in order to enter new prices for the item.
Transaction
authorization helps to protect a database against an authorized user
trying to access a data item they do not have permission to access (this
may occur either intentionally or unintentionally). The DBMS usually
keeps a record of what rights have been granted to users on all of the
data objects in the database, and checks these rights every time a user
transaction tries to access the database. If the user does not have
the proper rights to a data item, the transaction will not be allowed.
It is the responsibility of the Database Administrator to explicitly
grant the rights assigned to each user.
2.9
Recovery Methods
Although
we would like to think otherwise, any DBMS may be the victim of a software
or hardware failure. These failures can generally be divided into two
types, known as system failures and media failures. After
a failure occurs the DBMS should have some method to recover the information
that was entered into the database. In fact, this is one of the main
advantages a DBMS has over the file-based systems they replaced.
Recovery
from System Failures
A system
failure is a failure of the volatile storage of the computer
system. Volatile storage is the term used for the main memory in a computer
system. A system failure may be caused by a power failure, a program/operating
system crash, or some other reason. The most common method of protecting
against system failures is the use of a transaction journal or
transaction log.
The transaction
journal is a history of all changes made to the database. Since the
exact state of a transaction in progress cannot be reliably determined
in the event of a system failure, it cannot continue to completion when
the system restarts. The DBMS uses the transaction journal to undo
all changes that have already been written to disk for all transactions
that terminated abnormally.
Similarly,
it is possible for a transaction to have completed before the system
failure, without having all changes it made to the data written to disk.
(The data may still be stored in the DBMS system buffers at the time
of the failure.) In this case, the DBMS uses the transaction journal
to redo or rollover all the transactions that were completed,
but not yet written to disk.
Recovery
from Media Failures
Media
failure is a failure of the stable storage or secondary storage of the
computer system. These terms are usually used to refer to the disk storage
system of a computer system. Media failures are usually caused by physical
trauma to the disk itself, such as a head crash, fire, earthquake, or
exposure to vibration or g-forces outside its physical operating limits.
When a media failure occurs, there is nothing that can prevent the loss
of data on the affected disk. However, the database can be successfully
restored if the database provides archiving or data mirroring
operations.
Archiving
allows you to back up your database at periodic intervals; every night
for example. This allows you to save a backup copy of every file that
has changed since the last backup. When a media failure occurs, you
can use these backup copies to reconstruct your database up to the
point in time of the backup. It is important to note that all
changes made since the last backup will be lost. This type of archiving
is suitable for some database systems, but is not robust enough for
critical applications such as electronic banking or airline reservation
systems, where every effort must be made to ensure no data is lost in
the event of a media failure.
Data mirroring
involves continuously creating an archive copy of the entire database.
To do this, a copy of the entire database at a single point in time
and a duplicate transaction journal are created. Then any changes made
to the database are written to both logs simultaneously, in effect creating
two copies of the database. If a media failure occurs between the time
the two logs are written, then only that part that was written in one
log and not the other will not survive. But since there is a record
of it in one log, an error message can be sent to the user notifying
them of the loss during recovery.
Obviously,
when using either of these methods, you want to store the backup copy
in a location where it and the original database will not be destroyed
at the same time.
   
|