|
  
16.
Distributed Databases
16.1
Introduction to Distributed Databases
16.2
DBMaker Distribution Structure
16.3
Distributed Database Environment Setup
16.4
Access Distributed Database Objects
16.5
Distributed Transaction Control
16. Distributed Databases
This chapter
introduces the Distributed Database management functions provided by
DBMaker, including distributed databases, the distributed architecture
of DBMaker, distributed data access, distributed database object management,
and distributed transaction management.
16.1 Introduction to Distributed Databases
Traditional
client-server database management systems, as shown in figure 16-1,
locate the database in one specific computer in network, and this computer
is responsible for handling all client requests.

Figure
16-1: Traditional client/server database management system.
Distributed
databases, as shown in figure 16-2, locate a copy of the database in
several computers in the network, and each can independently support
database clients. The distributed database management system manages
the databases in these computers, so you can access your data transparently,
without any concern for where the data is physically located.
DBMaker
3.5 supports a true distributed architecture to provide a complete and
robust distributed database management system (Distributed DBMS). It
provides remote database connections, distributed queries, and distributed
transaction management. DBMaker also provides table and database replication
to keep data up-to-date automatically.

Figure
16-2: Distributed database in client-server
In the
DBMaker distributed database environment, you can write application
programs using the DBMaker ODBC 3.0 compatible API or perform ad-hoc
SQL queries that access data from different parts of the distributed
database. DBMaker will transparently integrate the data and return the
results to you, just as if they all came from the local database.
In this
chapter, we will briefly describe the system architecture and basic
functions of distributed database management using DBMaker 3.5. This
includes configuring the distributed environment, managing remote data
links and distributed transactions, and performing distributed queries.
Whether you are a database administrator or an application developer,
this chapter will provide you with a thorough overview of the simplicity
and power of the DBMaker distributed architecture.
16.2 DBMaker Distribution Structure
The DBMaker
distributed database environment builds on the traditional client/server
architecture, effectively linking multiple client applications and multiple
database servers. Client applications process user requests and display
the results, and the database servers handle data management. Each client
has a direct connection to a single database server, which is known
as the Coordinator Database to that client. Through the Coordinator
Database, the client can connect to other remote databases, which are
known as Participant Databases.
DBMaker
uses a hierarchical distribution structure to connect to remote databases.
This allows DBMaker to access data from a remote database with no direct
connection to the Coordinator Database by going through one of the Participant
Databases. When this happens, the Participant Database acts as coordinator
for any child databases beneath it in the hierarchy, and is known as
the Local Coordinator Database. The DBMaker distribution structure is
illustrated in figure 16-3.

Figure
16-3: DBMaker distribution structure
In figure
16-3, the client application program connects to the database server
in New York, which makes the database in New York the Coordinator Database.
If you use the database in New York to access data from London and Hong
Kong, then both the London and Hong Kong databases are Participant Databases.
Some of the data you are looking for in Hong Kong might actually be
in the databases in Tokyo or Taipei, so the databases in Tokyo and Taipei
are Child Participant Databases. This makes the database in Hong Kong
a Coordinator Database for the databases in Tokyo and Taipei, so the
database in Hong Kong is not only a Participant Database, but also acts
as a Local Coordinator Database.
16.3 Distributed Database Environment Setup
Setting
up a distributed database environment using DBMaker 3.5 is very simple.
All you need to do is add some keywords to the dmconfig.ini configuration
file to set the distributed database configuration options.
You must
provide values for the following keywords when setting up a distributed
database environment in DBMaker 3.5. Keywords with the prefix DB_ are
for the client/server connection between the client and the Coordinator
Database, and keywords with the prefix DD_ are for the distributed database
connections between the Coordinator Database and the Participant Databases.
DB_SvAdr = <ip_address/host name> - specifies
the IP address or host name of the Coordinator Database. This is used
so the client application knows where it should look for the Coordinator
Database to connect.
DB_PtNum = <port number> - specifies the port number
that the client application and the Coordinator Database should use
to communicate.
DD_DDBMD = <0/1> - enables distributed database mode for
the Coordinator Database. The default value is 0, which means that distributed
database mode is disabled.
DD_CTIMO = <number of seconds> - specifies the time
in seconds that the Coordinator Database should wait when trying to
establish a connection to a Participant Database. The default value
is 5 seconds.
DD_LTIMO = <number of seconds> - specifies the time
in seconds that the Coordinator Database should wait when trying to
establish a lock on the requested data in a Participant Database. The
default value is 5 seconds.
DD_GTSVR = <0/1> - enables the global transaction recovery
daemon (GTRECO). The default value is 1, which means the global transaction
recovery daemon is enable.
DD_GTITV = <YYYY/MM/DD hh:mm:ss>
- specifies the time interval that the global transaction recovery daemon
(GTRECO) should wait when processing pending global transactions.
DBMaker
supports an automatic recovery mechanism for distributed transactions
that have failed due to network problems or errors at the Participant
Database. The automatic recovery mechanism is handled by the GTRECO
daemon, which will check whether a distributed database server has any
problems with pending global transactions. If any problems are detected,
the GTRECO daemon will attempt to recover the pending global transactions.
The GTRECO daemon is enabled using the DD_GTSVR keyword in the dmconfig.ini
configuration file.
To better
understand how DBMaker manages distributed databases, take an example
where the ABC Bank has two branch offices. One branch office is in Los
Angeles, and the other one is in Seattle. Each branch maintains their
own customer and business data, but the government and financial database
is under the central control of the Los Angeles branch.
LA branch
database server dmconfig.ini configuration file:
[BankTranx] ;LA branch business database DB_DbDir = c:\database DB_SvAdr = 192.168.0.1 DB_PtNum = 21000 DD_DDBMD = 1
[BankMIS] ;government and financial database DB_DbDir = c:\database DB_SvAdr = 192.168.0.1 DB_PtNum = 30000 DD_DDBMD = 1
[BankTranx@Seattle] ;Seattle branch business database DB_SvAdr = 192.168.0.2 DB_PtNum = 21000 DD_CTIMO = 20 DD_LTIMO = 10
|
Seattle
branch database server dmconfig.ini configuration file:
[BankTranx] ;Seattle branch business database DB_DbDir = c:\database DB_SvAdr = 192.168.0.2 DB_PtNum = 21000 DD_DDBMD = 1
[BankMIS] ;government and financial database DB_SvAdr = 192.168.0.1 DB_PtNum = 30000 DD_CTIMO = 20
[BankTranx@La] ;LA branch business database DB_SvAdr = 192.168.0.1 DB_PtNum = 21000 DD_CTIMO = 20 DD_LTIMO = 10
|
LA branch
client application dmconfig.ini configuration file:
[BankTranx] ;LA branch business database DB_SvAdr = 192.168.0.1 DB_PtNum = 21000
|
Seattle
branch client application dmconfig.ini configuration file:
[BankTranx] ;Seattle branch business database DB_SvAdr = 192.168.0.2 DB_PtNum = 21000
|
In the
configuration files shown above, you must set DD_DDBMD=1 in the configuration
section for the local database to enable distributed database support.
In this example, you would put this keyword in the BankTranx configuration
section of both the Los Angeles and Seattle dmconfig.ini configuration
files.
In addition,
you must include a database configuration section for the Participant
Database in the Coordinator Database configuration file, and for the
Coordinator Database in the Participant Database configuration file.
In this case, both the Los Angeles branch database and the Seattle branch
database use the same database name. If you use the remote database
name for the name of the database configuration section, it will cause
a conflict with the local database name in the dmconfig.ini configuration
file.
To avoid
this type of problem when using distributed databases, DBMaker allows
you to distinguish the remote database name from the local database
name by attaching a server host description to the remote database name
in the local dmconfig.ini configuration file. In this case, the remote
database name would look like this:
database_name@server_host_description
|
The server
host description can be any identifying name, such as the IP address
or host name of the database server, the domain name, or almost any
other descriptive text. In this example, the Los Angeles branch client
application would use BankTranx@Seattle when it wants to access data
in the Seattle branch database, and the Seattle branch client application
would use BankTranx@La when it wants to access data in the Los Angeles
branch database.
You must
also set up the server address and port name for both the local database
and the remote database in their respective configuration sections in
the configuration files at both the Los Angeles and Seattle branches.
In this example, the Los Angeles branch configuration file would contain
the local server address in the BankTranx configuration section, and
would contain the Seattle branch server address in the BankTranx@Seattle
configuration section. Similarly, the Seattle branch configuration file
would contain the Los Angeles branch server address in the BankTranx
configuration section, and would contain the Seattle branch server address
in the BankTranx@La configuration section.
You should
also set the DD_CTIMO and DD_LTIMO remote connection parameters. These
parameters go in the configuration section for the Participant Database
in the Coordinator Database configuration file, and for the Coordinator
Database in the Participant Database configuration file.
Every database
server in the network can operate on distributed database objects. You
can access any of these database servers through your Coordinator Database,
in a manner similar to the normal client/server architecture. The SQL
commands that reference a remote database are be passed to remote database
server through the Coordinator Database. The Coordinator database will
decompose this SQL command into the local and remote portions, and send
the appropriate commands to the remote database server. The Coordinator
Database will wait for the remote database to return its results, and
then merge all local and remote data and return the combined results
to you.
16.4 Access Distributed Database Objects
DBMaker
provides several different methods for you to access a Participant Database:
Specify the Participant Database name directly.
Using database links defined in the Coordinator Database.
Through remote object mapping such as views or synonyms.
The difference
between the first two approaches is that database links contain security
information in addition to the remote database name. This allows you
to specify the user name and password that you want to use in the database
link when you access the remote database.
There is
no obvious difference between the statements of a distributed query
and a normal query except in the way database objects are specified.
However, when using a remote database, the only remote database objects
you can access are tables, views, or synonyms. To access a remote database
object, you have to provide the remote database name or database link
when you specify the name of the database object. This provides two
ways to identify a remote database object:
remote_database_name:object_owner.object_name
database_link:object_owner.object_name
The following
are examples of how to specify a remote database object in a query.
dmSQL> SELECT * FROM Bank:EmpTaple; dmSQL> DELETE FROM Bank:EmpTable WHERE id = 101; dmSQL> INSERT INTO Link1:mis.account
VALUES (2003,'Kevin Liu','2327-0021');
|
The following
example shows you how to access remote database objects in two different
Participant Databases in the same query.
dmSQL> SELECT * FROM ABCBank@La:account a, ABCBankMIS@Seattle:account b WHERE a.name = b.name;
|
Connecting to Remote Databases Using Database Names
You can
connect to remote databases with the database name of the Coordinator
Database Server. To apply this way, you have to know the remote database
name which is defined in the dmconfig.ini file in the Coordinator Database
Server. For example, a client application in the Los Angeles branch
of the ABC bank can access the database which is located in the Taipei
branch of the ABC bank with the following SQL command.
dmSQL> CONNECT TO BankTranx SYSADM aa; dmSQL> SELECT * FROM BankTranx@Taipei:SYSADM.Account ORDER BY AccID;
|
In this
example, it appears like you are connecting to the Taipei branch with
the user name SYSADM and the password aa. In reality you are connecting
to the Coordinator Database, which is the Los Angeles branch database.
The Coordinator Database then connects to the remote database with the
account and password you used when connecting to the Coordinator Database.
More advanced
SQL requests, such as joins, can also access remote database objects
in same way.
dmSQL> SELECT * FROM BankMIS:SYSADM.Personnel ORDER BY PID; dmSQL> SELECT Personnel.* FROM BankTranx@Taipei:Account A, BankMIS:Personnel B WHERE A.CustID = B.CustID;
|
Connecting to Remote Databases Using Database Links
A database
link creates a connection to a remote database, and contains the login
information and password necessary for connecting to the remote database.
This allows you to connect to a remote database with a different user
name than you are using in the Coordinator Database, or to connect to
a remote database that you do not have an account in using a public
link. It also makes data in a distributed database environment location
transparent. The link definition, which also contains the login information
and password, is stored in the Coordinator Database.
Creating Database Links
Below is
the SQL syntax for creating a database link.

Only database
administrators can create public database links that can be used by
all users of that database. Any user can create private database links
that only they can use. Different users may create private database
links with same name. If you create a private database link whose name
is identical to a public database link, the private database link will
override the public database link.
If you
do not explicitly specify the type of link you want to create, DBMaker
will create your link as a private link by default. If you do not specify
the login account and password in the IDENTIFY BY clause, your current
login name and password will be used by default.
Accessing Remote Database Objects with Database Links
The following
is an example to show how to access remote database objects using database
links. In this example the SYSADM connects to the database and creates
a public link named Bank_Seattle that connects to the Seattle branch
database using the SYSADM account. The SYSADM updates some values and
disconnects. Then user1 connects and performs a query on the Account
table.
dmSQL> CONNECT TO BankTranx SYSADM; dmSQL> CREATE PUBLIC DATABASE LINK Bank_Seattle 2> CONNECT TO BankTranx@Seattle IDENTIFIED BY SYSADM; dmSQL> UPDATE Bank_Seattle:Account SET balance = balance + 100 2> WHERE id = 1001; dmSQL> DISCONNECT; dmSQL> CONNECT TO BankTranx user1 pwd1; dmSQL> SELECT * FROM Bank_Seattle:Account;
|
The following
example is similar to the example above, but the SYSADM does not specify
the account to use when connecting to the public link. This means that
when user1 uses the public link to connect to the Bank_Seattle database,
there must be an account for user1 in the remote database, and the user1
account must have the authority to query the SYSADM.Account table. Otherwise
this example will cause an error to occur.
dmSQL> CONNECT TO BankTranx SYSADM; dmSQL> CREATE PUBLIC DATABASE LINK Bank_Seattle
2> CONNECT TO BankTranx@Seattle; dmSQL> SELECT * FROM Bank_Seattle:Account; dmSQL> DISCONNECT; dmSQL> CONNECT TO BankTranx user1 pwd1; dmSQL> SELECT * FROM Bank_Seattle:SYSADM.Account;
|
If a database
link name is the same as the remote database name, DBMaker will use
the database link name in preference to the remote database name. If
you want to access the remote database directly, you must explicitly
specify the remote database name in the form dbname@"" to
force DBMaker to access the remote database directly instead of through
database link.
The following
example shows two different ways to access a remote databases, one through
a database link and the other by explicitly specifying the remote database
name in the form dbname@"".
dmSQL> CONNECT TO BankTranx SYSADM; dmSQL> CREATE PUBLIC DATABASE LINK BankMIS CONNECT TO BankMIS 2> IDENTIFIED BY SYSADM; dmSQL> DISCONNECT; dmSQL> CONNECT TO BankTranx user1 pwd1; dmSQL> SELECT * FROM BankMIS:Personnel; //database link dmSQL> SELECT * FROM BankMIS@"":SYSADM.Personnel; //remote db name
|
Deleting Database Links
Below is
the SQL syntax for deleting a database link.

Only database
administrators can delete public database links, and only the owner
of a private database link can delete that link. If you try to delete
a database link when both a public and a private database link with
the same name exist, DBMaker will assume you want to delete the private
database link if you do not specify whether the link you are deleting
is a public or private database link.
The following
example deletes a public database link named BankMIS.
dmSQL> DROP PUBLIC DATABASE LINK BankMIS;
|
Accessing Remote Database Objects Using Database Object Mapping
Database
Object Mapping provides better location transparency in a distributed
database environment. There is no difference between the way users access
remote database objects with Database Object Mapping and the way they
access local database objects. This type of Database Object Mapping
includes using views and synonyms.
Accessing Remote Database Objects Using Synonyms
Using a
synonym to define a remote database object means that you give the remote
database object an alias name. The privileges you have in the remote
database depend on the privileges of the currently connected user, and
not the user who originally defined the synonym.
The following
example shows the what happens when you access a remote database object
using synonyms. The comments indicate the equivalent SQL expression,
the database being connected to, and the account used to connect.
dmSQL> CONNECT TO BankTranx user1; dmSQL> CREATE DATABASE LINK LK1 CONNECT TO BankMIS IDENTIFIED BY user2; dmSQL> CREATE SYNONYM s1 FOR BankTranx:Account; dmSQL> CREATE SYNONYM s2 FOR LK1:user2.Personnel; dmSQL> SELECT * FROM s1; // SELECT * FROM BankTranx:user1.Account; (BankTranx, user1) dmSQL> SELECT * FROM s2; // SELECT * FROM LK1:user2.Personnel; (BankMIS, user2) dmSQL> DISCONNECT; dmSQL> CONNECT TO BankTranx user3; dmSQL> CREATE DATABASE LINK LK1 CONNECT TO BankMIS IDENTIFIED BY user4; dmSQL> SELECT * FROM s1; // SELECT * FROM BankTranx:user3.Account; (BankTranx, user3) dmSQL> SELECT * FROM s2; // SELECT * FROM LK1:user2.Personnel; (BankMIS, user4)
|
Accessing Remote Database Objects Using Views
Using a
view to define a remote database object is a bit different than using
a synonym. The view is not just an alias, but includes the database
name, user account, password, object owner, and object name as part
of the view definition. The privileges you have in the remote database
depend on the privileges of the user whose account was used when the
view was created, and not the currently connected user.
The following
example shows the what happens when you access a remote database object
using views. The comments indicate the equivalent SQL expression, the
database being connected to, and the account used to connect.
dmSQL> CONNECT TO BankTranx user1; dmSQL> CREATE DATABASE LINK LK1 CONNECT TO BankMIS IDENTIFIED BY user2; dmSQL> CREATE VIEW v1 FOR BankTranx:Account; dmSQL> CREATE VIEW v2 FOR LK1:user3.Personnel; dmSQL> SELECT * FROM v1; // SELECT * FROM BankTranx:user1.Account; (BankTranx, user1) dmSQL> SELECT * FROM v2; // SELECT * FROM BankMIS:user3.Personnel; (BankMIS, user2) dmSQL> DISCONNECT; dmSQL> CONNECT TO BankTranx user3; dmSQL> CREATE DATABASE LINK LK1 CONNECT TO BankMIS IDENTIFIED BY user4; dmSQL> SELECT * FROM v1; // SELECT * FROM BankTranx:user1.Account; (BankTranx, user1) dmSQL> SELECT * FROM v2; // SELECT * FROM LK1:user3.Personnel; (BankMIS, user2)
|
Closing Database Links
Once you
access a remote database with an SQL command, the Coordinator Database
will build a remote connection to the Participant Database. The remote
connection will remain open until all users disconnect from the Coordinate
Database or until the link is explicitly closed with the CLOSE DATABASE
LINK command. Since DBMaker provides at most eight remote connections
for each database, it is a good idea to close remote connections that
are no longer required to free these connections.
Below is
the SQL syntax for closing a database link.

The following
example closes a database link using the remote database name BankMIS.
dmSQL> close database link BankMIS;
|
The following
example closes a database link using the remote database link BankLink1.
dmSQL> close database link BankLink1;
|
Actually,
when you issue a CLOSE DATABASE LINK command, DBMaker will decrease
the remote connection counter by one. When the counter reaches zero,
the connection is really closed and the occupied resources freed. Otherwise
the connection remains open.
The following
example closes all database links and frees the connection and any occupied
resources.
dmSQL> CLOSE DATABASE LINK ALL
|
The following
command closes all remote connections that are no longer being used
in the current transaction.
dmSQL> CLOSE DATABASE LINK NONACTIVE
|
Database Link System Catalog Tables
There are
two system catalogs tables related to database links: SYSDBLINK and
SYSOPENLINK. SYSDBLINK records all database link names and their definitions,
while SYSOPENLINK records open connections between databases.
16.5 Distributed Transaction Control
DBMaker
supports a distributed transaction mechanism that is transparent to
users. You do not need to become worry about Participant Databases committing
their part of a distributed transaction, since this is handled automatically
by DBMaker. As far as you are concerned, all transactions behave the
same way as in the traditional client/server architecture.
The following
example shows how distributed transaction control works.
dmSQL> CONNECT TO BankTranx user1; // ABC Bank in Taipei dmSQL> SET AUTOCOMMIT OFF; dmSQL> UPDATE BankTranx:Customer SET money=money-1000 where id=123; dmSQL> UPDATE BankTranx@"Bank_in_Seattle":Customer SET money=money+1000 2> WHERE id=123; dmSQL> COMMIT;
|
Since all
database operations from the client application are handled by the Coordinator
Database, the Coordinator Database knows the scope of the instructions
via Distributed System Catalog Manager. Transactions belonging to the
local site will handled by the Coordinator Database in the same manner
as a regular client/server transaction. Transactions belonging to remote
sites need to reference the appropriate remote database. The Coordinator
Database will exchange information with every Participant Database and
coordinate the whole transaction until it is either rolled back or committed.
Two-Phase Commit
Database
management systems need to maintain data integrity, and this requires
that all transactions are atomic. All operations in the transaction
must commit or roll back together. In the traditional client/server
architecture, a journal is used to make sure the changes are either
rolled back or committed.
In the
distributed database architecture, two-phase commit protocol with presumed
abort is used as the mechanism for controlling distributed transactions
that span multiple database servers. A transaction that modifies data
on two or more databases must complete the two-phase commit protocol
before it is committed. The two-phase commit mechanism guarantees that
all sites commit or roll back globally. It also protects data manipulation
operations performed by remote synonyms, integrity constraints, and
triggers. To commit a transaction, you have to ensure every sub-transaction
has finished, otherwise the transaction will be aborted. For the same
reason, if any sub-transaction cannot commit, the other sub-transactions
must be aborted as well.
Distributed Transaction Recovery
DBMaker
uses the two-phase commit protocol to inform all Participant Databases
to commit a global transaction. Before entering the commit phase, the
Coordinator Database will check the status of the Participant Databases
to ensure there are no server or network problems. If the Coordinator
Database finds a problem with any of the Participant Databases, it informs
the other Participant Databases to roll back their part of the transaction,
and returns an error indicating the global transaction has failed. If
the two-phase commit protocol has finished the preparation phase, but
there is a server or network problem with a Participant Database, the
global transaction is regarded as a success. DBMaker records which Participant
Database server cannot commit its part of the transaction in the SYSGLBTRANX
system catalog table, and also records which database contains a pending
transaction in the SYSPENDTRANX system catalog table for the crashed
database.
DBMaker
also provides an automatic recovery mechanism to handle network or site
failure during the execution of a distributed transaction. In the Coordinator
Database you start the global transaction recovery daemon (GTRECO).
This daemon scans the SYSGLBTRANX system catalog table and periodically
recovers any pending global transactions. Then it tries to connect to
the crashed Participant Database and inform it to commit or roll back
its part of the global transaction.
Heuristic End Global Transaction
After a
network or site failure occurs during the two-phase commit, pending
transactions continue to hold some resources such as locks or journal
blocks. The pending transaction will occupy these resources until the
problem is solved by the global recovery daemon (GTRECO). If the network
or site failure cannot be solved immediately, then some of the users
in the Participant site may be blocked by the held resources. To solve
this problem, DBMaker supports heuristic end global transaction. Heuristic
end transaction is an independent action taken by the database administrator
to force a pending transaction in a Participant Database to commit or
roll back. The database administrator can use DBA Tool to solve this
problem.
To solve
a pending transaction manually, you can do the following:
1. In the
Participant Database, browse the SYSPENDTRANX table to find out whether
there are transactions that have been pending for a long time.
2. In the
Coordinator Database, determine the commitment status of the pending
transaction. For example, if there are two pending transactions "DB_1-3376aafd"
and "DB_2-3376aafd:DB_3-3376ab0f#1", you should ask the administrator
of DB_1 to determine the status of "DB_1-3376aafd" and ask
the administrator of DB_3 to determine the status of "DB_2-3376aafd:DB_3-3376ab0f#1".
3. In the
Coordinator Database, when the administrator receives the transaction
status query he can examine SYSGLBTRANX to determine the transaction
status. If the STATE is 2 (COMMIT) or 3 (PENDCOM), he may reply `commit'.
If the STATE is 4 (PENDABO), he may reply `abort'. But if the STATE
is 1 (PREPARE), this transaction branch is pending in this site too,
and he may ask the administrator of the parent site to determine its
status.
4. In Participant
Database, the administrator uses Server Manager to perform a heuristic
commit or abort the pending transaction based on the reply.
If the
administrator initiates a heuristic end transaction on a pending transaction
which is different from action taken in the Coordinator Database, the
distributed data will be inconsistent.
  
|