CREATE TEXT INDEX
Creates
a new text index on a column in an existing table.
SYNTAX

| text_index_name |
Name
of the text index you want to create. |
| table_name |
Name
of the table you want to create the text index on. |
| column_name |
Name
of the column you want to create the index on. |
| number |
Value
to use for the TEXT BLOCK SIZE, BASIC BIT LENGTH, EXTENDED BIT LENGTH,
or CLUSTER WIDTH parameters. |
DESCRIPTION
The CREATE
TEXT INDEX command creates a new text index on a column in an existing
table. You can use text indexes to increase the performance of full-text
queries by quickly locating specific words in columns containing text
without examining the entire table. To execute the CREATE TEXT INDEX command
on a column in a table you must be the table owner, have DBA or SYSADM
security privileges, or have the INDEX privilege on that table.
A text index
is a mechanism that provides fast access to rows in a table that contain
one or more words or phrases in columns containing text. Text indexes
contain a representation of all the text found in the text columns they
are based on, but the data is encoded and structured to make retrieval
much faster than directly from the table. Once you create a text index
on a table, its operation is transparent to users of the database; the
DBMS will use the index to improve full-text query performance whenever
possible.
DBMaker
encodes every word in a block of text into a sequence of bits using a
simple and efficient encoding algorithm. The size of the block of text
is predefined, and any text that is larger than the predefined size is
split into multiple text blocks for processing. Each word from the text
block is represented by a single bit in the bit sequence, but due to the
encoding process each bit may represent multiple words.
When you
perform a full-text query, DBMaker uses the encoding algorithm to determine
which bit in the bit sequence represents each word of the desired text.
Then DBMaker performs a simple boolean operation on each bit sequence
to determine which text blocks may contain the desired text (the text
blocks resulting from this operation may not contain the desired text,
since each bit may represent multiple words). Finally DBMaker searches
those text blocks to determine if the word actually exists in the text
block, or if the match was due to one of the other words that is represented
by the same bit.
To increase
full-text query performance even further, DBMaker uses two different algorithms
to encode the text blocks into two bit sequences, which are combined to
form a single bit sequence set. Using two bit sequences to represent each
word reduces the possibility that two words will be represented by the
same bit in both bit sequence sequences, since two words that are encoded
to the same bit using one algorithm probably will not be encoded to the
same bit using the other algorithm. The result is fewer false hits when
DBMaker searches the bit sequences to determine which text blocks may
contain the desired text.
DBMaker
groups the bit sequence sets into clusters. Clusters are arranged so that
corresponding bits from the bit sequence sets are physically grouped together
on disk, allowing DBMaker to quickly search for text blocks that contain
a particular word, while reducing disk I/O.
When you
create a text index you must specify the text index name, the name of
the table you are creating the text index on, and the name of the column
in the table you want to create the text index on. You can only create
a text index on columns defined with the CHAR, VARCHAR, LONG VARCHAR,
or FILE data types, and you can only create a text index on a single column
at a time. If you want to create a text index on more than one column
in a table, you must create a new text index for each column. You cannot
create a text index on system tables, temporary tables, or views.
The TEXT
BLOCK SIZE keyword is optional. This keyword specifies the maximum size
in words of each block of text that is encoded as a set of bit sequences.
You can specify an integer value between 20 and 32767 for the TEXT BLOCK
SIZE keyword. If the text data being encoded contains more words than
the value provided for the text block size, the text data is split into
multiple text blocks. The default value for the text block size is 250
words (or 300 characters for languages which use a single character to
represent a word, such as Chinese).
The BASIC
BIT LENGTH keyword is optional. This keyword specifies the length of the
bit sequence used for encoding words in a text block using the first encoding
algorithm. You can specify an integer value between 1 and32767 for the
BASIC BIT LENGTH keyword. The bit sequence defined using the BASIC BIT
LENGTH keyword is combined with the bit sequence defined using the EXTENDED
BIT LENGTH keyword to form a single bit sequence set. The default value
for the length of this bit sequence is 1024 bits.
The EXTENDED
BIT LENGTH keyword is optional. This keyword specifies the length of the
bit sequence used for encoding words in a text block using the second
encoding algorithm. You can specify an integer value between 1 and32767
for the EXTENDED BIT LENGTH keyword. The bit sequence defined using the
EXTENDED BIT LENGTH keyword is combined with the bit sequence defined
using the BASIC BIT LENGTH keyword to form a single bit sequence set.
The default value for the length of this bit sequence is 1024 bits.
The BASIC
BIT LENGTH and the EXTENDED BIT LENGTH are the most important parameters
for full-text query performance. Using a small value for the BASIC BIT
LENGTH and EXTENDED BIT LENGTH keywords increases the possibility that
each bit in the bit sequence will represent multiple words, while using
a large value increases the possibility that some bits in the bit sequence
will not be used. For these reasons the best size for the BASIC BIT LENGTH
and EXTENDED BIT LENGTH keywords is a trade-off between small bit sequences
that generate a larger number of false hits, and large bit sequences that
waste storage space and cause more disk activity.
The CLUSTER
WIDTH keyword is optional. This keyword specifies the size in bit sequence
sets of each physical (disk) storage unit of a text index. You can specify
an integer value between 1 and 32767 for the CLUSTER WIDTH keyword, although
this value is adjusted internally to the nearest multiple of 1024. The
default value is 4096 bit sequence sets.
When you
load data into a table, DBMaker will not update any text indexes on that
table. For this reason, you should try to load all of your data before
creating a text index on a table if possible. Rows containing matching
text that was entered into a table after the text index was created will
not be returned with the full-text search results. To include these rows
in the search results, you must rebuild the text index.
Text index
names must be unique for the table you are creating them on. Text index
names have a maximum length of eighteen characters, and may contain numbers,
letters, the underscore character, and the symbols $ and #. The first
character may not be a number.
EXAMPLES
The following
example creates a text index named TxtIdx on the Name column of the Employees
table, using the default values for all parameters.
CREATE TEXT INDEX TxtIdx ON Employees(Name)
|
The following
example creates a text index named TxtIdx on the Name column of the Employees
table, using the values shown below for the optional parameters.
CREATE TEXT INDEX TxtIdx ON Employees(Name) TEXT BLOCK SIZE 4096 BASIC BIT LENGTH 2048 EXTENDED BIT LENGTH 2048 CLUSTER WIDTH 16384
|
RELATED COMMANDS
<
CREATE TABLESPACE | Contents
| CREATE TRIGGER >
|