CREATE TEXT INDEX

Creates a new text index on a column in an existing table.

SYNTAX

text_index_name Name of the text index you want to create.
table_name Name of the table you want to create the text index on.
column_name Name of the column you want to create the index on.
number Value to use for the TEXT BLOCK SIZE, BASIC BIT LENGTH, EXTENDED BIT LENGTH, or CLUSTER WIDTH parameters.

DESCRIPTION

The CREATE TEXT INDEX command creates a new text index on a column in an existing table. You can use text indexes to increase the performance of full-text queries by quickly locating specific words in columns containing text without examining the entire table. To execute the CREATE TEXT INDEX command on a column in a table you must be the table owner, have DBA or SYSADM security privileges, or have the INDEX privilege on that table.

A text index is a mechanism that provides fast access to rows in a table that contain one or more words or phrases in columns containing text. Text indexes contain a representation of all the text found in the text columns they are based on, but the data is encoded and structured to make retrieval much faster than directly from the table. Once you create a text index on a table, its operation is transparent to users of the database; the DBMS will use the index to improve full-text query performance whenever possible.

DBMaker encodes every word in a block of text into a sequence of bits using a simple and efficient encoding algorithm. The size of the block of text is predefined, and any text that is larger than the predefined size is split into multiple text blocks for processing. Each word from the text block is represented by a single bit in the bit sequence, but due to the encoding process each bit may represent multiple words.

When you perform a full-text query, DBMaker uses the encoding algorithm to determine which bit in the bit sequence represents each word of the desired text. Then DBMaker performs a simple boolean operation on each bit sequence to determine which text blocks may contain the desired text (the text blocks resulting from this operation may not contain the desired text, since each bit may represent multiple words). Finally DBMaker searches those text blocks to determine if the word actually exists in the text block, or if the match was due to one of the other words that is represented by the same bit.

To increase full-text query performance even further, DBMaker uses two different algorithms to encode the text blocks into two bit sequences, which are combined to form a single bit sequence set. Using two bit sequences to represent each word reduces the possibility that two words will be represented by the same bit in both bit sequence sequences, since two words that are encoded to the same bit using one algorithm probably will not be encoded to the same bit using the other algorithm. The result is fewer false hits when DBMaker searches the bit sequences to determine which text blocks may contain the desired text.

DBMaker groups the bit sequence sets into clusters. Clusters are arranged so that corresponding bits from the bit sequence sets are physically grouped together on disk, allowing DBMaker to quickly search for text blocks that contain a particular word, while reducing disk I/O.

When you create a text index you must specify the text index name, the name of the table you are creating the text index on, and the name of the column in the table you want to create the text index on. You can only create a text index on columns defined with the CHAR, VARCHAR, LONG VARCHAR, or FILE data types, and you can only create a text index on a single column at a time. If you want to create a text index on more than one column in a table, you must create a new text index for each column. You cannot create a text index on system tables, temporary tables, or views.

The TEXT BLOCK SIZE keyword is optional. This keyword specifies the maximum size in words of each block of text that is encoded as a set of bit sequences. You can specify an integer value between 20 and 32767 for the TEXT BLOCK SIZE keyword. If the text data being encoded contains more words than the value provided for the text block size, the text data is split into multiple text blocks. The default value for the text block size is 250 words (or 300 characters for languages which use a single character to represent a word, such as Chinese).

The BASIC BIT LENGTH keyword is optional. This keyword specifies the length of the bit sequence used for encoding words in a text block using the first encoding algorithm. You can specify an integer value between 1 and32767 for the BASIC BIT LENGTH keyword. The bit sequence defined using the BASIC BIT LENGTH keyword is combined with the bit sequence defined using the EXTENDED BIT LENGTH keyword to form a single bit sequence set. The default value for the length of this bit sequence is 1024 bits.

The EXTENDED BIT LENGTH keyword is optional. This keyword specifies the length of the bit sequence used for encoding words in a text block using the second encoding algorithm. You can specify an integer value between 1 and32767 for the EXTENDED BIT LENGTH keyword. The bit sequence defined using the EXTENDED BIT LENGTH keyword is combined with the bit sequence defined using the BASIC BIT LENGTH keyword to form a single bit sequence set. The default value for the length of this bit sequence is 1024 bits.

The BASIC BIT LENGTH and the EXTENDED BIT LENGTH are the most important parameters for full-text query performance. Using a small value for the BASIC BIT LENGTH and EXTENDED BIT LENGTH keywords increases the possibility that each bit in the bit sequence will represent multiple words, while using a large value increases the possibility that some bits in the bit sequence will not be used. For these reasons the best size for the BASIC BIT LENGTH and EXTENDED BIT LENGTH keywords is a trade-off between small bit sequences that generate a larger number of false hits, and large bit sequences that waste storage space and cause more disk activity.

The CLUSTER WIDTH keyword is optional. This keyword specifies the size in bit sequence sets of each physical (disk) storage unit of a text index. You can specify an integer value between 1 and 32767 for the CLUSTER WIDTH keyword, although this value is adjusted internally to the nearest multiple of 1024. The default value is 4096 bit sequence sets.

When you load data into a table, DBMaker will not update any text indexes on that table. For this reason, you should try to load all of your data before creating a text index on a table if possible. Rows containing matching text that was entered into a table after the text index was created will not be returned with the full-text search results. To include these rows in the search results, you must rebuild the text index.

Text index names must be unique for the table you are creating them on. Text index names have a maximum length of eighteen characters, and may contain numbers, letters, the underscore character, and the symbols $ and #. The first character may not be a number.

EXAMPLES

The following example creates a text index named TxtIdx on the Name column of the Employees table, using the default values for all parameters.

CREATE TEXT INDEX TxtIdx ON Employees(Name)

The following example creates a text index named TxtIdx on the Name column of the Employees table, using the values shown below for the optional parameters.

CREATE TEXT INDEX TxtIdx ON Employees(Name) TEXT BLOCK SIZE 4096
BASIC BIT LENGTH 2048
EXTENDED BIT LENGTH 2048
CLUSTER WIDTH 16384

RELATED COMMANDS

< CREATE TABLESPACE | Contents | CREATE TRIGGER >

Copyright 2002 SYSCOM Computer Engineering Co. All rights reserved.