Contents

List of Tables

Title and Copyright Information

Send Us Your Comments

Preface

Documentation Accessibility

Related Documentation

What's New in Oracle Text?

Oracle Database 10g R1 New Features

Security Improvements

Classification and Clustering

Language Features

Document Services

1 Oracle Text SQL Statements and Operators

ALTER TABLE: Supported Partitioning Statements

2 Oracle Text Indexing Elements

2.1.1 Creating Preferences

2.2 Datastore Types

2.2.1 DIRECT_DATASTORE

2.2.1.1 DIRECT_DATASTORE CLOB Example

2.2.2 MULTI_COLUMN_DATASTORE

2.2.2.1 Indexing and DML

2.2.2.2 MULTI_COLUMN_DATASTORE Example

2.2.2.3 MULTI_COLUMN_DATASTORE Filter Example

2.2.2.4 Tagging Behavior

2.2.2.5 Indexing Columns as Sections

2.2.3 DETAIL_DATASTORE

2.2.3.1 Synchronizing Master/Detail Indexes

2.2.3.2 Example Master/Detail Tables

2.2.4 FILE_DATASTORE

2.2.4.1 PATH Attribute Limitations

2.2.4.2 FILE_DATASTORE Example

2.2.5 URL_DATASTORE

2.2.5.1 URL Syntax

2.2.5.2 URL_DATASTORE Attributes

2.2.5.3 URL_DATASTORE Example

2.2.6 USER_DATASTORE

2.2.6.1 Constraints

2.2.6.2 Editing Procedure after Indexing

2.2.6.3 USER_DATASTORE with CLOB Example

2.2.6.4 USER_DATASTORE with BLOB_LOC Example

2.2.7 NESTED_DATASTORE

2.2.7.1 NESTED_DATASTORE Example

2.3 Filter Types

2.3.1 CHARSET_FILTER

2.3.1.1 UTF-16 Big- and Little-Endian Detection

2.3.1.2 Indexing Mixed-Character Set Columns

2.3.2 INSO_FILTER

2.3.2.1 Indexing Formatted Documents

2.3.2.2 Explicitly Bypassing Plain Text or HTML in Mixed Format Columns

2.3.2.3 Character Set Conversion With Inso

2.3.3 NULL_FILTER

2.3.3.1 Indexing HTML Documents

2.3.4 MAIL_FILTER

2.3.4.1 Filter Behavior

2.3.4.2 About the Mail Filter Configuration File

2.3.5 USER_FILTER

2.3.5.1 User Filter Example

2.3.6 PROCEDURE_FILTER

2.3.6.1 Parameter Order

2.3.6.2 Procedure Filter Execute Requirements

2.3.6.3 Error Handling

2.3.6.4 Procedure Filter Preference Example

2.4 Lexer Types

2.4.1 BASIC_LEXER

2.4.1.1 Stemming User-Dictionaries

2.4.1.2 BASIC_LEXER Example

2.4.2 MULTI_LEXER

2.4.2.1 Multi-language Stoplists

2.4.2.2 MULTI_LEXER Example

2.4.2.3 Querying Multi-Language Tables

2.4.3 CHINESE_VGRAM_LEXER

2.4.3.1 Character Sets

2.4.4 CHINESE_LEXER

2.4.4.1 Customizing the Chinese Lexicon

2.4.5 JAPANESE_VGRAM_LEXER

2.4.5.1 JAPANESE_VGRAM_LEXER Attribute

2.4.5.2 JAPANESE_VGRAM_LEXER Character Sets

2.4.6 JAPANESE_LEXER

2.4.6.1 Customizing the Japanese Lexicon

2.4.6.2 JAPANESE_LEXER Attribute

2.4.6.3 JAPANESE_LEXER Character Sets

2.4.6.4 Japanese Lexer Example

2.4.7 KOREAN_LEXER

2.4.7.1 KOREAN_LEXER Character Sets

2.4.7.2 KOREAN_LEXER Attributes

2.4.7.3 Limitations

2.4.8 KOREAN_MORPH_LEXER

2.4.8.1 Supplied Dictionaries

2.4.8.2 Supported Character Sets

2.4.8.3 Unicode Support

2.4.8.4 KOREAN_MORPH_LEXER Attributes

2.4.8.5 Limitations

2.4.8.6 KOREAN_MORPH_LEXER Example: Setting Composite Attribute

2.4.9 USER_LEXER

2.4.9.1 Limitations

2.4.9.2 USER_LEXER Attributes

2.4.9.3 INDEX_PROCEDURE

2.4.9.4 INPUT_TYPE

2.4.9.5 QUERY_PROCEDURE

2.4.9.6 Encoding Tokens as XML

2.4.9.7 XML Schema for No-Location, User-defined Indexing Procedure

2.4.9.8 XML Schema for User-defined Indexing Procedure with Location

2.4.9.9 XML Schema for User-defined Lexer Query Procedure

2.4.10 WORLD_LEXER

2.4.10.1 WORLD_LEXER Example

2.5 Wordlist Type

2.5.1 BASIC_WORDLIST

2.5.2 BASIC_WORDLIST Example

2.5.2.1 Enabling Fuzzy Matching and Stemming

2.5.2.2 Enabling Sub-string and Prefix Indexing

2.5.2.3 Setting Wildcard Expansion Limit

2.6 Storage Types

2.6.1 BASIC_STORAGE

2.6.1.1 Storage Default Behavior

2.6.1.2 Storage Example

2.7 Section Group Types

2.7.1 Section Group Examples

2.7.1.1 Creating Section Groups in HTML Documents

2.7.1.2 Creating Section Groups in XML Documents

2.7.1.3 Automatic Sectioning in XML Documents

2.8 Classifier Types

2.8.1 RULE_CLASSIFIER

2.8.2 SVM_CLASSIFIER

2.9 Cluster Types

2.9.1 KMEAN_CLUSTERING

2.10 Stoplists

2.10.1 Multi-Language Stoplists

2.10.2 Creating Stoplists

2.10.3 Modifying the Default Stoplist

2.10.3.1 Dynamic Addition of Stopwords

2.11 System-Defined Preferences

2.11.1 Data Storage

2.11.1.1 CTXSYS.DEFAULT_DATASTORE

2.11.1.2 CTXSYS.FILE_DATASTORE

2.11.1.3 CTXSYS.URL_DATASTORE

2.11.2 Filter

2.11.2.1 CTXSYS.NULL_FILTER

2.11.2.2 CTXSYS.INSO_FILTER

2.11.3 Lexer

2.11.3.1 CTXSYS.DEFAULT_LEXER

2.11.3.2 CTXSYS.BASIC_LEXER

2.11.4 Section Group

2.11.4.1 CTXSYS.NULL_SECTION_GROUP

2.11.4.2 CTXSYS.HTML_SECTION_GROUP

2.11.4.3 CTXSYS.AUTO_SECTION_GROUP

2.11.4.4 CTXSYS.PATH_SECTION_GROUP

2.11.5 Stoplist

2.11.5.1 CTXSYS.DEFAULT_STOPLIST

2.11.5.2 CTXSYS.EMPTY_STOPLIST

2.11.6 Storage

2.11.6.1 CTXSYS.DEFAULT_STORAGE

2.11.7 Wordlist

2.11.7.1 CTXSYS.DEFAULT_WORDLIST

2.12 System Parameters

2.12.1 General System Parameters

2.12.2 Default Index Parameters

2.12.2.1 CONTEXT Index Parameters

2.12.2.2 CTXCAT Index Parameters

2.12.2.3 CTXRULE Index Parameters

2.12.2.4 Viewing Default Values

2.12.2.5 Changing Default Values

3 Oracle Text CONTAINS Query Operators

3.1 Operator Precedence

3.1.1 Group 1 Operators

3.1.2 Group 2 Operators and Characters

3.1.3 Procedural Operators

3.1.4 Precedence Examples

3.1.5 Altering Precedence

ACCUMulate ( , )

Broader Term (BT, BTG, BTP, BTI)

EQUIValence (=)

Narrower Term (NT, NTG, NTP, NTI)

Preferred Term (PT)

Related Term (RT)

Stored Query Expression (SQE)

Translation Term (TR)

Translation Term Synonym (TRSYN)

wildcards (% _)

4 Special Characters in Oracle Text Queries

4.1 Grouping Characters

4.2 Escape Characters

4.2.1 Querying Escape Characters

4.3 Reserved Words and Characters

5 CTX_ADM Package

6 CTX_CLS Package

7 CTX_DDL Package

ADD_ATTR_SECTION

ADD_FIELD_SECTION

ADD_MDATA_SECTION

ADD_SPECIAL_SECTION

ADD_STOP_SECTION

ADD_ZONE_SECTION

CREATE_INDEX_SET

CREATE_PREFERENCE

CREATE_SECTION_GROUP

CREATE_STOPLIST

DROP_PREFERENCE

DROP_SECTION_GROUP

REMOVE_STOPCLASS

REMOVE_STOPTHEME

REMOVE_STOPWORD

REPLACE_INDEX_METADATA

UNSET_ATTRIBUTE

8 CTX_DOC Package

POLICY_HIGHLIGHT

9 CTX_OUTPUT Package

GET_TRACE_VALUE

START_QUERY_LOG

10 CTX_QUERY Package

11 CTX_REPORT

11.1 Procedures in CTX_REPORT

11.2 Using the Function Versions

DESCRIBE_POLICY

CREATE_INDEX_SCRIPT

CREATE_POLICY_SCRIPT

QUERY_LOG_SUMMARY

12 CTX_THES Package

ALTER_THESAURUS

CREATE_RELATION

CREATE_THESAURUS

CREATE_TRANSLATION

DROP_TRANSLATION

UPDATE_TRANSLATION

13 CTX_ULEXER Package

14 Oracle Text Executables

14.1 Thesaurus Loader (ctxload)

14.1.1 Text Loading

14.1.2 ctxload Syntax

14.1.2.1 Mandatory Arguments

14.1.2.2 Optional Arguments

14.1.3 ctxload Examples

14.1.3.1 Thesaurus Import Example

14.1.3.2 Thesaurus Export Example

14.2 Knowledge Base Extension Compiler (ctxkbtc)

14.2.1 Knowledge Base Character Set

14.2.2 ctxkbtc Syntax

14.2.3 ctxkbtc Usage Notes

14.2.4 ctxkbtc Limitations

14.2.5 ctxkbtc Constraints on Thesaurus Terms

14.2.6 ctxkbtc Constraints on Thesaurus Relations

14.2.7 Extending the Knowledge Base

14.2.7.1 Example for Extending the Knowledge Base

14.2.8 Adding a Language-Specific Knowledge Base

14.2.8.1 Limitations for Adding a Knowledge Base

14.2.9 Order of Precedence for Multiple Thesauri

14.2.10 Size Limits for Extended Knowledge Base

14.3 Lexical Compiler (ctxlc)

14.3.1 Syntax of ctxlc

14.3.1.1 Mandatory Arguments

14.3.1.2 Optional Arguments

14.3.2 Performance Considerations

14.3.3 ctxlc Usage Notes

15 Oracle Text Alternative Spelling

15.1 Overview of Alternative Spelling Features

15.1.1 Alternate Spelling

15.1.2 Base-Letter Conversion

15.1.2.1 Generic Versus Language-Specific Base-Letter Conversions

15.1.3 New German Spelling

15.2 Overriding Alternative Spelling Features

15.2.1 Overriding Base-Letter Transformations with Alternate Spelling

15.3 Alternative Spelling Conventions

15.3.1 German Alternate Spelling Conventions

15.3.2 Danish Alternate Spelling Conventions

15.3.3 Swedish Alternate Spelling Conventions

A Oracle Text Result Tables

A.1 CTX_QUERY Result Tables

A.1.1 EXPLAIN Table

A.1.1.1 Operation Column Values

A.1.1.2 OPTIONS Column Values

A.1.2 HFEEDBACK Table

A.1.2.1 Operation Column Values

A.1.2.2 OPTIONS Column Values

A.1.2.3 CTX_FEEDBACK_TYPE

A.2 CTX_DOC Result Tables

A.2.1 Filter Table

A.2.2 Gist Table

A.2.3 Highlight Table

A.2.4 Markup Table

A.2.5 Theme Table

A.2.6 Token Table

A.3 CTX_THES Result Tables and Data Types

A.3.1 EXP_TAB Table Type

B Oracle Text Supported Document Formats

B.1 About Document Filtering Technology

B.1.1 Latest Updates for Patch Releases

B.1.2 Supported Platforms

B.1.2.1 Supported Platforms

B.1.3 Environment Variables

B.1.4 Requirements for UNIX Platforms

B.2 Supported Document Formats

B.2.1 Word Processing Formats - Generic Text

B.2.2 Word Processing Formats - DOS

B.2.3 Word Processing Formats - Windows

B.2.4 Word Processing Formats - Macintosh

B.2.5 Spreadsheet Formats

B.2.6 Database Formats

B.2.7 Display Formats

B.2.8 Presentation Formats

B.2.9 Graphic Formats

B.2.10 Other Document Formats

B.3 Restrictions on Format Support

C Text Loading Examples for Oracle Text

C.1 SQL INSERT Example

C.2 SQL*Loader Example

C.2.1 Creating the Table

C.2.2 Issuing the SQL*Loader Command

C.2.2.1 Example Control File: loader1.dat

C.2.2.2 Example Data File: loader2.dat

C.3 Structure of ctxload Thesaurus Import File

C.3.1 Alternate Hierarchy Structure

C.3.2 Usage Notes for Terms in Import Files

C.3.3 Usage Notes for Relationships in Import Files

C.3.4 Examples of Import Files

C.3.4.1 Example 1 (Flat Structure)

C.3.4.2 Example 2 (Hierarchical)

C.3.4.3 Example 3

D Oracle Text Multilingual Features

D.1 Introduction

D.2.1 Index Types

D.2.1.1 CONTEXT Index Type

D.2.1.2 CTXCAT Index Type

D.2.1.3 CTXRULE Index Type

D.2.2 Lexer Types

D.2.3 Basic Lexer Features

D.2.3.1 Theme Indexing

D.2.3.2 Alternate Spelling

D.2.3.3 Base Letter Conversion

D.2.3.4 Composite

D.2.3.5 Index Stems

D.2.4 Multi Lexer Features

D.2.5 World Lexer Features

D.3.1 ABOUT Operator

D.3.2 Fuzzy Operator

D.3.3 Stem Operator

D.4 Supplied Stoplists

D.5 Knowledge Base

D.5.1 Knowledge Base Extension

D.6 Multi-Lingual Features Matrix

E Oracle Text Supplied Stoplists

E.1 English Default Stoplist

E.2 Chinese Stoplist (Traditional)

E.3 Chinese Stoplist (Simplified)

E.4 Danish (dk) Default Stoplist

E.5 Dutch (nl) Default Stoplist

E.6 Finnish (sf) Default Stoplist

E.7 French (f) Default Stoplist

E.8 German (d) Default Stoplist

E.9 Italian (i) Default Stoplist

E.10 Portuguese (pt) Default Stoplist

E.11 Spanish (e) Default Stoplist

E.12 Swedish (s) Default Stoplist

F The Oracle Text Scoring Algorithm

F.1 Scoring Algorithm for Word Queries

F.1.2 DML and Scoring

G Oracle Text Views

G.1 CTX_CLASSES

G.2 CTX_INDEXES

G.3 CTX_INDEX_ERRORS

G.4 CTX_INDEX_OBJECTS

G.5 CTX_INDEX_PARTITIONS

G.6 CTX_INDEX_SETS

G.7 CTX_INDEX_SET_INDEXES

G.8 CTX_INDEX_SUB_LEXERS

G.9 CTX_INDEX_SUB_LEXER_VALUES

G.10 CTX_INDEX_VALUES

G.11 CTX_OBJECTS

G.12 CTX_OBJECT_ATTRIBUTES

G.13 CTX_OBJECT_ATTRIBUTE_LOV

G.14 CTX_PARAMETERS

G.15 CTX_PENDING

G.16 CTX_PREFERENCES

G.17 CTX_PREFERENCE_VALUES

G.18 CTX_SECTIONS

G.19 CTX_SECTION_GROUPS

G.20 CTX_SQES

G.21 CTX_STOPLISTS

G.22 CTX_STOPWORDS

G.23 CTX_SUB_LEXERS

G.24 CTX_THESAURI

G.25 CTX_THES_PHRASES

G.26 CTX_TRACE_VALUES

G.27 CTX_USER_INDEXES

G.28 CTX_USER_INDEX_ERRORS

G.29 CTX_USER_INDEX_OBJECTS

G.30 CTX_USER_INDEX_PARTITIONS

G.31 CTX_USER_INDEX_SETS

G.32 CTX_USER_INDEX_SET_INDEXES

G.33 CTX_USER_INDEX_SUB_LEXERS

G.34 CTX_USER_INDEX_SUB_LEXER_VALS

G.35 CTX_USER_INDEX_VALUES

G.36 CTX_USER_PENDING

G.37 CTX_USER_PREFERENCES

G.38 CTX_USER_PREFERENCE_VALUES

G.39 CTX_USER_SECTIONS

G.40 CTX_USER_SECTION_GROUPS

G.41 CTX_USER_SQES

G.42 CTX_USER_STOPLISTS

G.43 CTX_USER_STOPWORDS

G.44 CTX_USER_SUB_LEXERS

G.45 CTX_USER_THESAURI

G.46 CTX_USER_THES_PHRASES

G.47 CTX_VERSION

H Stopword Transformations in Oracle Text

H.1 Understanding Stopword Transformations

H.1.1 Word Transformations

H.1.2 AND Transformations

H.1.3 OR Transformations

H.1.4 ACCUMulate Transformations

H.1.5 MINUS Transformations

H.1.6 NOT Transformations

H.1.7 EQUIValence Transformations

H.1.8 NEAR Transformations

H.1.9 Weight Transformations

H.1.10 Threshold Transformations

H.1.11 WITHIN Transformations

Index