Oracle® Text Reference
10g Release 1 (10.1)
Part Number B10730-02
Contents
List of Tables
Title and Copyright Information
Send Us Your Comments
Preface
Audience
Documentation Accessibility
Structure
Related Documentation
Conventions
What's New in Oracle Text?
Oracle Database 10g R1 New Features
Security Improvements
Classification and Clustering
Indexing
Language Features
Querying
Document Services
1 Oracle Text SQL Statements and Operators
ALTER INDEX
ALTER TABLE: Supported Partitioning Statements
CATSEARCH
CONTAINS
CREATE INDEX
DROP INDEX
MATCHES
MATCH_SCORE
SCORE
2 Oracle Text Indexing Elements
2.1 Overview
2.1.1 Creating Preferences
2.2 Datastore Types
2.2.1 DIRECT_DATASTORE
2.2.1.1 DIRECT_DATASTORE CLOB Example
2.2.2 MULTI_COLUMN_DATASTORE
2.2.2.1 Indexing and DML
2.2.2.2 MULTI_COLUMN_DATASTORE Example
2.2.2.3 MULTI_COLUMN_DATASTORE Filter Example
2.2.2.4 Tagging Behavior
2.2.2.5 Indexing Columns as Sections
2.2.3 DETAIL_DATASTORE
2.2.3.1 Synchronizing Master/Detail Indexes
2.2.3.2 Example Master/Detail Tables
2.2.4 FILE_DATASTORE
2.2.4.1 PATH Attribute Limitations
2.2.4.2 FILE_DATASTORE Example
2.2.5 URL_DATASTORE
2.2.5.1 URL Syntax
2.2.5.2 URL_DATASTORE Attributes
2.2.5.3 URL_DATASTORE Example
2.2.6 USER_DATASTORE
2.2.6.1 Constraints
2.2.6.2 Editing Procedure after Indexing
2.2.6.3 USER_DATASTORE with CLOB Example
2.2.6.4 USER_DATASTORE with BLOB_LOC Example
2.2.7 NESTED_DATASTORE
2.2.7.1 NESTED_DATASTORE Example
2.3 Filter Types
2.3.1 CHARSET_FILTER
2.3.1.1 UTF-16 Big- and Little-Endian Detection
2.3.1.2 Indexing Mixed-Character Set Columns
2.3.2 INSO_FILTER
2.3.2.1 Indexing Formatted Documents
2.3.2.2 Explicitly Bypassing Plain Text or HTML in Mixed Format Columns
2.3.2.3 Character Set Conversion With Inso
2.3.3 NULL_FILTER
2.3.3.1 Indexing HTML Documents
2.3.4 MAIL_FILTER
2.3.4.1 Filter Behavior
2.3.4.2 About the Mail Filter Configuration File
2.3.5 USER_FILTER
2.3.5.1 User Filter Example
2.3.6 PROCEDURE_FILTER
2.3.6.1 Parameter Order
2.3.6.2 Procedure Filter Execute Requirements
2.3.6.3 Error Handling
2.3.6.4 Procedure Filter Preference Example
2.4 Lexer Types
2.4.1 BASIC_LEXER
2.4.1.1 Stemming User-Dictionaries
2.4.1.2 BASIC_LEXER Example
2.4.2 MULTI_LEXER
2.4.2.1 Multi-language Stoplists
2.4.2.2 MULTI_LEXER Example
2.4.2.3 Querying Multi-Language Tables
2.4.3 CHINESE_VGRAM_LEXER
2.4.3.1 Character Sets
2.4.4 CHINESE_LEXER
2.4.4.1 Customizing the Chinese Lexicon
2.4.5 JAPANESE_VGRAM_LEXER
2.4.5.1 JAPANESE_VGRAM_LEXER Attribute
2.4.5.2 JAPANESE_VGRAM_LEXER Character Sets
2.4.6 JAPANESE_LEXER
2.4.6.1 Customizing the Japanese Lexicon
2.4.6.2 JAPANESE_LEXER Attribute
2.4.6.3 JAPANESE_LEXER Character Sets
2.4.6.4 Japanese Lexer Example
2.4.7 KOREAN_LEXER
2.4.7.1 KOREAN_LEXER Character Sets
2.4.7.2 KOREAN_LEXER Attributes
2.4.7.3 Limitations
2.4.8 KOREAN_MORPH_LEXER
2.4.8.1 Supplied Dictionaries
2.4.8.2 Supported Character Sets
2.4.8.3 Unicode Support
2.4.8.4 KOREAN_MORPH_LEXER Attributes
2.4.8.5 Limitations
2.4.8.6 KOREAN_MORPH_LEXER Example: Setting Composite Attribute
2.4.9 USER_LEXER
2.4.9.1 Limitations
2.4.9.2 USER_LEXER Attributes
2.4.9.3 INDEX_PROCEDURE
2.4.9.4 INPUT_TYPE
2.4.9.5 QUERY_PROCEDURE
2.4.9.6 Encoding Tokens as XML
2.4.9.7 XML Schema for No-Location, User-defined Indexing Procedure
2.4.9.8 XML Schema for User-defined Indexing Procedure with Location
2.4.9.9 XML Schema for User-defined Lexer Query Procedure
2.4.10 WORLD_LEXER
2.4.10.1 WORLD_LEXER Example
2.5 Wordlist Type
2.5.1 BASIC_WORDLIST
2.5.2 BASIC_WORDLIST Example
2.5.2.1 Enabling Fuzzy Matching and Stemming
2.5.2.2 Enabling Sub-string and Prefix Indexing
2.5.2.3 Setting Wildcard Expansion Limit
2.6 Storage Types
2.6.1 BASIC_STORAGE
2.6.1.1 Storage Default Behavior
2.6.1.2 Storage Example
2.7 Section Group Types
2.7.1 Section Group Examples
2.7.1.1 Creating Section Groups in HTML Documents
2.7.1.2 Creating Section Groups in XML Documents
2.7.1.3 Automatic Sectioning in XML Documents
2.8 Classifier Types
2.8.1 RULE_CLASSIFIER
2.8.2 SVM_CLASSIFIER
2.9 Cluster Types
2.9.1 KMEAN_CLUSTERING
2.10 Stoplists
2.10.1 Multi-Language Stoplists
2.10.2 Creating Stoplists
2.10.3 Modifying the Default Stoplist
2.10.3.1 Dynamic Addition of Stopwords
2.11 System-Defined Preferences
2.11.1 Data Storage
2.11.1.1 CTXSYS.DEFAULT_DATASTORE
2.11.1.2 CTXSYS.FILE_DATASTORE
2.11.1.3 CTXSYS.URL_DATASTORE
2.11.2 Filter
2.11.2.1 CTXSYS.NULL_FILTER
2.11.2.2 CTXSYS.INSO_FILTER
2.11.3 Lexer
2.11.3.1 CTXSYS.DEFAULT_LEXER
2.11.3.2 CTXSYS.BASIC_LEXER
2.11.4 Section Group
2.11.4.1 CTXSYS.NULL_SECTION_GROUP
2.11.4.2 CTXSYS.HTML_SECTION_GROUP
2.11.4.3 CTXSYS.AUTO_SECTION_GROUP
2.11.4.4 CTXSYS.PATH_SECTION_GROUP
2.11.5 Stoplist
2.11.5.1 CTXSYS.DEFAULT_STOPLIST
2.11.5.2 CTXSYS.EMPTY_STOPLIST
2.11.6 Storage
2.11.6.1 CTXSYS.DEFAULT_STORAGE
2.11.7 Wordlist
2.11.7.1 CTXSYS.DEFAULT_WORDLIST
2.12 System Parameters
2.12.1 General System Parameters
2.12.2 Default Index Parameters
2.12.2.1 CONTEXT Index Parameters
2.12.2.2 CTXCAT Index Parameters
2.12.2.3 CTXRULE Index Parameters
2.12.2.4 Viewing Default Values
2.12.2.5 Changing Default Values
3 Oracle Text CONTAINS Query Operators
3.1 Operator Precedence
3.1.1 Group 1 Operators
3.1.2 Group 2 Operators and Characters
3.1.3 Procedural Operators
3.1.4 Precedence Examples
3.1.5 Altering Precedence
ABOUT
ACCUMulate ( , )
AND (&)
Broader Term (BT, BTG, BTP, BTI)
EQUIValence (=)
Fuzzy
HASPATH
INPATH
MDATA
MINUS (-)
Narrower Term (NT, NTG, NTP, NTI)
NEAR (;)
NOT (~)
OR (|)
Preferred Term (PT)
Related Term (RT)
soundex (!)
stem ($)
Stored Query Expression (SQE)
SYNonym (SYN)
threshold (>)
Translation Term (TR)
Translation Term Synonym (TRSYN)
Top Term (TT)
weight (*)
wildcards (% _)
WITHIN
4 Special Characters in Oracle Text Queries
4.1 Grouping Characters
4.2 Escape Characters
4.2.1 Querying Escape Characters
4.3 Reserved Words and Characters
5 CTX_ADM Package
RECOVER
SET_PARAMETER
6 CTX_CLS Package
TRAIN
CLUSTERING
7 CTX_DDL Package
ADD_ATTR_SECTION
ADD_FIELD_SECTION
ADD_INDEX
ADD_MDATA
ADD_MDATA_SECTION
ADD_SPECIAL_SECTION
ADD_STOPCLASS
ADD_STOP_SECTION
ADD_STOPTHEME
ADD_STOPWORD
ADD_SUB_LEXER
ADD_ZONE_SECTION
COPY_POLICY
CREATE_INDEX_SET
CREATE_POLICY
CREATE_PREFERENCE
CREATE_SECTION_GROUP
CREATE_STOPLIST
DROP_INDEX_SET
DROP_POLICY
DROP_PREFERENCE
DROP_SECTION_GROUP
DROP_STOPLIST
OPTIMIZE_INDEX
REMOVE_INDEX
REMOVE_MDATA
REMOVE_SECTION
REMOVE_STOPCLASS
REMOVE_STOPTHEME
REMOVE_STOPWORD
REPLACE_INDEX_METADATA
SET_ATTRIBUTE
SYNC_INDEX
UNSET_ATTRIBUTE
UPDATE_POLICY
8 CTX_DOC Package
FILTER
GIST
HIGHLIGHT
IFILTER
MARKUP
PKENCODE
POLICY_FILTER
POLICY_GIST
POLICY_HIGHLIGHT
POLICY_MARKUP
POLICY_THEMES
POLICY_TOKENS
SET_KEY_TYPE
THEMES
TOKENS
9 CTX_OUTPUT Package
ADD_EVENT
ADD_TRACE
END_LOG
END_QUERY_LOG
GET_TRACE_VALUE
LOG_TRACES
LOGFILENAME
REMOVE_EVENT
REMOVE_TRACE
RESET_TRACE
START_LOG
START_QUERY_LOG
10 CTX_QUERY Package
BROWSE_WORDS
COUNT_HITS
EXPLAIN
HFEEDBACK
REMOVE_SQE
STORE_SQE
11 CTX_REPORT
11.1 Procedures in CTX_REPORT
11.2 Using the Function Versions
DESCRIBE_INDEX
DESCRIBE_POLICY
CREATE_INDEX_SCRIPT
CREATE_POLICY_SCRIPT
INDEX_SIZE
INDEX_STATS
QUERY_LOG_SUMMARY
TOKEN_INFO
TOKEN_TYPE
12 CTX_THES Package
ALTER_PHRASE
ALTER_THESAURUS
BT
BTG
BTI
BTP
CREATE_PHRASE
CREATE_RELATION
CREATE_THESAURUS
CREATE_TRANSLATION
DROP_PHRASE
DROP_RELATION
DROP_THESAURUS
DROP_TRANSLATION
HAS_RELATION
NT
NTG
NTI
NTP
OUTPUT_STYLE
PT
RT
SN
SYN
THES_TT
TR
TRSYN
TT
UPDATE_TRANSLATION
13 CTX_ULEXER Package
WILDCARD_TAB
14 Oracle Text Executables
14.1 Thesaurus Loader (ctxload)
14.1.1 Text Loading
14.1.2 ctxload Syntax
14.1.2.1 Mandatory Arguments
14.1.2.2 Optional Arguments
14.1.3 ctxload Examples
14.1.3.1 Thesaurus Import Example
14.1.3.2 Thesaurus Export Example
14.2 Knowledge Base Extension Compiler (ctxkbtc)
14.2.1 Knowledge Base Character Set
14.2.2 ctxkbtc Syntax
14.2.3 ctxkbtc Usage Notes
14.2.4 ctxkbtc Limitations
14.2.5 ctxkbtc Constraints on Thesaurus Terms
14.2.6 ctxkbtc Constraints on Thesaurus Relations
14.2.7 Extending the Knowledge Base
14.2.7.1 Example for Extending the Knowledge Base
14.2.8 Adding a Language-Specific Knowledge Base
14.2.8.1 Limitations for Adding a Knowledge Base
14.2.9 Order of Precedence for Multiple Thesauri
14.2.10 Size Limits for Extended Knowledge Base
14.3 Lexical Compiler (ctxlc)
14.3.1 Syntax of ctxlc
14.3.1.1 Mandatory Arguments
14.3.1.2 Optional Arguments
14.3.2 Performance Considerations
14.3.3 ctxlc Usage Notes
14.3.4 Example
15 Oracle Text Alternative Spelling
15.1 Overview of Alternative Spelling Features
15.1.1 Alternate Spelling
15.1.2 Base-Letter Conversion
15.1.2.1 Generic Versus Language-Specific Base-Letter Conversions
15.1.3 New German Spelling
15.2 Overriding Alternative Spelling Features
15.2.1 Overriding Base-Letter Transformations with Alternate Spelling
15.3 Alternative Spelling Conventions
15.3.1 German Alternate Spelling Conventions
15.3.2 Danish Alternate Spelling Conventions
15.3.3 Swedish Alternate Spelling Conventions
A Oracle Text Result Tables
A.1 CTX_QUERY Result Tables
A.1.1 EXPLAIN Table
A.1.1.1 Operation Column Values
A.1.1.2 OPTIONS Column Values
A.1.2 HFEEDBACK Table
A.1.2.1 Operation Column Values
A.1.2.2 OPTIONS Column Values
A.1.2.3 CTX_FEEDBACK_TYPE
A.2 CTX_DOC Result Tables
A.2.1 Filter Table
A.2.2 Gist Table
A.2.3 Highlight Table
A.2.4 Markup Table
A.2.5 Theme Table
A.2.6 Token Table
A.3 CTX_THES Result Tables and Data Types
A.3.1 EXP_TAB Table Type
B Oracle Text Supported Document Formats
B.1 About Document Filtering Technology
B.1.1 Latest Updates for Patch Releases
B.1.2 Supported Platforms
B.1.2.1 Supported Platforms
B.1.3 Environment Variables
B.1.4 Requirements for UNIX Platforms
B.2 Supported Document Formats
B.2.1 Word Processing Formats - Generic Text
B.2.2 Word Processing Formats - DOS
B.2.3 Word Processing Formats - Windows
B.2.4 Word Processing Formats - Macintosh
B.2.5 Spreadsheet Formats
B.2.6 Database Formats
B.2.7 Display Formats
B.2.8 Presentation Formats
B.2.9 Graphic Formats
B.2.10 Other Document Formats
B.3 Restrictions on Format Support
C Text Loading Examples for Oracle Text
C.1 SQL INSERT Example
C.2 SQL*Loader Example
C.2.1 Creating the Table
C.2.2 Issuing the SQL*Loader Command
C.2.2.1 Example Control File: loader1.dat
C.2.2.2 Example Data File: loader2.dat
C.3 Structure of ctxload Thesaurus Import File
C.3.1 Alternate Hierarchy Structure
C.3.2 Usage Notes for Terms in Import Files
C.3.3 Usage Notes for Relationships in Import Files
C.3.4 Examples of Import Files
C.3.4.1 Example 1 (Flat Structure)
C.3.4.2 Example 2 (Hierarchical)
C.3.4.3 Example 3
D Oracle Text Multilingual Features
D.1 Introduction
D.2 Indexing
D.2.1 Index Types
D.2.1.1 CONTEXT Index Type
D.2.1.2 CTXCAT Index Type
D.2.1.3 CTXRULE Index Type
D.2.2 Lexer Types
D.2.3 Basic Lexer Features
D.2.3.1 Theme Indexing
D.2.3.2 Alternate Spelling
D.2.3.3 Base Letter Conversion
D.2.3.4 Composite
D.2.3.5 Index stems
D.2.4 Multi Lexer Features
D.2.5 World Lexer Features
D.3 Querying
D.3.1 ABOUT Operator
D.3.2 Fuzzy Operator
D.3.3 Stem Operator
D.4 Supplied Stop Lists
D.5 Knowledge Base
D.5.1 Knowledge Base Extension
D.6 Multi-Lingual Features Matrix
E Oracle Text Supplied Stoplists
E.1 English Default Stoplist
E.2 Chinese Stoplist (Traditional)
E.3 Chinese Stoplist (Simplified)
E.4 Danish (dk) Default Stoplist
E.5 Dutch (nl) Default Stoplist
E.6 Finnish (sf) Default Stoplist
E.7 French (f) Default Stoplist
E.8 German (d) Default Stoplist
E.9 Italian (i) Default Stoplist
E.10 Portuguese (pt) Default Stoplist
E.11 Spanish (e) Default Stoplist
E.12 Swedish (s) Default Stoplist
F The Oracle Text Scoring Algorithm
F.1 Scoring Algorithm for Word Queries
F.1.1 Example
F.1.2 DML and Scoring
G Oracle Text Views
G.1 CTX_CLASSES
G.2 CTX_INDEXES
G.3 CTX_INDEX_ERRORS
G.4 CTX_INDEX_OBJECTS
G.5 CTX_INDEX_PARTITIONS
G.6 CTX_INDEX_SETS
G.7 CTX_INDEX_SET_INDEXES
G.8 CTX_INDEX_SUB_LEXERS
G.9 CTX_INDEX_SUB_LEXER_VALUES
G.10 CTX_INDEX_VALUES
G.11 CTX_OBJECTS
G.12 CTX_OBJECT_ATTRIBUTES
G.13 CTX_OBJECT_ATTRIBUTE_LOV
G.14 CTX_PARAMETERS
G.15 CTX_PENDING
G.16 CTX_PREFERENCES
G.17 CTX_PREFERENCE_VALUES
G.18 CTX_SECTIONS
G.19 CTX_SECTION_GROUPS
G.20 CTX_SQES
G.21 CTX_STOPLISTS
G.22 CTX_STOPWORDS
G.23 CTX_SUB_LEXERS
G.24 CTX_THESAURI
G.25 CTX_THES_PHRASES
G.26 CTX_TRACE_VALUES
G.27 CTX_USER_INDEXES
G.28 CTX_USER_INDEX_ERRORS
G.29 CTX_USER_INDEX_OBJECTS
G.30 CTX_USER_INDEX_PARTITIONS
G.31 CTX_USER_INDEX_SETS
G.32 CTX_USER_INDEX_SET_INDEXES
G.33 CTX_USER_INDEX_SUB_LEXERS
G.34 CTX_USER_INDEX_SUB_LEXER_VALS
G.35 CTX_USER_INDEX_VALUES
G.36 CTX_USER_PENDING
G.37 CTX_USER_PREFERENCES
G.38 CTX_USER_PREFERENCE_VALUES
G.39 CTX_USER_SECTIONS
G.40 CTX_USER_SECTION_GROUPS
G.41 CTX_USER_SQES
G.42 CTX_USER_STOPLISTS
G.43 CTX_USER_STOPWORDS
G.44 CTX_USER_SUB_LEXERS
G.45 CTX_USER_THESAURI
G.46 CTX_USER_THES_PHRASES
G.47 CTX_VERSION
H Stopword Transformations in Oracle Text
H.1 Understanding Stopword Transformations
H.1.1 Word Transformations
H.1.2 AND Transformations
H.1.3 OR Transformations
H.1.4 ACCUMulate Transformations
H.1.5 MINUS Transformations
H.1.6 NOT Transformations
H.1.7 EQUIValence Transformations
H.1.8 NEAR Transformations
H.1.9 Weight Transformations
H.1.10 Threshold Transformations
H.1.11 WITHIN Transformations
Index