Oracle® Database Utilities 10g Release 1 (10.1) Part Number B10825-01 |
|
|
View PDF |
This chapter describes the access parameters for the default external tables access driver, ORACLE_LOADER
. You specify these access parameters when you create the external table.
To use the information in this chapter, you must have some knowledge of the file format and record format (including character sets and field datatypes) of the datafiles on your platform. You must also know enough about SQL to be able to create an external table and perform queries against it.
The following topics are discussed in this chapter:
You may find it helpful to use the EXTERNAL_TABLE
=GENERATE_ONLY
parameter in SQL*Loader to get the proper access parameters for a given SQL*Loader control file. When you specify GENERATE_ONLY
, all the SQL statements needed to do the load using external tables, as described in the control file, are placed in the SQL*Loader log file. These SQL statements can be edited and customized. The actual load can be done later without the use of SQL*Loader by executing these statements in SQL*Plus.
Notes:
|
The access parameters clause contains comments, record formatting, and field formatting information.
The description of the data in the data source is separate from the definition of the external table. This means that:
The source file can contain more or fewer fields than there are columns in the external table
The datatypes for fields in the data source can be different from the columns in the external table
As stated earlier, the access driver ensures that data from the data source is processed so that it matches the definition of the external table.
The syntax for the access_parameters
clause is as follows:
Comments are lines that begin with two hyphens followed by text. Comments must be placed before any access parameters, for example:
--This is a comment. --This is another comment. RECORDS DELIMITED BY NEWLINE
All text to the right of the double hyphen is ignored, until the end of the line.
The record_format_info
clause is an optional clause that contains information about the record, such as its format, the character set of the data, and what rules are used to exclude records from being loaded. For a full description of the syntax, see record_format_info Clause.
The field_definitions
clause is used to describe the fields in the datafile. If a datafile field has the same name as a column in the external table, then the data from the field is used for that column. For a full description of the syntax, see field_definitions Clause.
The column_transforms
clause is an optional clause used to describe how to load columns in the external table that do not map directly to columns in the datafile. This is done using the following transforms: NULL
, CONSTANT
, CONCAT
, and LOBFILE
. For a full description of the syntax, see column_transforms Clause.
The record_format_info
clause contains information about the record, such as its format, the character set of the data, and what rules are used to exclude records from being loaded. The record_format_info
clause is optional. If the clause is not specified, the default value is RECORDS
DELIMITED
BY
NEWLINE
. The syntax for the record_format_info
clause is as follows:
The FIXED
clause is used to identify the records as all having a fixed size of length bytes. The size specified for FIXED
records must include any record termination characters, such as newlines. Compared to other record types, fixed-length fields in fixed-length records are the easiest field and record formats for the access driver to process.
The following is an example of using FIXED
records. It assumes there is a 1-byte newline character at the end of each record in the datafile. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (RECORDS FIXED 20 FIELDS (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4))) LOCATION ('info.dat')); Alvin Tolliver1976 KennethBaer 1963 Mary Dube 1973
The VARIABLE
clause is used to indicate that the records have a variable length and that each record is preceded by a character string containing a number with the count of bytes for the record. The length of the character string containing the count field is the size argument that follows the VARIABLE
parameter. Note that size indicates a count of bytes, not characters. The count at the beginning of the record must include any record termination characters, but it does not include the size of the count field itself. The number of bytes in the record termination characters can vary depending on how the file is created and on what platform it is created.
The following is an example of using VARIABLE
records. It assumes there is a 1-byte newline character at the end of each record in the datafile. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (RECORDS VARIABLE 2 FIELDS TERMINATED BY ',' (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4))) LOCATION ('info.dat')); 21Alvin,Tolliver,1976, 19Kenneth,Baer,1963, 16Mary,Dube,1973,
The DELIMITED BY
clause is used to indicate the characters that identify the end of a record.
If DELIMITED BY NEWLINE
is specified, then the actual value used is platform-specific. On UNIX platforms, NEWLINE
is assumed to be "\n
". On Windows NT, NEWLINE
is assumed to be "\r\n
".
If DELIMITED
BY
string
is specified, string
can either be text or a series of hexadecimal digits. If it is text, then the text is converted to the character set of the datafile and the result is used for identifying record boundaries. See string.
If the following conditions are true, then you must use hexadecimal digits to identify the delimiter:
The character set of the access parameters is different from the character set of the datafile.
Some characters in the delimiter string cannot be translated into the character set of the datafile.
The hexadecimal digits are converted into bytes, and there is no character set translation performed on the hexadecimal string.
If the end of the file is found before the record terminator, the access driver proceeds as if a terminator was found, and all unprocessed data up to the end of the file is considered part of the record.
Caution: Do not include any binary data, including binary counts forVARCHAR and VARRAW , in a record that has delimiters. Doing so could cause errors or corruption, because the binary data will be interpreted as characters during the search for the delimiter. |
The following is an example of using DELIMITED BY
records.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (RECORDS DELIMITED BY '|' FIELDS TERMINATED BY ',' (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4))) LOCATION ('info.dat')); Alvin,Tolliver,1976|Kenneth,Baer,1963|Mary,Dube,1973
The CHARACTERSET
string
clause identifies the character set of the datafile. If a character set is not specified, the data is assumed to be in the default character set for the database. See string.
Note: The settings of NLS environment variables on the client have no effect on the character set used for the database. |
See Also: Oracle Database Globalization Support Guide for a listing of Oracle-supported character sets |
The DATA IS...ENDIAN
clause indicates the endianness of data whose byte order may vary depending on the platform that generated the datafile. Fields of the following types are affected by this clause:
INTEGER
UNSIGNED INTEGER
FLOAT
BINARY_FLOAT
DOUBLE
BINARY_DOUBLE
VARCHAR
(numeric count only)
VARRAW
(numeric count only)
Any character datatype in the UTF16 character set
Any string specified by RECORDS
DELIMITED
BY
string
and in the UTF16 character set
A common platform that generates little-endian data is Windows NT. Big-endian platforms include Sun Solaris and IBM MVS. If the DATA IS...ENDIAN
clause is not specified, then the data is assumed to have the same endianness as the platform where the access driver is running. UTF-16 datafiles may have a mark at the beginning of the file indicating the endianness of the data. This mark will override the DATA IS...ENDIAN
clause.
The BYTE
ORDER
MARK
clause is used to specify whether or not the datafile should be checked for the presence of a byte-order mark (BOM). This clause is meaningful only when the character set is Unicode.
BYTE
ORDER
MARK
NOCHECK
indicates that the datafile should not be checked for a BOM and that all the data in the datafile should be read as data.
BYTE
ORDER
MARK
CHECK
indicates that the datafile should be checked for a BOM. This is the default behavior for a datafile in a Unicode character set.
The following are examples of some possible scenarios:
If the data is specified as being little or big endian and CHECK
is specified and it is determined that the specified endianness does not match the datafile, then an error is returned. For example, suppose you specify the following:
DATA IS LITTLE ENDIAN BYTEORDERMARK CHECK
If the BOM is checked in the Unicode datafile and the data is actually big endian, an error is returned because you specified little endian.
If a BOM is not found and no endianness is specified with the DATA
IS...ENDIAN
parameter, then the endianness of the platform is used.
If BYTE
ORDER
MARK
NOCHECK
is specified and the DATA
IS...ENDIAN
parameter specified an endianness, then that value is used. Otherwise, the endianness of the platform is used.
The STRING
SIZES
ARE
IN
clause is used to indicate whether the lengths specified for character strings are in bytes or characters. If this clause is not specified, the access driver uses the mode that the database uses. Character types with embedded lengths (such as VARCHAR
) are also affected by this clause. If this clause is specified, the embedded lengths are a character count, not a byte count. Specifying STRING
SIZES
ARE
IN CHARACTERS
is needed only when loading multibyte character sets, such as UTF16.
The LOAD
WHEN
condition_spec
clause is used to identify the records that should be passed to the database. The evaluation method varies:
If the condition_spec
references a field in the record, the clause is evaluated only after all fields have been parsed from the record, but before any NULLIF
or DEFAULTIF
clauses have been evaluated.
If the condition specification references only ranges (and no field names), then the clause is evaluated before the fields are parsed. This is useful for cases where the records in the file that are not to be loaded cannot be parsed into the current record definition without errors.
See condition_spec.
The following are some examples of using LOAD
WHEN
:
LOAD WHEN (empid != BLANKS) LOAD WHEN ((dept_id = "SPORTING GOODS" OR dept_id = "SHOES") AND total_sales != 0)
The BADFILE
clause names the file to which records are written when they cannot be loaded because of errors. For example, a record was written to the bad file because a field in the datafile could not be converted to the datatype of a column in the external table. Records that fail the LOAD
WHEN
clause are not written to the bad file but are written to the discard file instead. Also, any errors in using a record from an external table (such as a constraint violation when using INSERT INTO...AS SELECT...
from an external table) will not cause the record to be written to the bad file.
The purpose of the bad file is to have one file where all rejected data can be examined and fixed so that it can be loaded. If you do not intend to fix the data, then you can use the NOBADFILE
option to prevent creation of a bad file, even if there are bad records.
If you specify BADFILE
, you must specify a filename or you will receive an error.
If neither BADFILE
nor NOBADFILE
is specified, the default is to create a bad file if at least one record is rejected. The name of the file will be the table name followed by _%p
, and it will have an extension of .bad
.
The DISCARDFILE
clause names the file to which records are written that fail the condition in the LOAD
WHEN
clause. The discard file is created when the first record to be discarded is encountered. If the same external table is accessed multiple times, then the discard file is rewritten each time. If there is no need to save the discarded records in a separate file, then use NODISCARDFILE
.
If you specify DISCARDFILE
, you must specify a filename or you will receive an error.
If neither DISCARDFILE
nor NODISCARDFILE
is specified, the default is to create a discard file if at least one record fails the LOAD
WHEN
clause. The name of the file will be the table name followed by _%p
and it will have an extension of .dsc
.
The LOGFILE
clause names the file that contains messages generated by the external tables utility while it was accessing data in the datafile. If a log file already exists by the same name, the access driver reopens that log file and appends new log information to the end. This is different from bad files and discard files, which overwrite any existing file. NOLOGFILE
is used to prevent creation of a log file.
If you specify LOGFILE
, you must specify a filename or you will receive an error.
If neither LOGFILE
nor NOLOGFILE
is specified, the default is to create a log file. The name of the file will be the table name followed by _%p
and it will have an extension of .log
.
Skips the specified number of records in the datafile before loading. SKIP
can be specified only when nonparallel access is being made to the data.
The READSIZE
parameter specifies the size of the read buffer. The size of the read buffer is a limit on the size of the largest record the access driver can handle. The size is specified with an integer indicating the number of bytes. The default value is 512 KB (524288 bytes). You must specify a larger value if any of the records in the datafile are larger than 512 KB. There is no limit on how large READSIZE
can be, but practically, it is limited by the largest amount of memory that can be allocated by the access driver. Also, note that multiple buffers are allocated, so the amount of memory available for allocation is also another limit.
By default, the date cache feature is enabled (for 1000 elements). To completely disable the date cache feature, set it to 0
.
DATE_CACHE
specifies the date cache size (in entries). For example, DATE_CACHE=5000
specifies that each date cache created can contain a maximum of 5000 unique date entries. Every table has its own date cache, if one is needed. A date cache is created only if at least one date or timestamp value is loaded that requires datatype conversion in order to be stored in the table.
The date cache feature is enabled by default. The default date cache size is 1000 elements. If the default size is used and the number of unique input values loaded exceeds 1000, then the date cache feature is automatically disabled for that table. However, if you override the default and specify a nonzero date cache size and that size is exceeded, then the cache is not disabled.
You can use the date cache statistics (entries, hits, and misses) contained in the log file to tune the size of the cache for future similar loads.
A string is a quoted series of characters or hexadecimal digits. If it is a series of characters, then those characters will be converted into the character set of the data file. If it is a series of hexadecimal digits, then there must be an even number of hexadecimal digits. The hexadecimal digits are converted into their binary translation, and the translation is treated as a character string in the character set of the data file. This means that once the hexadecimal digits have been converted into their binary translation, there is no other character set translation that occurs. The syntax for a string
is as follows:
The condition_spec
is an expression that evaluates to either true or false. It specifies one or more conditions that are joined by Boolean operators. The conditions and Boolean operators are evaluated from left to right. (Boolean operators are applied after the conditions are evaluated.) Parentheses can be used to override the default order of evaluation of Boolean operators. The evaluation of condition_spec
clauses slows record processing, so these clauses should be used sparingly. The syntax for condition_spec
is as follows:
Note that if the condition specification contains any conditions that reference field names, then the condition specifications are evaluated only after all fields have been found in the record and after blank trimming has been done. It is not useful to compare a field to BLANKS
if blanks have been trimmed from the field.
The following are some examples of using condition_spec
:
empid = BLANKS OR last_name = BLANKS (dept_id = SPORTING GOODS OR dept_id = SHOES) AND total_sales != 0
This clause is used to specify the name of an output file (BADFILE
, DISCARDFILE,
or LOGFILE
). The directory object name is the name of a directory object where the user accessing the external table has privileges to write. If the directory object name is omitted, then the value specified for the DEFAULT
DIRECTORY
clause in the CREATE
TABLE...
ORGANIZATION
EXTERNAL
statement is used.
The filename
parameter is the name of the file to create in the directory object. The access driver does some symbol substitution to help make filenames unique in parallel loads. The symbol substitutions supported for UNIX and Windows NT are as follows (other platforms may have different symbols):
%p
is replaced by the process ID of the current process. For example, if the process ID of the access driver is 12345
, then exttab_%p.log
becomes exttab_12345.log.
%a
is replaced by the agent number of the current process. The agent number is the unique number assigned to each parallel process accessing the external table. This number is padded to the left with zeros to fill three characters. For example, if the third parallel agent is creating a file and bad_data_%a.bad
was specified as the filename, then the agent would create a file named bad_data_003.bad.
%%
is replaced by %
. If there is a need to have a percent sign in the filename, then this symbol substitution is used.
If the %
character is encountered followed by anything other than one of the preceding characters, then an error is returned.
If %p
or %a
is not used to create unique filenames for output files and an external table is being accessed in parallel, then output files may be corrupted or agents may be unable to write to the files.
If you specify BADFILE
(or DISCARDFILE
or LOGFILE
), you must specify a filename for it or you will receive an error. However, if you do not specify BADFILE
(or DISCARDFILE
or LOGFILE
), then the access driver uses the name of the table followed by _%p
as the name of the file. If no extension is supplied for the file, a default extension will be used. For bad files, the default extension is .bad;
for discard files, the default is .dsc;
and for log files, the default is .log
.
A condition
compares a range of bytes or a field from the record against a constant string. The source of the comparison can be either a field in the record or a byte range in the record. The comparison is done on a byte-by-byte basis. If a string is specified as the target of the comparison, it will be translated into the character set of the datafile. If the field has a noncharacter datatype, no datatype conversion is performed on either the field value or the string. The syntax for a condition
is as follows:
This clause describes a range of bytes or characters in the record to use for a condition. The value used for the STRING SIZES ARE
clause determines whether range
refers to bytes or characters. The range
start
and range
end
are byte or character offsets into the record. The range
start
must be less than or equal to the range
end
. Finding ranges of characters is faster for data in fixed-width character sets than it is for data in varying-width character sets. If the range refers to parts of the record that do not exist, then the record is rejected when an attempt is made to reference the range.
Note: The datafile should not mix binary data (including datatypes with binary counts, such asVARCHAR ) and character data that is in a varying-width character set or more than one byte wide. In these cases, the access driver may not find the correct start for the field, because it treats the binary data as character data when trying to find the start. |
If a field is NULL
, then any comparison of that field to any value other than NULL
will return FALSE
.
The following are some examples of using condition
:
empid != BLANKS 10:13 = 0x00000830 PRODUCT_COUNT = "MISSING"
The field_definitions
clause names the fields in the datafile and specifies how to find them in records.
If the field_definitions
clause is omitted, then:
The fields are assumed to be delimited by ','
The fields are assumed to be character type
The maximum length of the field is assumed to be 255
The order of the fields in the datafile is the order in which the fields were defined in the external table
No blanks are trimmed from the field
The following is an example of an external table created without any access parameters. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir LOCATION ('info.dat')); Alvin,Tolliver,1976 Kenneth,Baer,1963
The syntax for the field_definitions
clause is as follows:
Description of the illustration et_fields_clause.gif
The delim_spec
clause is used to identify how all fields are terminated in the record. The delim_spec
specified for all fields can be overridden for a particular field as part of the field_list
clause. For a full description of the syntax, see delim_spec.
The trim_spec
clause specifies the type of whitespace trimming to be performed by default on all character fields. The trim_spec
clause specified for all fields can be overridden for individual fields by specifying a trim_spec
clause for those fields. For a full description of the syntax, see trim_spec.
MISSING FIELD VALUES ARE NULL
indicates that if there is not enough data in a record for all fields, then those fields with missing data values are set to NULL
. For a full description of the syntax, see MISSING FIELD VALUES ARE NULL.
REJECT ROWS WITH ALL NULL FIELDS
indicates that a row will not be loaded into the external table if all referenced fields in the row are null. If this parameter is not specified, the default value is to accept rows with all null fields. The setting of this parameter is written to the log file either as "reject rows with all null fields" or as "rows with all null fields are accepted."
The field_list
clause identifies the fields in the datafile and their datatypes. For a full description of the syntax, see field_list.
The delim_spec
clause is used to find the end (and if ENCLOSED
BY
is specified, the start) of a field. Its syntax is as follows:
If ENCLOSED
BY
is specified, the access driver starts at the current position in the record and skips over all whitespace looking for the first delimiter. All whitespace between the current position and the first delimiter is ignored. Next, the access driver looks for the second enclosure delimiter (or looks for the first one again if a second one is not specified). Everything between those two delimiters is considered part of the field.
If TERMINATED
BY
string
is specified with the ENCLOSED
BY
clause, then the terminator string must immediately follow the second enclosure delimiter. Any whitespace between the second enclosure delimiter and the terminating delimiter is skipped. If anything other than whitespace is found between the two delimiters, then the row is rejected for being incorrectly formatted.
If TERMINATED
BY
is specified without the ENCLOSED
BY
clause, then everything between the current position in the record and the next occurrence of the termination string is considered part of the field.
If OPTIONALLY
is specified, then TERMINATED
BY
must also be specified. The OPTIONALLY
parameter means the ENCLOSED
BY
delimiters can either both be present or both be absent. The terminating delimiter must be present regardless of whether the ENCLOSED
BY
delimiters are present. If OPTIONALLY
is specified, then the access driver skips over all whitespace, looking for the first nonblank character. Once the first nonblank character is found, the access driver checks to see if the current position contains the first enclosure delimiter. If it does, then the access driver finds the second enclosure string and everything between the first and second enclosure delimiters is considered part of the field. The terminating delimiter must immediately follow the second enclosure delimiter (with optional whitespace allowed between the second enclosure delimiter and the terminating delimiter). If the first enclosure string is not found at the first nonblank character, then the access driver looks for the terminating delimiter. In this case, leading blanks are trimmed.
See Also: Table 9-5 for a description of the access driver's default trimming behavior. You can override this behavior withLTRIM and RTRIM . |
After the delimiters have been found, the current position in the record is set to the spot after the last delimiter for the field. If TERMINATED
BY
WHITESPACE
was specified, then the current position in the record is set to after all whitespace following the field.
A missing terminator for the last field in the record is not an error. The access driver proceeds as if the terminator was found. It is an error if the second enclosure delimiter is missing.
The string used for the second enclosure can be included in the data field by including the second enclosure twice. For example, if a field is enclosed by single quotation marks, a data field could contain a single quotation mark by doing something like the following:
'I don''t like green eggs and ham'
There is no way to quote a terminator string in the field data without using enclosing delimiters. Because the field parser does not look for the terminating delimiter until after it has found the enclosing delimiters, the field can contain the terminating delimiter.
In general, specifying single characters for the strings is faster than multiple characters. Also, searching data in fixed-width character sets is usually faster than searching data in varying-width character sets.
Note: The use of the backslash character (\) within strings is not supported in external tables. |
The following is an example of an external table that uses terminating delimiters. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY WHITESPACE) LOCATION ('info.dat')); Alvin Tolliver 1976 Kenneth Baer 1963 Mary Dube 1973
The following is an example of an external table that uses both enclosure and terminator delimiters. Remember that all whitespace between a terminating string and the first enclosure string is ignored, as is all whitespace between a second enclosing delimiter and the terminator. The example is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY "," ENCLOSED BY "(" AND ")") LOCATION ('info.dat')); (Alvin) , (Tolliver),(1976) (Kenneth), (Baer) ,(1963) (Mary),(Dube) , (1973)
The following is an example of an external table that uses optional enclosure delimiters. Note that LRTRIM
is used to trim leading and trailing blanks from fields. The example is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '(' and ')' LRTRIM) LOCATION ('info.dat')); Alvin , Tolliver , 1976 (Kenneth), (Baer), (1963) ( Mary ), Dube , (1973)
The trim_spec
clause is used to specify that spaces should be trimmed from the beginning of a text field, the end of a text field, or both. Spaces include blanks and other nonprinting characters such as tabs, line feeds, and carriage returns. The syntax for the trim_spec
clause is as follows:
NOTRIM
indicates that no characters will be trimmed from the field.
LRTRIM
, LTRIM
, and RTRIM
are used to indicate that characters should be trimmed from the field. LRTRIM
means that both leading and trailing spaces are trimmed. LTRIM
means that leading spaces will be trimmed. RTRIM
means trailing spaces are trimmed.
LDRTRIM
is used to provide compatibility with SQL*Loader trim features. It is the same as NOTRIM
except in the following cases:
If the field is not a delimited field, then spaces will be trimmed from the right.
If the field is a delimited field with OPTIONALLY
ENCLOSED
BY
specified, and the optional enclosures are missing for a particular instance, then spaces will be trimmed from the left.
The default is LDRTRIM
. Specifying NOTRIM
yields the fastest performance.
The trim_spec
clause can be specified before the field list to set the default trimming for all fields. If trim_spec
is omitted before the field list, then LDRTRIM
is the default trim setting. The default trimming can be overridden for an individual field as part of the datatype_spec
.
If trimming is specified for a field that is all spaces, then the field will be set to NULL
.
In the following example, all data is fixed-length; however, the character data will not be loaded with leading spaces. The example is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS LTRIM) LOCATION ('info.dat')); Alvin, Tolliver,1976 Kenneth, Baer, 1963 Mary, Dube, 1973
MISSING FIELD VALUES ARE NULL
indicates that if there is not enough data in a record for all fields, then those fields with missing data values are set to NULL
. If MISSING FIELD VALUES ARE NULL
is not specified, and there is not enough data in the record for all fields, then the row is rejected.
In the following example, the second record is stored with a NULL
set for the year_of_birth
column, even though the data for the year of birth is missing from the datafile. If the MISSING FIELD VALUES ARE NULL
clause was omitted from the access parameters, then the second row would be rejected because it did not have a value for the year_of_birth
column. The example is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth INT) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY "," MISSING FIELD VALUES ARE NULL) LOCATION ('info.dat')); Alvin,Tolliver,1976 Baer,Kenneth Mary,Dube,1973
The field_list
clause identifies the fields in the datafile and their datatypes. Evaluation criteria for the field_list
clause are as follows:
If no datatype is specified for a field, it is assumed to be CHAR(1)
for a nondelimited field, and CHAR(255)
for a delimited field.
If no field list is specified, then the fields in the datafile are assumed to be in the same order as the fields in the external table. The datatype for all fields is CHAR(255)
unless the column in the database is CHAR
or VARCHAR
. If the column in the database is CHAR
or VARCHAR
, then the datatype for the field is still CHAR
but the length is either 255 or the length of the column, whichever is greater.
If no field list is specified and no delim_spec
clause is specified, then the fields in the datafile are assumed to be in the same order as fields in the external table. All fields are assumed to be CHAR(255)
and terminated by a comma.
This example shows the definition for an external table with no field_list
and a delim_spec
. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth INT) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY "|") LOCATION ('info.dat')); Alvin|Tolliver|1976 Kenneth|Baer|1963 Mary|Dube|1973
The syntax for the field_list
clause is as follows:
The field_name
is a string identifying the name of a field in the datafile. If the string is not within quotation marks, the name is uppercased when matching field names with column names in the external table.
If field_name
matches the name of a column in the external table that is referenced in the query, then the field value is used for the value of that external table column. If the name does not match any referenced name in the external table, then the field is not loaded but can be used for clause evaluation (for example WHEN
or NULLIF
).
The pos_spec
clause indicates the position of the column within the record. For a full description of the syntax, see pos_spec Clause.
The datatype_spec
clause indicates the datatype of the field. If datatype_spec
is omitted, the access driver assumes the datatype is CHAR(255)
. For a full description of the syntax, see datatype_spec Clause.
The init_spec
clause indicates when a field is NULL
or has a default value. For a full description of the syntax, see init_spec Clause.
The pos_spec
clause indicates the position of the column within the record. The setting of the STRING
SIZES
ARE
IN
clause determines whether pos_spec
refers to byte positions or character positions. Using character positions with varying-width character sets takes significantly longer than using character positions with fixed-width character sets. Binary and multibyte character data should not be present in the same datafile when pos_spec
is used for character positions. If they are, then the results are unpredictable. The syntax for the pos_spec
clause is as follows:
The start
parameter is the number of bytes or characters from the beginning of the record to where the field begins. It positions the start of the field at an absolute spot in the record rather than relative to the position of the previous field.
The *
parameter indicates that the field begins at the first byte or character after the end of the previous field. This is useful if you have a varying-length field followed by a fixed-length field. This option cannot be used for the first field in the record.
The increment
parameter positions the start of the field at a fixed number of bytes or characters from the end of the previous field. Use *-
increment
to indicate that the start of the field starts before the current position in the record (this is a costly operation for multibyte character sets). Use *+
increment
to move the start after the current position.
The end
parameter indicates the absolute byte or character offset into the record for the last byte of the field. If start
is specified along with end
, then end
cannot be less than start
. If *
or increment
is specified along with end
, and the start
evaluates to an offset larger than the end
for a particular record, then that record will be rejected.
The length
parameter indicates that the end of the field is a fixed number of bytes or characters from the start. It is useful for fixed-length fields when the start is specified with *
.
The following example shows various ways of using pos_spec
. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), year_of_birth INT, phone CHAR(12), area_code CHAR(3), exchange CHAR(3), extension CHAR(4)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS RTRIM (first_name (1:15) CHAR(15), last_name (*:+20), year_of_birth (36:39), phone (40:52), area_code (*-12: +3), exchange (*+1: +3), extension (*+1: +4))) LOCATION ('info.dat')); Alvin Tolliver 1976415-922-1982 Kenneth Baer 1963212-341-7912 Mary Dube 1973309-672-2341
The datatype_spec
clause is used to describe the datatype of a field in the datafile if the datatype is different than the default. The datatype of the field can be different than the datatype of a corresponding column in the external table. The access driver handles the necessary conversions. The syntax for the datatype_spec
clause is as follows:
If the number of bytes or characters in any field is 0, then the field is assumed to be NULL
. The optional DEFAULTIF
clause specifies when the field is set to its default value. Also, the optional NULLIF
clause specifies other conditions for when the column associated with the field is set to NULL
. If the DEFAULTIF
or NULLIF
clause is true
, then the actions of those clauses override whatever values are read from the datafile.
See Also:
|
This clause defines a field as an integer. If EXTERNAL
is specified, the number is a character string. If EXTERNAL
is not specified, the number is a binary field. The valid values for len
in binary integer fields are 1, 2, 4, and 8. If len
is omitted for binary integers, the default value is whatever the value of sizeof(int)
is on the platform where the access driver is running. Use of the DATA
IS
{BIG
|
LITTLE}
ENDIAN
clause may cause the data to be byte-swapped before it is stored.
If EXTERNAL
is specified, then the value of len
is the number of bytes or characters in the number (depending on the setting of the STRING
SIZES
ARE
IN
BYTES
or CHARACTERS
clause). If no length is specified, the default value is 255.
The DECIMAL
clause is used to indicate that the field is a packed decimal number. The ZONED
clause is used to indicate that the field is a zoned decimal number. The precision
field indicates the number of digits in the number. The scale
field is used to specify the location of the decimal point in the number. It is the number of digits to the right of the decimal point. If scale
is omitted, a value of 0 is assumed.
Note that there are different encoding formats of zoned decimal numbers depending on whether the character set being used is EBCDIC-based or ASCII-based. If the language of the source data is EBCDIC, then the zoned decimal numbers in that file must match the EBCDIC encoding. If the language is ASCII-based, then the numbers must match the ASCII encoding.
If the EXTERNAL
parameter is specified, then the data field is a character string whose length matches the precision of the field.
ORACLE_DATE
is a field containing a date in the Oracle binary date format. This is the format used by the DTYDAT
datatype in Oracle Call Interface (OCI) programs. The field is a fixed length of 7.
ORACLE_NUMBER
is a field containing a number in the Oracle number format. The field is a fixed length (the maximum size of an Oracle number field) unless COUNTED
is specified, in which case the first byte of the field contains the number of bytes in the rest of the field.
ORACLE_NUMBER
is a fixed-length 22-byte field. The length of an ORACLE_NUMBER
COUNTED
field is one for the count byte, plus the number of bytes specified in the count byte.
The following four datatypes, DOUBLE
, FLOAT
, BINARY_DOUBLE
, and BINARY_FLOAT
are floating-point numbers.
DOUBLE
and FLOAT
are the floating-point formats used natively on the platform in use. They are the same datatypes used by default for the DOUBLE
and FLOAT
datatypes in a C program on that platform. BINARY_FLOAT
and BINARY_DOUBLE
are floating-point numbers that conform substantially with the Institute for Electrical and Electronics Engineers (IEEE) Standard for Binary Floating-Point Arithmetic, IEEE Standard 754-1985. Because most platforms use the IEEE standard as their native floating-point format, FLOAT
and BINARY_FLOAT
are the same on those platforms and DOUBLE
and BINARY_DOUBLE
are also the same.
The DOUBLE
clause indicates that the field is the same format as the C language DOUBLE
datatype on the platform where the access driver is executing. Use of the DATA
IS
{BIG
|
LITTLE}
ENDIAN
clause may cause the data to be byte-swapped before it is stored. This datatype may not be portable between certain platforms.
The FLOAT
clause indicates that the field is the same format as the C language FLOAT
datatype on the platform where the access driver is executing. Use of the DATA
IS
{BIG
|
LITTLE}
ENDIAN
clause may cause the data to be byte-swapped before it is stored. This datatype may not be portable between certain platforms.
If the EXTERNAL
parameter is specified, then the field is a character string whose maximum length is 255. See
BINARY_DOUBLE
is a 64-bit, double-precision, floating-point number datatype. Each BINARY_DOUBLE
value requires 9 bytes, including a length byte. See the information in the note provided for the FLOAT
datatype for more details about floating-point numbers.
BINARY_FLOAT
is a 32-bit, single-precision, floating-point number datatype. Each BINARY_FLOAT
value requires 5 bytes, including a length byte. See the information in the note provided for the FLOAT
datatype for more details about floating-point numbers.
The RAW
clause is used to indicate that the source data is binary data. The len
for RAW
fields is always in number of bytes. When a RAW
field is loaded in a character column, the data that is written into the column is the hexadecimal representation of the bytes in the RAW
field.
The CHAR
clause is used to indicate that a field is a character datatype. The length (len)
for CHAR
fields specifies the largest number of bytes or characters in the field. The len
is in bytes or characters, depending on the setting of the STRING
SIZES
ARE
IN
clause.
If no length is specified for a field of datatype CHAR
, then the size of the field is assumed to be 1, unless the field is delimited:
For a delimited CHAR
field, if a length is specified, that length is used as a maximum.
For a delimited CHAR
field for which no length is specified, the default is 255 bytes.
For a delimited CHAR
field that is greater than 255 bytes, you must specify a maximum length. Otherwise you will receive an error stating that the field in the datafile exceeds maximum length.
The date_format_spec
clause is used to indicate that the field contains a date or time in the specified format.
The following example shows the use of the CHAR
clause. It is followed by a sample of the datafile that can be used to load it.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), hire_date CHAR(10), resume_file CHAR(500)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY "," (first_name, last_name, hire_date CHAR(10) DATE_FORMAT DATE MASK "mm/dd/yyyy", resume_file)) LOCATION ('info.dat')); Alvin,Tolliver,12/2/1995,tolliver_resume.ps Kenneth,Baer,6/6/1997,KB_resume.ps Mary,Dube,1/18/2000,dube_resume.ps
The date_format_spec
clause is used to indicate that a character string field contains date data, time data, or both, in a specific format. This information is used only when a character field is converted to a date or time datatype and only when a character string field is mapped into a date column. The syntax for the date_format_spec
clause is as follows:
The MASK
clause is used to override the default globalization format mask for the datatype. If a date mask is not specified, then the settings of NLS parameters for the session (not the client settings) for the appropriate globalization parameter for the datatype are used.
NLS_DATE_FORMAT
for DATE
datatypes
NLS_TIME_FORMAT
for TIME
datatypes
NLS_TIMESTAMP_FORMAT
for TIMESTAMP
datatypes
NLS_TIME_WITH_TIMEZONE_FORMAT
for TIME
WITH
TIME
ZONE
datatypes
NLS_TIMESTAMP_WITH_TIMEZONE_FORMAT
for TIMESTAMP
WITH
TIME
ZONE
datatypes
The VARCHAR
datatype has a binary count field followed by character data. The value in the binary count field is either the number of bytes in the field or the number of characters. See STRING SIZES ARE IN for information about how to specify whether the count is interpreted as a count of characters or count of bytes.
The VARRAW
datatype has a binary count field followed by binary data. The value in the binary count field is the number of bytes of binary data. The data in the VARRAW
field is not affected by the DATA
IS
…ENDIAN
clause.
The optional length_of_length
field in the specification is the number of bytes in the count field. Valid values for length_of_length
for VARCHAR
are 1, 2, 4, and 8. If length_of_length
is not specified, a value of 2 is used. The count field has the same endianness as specified by the DATA
IS
…ENDIAN
clause.
The max_len
field is used to indicate the largest size of any instance of the field in the datafile. For VARRAW
fields, max_len
is number of bytes. For VARCHAR
fields, max_len
is either number of characters or number of bytes depending on the STRING
SIZES
ARE
IN
clause.
The following example shows various uses of VARCHAR
and VARRAW.
The binary values for the count bytes and value for raw data are shown in the datafile in italics, with 2 characters per binary byte.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), resume CHAR(2000), picture RAW(2000)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS (first_name VARCHAR(2,12), last_name VARCHAR(2,20), resume VARCHAR(4,10000), picture VARRAW(4,100000))) LOCATION ('info.dat'));
0005Alvin0008Tolliver0000001DAlvin Tolliver's Resume etc. 0000001013f4690a30bc29d7e40023ab4599ffff
The VARCHARC
datatype has a character count field followed by character data. The value in the count field is either the number of bytes in the field or the number of characters. See STRING SIZES ARE IN for information about how to specify whether the count is interpreted as a count of characters or count of bytes. The optional length_of_length
is either the number of bytes or the number of characters in the count field for VARCHARC
, depending on whether lengths are being interpreted as characters or bytes.
The maximum value for length_of_lengths
for VARCHARC
is 10 if string sizes are in characters, and 20 if string sizes are in bytes. The default value for length_of_length
is 5.
The VARRAWC
datatype has a character count field followed by binary data. The value in the count field is the number of bytes of binary data. The length_of_length
is the number of bytes in the count field.
The max_len
field is used to indicate the largest size of any instance of the field in the datafile. For VARRAWC
fields, max_len
is number of bytes. For VARCHARC
fields, max_len
is either number of characters or number of bytes depending on the STRING
SIZES
ARE
IN
clause.
The following example shows various uses of VARCHARC
and VARRAWC
. The length of the picture
field is 0, which means the field is set to NULL
.
CREATE TABLE emp_load (first_name CHAR(15), last_name CHAR(20), resume CHAR(2000), picture RAW (2000)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS (first_name VARCHARC(5,12), last_name VARCHARC(2,20), resume VARCHARC(4,10000), picture VARRAWC(4,100000))) LOCATION ('info.dat')); 00007William05Ricca0035Resume for William Ricca is missing0000
The init_spec
clause is used to specify when a field should be set to NULL
or when it should be set to a default value. The syntax for the init_spec
clause is as follows:
Only one NULLIF
clause and only one DEFAULTIF
clause can be specified for any field. These clauses behave as follows:
If NULLIF
condition_spec
is specified and it evaluates to true
, the field is set to NULL
.
If DEFAULTIF
condition_spec
is specified and it evaluates to true
, the value of the field is set to a default value. The default value depends on the datatype of the field, as follows:
For a character datatype, the default value is an empty string.
For a numeric datatype, the default value is a 0.
For a date datatype, the default value is NULL
.
If a NULLIF
clause and a DEFAULTIF
clause are both specified for a field, then the NULLIF
clause is evaluated first and the DEFAULTIF
clause is evaluated only if the NULLIF
clause evaluates to false
.
The optional column_transforms
clause provides transforms that you can use to describe how to load columns in the external table that do not map directly to columns in the datafile. The syntax for the column_transforms
clause is as follows:
Each transform specified in the transform
clause identifies a column in the external table and then a specifies how to calculate the value of the column. The syntax is as follows:
The NULL
transform is used to set the external table column to NULL
in every row. The CONSTANT
transform is used to set the external table column to the same value in every row. The CONCAT
transform is used to set the external table column to the concatenation of constant strings and/or fields in the current record from the datafile. The LOBFILE
transform is used to load data into a field for a record from another datafile. Each of these transforms is explained further in the following sections.
The column_name
uniquely identifies a column in the external table to be loaded. Note that if the name of a column is mentioned in the transform
clause, then that name cannot be specified in the FIELDS
clause as a field in the datafile.
When the NULL
transform is specified, every value of the field is set to NULL
for every record.
The CONSTANT
transform uses the value of the string specified as the value of the column in the record. If the column in the external table is not a character string type, then the constant string will be converted to the datatype of the column. This conversion will be done for every row.
The character set of the string used for datatype conversions is the character set of the database.
The CONCAT
transform concatenates constant strings and fields in the datafile together to form one string. Only fields that are character datatypes and that are listed in the fields
clause can be used as part of the concatenation. Other column transforms cannot be specified as part of the concatenation.
The LOBFILE
transform is used to identify a file whose contents are to be used as the value for a column in the external table. All LOBFILEs are identified by an optional directory object and a filename in the form <directory object
>:<filename
>. The following rules apply to use of the LOBFILE
transform:
Both the directory object and the filename can be either a constant string or the name of a field in the field clause.
If a constant string is specified, then that string is used to find the LOBFILE for every row in the table.
If a field name is specified, then the value of that field in the datafile is used to find the LOBFILE.
If a field name is specified for either the directory object or the filename and if the value of that field is NULL
, then the column being loaded by the LOBFILE is also set to NULL
.
If the directory object is not specified, then the default directory specified for the external table is used.
If a field name is specified for the directory object, the FROM
clause also needs to be specified.
Note that the entire file is used as the value of the LOB column. If the same file is referenced in multiple rows, then that file is reopened and reread in order to populate each column.
The lobfile_attr_list
lists additional attributes of the LOBFILE. The syntax is as follows:
The FROM
clause lists the names of all directory objects that will be used for LOBFILEs. It is used only when a field name is specified for the directory object of the name of the LOBFILE. The purpose of the FROM
clause is to determine the type of access allowed to the named directory objects during initialization. If directory object in the value of field is not a directory object in this list, then the row will be rejected.
The CLOB
attribute indicates that the data in the LOBFILE is character data (as opposed to RAW
data). Character data may need to be translated into the character set used to store the LOB in the database.
The CHARACTERSET
attribute contains the name of the character set for the data in the LOBFILEs.
The BLOB
attribute indicates that the data in the LOBFILE is raw data.
If neither CLOB
nor BLOB
is specified, then CLOB
is assumed. If no character set is specified for character LOBFILEs, then the character set of the datafile is assumed.
When identifiers (for example, column or table names) are specified in the external table access parameters, certain values are considered to be reserved words by the access parameter parser. If a reserved word is used as an identifier it must be enclosed in double quotation marks. The following are the reserved words for the ORACLE_LOADER
access driver:
ALL
AND
ARE
ASTERISK
AT
ATSIGN
BADFILE
BADFILENAME
BACKSLASH
BENDIAN
BIG
BLANKS
BY
BYTES
BYTESTR
CHAR
CHARACTERS
CHARACTERSET
CHARSET
CHARSTR
CHECK
CLOB
COLLENGTH
COLON
COLUMN
COMMA
CONCAT
CONSTANT
COUNTED
DATA
DATE
DATE_CACHE
DATE_FORMAT
DATEMASK
DAY
DEBUG
DECIMAL
DEFAULTIF
DELIMITBY
DELIMITED
DISCARDFILE
DOT
DOUBLE
DOUBLETYPE
DQSTRING
DQUOTE
DSCFILENAME
ENCLOSED
ENDIAN
ENDPOS
EOF
EQUAL
EXIT
EXTENDED_IO_PARAMETERS
EXTERNAL
EXTERNALKW
EXTPARM
FIELD
FIELDS
FILE
FILEDIR
FILENAME
FIXED
FLOAT
FLOATTYPE
FOR
FROM
HASH
HEXPREFIX
IN
INTEGER
INTERVAL
LANGUAGE
IS
LEFTCB
LEFTTXTDELIM
LEFTP
LENDIAN
LDRTRIM
LITTLE
LOAD
LOBFILE
LOBPC
LOBPCCONST
LOCAL
LOCALTZONE
LOGFILE
LOGFILENAME
LRTRIM
LTRIM
MAKE_REF
MASK
MINUSSIGN
MISSING
MISSINGFLD
MONTH
NEWLINE
NO
NOCHECK
NOT
NOBADFILE
NODISCARDFILE
NOLOGFILE
NOTEQUAL
NOTERMBY
NOTRIM
NULL
NULLIF
OID
OPTENCLOSE
OPTIONALLY
OPTIONS
OR
ORACLE_DATE
ORACLE_NUMBER
PLUSSIGN
POSITION
PROCESSING
QUOTE
RAW
READSIZE
RECNUM
RECORDS
REJECT
RIGHTCB
RIGHTTXTDELIM
RIGHTP
ROW
ROWS
RTRIM
SCALE
SECOND
SEMI
SETID
SIGN
SIZES
SKIP
STRING
TERMBY
TERMEOF
TERMINATED
TERMWS
TERRITORY
TIME
TIMESTAMP
TIMEZONE
TO
TRANSFORMS
UNDERSCORE
UINTEGER
UNSIGNED
VALUES
VARCHAR
VARCHARC
VARIABLE
VARRAW
VARRAWC
VLENELN
VMAXLEN
WHEN
WHITESPACE
WITH
YEAR
ZONED