Oracle® Ultra Search User's Guide 10g Release 1 (10.1) Part Number B10731-02
This section describes Ultra Search new features, with pointers to additional information. It also explains the Ultra Search release history.
Secure Crawling
Ultra Search provides secure crawling with the following types of authentication:
Digest Authentication
Ultra Search supports HTTP digest authentication. The Ultra Search crawler can authenticate itself to Web servers that employ the HTTP digest authentication scheme, which is based on a simple challenge-response paradigm; the password is sent as a hash rather than in clear text.
HTML Form Authentication
HTML form-based authentication is the most commonly used authentication scheme on the Web. Ultra Search lets you register HTML forms that you want the Ultra Search crawler to fill out automatically during Web crawling. HTML form authentication requires that HTTP cookie functionality be enabled, which is the default.
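As a minimal illustration of the challenge-response idea (this is not Ultra Search's implementation, and all names here are illustrative), a digest response can be computed from MD5 hashes of the credentials and the server-supplied nonce, in the original RFC 2069 form of the scheme:

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def digest_response(user, realm, password, method, uri, nonce):
    # HA1 hashes the credentials; HA2 hashes the request.
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    # The response proves knowledge of the password without ever sending it.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")
```

The client sends only this hex digest back to the server, which recomputes the same value from its stored credentials and compares.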
See Also: "Creating Web Sources" |
Indexing Control of Dynamically Generated Web Pages
The crawler can be configured to not index Web pages that are dynamically generated (for example, if a URL contains a question mark).
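The rule described here (treat a URL with a query string as dynamically generated) can be sketched as a simple predicate; this is an illustration of the idea, not the crawler's actual code:

```python
from urllib.parse import urlparse

def is_dynamic(url: str) -> bool:
    # Treat any URL carrying a query string ("?") as dynamically generated.
    return urlparse(url).query != ""

# Skip dynamic pages when the crawler is configured not to index them:
urls = ["https://www.example.com/page.html",
        "https://www.example.com/search?q=oracle"]
indexable = [u for u in urls if not is_dynamic(u)]
```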
See Also: "Creating Web Sources" |
HTTPS
Ultra Search now supports HTTPS (HTTP over SSL). The Ultra Search crawler can now crawl HTTPS URLs (for example, https://www.foo.com).
Secure Searching
Ultra Search now supports secure searches. Secure searches return only documents that the search user is allowed to view.
Each indexed document can be protected by an access control list (ACL). During searches, the ACL is evaluated. If the user performing the search has permission to read the protected document, then the document is returned by the query API. Otherwise, it is not returned.
Ultra Search stores ACLs in the Oracle XML DB repository. Ultra Search also uses Oracle XML DB functionality to evaluate ACLs.
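The filtering step described above can be sketched as follows. This is a conceptual illustration only; Ultra Search actually stores and evaluates ACLs through Oracle XML DB, and the field names here are invented:

```python
def can_read(user, groups, acl):
    """acl: list of principals (users or groups) granted READ."""
    return user in acl or any(g in acl for g in groups)

def secure_results(hits, user, groups):
    # Return only hits whose ACL grants the searcher read access;
    # documents without an ACL are treated as public in this sketch.
    return [h for h in hits
            if h.get("acl") is None or can_read(user, groups, h["acl"])]
```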
See Also: "Secure Search" |
Remote Crawler JDBC Caching Support
It is now possible to use the remote crawler without mounting the remote cache directory to the server machine. Instead, the cache files are sent over the crawler's JDBC connection to the server cache directory.
See Also: "JDBC-Based Remote Crawling" and "Remote Crawler Profiles" |
Manual Launch Scheduling
A schedule can be created with no scheduled launch time, so that it can only be started on demand.
See Also: "Data Synchronization" |
Crawler Log File Versioning
For each data source, the crawler preserves the latest three log files. This prevents a recrawl from overwriting the previous crawling log files.
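The keep-the-latest-three policy amounts to simple log rotation. A purely illustrative sketch (assuming file names sort oldest-first, for example by an embedded timestamp):

```python
def logs_to_delete(log_files, keep=3):
    # Given one data source's crawler log file names, return the older
    # files to delete so that only the newest `keep` remain.
    return sorted(log_files)[:-keep] if len(log_files) > keep else []
```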
See Also: "Crawler Logging" |
New PL/SQL Administration APIs
Ultra Search now includes APIs for various administration tasks, such as crawler, schedule, and instance administration.
See Also: Chapter 10, "Administration PL/SQL APIs"
Integration with Oracle Internet Directory
Oracle Internet Directory is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle Database. Ultra Search integrates with Oracle Internet Directory in the following areas:
Ultra Search administration groups and group membership are stored in Oracle Internet Directory.
Users are authenticated through the single sign-on (SSO) server and Oracle Internet Directory.
Oracle Internet Directory performs authorization on Ultra Search users' administration privileges.
Cookie Support
Cookies preserve context between HTTP requests. For example, the server can send a cookie so that it knows a user has already logged on and does not require the user to log on again. Cookie support is enabled by default.
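The store-and-replay behavior a crawler needs can be sketched with a minimal cookie jar; this is a conceptual illustration, not the crawler's implementation, and it parses only the simplest "name=value" form of Set-Cookie:

```python
class CookieJar:
    def __init__(self):
        self._jar = {}  # host -> {cookie name: value}

    def store(self, host, set_cookie):
        # Parse a minimal "name=value; attributes..." Set-Cookie header.
        name, _, value = set_cookie.split(";")[0].partition("=")
        self._jar.setdefault(host, {})[name.strip()] = value.strip()

    def header(self, host):
        # Build the Cookie header to replay on the next request to this host.
        return "; ".join(f"{n}={v}" for n, v in self._jar.get(host, {}).items())
```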
Crawler Cache Deletion Control
During crawling, documents are stored in the cache directory. Every time the preset size is reached, crawling stops and indexing starts. In previous releases, the cache file was always deleted when indexing was done. You can now specify not to delete the cache file when indexing is done. This option applies to all data sources. The default is to delete the cache file after indexing.
See Also: "Crawler Page" |
URL Boundary Rules Can Include or Exclude Port Numbers
You can set URL boundary rules to refine the crawling space. You can now include or exclude Web sites with a specific port. For example, you can include www.oracle.com but not www.oracle.com:8080. By default, all ports are crawled.
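A port-aware boundary check of this kind can be sketched as follows; the function and rule names are illustrative, not part of the Ultra Search API:

```python
from urllib.parse import urlsplit

def in_boundary(url, include_hosts, exclude_hostports):
    # Exclusion by explicit host:port takes precedence over host inclusion.
    parts = urlsplit(url)
    host = parts.hostname or ""
    hostport = f"{host}:{parts.port}" if parts.port else host
    if hostport in exclude_hostports:
        return False
    return host in include_hosts
```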
See Also: "Creating Web Sources" |
Hostname Prefix Allowed in Web Data Source URL Boundary Specification
In previous releases, you could specify only suffix inclusion rules; for example, crawl only URLs ending with "oracle.com". You can now also specify prefix rules; for example, crawl "oracle.com" but not "stores.oracle.com".
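Combining the two rule types, a host check might look like this sketch (illustrative only, not the Ultra Search rule syntax):

```python
def host_allowed(host, include_suffixes, exclude_prefixes):
    # Suffix inclusion: the host must match or end with an allowed domain.
    if not any(host == s or host.endswith("." + s) for s in include_suffixes):
        return False
    # Prefix exclusion: for example, exclude "stores." subdomains.
    return not any(host.startswith(p) for p in exclude_prefixes)
```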
See Also: "Creating Web Sources" |
Default Ultra Search Instance and Schema
Ultra Search automatically creates a default Ultra Search instance based on the default Ultra Search test user, so you can test Ultra Search functionality with this default instance immediately after installation.
Monitoring Ultra Search Components with Oracle Enterprise Manager
You can use Oracle Enterprise Manager Grid Control to monitor Ultra Search components. Using Grid Control, you can set up notification rules that automatically send email whenever a schedule status reaches certain severity states. For more information on using Grid Control to monitor Ultra Search components, see the Oracle Enterprise Manager Concepts guide.
Crawler Recrawl Policy
You can update the recrawl policy to process documents that have changed or to process all documents.
In previous releases, "process all documents" did not help when the crawling scope had been narrowed. For example, if crawling depth was reduced from seven to five, the PDF mimetype was deleted, or a host inclusion rule was removed, then you had to remove the affected documents manually in a SQL*Plus session.
With this release, all crawled URLs are subject to crawler setting enforcement, not just newly crawled URLs.
See Also: "Editing Synchronization Schedules" |
Federated Search
Traditionally, Ultra Search used centralized search: it gathered data on a regular basis and updated one index that cataloged all searchable data. This provided fast searching, but it required the data source to be crawlable before it could be searched. Ultra Search now also provides federated search, which allows a single search to span multiple indexes. Each index can be maintained separately. Because the data source is queried at search time, search results are always current. User credentials can be passed to the data source and authenticated by the data source itself. Queries can be processed efficiently in the data's native format.
To use federated search, you must deploy an Ultra Search search adapter, or searchlet, and create an Oracle Database source. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs the query on behalf of that user. Every searchlet is a JCA 1.0-compliant resource adapter.
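The fan-out-and-merge pattern behind federated search can be sketched as follows. This is a conceptual illustration only; real searchlets are JCA resource adapters running inside OC4J, not Python callables:

```python
from concurrent.futures import ThreadPoolExecutor

def federated_search(query, searchlets):
    # Fan the query out to each searchlet (one per federated source)
    # in parallel, then merge the per-source hit lists into one result set.
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda s: s(query), searchlets)
    return [hit for hits in result_lists for hit in hits]
```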
See Also: "Federated Sources" |
Ultra Search is released with the Oracle Database, Oracle Application Server, and Oracle Collaboration Suite. Because past releases used different numbering schemes, the Ultra Search release numbers can be confusing.
Oracle Ultra Search release 9.0.4 is part of Oracle Application Server 10g (9.0.4).
Oracle Ultra Search release 9.0.3 is part of Oracle Collaboration Suite release 9.0.3.
Oracle Ultra Search release 9.2 is part of Oracle9i release 9.2.
Oracle Ultra Search release 1.0.3 was part of Oracle9i release 1 (9.0.1).
Oracle Ultra Search release 9.0.2 is part of Oracle9iAS release 2 (9.0.2).