Oracle Data Mining Application Developer's Guide 10g Release 1 (10.1) Part Number B10699-01
This chapter provides sample code using DBMS_DATA_MINING for all the supported algorithms. The dataset used is the Drug Depot dataset, which is available as part of the sample schema in Oracle10g. See Oracle Database Sample Schemas for information on the sample schemas.
All samples are available in the directory $ORACLE_HOME/dm/demo/sample/plsql.
ODM sample datasets need to be loaded into a user schema prior to using the sample programs. Refer to the following scripts for creating Oracle tablespace, user schema, and loading ODM sample datasets:
$ORACLE_HOME/dm/admin/odmtbs.sql
$ORACLE_HOME/dm/admin/odmuser.sql
$ORACLE_HOME/dm/admin/dmuserld.sql
$ORACLE_HOME/dm/admin/dmshgrants.sql
The ODM PL/SQL sample programs illustrate the main operations of the data mining process: preparing the input data, building a model, testing the model, and applying the model to new data.
Data mining models can be either supervised or unsupervised.
Supervised models predict the value of a specified variable, called the target variable, together with the confidence associated with each prediction. Supervised models are illustrated in the sample programs for Naive Bayes (NB), Adaptive Bayes Networks (ABN), and Support Vector Machines (SVM).
Unsupervised models have no target variable; they are used to predict group membership or relationships of an individual. Unsupervised models are illustrated in the sample programs for Clustering, Association Rules, and Non-Negative Matrix Factorization. Attribute Importance is also illustrated.
The PL/SQL sample programs rely on two sets of data:
Programs named algorithm_demo.sql are based on the ODM sample datasets. These datasets must be loaded using $ORACLE_HOME/dm/admin/dmuserld.sql in the user schema executing these demos.
Programs named algorithm_sh.sql are based on datasets derived from the SH schema. The SH schema must be installed as part of the RDBMS installation. The script $ORACLE_HOME/dm/admin/dmshgrants.sql must be run by a user with privileges to access the SH schema, and the script $ORACLE_HOME/dm/admin/dmsh.sql must be run in the user schema executing these demos.
The file $ORACLE_HOME/dm/demo/data/README.txt explains the datasets.
Each sample program demonstrating Classification (NB, ABN, SVM) contains code that prepares the input data using DBMS_DATA_MINING_TRANSFORM, builds a model, tests the model, and then applies the model to score new data. Each program also demonstrates how to generate test results such as a confusion matrix, lift, ROC, and ranked Apply results.
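To make the confusion matrix mentioned above concrete, the following is a minimal, language-neutral sketch in Python rather than the PL/SQL used by the sample programs. It does not call DBMS_DATA_MINING; the function names and the test labels are hypothetical and serve only to show what the matrix counts.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs. Rows are actual classes, columns are predicted classes."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    return labels, matrix


def accuracy(actual, predicted):
    """Fraction of cases whose predicted class equals the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical test-set targets and model predictions (illustration only).
actual = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

labels, matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)

for a in labels:
    print(a, [matrix[a][p] for p in labels])
print("accuracy:", round(acc, 3))
```

The diagonal cells hold the correctly classified cases; every off-diagonal cell is one kind of misclassification.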
The samples for Regression using SVM normalize the input data, build models, test the models using metrics such as root mean squared error, apply the models to new data, and generate ranked results.
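As a rough illustration of two of the steps named above, the sketch below shows normalization and root mean squared error in Python. It assumes min-max scaling, which is only one common normalization scheme; the text does not say which scheme the samples use. The function names and the data are hypothetical.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (assumed scheme: min-max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted target values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical attribute values and regression results (illustration only).
ages = [20, 30, 40, 60]
normalized = min_max_normalize(ages)
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])

print(normalized)
print(round(error, 4))
```

A lower root mean squared error means the predictions lie closer to the actual target values on the test data.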
The samples for Association demonstrate model build, and show how to obtain frequent itemsets and association rules for a given support and confidence.
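Support and confidence can be sketched as follows. This Python fragment is an illustration of the two measures under their usual definitions, not the ODM implementation; the baskets, thresholds, and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market baskets (illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})

# Keep the rule bread -> milk only if it meets both (hypothetical) thresholds.
min_support, min_confidence = 0.4, 0.6
keep_rule = rule_support >= min_support and rule_confidence >= min_confidence

print(rule_support, round(rule_confidence, 3), keep_rule)
```

Raising either threshold prunes more candidate rules, which is why the samples report itemsets and rules for a given support and confidence.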
The samples for Clustering demonstrate model build, and show how to obtain clustering details such as histograms, child nodes, and rules. The clusters are scored and ranked based on their probability.
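Ranking by probability can be pictured with the short Python sketch below. It assumes each case has already been assigned a probability for every cluster and simply keeps the top N per case; the case IDs, cluster names, probabilities, and the rank_top_n helper are all hypothetical and do not reflect the layout of any ODM result table.

```python
def rank_top_n(case_scores, top_n):
    """For each case, return the top_n (rank, cluster, probability) triples, best first."""
    ranked = {}
    for case_id, scores in case_scores.items():
        ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        ranked[case_id] = [
            (rank, cluster, prob)
            for rank, (cluster, prob) in enumerate(ordered[:top_n], start=1)
        ]
    return ranked


# Hypothetical per-case cluster probabilities (illustration only).
scores = {
    101: {"c1": 0.2, "c2": 0.7, "c3": 0.1},
    102: {"c1": 0.55, "c2": 0.05, "c3": 0.4},
}

top2 = rank_top_n(scores, 2)
for case_id, rows in top2.items():
    print(case_id, rows)
```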
The samples for Feature Extraction demonstrate model build, and show how to obtain details of various features. The features are scored and ranked based on their probability.
There is one sample program demonstrating the BLAST interface for biological sequence match and alignment.
Finally, there are three sample programs that demonstrate text mining for extracting features from a text document into a nested table column, text classification using SVM, and text feature extraction using NMF, respectively.
All the sample programs listed in the tables below are located in the directory $ORACLE_HOME/dm/demo/sample/plsql.
A summary description of these sample programs is also provided in $ORACLE_HOME/dm/demo/sample/plsql/README.txt.