5 ODM PL/SQL Sample Programs

This chapter provides sample code using DBMS_DATA_MINING for all the supported algorithms. The dataset used is the Drug Depot dataset that is available as part of the sample schema in Oracle10g. Please refer to Oracle Database Sample Schemas for information on sample schemas.

All samples are available in the directory $ORACLE_HOME/dm/demo/sample/plsql.

The ODM sample datasets must be loaded into a user schema before you run the sample programs. Refer to the following scripts for creating an Oracle tablespace and a user schema, and for loading the ODM sample datasets:

$ORACLE_HOME/dm/admin/odmtbs.sql
$ORACLE_HOME/dm/admin/odmuser.sql
$ORACLE_HOME/dm/admin/dmuserld.sql
$ORACLE_HOME/dm/admin/dmshgrants.sql

5.1 Overview of ODM PL/SQL Sample Programs

The ODM PL/SQL sample programs illustrate the main operations of the data mining process:

Preparing the data
Building a model
Testing the model
Applying the model to new data (scoring the data)

Data mining models can be either supervised or unsupervised.

Supervised models predict the value of a specified variable, called the target variable, together with the confidence associated with each prediction. Supervised models are illustrated in the sample programs for Naive Bayes (NB), Adaptive Bayes Networks (ABN), and Support Vector Machines (SVM).

Unsupervised models have no target variable; they are used to predict the group membership of an individual case or to uncover relationships in the data. Unsupervised models are illustrated in the sample programs for Clustering, Association Rules, and Non-Negative Matrix Factorization (NMF). Attribute Importance is also illustrated.

The PL/SQL sample programs rely on two sets of data:

Individual datasets: All samples named algorithmdemo.sql (for example, nbdemo.sql) are based on these datasets. These datasets must be loaded using $ORACLE_HOME/dm/admin/dmuserld.sql in the user schema that executes these demos.
SH schema dataset: All samples named algorithm_sh.sql (for example, nb_sh.sql) are based on datasets derived from the SH schema. The SH schema must be installed as part of the RDBMS installation. The script $ORACLE_HOME/dm/admin/dmshgrants.sql must be run by a user with privileges to access the SH schema, and the script $ORACLE_HOME/dm/admin/dmsh.sql must be run in the user schema that executes these demos.

The file $ORACLE_HOME/dm/demo/data/README.txt explains the datasets.

Each sample program that demonstrates Classification (NB, ABN, SVM) prepares the input data using DBMS_DATA_MINING_TRANSFORM, builds a model, tests the model, and then applies the model to new data. Each program also shows how to generate test results such as a confusion matrix, lift, and ROC, and how to rank the Apply results.
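To make the confusion matrix concrete, the following is a minimal, language-neutral sketch written in Python rather than PL/SQL. It does not use or reproduce the DBMS_DATA_MINING API; the function names and the test data are hypothetical and exist only for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for two paired sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of cases whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Hypothetical test data: actual target values and the model's predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p} count={n}")
print(f"accuracy={accuracy(matrix):.3f}")  # accuracy=0.750
```

The diagonal entries (actual equals predicted) are the correct predictions; the off-diagonal entries show which classes the model confuses with each other.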

The samples for Regression using SVM normalize the input data, build models, test the models using metrics such as root mean squared error, apply the models to new data, and generate ranked results.
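As an illustration of the two computations named above, here is a short Python sketch of normalization and root mean squared error. It is not taken from the sample programs: min-max scaling is assumed as the normalization method (the samples may use a different one), and all function names and values are hypothetical.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical attribute values and regression results.
ages = [20, 30, 40, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
print(f"rmse={rmse(actual, predicted):.4f}")
```

A lower root mean squared error indicates that the predicted values lie closer to the actual values in the test data.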

The samples for Association demonstrate model build, and show how to obtain frequent itemsets and association rules for a given support and confidence.
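The meaning of support and confidence can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in the sample programs: the transactions are made up, the function names are hypothetical, and the itemsets are enumerated by brute force rather than with the Apriori algorithm.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for itemset, s in frequent_itemsets(transactions, 0.5).items():
    print(sorted(itemset), s)
print(f"{confidence({'bread'}, {'milk'}, transactions):.3f}")
```

Brute-force enumeration is exponential in the number of items and is practical only for toy data; Apriori prunes candidates whose subsets are already known to be infrequent.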

The samples for Clustering demonstrate model build, and show how to obtain clustering details such as histograms, child nodes, and rules. The clusters are scored and ranked based on their probability.
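Ranking clusters by probability can be pictured with the following Python sketch. It assumes that scoring yields, for each case, a probability for each cluster; the cluster IDs, the probabilities, and the function name are hypothetical and are not taken from the sample programs.

```python
def rank_clusters(probabilities):
    """Return (cluster_id, probability, rank) tuples, most probable first.

    Ties are broken by cluster ID so that the ordering is deterministic.
    """
    ordered = sorted(probabilities.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cid, p, rank) for rank, (cid, p) in enumerate(ordered, start=1)]


# Hypothetical scoring output: case ID -> {cluster ID: probability}.
scores = {
    101: {3: 0.2, 5: 0.7, 7: 0.1},
    102: {3: 0.6, 5: 0.3, 7: 0.1},
}

for case_id, probs in scores.items():
    for cluster_id, probability, rank in rank_clusters(probs):
        print(case_id, cluster_id, probability, rank)
```

The rank 1 cluster for each case is its most probable assignment.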

The samples for Feature Extraction demonstrate model build, and show how to obtain details of various features. The features are scored and ranked based on their probability.

There is one sample program demonstrating the BLAST interface for biological sequence match and alignment.

Finally, three sample programs demonstrate text mining: extracting features from a text document into a nested table column, text classification using SVM, and text feature extraction using NMF.

5.2 Summary of ODM PL/SQL Sample Programs

All the sample programs listed in the tables below are located in the directory $ORACLE_HOME/dm/demo/sample/plsql.

The summary description of these sample programs is also provided in $ORACLE_HOME/dm/demo/sample/plsql/README.txt.

Table 5-1 PL/SQL Samples Based on Individual Datasets

Sample Program	Description
`aidemo.sql`	Attribute Importance using an MDL-based algorithm
`abndemo.sql`	Classification using Adaptive Bayes Network algorithm
`ardemo.sql`	Association using Apriori algorithm
`blastdemo.sql`	BLAST sequence matching and alignment
`kmdemo.sql`	Clustering using k-Means algorithm
`nbdemo.sql`	Classification using Naive Bayes algorithm
`nmfdemo.sql`	Feature Extraction using NMF algorithm
`svmcdemo.sql`	Classification using SVM algorithm
`svmrdemo.sql`	Regression using SVM algorithm

Table 5-2 PL/SQL Samples Based on SH Schema

Sample Program	Description
`ai_sh.sql`	Attribute Importance using an MDL-based algorithm
`abn_sh.sql`	Classification using Adaptive Bayes Network algorithm
`ar_sh.sql`	Association using Apriori algorithm
`akm_sh.sql`	Clustering using k-Means algorithm
`nb_sh.sql`	Classification using Naive Bayes algorithm
`nmf_sh.sql`	Feature Extraction using NMF algorithm
`svmc_sh.sql`	Classification using SVM algorithm
`svmr_sh.sql`	Regression using SVM algorithm
`textfe.sql`	Demonstrates extracting text features from a `CLOB/VARCHAR2` column into a nested table column in a table that can be provided as input to `CREATE_MODEL`
`textnmf.sql`	Text feature extraction using NMF
`textsvmc.sql`	Text classification using SVM
