23
DBMS_DATA_MINING

Oracle Data Mining (ODM) is designed for programmers, systems analysts, project managers, and others interested in developing database applications that use data mining to discover hidden patterns and use that knowledge to make predictions.

This chapter contains the following topics:

Using DBMS_DATA_MINING

Overview

Oracle Data Mining (ODM) embeds data mining in the Oracle database. The data never leaves the database -- the data, its preparation, model building, and model scoring activities all remain in the database. This enables Oracle to provide an infrastructure for data analysts and application developers to integrate data mining seamlessly with database applications.

ODM provides two interfaces that support in-database data mining: a Java interface and a PL/SQL interface. The Java interface is described in Oracle Data Mining Application Developer's Guide.

You can use the package to build a mining model, test the model, and apply this model to your data to obtain predictive and descriptive information. For detailed examples of how to perform these tasks, see the sample programs in Oracle Data Mining Application Developer's Guide. The sample code is in the demo directory.

General information about both the Java and the PL/SQL interface is contained in Oracle Data Mining Application Developer's Guide. See Oracle Data Mining Concepts for a discussion of data mining concepts.

This chapter contains an overview of the development methodology, followed by information on data types, settings, and constants, as well as detailed descriptions of the PL/SQL subprograms. The DBMS_DATA_MINING_TRANSFORM package supports data pre-processing for data mining.

Constants

Table 23-1 through Table 23-8 list the constants to be used for various settings in the settings table.

Table 23-1 DBMS_DATA_MINING Constants Summary: Mining Function

Constant	Purpose
`association`	Parameter value for `mining function` in `CREATE_MODEL`, representing association mining function.
`attribute_importance`	Parameter value for `mining function` in `CREATE_MODEL`, representing attribute importance mining function.
`classification`	Parameter value for `mining_function` in `CREATE_MODEL`, representing classification mining function.
`regression`	Parameter value for `mining_function` in `CREATE_MODEL`, representing regression mining function.
`clustering`	Parameter value for `mining_function` in `CREATE_MODEL`, representing clustering mining function.
`feature_extraction`	Parameter value for `mining_function` in `CREATE_MODEL`, representing feature extraction mining function.

Table 23-2 DBMS_DATA_MINING Constants Summary: Function Settings

Constant	Purpose
`clas_priors_table_name`	Setting name representing prior probability table name for classification function.
`clus_num_clusters`	Setting name representing number of clusters for clustering function.
`feat_num_features`	Setting name representing number of features for feature selection function.
`asso_max_rule_length`	Setting name representing maximum rule length.
`asso_min_confidence`	Setting name representing minimum confidence.
`asso_min_support`	Setting name representing minimum support.

Table 23-3 DBMS_DATA_MINING Constants Summary: Algorithm Settings

Constant	Purpose
`algo_name`	Setting name representing the mining algorithm.
`algo_apriori_association_rules`	Setting value for Apriori algorithm for association rules.
`algo_naive_bayes`	Setting value for Naive Bayes (NB) algorithm for classification.
`algo_support_vector_machines`	Setting value for Support Vector Machines (SVM) algorithm for classification or regression.
`algo_nonnegative_matrix_factor`	Setting value for Non-Negative Matrix Factorization (NMF) for feature selection.
`algo_kmeans`	Setting value for k-Means (KM) for clustering.
`algo_ai_mdl`	Setting value for Minimum Description Length based algorithm for Attribute Importance.

Table 23-4 DBMS_DATA_MINING Constants Summary: Adaptive Bayes Network

Constant	Purpose
`abns_model_type`	Setting name representing ABN model type.
`abns_single_feature`	Setting value representing single feature ABN model.
`abns_multi_feature`	Setting value representing multi feature ABN model.
`abns_naive_bayes`	Setting value representing Naive Bayes ABN model.
`abns_max_build_minutes`	Setting name representing maximum time threshold to complete an ABN model build.
`abns_max_nb_predictors`	Setting name representing the maximum number of Naive Bayes predictors to be considered for building an ABN model of type `abns_naive_bayes`.

Table 23-5 DBMS_DATA_MINING Constants Summary: Naive Bayes

Constant	Purpose
`nabs_singleton_threshold`	Setting value for singleton threshold for Naive Bayes.
`nabs_pairwise_threshold`	Setting value for pair-wise threshold for Naive Bayes.

Table 23-6 DBMS_DATA_MINING Constants Summary: Support Vector Machines

Constant	Purpose
`svms_kernel_function`	Setting name representing the kernel function for SVM.
`svms_linear`	Setting value for Linear Kernel for SVM.
`svms_guassian`	Setting value for Gaussian Kernel for SVM.
`svms_kernel_cache_size`	Setting name representing for Kernel Cache Size for SVM.
`svms_conv_tolerance`	Setting name representing tolerance for SVM.
`svms_std_dev`	Setting name representing standard deviation for SVM.
`svms_complexity_factor`	Setting name representing complexity factor for SVM.
`svms_epsilon`	Setting name representing epsilon for SVM Regression.

Table 23-7 DBMS_DATA_MINING Constants Summary: Non-Negative Matrix Factorization

Constant	Purpose
`nmfs_num_iterations`	Setting name representing number of iterations.
`nmfs_conv_tolerance`	Setting name representing convergence tolerance.
`nmfs_random_seed`	Setting name representing random seed for NMF.

Table 23-8 DBMS_DATA_MINING Constants Summary: k-Means

Constant	Purpose
`kmns_distance`	Setting name representing distance function.
`kmns_euclidean`	Setting value representing Euclidean distance function.
`kmns_cosine`	Setting value representing cosine distance function.
`kmns_fast_cosine`	Setting value representing fast cosine distance function.
`kmns_iterations`	Setting name representing number of iterations.
`kmns_conv_tolerance`	Setting name representing convergence tolerance.
`kmns_split_criterion`	Setting name representing split criterion.
`kmns_variance`	Setting value representing variance as the split criterion.
`kmns_size`	Setting value representing size as the split criterion.
`kmns_block_growth`	Setting name representing growth factor for memory allocated to hold cluster data.
`kmns_num_bins`	Setting value for number of histogram bins.
`kmns_min_pct_attr_support`	Setting value for minimum percentage report required for attributes in rules.

Data Types

The DBMS_DATA_MINING and the DBMS_DATA_MINING_TRANSFORM packages use the data types shown in Table 23-9.

Table 23-9 DBMS_DATA_MINING Summary of Data Types

Data Type	Purpose
`DM_ABN_Detail`	This type represents each row of the model detail output generated by `GET_MODEL_DETAILS_ABN.`
`DM_ABN_Details`	This type represents the ABN model details generated by `GET_MODEL_DETAILS_ABN.`
`DM_Centroid`	This type represents the centroid of a cluster. It is used when retrieving cluster details using `GET_MODEL_DETAILS_KM`.
`DM_Child`	This type represents each child node of a cluster.
`DM_Children`	This type represents a set of children nodes for a given cluster identifier.
`DM_Cluster`	This type represents a cluster retrieved using `GET_MODEL_DETAILS_KM`.
`DM_Clusters`	This type represents a set of clusters.
`DM_Conditional`	This type represents each conditional probability from a set of conditional probabilities associated with each mining attribute used in a Naive Bayes or Adaptive Bayes Network model.
`DM_Conditionals`	This type represents conditional probabilities associated with a given mining attribute used in a Naive Bayes or Adaptive Bayes Network model. It is used when retrieving model details using `GET_MODEL_DETAILS_NB` or `GET_MODEL_DETAILS_ABN` respectively.
`DM_Histogram_Bin`	This type represents a histogram associated with a cluster identifier. It is used when retrieving cluster details using `GET_MODEL_DETAILS_KM`.
`DM_Histograms`	This type represents a set of histograms.
`DM_Item`	This type represents an item in a set of items.
`DM_Items`	This type represents the set of items in an ItemSet.
`DM_ItemSet`	This type represents an ItemSet.
`DM_ItemSets`	This type represents frequent itemsets in Association models.
`DM_Model_Settings`	This type represents the algorithm settings retrieved using the `GET_MODEL_SETTINGS` function.
`DM_Model_Signature`	This type represents a list of model signature attributes generated by `GET_MODEL_SIGNATURE`.
`DM_Modelname_List`	This type represents a list of model names provided as input for the parameter `model_names` in `EXPORT_MODEL` and `IMPORT_MODEL` procedures.
`DM_NB_Detail`	his type represents the each row of the model detail output generated by `GET_MODEL_DETAILS_NB`.
`DM_NB_Details`	This type represents the NB model details generated by `GET_MODEL_DETAILS_NB`.
`DM_Nested_Categoricals`	This type represents a nested table of categorical attributes, used for representing wide data.
`DM_Nested_Numericals`	This type represents a nested table of numerical attributes, used for representing wide data.
`DM_NMF_Attribute`	This type represents each attribute in an attribute set for NMF model details.
`DM_NMF_Attribute_Set`	This type represents a set of attributes that correspond to a feature identifier, returned by `GET_MODEL_DETAILS_NMF`.
`DM_NMF_Feature`	This type represents a feature in a NMF model.
`DM_NMF_Feature_Set`	This type represents a set of features returned by `GET_MODEL_DETAILS_NMF`.
`DM_Predicate`	This type represents each predicate in the set of predicates in a rule.
`DM_Predicates`	This type represents a set of predicates that constitute either the antecedent or the consequent of a rule.
`DM_Ranked_Attribute`	This type represents an entry in the set of ranked attribute returned by `GET_MODEL_DETAILS_AI`, ranked by the attribute's importance.
`DM_Ranked_Attributes`	This type represents a list of ranked attributes returned by `GET_MODEL_DETAILS_AI`.
`DM_Rule`	This type represents each rule in a list of rules generated by either `GET_ASSOCIATION_RULES` or `GET_MODEL_DETAILS_KM`.
`DM_Rules`	This type represents rules retrieved for Association Rules or k-means models using `GET_ASSOCIATION_RULES` and `GET_MODEL_DETAILS_KM` respectively.
`DM_SVM_Attribute`	This type represents each attribute in an attribute set for SVM model details.
`DM_SVM_Attribute_Set`	This type represents a set of attributes returned by `GET_MODEL_DETAILS_SVM` for a linear model.
`DM_SVM_Linear_Coeff`	This type represents an entry in the set of linear coefficients returned by `GET_MODEL_DETAILS_SVM`.
`DM_SVM_Linear_Coeff_Set`	This type represents the set of linear coefficients returned by `GET_MODEL_DETAILS_SVM` for an SVM model built using the linear kernel.

Exceptions

Table 23-10 lists the errors generated by DBMS_DATA_MINING.

Table 23-10 DBMS DATA_MINING Errors Summary

Oracle Error	Description
`ORA-40201`	Invalid input parameter %s.
`ORA-40202`	Column %s does not exist in the input table %s.
`ORA-40203`	Model %s does not exist.
`ORA-40204`	Model %s already exists.
`ORA-40205`	Invalid setting name %s.
`ORA-40206`	Invalid setting value for setting name %s.
`ORA-40207`	Duplicate or multiple function settings.
`ORA-40208`	Duplicate or multiple algorithm settings for function %s.
`ORA-40209`	Invalid setting: %s for function %s.
`ORA-40211`	Algorithm name: %s is invalid.
`ORA-40212`	Invalid target data type in input data for function: %.
`ORA-40213`	Contradictory values for settings: %s, %s.
`ORA-40214`	Duplicate setting: %s.
`ORA-40215`	Model %s is incompatible with current operation.
`ORA-40216`	Feature not supported.
`ORA-40219`	Apply result table %s is incompatible with current operation.
`ORA-40220`	Maximum number of attributes exceeded.
`ORA-40221`	Maximum target cardinality exceeded.
`ORA-40222`	Data mining model export failed, job name=%s, error=%s.
`ORA-40223`	Data mining model import failed, job name=%s, error=%s.
`ORA-40225`	Model is currently in use by another process.
`ORA-40251`	No support vectors were found.
`ORA-40252`	No target values were found.
`ORA-40253`	No target counter examples were found.
`ORA-40261`	Input data for model build contains negative values.
`ORA-40262`	NMF: number of features not between [1, %s].
`ORA-40271`	No statistically significant features were found.
`ORA-40272`	Apply rules prohibited for this model mode.
`ORA-40273`	Invalid model type %s for Adaptive Bayes Network algorithm.

User Views

Table 23-11 lists the user views provided by Oracle to obtain information about the models generated using DBMS_DATA_MINING

Table 23-11 DBMS_DATA_MINING Summary of User Views

User View	Purpose
`DM_USER_MODELS`	Lists all models in a given user's schema.

Operational Notes

The development methodology for data mining using the DBMS_DATA_MINING interface is divided into two phases.

The first phase includes your application data analysis and design, where you perform the following two steps:

Analyze your problem, and choose the mining function and algorithm.
Analyze the data to be used for building mining models (build data), testing predictive models (test data), and the new data on which the model will be applied (scoring data).

The second phase involves developing a mining application using DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM packages.

Prepare the build, test, and scoring data using the DBMS_DATA_MINING_TRANSFORM package or other third-party tool or direct SQL or PL/SQL utility scripts in a manner suitable for the chosen mining function and algorithm. An important caveat is that the three datasets referred to earlier have to be prepared in an identical manner for mining results to be meaningful. This is an optional step, required only if your data is not prepared for mining.
Prepare a settings table that overrides the default mining algorithm for a given mining function, and the default algorithm settings. This is also an optional step.
Build a mining model for the given training dataset.
For predictive models (classification and regression), test the model for its accuracy and measures of performance. This amounts to applying the model on the test data (that is, scoring the test data), and computing various metrics on the apply results.
Retrieve the model signature to determine the mining attributes that will be used by a given model for scoring. This information will help ascertain that the scoring data is suitable for a given model. This is an optional step.
Apply a classification, regression, clustering, or feature extraction model to new data to generate predictions and/or descriptive summaries and patterns about the data.
Retrieve the model details to understand why a model scored the data in a particular manner. This is an optional step.
Repeat steps 3 through 9 until you obtain satisfactory results.

See Oracle Data Mining Application Developer's Guide for more general discussion of the PL/SQL interface to Oracle Data Mining.

Settings Table

The settings table is a simple relational table with a fixed schema. You can choose the name of the settings table, but the column names and their types must be defined as specified:

(setting_name  VARCHAR2(30),
setting_value  VARCHAR2(128))

The values provided in the settings table override the default values assumed by the system. The values inserted into the setting_name column are one or more of several constants defined in the DBMS_DATA_MINING package. Depending on what the setting name denotes, the value for the setting_value column can be a predefined constant or the actual numerical or string value corresponding to the setting itself. The setting_value column is defined to be VARCHAR2, so you must cast numerical inputs to string using the TO_CHAR function before input into the settings table.

Table 23-12 through Table 23-17 list the various setting names and the valid setting values, with a brief explanation of each setting

Table 23-12 DBMS_DATA_MINING Function Settings

Algorithm Settings	Setting Value (with Permissible Value Ranges)
`algo_name`	Classification: One of: algo_naive_bayes algo_support_vector_machines algo_adaptive_bayes_network Regression: algo_support_vector_machines Association Rules: algo_apriori_association_rules Clustering: algo_kmeans Feature Extraction: algo_non_negative_matrix_factor Attribute Importance: algo_ai_mdl
`clas_priors_table_name`	`VARCHAR2` string denoting the name of a relational table of fixed schema containing prior probabilities. The schema of this table is provided in. This input is applicable only for classification algorithms. The prior probabilities table must be present in the current user's schema.
`clus_num_clusters`	TO_CHAR(numeric_expr >= 1) Number of clusters generated by a clustering algorithm. Default value is 10.
`feat_num_features`	`TO_CHAR(`numeric_expr `>= 1)` Number of features to be extracted. Default value estimated from the data by the algorithm.
`asso_max_rule_length`	`TO_CHAR(2 <=` numeric_expr `<= 20)` Maximum rule length for AR algorithm. Default value is 4.
`asso_min_confidence`	TO_CHAR(0 <= numeric_expr <= 1) Minimum confidence value for AR algorithm. Default value is 0.1.
`asso_min_support`	`TO_CHAR(0 <=` numeric_expr `<= 1)` Minimum support value for AR algorithm. Default value is 0.1.

Table 23-13 Algorithm Settings for Adaptive Bayes Network

Setting Name	Setting Value (with Permissible Value Ranges)
`abns_model_type`	Model type for Adaptive Bayes Network: `abns_single_feature` `abns_multi_feature` `abns_naive_bayes` Default value is `abns_multi_feature`.
`abns_max_build_minutes`	`TO_CHAR(`numeric_expr `>= 0)` The maximum time threshold for completion of model build. Default value is 0, which implies no time limit.
`abns_max_nb_predictors`	`TO_CHAR(`numeric_expr `> 0)` Maximum number of Naive Bayes predictors to be considered for model build, when the model type is chosen to be `abns_naive_bayes`. Default value is 10.
`abns_max_predictors`	`TO_CHAR(`numeric_expr `> 0)` Default is 25.

Table 23-14 Algorithm Settings for Naive Bayes

Setting Name Setting Value (with Permissible Value Ranges)

Setting Name	Setting Value (with Permissible Value Ranges)
`nabs_singleton_threshold`	`TO_CHAR (0 <=` numeric_expr `<=1)` Value of singleton threshold for NB algorithm. Default value is 0.01.
`nabs_pairwise_threshold`	`TO_CHAR (0 <=` numeric_expr `<=1)` Value of pairwise threshold for NB algorithm. Default value is 0.01.

nabs_singleton_threshold

TO_CHAR (0 <= numeric_expr <=1)

Value of singleton threshold for NB algorithm.

Default value is 0.01.

nabs_pairwise_threshold

TO_CHAR (0 <= numeric_expr <=1)

Value of pairwise threshold for NB algorithm.

Default value is 0.01.

Table 23-15 Algorithm Settings for Support Vector Machines

Setting Name	Setting Value (with Permissible Value Ranges)
`svms_kernel_function`	Kernel for Support Vector Machines: `svms_linear` (for Linear Kernel) `svms_gaussian` (for Gaussian Kernel) Default value is `svms_linear.`
`svms_kernel_cache_size`	`TO_CHAR(`numeric_expr `> 0)` Value of kernel cache size for SVM algorithm. Applies to Gaussian kernel only. Default value is 50000000 bytes.
`svms_conv_tolerance`	`TO_CHAR(`numeric_expr `> 0)` Convergence tolerance for SVM algorithm. Default value is 0.001.
`svms_std_dev`	`TO_CHAR(`numeric_expr `> 0)` Value of standard deviation for SVM algorithm. This is applicable only for Gaussian kernel. Default value estimated from the data by the algorithm.
`svms_complexity_factor`	`TO_CHAR(`numeric_expr `> 0)` Value of complexity factor for SVM algorithm. Default value estimated from the data by the algorithm.
`svms_epsilon`	`TO_CHAR(`numeric_expr `> 0)` Value of epsilon factor for SVM Regression. Default value estimated from the data by the algorithm.

Table 23-16 Algorithm Settings for Non-Negative Matrix Factorization

Setting Name Setting Value (with Permissible Value Ranges)

Setting Name	Setting Value (with Permissible Value Ranges)
`nmfs_random_seed`	`TO_CHAR(`numeric_expr`)` Seed for random generator. Default value is -1.
`nmfs_num_iterations`	`TO_CHAR(1 <=` numeric_expr `<= 500)` Maximum number of iterations for NMF algorithm. Default value is 50.
`nmfs_conv_tolerance`	TO_CHAR(0 < numeric_expr <= 0.5) Convergence tolerance for NMF algorithm. Default value is 0.05.

nmfs_random_seed

TO_CHAR(numeric_expr)

Seed for random generator.

Default value is -1.

nmfs_num_iterations

TO_CHAR(1 <= numeric_expr <= 500)

Maximum number of iterations for NMF algorithm.

Default value is 50.

nmfs_conv_tolerance

TO_CHAR(0 < numeric_expr <= 0.5)

Convergence tolerance for NMF algorithm.

Default value is 0.05.

Table 23-17 Algorithm Settings for k-Means

Setting Name	Setting Value (with Permissible Value Ranges)
`kmns_distance`	Distance Function for k-Means Clustering: `kmns_euclidean` `kmns_cosine` `kmns_fast_cosine` Default value is `kmns_euclidean.`
`kmns_iterations`	`TO_CHAR(0 <` numeric_expr `<= 20)` Number of iterations for k-Means algorithm. Default value is 3.
`kmns_conv_tolerance`	TO_CHAR(0 < numeric_expr <= 0.5) Convergence tolerance for k-Means algorithm. Default value is 0.01.
`kmns_split_criterion`	Split criterion for k-Means Clustering: `kmns_varianc`e `kmns_size` Default value is `kmns_variance.`
`kmns_num_bins`	Number of histogram bins. Specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins with the exception of attributes with a single value that have only one bin. Range > 0. Default value is 10.
`kmns_block_growth`	`TO_CHAR(1 <` numeric_expr `<= 5)` Growth factor for memory allocated to hold cluster data. Default value is 2.
`kmns_min_pct_attr_support`	Minimum percentage support required for attributes in rules. Specifies the minimum percentage of values for an attribute in a given cluster required to include this attribute in the rule description of the cluster. That is, if the required support level is not met, the attribute would be omitted from the rule. This would allow retaining in the rule only the well-represented attributes. Setting the parameter value too high in data with missing values can result in very short or even empty rules. Range >= 0 and <= 1. Default is 0.1.

You can create a settings table as shown in the example that follows for an SVM classification model, and edit the individual values using SQL DML.

CREATE TABLE drugstore_settings (
  setting_name VARCHAR2(30),
  setting_value VARCHAR2(128))
BEGIN
-- override the default for convergence tolerance for SVM Classification
INSERT INTO drugstore_model_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_conv_tolerance, TO_CHAR(0.081));
COMMIT;
END;

The table function GET_DEFAULT_SETTINGS provides you all the default settings for mining functions and algorithms. If you intend to override all the default settings, you can create a seed settings table and edit them using SQL DML.

BEGIN
CREATE TABLE drug_store_settings AS
SELECT setting_name, setting_value
   FROM TABLE (DBMS_DATA_MINING.GET_DEFAULT_SETTINGS
   WHERE setting_name LIKE 'SVMS_%';
-- update the values using appropriate DML
END;

You can also create a settings table based on another model's settings using GET_MODEL_SETTINGS, as shown in the following example:

BEGIN
CREATE TABLE my_new_model_settings AS
SELECT setting_name, setting_value 
  FROM TABLE (DBMS_DATA_MINING.GET_MODEL_SETTINGS('my_other_
model'));
END;

Prior Probabilities Table

Consult Oracle Data Mining Concepts for an explanation of the prior probabilities table. You can specify a prior probabilities table as an optional function setting when building classification models.

You must create the prior probabilities table using the fixed schema shown in the following code samples. For numerical targets, use the following schema:

target_value      NUMBER
prior_probability NUMBER

For categorical targets, use the following schema:

target_value      VARCHAR2
prior_probability NUMBER

Next, provide the name of the prior probabilities table as input to the setting_value column in the settings table, with the corresponding value for the setting_name column to be DBMS_DATA_MINING.clas_priors_table_name, as shown:

BEGIN
INSERT INTO drugstore_settings (setting_name, setting_value) 
VALUES (DBMS_DATA_MINING.class_priors_table_name, 'census_priors');
COMMIT;
END;

Cost Matrix Table

Consult Oracle Data Mining Concepts for an explanation of the cost matrix. You must create a cost matrix table with the fixed schema shown in the following code samples. For numerical targets, use the following schema:

actual_target_value    NUMBER
predicted_target_value NUMBER
cost                   NUMBER

For categorical targets, use the following schema:

actual_target_value    VARCHAR2
predicted_target_value VARCHAR2
cost                   NUMBER

The DBMS_DATA_MINING package enables you to evaluate the cost of predictions from classification models in an iterative manner during the experimental phase of mining, and to eventually apply the optimal cost matrix to predictions on the actual scoring data in a production environment.

The data input to each COMPUTE procedure in the package is the result generated from applying the model on test data. In addition, if you also provide a cost matrix as an input, the COMPUTE procedure generates test results taking the cost matrix into account. This enables you to experiment with various costs for a given prediction against the same APPLY results, without rebuilding the model and applying it against the same test data for every iteration.

Once you arrive at an optimal cost matrix, you can then input this cost matrix to the RANK_APPLY procedure along with the results of APPLY on your scoring data. RANK_APPLY will provide your new data ranked by cost.

Summary of DBMS_DATA_MINING Subprograms

Table 23-18 DBMS_DATA_MINING Package Subprograms

Data Type	Purpose
APPLY Procedure	Applies a model to scoring data
CREATE_MODEL Procedure	Creates (builds) a mining model
COMPUTE_CONFUSION_MATRIX Procedure	Computes the confusion matrix from the `APPLY` results on test data for a classification model; also provides the accuracy of the model
COMPUTE_LIFT Procedure	Computes lift for a given positive target value from the APPLY results on test data for a classification model
COMPUTE_ROC Procedure	Computes Receiver Operating Characteristic for target attributes with binary class from the APPLY results on test data for a classification model
DROP_MODEL Procedure	Drops a model
EXPORT_MODEL Procedure	Exports one or more models from a schema
GET_ASSOCIATION_RULES Function	This table function returns the rules from an Association model
GET_DEFAULT_SETTINGS Function	This table function returns all the default settings for all mining functions and algorithms
GET_FREQUENT_ITEMSETS Function	Returns a set of rows that represent the frequent itemsets from an Association model
GET_MODEL_DETAILS_ABN Function	Provides the details of an Adaptive Bayes Network model
GET_MODEL_DETAILS_KM Function	Provides the details of a k-Means model
GET_MODEL_DETAILS_NB Function	Provides the details of a Naive Bayes model
GET_MODEL_DETAILS_NMF Function	Provides the details of an NMF model
GET_MODEL_DETAILS_SVM Function	Provides the details of a SVM model with a linear kernel
GET_MODEL_SETTINGS Function	Provides the settings used to build a model
GET_MODEL_SIGNATURE Function	Provides the signature of a model
IMPORT_MODEL Procedure	Imports one or more models into the current schema
RANK_APPLY Procedure	Ranks the predictions from the APPLY results for a classification model
RENAME_MODEL Procedure	Renames a model

APPLY Procedure

This procedure applies a mining model to the data to be scored, and generates the APPLY results in a table. This operation is applicable for predictive models (classification, regression) and also for descriptive models (clustering, feature extraction).

Syntax

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-19 APPLY Procedure Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).
`data_table_name`	Name of table or view representing data to be scored.
`case_id_column_name`	Name of the case identifier column.
`result_table_name`	Name of the table to store apply results.
`data_schema_name`	Name of the schema containing the data to be scored.

Usage Notes

The data provided for APPLY should match the data provided to CREATE_MODEL in terms of the schema definition and relevant content. The GET_MODEL_SIGNATURE function provides this information. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing. The case identifier is not considered to be a mining attribute during APPLY.

You must provide the name of the table in which the results of the apply operation are to be stored. APPLY creates a table with an algorithm-specific fixed schema in the user schema that owns the model.

The behavior of an APPLY operation is analogous to a SQL query operation, even though it is packaged as a procedure. It does not update the model contents and does not have any contention with CREATE_MODEL, DROP_MODEL, or RENAME_MODEL operations. The corollary is that if you potentially drop or rename a model while a model is being applied to scoring data, the APPLY operation may discontinue with partial or unpredictable results.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will match the case identifier column name provided by you. The type of incoming case-id column is preserved in Apply output.

Classification Algorithms

The table containing the APPLY results for all classification models has the same schema. For numerical targets, the results table will have the schema as shown:

case_id      VARCHAR2/NUMBER 
prediction   NUMBER
probability  NUMBER

For categorical targets, the results table will have the following schema:

case_id      VARCHAR2/NUMBER 
prediction   VARCHAR2
probability  NUMBER

Regression using Support Vector Machines

The results table will have the following schema:

case_id     VARCHAR2/NUMBER 
prediction  NUMBER

Clustering using k-Means

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the cluster identifier corresponding to a case, and the associated probability. The results table will have the schema as shown:

case_id      VARCHAR2/NUMBER 
cluster_id   NUMBER
probability  NUMBER

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the feature identifier corresponding to a case, and the associated match quality. The results table will have the schema as shown:

case_id        VARCHAR2/NUMBER 
feature_id     NUMBER
match_quality  NUMBER

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on the
 * scoring data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_apply',
  case_id_column_name => 'person_id',
  result_table_name => 'census_apply_result');
END;
/

-- View Apply Results
SELECT case_id, prediction, probability
  FROM census_apply_result;

CREATE_MODEL Procedure

This procedure creates a mining model for a given mining function using a specified mining algorithm.

Syntax

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-20 CREATE_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).
`mining_function`	Constant representing the mining function.
`data_table_name`	Name of the table or view containing the training data.
`case_id_column_name`	Name of the case identifier column.
`target_column_name`	Name of the target column -- `NULL` for descriptive models.
`settings_table_name`	Name of the table or view containing algorithm settings.
`data_schema_name`	Name of the schema hosting the training data.
`settings_schema_name`	Name of the schema hosting the settings table/view.

Usage Notes

The data provided to all subsequent operations such as APPLY must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to subsequent operations such as APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing. The case identifier column is not considered to be a mining attribute during CREATE_MODEL.

You can view the default settings for each algorithm through GET_DEFAULT_SETTINGS. You can override the defaults by providing a settings table specifying your choice of mining algorithm and relevant overriding algorithm settings.

Once a model has been built, information about the attributes used for model build can be obtained from GET_MODEL_SIGNATURE. To inspect or review model contents, you can use any of the algorithm-specific GET_MODEL_DETAILS functions.

The behavior of the CREATE_MODEL is analogous to a SQL DDL CREATE operation. It contends with RENAME_MODEL and DROP_MODEL operations.

Note: The CREATE_MODEL operation creates a set of system tables in the owner's schema to store the patterns and information that constitutes a mining model for a particular algorithm.The names of these tables have the prefix DM$. The number, schema, and content of these tables is Oracle proprietary and may change from release to release. You must not direct any queries or updates against these system tables.

Examples

Assume that you need to build a classification model using Support Vector Machines algorithm.

/* prepare a settings table to override default
 * settings (Naïve Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);

/* override the default value for complexity factor */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_complexity_factor, TO_CHAR(0.081));
COMMIT;

/* build a model with name census_model */
dbms_data_mining.create_model(
  model_name => 'census_model',
  mining_function => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name => 'census_2d_build',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  settings_table_name => 'census_settings');
END;
/

COMPUTE_CONFUSION_MATRIX Procedure

This procedure computes the confusion matrix for a classification model and also provides the accuracy of the model. See Oracle Data Mining Concepts for a description of confusion matrix.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy                     OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      confusion_matrix_table_name  IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      cost_matrix_table_name       IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-21 COMPUTE_CONFUSION_MATRIX Procedure Parameters

Parameter	Description
`accuracy`	Accuracy of the model.
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes).
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column.
`confusion_matrix_table_name`	Name of the table into which the confusion matrix is to be generated.
`score_column_name`	Name of the column representing the score from the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score from the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`cost_matrix_table_name`	Name of the fixed-schema cost matrix table.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table.
`target_schema_name`	Name of the schema hosting the targets table.
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table.

Usage Notes

You can also provide a cost matrix as an optional input in order to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_CONFUSION_MATRIX do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the confusion matrix and accuracy as outputs. The quality of the results is dependent on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_CONFUSION_MATRIX procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_CONFUSION_MATRIX procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The schema for this view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

The schema for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

You must provide the name of the table in which the confusion matrix is to be generated. The resulting fixed schema table will always be created in the schema owning the model.

For numerical target attributes, the confusion matrix table will have the schema:

(actual_target_value    NUMBER,
predicted_target_value  NUMBER,
value                   NUMBER)

For categorical target attributes, the confusion matrix table will have the schema:

actual_target_value     VARCHAR2,
predicted_target_value  VARCHAR2,
value                   NUMBER

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test',
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');
CREATE VIEW census_2d_test_view as select person_id, class from census_2d_test;

/* now compute the confusion matrix from the two
 * data streams, also providing a cost matrix as input.
 */
dbms_data_mining.compute_confusion_matrix (
  accuracy => v_accuracy,
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  confusion_matrix_table_name => 'census_confusion_matrix',
  cost_matrix_table_name => 'census_cost_matrix');
dbms_output.put_line('Accuracy of the model: ' || v_accuracy);
END;
/

-- View the confusion matrix using Oracle SQL
SELECT actual_target_value, predicted_target_value, value
  FROM census_confusion_matrix;

COMPUTE_LIFT Procedure

This procedure computes a lift table for a given positive target for a classification model. See Oracle Data Mining Concepts for a description of lift.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_LIFT (
      apply_result_table_name      IN VARCHAR2,
      target_table_name            IN VARCHAR2,
      case_id_column_name          IN VARCHAR2,
      target_column_name           IN VARCHAR2,
      lift_table_name              IN VARCHAR2,
      positive_target_value        IN VARCHAR2,
      score_column_name            IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN VARCHAR2 DEFAULT 'PROBABILITY',
      num_quantiles                IN NUMBER DEFAULT 10,
      cost_matrix_table_name       IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN VARCHAR2 DEFAULT NULL,
      target_schema_name           IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-22 COMPUTE_LIFT Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes).
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column.
`lift_table_name`	Name of the table into which the lift table is to be generated.
`positive_target_value`	Value of the positive target. If the target column is of `NUMBER` type, use `TO_CHAR()` operator to provide the value as a string.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`num_quantiles`	Number of quantiles required in the lift table.
`cost_matrix_table_name`	Name of the cost matrix table.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table.
`target_schema_name`	Name of the schema hosting the targets table.
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table.

Usage Notes

You can also provide a cost matrix as an optional input to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_LIFT do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the lift table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_LIFT procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_LIFT procedure represents the table that will be generated in your schema as a result of the APPLY operation.

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name             NUMBER)

The schema for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name            NUMBER)

You must provide the name of the table in which the lift table is to be generated. The resulting fixed schema table is always created in the schema that owns the model.

The resulting lift table will have the following schema:

(quantile_number               NUMBER,
 quantile_total_count          NUMBER,
 quantile_target_count         NUMBER,
 percent_records_cumulative    NUMBER,
 lift_cumulative               NUMBER,
 target_density_cumulative     NUMBER,
 targets_cumulative            NUMBER,
 non_targets_cumulative        NUMBER,
 lift_quantile                 NUMBER,
 target_density                NUMBER)

The output columns are explained in Oracle Data Mining Concepts.

Examples

DECLARE
  v_sql_stmt VARCHAR2(4000);
BEGIN

/* apply the model census_model on test data */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test,
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */

/* now compute lift with the default 10 quantiles
 * from the two data streams
 */
dbms_data_mining.compute_lift (
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  lift_table_name => 'census_lift',
  positive_target_value => '1',
  cost_matrix_table_name => 'census_cost_matrix');
END;
/

-- View the lift table contents using SQL
SELECT *
  FROM census_lift;

COMPUTE_ROC Procedure

This procedure computes the receiver operating characteristic (ROC) for a binary classification model. See Oracle Data Mining Concepts for a description of receiver operating characteristic.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_ROC (
      roc_area_under_curve         OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      roc_table_name               IN  VARCHAR2,
      positive_target_value        IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-23 COMPUTE_ROC Procedure Parameters

Parameter	Description
`roc_area_under_the_curve`	A measure of model accuracy, specifically, the probability that the model will correctly rank a randomly chosen pair of rows of opposite classes.
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes).
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).
`case_id_column_name`	Name of the case identifier column in the test data set.
`roc_table_name`	Name of the table into which ROC results are to be generated.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table.
`target_schema_name`	Name of the schema hosting the targets table.

Usage Notes

It is important to note that the data inputs to COMPUTE_ROC do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the ROC table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_ROC procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_ROC procedure represents the table that will be generated in your schema as a result of the APPLY operation.

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           NUMBER

The schema for this view or table name for a categorical target attribute is:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           VARCHAR2

You must provide the name of the table in which the ROC table is to be generated. The resulting fixed schema table will always be created in the schema that owns the model. The resulting ROC table will have the following schema:

(probability              NUMBER,
 true_positives           NUMBER,
 false_negatives          NUMBER,
 false_positives          NUMBER,
 true_negatives           NUMBER,
 true_positive_fraction   NUMBER,
 false_positive_fraction  NUMBER)

The output columns are explained in Table 23-24.

Table 23-24 COMPUTE_ROC Output

Output Column	Description
`probability`	Minimum predicted positive class probability resulting in a positive class prediction. Thus, different threshold values result in different hit rates and `false_alarm_rates`.
`true_negatives`	Negative cases in the test data with predicted probabilities below the `probability_threshold` (correctly predicted).
`true_positives`	Positive cases in the test data with predicted probabilities above the `probability_threshold` (correctly predicted).
`false_negatives`	Positive cases in the test data with predicted probabilities below the `probability_threshold` (incorrectly predicted).
`false_positives`	Negative cases in the test data with predicted probabilities above the `probability_threshold` (incorrectly predicted).
`true_positive_fraction`	t`rue_positives/(true_positives + false_negatives)`
`false_positive_fraction`	false_positives/(false_positives + true_negatives)

The typical use scenario is to examine the true_positive_fraction and false_positive_fraction to determine the most desirable probability_threshold. This threshold is then used to predict class values in subsequent apply operations. For example, to identify positively predicted cases in probability rank order from an apply result table, given a probability_threshold:

select case_id_column_name from apply_result_table_name where probability > 
probability_threshold order by probability DESC;

There are two procedures one might use to identify the most desirable probability_threshold. One procedure applies when the relative cost of positive class versus negative class prediction errors are known to the user. The other applies when such costs are not well known to the user. In the first instance, one can apply the relative costs to the ROC table to compute the minimum cost probability_threshold. Suppose the relative cost ratio, Positive Class Error Cost / Negative Class Error Cost = 20. Then execute a query like:

WITH cost AS (
  SELECT probability_threshold, 20 * false_negatives + false positives cost 
    FROM ROC_table 
  GROUP BY probability_threshold), 
    minCost AS (
      SELECT min(cost) minCost 
        FROM cost)
      SELECT max(probability_threshold)probability_threshold 
        FROM cost, minCost 
    WHERE cost = minCost;

If relative costs are not well known, the user simply scans the values in the table (in sorted order) and makes a determination about which of the displayed trade-offs (misclassified positives versus misclassified negatives) is most desirable:

select * from ROC_table order by probability_threshold

Examples

Assume that you have built a classification model census_model using the SVM algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test',
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute the receiver operating characterestics from
 * the two data streams, also providing a cost matrix
 * as input.
 */
DBMS_DATA_MINING.compute_roc (
  accuracy => v_accuracy,
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  roc_table_name => 'census_roc',
  cost_matrix_table_name => 'census_cost_matrix');
END;
/

-- View the ROC results using Oracle SQL
SELECT *
  FROM census_roc;

DROP_MODEL Procedure

This procedure drops an existing mining model from the user's schema.

Syntax

DBMS_DATA_MINING.DROP_MODEL (model_name IN VARCHAR2);

Parameters

Table 23-25 DROP_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Usage Notes

You can use DROP_MODEL to drop an existing mining model.

The behavior of the DROP_MODEL is similar to a SQL DDL DROP operation. It blocks RENAME_MODEL and CREATE_MODEL operations. It does not block or block on APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to drop the model during that time, the DROP will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on Query operations.

Examples

Assume the existence of a model census_model. The following example shows how to drop this model.

BEGIN
  DBMS_DATA_MINING.drop_model(model_name => 'census_model');
END;
/

EXPORT_MODEL Procedure

This procedure exports specified data mining models into a dump file set.

Syntax

DBMS_DATA_MINING.EXPORT_MODEL (
      filename          IN VARCHAR2,
      directory         IN VARCHAR2,
      model_filter      IN VARCHAR2 DEFAULT NULL,
      filesize          IN VARCHAR2 DEFAULT NULL,
      operation         IN VARCHAR2 DEFAULT NULL,
      remote_link       IN VARCHAR2 DEFAULT NULL,
      jobname           IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-26 EXPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set must be unique in the directory. A dump file set may contain one or more files. The number of files in a file set is determined by the size of the exporting data and the specified file size (see `filesize`). If only one file is created, it is named `filename>01.dmp`. If multiple files are created, they are named sequentially as `filename01.dmp`, `filename02.dmp`, and so forth.
`directory`	Name of the directory object where the dump file set is to be located. The directory object must be created before export. You must have `WRITE` privileges on the directory object and the corresponding file system directory.
`model_filter`	Filter that specifies models to be exported. See Usage Notes for details.
`filesize`	Size of the dump file; may be specified in bytes, kilobytes (K), megabytes (M), or gigabytes (G). Defaults to 50 MB.
`operation`	Use one of the literals '`EXPORT'/'ESTIMATE`'; defaults to `'EXPORT'`;.
`remote_link`	A valid DB link or `NULL`. When a DB link is provided, exporting DM models from a remote database is allowed. Defaults to `NULL` for local operations, namely, exporting models in the current database.
`jobname`	Name of the export job, limited to 30 characters. Must be unique to that schema, that is, there is no active job using the same name. If not supplied, a default name is provided in the form of `usernam_exp_datapump_job_id><dm_seq>`, for example, `SCOTT_exp_13410`. A log file is created in the dump file with name `jobname.log`.

Usage Notes

Use EXPORT_MODEL to export all or specific data mining models from the source. This procedure creates a dump file set that includes one or more files. The location of the dump files is specified by parameter directory, which is the name of a directory object created before this procedure is called. The user must have WRITE privileges on this directory object. The dump file name must be unique. When the export operation completes successfully, the dump file name is automatically expanded to filename01.dmp even if there is only one file in the dump set.

A log file is created for every successful export operation in the directory mapped by directory. If jobname is specified, the log is named jobname.log. If jobname is set to NULL (the default), the log is named as USERNAME_exp_nnnn.log, where nnnn is a number. If jobname is provided, it must be unique. Parameter model_filter is used to specify models to be exported; its use is indicated in the following table:

Table 23-27 Table MODEL_FILTER for Export

Value	Meaning	Notes
`NULL`	Default. Export all models from the user schema.	none
`ALL`	Export all models from the user schema.	Same as `NULL`; useful to improve code readability.
A valid `WHERE` clause, such as:	Specify which models are to be exported.	Any string that can be appended to `'select name from dm_user_models where'` and make it a valid SQL query.
`'name= "super_model"`'	Export the named model.	none
`'name IN ("model1","model2","model3")'`	Export 3 named models.	none
`'ALGORITHM_NAME= "NAIVE_BAYES"'`	Export all NB models.	none
`'FUNCTION_NAME="CLASSIFICATION"'`	Export all classification models.	none
`'name LIKE "NB\_MODEL%"ESCAPE"\"'`	Export all models whose names start with "`NB_MODEL`".	none

EXPORT_MODEL is not mutually exclusive with the DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. In other words, if an export operation is currently underway and the model is dropped at that time, then the results in the exported dump file are unpredictable.

Examples

The following example shows exporting one model from the current user schema into a dump file. The dump file will be located in operating system directory /home/models, which is mapped to a directory object DM_DUMP. The directory object DM_DUMP must be created before executing the sample, and the user must be granted WRITE privileges on it.

DECLARE
  job_name                              VARCHAR2(32) := 'model_exp_001';
  filename                              VARCHAR2(32);
BEGIN     
  filename := job_name;
  DBMS_DATA_MINING.export_model(
                      filename =>filename,
                      directory =>'DM_DUMP',
                      metadata_filter => 'name=''NB_MODEL_01''',
                      remote_link => NULL,
                      filesize => '30M',
                      operation => 'EXPORT',
                      job_name => job_name);
  dbms_output.put_line(
                      'Export_model '||job_name||' completed sucessfully!');
END;
/

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model. The rows are an enumeration of the rules generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_ASSOCIATION_RULES (
   model_name            IN VARCHAR2)
 RETURN DM_Rules pipelined;

Parameters

Table 23-28 GET_ASSOCIATION_RULES Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-29 GET_ASSOCIATION RULES Function Return Values

Return Value Description

Return Value	Description
`DM_Rules`	Represents a set of rows with schema: (rule_id INTEGER, antecedent DM_Predicates, consequent DM_Predicates, rule_support NUMBER, rule_confidence NUMBER)
`DM_Predicates`	Is a nested table with schema: (attribute_name VARCHAR2(30), conditional_operatoar CHAR(2), attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

DM_Rules

Represents a set of rows with schema:

(rule_id          INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER)

DM_Predicates

Is a nested table with schema:

(attribute_name         VARCHAR2(30),
conditional_operatoar   CHAR(2),
attribute_num_value     NUMBER,
attribute_str_value     VARCHAR2(4000),
attribute_support       NUMBER,
attribute_confidence    NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

rule_id           INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER

DM_Predicates is a collection of DM_Predicate objects. When un-nested, each object maps to a row of the form:

attribute_name         INTEGER,
conditional_operator   CHAR(2),
attribute_num_value    NUMBER,
attribute_str_value    VARCHAR2,
attribute_support      NUMBER,
attribute_confidence   NUMBER

The significance of piped output is that each row is materialized by the table function as soon as it is read from model storage, without any latency or wait for the generation of the complete DM_Rules object. All GET operations use pipelining. For more information on pipelined, parallel table functions, consult the PL/SQL User's Guide and Reference.

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates an Association model build followed by an invocation of GET_ASSOCIATION_RULES table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE census_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build an AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'census_settings');
END;
/

-- View the (unformatted) rules from SQL/Plus
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM GET_ASSOCIATION_RULES('market_model'));

-- see ardemo.sql for retrieving formatted rules
/

GET_DEFAULT_SETTINGS Function

This table function returns all the default settings for all mining functions and algorithms supported in the DBMS_DATA_MINING package.

Syntax

DBMS_DATA_MINING.GET_DEFAULT_SETTINGS;

Return Values

Table 23-30 GET_ASSOCIATION RULES Function Return Values

Return Value Description

Return Value	Description
`DM_Model_Settings`	Represents a set of rows with schema: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

DM_Model_Settings

Represents a set of rows with schema:

(setting_name VARCHAR2(30),
setting_value  VARCHAR2(128))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(setting_name VARCHAR2(30),
setting_value  VARCHAR2(128))

This function is particularly useful if you do not know what settings are associated with a particular function or algorithm, and you want to override some or all of them.

Examples

For example, if you want to override some or all of k-Means clustering settings, you can create a settings table as shown, and update individual settings as required.


BEGIN
  CREATE TABLE mysettings AS
  SELECT * 
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
   WHERE setting_name LIKE 'KMNS%';
  -- now update individual settings as required
  UPDATE mysettings
     SET setting_value = 0.02
   WHERE setting_name = dbms_data_mining.kmns_min_pct_attr_support;
END;
/

GET_FREQUENT_ITEMSETS Function

This table function returns a set of rows that represent the frequent itemsets from an Association model. The rows are an enumeration of the frequent itemsets generated during the creation of the model. For a detailed description of frequent itemsets, consult Oracle Data Mining Concepts.

Syntax

DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS (
    model_name        IN VARCHAR2)
  RETURN DM_ItemSets pipelined;

Parameters

Table 23-31 GET_MODEL_DETAILS_ABN Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-32 GET_FREQUENT_ITEMSETS Function Return Values

Return Value Description

Return Value	Description
`DM_ItemSets`	Represents a set of rows with schema: (itemsets_id NUMBER, items DM_items, support NUMBER, number_of_items NUMBER)

DM_ItemSets

Represents a set of rows with schema:

(itemsets_id      NUMBER,
items             DM_items,
support           NUMBER,
number_of_items   NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(itemsets_id      NUMBER,
items             DM_items,
support           NUMBER,
number_of_items   NUMBER)

DM_Items is a nested table of VARCHAR2 strings representing individual item names.

The examples shown in this section describe how to un-nest the values from each of the columns discussed in the preceding sections.

Examples

The following example demonstrates an Association model build followed by an invocation of GET_FREQUENT_ITEMSETS table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS

    SELECT *

  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE census_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build a AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'census_settings');
END;
/

-- View the (unformatted) Itemsets from SQL/Plus
SELECT itemset_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model'));

GET_MODEL_DETAILS_ABN Function

This table function returns a set of rows that provide the details of an Adaptive Bayes Network model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN (
    model_name         IN VARCHAR2)
  RETURN DM_ABN_Details

Parameters

Table 23-33 GET_MODEL_DETAILS_ABN Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-34 GET_MODEL_DETAILS_ABN Function Return Values

Return Value Description

Return Value	Description
`DM_ABN_Details`	Represents a set of rows with schema: (attribute_name VARCHAR2(30), attribute_num_value NUMBER attribute_str_val VARCHAR2(4000), probability NUMBER, conditionals DM_Conditionals)
`DM_Conditionals`	Represents a set of rows `DM_Conditional` with schema: (attribute_name VARCHAR2(30), attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), conditional_probability NUMBER)

DM_ABN_Details

Represents a set of rows with schema:

(attribute_name       VARCHAR2(30),
attribute_num_value   NUMBER
attribute_str_val     VARCHAR2(4000),
probability           NUMBER,
conditionals          DM_Conditionals)

DM_Conditionals

Represents a set of rows DM_Conditional with schema:

(attribute_name           VARCHAR2(30),
attribute_num_value       NUMBER,
attribute_str_value       VARCHAR2(4000),
conditional_probability   NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS.

Usage Notes

The table function pipes out rows with the schema:

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

This function returns details only for 'single feature' ABN model.

Examples

The following example demonstrates an ABN model build followed by an invocation of GET_MODEL_DETAILS_ABN table function from Oracle SQL.

BEGIN
  -- prepare a settings table to override default algorithm
  CREATE TABLE abn_settings (setting_name VARCHAR2(30),
  setting_value 
VARCHAR2(128));
  INSERT INTO abn_settings VALUES (dbms_data_mining.algo_name,
    dbms_data_mining.algo_adaptive_bayes_network);
 
  -- create a model
  DBMS_DATA_MINING.CREATE_MODEL (
    model_name => 'abn_model',
    function => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name => 'abn_build',
    case_id_column_name => 'id',
    target_column_name => NULL,
    settings_table_name => 'abn_settings');
END;
/
-- View the (unformatted) results from SQL/Plus
SELECT *
    FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN('abn_model');

GET_MODEL_DETAILS_KM Function

This table function returns a set of rows that provide the details of a k-Means clustering model. The rows are an enumeration of the clustering patterns generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_KM (
  model_name         IN VARCHAR2)
RETURN DM_Clusters pipelined;

Parameters

Table 23-35 GET_MODEL_DETAILS_KM Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-36 GET_MODEL_DETAILS_KM Function Return Values

Return Value	Description
`DM_Clusters`	Represents a set of rows of `DM_Cluster` with schema: (id INTEGER, record_count NUMBER, parent NUMBER, tree_level NUMBER, dispersion NUMBER, child DM_Children) centroid DM_Centroids, histogram DM_Histogram, rule DM_Rule)
`DM_Children`	Is a nested table of `DM_Child` with schema: (id NUMBER)
`DM_Centroids`	Is a nested table of `DM_Centroid` with schema: (attribute_name VARCHAR2(30) mean NUMBER, mode_value VARCHAR2(30), variance NUMBER);
`DM_Histograms`	Is a nested table of DM_Histogram_bin with schema: (attribute_name VARCHAR2(30), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER)
`DM_Rule`	Is an object with schema: (rule_id INTEGER, antecedent DM_Predicates, consequent DM_Predicates, rule_support NUMBER, rule_confidence NUMBER)
`DM_Predicates`	Is a nested table of `DM_Predicate` with schema: (attribute_name VARCHAR2(30), conditional_operator CHAR(2), attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(id               INTEGER,
record_count      NUMBER,
parent            NUMBER,
tree_level        NUMBER,
dispersion        NUMBER,
child             DM_Children,
centroid          DM_Centroids,
histogram         DM_Histograms,
rule              DM_Rule)

Each DM_Children value is a nested table of id's.

Each DM_Centroid value is itself as nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30)
mean              NUMBER,
mode_value        VARCHAR2(30),
variance          NUMBER);

Each DM_Histogram value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30),
bin_id            NUMBER,
lower_bound       NUMBER,
upper_bound       NUMBER,
label             VARCHAR2(4000),
count             NUMBER)

Each DM_Rule value is an object with two nested DM_Predicate columns:

(rule_id          INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER)

DM_Predicates is a collection of DM_Predicate objects. When un-nested, each object maps to a row of the form:

(attribute_name        VARCHAR2(30),
conditional_operator   CHAR(2) ,
attribute_num_value    NUMBER,
attribute_str_value    VARCHAR2(4000),
attribute_support      NUMBER,
attribute_confidence   NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates a k-Means clustering model build followed by an invocation of GET_MODEL_DETAILS_KM table function from Oracle SQL.

BEGIN
-- create a settings table
UPDATE cluster_settings
   SET setting_value = 3 
 WHERE setting_name = DBMS_DATA_MINING.kmeans_block_growth;

/* build a k-Means clustering model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'eight_clouds',
  function => DBMS_DATA_MINING.CLUSTERING,
  data_table_name => 'eight_clouds_build',
  case_id_column_name => 'id',
  target_column_name => NULL,
  settings_table_name => 'cluster_settings');
END;
/

-- View the (unformatted) rules from SQL/Plus
SELECT id, record_count, parent, tree_level, dispersion,
       child, centroid, histogram, rule
  FROM TABLE(DBMS_DATA_MINING_GET_MODEL_DETAILS_KM('eight_clouds'));

GET_MODEL_DETAILS_NB Function

This table function returns a set of rows that provide the details of a Naive Bayes model. The rows are an enumeration of the patterns generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NB (
   model_name      IN       VARCHAR2)
 RETURN DM_NB_Details pipelined;

Parameters

Table 23-37 GET_MODEL_DETAILS_NB Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-38 GET_MODEL_DETAILS_NB Function Return Values

Return Value Description

Return Value	Description
`DM_NB_Details`	Represents a set of rows `DM_NB_Detail` with schema: (target_attr_name VARCHAR2(30), target_attr_num_value NUMBER, target_attr_str_value VARCHAR2(4000), prior_probability NUMBER, conditionals
`DM_Conditionals`	Represents a set of rows `DM_Conditional` with schema: (attribute_name VARCHAR2(30), attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), conditional_probability NUMBER)

DM_NB_Details

Represents a set of rows DM_NB_Detail with schema:

(target_attr_name       VARCHAR2(30),
target_attr_num_value   NUMBER,
target_attr_str_value   VARCHAR2(4000),
prior_probability       NUMBER,
conditionals

DM_Conditionals

Represents a set of rows DM_Conditional with schema:

(attribute_name          VARCHAR2(30),
attribute_num_value      NUMBER,
attribute_str_value      VARCHAR2(4000),
conditional_probability  NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(target_attr_name       VARCHAR2(30),
target_attr_num_value   NUMBER,
target_attr_str_value   VARCHAR2(4000),
prior_probability       NUMBER,
conditionals            DM_Conditionals)

Each DM_Conditionals value is itself as nested table that, when un-nested, can return rows of type DM_Conditional, of the form:

(attribute_name          VARCHAR2(30),
attribute_num_value      NUMBER,
attribute_str_value      VARCHAR2(4000),
conditional_probability  NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details as shown in this example.

-- You can view the Naive Bayes model details in many ways
-- Consult the Oracle Application Developer's Guide -
-- Object-Relational Features for different ways of 
-- accessing Oracle Objects.

-- View the (unformatted) details from SQL/Plus
SELECT attribute_name, attribute_num_value, attribute_str_value,
       prior_probability, conditionals,
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('census_model');

See nbdemo.sql for generation of formatted rules.

GET_MODEL_DETAILS_NMF Function

This table function returns a set of rows that provide the details of a Non-Negative Matrix Factorization model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF (
   model_name        IN        VARCHAR2)
 RETURN DM_NMF_Details pipelined;

Parameters

Table 23-39 GET_MODEL_DETAILS_NMF Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-40 GET_MODEL_DETAILS_NMF Function Return Values

Return Value Description

Return Value	Description
`DM_NMF_Feature_set`	Represents a set of rows of `DM_NMF_Feature` with schema: (feature_id INTEGER, attribute_set DM_NMF_Attribute_Set)
`DM_NMF_Attribute_Set`	Is a nested table of `DM_NMF_Attribute` with schema: (attribute_name VARCHAR2, attribute_value VARCHAR2(4000), coefficient NUMBER)

DM_NMF_Feature_set

Represents a set of rows of DM_NMF_Feature with schema:

(feature_id     INTEGER,
attribute_set   DM_NMF_Attribute_Set)

DM_NMF_Attribute_Set

Is a nested table of DM_NMF_Attribute with schema:

(attribute_name    VARCHAR2,
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(feature_id      INTEGER,
attribute_set    DM_NMF_Attribute_Set)

Each DM_NMF_Feature_Set value is itself as nested table that, when un-nested, can return rows of the form:

(attribute_name)   VARCHAR2,
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

Assume you have built an NMF model called my_nmf_model. You can retrieve model details as shown:

--View (unformatted) details from SQL/Plus 
SELECT feature_id, attribute_set 
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF(
        'my_nmf_model'));

GET_MODEL_DETAILS_SVM Function

This table function returns a set of rows that provide the details of a Support Vector Machines model. This is applicable only for classification or regression models built using a linear kernel. For any other kernel, the table function returns ORA-40215.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM (
  model_name      IN       VARCHAR2)
 RETURN DM_SVM_Linear_Coeff pipelined;

Parameters

Table 23-41 GET_MODEL_DETAILS_SVM Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-42 GET_MODEL_DETAILS_SVM Function Return Values

Return Value Description

Return Value	Description
`DM_SVM_Linear_Coeff_Set`	Represents a set of rows of `DM_SVM_Linear_Coeff` with schema: (class VARCHAR2(4000), attribute_set DM_SVM_Attribute_Set)
`DM_SVM_Attribute_Set`	Is a nested table of `DM_SVM_Attribute` with schema: (attribute_name VARCHAR2(30), attribute_value VARCHAR2(4000), coefficient NUMBER)

DM_SVM_Linear_Coeff_Set

Represents a set of rows of DM_SVM_Linear_Coeff with schema:

(class             VARCHAR2(4000),
attribute_set     DM_SVM_Attribute_Set)

DM_SVM_Attribute_Set

Is a nested table of DM_SVM_Attribute with schema:

(attribute_name    VARCHAR2(30),
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

Usage Notes

The table function pipes out rows with the schema:

(class           VARCHAR2(4000), 
attribute_set    DM_SVM_Attribute_Set)

class represents classification target values. For regression targets, class is NULL. For each classification target value for classification models or once only for regression models, the DM_SVM_Attribute_Set value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30),
attribute_value   VARCHAR2(4000),
coefficient       NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates an SVM model build followed by an invocation of GET_MODEL_DETAILS_SVM table function from Oracle SQL:

 -- Create SVM model
BEGIN
  dbms_data_mining.create_model(
    model_name => 'SVM_Clas_sample',
    mining_function => dbms_data_mining.classification,
    data_table_name => 'svmc_sample_build_prepared',
    case_id_column_name => 'id',
    target_column_name => 'affinity_card',
    settings_table_name => 'svmc_sample_settings');
END;
/
-- Display model details
SELECT *
  FROM TABLE(dbms_data_mining.get_model_details_svm('SVM_Clas_sample'))
ORDER BY class;

GET_MODEL_SETTINGS Function

This table function returns the list of settings that were used to build the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_SETTINGS(
   model_name           IN VARCHAR2)
 RETURN DM_Model_Settings pipelined;

Parameters

Table 23-43 GET_MODEL_SETTINGS Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-44 GET_MODEL_SETTINGS Function Return Values

Return Value Description

Return Value	Description
`DM_Model_Settings`	Represents a set of rows with schema: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

DM_Model_Settings

Represents a set of rows with schema:

(setting_name    VARCHAR2(30),
setting_value    VARCHAR2(128))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

You can use this table function to determine the settings that were used to build the model. This is purely for informational purposes only -- you cannot alter the model to adopt new settings.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model settings using Oracle SQL as follows:

SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SETTINGS('census_model'));

GET_MODEL_SIGNATURE Function

This table function returns the model signature, which is a set of rows that provide the name and type of each attribute required as input to the APPLY operation.

The case identifier is not considered a mining attribute. For classification and regression models, the target attribute is also not considered part of the model signature.

Syntax

DBMS_DATA_MINING.GET_MODEL_SIGNATURE(
  model_name           IN VARCHAR2)
RETURN DM_Model_Signature pipelined;

Parameters

Table 23-45 GET_MODEL_SIGNATURE Function Parameters

Parameter	Description
`model_name`	Name of the model (see Rules and Limitations).

Return Values

Table 23-46 GET_MODEL_SIGNATURE Function Return Values

Return Value Description

Return Value	Description
`DM_Model_Signature`	Represents a set of rows with schema: (attribute_name VARCHAR2(30), attribute_type VARCHAR2(106))

DM_Model_Signature

Represents a set of rows with schema:

(attribute_name    VARCHAR2(30),
attribute_type     VARCHAR2(106))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

You can use this table function to get the list of attributes used for building the model. This is particularly helpful to describe a model when an APPLY operation on test or scoring data is done a significant time period after the model is built, or after it is imported into another schema.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details using Oracle SQL as follows:

SELECT attribute_name, attribute_type
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SIGNATURE('census_model');

IMPORT_MODEL Procedure

This procedure imports specified data mining models from a dump file set or from a remote database.

Syntax

DBMS_DATA_MINING.IMPORT_MODEL (
    filename             IN  VARCHAR2,
    directory            IN  VARCHAR2,
    model_names          IN  VARCHAR2 DEFAULT NULL,
    operation            IN  VARCHAR2 DEFAULT NULL,
    remote_link          IN  VARCHAR2 DEFAULT NULL,
    jobname              IN  VARCHAR2 DEFAULT NULL,
    schema_remap         IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-47 IMPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set. If there are multiple files in the set, you may use "%U" to specify the file set. See Usage Notes for details.
`directory`	Name of the directory object where the dump file is located. The directory object must be created before the import operation and you must be granted both `READ` and `WRITE` privileges.
`model_names`	Specify the names of models to be imported. See Usage Notes for details.
`operation`	Use one of the literals '`IMPORT'/'SQL_FILE';` defaults to `'IMPORT'`. When set to `'SQL_FILE'`, DDLs that create those database objects in the dump file set are generated in a text file in the dump directory. The DDL file is named `job_name``.sql`.
`remote_link`	A valid DB link or `NULL`. When a DB link is provided, this allows moving DM models from the remote database to the local database. Default value is `NULL` for importing into the current database from a dump file set.
`job_name`	Name of the import job, limited to 30 characters. It must be unique to that schema, that is, there is no active import job using the same name. If not supplied, a default name is provided in the form of `username_imp_nnnn`. For example, "`SCOTT_imp_76102"`.
`schema_remap`	Specify schema remapping. Must be specified when importing from a dump file set created by a different user. It must be specified in the form of '`FROM_NAME:TO_NAME'`. For example, to remap schema from `SCOTT` to `MARY`, set `schema_remap=>'SCOTT:MARY'`. You need `IMP_FULL_DATABASE` privileges to do schema remapping if you do not have a SYS role.

Usage Notes

Use IMPORT_MODEL to import all or specific data mining models from a dump or from a remote database by means of a DB link.

The dump file set must be created by expdp or EXPORT_MODEL procedure. The dump files must be located in the directory mapped by the directory object. The user must have READ and WRITE privileges on the directory object. In order to import models from a dump created by another user, you must have IMP_FULL_DATABASE privilege or have SYS role.

When import operation completes successfully, a log is created in the directory mapped by directory object. If jobname is provided, the log is named jobname.log. If jobname is NULL (the default) the log is named username_imp_nnnn.log, where nnnn is a number. If the dump file set contains multiple files, you can use "%U" in the filename. For example, if your dump file set contains 3 files, archive01.dmp, archive02.dmp, and archive03.dmp, you may specify filename=>'archive%U'.

Use the parameter model_names to specify which models to import. The use of model_names is dependent on the operation, whether the source is a dump file set or a remote database. When importing from dump files, model names are the sole useful reference. In this case, you can either assign model_names with comma-delimited model names or a WHERE clause with names explicitly specified. When importing from a remote database, model_names can be specified in the same fashion as in the EXPORT_MODEL procedure. Details are listed in Table 23-48.

Table 23-48 Table MODEL_FILTER for Import

Value	Source*	Meaning	Notes
`NULL`	`dump/remote`	Default. Import all models from the user schema.	---
`'ALL'`	`dump/remote`	Import all models from the user schema.	Same as `NULL`; useful to improve code readability.
Comma-separated model names	`dump/remote`	Import named models.	Examples: `'mymodel' or 'model1, model2, model3`
A valid `WHERE` clause, such as:	----	Specify which models are to be imported.	Any string that can be appended to `'select name from dm_user_models where'` and make it a valid SQL query.
`'name=''super_model'''`	`dump/remot`e	Import `super_mode`l.	----
`'name IN ("model1"`,`"model2"`, `"model3")'`	`dump/remote`	Import model `model1`,`model2`, and `model3`.	----
`'ALGORITHM_NAME="NAIVE_BAYES"'`	`remote`	Import all NB models.	----
`'FUNCTION_ NAME="CLASSIFICATION"'`	`remote`	Import all classification models.	----
`'name LIKE "NB\_ MODEL%"ESCAPE"\" '`	`remote`	Import all models whose name start with `'NB_MODEL'`.	----

IMPORT_MODEL is not mutually exclusive with the DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. In other words, if an import operation is underway, models with the same name are dropped at that time, and the results in the schema are unpredictable.

Examples

The following example shows user MARY imports all models from a dump file, model_exp_001.dmp, created by user SCOTT. The dump file is located in the file system directory mapped to a directory object called DM_DUMP. Note that if user MARY does not have IMP_FULL_DATABASE privileges, IMPORT_MODEL will raise an error.

-- import all models
declare
  file_name       VARCHAR2(40);
BEGIN
  file_name := 'model_exp_001.dmp';
  DBMS_DATA_MINING.import_model(
                 filename=>file_name,
                   directory=>'DM_DUMP',                             
                 schema_remap=>'SCOTT:MARY');
  dbms_output.put_line(
'DBMS_DATA_MINING.import_model of all models from SCOTT done!');
END;
/

RANK_APPLY Procedure

This procedure ranks the results of an APPLY operation based on a top-N specification for predictive and descriptive model results. For classification models, you can provide a cost matrix as input, and obtain the ranked results with costs applied to the predictions.

Syntax

DBMS_DATA_MINING.RANK_APPLY (
      apply_result_table_name        IN VARCHAR2,
      case_id_column_name            IN VARCHAR2,
      ranked_apply_result_tab_name   IN VARCHAR2,
      top_N                          IN INTEGER DEFAULT 1,
      cost_matrix_table_name         IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name       IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name        IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-49 RANK_APPLY Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes).
`case_id_column_name`	Name of the case identifier column. This must be the same as the one used for generating `APPLY` results.
`ranked_apply_result_tab_name`	Name of the table containing the ranked apply results.
`top_N`	Top N predictions to be considered from the `APPLY` results for precision recall computation.
`cost_matrix_table_name`	Name of the cost matrix table.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table.
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table.

Usage Notes

You can use RANK_APPLY to generate ranked apply results, based on a top-N filter and also with application of cost for predictions, if a cost matrix is provided.

The behavior of RANK_APPLY is similar to that of APPLY with respect to other DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. The procedure does not depend on the model; the only input of relevance is the apply results generated in a fixed schema table from APPLY.

The main intended use of RANK_APPLY is for the generation of the final APPLY results against the scoring data in a production setting. You can apply the model against test data using APPLY, compute various test metrics against various cost matrix tables, and use the candidate cost matrix for RANK_APPLY.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will be the same case identifier column as that of the apply results.

Classification Models -- NB, ABN, SVM

For numerical targets, the ranked results table will have the schema as shown:

(case_id       VARCHAR2/NUMBER,
prediction     NUMBER,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

For categorical targets, the ranked results table will have the following schema:

(case_id       VARCHAR2/NUMBER,
prediction     VARCHAR2,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

Clustering using k-Means

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the cluster identifier corresponding to a case, and the associated probability. Cost matrix is not considered here. The ranked results table will have the schema as shown, and contains the cluster ids ranked by top-N.

(case_id       VARCHAR2/NUMBER,
cluster_id     NUMBER,
probability    NUMBER,
rank           INTEGER)

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the feature identifier corresponding to a case, and the associated match quality. Cost matrix is not considered here. The ranked results table will have the schema as shown, and contains the feature ids ranked by top-N.

(case_id        VARCHAR2/NUMBER,
feature_id      NUMBER,
match_quality   NUMBER,
rank            INTEGER)

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on apply
 * data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
DBMS_DATA_MINING.rank_apply(
  apply_result_table_name => 'census_apply',
  case_id_column_name => 'person_id',
  ranked_apply_result_tab_name => 'census_ranked_apply',
  top_N => 3,
  cost_matrix_table_name => 'census_cost_matrix');
END;
/

-- View Apply Results
SELECT *
  FROM census_ranked_apply;

RENAME_MODEL Procedure

This procedure renames a mining model to a specified new name.

Syntax

DBMS_DATA_MINING.RENAME_MODEL (
     model_name            IN VARCHAR2,
     new_model_name        IN VARCHAR2);

Parameters

Table 23-50 RENAME_MODEL Procedure Parameters

Parameter	Description
`model_name`	Old name of the model (see Rules and Limitations).
`new_model_name`	New name of the model (see Rules and Limitations).

Usage Notes

You can use RENAME_MODEL to rename an existing mining model.

The behavior of the RENAME_MODEL is similar to a SQL DDL RENAME operation. It blocks DROP_MODEL and CREATE_MODEL operations. It does not block APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to rename the model during that time, the RENAME will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on query operations.

Examples

Assume the existence of a model census_model. The following example shows how to rename this model.

BEGIN
  DBMS_DATA_MINING.rename_model(
    model_name => 'census_model',
    new_model_name => 'census_new_model');
END;
/

23 DBMS_DATA_MINING

Using DBMS_DATA_MINING

Overview

Constants

Table 23-1 DBMS_DATA_MINING Constants Summary: Mining Function

Table 23-2 DBMS_DATA_MINING Constants Summary: Function Settings

Table 23-3 DBMS_DATA_MINING Constants Summary: Algorithm Settings

Table 23-4 DBMS_DATA_MINING Constants Summary: Adaptive Bayes Network

Table 23-5 DBMS_DATA_MINING Constants Summary: Naive Bayes

Table 23-6 DBMS_DATA_MINING Constants Summary: Support Vector Machines

Table 23-7 DBMS_DATA_MINING Constants Summary: Non-Negative Matrix Factorization

Table 23-8 DBMS_DATA_MINING Constants Summary: k-Means

Data Types

Table 23-9 DBMS_DATA_MINING Summary of Data Types

Exceptions

Table 23-10 DBMS DATA_MINING Errors Summary

User Views

Table 23-11 DBMS_DATA_MINING Summary of User Views

Operational Notes

Settings Table

Table 23-12 DBMS_DATA_MINING Function Settings

Table 23-13 Algorithm Settings for Adaptive Bayes Network

Table 23-14 Algorithm Settings for Naive Bayes

Table 23-15 Algorithm Settings for Support Vector Machines

Table 23-16 Algorithm Settings for Non-Negative Matrix Factorization

Table 23-17 Algorithm Settings for k-Means

Prior Probabilities Table

Cost Matrix Table

Summary of DBMS_DATA_MINING Subprograms

Table 23-18 DBMS_DATA_MINING Package Subprograms

APPLY Procedure

Syntax

Parameters

Table 23-19 APPLY Procedure Parameters

Usage Notes

Classification Algorithms

Regression using Support Vector Machines

Clustering using k-Means

Feature Extraction using NMF

Examples

CREATE_MODEL Procedure

Syntax

Parameters

Table 23-20 CREATE_MODEL Procedure Parameters

Usage Notes

Examples

COMPUTE_CONFUSION_MATRIX Procedure

Syntax

Parameters

Table 23-21 COMPUTE_CONFUSION_MATRIX Procedure Parameters

Usage Notes

Examples

COMPUTE_LIFT Procedure

Syntax

Parameters

Table 23-22 COMPUTE_LIFT Procedure Parameters

Usage Notes

Examples

COMPUTE_ROC Procedure

Syntax

Parameters

Table 23-23 COMPUTE_ROC Procedure Parameters

Usage Notes

Table 23-24 COMPUTE_ROC Output

Examples

DROP_MODEL Procedure

Syntax

Parameters

Table 23-25 DROP_MODEL Procedure Parameters

Usage Notes

Examples

EXPORT_MODEL Procedure

Syntax

Parameters

Table 23-26 EXPORT_MODEL Procedure Parameters

Usage Notes

Table 23-27 Table MODEL_FILTER for Export

Examples

GET_ASSOCIATION_RULES Function

Syntax

23
DBMS_DATA_MINING