PL/SQL Packages and Types Reference 10g Release 1 (10.1) Part Number B10802-01 |
|
|
View PDF |
The DBMS_FREQUENT_ITEMSET package enables frequent itemset counting. The two functions are identical except in the input cursor format difference.
This chapter contains the following topics:
This procedure counts all frequent itemsets given a cursor for input data which is in 'TRANSACTIONAL
' row format, support threshold, minimum itemset length, maximum itemset length, items to be included, items to be excluded. The result will be a table of rows in form of itemset, support, length, total number of transactions.
In 'TRANSACTIONAL
' row format, each transaction is spread across multiple rows. All the rows of a given transaction have the same transaction id, and each row has a different item id. Combining all of the item ids which share a given transaction id results in a single transaction.
DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL ( tranx_cursor IN SYSREFCURSOR, support_threshold IN NUMBER, itemset_length_min IN NUMBER, itemset_length_max IN NUMBER, including_items IN SYS_REFCURSOR DEFAULT NULL, excluding_items IN SYS_REFCURSOR DEFAULT NULL) RETURN TABLE OF ROW ( itemset [Nested Table of Item Type DERIVED FROM tranx_cursor], support NUMBER, length NUMBER, total_tranx NUMBER);
Applications must predefine a nested table type of the input item type and cast the output itemset into this predefined nested table type before further processing, such as loading into a table.
Suppose that the input table tranx_table_in
looks as follows:
(1, 'apple') (1, 'banana') (2, 'apple') (2, 'milk') (2, 'banana') (3, 'orange')
and the user is trying to find itemsets that satisfy a support-threshold of 60% and have the itemset-length greater than 1 (namely, (apple, banana)).
The output of this function would contain the following output row:
itemset=('apple','banana'), support=2, length=2, total_tranx=3
You need to create a nested table of item type before you submit a query to perform the frequent itemset counting. In this example, since item is of VARCHAR2
(30), you must create a nested table of VARCHAR2
(30):
CREATE TYPE fi_varchar_nt AS TABLE OF VARCHAR2(30); SELECT CAST(itemset as FI_VARCHAR_NT) itemset, support, length, total_tranx FROM table(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( cursor(SELECT tid, iid FROM tranx_table_in), 0.6, 2, 5, NULL, NULL));
Here is another example to illustrate how to include certain items and exclude certain items in the counting.
SELECT CAST(itemset as FI_VARCHAR_NT)itemset, support, length, total_tranx FROM table(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( CURSOR(SELECT tid, iid FROM tranx_table_in), 0.6, 2, 5, CURSOR(SELECT * FROM table(FI_VARCHAR_NT ('apple','banana','orange'))), CURSOR(SELECT * FROM table(FI_VARCHAR_NT('milk')))));
Using the including/excluding items parameter, you are able to further optimize the execution by ignoring itemsets that are not expected by application.
You can also use transactional output through collection unnesting:
SELECT bt.setid, nt.* FROM (SELECT cast(Itemset as FI_VARCHAR_NT) itemset, rownum setid FROM table( DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( CURSOR(SELECT tid, iid FROM tranx_table_in), 0.6, 2, 5, NULL, NULL))) bt, table(bt.itemset) nt;
If you want to use an insert statement to load frequent itemsets into a nested table, it is better to use the NESTED_TABLE_FAST_INSERT
hint for performance:
CREATE TABLE fq_nt (coll FI_VARCHAR_NT) NESTED TABLE coll STORE AS coll_nest; INSERT /*+ NESTED_TABLE_FAST_INSERT */ INTO fq_nt SELECT cast(itemset as FI_VARCHAR_NT) FROM table(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( cursor(SELECT tid, iid FROM tranx_table_in), 0.6, 2, 5, NULL, NULL));
Note that if you want to use the package inside a PL/SQL cursor, you must cast the return type of the table function:
CREATE TYPE fi_res AS OBJECT ( itemset FI_VARCHAR_NT, support NUMBER, length NUMBER, total_tranx NUMBER ); / CREATE TYPE fi_coll AS TABLE OF fi_res; / DECLARE cursor freqC is SELECT Itemset FROM table( CAST(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( cursor(SELECT tid, iid FROM tranx_table_in), 0.6, 2, 5, NULL, NULL) AS fi_coll)); coll_nt FI_VARCHAR_NT; num_rows int; num_itms int; BEGIN num_rows := 0; num_itms := 0; OPEN freqC; LOOP FETCH freqC INTO coll_nt; EXIT WHEN freqC%NOTFOUND; num_rows := num_rows + 1; num_itms := num_itms + coll_nt.count; END LOOP; CLOSE freqC; DBMS_OUTPUT.PUT_LINE('Totally ' || num_rows || ' rows ' || num_itms || ' items were produced.'); END; /
The purpose of this table function is to count all frequent itemsets given a cursor for input data which is in 'HORIZONTAL
' row format, support threshold, minimum itemset length, maximum itemset length, items to be included, items to be excluded. The result will be a table of rows in form of itemset, support, length, total transactions counted.
In 'HORIZONTAL
' row format, each row contains all of the item ids for a single transaction. Since all of the items come together, no transaction id is necessary.
The benefit of this table function is that if an application already has data in horizontal format, the database can skip the step of transforming rows that are in transactional format into horizontal format.
DBMS_FREQUENT_ITEMSET.FI_HORIZONTAL( tranx_cursor IN SYSREFCURSOR, support_threshold IN NUMBER, itemset_length_min IN NUMBER, itemset_length_max IN NUMBER, including_items IN SYS_REFCURSOR DEFAULT NULL, excluding_items IN SYS_REFCURSOR DEFAULT NULL) RETURN TABLE OF ROW ( itemset [Nested Table of Item Type DERIVED FROM tranx_cursor], support NUMBER, length NUMBER, total_tranx NUMBER);
Suppose you have a table horiz_table_in
.
horiz_table_in(iid1 VARCHAR2(30), iid2 VARCHAR2(30), iid3 VARCHAR2(30), iid4 VARCHAR2(30), iid5 VARCHAR2(30));
and the data in horiz_table_in looks as follows:
('apple', 'banana', NULL, NULL, NULL) ('apple', 'milk', 'banana', NULL, NULL) ('orange', NULL, NULL, NULL, NULL)
Suppose you want to find out what combinations of items is frequent with a given support threshold of 30%, requiring itemset containing at least one of ('apple','banana','orange'), but excluding any of ('milk') in any itemset. You use the following query:
CREATE TYPE fi_varchar_nt AS TABLE OF VARCHAR2(30); SELECT CAST(itemset as FI_VARCHAR_NT)itemset, support, length, total_tranx FROM table(DBMS_FREQUENT_ITEMSET.FI_HORIZONTAL( CURSOR(SELECT iid1, iid2, iid3, iid4, iid5 FROM horiz_table_in), 0.3, 2, 5, CURSOR(SELECT * FROM table(FI_VARCHAR_NT ('apple','banana','orange'))), CURSOR(SELECT * FROM table(FI_VARCHAR_NT('milk')))));