February 1, 2013
Conference Paper

DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree

Abstract

Itemset mining has recently focused on the discovery of frequent itemsets from high-dimensional datasets, which have relatively few rows and a much larger number of items. Because running time increases exponentially with average row length, most conventional algorithms are impractical on such datasets. Unfortunately, large cardinality closed itemsets are likely to be more informative than small cardinality closed itemsets in this type of dataset. This paper proposes an approach, called DisClose, to extract large cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a memory-efficient Compact Row-Tree data structure to represent itemsets during the search process. The search strategy explores the transposed representation of the dataset. Large cardinality itemsets are enumerated first, followed by smaller ones. In addition, we use a minimum cardinality threshold to further reduce the search space. Experimental results show that DisClose can complete the extraction of colossal closed itemsets in the considered dataset, even for low support thresholds. The algorithm discovers closed itemsets immediately, without needing to check whether each new closed itemset has previously been found.
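To make the terms in the abstract concrete, the following is a minimal brute-force sketch of row enumeration over a transposed dataset. It is not the DisClose algorithm and does not use a Compact Row-Tree: it simply intersects every group of at least `min_support` rows, keeps the resulting itemset if it has at least `min_cardinality` items, and reads its support from the transposed (item to row-id) table. The function names, parameter names, and toy data are assumptions made for illustration. Unlike DisClose, this sketch relies on a dictionary to absorb duplicate discoveries.

```python
from itertools import combinations


def transpose(rows):
    """Build the transposed table: item -> set of row ids containing it."""
    tidsets = {}
    for rid, items in enumerate(rows):
        for item in items:
            tidsets.setdefault(item, set()).add(rid)
    return tidsets


def closed_itemsets(rows, min_support, min_cardinality):
    """Return {closed itemset: support} by brute-force row enumeration.

    The intersection of any group of rows is a closed itemset, so every
    itemset produced here is closed. This is exponential in the number
    of rows and only suitable for tiny examples.
    """
    row_sets = [frozenset(r) for r in rows]
    tidsets = transpose(rows)
    found = {}
    # Small row groups come first; they tend to yield the larger itemsets.
    for k in range(min_support, len(row_sets) + 1):
        for group in combinations(range(len(row_sets)), k):
            items = frozenset.intersection(*(row_sets[i] for i in group))
            # Minimum cardinality threshold: discard small itemsets.
            if not items or len(items) < min_cardinality:
                continue
            # Support = number of rows shared by all items (transposed view).
            support = len(set.intersection(*(tidsets[i] for i in items)))
            found[items] = support  # the dict absorbs repeated discoveries
    return found


# Hypothetical toy dataset: 4 rows over the items a, b, c, d.
toy_rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]
toy_result = closed_itemsets(toy_rows, min_support=2, min_cardinality=2)
```

On the toy data, the itemset {a, b} is found with support 3, while {a, b, c}, {a, b, d}, and {c, d} each have support 2; single-item results such as {c} are dropped by the cardinality threshold.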

Revised: April 19, 2013 | Published: February 1, 2013

Citation

Zulkurnain N.F., J.A. Keane, and D.J. Haglin. 2013. DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree. In PAKDD 2012 International Workshops: DMHM, GeoDoc, 3Clust, and DSDM, May 29 – June 1, 2012, Kuala Lumpur, Malaysia. Lecture Notes in Computer Science, edited by T. Washio and J. Luo, 7769, 141-156. Berlin: Springer-Verlag. PNNL-SA-85968. doi:10.1007/978-3-642-36778-6