Mining High Dimensional Data
With advances in information technology, it is now easier and more economical to capture and store complex data at a fine level of detail. Such data is being generated at an enormous rate every day in almost every conceivable area, from astronomy to the biological sciences. Mining these high-dimensional datasets for useful patterns is an interesting and challenging problem. This research focuses on two core data mining tasks: clustering and outlier detection. Clustering finds groups of similar data points, whereas outlier detection identifies points that do not fit those groups, revealing the underlying dissimilarity structures.
A subset of dimensions is called a subspace, and data points may group together differently in different subspaces. It is therefore imperative to explore these subspaces for the underlying similarity and dissimilarity structures. However, a dataset with d dimensions has 2^d − 1 non-empty subspaces, and this exponential growth of the search space with dimensionality makes subspace mining very expensive. The performance of existing subspace clustering algorithms deteriorates drastically as the number of dimensions increases.
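To make the two points above concrete, here is a small illustrative sketch in Python. It enumerates every non-empty subspace of d dimensions (there are 2^d − 1 of them) and shows, on a hypothetical four-point toy dataset, that the same points can form different groups in different subspaces while forming no groups at all in the full space. The grouping rule (single-linkage with a hypothetical distance threshold `eps`, applied per dimension) is chosen purely for illustration and is not the method developed in this research.

```python
from itertools import combinations


def subspaces(d):
    """Yield every non-empty subset of the dimension indices 0..d-1."""
    for k in range(1, d + 1):
        yield from combinations(range(d), k)


def num_subspaces(d):
    """Size of the subspace search space for d dimensions: 2^d - 1."""
    return 2 ** d - 1


def groups(points, dims, eps):
    """Illustrative single-linkage grouping (an assumption, not the
    research's algorithm): two points are linked when their coordinates
    differ by at most eps in every dimension listed in dims."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if all(abs(points[i][k] - points[j][k]) <= eps for k in dims):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# The search space explodes with dimensionality.
for d in (10, 20, 30):
    print(d, num_subspaces(d))  # 1023, 1048575, 1073741823

# Hypothetical toy data: four points in three dimensions.
points = [(1.0, 9.0, 5.0), (1.1, 2.0, 5.1), (8.0, 9.1, 0.0), (8.1, 2.1, 0.1)]
print(groups(points, (0,), 0.5))       # [[0, 1], [2, 3]]
print(groups(points, (1,), 0.5))       # [[0, 2], [1, 3]]
print(groups(points, (0, 1, 2), 0.5))  # [[0], [1], [2], [3]]
```

Points 0 and 1 are close in dimension 0 but far apart in dimension 1, so the grouping that appears depends entirely on which subspace is examined. Checking every subspace exhaustively is exactly what becomes infeasible as d grows.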
The research aims to find a scalable and efficient solution to this problem by eliminating the need for expensive index structures and multiple database scans.
There is an emerging need for automated analysis of voluminous high-dimensional data (sometimes called Big Data) to uncover useful hidden patterns. This research is an endeavour to deal with the ‘Curse of Dimensionality’, and its outcomes may be of particular interest to stakeholders in the many domains that deal with such datasets.