Enhanced Hierarchical Clustering for Genome Databases
Clustering techniques find interesting and previously unknown
patterns in large-scale data embedded in a large multidimensional
space and are applied to a wide variety of problems, such as
customer segmentation, biology, data mining, machine learning,
and geographical information systems. To be practical, clustering
algorithms need to scale efficiently with both the dimensionality
of the data sets and the size of the database.
Hierarchical clustering methods in particular are widely used to
find patterns in multidimensional data. In this paper, we design
an enhanced hierarchical clustering algorithm that scans the
dataset and calculates the distance matrix only once. Our main
contribution is to reduce running time, even when a large
database is analyzed. In addition, the results of hierarchical
clustering are represented as a binary tree, which makes the
grouping clear and makes clustered objects easy to locate. Our
algorithm determines the number of clusters with the help of a
cut distance and measures the quality of each candidate
clustering with a validation index in order to obtain the best
one; it does not require an initial parameter such as the number
of clusters.
Keywords: Microarray, Hierarchical clustering, Gene expression data, Binary tree
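
The pipeline outlined in the abstract (one pass to build the distance matrix, a binary merge tree, a cut distance, and a validation index) can be illustrated with a minimal sketch. This is not the paper's algorithm: average linkage, the Dunn index as the validation index, and all names (Node, build_tree, cut, best_cut, the toy points) are assumptions chosen only for illustration.

```python
import math
from itertools import combinations


class Node:
    """Binary tree node: leaves hold one object, inner nodes one merge."""
    def __init__(self, members, left=None, right=None, height=0.0):
        self.members, self.left, self.right = members, left, right
        self.height = height  # linkage distance at which the merge happened


def distance_matrix(points):
    """Single scan over the data: every pairwise distance is computed once."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def linkage(a, b, d):
    """Average linkage (an assumption); reads only the stored matrix."""
    total = sum(d[i][j] for i in a.members for j in b.members)
    return total / (len(a.members) * len(b.members))


def build_tree(d):
    """Naive agglomerative merging until one root remains."""
    active = [Node([i]) for i in range(len(d))]
    while len(active) > 1:
        a, b = min(combinations(active, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        merged = Node(a.members + b.members, a, b, linkage(a, b, d))
        active = [n for n in active if n is not a and n is not b]
        active.append(merged)
    return active[0]


def cut(node, cut_distance):
    """Clusters are the subtrees whose merge height is within the cut."""
    if node.left is None or node.height <= cut_distance:
        return [sorted(node.members)]
    return cut(node.left, cut_distance) + cut(node.right, cut_distance)


def dunn_index(clusters, d):
    """Smallest between-cluster distance over largest cluster diameter."""
    sep = min(d[i][j] for a, b in combinations(clusters, 2)
              for i in a for j in b)
    diam = max((d[i][j] for c in clusters for i, j in combinations(c, 2)),
               default=0.0)
    return math.inf if diam == 0 else sep / diam


def merge_heights(node):
    if node.left is None:
        return []
    return merge_heights(node.left) + merge_heights(node.right) + [node.height]


def best_cut(root, d):
    """Try each merge height as a cut distance; keep the best-scoring one.
    Trivial partitions (one cluster, or all singletons) are skipped."""
    scored = []
    for h in sorted(set(merge_heights(root))):
        clusters = cut(root, h)
        if 1 < len(clusters) < len(d):
            scored.append((dunn_index(clusters, d), h, clusters))
    return max(scored, key=lambda s: s[0]) if scored else None


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
d = distance_matrix(points)
root = build_tree(d)
score, cut_distance, clusters = best_cut(root, d)
print(len(clusters), "clusters at cut distance", round(cut_distance, 3))
```

In this sketch the number of clusters is never passed in; it falls out of whichever cut distance scores best on the validation index. The merge loop is deliberately naive and recomputes linkages from the stored matrix on every pass, so it shows the structure of the approach and not the running-time improvement claimed in the abstract.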