Impact of Outlier Removal and Normalization Approach in Modified k-Means Clustering Algorithm
Clustering techniques focus mainly on pattern recognition for further organizational design analysis: they find groups of data objects such that objects within a group are similar to one another and dissimilar from objects in other groups. Data must be preprocessed because real data contain noise, errors, inconsistencies, outliers and missing values. Preprocessing techniques such as data cleaning, outlier detection, data integration and transformation can be applied before clustering to achieve a successful analysis. Normalization is an important preprocessing step in data mining that rescales the values of all variables from their differing original ranges into a common, specific range. Outliers can significantly degrade data mining performance, so outlier detection and removal is an important task in a wide variety of data mining applications. k-Means is one of the best-known clustering algorithms, yet it has major shortcomings: the number of clusters and the initial seed values must be specified in advance, and the algorithm can converge to a local minimum. This paper analyzes the performance of a modified k-Means clustering algorithm that combines data preprocessing (a cleaning method, a normalization approach and outlier detection) with automatic initialization of seed values, evaluated on datasets from the UCI repository.
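The abstract does not state which cleaning rule, outlier test, normalization formula or seeding scheme the modified k-Means uses. The following is a minimal Python sketch of such a pipeline under assumed choices only: records with missing values are dropped, outliers are removed with the interquartile-range (IQR) rule, each column is min-max normalized to [0, 1], and seeds are chosen by farthest-first traversal as a stand-in for the paper's own automatic initialization. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def clean(raw: Sequence[Sequence[Optional[float]]]) -> List[Row]:
    """Assumed cleaning step: drop any record that has a missing value (None)."""
    return [list(r) for r in raw if all(v is not None for v in r)]


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def remove_outliers_iqr(data: List[Row], k: float = 1.5) -> List[Row]:
    """Assumed outlier test: drop rows with any value outside
    [Q1 - k*IQR, Q3 + k*IQR] computed for that value's column."""
    bounds = []
    for col in zip(*data):
        s = sorted(col)
        q1, q3 = _quantile(s, 0.25), _quantile(s, 0.75)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in data if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]


def min_max_normalize(data: List[Row]) -> List[Row]:
    """Rescale each column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*data))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, lows, highs)]
        for r in data
    ]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(data: List[Row], k: int) -> List[Row]:
    """Assumed automatic seeding: start from the first record, then repeatedly
    add the record farthest from its nearest already-chosen seed."""
    seeds = [data[0]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def k_means(data: List[Row], k: int, max_iter: int = 100) -> Tuple[Optional[List[int]], List[Row]]:
    """Standard Lloyd iterations, stopping when no assignment changes."""
    centroids = farthest_first_seeds(data, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def run_pipeline(raw, k: int):
    """Cleaning -> outlier removal -> normalization -> k-Means."""
    data = min_max_normalize(remove_outliers_iqr(clean(raw)))
    labels, centroids = k_means(data, k)
    return labels, centroids, data
```

For example, `run_pipeline(raw, k=2)` on a small two-group dataset that also contains one incomplete record and one extreme record drops both before clustering. Outliers are removed before normalization here because min-max scaling is driven by each column's extremes: a single extreme value would otherwise squeeze the remaining records into a narrow band of [0, 1].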
Keywords: Clustering, k-Means, Normalization Approach, Outlier Removal, Preprocessing
ABOUT THE AUTHORS
Vaishali Rajeev Patel
M.Tech. Research Scholar and Assistant Professor in the Department of Computer Engineering at SVMIT, Bharuch, India
Rupa G. Mehta
Rupa G. Mehta is a Ph.D. scholar and currently works as an associate professor in the Department of Computer Engineering at Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India. Her research interests include data mining, classification techniques, database management systems, data structures and formal languages.