Impact of Outlier Removal and Normalization Approach in Modified k-Means Clustering Algorithm


Vaishali Rajeev Patel and Rupa G. Mehta

Clustering techniques focus mainly on pattern recognition for further organizational design analysis: they find groups of data objects such that objects in a group are similar to one another and dissimilar from the objects in other groups. Preprocessing is important because real data contain noise, errors, inconsistencies, outliers, and missing variable values. Data preprocessing techniques such as cleaning, outlier detection, data integration, and transformation can be applied before clustering to achieve a successful analysis. Normalization is an important preprocessing step in data mining that standardizes the values of all variables from their dynamic range into a specific range. Outliers can significantly affect data mining performance, so outlier detection and removal is an important task in a wide variety of data mining applications. k-Means is one of the most well-known clustering algorithms, yet it suffers from major shortcomings: the number of clusters and the seed values must be chosen in advance, and it converges to local minima. This paper analyzes the performance of a modified k-Means clustering algorithm combined with data preprocessing techniques, including cleaning, normalization, and outlier detection, with automatic initialization of seed values, on datasets from the UCI dataset repository.

Keywords: Clustering, k-Means, Normalization Approach, Outlier Removal, Preprocessing
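
The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization or outlier detection method the paper uses, so everything here is an assumption made for illustration: min-max scaling to [0, 1], a per-feature z-score rule for outliers, the function names `min_max_normalize` and `remove_outliers`, and the `z_threshold` parameter. It is not the authors' implementation.

```python
from statistics import mean, pstdev


def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1] (assumed method)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, lo, hi in zip(row, lows, highs):
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            scaled.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(scaled)
    return normalized


def remove_outliers(rows, z_threshold=3.0):
    """Drop rows in which any feature lies more than `z_threshold`
    population standard deviations from its column mean (assumed rule)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    kept = []
    for row in rows:
        is_outlier = any(
            sd > 0 and abs(value - m) / sd > z_threshold
            for value, m, sd in zip(row, means, stds)
        )
        if not is_outlier:
            kept.append(row)
    return kept


# Hypothetical one-feature data with a single extreme value.
raw = [[1.0], [1.1], [0.9], [1.0], [1.05], [0.95], [50.0]]
cleaned = remove_outliers(raw, z_threshold=2.0)
prepared = min_max_normalize(cleaned)
```

In this sketch outliers are removed before scaling, because a single extreme value would otherwise compress all remaining values into a narrow part of the [0, 1] range. The order used in the paper is not stated in the abstract.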



ABOUT THE AUTHORS

Vaishali Rajeev Patel
Vaishali Rajeev Patel is an M.Tech. research scholar and an assistant professor in the Department of Computer Engineering at SVMIT, Bharuch, India.

Rupa G. Mehta
Rupa G. Mehta is a Ph.D. scholar and currently works as an associate professor in the Department of Computer Engineering at Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India. Her research interests include data mining, classification techniques, database management systems, data structures, and formal languages.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
