Saturday 23rd of September 2017
 

Enhanced Technique for Data Cleaning in Text File


Arup Kumar Bhattacharjee, Atanu Mallick, Arnab Dey and Sananda Bandyopadhyay

Data cleaning is the process of correcting or removing erroneous data caused by contradictions, disparities, keying mistakes, missing bits, etc., to create consistent and reliable information. Text files are used to store simple information, and they too can contain dirty data. In this paper we provide a solution for cleaning up simple text files using a set of data cleaning processes. Although text files are used very often, no robust method exists for cleaning them. Data cleaning plays a crucial role in decision management, which depends on high-quality data, so we have implemented a set of methods to clean text files. Here we use text files to store data in tabular format; our system checks whether any errors exist and then tries to correct or remove them according to different algorithms.

Keywords: ETL, Data Dictionary, Metaphone, Date Validation Rules, Gender Validation Rules.
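As an illustration of the kind of rule-based checking the abstract describes, the following is a minimal sketch in Python, not the authors' implementation. It assumes a hypothetical three-column layout (name, date, gender), a hypothetical list of accepted date layouts, and a hypothetical gender dictionary; none of these details come from the paper. Rows that fail a date or gender rule are set aside rather than corrected, and the Metaphone-based matching named in the keywords is not attempted here.

```python
import csv
import io
from datetime import datetime

# Hypothetical gender dictionary: maps common variants to a canonical value.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

# Hypothetical list of accepted date layouts, tried in order.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")


def clean_date(value):
    """Return the date as YYYY-MM-DD, or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_gender(value):
    """Return a canonical gender label, or None if the value is unknown."""
    return GENDER_MAP.get(value.strip().lower())


def clean_rows(text, delimiter=","):
    """Split tabular text into cleaned rows and rejected rows."""
    cleaned, rejected = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Assumed layout: exactly three fields (name, date, gender).
        if len(row) != 3:
            rejected.append(row)
            continue
        name, date, gender = row
        date, gender = clean_date(date), clean_gender(gender)
        if date is None or gender is None:
            rejected.append(row)
        else:
            cleaned.append([name.strip(), date, gender])
    return cleaned, rejected
```

Calling `clean_rows` on the contents of a comma-separated text file returns the normalized rows alongside the rows that failed validation, so the rejected rows can be reviewed or passed to a separate correction step.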



ABOUT THE AUTHORS

Arup Kumar Bhattacharjee
Arup Kumar Bhattacharjee received his MCA degree from the University of Kalyani and his M.Tech from West Bengal University of Technology. He has contributed to 15 books and coauthored 2 publications. He is an Assistant Professor of Computer Application at RCC Institute of Information Technology, which is affiliated to West Bengal University of Technology, in Kolkata, West Bengal. His research interests include Software Engineering, Object Technology and Parallel Computing.

Atanu Mallick
Atanu Mallick completed his Bachelor's in Computer Science (Hons.) at Surendranath College under Calcutta University in Kolkata, West Bengal. He is currently pursuing his Master's in Computer Application at RCC Institute of Information Technology, which is affiliated to West Bengal University of Technology, in Kolkata, West Bengal, India.

Arnab Dey
Arnab Dey received his Bachelor's degree in Computer Application from Pailan College of Management & Technology, Kolkata, under West Bengal University of Technology. He is currently pursuing his Master's in Computer Application at RCC Institute of Information Technology, which is affiliated to West Bengal University of Technology, in Kolkata, West Bengal.

Sananda Bandyopadhyay
Sananda Bandyopadhyay completed her Bachelor's in Computer Application at Techno India (Salt Lake) under West Bengal University of Technology. She is now pursuing her Master's in Computer Application at RCC Institute of Information Technology, which is affiliated to West Bengal University of Technology, in Kolkata, West Bengal.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
