Parallel and Scalable Map Reduce and Pipeline Tree Classifiers for Massive Dataset Using Map Reduce and Data Flow Pipeline
One of the important research areas today is the classification
of Big Data. Although many traditional classification methods
exist, extending them to Big Data is challenging. The decision
tree classifier is one of the effective traditional classification
techniques. The combination of Hadoop and Map Reduce has been
adopted by many researchers, both commercially and academically,
to process Big Data. More recently, the Google Cloud Dataflow
paradigm has entered the Big Data landscape, augmenting the
earlier systems with stream processing. This paper presents two
algorithms, one based on Map Reduce and one on Google Cloud
Dataflow, for implementing decision trees for classification.
The performance of both algorithms is compared across various
parameters.
Keywords: Decision Tree, Hadoop Distributed File System, Map Reduce Classifier, Pipeline Tree Classifier, Google Dataflow
ABOUT THE AUTHORS
A. M. James Raj
A. M. James Raj received an M.Sc. in Computer Science from Bharathidasan University, Trichy, an M.Phil. from Alagappa University, Karaikudi, Tamil Nadu, India, and an M.Tech. in Information Technology from AAI–DU, Allahabad, India. He also cleared the National Eligibility Test (NET), a qualifying examination for college and university professors conducted by the central Government of India. He is presently working as an associate professor in computer science and applications at Pope John Paul II College of Education, affiliated to Pondicherry University. His research interests include data mining, in particular web mining, and databases.
J. Prema
J. Prema received B.Tech. and M.Tech. degrees from Pondicherry University and is currently working at Tata Consultancy Services, Chennai.
P. Xavier
P. Xavier obtained his Ph.D. degree from Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya University, Kanchipuram and is currently working as a Professor of Computer Applications at Sacred Heart College, Tirupattur, Tamil Nadu, India.
F. Sagayaraj Francis
F. Sagayaraj Francis obtained his Ph.D. degree from Pondicherry University and is currently working as a Professor of Computer Science and Engineering at Pondicherry Engineering College, Pondicherry, India.