Parallelized text classification algorithm for processing large scale TCM clinical data with MapReduce

Abstract

The big data era brings both opportunities and challenges to data analytics research in TCM (Traditional Chinese Medicine): clinical records come from varied sources, symptom descriptions differ, large numbers of clinical symptoms are collected, and a single clinical record may carry more than one syndrome. Novel machine learning methods based on support vector machines (SVMs), ensemble learning, feature selection, and multi-label learning have been proposed to meet these challenges. On large-scale TCM clinical data, however, the accuracy of a multi-class classifier is lower, and SVM training is difficult to parallelize and computationally slow. To improve the efficiency of TCM data processing, we propose a parallelized text classification algorithm for processing large-scale TCM clinical data with MapReduce.
