ABSTRACT

Supervised learning for data classification usually presumes that labelled data can be obtained easily. In practice, labelled data are often expensive and tedious to obtain, which limits many research problems. Active learning addresses this situation. In this paper, we propose a pool-based active learning method for a breast cancer data set, in which the user inspects a pool of unlabelled instances, selects a few samples from the pool and labels them. We suggest three active learning methods that use Support Vector Machines (SVMs) as the classifier, with three criteria for choosing uncertain samples from the pool: Entropy (Entrp), Smallest Margin (SM) and Least Confidence (LC). In addition, to reduce redundancy and remove unwanted features, we incorporate three feature selection algorithms, namely Fuzzy Preference-Based Rough Set (FPRS), Signal-to-Noise Ratio (SNR) and Neighbourhood Rough Set-based Feature Evaluation and Reduction (fs_con_N), to obtain the optimal number of features from the microarray data set.
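
As an illustration of the three uncertainty criteria named above, the following minimal Python sketch scores pool instances from predicted class probabilities and picks the most uncertain ones. It is not the implementation used in this work: the function names, the `query_indices` helper and the example probabilities are hypothetical, and it assumes the classifier (for example, an SVM with calibrated probability outputs) supplies one probability vector per pool instance.

```python
import math


def entropy_score(probs):
    """Shannon entropy of the class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def smallest_margin_score(probs):
    """Gap between the two most probable classes (smaller = more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def least_confidence_score(probs):
    """One minus the top class probability (higher = more uncertain)."""
    return 1.0 - max(probs)


def query_indices(pool_probs, strategy, k):
    """Return indices of the k most uncertain pool instances (hypothetical helper)."""
    if strategy == "entropy":
        key = lambda i: -entropy_score(pool_probs[i])
    elif strategy == "margin":
        key = lambda i: smallest_margin_score(pool_probs[i])
    elif strategy == "least_confidence":
        key = lambda i: -least_confidence_score(pool_probs[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Sort ascending by key, so the most uncertain instances come first.
    return sorted(range(len(pool_probs)), key=key)[:k]


# Made-up probabilities for four unlabelled instances in a two-class problem.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]
selected = query_indices(pool_probs, "entropy", 2)
```

In a two-class problem the three criteria produce the same ranking, because each depends only on the larger of the two probabilities; they can disagree once there are three or more classes.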