TY - JOUR
T1 - Creating diverse nearest-neighbour ensembles using simultaneous metaheuristic feature selection
AU - Tahir, Muhammad
AU - Smith, Jim
PY - 2010
Y1 - 2010
N2 - The nearest-neighbour (1NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In recent years there has been an increasing interest in creating ensembles of classifiers in order to improve classification accuracy. This paper proposes a new ensemble technique which combines multiple 1NN classifiers, each using a different distance function, and potentially a different set of features (feature vector).
These feature vectors are determined for each distance metric simultaneously using Tabu Search to minimise the ensemble error rate. We show that this approach implicitly selects for a diverse set of classifiers, and by doing so achieves greater performance improvements than can be achieved by treating the classifiers independently, or using a single feature set. Naturally, optimising the level of ensembles necessitates a much larger solution space, to make this approach tractable, we show how Tabu Search at the ensemble level can be hybridised with local search at the level of individual classifiers. The proposed ensemble classifier with different distance metrics and different feature vectors is evaluated using various benchmark datasets from UCI Machine Learning Repository and a real-world machine-vision application. Results have indicated a significant increase in the performance when compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with ensemble classifier using different distance metrics but with same feature vector (with or without feature selection (FS)).
AB - The nearest-neighbour (1NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In recent years there has been an increasing interest in creating ensembles of classifiers in order to improve classification accuracy. This paper proposes a new ensemble technique which combines multiple 1NN classifiers, each using a different distance function, and potentially a different set of features (feature vector).
These feature vectors are determined for each distance metric simultaneously using Tabu Search to minimise the ensemble error rate. We show that this approach implicitly selects for a diverse set of classifiers, and by doing so achieves greater performance improvements than can be achieved by treating the classifiers independently, or using a single feature set. Naturally, optimising the level of ensembles necessitates a much larger solution space, to make this approach tractable, we show how Tabu Search at the ensemble level can be hybridised with local search at the level of individual classifiers. The proposed ensemble classifier with different distance metrics and different feature vectors is evaluated using various benchmark datasets from UCI Machine Learning Repository and a real-world machine-vision application. Results have indicated a significant increase in the performance when compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with ensemble classifier using different distance metrics but with same feature vector (with or without feature selection (FS)).
KW - Tabu Search
KW - 1NN classifier
KW - ensemble classifiers
U2 - 10.1016/j.patrec.2010.01.030
DO - 10.1016/j.patrec.2010.01.030
M3 - Article
VL - 31
SP - 1470
EP - 1480
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
SN - 0167-8655
IS - 11
ER -