Classification is a special type of machine learning tasks, which is essentially achieved by training a classifier that can be used to classify new instances. In order to train a high performance classifier, it is crucial to extract representative features from raw data, such as text and images. In reality, instances could be highly diverse even if they belong to the same class, which indicates different instances of the same class could represent very different characteristics. For example, in a facial expression recognition task, some instances may be better described by Histogram of Oriented Gradients features, while others may be better presented by Local Binary Patterns features. From this point of view, it is necessary to adopt ensemble learning to train different classifiers on different feature sets and to fuse these classifiers towards more accurate classification of each instance. On the other hand, different algorithms are likely to show different suitability for training classifiers on different feature sets. It shows again the necessity to adopt ensemble learning towards advances in the classification performance. Furthermore, a multi-class classification task would become increasingly more complex when the number of classes is increased, i.e. it would lead to the increased difficulty in terms of discriminating different classes. In this paper, we propose an ensemble learning framework that involves transforming a multi-class classification task into a number of binary classification tasks and fusion of classifiers trained on different feature sets by using different learning algorithms. We report experimental studies on a UCI data set on Sonar and the CK+ data set on facial expression recognition. The results show that our proposed ensemble learning approach leads to considerable advances in classification performance, in comparison with popular learning approaches including decision tree ensembles and deep neural networks. In practice, the proposed approach can be used effectively to build an ensemble of ensembles acting as a group of expert systems, which show the capability to achieve more stable performance of pattern recognition, in comparison with building a single classifier that acts as a single expert system.