Abstract
Rare-category detection helps discover new rare classes in an unlabeled data set by selecting candidate data examples from those classes for labeling. Most existing approaches to rare-category detection require prior information about the data set and are not applicable without it. Prior-free algorithms address this problem without such prior information, but at the cost of high time complexity, which is no lower than $O(dN^2)$, where $N$ is the number of data examples in the data set and $d$ is its dimension. In this paper, we propose CLOVER, a prior-free algorithm that introduces a novel rare-category criterion known as local variation degree (LVD). LVD uses the characteristics of rare classes to distinguish rare-class data examples from other types of data examples, and the data examples with maximum LVD values are passed to CLOVER for labeling. A notable improvement is that CLOVER’s time complexity is $O(dN^{2-1/d})$ for $d>1$ or $O(N \log N)$ for $d=1$. Extensive experimental results on real data sets demonstrate the effectiveness and efficiency of our method in terms of both discovering new rare classes and lowering time complexity.
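As a rough illustration of the stated bounds, the ratio between the $O(dN^2)$ bound of earlier prior-free algorithms and CLOVER's bound for $d>1$ can be worked out directly. The values of $N$ and $d$ below are assumptions chosen only for illustration, and the constant factors hidden by the big-O notation are ignored.

$$
\frac{dN^{2}}{dN^{2-1/d}} = N^{1/d},
\qquad
N = 10^{4}:\quad
\begin{cases}
d = 2: & N^{1/2} = 100,\\
d = 10: & N^{1/10} = 10^{0.4} \approx 2.5.
\end{cases}
$$

Under these assumptions the asymptotic gain is largest in low dimensions and shrinks as $d$ grows, since $N^{1/d} \to 1$.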
Original language | English |
---|---|
Pages (from-to) | 713-736 |
Journal | Knowledge and Information Systems |
Volume | 35 |
Issue number | 3 |
DOIs | |
Publication status | Published - 2013 |
Keywords
- Rare-category detection
- local variation degree
- histogram density estimation