Abstract
Supervised learning is a popular approach to text classification in the research community as well as in the software development industry. It enables intelligent systems to solve various text analysis problems such as document organization, spam detection and report scoring. However, creating a training corpus is an extremely difficult and time-intensive process, which makes supervised learning inapplicable to many text classification problems. In this research, we explored ways to address this limitation by studying the ontological characteristics of document categories and grouping them under virtual super-categories, which narrows down the search for a suitable category. Applying this method greatly improved classifier performance despite the relatively small size of the training corpus.
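The abstract does not state which classifier was used or how the ontology-derived super-categories were built. The sketch below is therefore only an illustration of the general idea of narrowing the search: a first classifier picks a super-category, and a second classifier chooses only among that super-category's own categories. It assumes multinomial Naive Bayes at both levels and a hand-written category-to-super-category mapping. The names `SUPER_OF`, `NaiveBayes` and `HierarchicalClassifier` are hypothetical, and the category names and toy documents are invented for illustration; they are not the paper's data or method.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        label_counts = Counter(labels)
        self.counts = {label: Counter() for label in self.labels}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        # Log prior of each label and total token count per label.
        self.log_prior = {l: math.log(label_counts[l] / len(labels)) for l in self.labels}
        self.totals = {l: sum(self.counts[l].values()) for l in self.labels}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label in self.labels:
            score = self.log_prior[label]
            for tok in tokenize(doc):
                score += math.log((self.counts[label][tok] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


class HierarchicalClassifier:
    """Two-stage classifier: super-category first, then a category within it."""

    def __init__(self, super_of):
        self.super_of = super_of  # hypothetical mapping: category -> super-category

    def fit(self, docs, labels):
        supers = [self.super_of[label] for label in labels]
        self.top = NaiveBayes().fit(docs, supers)
        # One sub-classifier per super-category, trained only on its own documents.
        self.sub = {}
        for s in set(supers):
            idx = [i for i, x in enumerate(supers) if x == s]
            self.sub[s] = NaiveBayes().fit([docs[i] for i in idx], [labels[i] for i in idx])
        return self

    def predict(self, doc):
        s = self.top.predict(doc)       # stage 1: narrow the search space
        return self.sub[s].predict(doc)  # stage 2: choose among that group's categories


# Hypothetical grouping and toy training data, for illustration only.
SUPER_OF = {"emissions": "environmental", "water": "environmental",
            "wages": "social", "safety": "social"}
DOCS = ["carbon emissions reduced", "greenhouse gas emissions",
        "water consumption reduced", "water recycling",
        "employee wages increased", "minimum wages paid",
        "workplace safety incidents", "safety training"]
LABELS = ["emissions", "emissions", "water", "water",
          "wages", "wages", "safety", "safety"]

if __name__ == "__main__":
    clf = HierarchicalClassifier(SUPER_OF).fit(DOCS, LABELS)
    print(clf.predict("greenhouse gas emissions"))
    print(clf.predict("safety incidents"))
```

The design choice being illustrated is that each second-stage model only has to separate a few sibling categories. With a small corpus, that is a simpler problem than separating every category at once. How the paper actually derives the groupings from category ontology is not described in the abstract.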
Original language | English |
---|---|
Title of host publication | ICIMTR 2012 - 2012 International Conference on Innovation, Management and Technology Research |
Publisher | IEEE |
Pages | 574-579 |
Number of pages | 6 |
ISBN (Electronic) | 9781467306546 |
ISBN (Print) | 9781467306553 |
DOIs | |
Publication status | Published - 12 Jul 2012 |
Event | 2012 International Conference on Innovation, Management and Technology Research, ICIMTR 2012 - Malacca, Malaysia. Duration: 21 May 2012 → 22 May 2012 |
Conference
Conference | 2012 International Conference on Innovation, Management and Technology Research, ICIMTR 2012 |
---|---|
Country/Territory | Malaysia |
City | Malacca |
Period | 21/05/12 → 22/05/12 |
Keywords
- Categorization
- Corporate Sustainability Report
- Feature Selection
- Global Reporting Initiative
- Machine Learning
- Supervised Learning
- Text Ontology