Efficient segmentation of sub-words within handwritten Arabic words

Faraz Khan, Ahmed Bouridane, Fouad Khelifi, Resheed Almotaeryi, Somaya Al-Maadeed

Research output: Contribution to conference, Paper, peer-reviewed

4 Citations (Scopus)

Abstract

Segmentation is a core step in any recognition or classification method: for the text in a document to be recognized effectively, it must first be segmented accurately. This paper presents a text- and writer-independent algorithm for segmenting sub-words in Arabic words. The approach is based on global binarization of an image at multiple threshold levels. When each sub-word, or Part of Arabic Word (PAW), in the image under investigation is processed at multiple threshold levels, a cluster graph is obtained in which each cluster represents an individual sub-word of that word. Once the clusters are obtained, segmentation reduces to automatically selecting the respective cluster, which is done using the 95% confidence interval on the processed data generated by the accumulated graph. The algorithm was tested on 537 randomly selected words from the AHTID/MW database, and 95.3% of the sub-words (PAWs) were correctly segmented and extracted. The proposed method shows considerable improvement over the projection profile method, which is commonly used to segment sub-words.
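The abstract does not give implementation details, so the following is only a minimal sketch of one possible reading of the idea, not the authors' algorithm. It binarizes a grayscale image at several global thresholds, records the horizontal centroid of every connected component at each level, groups the accumulated centroids into clusters, and computes a 95% confidence interval per cluster. The function names, the use of x-centroids as the accumulated feature, the gap-based clustering, the normal-approximation interval, and the toy image are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev

def binarize(img, t):
    """Global threshold: pixels darker than t count as ink (assumes dark ink on light paper)."""
    return [[1 if p < t else 0 for p in row] for row in img]

def components(mask):
    """8-connected components of ink pixels, each returned as a list of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                out.append(comp)
    return out

def accumulate_centroids(img, thresholds):
    """Collect the x-centroid of every component found at every threshold level."""
    xs = []
    for t in thresholds:
        for comp in components(binarize(img, t)):
            xs.append(mean(x for _, x in comp))
    return sorted(xs)

def cluster_1d(xs, gap):
    """Group sorted values into clusters; a jump larger than gap starts a new cluster."""
    if not xs:
        return []
    clusters = [[xs[0]]]
    for x in xs[1:]:
        if x - clusters[-1][-1] > gap:
            clusters.append([x])
        else:
            clusters[-1].append(x)
    return clusters

def confidence_interval_95(values):
    """Normal-approximation 95% interval for the mean: m +/- 1.96 * s / sqrt(n)."""
    m = mean(values)
    if len(values) < 2:
        return (m, m)
    half = 1.96 * stdev(values) / len(values) ** 0.5
    return (m - half, m + half)

# Hypothetical toy image: a dark blob (columns 1-3) and a fainter blob (columns 8-10).
img = [[255] * 12 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 20
    for c in range(8, 11):
        img[r][c] = 100

xs = accumulate_centroids(img, thresholds=[64, 128, 192])
clusters = cluster_1d(xs, gap=3)
intervals = [confidence_interval_95(c) for c in clusters]
```

In this toy case the fainter blob only appears at the two higher thresholds, so its cluster holds fewer points than the darker one; on real handwriting the spread of each cluster, and therefore the width of its interval, would depend on how strokes merge and split as the threshold changes.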
Original language: English
Publication status: Published - Nov 2014
Event: Codit'14 - 2nd International Conference on Control, Decision and Information Technologies - Metz, France
Duration: 1 Nov 2014 → …

Keywords

  • handwritten character recognition
  • image segmentation
  • text analysis
  • word processing
