Recognition of Modification-based Scripts Using Direction Tensors
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE).
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE). ORCID iD: 0000-0002-4929-1262
2004 (English)
In: Proc. 4th Indian Conference on Computer Vision, Graphics and Image Processing, 2004, p. 587-592
Conference paper, Published paper (Other academic)
Abstract [en]

Research on OCR technology for Latin-based scripts has been successful enough that image scanners now come with a built-in OCR facility. However, the majority of modification-based scripts, such as the Brahmi-descended South Asian scripts or the Ethiopic script, have yet to reach this status. This indicates the difficulty of adapting the recognition methods proposed so far for Latin-based scripts to modification-based scripts. In this paper we propose a novel method that can be adopted to recognise modification-based printed scripts with a large character set, without the need for prior segmentation. The major strength of this method is that the direction features used as the main principle for recognition are also used for the separation of confusing characters, detection of the skew angle, and segmentation of script and graphic objects, which substantially improves computational efficiency. Algorithms developed initially for the Brahmi-descended Sinhala script used in Sri Lanka have been extended successfully to the Ethiopic script, which evolved in a different geographical region, yielding consistently accurate results. Together, these two scripts are used by a population of ninety million.

Place, publisher, year, edition, pages
2004. p. 587-592
Keywords [en]
OCR technology, language, scripts
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hh:diva-14923
ISBN: 81-7764-707-5 (print)
OAI: oai:DiVA.org:hh-14923
DiVA, id: diva2:408399
Conference
4th Indian Conference on Computer Vision, Graphics and Image Processing, December 16-18, 2004, Kolkata, India
Available from: 2011-04-04 Created: 2011-04-04 Last updated: 2018-03-23 Bibliographically approved
In thesis
1. Multifont recognition System for Ethiopic Script
2006 (English)
Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we present a general framework for a multi-font, multi-size and multi-style Ethiopic character recognition system. We propose structural and syntactic techniques for the recognition of Ethiopic characters, in which graphically complex characters are represented by less complex primitive structures and their spatial interrelationships. For each Ethiopic character, the primitive structures and their spatial interrelationships form a unique set of patterns.

The interrelationships of primitives are represented by a special tree structure, which resembles a binary search tree in the sense that it groups child nodes as left and right and keeps the spatial positions of primitives in an orderly manner. For better computational efficiency, the primitive tree is converted into a string pattern using in-order traversal. This yields a knowledge base of the alphabet that stores the string patterns that can occur for each character. Characters are then recognised by matching the generated pattern against each pattern in the stored knowledge base of characters.
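The tree-to-string step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the `PrimitiveNode` class, the single-letter primitive labels ("V", "H", "D") and the toy knowledge base are all hypothetical, chosen only to show in-order traversal followed by exact pattern matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PrimitiveNode:
    """One primitive structure; left/right children encode relative position."""
    label: str  # hypothetical one-letter code for the primitive type
    left: Optional["PrimitiveNode"] = None
    right: Optional["PrimitiveNode"] = None


def in_order(node: Optional[PrimitiveNode]) -> List[str]:
    """Return the primitive labels in in-order (left, node, right) sequence."""
    if node is None:
        return []
    return in_order(node.left) + [node.label] + in_order(node.right)


def to_pattern(root: PrimitiveNode) -> str:
    """Flatten a primitive tree into a string pattern."""
    return "".join(in_order(root))


def recognise(root: PrimitiveNode,
              knowledge_base: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first character whose stored patterns contain the tree's pattern."""
    pattern = to_pattern(root)
    for character, patterns in knowledge_base.items():
        if pattern in patterns:
            return character
    return None  # no stored pattern matched


# Hypothetical example: a horizontal primitive with a vertical primitive to its
# left and a diagonal primitive to its right.
tree = PrimitiveNode("H", left=PrimitiveNode("V"), right=PrimitiveNode("D"))
knowledge_base = {"char_A": {"VHD", "VH"}, "char_B": {"HVD"}}
result = recognise(tree, knowledge_base)
```

Storing several patterns per character, as in the toy knowledge base, is one way to accommodate the "possibly occurring" variants mentioned in the abstract. Matching strings instead of trees reduces each comparison to a set lookup.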

Structural features are extracted using the direction field tensor, which is also used for character segmentation. In general, the recognition system does not need size normalization, thinning or other preprocessing procedures. The only parameter that needs to be adjusted during the recognition process is the size of the Gaussian window, which should be chosen optimally in relation to font sizes. We also constructed an Ethiopic Document Image Database (EDIDB) from real-life documents, and the recognition system was tested with respect to variations in font type, size, style, document skewness and document type. Experimental results are reported.
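As a rough illustration of how a direction tensor can be estimated under a Gaussian window, the sketch below uses one common complex-valued formulation: the squared complex gradient `(fx + i*fy)**2` is averaged with Gaussian weights, and the argument of the result gives twice the dominant orientation. This is an assumption made for illustration and not necessarily the exact formulation used in the thesis. The function names, the central-difference gradient and the window radius of `2 * sigma` are all hypothetical choices.

```python
import cmath
import math


def gaussian_window(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian weight grid summing to 1."""
    w = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def direction_tensor(image, row, col, sigma=1.0):
    """Estimate (orientation, certainty) around image[row][col].

    orientation: dominant gradient direction in radians, defined modulo pi.
    certainty:   value in [0, 1]; near 1 when one direction dominates.
    """
    radius = max(1, int(math.ceil(2 * sigma)))  # assumed window extent
    weights = gaussian_window(sigma, radius)
    height, width = len(image), len(image[0])
    i20 = 0j   # weighted sum of squared complex gradients
    i11 = 0.0  # weighted sum of their magnitudes
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = row + dy, col + dx
            # Skip pixels whose central-difference neighbours lie outside the image.
            if not (1 <= r < height - 1 and 1 <= c < width - 1):
                continue
            fx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            fy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            z = complex(fx, fy) ** 2
            weight = weights[dy + radius][dx + radius]
            i20 += weight * z
            i11 += weight * abs(z)
    if i11 == 0.0:
        return 0.0, 0.0  # flat region: no direction information
    return 0.5 * cmath.phase(i20), abs(i20) / i11


# Synthetic 9x9 test images: a vertical edge and a horizontal edge.
vertical_edge = [[1.0 if c >= 4 else 0.0 for c in range(9)] for r in range(9)]
horizontal_edge = [[1.0 if r >= 4 else 0.0 for c in range(9)] for r in range(9)]
theta_v, cert_v = direction_tensor(vertical_edge, 4, 4, sigma=1.0)
theta_h, cert_h = direction_tensor(horizontal_edge, 4, 4, sigma=1.0)
```

In this sketch `sigma` plays the role of the Gaussian window size: a larger value averages gradients over a wider neighbourhood, which is why it would need to scale with the stroke width of the font being processed.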

Place, publisher, year, edition, pages
Göteborg: Department of Signals and Systems, Chalmers University of Technology, 2006. p. 46
Series
Technical report ; 2006:21
Keywords
Ethiopic character recognition, OCR, Multifont recognition, Amharic, Direction fields, Structural and syntactic pattern recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:hh:diva-1978
Local ID: 2082/2373
Archive number: 2082/2373
OAI: 2082/2373
Presentation
(English)
Available from: 2008-09-29 Created: 2008-09-29 Last updated: 2018-03-23 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Assabie, Yaregal; Bigun, J.
