Ethiopic Document Image Database for Testing Character Recognition Systems
Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data- och Elektroteknik (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data- och Elektroteknik (IDE), Halmstad Embedded and Intelligent Systems Research (EIS). ORCID iD: 0000-0002-4929-1262
2006 (English) Report (Other academic)
Abstract [en]

In this paper we describe the acquisition and content of a large database of Ethiopic documents for testing and evaluating character recognition systems. The Ethiopic Document Image Database (EDIDB) contains documents written in the Amharic and Geez languages. The database was built from a variety of documents such as printouts, books, newspapers, and magazines, and it covers a range of font types, sizes, and styles. Degraded and poor-quality documents were also included to represent real-life conditions. A total of 1,204 pages were scanned at a resolution of 300 dpi and saved as grayscale JPEG images. We also describe an evaluation protocol for standardizing the comparison of recognition systems and their results. The database is made available to the research community through http://www.hh.se/staff/josef/.

Place, publisher, year, edition, pages
Halmstad: Halmstad University, 2006, p. 6
Identifiers
URN: urn:nbn:se:hh:diva-14930
OAI: oai:DiVA.org:hh-14930
DiVA, id: diva2:408389
Available from: 2011-04-04 Created: 2011-04-04 Last updated: 2018-03-23 Bibliographically approved
Part of thesis
1. Multifont recognition System for Ethiopic Script
2006 (English) Licentiate thesis, with papers (Other academic)
Abstract [en]

In this thesis, we present a general framework for a multi-font, multi-size and multi-style Ethiopic character recognition system. We propose structural and syntactic techniques for the recognition of Ethiopic characters, in which the graphically complex characters are represented by less complex primitive structures and their spatial interrelationships. For each Ethiopic character, the primitive structures and their spatial interrelationships form a unique set of patterns.

The interrelationships of primitives are represented by a special tree structure that resembles a binary search tree in the sense that it groups child nodes as left and right and keeps the spatial positions of primitives in an orderly manner. For better computational efficiency, the primitive tree is converted into a string pattern using in-order traversal, which yields a knowledge base of the alphabet storing the string patterns that may occur for each character. Characters are then recognized by matching the generated patterns against each pattern in the stored knowledge base of characters.
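
The tree-to-string step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the primitive labels ("H", "V", "D"), the character names ("char_a", "char_b"), and the helper names PrimitiveNode, in_order and recognize are hypothetical and chosen only for illustration. The sketch assumes each primitive is encoded as a single-letter label and that matching is exact string equality against the stored patterns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PrimitiveNode:
    """One primitive structure of a character (labels are hypothetical)."""
    label: str
    left: Optional["PrimitiveNode"] = None   # primitives grouped to the left
    right: Optional["PrimitiveNode"] = None  # primitives grouped to the right


def in_order(node: Optional[PrimitiveNode]) -> str:
    """Flatten a primitive tree into a string: left subtree, node, right subtree."""
    if node is None:
        return ""
    return in_order(node.left) + node.label + in_order(node.right)


def recognize(tree: PrimitiveNode, knowledge_base: Dict[str, Set[str]]) -> List[str]:
    """Return every character whose stored patterns contain the tree's string."""
    pattern = in_order(tree)
    return [char for char, patterns in knowledge_base.items() if pattern in patterns]


# Hypothetical example: a root primitive "V" with "H" on its left and "D" on its right.
tree = PrimitiveNode("V", left=PrimitiveNode("H"), right=PrimitiveNode("D"))

# Hypothetical knowledge base: each character maps to the patterns it may produce.
knowledge_base = {
    "char_a": {"HVD", "HV"},
    "char_b": {"VD"},
}

print(in_order(tree))                   # HVD
print(recognize(tree, knowledge_base))  # ['char_a']
```

Storing several patterns per character, as in this sketch, reflects the abstract's statement that the knowledge base holds the possibly occurring string patterns for each character; how those patterns are derived from real images is outside the scope of this example.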

Structural features are extracted using the direction field tensor, which is also used for character segmentation. In general, the recognition system does not need size normalization, thinning or other preprocessing procedures. The only parameter that needs to be adjusted during the recognition process is the size of the Gaussian window, which should be chosen optimally in relation to font size. We also constructed an Ethiopic Document Image Database (EDIDB) from real-life documents, and the recognition system is tested with respect to variations in font type, size, style, document skewness and document type. Experimental results are reported.

Place, publisher, year, edition, pages
Göteborg: Department of Signals and Systems, Chalmers University of Technology, 2006, p. 46
Series
Technical report ; 2006:21
Keywords
Ethiopic character recognition, OCR, Multifont recognition, Amharic, Direction fields, Structural and syntactic pattern recognition
Identifiers
URN: urn:nbn:se:hh:diva-1978
Local ID: 2082/2373
Archive number: 2082/2373
OAI: 2082/2373
Presentation
(English)
Available from: 2008-09-29 Created: 2008-09-29 Last updated: 2018-03-23 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Freely available via the Halmstad University website

Authors

Assabie, Yaregal; Bigun, Josef
