Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS). ORCID iD: 0000-0002-4929-1262
2006 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 27, no. 6, pp. 696-705. Journal article (Refereed). Published
Abstract [en]

The Brahmi-descended Sinhala script is used by 75% of the 18 million population of Sri Lanka. To the best of our knowledge, none of the Brahmi-descended scripts, which are used by hundreds of millions of people in South Asia, has a commercial OCR product. In the course of implementing an OCR system for the printed Sinhala script that is easily adaptable to similar scripts [Premaratne, L., Assabie, Y., Bigun, J., 2004. Recognition of modification-based scripts using direction tensors. In: 4th Indian Conf. on Computer Vision, Graphics and Image Processing (ICVGIP2004), pp. 587–592], a segmentation-free recognition method using orientation features was proposed in [Premaratne, H.L., Bigun, J., 2004. A segmentation-free approach to recognise printed Sinhala script using linear symmetry. Pattern Recognition 37, 2081–2089]. Owing to limitations in image analysis techniques, the character-level accuracy of the results produced directly by that character recognition algorithm saturates at 94%. The false rejections from the recognition algorithm are initially identified only as ‘missing character positions’ or ‘blank characters’. It is necessary to identify suitable substitutes for such ‘missing character positions’ and to raise word accuracy to an acceptable level. This paper proposes a novel method that explores the lexicon in association with hidden Markov models to improve the accuracy of the recognised script. The proposed method could easily be extended, with minor changes, to other modification-based scripts that contain easily confused characters. The proposed optimisation algorithm improves word-level accuracy from 81.5% to 88.5%.
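
The idea of filling 'missing character positions' from a lexicon can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a toy under stated assumptions, using Latin letters instead of Sinhala, a hypothetical `?` marker for a missing position, a first-order character transition table estimated from the lexicon itself, and an optional confusion table giving P(observed | true). All function names and the default probabilities are illustrative.

```python
import math

MISSING = "?"  # hypothetical marker for a 'missing character position'
START = "^"    # hypothetical start-of-word state


def train_transitions(lexicon, smoothing=1.0):
    """Estimate log P(next char | previous char) from the lexicon (add-one smoothed)."""
    alphabet = sorted({ch for word in lexicon for ch in word})
    counts = {a: {b: smoothing for b in alphabet} for a in [START] + alphabet}
    for word in lexicon:
        prev = START
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    return {a: {b: math.log(c / sum(row.values())) for b, c in row.items()}
            for a, row in counts.items()}


def emission_logp(observed, true, confusion, p_correct=0.9, floor=1e-6):
    """log P(observed | true); a missing position carries no evidence."""
    if observed == MISSING:
        return 0.0
    if (true, observed) in confusion:
        return math.log(confusion[(true, observed)])
    # Illustrative defaults, not values from the paper.
    return math.log(p_correct if observed == true else floor)


def score(candidate, observed, trans, confusion):
    """Joint log-probability of a lexicon word and the recogniser output."""
    total, prev = 0.0, START
    for true, obs in zip(candidate, observed):
        total += trans[prev][true] + emission_logp(obs, true, confusion)
        prev = true
    return total


def optimise_word(observed, lexicon, trans, confusion):
    """Return the best same-length lexicon word, or the input if none exists."""
    candidates = [w for w in lexicon if len(w) == len(observed)]
    if not candidates:
        return observed
    return max(candidates, key=lambda w: score(w, observed, trans, confusion))


lexicon = ["cat", "cot", "car", "dog", "cart"]
trans = train_transitions(lexicon)
print(optimise_word("c?t", lexicon, trans, {}))  # prints: cat
```

Restricting the search to lexicon words keeps the example short; a full hidden Markov model decoder would instead run Viterbi over all character sequences and use the lexicon as a constraint or a final check.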

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2006. Vol. 27, no. 6, pp. 696-705
Keywords [en]
Optical character recognition, Hidden Markov models, State transition matrix, Confusion matrix, Word optimisation
National subject category
Engineering and Technology
Identifiers
URN: urn:nbn:se:hh:diva-1316
DOI: 10.1016/j.patrec.2005.10.009
ISI: 000236286700023
Scopus ID: 2-s2.0-32844473524
Local ID: 2082/1695
OAI: oai:DiVA.org:hh-1316
DiVA, id: diva2:238534
Available from: 2008-04-15. Created: 2008-04-15. Last updated: 2018-03-23. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Premaratne, Hemakumar Lalith; Järpe, Eric; Bigun, Josef
